Does Wolfram significantly improve GPT-4's MATH performance (more than 10%)? | Manifold

Resolved N/A on Jan 1.

Current SotA is 54%. Does GPT-4 with Wolfram score >= 64%?

If the May 2023 version of GPT-4 with Wolfram becomes unavailable before anyone conducts this test, this question resolves N/A.
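The resolution rule above can be sketched as a small check. This is only an illustration under stated assumptions: it reads the threshold as an absolute gain of 10 percentage points over the 54% baseline (which matches the ">= 64%" in the description; a relative 10% gain would instead put the bar at about 59.4%), and the function name `resolve`, its parameters, and the "OPEN" state are hypothetical, not anything Manifold defines.

```python
from typing import Optional

BASELINE_SOTA = 54.0  # percent on MATH, taken from the question text
REQUIRED_GAIN = 10.0  # percentage points; ASSUMED absolute, since 54 + 10 = 64


def resolve(score: Optional[float], model_available: bool = True) -> str:
    """Hypothetical helper: map a measured MATH score to a resolution.

    score: accuracy (in percent) of the May 2023 GPT-4 with Wolfram,
           or None if nobody has run the test.
    model_available: whether that model version can still be tested.
    """
    if score is None:
        # No test result: N/A once the model is gone, otherwise still open.
        return "OPEN" if model_available else "N/A"
    # The description uses ">=", so exactly 64% counts as YES.
    return "YES" if score >= BASELINE_SOTA + REQUIRED_GAIN else "NO"
```

For example, `resolve(64.0)` gives "YES", `resolve(63.9)` gives "NO", and `resolve(None, model_available=False)` gives "N/A".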

Technical AI Timelines

New Year's Resolutions 2024

From a skim, https://arxiv.org/pdf/2308.07921v1.pdf finds a roughly 13% improvement using Code Interpreter. That result is with a significantly updated model compared to the GPT-4 available when this question was written.

I've sold my stake to avoid a conflict of interest, in anticipation of having to resolve this question N/A. OpenAI has not given a date, but the docs say gpt-4-0314 may be removed at any time.

@JacobPfau FWIW, my credence that a strategy similar to the paper linked above would get a >10% performance boost out of gpt-4-0314 using Wolfram is around 50%.

Relevant previous work:
https://arxiv.org/pdf/2305.12524.pdf
https://arxiv.org/pdf/2211.12588.pdf
https://arxiv.org/pdf/2211.10435.pdf
AFAICT from skimming, none of these use Wolfram for intermediate steps; they mostly use Python. None of them evaluate on MATH either.

People are also trading

Will Gemini outperform GPT-4 at mathematical theorem-proving?

What will the aggregate improvement of GPT5 be over GPT4 in terms of metrics?

Will OpenAI's next major LLM (after GPT-4) solve more than 2 of the first 5 new Project Euler problems?

Will GPT-5 perform better than o1 (not preview) at AIME 2024, Codeforces elo, GPQA, or the 2024 ioi?

Will OpenAI's next major LLM (after GPT-4) surpass 74% accuracy on the GPQA benchmark?

Will OpenAI's next major LLM (after GPT-4) surpass 70% accuracy on the GPQA benchmark?

Will the performance jump from GPT4->GPT5 be less than the one from GPT3->GPT4?

Will any open source LLM with <20 billion parameters outperform GPT-4 on most language benchmarks by the end of 2024?

What is the main reason behind GPT-4o speed improvement relative to GPT-4 base model?

Will GPT-4 be trained (roughly) compute-optimally using the best-known scaling laws at the time?
