Does Wolfram significantly improve GPT-4's MATH performance (more than 10%)?
Resolved N/A on Jan 1

Current SotA on MATH is 54%. Does GPT-4 with Wolfram score >= 64%?

If the May 2023 version of GPT-4 with Wolfram becomes unavailable before anyone conducts this test, this question resolves N/A.


From a skim, https://arxiv.org/pdf/2308.07921v1.pdf finds a roughly 13% improvement using Code Interpreter. That's with a significantly updated model compared to GPT-4 at the time of question writing.

I've sold my stake in anticipation of having to resolve this question N/A, to avoid a conflict of interest. OAI has not specified when, but the docs say gpt-4-0314 may be removed at any time.

@JacobPfau FWIW my credence that a strategy similar to the one in the paper linked above gets a >10% performance boost out of gpt-4-0314 using Wolfram is around 50%.

Relevant previous work:
https://arxiv.org/pdf/2305.12524.pdf
https://arxiv.org/pdf/2211.12588.pdf
https://arxiv.org/pdf/2211.10435.pdf
AFAICT from skimming, none used Wolfram for intermediate steps; they mostly used Python. Also, none evaluate on MATH.

© Manifold Markets, Inc.