Will an LLM solve this integral by end of 2025 without plugins?
Resolved Apr 23 as 42%

Will an LLM be able to solve this integral (inputted as an image) by the end of 2025 without using plugins? (Note: Wolfram Alpha can already solve this).

I made this problem up myself, so it is unlikely to appear in training corpora with the answer.


It has to be a publicly available, general-purpose LLM, not just one someone trained on this problem specifically to win the market or anything.

To avoid conflicts of interest, I will not bet in this market.
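
For resolution, a claimed answer can be checked mechanically rather than by eye: differentiate the LLM's antiderivative and compare it to the integrand. Here is a minimal sketch using SymPy; the integrand and the claimed answer below are hypothetical stand-ins, since the actual integral exists only as the market's image.

```python
# Sketch: verify a claimed antiderivative by differentiating it.
# The integrand here is a PLACEHOLDER, not the market's actual integral.
import sympy as sp

x = sp.symbols("x", positive=True)
integrand = sp.ln(x) / x        # hypothetical stand-in integrand
claimed = sp.ln(x) ** 2 / 2     # antiderivative an LLM might output

# The answer is correct iff d(claimed)/dx minus the integrand simplifies to 0.
assert sp.simplify(sp.diff(claimed, x) - integrand) == 0
print("antiderivative verified")
```

The same check works for whatever integrand the image contains, provided SymPy can simplify the difference to zero.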



In light of Manifold's "pivot", I don't feel comfortable using this site, so I'm leaving and resolving all my open markets to current probability. Sorry everyone.

"It has to be a publicly available general purpose LLM, not just one someone trained on this problem specifically to win the market or anything."

What counts as "or anything"?

Can one be trained on similar integrals? On integrals in general? On calculus problems in general? Etc.

@euclaise The intent of the question is to capture mainstream LLMs like GPT, Bard, Claude, Llama, etc. but exclude finetunings of these that are done just to win this market in an uninteresting way. I guess there's a lot of grey area in between. What do you have in mind?

@ThisProfileDoesntExist How about finetunes like MetaMath/Mammoth/Goat?

@euclaise I'm not familiar with those. When I search for MetaMath, it seems like just a proof language, not an LLM. But in general, anything created before this market or not created for the purposes of winning this market should be fine.

Similar markets on GPT-4V's performance on math questions:

Just tested Bing. It didn't even attempt to solve the problem. I expect it will be a very long time before LLMs can solve this, but I figured I might as well check.

Does it have to be able to realize that you most likely meant $\ln x$ rather than $l \times n \times x$ from the picture, or is it okay if it interprets it as the latter (or pretends to, as a smart-aleck pedant)? :-)

@ArmandodiMatteo If it somehow failed to correctly interpret the notation, that would count as a failure as well. I doubt that would happen though.

This would require an image-to-text transformation, which LLMs by definition cannot do. So I assume the LLM is allowed to outsource that step to an image-processing module, but is not allowed to outsource the maths task to a maths module?
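
A minimal sketch of how the no-plugins test could actually be run against a vision-capable model, using OpenAI's Python SDK; the model name, file name, and prompt here are assumptions for illustration, not part of the market's terms.

```python
# Sketch: send the integral image to a multimodal LLM with no tools enabled,
# so any OCR and calculus must happen inside the model itself.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("integral.png", "rb") as f:  # hypothetical image of the integral
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",  # any publicly available multimodal LLM would do
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Solve this integral. Show your work."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    # no `tools` argument is passed, so the model gets no plugins
)
print(response.choices[0].message.content)
```

Omitting the `tools` parameter means the model receives no plugins at all; whatever image reading and integration happen, happen inside the model.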

