Will an LLM be able to solve confusing but elementary geometric reasoning problems in 2024? (strict LLM version)
Resolved NO on Jan 2

This is a variant of the following market:

https://manifold.markets/dreev/will-an-llm-be-able-to-solve-confus

In this version, the problem has to be solved purely by the LLM itself.

Open question: Does GPT-o1 count as "strictly an LLM"? Seems super ambiguous to me. I've sold my shares in this market so I can just make a judgment call. The default is yes, it counts, but if I hear a compelling counterargument in the comments, I'll make an update.

sold Ṁ768 NO

I've just sold my position in this market, so hopefully I can now make the judgment call: assuming ChatGPT o1 can solve the problem, does that count as it being solved "purely by the LLM itself"? Can I hear people's arguments for and against?

Sanity check: if GPT-o1 were to pull this off in time, that still counts as a strict LLM, right?

@dreev I think so

bought Ṁ250 NO

@dreev This will become increasingly challenging as more and more models are integrated into a single system, which is partly why I don't bet much on "Will LLMs do [task] by [future year]?" markets. But yes, I think it's reasonable to call GPT-o1 an LLM for the rest of 2024.

@Jacy Thank you, that makes a ton of sense. I shall avoid trying to single out LLMs in the future and hope that this one won't turn out too painful to adjudicate over the remaining 3 months of 2024. If anyone has any counterpoints about GPT-o1, chime in! (Not that it matters yet, with GPT-o1 still failing our flagship geometric reasoning problem, but it does seem to be getting closer... 😬)

bought Ṁ100 YES

@dreev Bet this one up, since if o1 counts as an LLM, I think this also resolves YES?

@JohnCarpenter Oh, yeah, great question. Does o1 count as purely an LLM? [PS: ha, when I originally replied here I didn't notice that this was in fact exactly the question at the top of this thread. I'm still highly uncertain, but we'll go with yes unless anyone has a good counterargument.]

@dreev /shrug idk what the word pure means. It's one model that queries itself a lot of times in a row

@JohnCarpenter Yeah this is very nonobvious to me. There's presumably additional code for constructing the chain-of-reasoning prompts.
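
To make concrete what "additional code around a single model" could look like, here's a purely hypothetical sketch of a self-querying loop. It is not OpenAI's actual o1 implementation; `solve_with_self_querying`, the `query_model` callable, and the stop marker are invented for illustration of the kind of scaffolding being debated.

```python
from typing import Callable

def solve_with_self_querying(
    problem: str,
    query_model: Callable[[str], str],  # one call to the underlying LLM (stubbed here)
    max_rounds: int = 5,
) -> str:
    """Repeatedly feed the same model its own prior reasoning.

    The loop is the kind of "additional code" under discussion: the model
    itself is a single LLM, but orchestration logic decides what to prompt
    it with next and when to stop.
    """
    transcript = f"Problem: {problem}\n"
    for _ in range(max_rounds):
        step = query_model(transcript + "\nContinue reasoning step by step.")
        transcript += step + "\n"
        if "FINAL ANSWER:" in step:  # invented stop marker for this sketch
            break
    return transcript
```

Whether a wrapper like this disqualifies the system from being "strictly an LLM" is exactly the judgment call at issue in this thread.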

@dreev I'd say o1 counts unless it uses code execution to help it solve the problem.

@MugaSofer That's sounding reasonable to me. If anyone has counterarguments, now's the time to make them. At this point the default is that this is going to resolve the same way the parent market does.
