This is a variant of the following market:
https://manifold.markets/dreev/will-an-llm-be-able-to-solve-confus
In this version, the problem has to be solved purely by the LLM itself.
Open question: Does GPT-o1 count as "strictly an LLM"? Seems super ambiguous to me. I've sold my shares in this market so I can just make a judgment call. The default is yes, it counts, but if I hear a compelling counterargument in the comments, I'll make an update.
@dreev this will be increasingly challenging as more and more models are integrated into a single system, which is in part why I don't bet much on the "Will LLMs do [task] by [future year]?" markets, but yeah, I think it's reasonable to call GPT-o1 an LLM for the rest of 2024.
@Jacy Thank you, that makes a ton of sense. I shall avoid trying to single out LLMs in the future and hope that this one won't turn out too painful to adjudicate over the remaining 3 months in 2024. If anyone has any counterpoints about GPT-o1, chime in! (Not that it matters yet, with GPT-o1 failing our flagship geometric reasoning problem so far, but it does seem to be getting closer... 😬)
@dreev bet up on this one, since if o1 counts as an LLM I think this also resolves yes?
@JohnCarpenter Oh, yeah, great question. Does o1 count as purely an LLM? [ps, ha, originally when i replied here i didn't notice that this was in fact exactly the original question at the top of this thread. i'm still highly uncertain but we'll go with yes unless anyone has a good counterargument]
@dreev /shrug idk what the word pure means. It's one model that queries itself a lot of times in a row
@JohnCarpenter Yeah this is very nonobvious to me. There's presumably additional code for constructing the chain-of-reasoning prompts.
@MugaSofer That's sounding reasonable to me. If anyone has counterarguments, now's the time to make them. At this point the default is that this is going to resolve the same way the parent market does.