Will Q* (Q Star) be a significant breakthrough in AI/ML research or engineering?
34
1kṀ4742
Resolved YES on Jan 3

As judged by me following general expert consensus (not just claims by OpenAI)

🏅 Top traders

#   Total profit
1   Ṁ534
2   Ṁ499
3   Ṁ325
4   Ṁ101
5   Ṁ91

Since Q* (now o1) pioneered inference-time compute, which is certainly a paradigm shift for LLMs (and IMO will also matter for other models, once we figure out how to train models that can generate more general intermediate representations of arbitrary length), I am tempted to resolve this YES.
The strongest argument for NO I can think of is that "breakthrough" is a big word, and o1 is more a first step in a direction that will involve many incremental steps than a breakthrough in itself. Any thoughts?

Personally, I think the o1 architecture and test-time compute are a huge breakthrough. It went from 2% on FrontierMath to 25%, for example. Huge paradigm shift. Just my view though, as someone who predicts AI advances and finds that the existence of these reasoning models changes a lot of my predictions. Maybe someone in ML would disagree? Idk.

bought Ṁ50 NO

There is no reason that o1 should be considered a "breakthrough".

@Philip3773733 Well, they did introduce some kind of new chain-of-thought training paradigm, but it is true that, as far as I can see, we do not have much information on what exactly it is. You could characterise "breakthrough" either by how large the performance gain is or by how clever/novel the method is.

@Donald This benchmark shows it only marginally improves the score. Sure, it is better, but it also thinks for much longer. Comparing against traditional benchmarks is also misleading, because o1 uses multi-step thinking, which could be trivially added to e.g. Claude as well using AutoGPT or similar. It would be interesting to see that comparison.

https://aider.chat/2024/09/12/o1.html

@Philip3773733 I'll take a more comprehensive look at the different benchmarks closer to the resolution date. For example, the academic benchmarks provided by OpenAI showed accuracy increases of 30% or so, which is pretty huge, although of course relying on OpenAI to benchmark its own product is not how I will resolve this market.

bought Ṁ700 YES

@Donald Do you agree that this resolves YES? Have you seen the new o1 family of models (which seems to be the new name for Q*)?

@Bayesian from the information we have so far, I’d say it seems likely. I’m not sure we will get more specific information on the o1 model architecture and Q*, which would be nice for a clean decision

@Donald We probably won’t

© Manifold Markets, Inc.