Which major AI lab will be the first to release a model that "thinks before it responds" like o1 from OpenAI?
resolved Nov 22
Other: 70% (resolved 100%)
Google DeepMind: 7%
Anthropic: 12%
Meta: 5%
xAI: 6%

OpenAI o1 blog post says:

We are introducing OpenAI o1, a new large language model trained with reinforcement learning to perform complex reasoning. o1 thinks before it answers—it can produce a long internal chain of thought before responding to the user.

Which major AI lab will be the first to release a model that "thinks before it responds" like o1 from OpenAI?

Resolution criteria:

  1. The model should be widely available to the public beyond a small group of testers/red-teamers.

  2. It should be a dedicated model specifically designed for reasoning tasks, like o1.

  3. The accompanying blog post or paper must include "thinks before it answers", "test-time compute", "long chain-of-thought", or an equivalent phrase.

bought Ṁ50 Other YES

DeepSeek released the r1 model, fine-tuned for reasoning. @NeuralBets

@PeterBuyukliev I'm unsure whether DeepSeek counts as a 'major AI lab'. In retrospect, I should not have included 'other' as an option.

@PeterBuyukliev Though the spirit of the question *was* about a sincere attempt to replicate o1, and r1 from DeepSeek does look good on benchmarks, so I'm inclined to resolve this as 'other' if other traders don't object. I think DeepSeek is credible enough that we can take their benchmark numbers at face value.

bought Ṁ100 Other YES

@NeuralBets Yeah, 'major lab' does leave a lot of room for interpretation. If you decide not to resolve immediately, can you define which labs are major? There are unconfirmed rumours that other Chinese companies will also soon release reasoning fine-tuned models.

@NeuralBets I think you could argue that any lab to create a near SOTA LLM inherently counts as "major"

@PeterBuyukliev I specifically included 'major lab' as a criterion to screen out scams like Reflection-70B, so my subjective definition of a 'major lab' would be any lab that cannot afford the PR disaster of making such a glaringly false claim. DeepSeek does not have any VCs to impress and, as @JaundicedBaboon noted, "creating a near SOTA LLM inherently counts as 'major'".

I'll wait until November 22, 2024, 1:30 AM PST for anybody else who may want to weigh in. If nobody comes up with a good argument for the opposite position by then, I am going to resolve this as 'other'.
