Which new AI models will be released in February 2025?
Mar 1
Open AI o3: 2%
OpenAI video generation: 7%
OpenAI image generation: 9%
Anthropic flagship language model**: 43%
Anthropic reasoning language model***: 80%
Anthropic (other)****: 24%
Midjourney: 30%
Amazon language model: 18%
XAI image or video generation: 14%
Deepseek language model: 78%

Released = available to some portion of the public (including a subset of subscribers or a limited number of API developers drawn from the public). A release made only for safety testing does not count.

New model = either announced by the company as a new model, clearly a distinct model from its numbering/naming, or selectable from some sort of menu as a distinct model. Something like "o1 extra mini" would count: while it is part of o1, it can be considered a distinct model in this market.

Must be publicly released for the first time between February 1st 12:00 am PST and February 28th 11:59 pm PST. If it is announced but not yet released to any members of the public, it will not count.
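As a rough illustration of the release window above, here is a minimal sketch of how a first-release timestamp could be checked. The function name `counts_for_february` is hypothetical, and treating the window as running up to (but not including) March 1st 12:00 am PST is an assumption about how "11:59 pm" would be read; the market creator's judgment is what actually decides resolution.

```python
from datetime import datetime, timedelta, timezone

# PST is a fixed UTC-8 offset (no daylight saving in February).
PST = timezone(timedelta(hours=-8), "PST")

WINDOW_START = datetime(2025, 2, 1, 0, 0, tzinfo=PST)
# Assumption: the window is treated as ending just before March 1st 12:00 am PST,
# so anything during the 11:59 pm minute on February 28th still counts.
WINDOW_END = datetime(2025, 3, 1, 0, 0, tzinfo=PST)


def counts_for_february(first_public_release: datetime) -> bool:
    """Return True if the first public release falls inside the market window.

    Hypothetical helper: the timestamp must be timezone-aware so it can be
    compared against the PST window regardless of the zone it was recorded in.
    """
    if first_public_release.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return WINDOW_START <= first_public_release < WINDOW_END
```

For example, a release recorded in UTC can be passed in directly, since aware datetimes compare correctly across time zones. An announcement date would not be a valid input here; only the first date the model was available to some members of the public matters.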

For answers where no specific model type is specified alongside the company, then any type of generative AI model will cause it to resolve yes.

*OpenAI (other) refers to any model that is not their new flagship model (e.g. GPT-5), o3, a video generator, or an image generator. It could be a derivative of another language model or some other type of model such as a voice generator.

**Anthropic flagship language model refers to a model comparable to Claude 3.5 or GPT-4o that should outperform Claude 3.5 Sonnet on a majority of performance benchmarks. This should not be a reasoning model.

***Anthropic reasoning model refers to a model that is not considered their everyday task model and is akin to what OpenAI's o1 is to GPT-4o.

****Anthropic (any other) refers to any model that is not a reasoning model nor their new flagship model. For example, it could be a derivative of an existing language model or a different type of AI model entirely.

filled a Ṁ1,000 YES at 45% order

Deepseek open sourcing something next week

https://x.com/deepseek_ai/status/1892786555494019098?s=19

filled a Ṁ4,000 YES at 82% order
bought Ṁ750 YES
bought Ṁ5,000 YES

@MingCat what's the difference between reasoning and flagship llm

@typeofemale see the desc, there needs to be a new non-reasoning base model to count as flagship

bought Ṁ0 YES
filled a Ṁ7,000 YES at 63% order

Mistral released Mistral Saba 24B

https://mistral.ai/en/news/mistral-saba

@mods can u resolve this

bought Ṁ753 YES
filled a Ṁ1,000 YES at 70% order

What if Anthropic releases a model that can run without reasoning (and even then is better than 3.5 Sonnet on a majority of benchmarks), but has a parameter that allows reasoning to be turned on? Will both options "Anthropic flagship language model" and "Anthropic reasoning model" resolve YES, or maybe "Anthropic (any other)"?

@JanPydych
Unfortunately, we are in a position where the norms of AI companies are rapidly changing so I'm going to try and be as fair to the spirit of the question as possible.

Here are my tentative thoughts:

If there is a toggle then that would be sufficient for both the flagship and reasoning model to resolve to yes. At the time this market was created the norm was for AI companies to label such toggles as distinct models a user can choose from.

If the LLM dynamically decides whether it should reason and there is no toggle then this is where I would consider things to become a bit more unclear. As Bayesian said it probably would be fairest to resolve both to YES.

Thank you for the info

How can I increase my net worth in this market?

I don't have any inside info.

I think the probabilities are all too high, (what is an Amazon model? some 8B thing I haven't heard much about? Microsoft? Copilot? Haven't seen it updated)

opened a 𝕊50.00 YES at 90% order
filled a 𝕊1.00 NO at 77% order
opened a 𝕊5.00 NO at 70% order

@MingCat "A week or two" from Elon probably means it's not coming until next month.

bought Ṁ150 YES

@Manifold imo as models blur the line between reasoning and non-reasoning, the criteria here should be changed (for future months; hopefully the ambiguity doesn't come up this month)

There’s a decent chance that the model everyone will call their next flagship model (claude 4 sonnet or opus) will be trained on reasoning traces and maybe have done some RL, so that it can do everyday tasks cheaply and without reasoning, and harder tasks by using reasoning. Imo if that happens it would be more intuitive for both of these to resolve YES (ie they would have different criteria than they have now) but in any case it’s a bit ambiguous to me where the line is drawn for a model to be a reasoning model

bought Ṁ250 YES

https://x.com/oliviergodement/status/1889789220664852610

The API will support o3! We will provide knobs such as reasoning effort to get the best out of new frontier models. We are working through options to package o3 in the broader GPT-5 system, hence the "no longer ship standalone". The dev platform remains a strategic priority :).

wadda hell

https://x.com/sama/status/1889757267425370415

GPT 4.5/5 incoming next weeks/months
