
Released = available to some portion of the public (including a subset of subscribers or a limited number of API developers drawn from the public). A release only for safety testing does not count.
New model = Either announced by the company as a new model, is clear from numbering/naming it is a distinct model, or able to be selected from some sort of menu as a distinct model. Something like "o1 extra mini" would count as while it is part of o1 it can be considered a distinct model in this market.
Must be publicly released for the first time between February 1st 12:00am PST and February 28th 11:59pm PST. If it is announced but not yet released to any members of the public, it will not count.
For answers where no specific model type is specified alongside the company, then any type of generative AI model will cause it to resolve yes.
*OpenAI (other) refers to any model that is not their new flagship model (e.g. GPT 5), o3, a video generator, or an image generator. It could be a derivative of another language model or some other type of model such as a voice generator.
**Anthropic flagship language model refers to a model comparable to Claude 3.5 or GPT-4o that should outperform Claude 3.5 Sonnet on a majority of performance benchmarks. This should not be a reasoning model.
***Anthropic reasoning model refers to a model that is not considered their everyday task model and is akin to what OpenAI's o1 is to GPT-4o.
****Anthropic (any other) refers to any model that is neither a reasoning model nor their new flagship model. For example, it could be a derivative of an existing language model or a different type of AI model entirely.
@Manifold I think you should ask the relatively biggest YES holders: those who have the highest % of their balance invested.
@Manifold I'm not certain. But am I mistaken, or does the description not actually specify a generative model?
EDIT: Never mind, I see it now.
@Manifold oh i read it as satisfying the “Either announced by the company as a new model, is clear from numbering/naming it is a distinct model, or able to be selected from some sort of menu as a distinct model.” part of the criteria.
the later part, “For answers where no specific model type is specified alongside the company, then any type of generative AI model will cause it to resolve yes.”, to me read as a sufficient condition for resolution, not a necessary condition
@Manifold I was under the impression that the Meta release would count as a generative model for the purposes of the description.
@summer_of_bliss That reading of the description doesn't make much sense to me. The description starts with rules for options where the model is already specified (e.g. "reasoning language model", "O3 mini", etc.). Then, for cases where the model isn’t specified, it explains what types of models count: “For answers where no specific model type is specified alongside the company, then any type of generative AI model will cause it to resolve yes.” The alternative would be that (1) that quoted line is meaningless (cut it from the description, & nothing changes), & (2) the question never gives any indication of what models count (which isn't really how David has been writing these).
(If this model is generative AI then it should definitely resolve YES! I have no idea what it does & I haven't looked; lots of non-generative output uses gen AI under the hood. But I don't think it makes any sense to exclude that line from the description. And if the intent is to allow any "AI model" to count, with no definition provided, then the description should be rewritten.)
@Ziddletwix Thoughtful response, going point by point:
You say “The description starts with the rules for options where the model is already specified”. It doesn’t? The description just defines the term “New model”. Which would apply to every option in this market, surely? Specifically it says:
"New model = Either announced by the company as a new model, is clear from numbering/naming it is a distinct model, or able to be selected from some sort of menu as a distinct model. Something like "o1 extra mini" would count as while it is part of o1 it can be considered a distinct model in this market.”
Can you point out where in this definition it restricts this definition only to options where the specific model/name is already specified? I just don’t see where, I read this as clearly applying to every option in the market.
Then about the “For answers where no specific model […]” paragraph, you say that under my interpretation 1) that quoted line is meaningless/superfluous AND 2) the question never gives any indication of which models count. I disagree on both. The question DOES give an indication of which models count, in the earlier paragraph you just quoted - “Either announced by the company as a new model […]”. Surely you agree these count as criteria?
Now on the first point - suppose I make a market on ‘Will RFK be confirmed by the Senate’. I say ‘Resolves YES if RFK is confirmed. Resolves NO if RFK is not confirmed, dies, or withdraws before being confirmed’. That second line is “meaningless” in the sense you’ve just used. You could cut it out and the criteria are the same, sure. But it acts as clarification. This is incredibly common in prediction markets and was how I read the “generative models would count” line in these criteria.
I would agree that this market should not resolve yes if the description were phrased as “for answers where no specific model type is specified, then ONLY generative models would cause it to resolve yes”. But that’s not what it says. It’s phrased as a sufficient, not necessary, condition, and so the market should resolve yes.
(This is exactly why I bet M1000 yes when the market was at like 33% - the criteria for these questions where no model is specified, just the company, are incredibly broad as written and thus very likely to resolve yes.)
Today Meta released the audiobox-aesthetics model - "Unified automatic quality assessment for speech, music, and sound".
https://github.com/facebookresearch/audiobox-aesthetics
https://ai.meta.com/blog/machine-intelligence-research-new-models/
@bagelfan i dunno i wouldn't say this is a great candidate market for subsidy, given that for any YES resolution all of it will eventually be taken by some news trader. (whereas many sweeps markets can be closed early etc. to use it efficiently)
@Ziddletwix agree that this isn't the most efficient use of subsidy, but 50 sweepcash shared among so many answers is definitely insufficient for what we want to be one of our biggest markets.
@Manifold oh fair i don't have sweeps enabled so i didn't see how tiny it was (i figured per-option it was more like the nba markets, definitely the current amount is way too little)
https://blog.google/technology/google-deepmind/gemini-model-updates-february-2025/
Gemini 2.0 Pro Experimental and 2.0 Flash-Lite released.
Dylan Patel repeats claim about Anthropic having a better reasoning model than o3: https://x.com/mark_k/status/1886769660344877073
@Manifold Deep Research doesn't fulfill your requirements to resolve as "other". Other models must be totally distinct models or be based on an expansion of an already existing and released model. You wrote "Something like "o1 extra mini" would count as while it is part of o1 it can be considered a distinct model in this market." This would make Deep Research resolve as "other" if and only if another version of o3 had already been released, which was not the case. Deep Research is not an expansion of a released model; it is the only model that we have right now for o3. Only when OpenAI releases another version of o3 can the market resolve as "other". You cannot have another model if you don't have the base model first.