Resolves when the o3 model is usable in the ChatGPT app (mobile or web) for people with a Plus-level subscription (which at the time of writing is $20/month).
Update 2024-12-22 (PST) (AI summary of creator comment):
- Only a model named exactly "o3" will count for resolution
- Exceptions are if the model is named "o3 high" or "o3 low"
- Or if there is clear evidence that a model was renamed from "o3" to something else
- Models named "o3-mini" or "o3-preview" will not count
The interesting part of this is the cost to run:
https://arcprize.org/blog/oai-o3-pub-breakthrough
$7,000 to get 83% on the eval. That's definitely not coming to a $20/month subscription unless there's a breakthrough on compute or very stringent limits on usage.
@MalachiteEagle orders of magnitude more expensive. We will see but that’s a log scale on x with linear y.
@LiamZ Sure, but the versions they tuned for ARC-AGI are intended to use the available compute OpenAI allocated for the benchmark evaluation. This is going to be very different from the product they will release to their users. It seems very unlikely that they will release a version of o3 that, on average, takes more than 3 orders of magnitude more compute to respond per task than their o1 release model.
@LiamZ The version they release to their users will neither be the version tuned on ARC-AGI, nor will it consume $7k of compute per response.
@MalachiteEagle I agree, I think it’s most likely we’ll see a cheaper “o3-mini” type model at entry subscription tier first and possibly only that at the $20/month price for quite a while. Even the “high efficiency” version here was $20 per task which is a lot more than o1 and they’re currently losing money on o1.
Will o3-mini, o3-preview, or some other such named version count?
@LiamZ Mini: no, preview: no. The only way this (or the sister market) resolves YES without a model named precisely “o3” being released is if the models are “o3 high” or “o3 low”, or it’s clear that “the model previously known as o3 we have now rebranded to x”.
@gallerdude Cheers, thanks. Based on reported costs, I currently expect an o3-mini within 6 months or so, but o3 itself might be held back as a major selling point for the higher subscription tiers.