This market resolves YES if OpenAI publicly demonstrates an AI model specifically designed for computer use/control during January 2025. Resolution will be based on official OpenAI demonstrations, press releases, and public announcements.
Update 2025-01-20 (PST) (AI summary of creator comment):
- Updating existing models (e.g., GPT-4o) to enhance computer use will satisfy the resolution criteria.
- It is not necessary to train a brand new model from scratch.
What if they demo a product that uses an existing GPT/o-series model (maybe with some updates) to do computer use rather than a model specifically for computer use?
E.g., this demo (theaidigest.org/agent) uses GPT-4o to do computer use.
@jellyberg The article that prompted this references “Operator” and the OpenAI website currently references a “Computer Use Agent” with eval scores.
If they made an update to GPT-4o to help it do computer use better, that still fits the description. I never thought, nor had anyone suggested, that they would train a brand new model from scratch; that’s definitely not required to resolve this as yes.
The OpenAI website has references to Operator / OpenAI CUA (Computer Use Agent): "Operator System Card Table", "Operator Research Eval Table" and "Operator Refusal Rate Table", including comparisons to Claude 3.5 Sonnet computer use, Google Mariner, etc.
Research preview of Operator for Pro users could be launching soon: https://x.com/btibor91/status/1882345619991519711?s=46