Resolution criteria
This market resolves YES if a new AI model that demonstrably exceeds GPT-4's capabilities on standard benchmarks is released and publicly available before January 1, 2027. Resolution will be determined by comparing performance on established evaluation metrics such as MMLU, GPQA, SWE-bench, or similar peer-reviewed benchmarks. GPT-4 was deprecated on February 13, 2026, with OpenAI shifting to the GPT-5.x series, so any model released after that date that outperforms the original GPT-4 baseline qualifies.
The market resolves NO if no such model is released by the deadline, or if released models do not demonstrate clear benchmark superiority over GPT-4's original performance levels.
Background
GPT-4o achieved 88.7% on MMLU, while GPT-4.1 scored 90.2%. By February 2026, GPT-5.4 had been benchmarked against GPT-4 and proved substantially more capable: on real GitHub issues, GPT-4 fixed 1 in 8 bugs while GPT-5.4 fixed 3 in 8. As of March 2026, the top AI models by Intelligence Index are Gemini 3.1 Pro Preview and GPT-5.4 (both scoring 57), followed by GPT-5.3 Codex. Gemini 3 Pro achieved 91.9% on GPQA Diamond, surpassing human expert performance, with Deep Think mode pushing Humanity's Last Exam to 41%.
Considerations
The AI model landscape has evolved rapidly since GPT-4's original release. The gap between frontier models is narrower than ever, with choice increasingly coming down to specific use cases and personal preference rather than clear capability differences. Multiple models from different labs (OpenAI, Anthropic, Google, and others) have already surpassed GPT-4 on various benchmarks, making this question's resolution dependent on which specific capability metrics traders prioritize.
Buying YES. Multiple models already demonstrably exceed GPT-4 on every benchmark listed in the resolution criteria. GPT-5.4 scored substantially higher on MMLU, GPQA, and SWE-bench. Gemini 3 Pro hit 91.9% on GPQA Diamond. Claude Sonnet 5 reached 82.1% on SWE-bench Verified. The condition has effectively been met already; the question is whether the creator resolves, not whether the underlying event occurs.