I believe creating a model 90% as good as o4-mini is within the reach of a smart hobby researcher today.
Specifically, I believe it can be achieved using an open-source model of roughly the caliber available today as the base, clever scaffolding for agentic tool use and web search, and an affordable amount of GPU compute.
Specs:
If an LLM is used as the base, it must be open-weights and released during or before June 2025.
The base model must use fewer than 40B activated parameters if it is a mixture-of-experts model, or fewer than 80B parameters if it is dense.
Scaffolding/harness that lets the model search and run in a loop is allowed and encouraged. Anything goes as long as it is fully automated and contains no machine-learned components.
If compute is used for fine-tuning or reinforcement learning, its cost must not exceed $500, valued at the actual price paid or fair market value, whichever is higher.
"90% as good" is defined as a Cohen's d ≤ 0.32 between o4-mini's and the candidate model's task-wise scores across 5 runs of THUDM AgentBench.
If there are any competent, good-faith attempts (as judged by me), this market resolves YES if any of them satisfies all criteria, and NO otherwise. If there are no such attempts, this market resolves N/A.
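For concreteness, the resolution statistic can be computed with pooled-standard-deviation Cohen's d over the two models' task-wise score vectors. A minimal sketch (the score values below are purely illustrative, not real benchmark results):

```python
import statistics

def cohens_d(scores_a, scores_b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(scores_a), len(scores_b)
    mean_a, mean_b = statistics.mean(scores_a), statistics.mean(scores_b)
    var_a, var_b = statistics.variance(scores_a), statistics.variance(scores_b)
    pooled_sd = (((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)) ** 0.5
    return (mean_a - mean_b) / pooled_sd

# Hypothetical per-run AgentBench scores over 5 runs (illustrative numbers only):
o4_mini_scores = [0.62, 0.58, 0.65, 0.60, 0.63]
candidate_scores = [0.57, 0.55, 0.61, 0.58, 0.60]
d = cohens_d(o4_mini_scores, candidate_scores)  # the criterion is d <= 0.32
```

Task-wise scoring would flatten per-task results across runs into the two vectors; the comparison itself is the same.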
Update 2026-05-03 (PST) (AI summary of creator comment): The base model must be open-weights and released during or before June 2025. Models like Qwen 3.5 (released after June 2025) do not qualify, even if they already meet the performance threshold without fine-tuning.
@Sss19971997 This question was meant to ask whether it can be done with harness engineering alone, plus possibly light post-training, using models of the time. So Qwen 3.5 would not satisfy this. When this question was posted, the open-weights models had very little agentic RL, if any.
For what it's worth, I think it's still possible. GRPO/DPO are much more memory- and sample-efficient than the older RL methods, even on an older base model, and many of the agent-harness patterns have been standardized by now.
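Part of the memory-efficiency claim is that GRPO drops PPO's learned value network in favor of a group-relative baseline: sample several completions per prompt, then normalize each reward against its group's mean and standard deviation. A minimal sketch of that advantage computation (function and variable names are illustrative, not from any particular library):

```python
def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages for completions sampled from one prompt.
    Each reward is normalized against the group mean/std, so no separate
    value network (and its extra set of weights) is needed."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Rewards for 4 completions sampled from the same prompt (illustrative):
advantages = grpo_advantages([1.0, 0.0, 0.0, 1.0])  # roughly [1, -1, -1, 1]
```

The saving matters on hobbyist hardware: PPO keeps policy, reference, and value models resident, while this baseline only needs the sampled rewards.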
@lumi You don’t think v3.1 is good enough?
I could also fine-tune, but what if the trajectories are created by, say, Opus 5 or GPT-6?