What is Grok 4's performance on METR's task length evaluation?
0 to 1.5 Hours: 15%
1.5 to 2 Hours: 29%
2 to 2.5 Hours: 32%
2.5 to 3 Hours: 14%
More than 3 Hours: 10%
Resolves based on METR's measurement of the duration of tasks that Grok 4 can complete with a 50% success rate.
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
Grok 4 Heavy does not count
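
To make the resolution rule concrete, here is a minimal sketch of how a reported 50% time horizon (in hours) would map onto the answer ranges above. The function name bucket_for is hypothetical, and the handling of values that fall exactly on a boundary (assigned here to the lower range) is an assumption, since the question does not specify it.

```python
from bisect import bisect_left

# Upper edges, in hours, of the first four answer ranges listed in the question.
EDGES = [1.5, 2.0, 2.5, 3.0]
LABELS = [
    "0 to 1.5 Hours",
    "1.5 to 2 Hours",
    "2 to 2.5 Hours",
    "2.5 to 3 Hours",
    "More than 3 Hours",
]


def bucket_for(horizon_hours: float) -> str:
    """Return the answer range containing a 50% time horizon given in hours.

    Assumption (not stated in the question): a value exactly on a boundary
    goes to the lower range, so 3.0 is "2.5 to 3 Hours" and only values
    strictly above 3 land in "More than 3 Hours".
    """
    if horizon_hours < 0:
        raise ValueError("time horizon cannot be negative")
    # bisect_left counts the edges strictly below the value,
    # which is the index of the matching range.
    return LABELS[bisect_left(EDGES, horizon_hours)]
```

For example, bucket_for(1.75) returns "1.5 to 2 Hours". A figure reported in minutes would need to be divided by 60 first.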
This question is managed and resolved by Manifold.