How many METR tasks will be completed before 2025?
Probabilities by number of tasks completed:
4: 18%
5: 13%
6: 13%
7: 14%
8: 11%
9: 11%
10: 9%
11: 7%
12: 5%

This question will resolve to the state-of-the-art number of METR ARA tasks (out of 12) fully completed by an AI system. Partial completions do not count. Post-training enhancements are allowed, but human assistance is not. Resolution will be based on credible, publicly available results released before January 1st, 2025. "Credible results" primarily includes, but is not limited to, reports or posts by METR itself.

Background information:

See METR Tasks.

As of March 15th, 2024, the best result belongs to GPT-4, which completed 4 of the 12 tasks.

Be advised that this benchmark does not yet have an official leaderboard and is not widely reported by developers. However, we hope this will change soon and that METR will evaluate new models on these same tasks.

Part of the AI Benchmarks series by the AI Safety Student Team at Harvard on evaluations of AI models against technical benchmarks. Full list of questions:
