https://arxiv.org/abs/2410.07095
We introduce MLE-bench, a benchmark for measuring how well AI agents perform at machine learning engineering. To this end, we curate 75 ML engineering-related competitions from Kaggle, creating a diverse set of challenging tasks that test real-world ML engineering skills such as training models, preparing datasets, and running experiments. We establish human baselines for each competition using Kaggle's publicly available leaderboards. We use open-source agent scaffolds to evaluate several frontier language models on our benchmark, finding that the best-performing setup, OpenAI's o1-preview with AIDE scaffolding, achieves at least the level of a Kaggle bronze medal in 16.9% of competitions. In addition to our main results, we investigate various forms of resource scaling for AI agents and the impact of contamination from pre-training. We open-source our benchmark code (this http URL) to facilitate future research in understanding the ML engineering capabilities of AI agents.
Resolution criterion: the paper's primary metric, i.e. the agent achieves a bronze medal or better in at least 80% of the competitions, without explicitly training on the test set.
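A minimal sketch of what that check could look like, assuming one submission per competition graded against the bronze threshold from Kaggle's public leaderboard. Function and field names here are hypothetical, not MLE-bench's actual API:

```python
# Hypothetical sketch of the resolution check (not MLE-bench's real interface).
# Each competition contributes one agent submission; it counts toward the metric
# if it scores at or past the bronze-medal cutoff from Kaggle's public leaderboard.

def bronze_or_better_rate(results: list[dict]) -> float:
    """results: one dict per competition, e.g.
    {"competition": "some-kaggle-comp", "agent_score": 0.81,
     "bronze_threshold": 0.79, "higher_is_better": True}
    """
    medals = 0
    for r in results:
        if r["higher_is_better"]:
            got_medal = r["agent_score"] >= r["bronze_threshold"]
        else:  # e.g. RMSE-style metrics where lower is better
            got_medal = r["agent_score"] <= r["bronze_threshold"]
        medals += got_medal
    return medals / len(results)

def resolves_yes(results: list[dict]) -> bool:
    # YES iff the rate is at least 80% across the 75 competitions
    # (and the agent didn't explicitly train on the test set).
    return bronze_or_better_rate(results) >= 0.80
```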
Shouldn't the probabilities in this market add up to 100% across the relevant options?
(I think normally when people make markets like this, they do "By when" so that the relevant property is just monotonicity. See e.g. my market here: https://manifold.markets/RyanGreenblatt/by-when-will-85-be-reached-on-the-p?play=true)
@RyanGreenblatt Yeah, I just bought NO on everything, which seems like free mana.
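A toy illustration of why that works, with made-up prices rather than this market's actual ones: if mutually exclusive, exhaustive options are priced so their probabilities sum to more than 100%, buying one NO share on every option locks in a profit no matter which single option resolves YES.

```python
# Made-up prices for illustration only; the point is that they sum to 1.25 > 1.
prices = {"2025": 0.40, "2026": 0.35, "2027": 0.30, "2028 or later": 0.20}

# A NO share bought at probability p costs (1 - p) and pays 1 if that option resolves NO.
cost = sum(1 - p for p in prices.values())   # 4 - 1.25 = 2.75
payout = len(prices) - 1                     # exactly one option resolves YES, so 3 NOs pay out
profit = payout - cost                       # = sum(prices) - 1 = 0.25

print(f"cost {cost:.2f}, guaranteed payout {payout}, profit {profit:.2f}")
```

The profit equals the amount by which the prices overshoot 100%, which is why "in which year" style markets only stay arbitrage-free if the options sum to roughly 100%, whereas "by when" markets only need monotonicity.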