Benchmark Gap #5: Once a single AI model solves >= 95% of miniF2F, MATH, and MMLU STEM, will it be less than two years before AI models are used as entry-level data science / data analysis / statistics workers?

67% chance


This question is meant to measure the gap between solving the main math-based benchmarks at the time of market creation, and applying mathematics in the real world.

Data science / data analysis / statistics positions: I'll accept anything in this general realm of jobs, with the caveat that I won't require any significant coding. For instance, an AI working as an ML scientist would resolve this market YES, but this is not a requirement.

There is no requirement that any human entry-level workers be fired or replaced; e.g., if for some reason both humans and AI are employed to do the same work, that counts.

When I worked as an entry-level data scientist / data analyst, most of my job wasn't doing math; in fact, math was a small minority of it. Most of it was:

Translating some very domain-specific problem description/explanation into a mathematical model
Making pretty data visualizations
Explaining mathematical/quantitative findings to mathematical lay-people

An LLM capable of solving a large suite of math problems may or may not be able to do any of these things.

"used as entry-level data science / data analysis / statistics workers?"

Does this mean that they're used effectively? That is, they can actually do the job of an entry-level worker? Or just that there's some startup selling a product that claims to be capable of it, and someone is willing to pay for it, even though it doesn't work very well?

@jonsimon In the scenario where it's not clear whether the AI analysts are good, I will attempt to use one myself and resolve according to my judgment. If that's not possible and I can't obtain any additional information, I would resolve to YES.

@vluzko What would you use it to do yourself? See my comment above about what the job of a data scientist / data analyst consists of.

Betting that we'll have very expensive models reach here first, and that they will still be too expensive to use routinely two years later.

But if the definition of "used in" entry-level jobs is broad enough, wouldn't this already resolve true today? Seems like it should be operationalized as doing the majority of the work of a full-time entry-level employee.

@JamesGrugett Apologies for the ambiguity: I mean that they are used as full entry-level workers, not just by entry-level workers. I will rephrase the question.