Benchmark Gap #5: Once a single AI model solves >= 95% of miniF2F, MATH, and MMLU STEM, will it be less than two years before AI models are used as entry-level data science / data analysis / statistics workers?

This question is meant to measure the gap between solving the main math-based benchmarks at the time of market creation, and applying mathematics in the real world.

Data science / data analysis / statistics positions: I'll accept anything in this general realm of jobs, with the caveat that I won't require any significant coding. For instance, an AI working as an ML scientist would resolve this market YES, but this is not a requirement.

There is no requirement that any human entry-level workers be fired/replaced, e.g. if for some reason both humans and AI are employed to do the same work, that counts.

bought Ṁ5 of NO

Having previously worked as an entry-level data scientist / data analyst, I found that most of my job wasn't doing math; in fact, math was a small minority of it. Most of it was:

  1. Translating some very domain-specific problem description/explanation into a mathematical model

  2. Making pretty data visualizations

  3. Explaining mathematical/quantitative findings to mathematical lay-people

An LLM capable of solving a large suite of math problems may or may not be able to do any of these things.

"used as entry-level data science / data analysis / statistics workers?"

Does this mean that they're used effectively? Like, they can actually do the job of an entry-level worker? Or just that there's some startup selling a product that claims to be capable of it, and someone is willing to pay them for it, even though it doesn't work very well?

@jonsimon In the scenario where it's not clear if the AI analysts are good, I will attempt to use one myself and resolve according to my judgment. If that's not possible and I can't obtain any additional information, I would resolve to YES.

bought Ṁ10 of NO

@vluzko What would you use it to do yourself? See my comment above about what the job of a data scientist / data analyst consists of.

bought Ṁ20 of NO

Betting that we'll have very expensive models reach here first, and that they will still be too expensive to use routinely two years later.

But if the definition of "used in" entry-level jobs is broad enough, wouldn't this already resolve true today? Seems like it should be operationalized as doing the majority of the work of a full-time entry-level employee.

@JamesGrugett Apologies for the ambiguity: I mean that they are used as full entry-level workers, not just by entry-level workers. I will rephrase the question.
