Options are inclusive: if this happens tomorrow then all options resolve to YES.
This market will of course resolve somewhat subjectively. The overall idea is: "there is a model that can do the technical parts of the job of a senior ML engineer." This means the job they do in 2024, not the job we call "senior ML engineer" at market resolution (so it's fine if all the ML engineers stay employed doing something slightly different). It also covers only the technical portions - there's no requirement that the AI be able to make beautiful presentations or write long scientific reports.
Some example tasks I expect it to be able to do:
Implement, test, and benchmark a paper given no input besides the paper.
Optimize an ML model (training or inference) for a specific set of computing resources.
Write, test, and debug distributed ML code.
Build, test, and profile training/inference infrastructure for a set of computing resources (e.g. decide which ZeRO stages to use for a cluster and then implement them).
Basic MLOps-type work (e.g. set up Kubernetes + Volcano).
Suggest + implement minor modifications to existing algorithms (e.g. "I think this would learn better if we added a regularization term").
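To make the last task concrete, here is a minimal sketch of the kind of "minor modification" meant: adding an L2 regularization term to an existing loss function. All names here (`mse_loss`, `regularized_loss`, the `l2` parameter) are hypothetical illustration, not part of any particular codebase.

```python
def mse_loss(preds, targets):
    # Plain mean-squared-error loss over paired predictions/targets.
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def regularized_loss(preds, targets, weights, l2=0.0):
    # The "minor modification": an L2 penalty proportional to the squared
    # weight norm, discouraging large weights. l2=0.0 recovers the
    # original loss unchanged.
    penalty = l2 * sum(w * w for w in weights)
    return mse_loss(preds, targets) + penalty
```

The resolution bar is that the model can both *suggest* a change like this unprompted and *implement and test* it in real training code, not just in a toy snippet.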
So in short: it should be able to do basically any technical task related to ML engineering, at a high but not world-class level. (In terms of actual resolution, this roughly translates to: "I expect it to be better than the engineers I know at random ML startups, but worse than the people I know at OpenAI/Anthropic/DeepMind.")
The AI must actually be deployed - an LLM that could guide someone else through all of these tasks (but lacks the tool integration to actually do them) doesn't count (this is mostly to simplify resolution).
I will give myself one month (until 2024-07-20) to modify the resolution criteria based on feedback.