When will Genie-like methods be used to train a frontier model on computer-use tasks?

Google DeepMind's Genie 3 is a model that can essentially generate video game frames in response to user actions. It was developed in part to train agents, such as SIMA, in 3D worlds. This could generalize to those agents operating robots in the real world.

However, the most clearly useful skill that requires video understanding and that frontier AI models could be taught is computer use - e.g. browsing the web or answering Slack and email. These tasks don't directly involve understanding a 3D space but could be very economically relevant.

In theory, Genie-like methods could be used to generate a computer screen on the fly, rather than the software being explicitly coded. The methods aren't ready for this yet - in particular, I assume rendering lots of tiny words on a screen would be difficult - but maybe with enough scale this could work well.

This way, if you wanted to e.g. teach your agent to interact with the custom claims management system of any insurance company, you wouldn't have to ask insurance companies to give you access to their systems or have an LLM write code implementing the systems from scratch. You could generate a brand-new, made-up UI for every rollout, where the UI comes up with plausible responses to the user's actions just-in-time.
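
To make the idea concrete, here is a minimal Python sketch of what such a rollout loop might look like. Everything in it is hypothetical: `StubWorldModel`, `rollout`, and the action strings are illustrative names I made up, and the stub just produces random pixels where a real Genie-like model would run a neural network conditioned on the frame history and the agent's action. It is not how Genie 3 or any real training pipeline works.

```python
import random

# Toy "screen": a small grid of grayscale pixel values (0-255).
Frame = list


class StubWorldModel:
    """Hypothetical stand-in for a Genie-like model.

    A real model would neurally generate the next screen frame from the
    frame history and the latest action. This stub only returns random
    pixels so that the loop below can run.
    """

    def __init__(self, height=4, width=6, seed=0):
        self.height = height
        self.width = width
        self.rng = random.Random(seed)

    def reset(self):
        # Start each rollout from a blank screen (an assumption of this sketch).
        return [[0] * self.width for _ in range(self.height)]

    def step(self, history, action):
        # Placeholder for neural generation; ignores history and action.
        return [
            [self.rng.randint(0, 255) for _ in range(self.width)]
            for _ in range(self.height)
        ]


def rollout(model, policy, num_steps):
    """Run one episode: the agent acts, the model generates the next frame."""
    frames = [model.reset()]
    actions = []
    for _ in range(num_steps):
        action = policy(frames[-1])  # agent looks at the current screen
        actions.append(action)
        frames.append(model.step(frames, action))  # UI is generated just-in-time
    return frames, actions


def always_click(frame):
    # Trivial hypothetical policy; a real agent would be the model being trained.
    return "click"


frames, actions = rollout(StubWorldModel(seed=1), always_click, num_steps=5)
```

The point of the sketch is only the shape of the loop: no real software sits behind the screen, so each rollout can use a fresh, made-up UI.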

If it is confirmed that this kind of method was used to train a frontier model in or before a certain year, the market for that year resolves YES.

By "Genie-like methods," I mean models that neurally generate a video output in response to a user's actions, something like what I describe in this market.

For the term "frontier model," I'll adapt my definition from here: if the model is close to the state of the art on a commonly used eval, that's good enough to count. For instance, Gemini Ultra supposedly got a state-of-the-art score of 90.0% on MMLU, but some argued that they were gaming the eval. Rather than getting into the weeds figuring out whether or not a model counts as SOTA, I'll err on the side of counting it.
