When will VideoGameBench be mostly saturated?
2029
By the end of 2025: 21%
By the end of 2026: 45%
By the end of 2027: 63%
By the end of 2028: 75%
By the end of 2029: 82%
2030 or later: 21%

https://www.vgbench.com

From the webpage:

"tldr;

We introduce a research preview of VideoGameBench, a benchmark which challenges vision-language models to complete, in real-time, a suite of 20 different popular video games from both hand-held consoles and PC.

We also introduce VideoGameBench-Lite, a subset of the games where the environment pauses the game while the model is thinking, thereby ignoring the long inference latency bottleneck of modern vision-language models (VLMs).

Our benchmark focuses entirely on whether VLM agents can beat these games in their entirety, given only raw visual frames from the game. In this research preview, we provide code, explanations of our framework, and initial observations of our basic agent playing these games."

"It becomes apparent after running an agent on any of these games that VLM agents are not close to solving an entire game, let alone even the first level of most games."

Resolves YES when a single model can complete at least 75% (15 of 20) of the games in the full version of the benchmark.
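
The resolution rule can be sketched as a small check. This is only an illustration, not anything published by the benchmark or the market: the function names and the `completed_by_model` mapping are hypothetical, and it assumes that "complete" means beating a game in its entirety, as the quoted webpage describes. The point it makes explicit is that the 15 games must all be completed by the same model.

```python
from math import ceil

TOTAL_GAMES = 20   # size of the full benchmark, per the quoted webpage
THRESHOLD = 0.75   # fraction named in the resolution criterion


def games_needed(total: int = TOTAL_GAMES, threshold: float = THRESHOLD) -> int:
    """Smallest number of completed games that meets the threshold."""
    return ceil(total * threshold)


def is_mostly_saturated(completed_by_model: dict[str, set[str]]) -> bool:
    """True if any single model has completed enough games on its own.

    completed_by_model is a hypothetical mapping from a model name to the
    set of full-benchmark games that model has completed.
    """
    needed = games_needed()
    return any(len(games) >= needed for games in completed_by_model.values())
```

Under this reading, two models that each complete 14 different games would not trigger a YES resolution, even if together they cover all 20.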
