Will an AI be able to speedrun any popular video game faster than the human WR by the end of 2024?

232

Ṁ2kṀ29k

resolved Jan 1

Resolved

ALL

This question resolves to "YES" if an AI agent has learned to speedrun at least one popular category (≥100 unique runners on speedrun.com or another leaderboard) of any video game released before 2022, and has finished at least one run with a better time than any human speedrunner at the time.

Native PC or emulated console games are both fine.

Criteria for resolution:

The AI must be capable of speedrunning the game in real time and learn to do so without direct human assistance (learning from e.g. Youtube videos is fine). A traditional TAS (tool-assisted speedrun) does not count.
The AI must not receive any information about the game that a human speedrunner wouldn't be able to know during a run, e.g. watching RAM values while playing. Ideally the AI should only receive game's pixels (possibly downscaled or otherwise processed) and maybe audio as input, but this is not a strict requirement.
The AI should ideally follow all the rules of the specific game and category it is running, as listed on the game's speedrun.com page (or elsewhere). If there are minor rule breaks, but the AI's run is still obviously much more impressive than the most comparable human WR, I may choose to ignore this requirement.
It must be a full-game, non-segmented run, not an IL (individual level) speedrun.
The human world record in the category must be over 3 minutes long. Very short speedruns don't count.

I will use my best judgment to resolve this based on the criteria above and will not bet myself.

Market context

AI Capabilities

Gaming

Third Party Validated, Predictive Markets: AI

Get

1,000

to start trading!

🏅 Top traders

#	Trader	Total profit
1		Ṁ531
2		Ṁ394
3		Ṁ377
4		Ṁ368
5		Ṁ366

People are also trading

In 2028, will an AI be able to play randomly selected computer games at human level without getting to practice?

49% chance

By 2029, will an AI be able to generate Video Games comparable to ~2023 'AA' Mid Market Games?

56% chance

Will an AI be able to write a passable novel before 2028?

75% chance

Will AI beat top Magic the Gathering human player before the end of 2026?

18% chance

High Quality AI-Generated Video Games by prompt before 2027?

10% chance

Will AI beat top human players at Civ6 (without cheating) by EOY 2026?

16% chance

By 2029, will AI be able to generate Video Games comparable to ~2023 Indie Games?

66% chance

Will an AI capable of playing Kerbal Space Program (1 or 2) at a proficient human level exist by the end of 2028?

62% chance

Will AI pass Video Turing Test by 2030?

68% chance

Will an AI run a factory by the end of 2028 (start of 2029)

Sort by:

@traders I've made a multiple choice version of this question with different years here:
https://manifold.markets/NoUsernameSelected/when-will-an-ai-be-able-to-speedrun

View original context

Not aware of any existing project that would resolve this market positively. Unless commenters point out something I've missed, this'll resolve NO in a few days.

@NoUsernameSelected https://www.youtube.com/watch?v=Dw3BZ6O_8LY&t=8 trackmania should qualify

Edit: wait, no, you want longer than 3 mins and whole game, probably this doesn't count

Training an unbeatable AI in Trackmania

I trained an AI in Trackmania with reinforcement learning, until I couldn't beat it. I just opened a Patreon page, where you can support this YouTube channel...

@plex Yeah, I was aware of this one, it doesn't do the full game

https://www.speedrun.com/cad

@traders I've made a multiple choice version of this question with different years here:
https://manifold.markets/NoUsernameSelected/when-will-an-ai-be-able-to-speedrun

What counts as direct human assistance?

@Celene Broadly speaking it can learn from human-generated data during training, but otherwise plays autonomously at run time.

sold Ṁ9 NO

@NoUsernameSelected would "reading a text file of inputs" count? What about "being trained to memorize all the necessary frame perfect inputs?"

Btw if anyone knows projects that might qualify for this question (even if imperfectly with all the resolution minutia), do feel free to drop them in the comments here. I'd be very interested to see if anything comes close.

Comment hidden

See discussion over at r/speedrun: https://www.reddit.com/r/speedrun/comments/15gexag/prediction_markets_expect_a_44_chance_that_an_ai/

I think the ideal game to do that with the constraints that it has more than 100 runners on SRC, is listed as a full game run, and has a WR above 3 minutes) would be Cookie Clicker. For reasons that should be pretty obvious.

But I don't think anyone will actually do this, and it won't be feasible for 90% of games in this timespan.

Added this market here because I thought it could be classified as third party validated. https://manifold.markets/group/third-party-validated-predictive-ma-6bab86c0b8b0/questions

Third Party Validated, Predictive Markets: AI

Manifold curated group with 2 members

Seems like:
- This would take quite a lot of effort to train, so there is a good chance nobody even makes the attempt.
- It would be easier to beat the top speedrun of a less-popular game, but regardless, announcing "AI has beaten the human world record for game X" would immediately attract tons of human competition from the speedrunning community. So, even if the AI beats the current human record, humans might very likely win it back and remain the reigning champions by the end of 2024.
- If the AI developed some amazing new technique to skip levels or etc, then humans could probably just copy that technique. So I am thinking that in order to have a durable edge, the AI must be capable of faster inputs / TAS-level precision / etc, something that humans couldn't imitate. Personally I am doubtful that modern AI systems are consistent enough that they could be more precise than a dedicated human speedrunner.

@JacksonWagner The criteria as written only seem to require that the run is faster than any human runs at the time. So would still resolve YES even if a human run subsequently beats the AI run.

@NLeseul Can confirm that's how I intended to resolve.

@NoUsernameSelected Might be a good idea to edit the title, in that case: "Will an AI have broken a human WR" would be more accurate than "Will an AI be able to speedrun faster than the human WR by end of 2024"

predictedYES

@JacksonWagner That already seemed like the natural reading to me, fwiw. Plus the description says "better time than any human speedrunner at the time." (emphasis added)

predictedNO

@jack I agree that I failed to closely read the description. But IMO the title is misleading.

"Will Trump lead Biden in the polls by election day?", "Will gas cars outsell electric cars by 2050?", etc --> seems to imply that what matters is performance at the end of the time period.

"Will Trump lead Biden in the polls at any time between now and Nov 2024?", "Will gas cars outsell electric cars in any year of the next 25 years?" --> very different question

predictedYES

@JacksonWagner No, "by" means anytime before the end of the time period.

https://www.merriam-webster.com/dictionary/by

: not later than
be there by 2 p.m.

predictedNO

If someone does this with deep RL, they have to define a reward function, which is usually specific to the particular game.

Obviously a reward function like "how closely does your sequence of inputs match the existing TAS?" is cheating, but a reward function like "if you beat the whole game, you get an amount of reward depending on how fast you completed it, no other feedback" seems impossibly sparse.

There's probably a lot of shades of grey as to whether particular reward functions in the spectrum between those extremes count as "direct human assistance".

predictedYES

@Multicore I think using some game with lot of small screens (like Katana Zero) and giving reward after each screen sounds possible to me and I don't think that counts as direct human assistance

Tangentially related:

predictedYES

this is a trivial yes, the criteria are too weak to filter out an AI that can generate TAS grade runs.

@L The criteria weren't supposed to filter out TAS grade runs? I just want runs that can be created automatically (as opposed to painstakingly by hand like in a regular TAS) and be executable in real time (being robust to differences in RNG that might happen from run to run).

They wouldn't quite be "TAS grade" in the sense of being the fastest possible run with the best fixed RNG seed, just better than what the best humans can do.

@NoUsernameSelected If an AI can learn from YouTube, I wonder if it could potentially just copy exactly a TAS it finds there (assuming no RNG issues in that particular game). Seems like that would fit the requirements. I guess it's still non-trivial to implement, but probably doable.