Resolves YES if, before 2027, an LLM can play a video game that requires real-time reactions with some competence. To avoid subjectivity, I will apply the following standards:
The LLM must first navigate to playsnake.org and select worm.
The LLM must achieve a median score of at least 50 across 5 consecutive runs. It may make as many attempts as it likes, as long as it eventually meets this benchmark.
I'm not paying more than $20 per month for an AI subscription, so models that require a more expensive subscription will not be tested unless somebody tests them for me. I will, however, pay for API access.
If playsnake.org stops working as it currently does, or if a promising model cannot freely browse the internet, I will consider local versions of snake as an alternative and will attempt to impose an equivalent score standard.
I will assess cheating at my own discretion. Any result achieved in a way that I feel is contrary to the standard of real-time interaction with the game will be invalidated.
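
For anyone who wants to check a run log against the score standard above, here is a minimal sketch. It assumes that "5 consecutive runs" means any five back-to-back runs anywhere in the attempt history; that reading, the function name `meets_benchmark`, and the sample scores are illustrative assumptions, not part of the resolution criteria.

```python
from statistics import median

WINDOW = 5       # consecutive runs required by the standard
THRESHOLD = 50   # minimum median score required by the standard


def meets_benchmark(scores, window=WINDOW, threshold=THRESHOLD):
    """Return True if any `window` back-to-back runs have a median >= threshold.

    `scores` is the full attempt history in the order the runs were played.
    (Hypothetical helper; the sliding-window reading is an assumption.)
    """
    for start in range(len(scores) - window + 1):
        if median(scores[start:start + window]) >= threshold:
            return True
    return False


# Made-up attempt history, for illustration only.
history = [12, 55, 8, 61, 50, 70, 3]
print(meets_benchmark(history))
```

Because the standard uses the median, up to two of the five runs can be very poor without failing the window, as long as the other three each score at least 50.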