
METR = formerly ARC Evals (https://metr.org/)
if METR/Google reorgs and has a clear successor org, the successor org also applies for the purposes of this market
central YES cases:
if Google releases a model card for something like a hypothetical "Gemini Supermega 2024 edition" that includes METR exfiltration eval results, as OpenAI did in the GPT-4 technical report
does not have to be the specific exfiltration eval.
does not have to be included in the initial model release paper, and does not have to appear specifically in a paper.
does not have to be at any specific eval granularity. "METR ran the eval and it was all OK" would be ... annoyingly vague from whoever wrote it, but it would count.
has to be confirmed-ish by Google and/or METR. can't be just a Twitter rumor.
Update 2025-02-20 (PST) (AI summary of creator comment): Minimum Valid Evidence Update:
The archived report claude-3-5-sonnet-report qualifies as sufficient evidence for a YES resolution.
This document serves as a minimum threshold that confirms METR ran the evaluation and the results were in line with the market’s criteria.
This update supplements the current resolution criteria without replacing the original requirements.
at a minimum, this counts: https://web.archive.org/web/20241218085738/https://metr.github.io/autonomy-evals-guide/claude-3-5-sonnet-report/
resolve YES