Will an LLM beat human experts on GPQA by Jan 1, 2025?

GPQA dataset here: https://arxiv.org/abs/2311.12022

"Human expert" means 74% accuracy.

Currently, GPT-4 scores 39%.

The LLM is allowed to use external tools (e.g. Google, Wolfram Alpha).

Usaar33 bought Ṁ100 NO

Which set? Main? Diamond?

@Usaar33 Extended. (That’s where the 74% number comes from.)

sold Ṁ46 NO

AFAICT, Anthropic only reports Diamond results for its models. So will this rely on third-party evals?

If there are no Extended evals in the official model report, then yes, I'll rely on third-party evals.