Resolves YES if, before 2027, a neural net with fewer than 10B parameters achieves all of the following: >75% on GPQA, >80% on SWE-bench Verified, and >95% on MATH.
Arbitrary scaffolding is allowed (retrieval over a fixed database is OK), but no communication with other AI systems and no internet access. We'll allow up to 1 minute per question. We'll use whatever tools are available at the time to determine whether the model memorized the answers to these datasets; if verbatim memorization obviously occurred, the model will be disqualified.
@SIMOROBO A system can be a multi-domain superintelligence without being AGI. I'd guess achieving the listed scores on these problems is a 1-in-1e6 or 1-in-1e7 feat for a human. "Super-1:1e6-human" is perhaps more precise, but I'll allow myself that much mis-specification in the title.
@JacobPfau I'm pretty sure you would have a hard time convincing anyone that a bipedal robot running the 100-meter sprint in exactly 10 seconds is "superhuman", and yet that's probably a better-than-1:1e7 result for a human.
I haven't seen data on human performance on SWE-bench Verified, but I would assume it's quite possible for humans to get 100%. Once AI reaches 100%, other factors such as speed could begin to qualify it for the term "superhuman", in my opinion.