Will SOTA on any major code benchmark go up at least twice this year?
Resolved YES (Jan 1)
Major code benchmarks include:
- Performance on any major code competition (IOI, ICPC, the various competition websites)
A single benchmark needs to go up twice. So a single model that improves SOTA on both HumanEval and APPS would not resolve the market YES; we need two separate SOTA improvements on the same benchmark (i.e., two different models that each set a new SOTA there).
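The resolution rule above can be sketched as a small check over a chronological log of reported SOTA scores. The benchmark names and numbers below are purely illustrative, not real results:

```python
# Hypothetical sketch of the resolution rule: the market resolves YES
# only if some single benchmark's SOTA improved at least twice.
def resolves_yes(events):
    """events: list of (benchmark, score) pairs in chronological order."""
    best = {}          # benchmark -> current SOTA score
    improvements = {}  # benchmark -> count of SOTA improvements this year
    for bench, score in events:
        if bench not in best:
            best[bench] = score  # first reported score sets the baseline
            continue
        if score > best[bench]:
            best[bench] = score
            improvements[bench] = improvements.get(bench, 0) + 1
    # YES iff any one benchmark improved at least twice
    return any(n >= 2 for n in improvements.values())

events = [
    ("HumanEval", 65.0),  # baseline SOTA at start of year
    ("APPS", 22.0),       # baseline
    ("HumanEval", 67.0),  # first improvement on HumanEval
    ("APPS", 24.0),       # improvement, but on a different benchmark
    ("HumanEval", 71.0),  # second improvement on HumanEval -> YES
]
print(resolves_yes(events))  # True
```

Note that one model improving two different benchmarks once each leaves every per-benchmark count at 1, so the check correctly returns False in that case.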
Technical AI Timelines questions
By the end of 2026, will we have transparency into any useful internal pattern within a Large Language Model whose semantics would have been unfamiliar to AI and cognitive science in 2006?
53% chance
By end of 2028, will there be a global AI organization, responsible for AI safety and regulations?
42% chance
Related questions
Short-term AI 3.4: By June 2024 will SOTA on APPS be >= 25%?
25% chance
Will SOTA on MATH in Sep 2024 utilize a hard-coded search/amplification procedure?
56% chance
Will there be a period of 12 contiguous months during which no new compute-SOTA LM is released, by Jan 1, 2033?
70% chance
Short Term AI 3.2: By June 2024 will SOTA on MATH be >= 90%?
14% chance
BIG-bench accuracy 75% #2: Will SOTA for a single model on BIG-bench pass 75% by the start of 2025?
60% chance
SOTA on a SWE-bench [Unassisted] in October 2024
SOTA on a SWE-bench [Assisted] in October 2024
Will self-improving AI agents crush SOTA in a complex environment (e.g. AAA game, tool use, science) in next 12 months?
41% chance
By 2026, will it be standard practice to sandbox SOTA LLMs?
26% chance
MMLU 99% #3: Will SOTA for MMLU (average) pass 99% by the start of 2026?
16% chance