Short-term AI 3.4: By June 2024 will SOTA on APPS be >= 25%?
25% chance (as of Jun 2)
APPS is the more challenging code benchmark (compared to HumanEval). SOTA at market creation is 15.7% by CodeRL. I will use Competition Pass@any.
Notably, the current SOTA uses a very old LLM as its base model, yet it still beats davinci-002.
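For clarity on the resolution metric: "Pass@any" on APPS counts a problem as solved if any of the model's generated samples passes all of that problem's unit tests. A minimal sketch of the computation (the function name and data layout here are illustrative assumptions, not from the APPS codebase):

```python
# Hedged sketch of a pass@any score, assuming each problem is represented
# by a list of booleans: one per generated sample, True if that sample
# passed all of the problem's unit tests.

def pass_at_any(results):
    """Fraction of problems solved by at least one generated sample."""
    if not results:
        return 0.0
    solved = sum(1 for samples in results if any(samples))
    return solved / len(results)

# Toy example: 3 problems with several samples each.
results = [
    [False, True, False],  # solved by the 2nd sample
    [False, False],        # unsolved
    [True],                # solved
]
print(pass_at_any(results))  # prints 0.6666666666666666
```

The market would resolve YES if this score on the APPS competition split reaches 25% or more by June 2024.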
Other short-term AI 3 markets:
Related questions
Short Term AI 3.2: By June 2024 will SOTA on MATH be >= 90%?
14% chance
Will there be a period of 12 contiguous months during which no new compute-SOTA LM is released, by Jan 1, 2033?
70% chance
SoAI 23 3/10: Will self-improving AI agents crush SOTA in a complex environment (e.g. AAA game, tool use, science)?
29% chance
Short-term AI 3.3: By June 2024 will SOTA on HumanEval be >= 99%?
5% chance
Will self-improving AI agents crush SOTA in a complex environment (e.g. AAA game, tool use, science) in next 12 months?
41% chance
BIG-bench accuracy 75% #2: Will SOTA for a single model on BIG-bench pass 75% by the start of 2025?
60% chance
By 2026, will it be standard practice to sandbox SOTA LLMs?
26% chance
For typical SOTA AI systems in 2028, will it be possible for users to know the true reasons for systems making a choice?
POLL
MMLU 99% #3: Will SOTA for MMLU (average) pass 99% by the start of 2026?
16% chance
Short Term AI 3.1: By June 2024 will an AI be mostly/entirely credited with a scientific discovery?
5% chance