
Short-term AI 3.4: By June 2024 will SOTA on APPS be >= 25%?
130 · Ṁ1297 · resolved Jun 8
Resolved NO
APPS is a more challenging code benchmark than HumanEval. SOTA at market creation is 15.7%, set by CodeRL. I will resolve using the Competition Pass@any metric.
Notably, the current SOTA uses a very old LLM as its base model, yet it still beats davinci-002.
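As a rough illustration of the resolution rule above, here is a minimal sketch. It assumes "Pass@any" means the percentage of competition-level problems for which at least one generated sample passes all tests; that reading, and the names `pass_at_any` and `resolves_yes`, are assumptions for illustration, not something the market description defines.

```python
from typing import Sequence


def pass_at_any(results: Sequence[Sequence[bool]]) -> float:
    """Percentage of problems solved by at least one sample.

    results[i][j] is True if sample j for problem i passed all tests.
    (Assumed interpretation of "Pass@any"; not taken from the market text.)
    """
    if not results:
        raise ValueError("need at least one problem")
    solved = sum(1 for samples in results if any(samples))
    return 100.0 * solved / len(results)


def resolves_yes(score_percent: float, threshold: float = 25.0) -> bool:
    """The market asks whether SOTA is >= 25%."""
    return score_percent >= threshold


# Toy data: 4 problems, 2 of them solved by at least one sample.
toy = [[False, True], [False, False], [True], [False]]
toy_score = pass_at_any(toy)

# The SOTA quoted at market creation (15.7) is below the threshold.
creation_sota_resolves_yes = resolves_yes(15.7)
```

Under this reading, the 15.7 quoted at market creation would need to rise by more than nine percentage points for a YES resolution.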
Other short-term AI 3 markets:
This question is managed and resolved by Manifold.
🏅 Top traders
# | Name | Total profit |
---|---|---|
1 | | Ṁ171 |
2 | | Ṁ31 |
3 | | Ṁ21 |
4 | | Ṁ11 |
5 | | Ṁ5 |
Related questions
BIG-bench accuracy 75% #3: Will SOTA for a single model on BIG-bench pass 75% by the start of 2026?
86% chance
What will be true of the SOTA AI on the FrontierMath benchmark, before 2026?
What will be true of the SOTA AI on the FrontierMath benchmark, before 2028?
BIG-bench accuracy 75% #4: Will SOTA for a single model on BIG-bench pass 75% by the start of 2027?
86% chance
What will be true of the SOTA AI on the FrontierMath benchmark, before 2027?
MMLU 99% #3: Will SOTA for MMLU (average) pass 99% by the start of 2026?
6% chance
BIG-bench accuracy 75% #5: Will SOTA for a single model on BIG-bench pass 75% by the start of 2028?
87% chance
MMLU 99% #4: Will SOTA for MMLU (average) pass 99% by the start of 2027?
8% chance
[Carlini questions] SOTA AI scores better than X% of other participants in competitive programming contest by 2027
91.5
[Carlini questions] SOTA AI scores better than X% of other participants in competitive programming contest by 2030
95.2