Short-term AI 3.3: By June 2024 will SOTA on HumanEval be >= 99%?
Jun 2
5% chance
@thooton I think it's quite plausible that the test set will end up in the training set in some hard-to-detect way. I will exclude a model if its training set is known to be poisoned in this way (I assume Papers With Code would exclude it as well), but for most large language models the pre-training data is not public, so such cases may go unnoticed.
Related questions
BIG-bench accuracy 75% #2: Will SOTA for a single model on BIG-bench pass 75% by the start of 2025?
60% chance
BIG-bench accuracy 75% #4: Will SOTA for a single model on BIG-bench pass 75% by the start of 2027?
67% chance
BIG-bench accuracy 75% #5: Will SOTA for a single model on BIG-bench pass 75% by the start of 2028?
65% chance
SoAI 23 3/10: Will self-improving AI agents crush SOTA in a complex environment (e.g. AAA game, tool use, science)?
29% chance
HumanEval 90% #2: Will pass@1 performance on the HumanEval benchmark be >= 90% by 2025?
75% chance
BIG-bench accuracy 75% #3: Will SOTA for a single model on BIG-bench pass 75% by the start of 2026?
64% chance
Short-term AI 3.4: By June 2024 will SOTA on APPS be >= 25%?
25% chance
Short Term AI 3.2: By June 2024 will SOTA on MATH be >= 90%?
14% chance
Will self-improving AI agents crush SOTA in a complex environment (e.g. AAA game, tool use, science) in next 12 months?
41% chance
Short Term AI 3.1: By June 2024 will an AI be mostly/entirely credited with a scientific discovery?
5% chance