Short-term AI 3.3: By June 2024 will SOTA on HumanEval be >= 99%?
Jun 2 · 5% chance

Benchmark: HumanEval. SOTA at market creation was 94.4%.
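For a sense of scale: HumanEval has 164 problems. If a score is simply the fraction of problems solved (one attempt per problem, as with greedy pass@1; this is an assumption, since leaderboard pass@1 figures can also be averages over sampled attempts), then 99% means solving at least 163 of them, i.e. at most one failure. A minimal Python sketch of that arithmetic, where `min_solved` is a hypothetical helper:

```python
# Worked arithmetic for the 99% threshold, assuming the standard HumanEval
# set of 164 problems and a score equal to the fraction of problems solved.
HUMANEVAL_PROBLEMS = 164


def min_solved(threshold_pct: int, total: int = HUMANEVAL_PROBLEMS) -> int:
    """Smallest solved count such that solved / total >= threshold_pct / 100.

    Uses integer ceiling division to avoid floating-point rounding.
    """
    return -(-threshold_pct * total // 100)


needed = min_solved(99)
print(needed, HUMANEVAL_PROBLEMS - needed)  # solved problems needed, failures allowed
```

Under that assumption, going from 94.4% to 99% means going from roughly nine unsolved problems to at most one.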


bought Ṁ20 of NO

The last few percentage points are always the hardest. Unless, of course, you train on the validation set.

@thooton I think it's quite plausible that the test set will end up in the training set in some hard-to-detect way. I will exclude a model for this if its training set is known to be contaminated (I assume Papers With Code would exclude it as well), but for most large language models the pre-training data is not public.