Will there be an advance in LLMs comparable to chain of thought this year?
Resolved NO (Jan 3)

The advance does not have to be a prompting technique, but it must result in an actual improvement in the state of the art (SOTA) on some interesting set of tasks.


🏅 Top traders

#  Name  Total profit
1  —     Ṁ110
2  —     Ṁ13
3  —     Ṁ8
4  —     Ṁ2
5  —     Ṁ2

As far as I can tell, the answer is no. I will leave this market unresolved for a few days so people have time to submit evidence.

Comparable in terms of what? Is multi-benchmark performance from a base few-shot eval supposed to increase roughly as much as it does with CoT? Does it need to improve over CoT as well?

@JacobPfau It should produce a similar performance boost across a similar number of benchmarks. If the advance is as good as CoT when not using CoT, I will accept it (even if, when combined with CoT, the resulting performance increase is <= the sum of the separate increases).
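For readers unfamiliar with the baseline being compared against: chain-of-thought prompting improves performance by having the model write out intermediate reasoning before its final answer, rather than answering directly. Below is a minimal sketch contrasting the two prompt styles; `query_model` is a hypothetical placeholder for whatever LLM API you use, not part of any real library.

```python
# Minimal sketch contrasting direct few-shot prompting with chain-of-thought
# (CoT) few-shot prompting. `query_model` is a hypothetical placeholder for
# an LLM call; swap in your own API client.

def query_model(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API call here")

QUESTION = (
    "Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
    "How many balls does he have now?"
)

# Direct few-shot: exemplars map each question straight to its answer.
direct_prompt = (
    "Q: There are 15 trees. Workers plant trees until there are 21. "
    "How many trees did they plant?\n"
    "A: 6\n\n"
    f"Q: {QUESTION}\n"
    "A:"
)

# Chain-of-thought few-shot: exemplars spell out intermediate reasoning,
# which the model then imitates before giving its final answer.
cot_prompt = (
    "Q: There are 15 trees. Workers plant trees until there are 21. "
    "How many trees did they plant?\n"
    "A: There were 15 trees originally and 21 after planting, so the "
    "workers planted 21 - 15 = 6 trees. The answer is 6.\n\n"
    f"Q: {QUESTION}\n"
    "A:"
)

# The market asks whether any new technique produces a comparable jump,
# i.e. a gap like accuracy(cot_prompt) - accuracy(direct_prompt),
# measured across a similar number of benchmarks.
```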

Does Reflexion count? https://arxiv.org/abs/2303.11366

@jonsimon Currently, no. If this gets replicated a few times across more diverse datasets, it could. The HotPotQA results are about the right size of improvement to count, assuming they hold up under scrutiny (which I'm somewhat skeptical of; the paper doesn't look like it did much evaluation due diligence).
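As a rough paraphrase of the Reflexion idea (a simplified sketch, not the paper's exact implementation): the agent attempts a task, an evaluator scores the attempt, and on failure the model writes a verbal self-reflection that is fed back into the next attempt's context. In the sketch below, `query_model` and `evaluate` are hypothetical placeholders you would supply yourself.

```python
# Simplified sketch of the Reflexion loop (Shinn et al., arXiv:2303.11366):
# attempt -> evaluate -> verbal self-reflection -> retry with reflections
# in context. `query_model` and `evaluate` are hypothetical placeholders.

def query_model(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API call here")

def evaluate(task: str, attempt: str) -> bool:
    raise NotImplementedError("plug in a task-specific success check here")

def reflexion(task: str, max_trials: int = 3) -> str:
    reflections: list[str] = []  # episodic memory of lessons from failures
    attempt = ""
    for _ in range(max_trials):
        memory = "\n".join(reflections)
        attempt = query_model(
            f"Task: {task}\n"
            f"Lessons from previous attempts:\n{memory}\n"
            "Your attempt:"
        )
        if evaluate(task, attempt):
            return attempt  # success: stop early
        # On failure, ask the model to diagnose what went wrong; the
        # resulting reflection conditions the next attempt.
        reflections.append(query_model(
            f"Task: {task}\nFailed attempt: {attempt}\n"
            "In one or two sentences, explain what went wrong and what "
            "to do differently next time:"
        ))
    return attempt  # best effort after max_trials
```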

