Kimi K2 Thinking gets top score on HLE, according to independent evaluation?
14
100Ṁ540Feb 1
41% chance
Resolves according to https://scale.com/leaderboard/humanitys_last_exam,
or to https://scale.com/leaderboard/humanitys_last_exam_text_only if only text-only is evaluated.
Context:
This question is managed and resolved by Manifold.
@MikhailDoroshenko Do you think k2-thinking is particularly better at internet-based research than the other frontier models, and that this gives it the upper hand on HLE when internet search is allowed?


