Kimi K2 Thinking gets top score on HLE, according to independent evaluation?
100Ṁ1295Feb 1
1.7% chance
Resolves according to https://scale.com/leaderboard/humanitys_last_exam,
or to https://scale.com/leaderboard/humanitys_last_exam_text_only if only text-only is evaluated.
Context:
This question is managed and resolved by Manifold.
@traders Kimi K2 Thinking just ranked #4 behind GPT-5, GPT-5 Codex, and Grok 4 in an independent run of HLE from Artificial Analysis: https://artificialanalysis.ai/evaluations/humanitys-last-exam
Although it doesn't resolve this market, it is significant evidence that Kimi K2 Thinking won't top an independent run of HLE from Scale either.
@MikhailDoroshenko do you think k2-thinking is particularly better at internet-based research than the other frontier models, and that this gives it the upper hand on HLE when internet search is allowed?


