LLM reaches >90% Brier score on Prophet Arena by 2026?
11
1kṀ3443
Dec 31
5% chance

Prophet Arena evaluates the ability of LLMs to forecast the future by letting them predict on live prediction markets (Kalshi).

As of 2025-08-18, the current leader in Brier score is GPT-5 with 82.21%.

Resolves YES if an LLM stays above 90% on the leaderboard for at least one week (to rule out lucky outliers).

Resolves N/A if the website stops being maintained for at least one month.

Resolves NO, otherwise.


This is literally impossible.

Unskilled (naive 50:50) is 75%

If you take sports (most frequent event)

a 60% forecast that comes true has a Brier score of 0.16, so 84%

64.64% (Brier score of 0.125) means predicting half the variance, which is near the theoretical limit (since sports are somewhat random, with the true Vegas odds usually being between 10% and 90%)

so that's only halfway from 75% to 100%, i.e. 87.5%

90% on this scale means questions being asked on Kalshi are 90/10 questions which is quite unlikely
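A minimal sketch of the arithmetic above, assuming the leaderboard number is 1 minus the Brier score on binary questions. The function names are made up for illustration and are not from Prophet Arena.

```python
def arena_score(forecast: float, outcome: int) -> float:
    """1 - Brier score for a single binary question (outcome is 0 or 1)."""
    return 1.0 - (forecast - outcome) ** 2


def expected_score(true_prob: float) -> float:
    """Expected 1 - Brier for a forecaster who states the true probability.

    The expected Brier score of a perfectly calibrated forecast p is p * (1 - p).
    """
    return 1.0 - true_prob * (1.0 - true_prob)


naive = arena_score(0.5, 1)            # always answering 50:50
hit_at_60 = arena_score(0.6, 1)        # a 60% forecast that comes true
calibrated_90_10 = expected_score(0.9) # perfect forecaster on 90/10 questions

print(round(naive, 4), round(hit_at_60, 4), round(calibrated_90_10, 4))
# prints: 0.75 0.84 0.91
```

Under that assumption, even a perfectly calibrated forecaster only averages about 91% if every question is a 90/10 question, which is the point being made.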

@ChinmayTheMathGuy the concept of the benchmark is cool, but the results aren't that clear or useful

Brier should be rescaled to run from -300% to +100% (0% = naive 50:50)

so 90% means explaining 60% of the variance.
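One way to read the -300% to +100% scale is as a skill score against the naive 50:50 baseline. This is a sketch under that assumption; `skill_score` is a hypothetical helper, not something the leaderboard reports.

```python
def skill_score(arena_score: float, baseline_brier: float = 0.25) -> float:
    """Rescale a (1 - Brier) score so that naive 50:50 maps to 0 and perfect maps to 1."""
    brier = 1.0 - arena_score
    return 1.0 - brier / baseline_brier


for score in (0.0, 0.75, 0.90, 1.0):
    print(score, round(skill_score(score), 4))
# prints:
# 0.0 -3.0
# 0.75 0.0
# 0.9 0.6
# 1.0 1.0
```

On this scale the worst possible score is -300%, the naive forecaster sits at 0%, and a 90% leaderboard score corresponds to 60%.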

Is this saying every model loses money, since they're all less than 100%?

@ChinmayTheMathGuy also

the most risk-averse version (gamma = 1) uses logarithmic utility, which is full Kelly. That is still too risky for most; fractional Kelly is much better since the bets give some credence to the market.

gamma = 2 roughly corresponds to half Kelly
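A rough sketch of the Kelly comparison, assuming gamma is the risk-aversion parameter of a CRRA utility (log utility at gamma = 1) and a single bet on a YES contract bought at price q with belief p. The function names and the grid search are illustrative assumptions, not how Prophet Arena computes its returns.

```python
import math


def kelly_fraction(p: float, q: float) -> float:
    """Full-Kelly bankroll fraction for buying YES at price q with belief p."""
    return max(0.0, (p - q) / (1.0 - q))


def crra_utility(wealth: float, gamma: float) -> float:
    """CRRA utility; gamma = 1 is the logarithmic (Kelly) case."""
    if gamma == 1.0:
        return math.log(wealth)
    return wealth ** (1.0 - gamma) / (1.0 - gamma)


def crra_fraction(p: float, q: float, gamma: float, steps: int = 10_000) -> float:
    """Grid-search the bankroll fraction that maximizes expected CRRA utility."""
    payoff = (1.0 - q) / q  # net profit per unit staked if YES resolves
    best_f, best_u = 0.0, -math.inf
    for i in range(steps):
        f = i / steps  # fraction of bankroll staked, in [0, 1)
        u = (p * crra_utility(1.0 + f * payoff, gamma)
             + (1.0 - p) * crra_utility(1.0 - f, gamma))
        if u > best_u:
            best_f, best_u = f, u
    return best_f


full = crra_fraction(0.6, 0.5, gamma=1.0)  # matches kelly_fraction(0.6, 0.5)
half = crra_fraction(0.6, 0.5, gamma=2.0)  # close to half of the full-Kelly stake
print(kelly_fraction(0.6, 0.5), full, half)
```

With a 60% belief against a 50% market price, full Kelly stakes about 20% of the bankroll and gamma = 2 stakes about 10%, which is where the "half Kelly" rule of thumb comes from.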

@kiudee I think you mean 1 - Brier score, which is what they report. A Brier score of 10% or less (lower is better) would mean a score of 90% or more on Prophet Arena.

That's minor and presumably nobody would be confused, but it's probably worth clarifying in the description.

To confirm, that's what I meant. Thanks for the keen eye.
