Manifold just introduced a calibration metric (for example, https://manifold.markets/egroj/calibration; replace egroj with your username to access yours, or find the link on your profile page).

Resolves YES if the leaderboard includes a category for calibration before the end of the month.

# 🏅 Top traders

# | Name | Total profit |
---|---|---|
1 | | Ṁ273 |
2 | | Ṁ94 |
3 | | Ṁ90 |
4 | | Ṁ88 |
5 | | Ṁ65 |

@MaggieDelano There could be a short explanation of what calibration is (for example: "a measure of how well you assign the correct probabilities to events") on that board, with a link to further details for those interested. There are already different leaderboards that incentivize different strategies; people who want to top the traders leaderboard are already fishing for traders without much benefit to Manifold (markets like "Will I get X traders in this market?"). If the calibration is done with the probability after the trade (not sure about this), then aiming for calibration would also lead to optimal profit, except for when you also include strategies of selling before the market closes to profit from changes in the market. Users with high calibration are better predictors of events (but not necessarily of market behavior or human reactions to events).

@egroj I think calibration is a misleading metric because people assign it more meaning than they should, IMO. Calibration is a necessary but not sufficient criterion for being a good predictor. I much prefer accuracy scores (Brier score, log score, etc.).

"If the calibration is done with the probability after the trade"

It is.

"then aiming for calibration would also lead to optimal profit, except for when you also include strategies of selling before the market closes to profit from changes in the market"

Not necessarily. While that is broadly true for "simple" trading strategies, it isn't true in many cases. One example would be if I'm hedging/arbitraging across multiple markets.

@jack I think the Brier score is equivalent to calibration, or maybe I misunderstood how calibration is computed; I thought it was the Brier score multiplied by -100.

@egroj Calibration works differently in an environment where you are betting in a market versus giving your probability point estimate. The user calibration page uses a mean squared error measure which is asymmetric (i.e., you have error if you bet YES at 5% and those markets resolve YES only 3% of the time on average, but not if they resolve YES 7% of the time on average).
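One possible reading of that asymmetry, as a minimal sketch: a bet only counts as an error when the realized frequency lands on the losing side of the price you bet at. The function name `one_sided_squared_error` and the exact formula are assumptions for illustration, not Manifold's actual implementation.

```python
def one_sided_squared_error(bet_prob, resolved_freq, side):
    """Squared gap between bet price and realized frequency, counted only
    when the gap is unfavorable to the bettor (hypothetical formula)."""
    if side == "YES":
        # Buying YES at bet_prob is only a mistake if YES happens less often.
        gap = max(0.0, bet_prob - resolved_freq)
    elif side == "NO":
        # Buying NO at bet_prob is only a mistake if YES happens more often.
        gap = max(0.0, resolved_freq - bet_prob)
    else:
        raise ValueError("side must be 'YES' or 'NO'")
    return gap ** 2


# YES at 5% when such markets resolve YES 3% of the time: counted as error.
print(one_sided_squared_error(0.05, 0.03, "YES"))
# YES at 5% when such markets resolve YES 7% of the time: no error.
print(one_sided_squared_error(0.05, 0.07, "YES"))
```

Under this reading, a bet that turns out to be underpriced in your favor is never penalized, which is what makes the measure asymmetric.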

@jack I think the best metric in a market-based environment (other than total PnL) is the Sharpe ratio.
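For reference, the Sharpe ratio is mean excess return divided by the standard deviation of returns. A minimal sketch, assuming per-period returns are available as plain fractions; the two return series below are made up for illustration.

```python
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return per unit of return volatility (sample stdev)."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


# Two hypothetical traders with the same average return per period (5%).
steady = [0.04, 0.06, 0.05, 0.05]
swingy = [0.50, -0.40, 0.30, -0.20]

print(sharpe_ratio(steady))  # high: small swings around the mean
print(sharpe_ratio(swingy))  # low: same mean, much larger swings
```

Both traders have the same total profit here, so the ratio separates them only by how much risk they took to get it.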

@SG Also, the calibration score is computed by first averaging within each bucket and then computing the error, is that right? That means errors can cancel each other out in the averaging, which is very different from summing the squared error per bet.
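The cancellation can be seen in a small sketch. It assumes equal-width probability buckets and an unweighted average over buckets; both function names are hypothetical and this is not necessarily how Manifold computes its score.

```python
from collections import defaultdict


def per_bet_squared_error(probs, outcomes):
    """Mean of (probability - outcome)^2 over individual bets (Brier-style)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def bucket_then_error(probs, outcomes, n_buckets=10):
    """Average probabilities and outcomes within each bucket first,
    then take the squared gap between the two bucket means."""
    buckets = defaultdict(list)
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_buckets), n_buckets - 1)
        buckets[idx].append((p, o))
    errors = []
    for items in buckets.values():
        mean_p = sum(p for p, _ in items) / len(items)
        mean_o = sum(o for _, o in items) / len(items)
        errors.append((mean_p - mean_o) ** 2)
    return sum(errors) / len(errors)


# Ten bets at 50%, half of which resolve YES.
probs = [0.5] * 10
outcomes = [1, 0] * 5

print(per_bet_squared_error(probs, outcomes))  # 0.25
print(bucket_then_error(probs, outcomes))      # 0.0
```

Every individual bet is off by 0.5, yet the bucket average matches the bucket's hit rate exactly, so the bucketed error is zero.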

@jack "Calibration is a necessary but not sufficient criterion for being a good predictor."

Can you elaborate on that?

@MaggieDelano

So, from what @SG said, it would not disincentivize betting only when you're really confident, because doing that would not hurt your calibration (at worst it would only hurt your profitability, if you have too much unspent mana). Is that correct?

@DenisBaudouin Sure, I'll give some examples:

Say we want to forecast the weather. Suppose a forecaster looks up the long-run historical averages (let's say 80% sunny, 10% cloudy, 10% rain) and simply enters those as the forecast for tomorrow, every day. They will be well-calibrated: 80% of days will be sunny, matching their 80% prediction. This is a totally reasonable baseline for a forecast, but it's much worse than what a meteorologist can do! A meteorologist might say that on Monday the forecast is 95% chance of sunny, 5% chance of cloudy, while on Tuesday the forecast is 50% cloudy and 50% rain. They'll also be well-calibrated, but they have provided much more information, i.e. their accuracy will be higher.
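A rough numerical version of the weather example, reduced to the binary question "will it be sunny?". The 40-day record and the meteorologist's 95% and 65% forecasts are invented so that both forecasters come out exactly calibrated; only the Brier scores differ.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Hypothetical record: 40 days, 32 of them sunny (80%).
# First 20 days: 19 sunny. Last 20 days: 13 sunny.
outcomes = [1] * 19 + [0] * 1 + [1] * 13 + [0] * 7

# Baseline forecaster: the historical 80% every single day.
baseline = [0.80] * 40

# Sharper forecaster: 95% on the first 20 days (19/20 were sunny),
# 65% on the last 20 days (13/20 were sunny). Also calibrated.
sharper = [0.95] * 20 + [0.65] * 20

print(brier_score(baseline, outcomes))  # about 0.16
print(brier_score(sharper, outcomes))   # about 0.1375 (lower is better)
```

Both forecasters would sit on the diagonal of a calibration chart, but the accuracy score rewards the one whose forecasts vary with the actual conditions.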

Another example: say we want to forecast who will win the 2024 presidential election. To keep things simple, I'll say the options you are asked to forecast are Biden, Trump, DeSantis, Marianne Williamson, and "someone else" - so 5 possible options. Suppose you have zero knowledge of these people, and you simply predict 20% chance for each. In the end, exactly one will win, and 4 will not, so you will be exactly correctly calibrated - 20% of your predictions of 20% ended up YES. But obviously you provided no information and your accuracy was very low compared to someone who predicts higher for Biden than for Marianne Williamson.
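The five-option example can be sketched with the log score. The option labels, the "informed" probabilities, and the choice of winner below are all placeholders for illustration, not a claim about any real outcome.

```python
import math


def log_score(forecast, winner):
    """Natural log of the probability given to the realized outcome
    (closer to zero is better)."""
    return math.log(forecast[winner])


def hit_rate_at(forecast, winner, level):
    """Fraction of options forecast at exactly `level` that resolved YES."""
    at_level = [name for name, p in forecast.items() if p == level]
    return sum(name == winner for name in at_level) / len(at_level)


options = ["A", "B", "C", "D", "E"]
uniform = {name: 0.2 for name in options}
# Hypothetical forecast from someone with real knowledge of the race.
informed = {"A": 0.45, "B": 0.40, "C": 0.10, "D": 0.01, "E": 0.04}
winner = "A"  # arbitrary choice for the illustration

print(hit_rate_at(uniform, winner, 0.2))  # 0.2, i.e. perfectly calibrated
print(log_score(uniform, winner))         # log(0.2), same for any winner
print(log_score(informed, winner))        # log(0.45), closer to zero
```

The uniform forecaster gets the same score no matter who wins, which is another way of saying the forecast carried no information.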

@jack Thank you!

I was thinking about the calibration graph when you said that, which also conveys information about accuracy (for example, getting all yes/no answers correct at 100% probability vs. 50% probability on each). But if there is only a measure of calibration, it is true that we aren't taking accuracy into account, which is essential!

I think the markets you bet on are also important, and neither calibration nor accuracy can take that into account: betting on 100 markets of the form "I will resolve this YES" gives you perfect calibration and perfect accuracy, but doesn't make you a good predictor.

Timing also seems important: betting correctly just before the market resolves is different from betting a year in advance.

In fact, if you don't move the probability when you bet (if there is a lot of mana in the market), then even if you always bet incorrectly, you bet at the market probability, so you get the accuracy and calibration of the market (and better if you bet correctly, because of the asymmetry).

This seems incorrect.

What seems important is the probability you would choose, how it differs from the market probability, and how much weight you would put on it.

This is probably not far from your profit rate.

@DenisBaudouin Right, these metrics are better suited for making a prediction as a probability, rather than as trades.

The calibration graph has the same issues I just talked about above - if you always bet on things like the presidential election example I gave, your calibration chart will be perfect.

And yes, which questions you predict on and when is very important to these metrics, which is another reason you can't just compare them blindly. The appropriate comparison would be to look at two predictors, find the subset of questions that they both predicted on (at about the same time), and compare their accuracy on those.
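A minimal sketch of that comparison, assuming each predictor's forecasts are stored as a mapping from question ID to probability. The question IDs and numbers are made up, and the "at about the same time" condition is not modeled here.

```python
def brier_on_shared(preds_a, preds_b, outcomes):
    """Brier score for two predictors, restricted to resolved questions
    that both of them forecast."""
    shared = sorted(set(preds_a) & set(preds_b) & set(outcomes))
    if not shared:
        raise ValueError("no questions in common")

    def brier(preds):
        return sum((preds[q] - outcomes[q]) ** 2 for q in shared) / len(shared)

    return shared, brier(preds_a), brier(preds_b)


# Hypothetical forecasts keyed by question ID.
alice = {"q1": 0.9, "q2": 0.2, "q3": 0.6}
bob = {"q2": 0.4, "q3": 0.7, "q4": 0.99}
outcomes = {"q1": 1, "q2": 0, "q3": 1, "q4": 1}

shared, score_alice, score_bob = brier_on_shared(alice, bob, outcomes)
print(shared)       # ['q2', 'q3']
print(score_alice)  # about 0.10
print(score_bob)    # about 0.125
```

Questions that only one of them forecast (q1 and q4 here) are dropped, so an easy question answered by just one predictor cannot tilt the comparison.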

@jack It will be perfect in the sense that your points are on the line, but not perfect in the sense that your points are both on the line and at the extremities (which indicates better accuracy).