Will Manifold add a Best Calibrated Leaderboard by May 2023?
Resolved NO on May 1

Manifold recently added a link to a per-user calibration plot on each user's profile page.

Will Manifold add a calibration score leaderboard to allow users to compete on this metric, as they currently do for Profit and Market creation?

This market resolves Yes if they add such a leaderboard before May 2023.


๐Ÿ… Top traders

#NameTotal profit
1แน€58
2แน€45
3แน€16
4แน€12
5แน€12
Sort by:
predicted NO

Could someone please explain this to me?

The green dot at (x%, y%) means when Chad bet YES at x%, the market resolved YES y% of the time on average.

What does the red dot mean then?

Does it not matter how far above or below the line my dots are?

Does [screenshot of my calibration plot] mean that when I bet NO at 80%, it resolved NO only 37.5% of the time? If so, then if I bet NO at 80% and it always resolves NO, wouldn't that give a coordinate of (0.8, 1), which would put my red dot above the line?

Please, someone explain.
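For reference, here is a minimal sketch of how the two kinds of dots could be computed, assuming a simplified bet-log format and assuming (as I read the plot) that both colors are plotted against the fraction of markets that resolved YES; the exact bucketing Manifold uses is an assumption.

```python
from collections import defaultdict

# Hypothetical bet log: (side, probability bet at, whether the market resolved YES).
# This is a made-up format just to illustrate how the dots could be computed.
bets = [
    ("YES", 0.30, True), ("YES", 0.30, False),
    ("NO",  0.80, False), ("NO",  0.80, True), ("NO",  0.80, False),
]

def calibration_points(bets, side):
    """For each probability you bet at, the fraction of those markets resolving YES."""
    buckets = defaultdict(list)
    for s, prob, resolved_yes in bets:
        if s == side:
            buckets[prob].append(resolved_yes)
    return {p: sum(r) / len(r) for p, r in sorted(buckets.items())}

# Green dots (YES bets) and red dots (NO bets) are both plotted as
# (probability bet at, fraction resolved YES). A YES dot above the diagonal,
# or a NO dot below it, means those bets were profitable on average.
print("green (YES):", calibration_points(bets, "YES"))  # {0.3: 0.5}
print("red   (NO): ", calibration_points(bets, "NO"))   # {0.8: ~0.33}
```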

predicted NO

See also: https://manifold.markets/egroj/will-the-leaderboards-include-a-cal#FY9MT83UUSk7mWJT1mGa

Copying some of the comments from there: I think calibration is a misleading metric because people assign it more meaning than they should, IMO. Calibration is a necessary but not sufficient criterion for being a good predictor. Accuracy scores (Brier score, log score, etc.) are a much better metric. In a market context, that basically maps to profit, or to Sharpe ratio.
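As a rough illustration of what those accuracy scores measure (a minimal sketch; these are the standard formulas, nothing Manifold-specific):

```python
import math

def brier(prob_yes, outcome_yes):
    """Squared error between the forecast and the outcome (lower is better)."""
    return (prob_yes - outcome_yes) ** 2

def log_score(prob_yes, outcome_yes):
    """Log of the probability assigned to what actually happened (higher is better)."""
    return math.log(prob_yes if outcome_yes else 1 - prob_yes)

# A confident correct forecast scores much better than a hedged one, even though
# both forecasters could be perfectly calibrated over many such markets.
print(brier(0.95, 1), log_score(0.95, 1))  # ~0.0025, ~-0.051
print(brier(0.50, 1), log_score(0.50, 1))  # 0.25,    ~-0.693
```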

predicted NO

Say we want to forecast the weather. If a forecaster looks up the long-run historical averages (let's say 80% sunny, 10% cloudy, 10% rain) and every day simply puts those in as their forecast for tomorrow, they will be well-calibrated: 80% of days will be sunny, matching their 80% prediction. And this is a totally reasonable baseline forecast, but it's much worse than what a meteorologist can do! A meteorologist might say that on Monday the forecast is a 95% chance of sunny and 5% chance of cloudy, while on Tuesday it's 50% cloudy and 50% rain. They'll also be well-calibrated, but they've provided much more information, i.e. their accuracy will be higher.
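A quick numerical version of that example, collapsing the forecast to sunny vs. not-sunny and making up two day types so that both forecasters stay calibrated (a sketch, not anyone's actual method):

```python
# Two day types, equally common: "settled" days that are 95% sunny and
# "unsettled" days that are 65% sunny; the long-run sunny rate is still 80%.
day_types = [(0.5, 0.95), (0.5, 0.65)]  # (frequency, true P(sunny))

def expected_brier(forecast):
    """Expected Brier score on the binary question 'sunny tomorrow?'.
    forecast(p_true) -> the probability of sunny that the forecaster reports."""
    total = 0.0
    for freq, p in day_types:
        q = forecast(p)
        # E[(q - outcome)^2] when the outcome is Bernoulli(p)
        total += freq * (p * (1 - q) ** 2 + (1 - p) * q ** 2)
    return total

lazy = lambda p: 0.80          # always reports the climatological average
meteorologist = lambda p: p    # knows each day's true probability

print("lazy:         ", round(expected_brier(lazy), 4))           # 0.16
print("meteorologist:", round(expected_brier(meteorologist), 4))  # 0.1375
```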

Another example: say we want to forecast who will win the 2024 presidential election. To keep things simple, I'll say the options you are asked to forecast are Biden, Trump, DeSantis, Marianne Williamson, and "someone else", so 5 possible options. Suppose you have zero knowledge of these people, and you simply predict a 20% chance for each. In the end, exactly one will win and 4 will not, so you will be exactly correctly calibrated: 20% of your predictions of 20% ended up YES. But obviously you provided no information, and your accuracy was very low compared to someone who predicts higher for Biden than for Marianne Williamson.
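A small worked version of this (the "informed" probabilities and the winner are made up purely for illustration):

```python
candidates = ["Biden", "Trump", "DeSantis", "Williamson", "someone else"]
ignorant = {c: 0.20 for c in candidates}

# Whoever wins, exactly one of the five 20% predictions resolves YES, so the
# 20% bucket of the calibration plot sits exactly on the diagonal: 1/5 = 20%.
print(1 / len(candidates))  # 0.2

def brier_multi(probs, winner):
    """Brier score summed over the five binary 'will X win?' predictions."""
    return sum((p - (c == winner)) ** 2 for c, p in probs.items())

# A forecaster who concentrates probability on the eventual winner scores far
# better, even though both forecasters can be perfectly calibrated.
informed = {"Biden": 0.45, "Trump": 0.35, "DeSantis": 0.12,
            "Williamson": 0.03, "someone else": 0.05}
winner = "Biden"  # purely for illustration

print(brier_multi(ignorant, winner))   # 0.80
print(brier_multi(informed, winner))   # ~0.44
```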

@jack Thank you for directing me to this comment. I see that the lazy weather man can easily be well calibrated, but would the lazy weather man ever be as well calibrated as a knowledgeable meteorologist who takes the time to provide much higher accuracy?

I gather that calibration is a poor substitute metric for what we are actually interested in, which is accuracy. Is there a positive correlation between calibration and accuracy? Or is there no correlation at all, or are they mostly correlated but it is possible to construct a scenario where they run counter to one another?

predicted NO

@ShitakiIntaki The lazy forecaster is perfectly calibrated assuming that the stats they are using are exactly correct - in the election example, by definition there is exactly 1 winner, so the person who says 20% for each ends up with a perfect calibration score. The problem is that calibration is simply not a measure of accuracy.

As I mentioned in the top comment, good accuracy generally requires good calibration, but good calibration is not sufficient to give good accuracy.
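One standard way to make that precise is the Murphy decomposition of the Brier score, in which the "reliability" term is exactly the calibration error and the "resolution" term rewards forecasts that carry information beyond the base rate. A minimal sketch with made-up forecasts:

```python
from collections import defaultdict

def murphy_decomposition(forecasts, outcomes):
    """Murphy decomposition of the Brier score:
    Brier = reliability - resolution + uncertainty,
    where reliability is the (mis)calibration term (0 = perfectly calibrated)
    and resolution rewards forecasts that move away from the base rate."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        bins[f].append(o)
    reliability = sum(len(obs) * (f - sum(obs) / len(obs)) ** 2
                      for f, obs in bins.items()) / n
    resolution = sum(len(obs) * (sum(obs) / len(obs) - base_rate) ** 2
                     for obs in bins.values()) / n
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty

outcomes = [1, 1, 1, 1, 0, 1, 0, 0, 0, 1]   # 60% base rate
lazy     = [0.6] * 10                       # always predicts the base rate
informed = [0.9] * 5 + [0.3] * 5            # actually discriminates between markets

for name, fc in [("lazy", lazy), ("informed", informed)]:
    rel, res, unc = murphy_decomposition(fc, outcomes)
    print(f"{name}: Brier = {rel - res + unc:.3f} "
          f"(reliability {rel:.3f}, resolution {res:.3f}, uncertainty {unc:.3f})")
```

Both forecasters here are (nearly) perfectly calibrated; the informed one gets the better Brier score entirely through resolution, which is the part calibration can't see.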

@jack My takeaway is that, while maybe not the lazy weatherman (since weather is a continuous system), the lazy political forecaster can achieve perfect calibration on a discrete-outcome system. No one can be better calibrated than perfect, so it is strictly possible for the more accurate political forecaster to end up with a lower calibration score; ergo, there is no meaningful correlation between calibration and accuracy.

bought Ṁ40 of NO

I think calibration and Brier score are useful user stats when interpreted correctly, but meaningless to compete on. Profit is just a better metric: meaningful and harder to exploit.

I want to encourage people to compete on resolution reliability, helpfulness of comments, or using probabilities/betting/question-asking to make better decisions in work and life. Either that or compete on something silly but fun. A calibration leaderboard would just be more of the same.

Also see the discussion in the Discord (in the calibration thread).

Seems trivial to game. Just place random bets at 50% and be perfectly calibrated. There would need to be some additional metrics to select for people who are actually good at prediction, and at that point you're just recreating the profit leaderboard.
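A quick sketch of why, assuming the markets you bet on resolve YES about half the time overall (which a random selection of markets roughly does):

```python
import random

random.seed(0)

# Pick 10,000 markets with true probabilities spread all over [0, 1]
# and "predict" 50% on every one of them.
true_probs = [random.random() for _ in range(10_000)]
outcomes = [random.random() < p for p in true_probs]

# The calibration plot collapses to a single dot at (0.5, fraction resolved YES),
# which lands almost exactly on the diagonal: perfect-looking calibration, even
# though these bets carry no information and earn nothing in expectation against
# markets already priced near the true probability.
print(sum(outcomes) / len(outcomes))  # ~0.50
```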

@IsaacKing Trivial to game, yes, but for markets that resolved while you held a position at close, the fraction of the time you were on the winning side seems like a stat that should be visible.

predicted NO

@firstuserhere That doesn't seem like a useful stat. The number of shares you currently hold doesn't mean anything; what matters is only the probability you bet the market to.

predicted NO

@IsaacKing Agreed, there would need to be some kind of robustness checking along the lines of "user has bet on X many non-self-created resolved markets at Y many distinct % values". That would also resolve this market to YES.