Idea for calibration scores

Aug 31, 2025

Manifold used to have personal calibration scores for each user. It was deemed unnecessary, because you can track how well a user is doing by their profit. But profit isn't a pure measure of accuracy, because some people use the site more than others, some people buy mana, etc.

But as some users have pointed out, the previous calibration metric, the Brier score, isn't much better. To demonstrate why, imagine a user, CoinFlipper, who doesn't even read the title/description of any market. He just looks at the current probability P. Using a random device, he will bet YES with a probability of P and will bet NO with a probability of 1 - P. Assuming that the markets on Manifold do not consistently overestimate or underestimate probabilities, CoinFlipper will eventually end up with a near-perfect Brier score. Even if Manifold does consistently overestimate/underestimate probabilities, it would be easy for CoinFlipper to measure that and factor it into his random bets.
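Here is a rough simulation of the CoinFlipper argument. It is only a sketch under assumptions that are not in the post: prices are drawn uniformly, every market resolves YES with exactly its current probability (i.e. the market is calibrated), and `simulate_coinflipper` is a made-up name. It checks calibration by price bucket rather than computing a Brier score: in every bucket, the fraction of his bets that resolve YES should land close to the price he bet at, even though his bets carry no information.

```python
import random


def simulate_coinflipper(n_markets=200_000, n_buckets=10, seed=0):
    """Return {(side, bucket): (mean_price, yes_frequency, count)} for CoinFlipper's bets."""
    rng = random.Random(seed)
    # sums[(side, bucket)] = [total price, number resolved YES, number of bets]
    sums = {}
    for _ in range(n_markets):
        price = rng.random()                 # current market probability P
        resolves_yes = rng.random() < price  # assumption: the market is calibrated
        # CoinFlipper ignores the question: YES with probability P, NO otherwise.
        side = "YES" if rng.random() < price else "NO"
        bucket = min(int(price * n_buckets), n_buckets - 1)
        entry = sums.setdefault((side, bucket), [0.0, 0, 0])
        entry[0] += price
        entry[1] += int(resolves_yes)
        entry[2] += 1
    return {
        key: (total / count, yes / count, count)
        for key, (total, yes, count) in sums.items()
    }


if __name__ == "__main__":
    for (side, bucket), (mean_price, yes_freq, count) in sorted(simulate_coinflipper().items()):
        print(f"{side:3s} bucket {bucket}: mean price {mean_price:.3f}, "
              f"resolved YES {yes_freq:.3f} ({count} bets)")
```

In this toy setup the YES and NO bets look equally well calibrated, which is the problem: the score cannot tell a random bettor from an informed one.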

Obviously this is an issue, because you shouldn't be able to get a good calibration unless you're providing valuable information to the market. So here's a different calibration metric I thought of.

Start off by separating users into those who have positive profit and those who have negative profit. Let's call those with negative profit "Rank 0 (R0)" and positive profit "Rank 1 (R1)". People in R1 are the users who are providing valuable insight to Manifold.

Next, we need to compare how R1 users fare against each other. To do that, calculate each user's "R1 profit". The R1 profit is basically a user's weighted profit, with higher weights given to bets made against other R1 users. For example, let's say you bet M100 on YES, and win M1000. If 20% of the NO shares were held by R1 users, then M1000 * 20% = M200 is added to your R1 profit. If 0% were R1 users, then nothing gets added to your R1 profit. The reasoning here is that you were betting against R0 users, so winning against them doesn't mean anything. The same calculations are applied if you make a loss (instead of a profit).
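The weighting in that example can be sketched as below. The function name `r1_profit_contribution` and the data shapes (a dict of opposing-side share holdings and a set of R1 user names) are assumptions made for illustration, not anything Manifold exposes.

```python
def r1_profit_contribution(profit, opposing_shares_by_user, r1_users):
    """Scale a bet's profit (or loss) by the fraction of opposing shares held by R1 users."""
    total = sum(opposing_shares_by_user.values())
    if total == 0:
        return 0.0  # nobody on the other side, so nothing is added
    r1_held = sum(
        shares for user, shares in opposing_shares_by_user.items() if user in r1_users
    )
    return profit * r1_held / total


# Hypothetical holders of the NO side: 20% of the shares belong to an R1 user.
no_holders = {"alice": 20, "bob": 80}
print(r1_profit_contribution(1000, no_holders, {"alice"}))  # 200.0
print(r1_profit_contribution(1000, no_holders, set()))      # 0.0
```

A loss goes through the same function with a negative `profit`.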

Finally, we look at the users who have a positive R1 profit. These users are now called R2 users. We repeat the entire measurement on R2 users to figure out who the R3 users are, and so on. Eventually we would have to stop at, say, R10, because there wouldn't be enough bets between R10 users to make a proper assessment.
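The repeated promotion from R1 to R2 and beyond could look roughly like this. It is a sketch under assumptions: each bet is a hypothetical `(user, profit, opposing_shares_by_user)` tuple, only users who placed a bet are ranked, and `assign_ranks` and `max_rank` are illustrative names.

```python
from collections import defaultdict


def assign_ranks(bets, max_rank=10):
    """bets: list of (user, profit, opposing_shares_by_user). Returns {user: rank}."""
    ranks = {user: 0 for user, _, _ in bets}

    # R1: users whose plain (unweighted) profit is positive.
    raw = defaultdict(float)
    for user, profit, _ in bets:
        raw[user] += profit
    current = {user for user, total in raw.items() if total > 0}

    rank = 1
    while current and rank <= max_rank:
        for user in current:
            ranks[user] = rank
        # Profit re-weighted by how much of the opposing side the current rank holds.
        weighted = defaultdict(float)
        for user, profit, opposing in bets:
            total = sum(opposing.values())
            if user not in current or total == 0:
                continue
            held = sum(s for other, s in opposing.items() if other in current)
            weighted[user] += profit * held / total
        # Only users with positive weighted profit move up to the next rank.
        current = {user for user, total in weighted.items() if total > 0}
        rank += 1
    return ranks


example_bets = [
    ("ann", 100, {"bob": 50, "cat": 50}),
    ("bob", 50, {"cat": 100}),
    ("cat", -150, {"ann": 50, "bob": 50}),
]
print(assign_ranks(example_bets))  # {'ann': 2, 'bob': 1, 'cat': 0}
```

In the example, bob's profit came entirely from cat (an R0 user), so he stays at R1, while ann reaches R2 because half of her opposing shares were held by bob.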

You could show the user his/her calibration score as just the rank, or you could convert it to a percentile score.

I don't know if this metric is already well-known, but I think it would be cool if this was added to Manifold (though I imagine it would be a hassle to implement). Anyways, let me know if any of you have feedback.

retr0id 🫘

This kinda sounds like Elo rating

It's Me

@retr0id

I guess it kinda does.

Ziddletwix

The R1 profit is basically a user's weighted profit, with higher weights given to bets made against other R1 users.

What does it mean for a bet to be made “against” another user? Does this solely include limit orders? Or all orders into the AMM, but based on who holds shares of the other side? So if a coin flip is at 50%, and I see the result and I bet it to 99%, is that a bet made against all other users who hold NO shares, or just the last user to bet NO, or against no one because it was into the AMM?

Ziddletwix

@Ziddletwix reading it again, it seems like what matters is the proportion of shares held by the other side. Is this at market resolution? I.e., bets made after yours count as “against you”, and if they sell their shares it doesn’t count as being against you? Or is it a backwards check at the moment you place a bet (so if someone bet on this market a year ago, and you bet after a bunch of fresh drops, you are “betting against” all those people who bet before)? These distinctions have a large impact on how this metric is interpreted.

It's Me

@Ziddletwix yeah I left this vague because I don't know how the share prices are determined. It's meant to check who you're betting against the moment you make the bet. So when someone buys a YES share, you check how much the price of the share was brought down by R1 users. I think this would be equal to the percent of NO shares which were held by R1. Unless more recent bets have a stronger effect on the price, in which case it would be weighted by that.

Ziddletwix

@ItsMe So the probability/price of a market is determined a bit differently than that. And it is very difficult to evaluate this proposal unless the details of "who is defined as the counterparty of a given bet" are hammered out (because this proposal boils down to: "profit, but weighted by the skill of the counterparty"—what matters is how you define the counterparty).

For details on how Manifold works, you can check out an overview of the math here, but TL;DR:

Manifold uses both AMM (automated market maker) & limit orders.
The price/probability of a market is determined by the balance of shares held in the AMM [1].
In practice, that means the current price is essentially set by "wherever the last trade left the AMM"—it actually doesn't matter at all which limit orders were executed previously.

Example:

A low liquidity market starts at 50%.
Trader A bets it up to 90%.
Trader B places a massive YES limit order at 90%, it's filled by trader C.
Trader D bets it down back to 50%.
Trader E places a massive YES limit order at 50%, and it's filled by trader F.

Note: the limit orders between B/C & E/F have zero impact on the price of the market. If they never placed their trades, the market would be the exact same (there would just be fewer shares on both sides).
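To make that example concrete, here is a toy model. It is not Manifold's actual AMM: it assumes a plain constant-product pool where the probability is `no_pool / (yes_pool + no_pool)`, and it treats a filled limit order as a matched pair that mints shares for both traders without touching the pool. The class `ToyMarket`, the bet sizes, and the starting liquidity are all made up for illustration.

```python
class ToyMarket:
    """Constant-product pool plus directly matched limit orders (toy model only)."""

    def __init__(self, liquidity=100.0):
        self.yes_pool = liquidity
        self.no_pool = liquidity
        self.holdings = {}  # (trader, side) -> shares held

    @property
    def prob(self):
        return self.no_pool / (self.yes_pool + self.no_pool)

    def _credit(self, trader, side, shares):
        key = (trader, side)
        self.holdings[key] = self.holdings.get(key, 0.0) + shares

    def amm_bet(self, trader, side, amount):
        """Bet `amount` into the pool while keeping yes_pool * no_pool constant."""
        k = self.yes_pool * self.no_pool
        if side == "YES":
            new_no = self.no_pool + amount
            new_yes = k / new_no
            shares = self.yes_pool + amount - new_yes
        else:
            new_yes = self.yes_pool + amount
            new_no = k / new_yes
            shares = self.no_pool + amount - new_no
        self.yes_pool, self.no_pool = new_yes, new_no
        self._credit(trader, side, shares)

    def limit_fill(self, yes_trader, no_trader, shares):
        """A matched limit order: both sides receive shares, the pool is untouched."""
        self._credit(yes_trader, "YES", shares)
        self._credit(no_trader, "NO", shares)


def run_example(with_limit_orders):
    market = ToyMarket()
    market.amm_bet("A", "YES", 200)          # 50% -> 90%
    if with_limit_orders:
        market.limit_fill("B", "C", 5000)    # massive order filled at 90%
    market.amm_bet("D", "NO", 200 / 3)       # 90% -> back to 50%
    if with_limit_orders:
        market.limit_fill("E", "F", 5000)    # massive order filled at 50%
    return market


print(run_example(True).prob, run_example(False).prob)
```

Both runs end at the same probability; the only difference is that the run with limit orders has far more shares outstanding on each side, almost all held by B, C, E and F.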

In this example, who bet against who? Most of the YES shares are held by B & E, most of the NO shares are held by C & F. But did Trader C actually "bet against" Trader E? Trader C was only willing to bet NO at 90%, Trader E was only willing to bet YES at 50%, it's not clear the two sides even disagree?

But that's only the easy case: "how do you handle limit orders filled at different prices?" There's a simple answer to that: you could only consider the counterparty of that specific bet (B vs C, E vs F). The more challenging case is how to make sense of AMM trades (i.e. the vast, vast majority of trades on the site), especially for long-running markets that shift over time.

If Trader A bets YES against Trader B at 50% for the outcome of a sports game, and the game happens, and Trader C bets it up to 99%, who did Trader C "bet against"? Did they bet against Trader B? It's a strange construction—A & B had a natural, straightforward bet against each other with equal information, and C swooped in after the game was over and bet it up to 99%, they were basically betting on an entirely different market.

The typical Manifold market looks somewhere in between these examples. Most trades are into the AMM (no direct counterparty), and yet they push the price back and forth around some stable-ish market price. Those traders are effectively betting against each other... but not quite directly. And then over time, the market prices aren't actually so stable, they're drifting. So when it shifts up or down (often the sign of what skilled trading looks like), it's unclear how to define who they were betting against.

This might sound like I'm being pedantic, but I'm really not—the premise of this suggestion is that profit should account for the skill of the counterparty, but the vast majority of trades on Manifold do not have a single, direct counterparty. They are placed into the AMM. You cannot weight for the skill of the counterparty without defining who they are.

[1] & technically the underlying parameter p, but that doesn't really matter here.

It's Me

@Ziddletwix I see, I see. How about we instead look at the final profits/losses at the end of the market? Let's say I profit M100 in a market, two R1 users profit M100 each, and one R1 user loses M800. Those other R1 users moved M1000 in total, of which M800 was losses, so 800/1000 = 80% is weighted towards my profit and I get an R1-profit of M80. If I had instead made a loss of M100 in that same market, a 20% weight is applied (the M200 of R1 profits out of M1000), so I get an R1-loss of M20. This way a larger weight is applied to markets in which you go against the grain.
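A minimal sketch of that end-of-market weighting, assuming we already have the final profit of every other R1 user in the market. The function name `weighted_market_profit` and the choice to return 0 when no other R1 user traded are assumptions for illustration.

```python
def weighted_market_profit(my_profit, other_r1_profits):
    """Weight my final profit/loss in one market by how the other R1 users fared."""
    gains = sum(p for p in other_r1_profits if p > 0)
    losses = sum(-p for p in other_r1_profits if p < 0)
    total = gains + losses
    if total == 0:
        return 0.0  # assumption: no other R1 activity means nothing is counted
    # A profit is weighted by the share of R1 money that lost;
    # a loss is weighted by the share of R1 money that won.
    opposing = losses if my_profit >= 0 else gains
    return my_profit * opposing / total


others = [100, 100, -800]  # two R1 users profit M100 each, one loses M800
print(weighted_market_profit(100, others))   # 80.0
print(weighted_market_profit(-100, others))  # -20.0
```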

Max E

Interesting idea

The All Memeing Eye 👁️

Cool idea :)