I’m writing an undergraduate thesis comparing real-money and play-money prediction markets, using Polymarket and Manifold as my respective data sources. Their relative accuracy is one of a few questions I plan to investigate.
The data: paired price time series of markets with identical resolution criteria. Polymarket’s price is the mid of the best bid and ask, Manifold’s the AMM price. Topics span sports futures, politics, econ, crypto prices, awards, and whatever other pairs I could find. Shooting for a sample size of at least 150.
I’ll probably use the prices one week before resolution, at least for the purpose of resolving this market. I’ll bound Polymarket’s prices between 0.01 and 0.99 for a fair test. I’ll restrict the analysis to a priori plausibly independent markets (which throws out a lot of politics markets). There’s a fairly big range of liquidity/number of traders in the markets.
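For concreteness, here’s a rough sketch of those two choices (mid price, price ~one week out). The column names are just illustrative, not my actual cleaning pipeline.

```python
# Rough sketch only: hypothetical column names (best_bid, best_ask, timestamp,
# price) and a supplied resolution time, not the real data pipeline.
import pandas as pd

def polymarket_price(best_bid: float, best_ask: float) -> float:
    """Polymarket price = midpoint of the best bid and best ask."""
    return (best_bid + best_ask) / 2

def price_one_week_out(prices: pd.DataFrame, resolution_time: pd.Timestamp) -> float:
    """Pick the sampled price closest to one week before resolution."""
    target = resolution_time - pd.Timedelta(days=7)
    closest = (prices["timestamp"] - target).abs().idxmin()
    return prices.loc[closest, "price"]
```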
The test: a permutation test on differences in log scores. Each market’s forecast is scored ln(p) if the event happened and ln(1-p) if it didn’t; higher log score = more accurate. Then I’ll take the sum of differences in log scores across Polymarket-Manifold pairs. This is the test statistic.
If there were no systematic difference in accuracy, the sign of each difference in log scores would be random. This lets us generate a distribution of test statistics under the hypothesis that Polymarket and Manifold are equally accurate: assign a random sign to each empirical log-score difference, compute the test statistic, then repeat (say) 10,000 times. If the true test statistic is greater than 95% of these values, we can reject the hypothesis of equal accuracy at 0.05 significance.
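A minimal sketch of the scoring and permutation step, assuming one Polymarket price, one Manifold price, and a 0/1 outcome per pair (names are placeholders, not the final analysis code):

```python
# Minimal sketch of the test described above; inputs are one entry per paired market.
import numpy as np

rng = np.random.default_rng(0)

def log_score(p, outcome):
    """ln(p) if the event happened, ln(1 - p) if it didn't; higher = more accurate."""
    return np.where(outcome, np.log(p), np.log(1 - p))

def paired_permutation_test(p_poly, p_mani, outcomes, n_perm=10_000):
    """One-sided sign-flip test of 'Polymarket is more accurate than Manifold'."""
    p_poly = np.clip(np.asarray(p_poly, dtype=float), 0.01, 0.99)  # bound Polymarket prices as above
    p_mani = np.asarray(p_mani, dtype=float)
    outcomes = np.asarray(outcomes, dtype=bool)
    diffs = log_score(p_poly, outcomes) - log_score(p_mani, outcomes)
    observed = diffs.sum()                              # the test statistic
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diffs.size))
    null = (signs * diffs).sum(axis=1)                  # statistics under 'equal accuracy'
    return np.mean(null >= observed)                    # one-sided p-value

# e.g. compare paired_permutation_test(poly_prices, manifold_prices, outcomes) against 0.05
```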
This market resolves YES iff this procedure shows Polymarket is more accurate than Manifold at p<0.05.
I anticipate I’ll have run this test sometime in the next 1-3 months, but it could be as soon as next week; it depends on when I get around to it given my other courses etc. I won’t trade in this market.
Update 2025-04-16 (PST) (AI summary of creator comment):
- Exclusion of manipulated markets: any market with a clearly manipulated resolution (e.g. the Ukraine market or the Bitcoin reserve event) will be excluded from the analysis.
- Purpose: this ensures that only markets with genuine, independently determined resolutions are used to assess accuracy.
@Kingfisher plausible! my intuition is that >150 markets would be enough, but the test i’m using is non-parametric, so it does have less statistical power than e.g. a t-test
also worth noting log scores tend to reward/penalise probabilities near 0 or 1 a lot, so i suspect a lot of the result hinges on how well each market prices 90-100% or 0-10% events
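e.g. a quick illustration (my own numbers, just to show how fast the penalty grows near the extremes):

```python
import math
# log score for a confident forecast when the event does NOT happen
for p in (0.90, 0.95, 0.99):
    print(p, round(math.log(1 - p), 2))
# 0.9 -> -2.3, 0.95 -> -3.0, 0.99 -> -4.61
```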
@brod It depends on the type of market. Manifold>Polymarket on most 2024 election markets. On others IDK, that would be interesting.
@HillaryClinton Agreed, excited to see results.
@Brad do you have a plan to handle Polymarket markets with clearly-manipulated resolutions? For example, Polymarket's "Will Trump create Bitcoin reserve in first 100 days" is at 10%, due to coordinated manipulation of the consensus mechanism (see comments), while the Manifold consensus is that this has already resolved YES. (Arguably, the Manifold one is correct.)
- Polymarket: https://polymarket.com/event/will-trump-create-a-national-bitcoin-reserve-in-his-first-100-days
- Manifold: https://manifold.markets/AaronSimansky/what-will-happen-within-donald-trum -> "Trump create a national Bitcoin reserve" sub-question
@brod Are the probability pairs generally pretty close to each other? Should be easier to detect a difference when the forecasts disagree a lot.
@Kingfisher will avoid any markets with manipulated resolutions like the ukraine one a few weeks ago - didn’t know about the bitcoin reserve one!
@travis Still cleaning data but here’s the Manifold price as a function of Polymarket’s price over about 100 markets (prices sampled daily)
[scatterplot: Manifold price vs. Polymarket price]
@brod What are the probabilities with the horizontal “manifold lines” in that chart? Eg looks like maybe 90%, 85%, etc? And what’s up with all the manifold markets near 0% with high polymarket probabilities? Mind sharing an example?
(After you’re done, would love to see the dataset uploaded, but totally understand if you’d rather not until the project is complete!)
@Ziddletwix @travis took a closer look - a few illiquid markets and a few fuck ups in pairing on my part, whoops! corrected version:
[corrected scatterplot: Manifold price vs. Polymarket price]
the remaining lines (see around (0.1, 0.85), (0.6, 0.2), and (0.95, 0.35)) are markets that didn’t get much attention on manifold and stayed mispriced for a while. in particular:
How many SpaceX Starship launches reach space in 2024?
$PNUT listed on Coinbase in 2024?
my main fuck up was accidentally pairing a market on the november 2024 FOMC decision with one on the november 2023 decision - that was the weird set of points at the bottom of the previous chart, my bad!
@brod ah got it, so this plot includes multiple points per market (at different times). For the final test, will it just be a single probability per market (IIUC from the description, ~1 wk before resolution), or will it also use multiple data points?
Cool to see the details!
@Ziddletwix yep that’s right - final analysis will just be the one data point per market (to avoid issues from correlated data points). will also need to get more markets for the final analysis
@brod makes sense!
If the true test statistic is greater than 95% of these values, we can reject the hypothesis of equal accuracy at 0.05 significance.
This market resolves YES iff this procedure shows Polymarket is more accurate than Manifold at p<0.05
so to confirm, this is 95% one-sided? (i.e. just for polymarket more accurate than manifold)
@Kingfisher fwiw i don't think p=0.05 is such a high bar to clear here, since the pairing helps a fair bit (compared to a difference in means).
rough intuition: assume 150 questions, there's some true prob of the event occurring (i used a uniform sequence), & simulate outcomes. assume manifold & poly always diverge by some delta in the log odds (+/- delta/2 compared to that true prob in log odds). but poly is better, so 60% of the time that delta points in the right direction, & 40% of the time it points in the wrong direction.
with delta=0.2 (so if true prob = 0.5, you'd have manifold/poly with like a ~5pp gap), & poly is "right" 60% of the time. that should be detected ~most of the time (60%+) @ 95% confidence. "poly is only right 60% of the time, and the markets never disagree by more than 5pp" isn't a super high bar imo—paired tests are fairly strong (for the narrow thing they claim to test).
(that being said, not sure how relevant that naive sim will be bc i'd expect the results will mostly be dominated by their performance on those occasional cases of extreme divergence. my guess is that poly will fare better on those—fewer markets, more users, higher stakes, etc, so fewer blindspots/forgotten markets—in which case it couldn't be too hard to detect the difference if brad can get to 150+ markets. but i understand taking the NO side given that it covers all cases lacking statistical power in addition to other odd surprises. tbh my prediction would hinge quite a bit on seeing a simple scatterplot like the one above but with one data point per market + the final list of all markets included—a lot of this may come down to data cleaning/filters).
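for reference, roughly the kind of naive sim i mean, under my reading of the setup above (i'm treating "right direction" as poly's deviation pointing toward the eventual outcome 60% of the time); just a sketch, not exactly what i ran:

```python
# Rough power-simulation sketch: 150 questions, true probabilities on a uniform
# grid, both platforms delta/2 away from the true log odds on opposite sides,
# and Polymarket's deviation pointing toward the realised outcome 60% of the time.
import numpy as np

rng = np.random.default_rng(0)

logit = lambda p: np.log(p / (1 - p))
expit = lambda x: 1 / (1 + np.exp(-x))

def log_score(p, outcome):
    return np.where(outcome, np.log(p), np.log(1 - p))

def sign_flip_p_value(diffs, n_perm=10_000):
    """One-sided paired permutation test: observed sum vs. random-sign sums."""
    observed = diffs.sum()
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diffs.size))
    return np.mean((signs * diffs).sum(axis=1) >= observed)

def one_run(n=150, delta=0.2, p_poly_right=0.6):
    true_p = np.linspace(0.05, 0.95, n)
    outcomes = rng.random(n) < true_p
    toward = np.where(outcomes, 1.0, -1.0)            # direction of the realised outcome, in log odds
    poly_right = rng.random(n) < p_poly_right         # poly leans the right way 60% of the time
    shift = np.where(poly_right, toward, -toward) * delta / 2
    p_poly = expit(logit(true_p) + shift)
    p_mani = expit(logit(true_p) - shift)             # manifold always on the opposite side
    diffs = log_score(p_poly, outcomes) - log_score(p_mani, outcomes)
    return sign_flip_p_value(diffs)

power = np.mean([one_run() < 0.05 for _ in range(200)])
print(f"share of simulated runs with p < 0.05: {power:.2f}")
```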
@Ziddletwix I tried a simulation like that. I used a random direction for the error, but a larger average error for manifold than polymarket. It was hitting <0.05 about a third of the time, but after I saw Brad's plot, I increased the error to try to match it (just eyeballing) and it's getting <0.05 about half the time. I tried adding big outliers, but surprisingly it didn't make much difference, I guess because it increases the variance of the test statistic and makes <0.05 harder to achieve.
@Ziddletwix yep, one sided test
(also appreciate your & everyone’s comments here, good to get feedback on design and super cool people have taken an interest)
@travis yup. also, in log score, variation tends to be less punished than correctness (obviously that's a simplification, depends on the exact #s & scale you use, but i think it's the general intuition). e.g. for two events that both happen, if polymarket had [0.5, 0.5], versus manifold's [0.4, 0.6] (i.e. same EV forecast but manifold has more variation), poly has a better log score, as expected. but if instead polymarket is [0.52, 0.52] and manifold is [0.5, 0.5] (i.e. poly is just a little bit more correct), poly's log-score advantage is ~2x bigger than in the first case. my sim assumed poly's forecast EV was more correct than manifold's, not just that it had more variation.
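quick check of those numbers:

```python
import math
# two events that both happen; compare summed log scores
ls = lambda ps: sum(math.log(p) for p in ps)
gap_1 = ls([0.5, 0.5]) - ls([0.4, 0.6])    # poly [0.5, 0.5] vs manifold [0.4, 0.6]   -> ~0.041
gap_2 = ls([0.52, 0.52]) - ls([0.5, 0.5])  # poly [0.52, 0.52] vs manifold [0.5, 0.5] -> ~0.078
print(gap_1, gap_2, gap_2 / gap_1)         # the second gap is roughly 2x the first
```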
@brod
I'm surprised there still seem to be horizontal clusters in both Polymarket and Manifold. I'd expected patterns like that to be mirrored along the axis, which should result in vertical clusters on Manifold and horizontal ones on Polymarket. But then I'm not clear what's causing these clusters in the first place
@AlexanderTheGreater there are multiple data points per market in this plot. So if a market on Manifold is forgotten about and the price doesn't change for weeks, but the Polymarket price is shifting, you’ll get a horizontal line
@AlexanderTheGreater haha yep ziddletwix is right. also the polymarket price is the middle of the bid/ask, so the price can move if people place/remove orders even if no transactions take place, unlike manifold
Really excited to see what happens with this!
Will you be requiring Manifold markets to have a certain number of traders? Manifold says somewhere between 10-20 traders is where calibration stops getting more accurate, and also that they haven't conducted thorough analysis on the effect of liquidity yet: https://manifold.markets/calibration
@MingCat thank you! I didn’t have any hard cutoff for traders in mind, but all the markets I’ve got so far have >10. I’d guess if the market’s on Polymarket too it must be somewhat popular to trade on. And where multiple Manifold markets on one topic exist I’ve chosen whichever has more traders.