Is Polymarket more accurate than Manifold at p<0.05?

I’m writing an undergraduate thesis comparing real and play money prediction markets at the moment, for which Polymarket and Manifold are my data sources respectively. Their relative accuracy is one of a few questions I plan to investigate.

The data: paired price time series of markets with identical resolution criteria. Polymarket’s price is the mid of the best bid and ask, Manifold’s the AMM price. Topics span sports futures, politics, econ, crypto prices, awards, and whatever other pairs I could find. Shooting for a sample size of at least 150.

I’ll probably use the prices one week before resolution, at least to resolve this market. I’ll bound Polymarket’s prices between 0.01 and 0.99 for a fair test. I’ll restrict the analysis to a-priori plausibly independent markets (which throws out a lot of politics markets). There’s a fairly big range of liquidity/number of traders in the markets.

The test: permutation test on difference in log scores. This means each market’s forecast is given the score ln(p) if it happened, ln(1-p) if it didn’t. Here, higher log score = more accurate. Then I’ll take the sum of differences in log scores across Polymarket-Manifold pairs. This is the test statistic.

If there were no systematic difference in accuracy, then the sign of each difference in log-scores should be random. This lets us generate a distribution of test statistics if Polymarket and Manifold were equally accurate - assign a random sign to the empirical log-score differences, compute the test statistic, then repeat (say) 10,000 times. If the true test statistic is greater than 95% of these values, we can reject the hypothesis of equal accuracy at 0.05 significance.
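
The sign-flip procedure above could be sketched in Python roughly as follows. This is an illustration under assumptions, not the thesis code: the function name, the fixed seed, and the "+1" correction in the p-value (a common convention for Monte Carlo permutation tests) are my own choices.

```python
import random

def sign_flip_pvalue(diffs, n_perm=10_000, seed=0):
    """One-sided sign-flip permutation test on paired log-score differences.

    diffs: per-market values of (Polymarket log score - Manifold log score).
    Returns the estimated share of random sign assignments whose summed
    statistic is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = sum(diffs)
    hits = 0
    for _ in range(n_perm):
        # Under the null of equal accuracy, each difference is equally
        # likely to be positive or negative.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if permuted >= observed:
            hits += 1
    # Assumed convention: count the observed statistic as one permutation.
    return (hits + 1) / (n_perm + 1)

# Hypothetical differences, for illustration only.
example_diffs = [0.12, -0.03, 0.40, 0.05, -0.20, 0.08]
print(sign_flip_pvalue(example_diffs))
```

Under this sketch, the market would resolve YES when the returned value is below 0.05.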

This market resolves YES iff this procedure shows Polymarket is more accurate than Manifold at p<0.05.

I anticipate I’ll have done this test some time in the next 1-3 months. But could be next week, whenever I get around to it given my other courses etc. I won’t trade in this market.

  • Update 2025-04-16 (PST) (AI summary of creator comment): Update from creator

    • Exclusion of Manipulated Markets: Any market with clearly manipulated resolutions (e.g. the Ukraine market or the Bitcoin reserve event) will be excluded from the analysis.

    • Purpose: This update ensures that only markets with genuine, independently determined resolutions are considered in assessing accuracy.


I think you should set thresholds for the number of traders on manifold markets. There might be some markets with, say, <50 or <10 traders on them that might not be very reasonable to use.

bought Ṁ50 NO

p<0.05 is a fairly high bar. Not saying you should be using a different threshold, but I doubt <the difference is strong enough> + <sample size is large enough> to show it at this confidence level.

@Kingfisher plausible! my intuition is that >150 markets would be enough, but the test i’m using is non-parametric, so it does have less statistical power compared to e.g. a t-test

also worth noting log scores tend to reward/penalise probabilities near 0 or 1 a lot, so i suspect a lot of the result hinges on how well each market prices 90-100% or 0-10% events

@brod It depends on the type of market. Manifold>Polymarket on most 2024 election markets. On others IDK, that would be interesting.

@HillaryClinton Agreed, excited to see results.

@Brad do you have a plan to handle Polymarket markets with clearly-manipulated resolutions? For example, Polymarket's "Will Trump create Bitcoin reserve in first 100 days" is at 10%, due to coordinated manipulation of the consensus mechanism (see comments), while the Manifold consensus is that this has already resolved YES. (Arguably, the Manifold one is correct.)
- Polymarket: https://polymarket.com/event/will-trump-create-a-national-bitcoin-reserve-in-his-first-100-days
- Manifold: https://manifold.markets/AaronSimansky/what-will-happen-within-donald-trum -> "Trump create a national Bitcoin reserve" sub-question


@brod Are the probability pairs generally pretty close to each other? Should be easier to detect a difference when the forecasts disagree a lot.

@Kingfisher will avoid any markets with manipulated resolutions like the ukraine one a few weeks ago - didn’t know about the bitcoin reserve one!

@travis Still cleaning data but here’s the Manifold price as a function of Polymarket’s price over about 100 markets (prices sampled daily)

@brod What are the probabilities with the horizontal “manifold lines” in that chart? Eg looks like maybe 90%, 85%, etc? And what’s up with all the manifold markets near 0% with high polymarket probabilities? Mind sharing an example?

(After you’re done, would love to see the dataset uploaded, but totally understand if you’d rather not until the project is complete!)

@Ziddletwix @travis took a closer look - a few illiquid markets and a few fuck ups in pairing on my part, whoops! corrected version:

the remaining lines (see around (0.1, 0.85) and (0.6, 0.2) and (0.95, 0.35)) are markets that didn’t get much attention on manifold and stayed mispriced for a while in particular:

How many SpaceX Starship launches reach space in 2024?

$PNUT listed on Coinbase in 2024?

my main fuck up was accidentally pairing a market on the november 2024 FOMC decision to one on the november 2023 decision - that was the weird set of points at the bottom on the previous chart, my bad!

@brod ah got it, so this plot includes multiple points per market (at different times). For the final test, will it just be a single probability per market (IIUC from the description, ~1 wk before resolution), or will it also be multiple data points?

Cool to see the details!

@Ziddletwix yep that’s right - final analysis will just be the one data point per market (to avoid issues from correlated data points). will also need to get more markets for the final analysis

@brod makes sense!

If the true test statistic is greater than 95% of these values, we can reject the hypothesis of equal accuracy at 0.05 significance.

This market resolves YES iff this procedure shows Polymarket is more accurate than Manifold at p<0.05

so to confirm, this is 95% one-sided? (i.e. just for polymarket more accurate than manifold)

opened a Ṁ250 YES at 25% order

@Kingfisher fwiw i don't think p=0.05 is such a high bar to clear here, since the pairing helps a fair bit (compared to a difference in means).

rough intuition: assume 150 questions, there's some true prob of the event occurring (i went in a uniform sequence), & simulate outcomes. assume manifold & poly always diverge by some delta in the log odds (+/- delta/2 compared to that true prob in log odds). but poly is better, so 60% of the time, that delta points in the right direction, & 40% it points in the wrong direction.

with delta=0.2 (so if true prob = 0.5, you'd have manifold/poly with like a ~5pp gap), & poly is "right" 60% of the time. that should be detected ~most of the time (60%+) @ 95% confidence. "poly is only right 60% of the time, and the markets never disagree by more than 5pp" isn't a super high bar imo—paired tests are fairly strong (for the narrow thing they claim to test).
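
One possible reading of that rough simulation, sketched in Python. Several details are assumptions on my part rather than anything stated in the comment: the true probabilities sit on an evenly spaced grid, "the right direction" means shifted toward the outcome that actually occurred, no price clipping is applied, and all function names and defaults are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate_diffs(rng, n_markets=150, delta=0.2, p_poly_right=0.6):
    """Simulate per-market log-score differences (Polymarket - Manifold)."""
    diffs = []
    for i in range(n_markets):
        true_p = (i + 0.5) / n_markets        # assumed "uniform sequence"
        happened = rng.random() < true_p
        logit = math.log(true_p / (1.0 - true_p))
        toward = 1.0 if happened else -1.0    # log-odds direction of the outcome
        # Polymarket leans toward the realised outcome with prob p_poly_right.
        poly_sign = toward if rng.random() < p_poly_right else -toward
        p_poly = sigmoid(logit + poly_sign * delta / 2)
        p_mani = sigmoid(logit - poly_sign * delta / 2)
        if happened:
            diffs.append(math.log(p_poly) - math.log(p_mani))
        else:
            diffs.append(math.log(1 - p_poly) - math.log(1 - p_mani))
    return diffs

def sign_flip_pvalue(rng, diffs, n_perm=1000):
    """One-sided sign-flip permutation p-value for sum(diffs)."""
    observed = sum(diffs)
    hits = sum(
        sum(d if rng.random() < 0.5 else -d for d in diffs) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)

def estimate_power(n_sims=100, alpha=0.05, seed=0, **sim_kwargs):
    """Share of simulated datasets in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = sum(
        sign_flip_pvalue(rng, simulate_diffs(rng, **sim_kwargs)) < alpha
        for _ in range(n_sims)
    )
    return rejections / n_sims
```

Calling `estimate_power()` with the defaults gives a rough detection rate for the delta=0.2, 60% scenario; varying `delta` or `p_poly_right` shows how sensitive that rate is to those assumptions.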

(that being said, not sure how relevant that naive sim will be bc i'd expect the results will mostly be dominated by their performance on those occasional cases of extreme divergence. my guess is that poly will fare better on those—fewer markets, more users, higher stakes, etc, so fewer blindspots/forgotten markets—in which case it couldn't be too hard to detect the difference if brad can get to 150+ markets. but i understand taking the NO side given that it covers all cases lacking statistical power in addition to other odd surprises. tbh my prediction would hinge quite a bit on seeing a simple scatterplot like the one above but with one data point per market + the final list of all markets included—a lot of this may come down to data cleaning/filters).

@Ziddletwix I tried a simulation like that. I used a random direction for the error, but a larger average error for manifold than polymarket. It was hitting <0.05 about a third of the time, but after I saw Brad's plot, I increased the error to try to match it (just eyeballing) and it's getting <0.05 about half the time. I tried adding big outliers, but surprisingly it didn't make much difference, I guess because it increases the variance of the test statistic and makes <0.05 harder to achieve.

@Ziddletwix yep, one sided test

(also appreciate your & everyone’s comments here, good to get feedback on design and super cool people have taken an interest)

@travis yup. also, in log score, variation tends to be less punished than correctness (obviously that's a simplification, depends on the exact #s & scale you use, but i think it's the general intuition). e.g. for two events that both happen, if polymarket had [0.5, 0.5], versus manifold's [0.4, 0.6] (i.e. same EV forecast but manifold has more variation), poly has a better log score, as expected. but if instead polymarket is [0.52, 0.52] and manifold is [0.5, 0.5] (i.e. poly is just a little bit more correct), poly's log score is ~2x better than in the first case. my sim assumed poly's forecast EV was more correct than manifold's, not just that it had more variation.
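
The arithmetic in that example can be checked with a few lines of Python (the helper name is just for illustration):

```python
import math

def total_log_score(forecasts):
    """Sum of ln(p) over events that all happened."""
    return sum(math.log(p) for p in forecasts)

# Case 1: same average forecast, but Manifold has more variation.
gap_variation = total_log_score([0.5, 0.5]) - total_log_score([0.4, 0.6])

# Case 2: Polymarket is slightly more correct on both events.
gap_correct = total_log_score([0.52, 0.52]) - total_log_score([0.5, 0.5])

ratio = gap_correct / gap_variation
print(round(gap_variation, 4), round(gap_correct, 4), round(ratio, 2))
# prints: 0.0408 0.0784 1.92
```

So a 2pp edge in correctness is worth roughly twice as much log score as removing a ±10pp spread around the same average.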

@brod
I'm surprised there still seem to be horizontal clusters in both polymarket and Manifold. I'd expected patterns like that to be mirrored along the axis, which should result in vertical clusters on Manifold and horizontal ones on polymarket. But then I'm not clear what's causing these clusters in the first place

@AlexanderTheGreater there are multiple data points per market in this plot. So if a market on manifold is forgotten about and the price doesn't change for weeks, but the polymarket price is shifting, you’ll get a horizontal line

@Ziddletwix oh, only manifold markets are forgotten 😔

@AlexanderTheGreater haha yep ziddletwix is right. also the polymarket price is the middle of the bid/ask, so the price can move if people place/remove orders even if no transactions take place, unlike manifold

Really excited to see what happens with this!

Will you be requiring Manifold markets to have a certain amount of traders? Manifold says somewhere between 10-20 traders is where calibration stops getting more accurate, and also that they haven't conducted thorough analysis on the effect of liquidity yet: https://manifold.markets/calibration

@MingCat thank you! I didn’t have any hard cutoff for traders in mind, but all I’ve got so far have been >10. I’d guess if the market’s on polymarket too it must be somewhat popular to trade on. And where multiple Manifold markets on one topic exist I’ve chosen whichever has more traders.
