

Jack
@jack
Understanding and optimizing the world. Top Manifold trader and author, specializing in blindly copying statistical models and correcting dumb mispricings. Software engineer, EA, and forecasting enthusiast. If I win mana from you, thank you for matching my donations.
Posts
Election forecast comparison
This is an unintended duplicate. See here instead:
https://manifold.markets/post/comparing-election-forecast-accurac
Comparing election forecast accuracy
Update: results are posted here: https://firstsigma.github.io/midterm-elections-forecast-comparison
The summary results are:
[image]
See the post for details and analysis.
It's interesting to compare forecasts between different prediction platforms, but it's rare for them to have questions that are identical enough to compare easily. Elections offer one helpful opportunity.
I will score several prediction platforms on a set of 10 questions on the outcome of the 2022 US midterm elections: Senate and House control, and several individual Senate and gubernatorial races with high forecasting interest.
For each prediction platform, I will take the predicted probabilities on Monday evening and compute the average log score on these questions. This is a measure of prediction accuracy: log score is log(predicted probability of the actual outcome), it's always negative, and higher (closer to zero) means better accuracy.
I plan to compare these prediction platforms:
538 (statistical modeling and poll aggregation)
Polymarket (real-money prediction market)
PredictIt (real-money prediction market)
Election Betting Odds (prediction market aggregator)
Manifold (play-money prediction market)
Metaculus (prediction aggregator)
Others can be added - just add a comment with the data on their predictions for each of the questions above.
Important note: the election is much closer to one overall prediction than a set of independent predictions, because the races are highly correlated. The forecast that scores best is probably going to be the forecast that landed closest to the mark on the broader question of how much the nation overall went left or right, or how far left or right the polls were biased - and a large part of this is chance. So despite the large number of individual races, each election cycle can be thought of as roughly one data point, and to truly measure accuracy well, we'd need to run this experiment several times over different election cycles.
Of course, I've also created meta prediction markets on which prediction platform will be the most accurate: https://manifold.markets/group/election-forecast-comparison
Questions compared
I selected this set of 10 questions to compare across prediction platforms:
Senate control
House control
Senate races
Pennsylvania - Mehmet Oz (R) vs John Fetterman (D)
Nevada - Adam Laxalt (R) vs Catherine Cortez Masto (D)
Georgia - Herschel Walker (R) vs Raphael Warnock (D)
Wisconsin - Ron Johnson (R) vs Mandela Barnes (D)
Ohio - J. D. Vance (R) vs Tim Ryan (D)
Arizona - Blake Masters (R) vs Mark Kelly (D)
Governor races
Texas - Greg Abbott (R) vs Beto O'Rourke (D)
Pennsylvania - Doug Mastriano (R) vs Josh Shapiro (D)
These were selected as races that had a high amount of interest across the prediction platforms. The main reason for using a limited set of questions is that not all prediction platforms made forecasts on all races - the limiting factor was which questions were on Metaculus. (I did later find a couple more races on Metaculus, but did not add them to my list because I had already preregistered the question set.) Using a smaller set of questions also makes the data collection easier for me.
They are not all highly competitive races - which is a good thing, since it lets us look at how accurate and well-calibrated predictions are across a range of high- and low-competitiveness races.
Fine print on methodology
In the event that the winner of an election is not one of the current major-party candidates, I will exclude that race from the calculation. This is to normalize slightly different questions between platforms - some ask which candidate will win, others ask which party will win.
For 538, I use the forecasts on this page https://projects.fivethirtyeight.com/2022-election-forecast/, i.e. the Deluxe model. I also score the Classic and Lite models for comparison.
For PredictIt, I compute the inferred Republican win probability as the average of the Republican YES price and 1 minus the Democratic YES price (see the sketch after this list). I do not use the NO prices, because the YES prices are what the platform highlights most prominently.
For Metaculus, I will use the Metaculus Prediction. I will also score the Metaculus Community Prediction for comparison.
For Manifold, there are often multiple questions on the same race, sometimes with slight differences in resolution criteria. I used only the prediction on the market featured on the main midterms map page https://manifold.markets/midterms.
Manifold has a separate instance for the Salem Center/CSPI Tournament, which I will also compare. The market mechanics are the same, but it uses a separate play-money currency and has an overlapping but distinct user base.
This tournament does not have a question on the Texas governor race, so I will substitute main Manifold's prediction there. (For the purposes of comparing main Manifold to Salem Manifold, this is equivalent to excluding the question.)
See prediction questions on which platforms will be most accurate here: https://manifold.markets/group/election-forecast-comparison
Self-resolving
Self-resolving markets are markets that resolve based on the market itself, instead of the author deciding how to resolve them. In general, this isn't a very good idea, as many experiments have demonstrated. The reason prediction markets work is that participants' profits depend on whether they are correct; in a self-resolving market, nothing ties the market resolution to the real-world question that was asked, so there's no reason to expect the market to have anything to do with the question - it becomes a Keynesian beauty contest.
But there are several reasons people are interested in exploring them:
1) Asking questions where determining an answer is expensive. For example, it might take a massive amount of data collection work or it might cost millions of dollars to run a randomized controlled trial.
2) Asking highly subjective questions, such as:
Should Universal Basic Income exist in the US?
At the end of 2023 will manifold users think Twitter has changed for the better?
Did the NSA work to weaken post-quantum cryptography?
Did Hans Niemann cheat against Magnus Carlsen?
Is supersymmetry realized in nature?
These questions are often poll-like, and indeed resolving to the result of a poll is a common way to design them.
3) Many blockchain systems use protocols for forming a consensus resolution that are similar to self-resolution, with similar dangers, and it's interesting to explore how to make them more robust. For example, Polymarket uses an oracle where token holders essentially vote with cryptocurrency to settle disputed resolutions, and there are a couple of examples of misresolutions.
I think it's valuable to find tweaks to make such markets work better, and valuable to have experiments demonstrating how they can go wrong.
If you are an author thinking about creating a market along these lines, I think these methods work much better:
Resolving to the result of a poll. Examples: https://manifold.markets/SneakySly/at-the-end-of-2023-will-manifold-us and https://manifold.markets/jack/will-we-believe-sbf-committed-willf. While they aren't perfect, polls tend to work a lot better at getting a reasonable result and being mostly (but not 100%) robust to market manipulation.
Some random chance of resolving the normal way, otherwise resolve N/A. Imagine a market predicting the result of an expensive experimental trial. This mechanism means you only have to actually run the trial some fraction of the time, and it is incentive-compatible - you can't profit on the market by manipulating it (see the sketch after this list). Example: https://manifold.markets/jack/does-gdpr-require-selfserviceautoma. The downside is that profit incentives for making good predictions are correspondingly lower.
And, if you really want to resolve-to-mkt despite the flaws:
If you are ok with some author subjectivity, you can say "Resolves to MKT but the author will override if it looks like market manipulation". This has empirically worked ok for many low-stakes markets, but you do have to acknowledge that it becomes very subjective. The line between voting your beliefs and manipulating the market can be very blurry.
Or, if you want to avoid that sort of subjective resolution mechanism, you should at least include a) a random chance of resolving the normal way based on external data, and b) protections against last-minute price manipulation such as a randomized close time or quiescence criteria. E.g. https://manifold.markets/Yev/will-biden-be-president-on-october-fb3d01633429. I do not believe this is as robust against manipulation as a simple poll, while being far more complicated, but at least it's better than resolve-to-mkt without those features.
Public randomness sources
This post is about how to generate a random number in a way that everyone can verify is fair, even if they don't necessarily trust each other. One name for this is a "public randomness beacon".
Update: We have a Manifold bot that you can easily use to generate secure, verifiable public randomness! See https://manifold.markets/FairlyRandom. This implements the method I proposed below.
Here are some methods:
Some website that publishes random numbers periodically, for example daily. E.g. https://avkg.com/en/daily-random/ (the first Google result for "daily random number"). Downsides: you only get a number once a day, and numbers from some random website might be manipulated or hacked - definitely not suitable for high-security applications (but probably fine for Manifold markets). (If anyone finds a website like this that does hourly numbers, please let me know!)
Or maybe lottery draws - many of these are published. But lottery numbers might be statistically biased.
Blockchain: Take the first block published after some pre-specified timestamp (e.g. midnight UTC), and use the last N characters of the hash as a random value. See https://manifold.markets/jack/resolves-yes-1-chance-na-99-chance for example. Downside: blocks are published at unpredictable intervals, and there's some small chance of disagreement about block timestamps, or short-lived forks. I'd suggest using e.g. Ethereum rather than Bitcoin, so you don't have to wait as long for the random number to be generated.
I believe the ideal way to do this is with a true public randomness beacon, which generates random numbers periodically (e.g. every minute) and publishes and signs them cryptographically to provide strong verifiability and security properties. Unfortunately, the current implementations I am aware of don't work very well.
drand.love - generates publicly verifiable random numbers every 30 seconds. We have a Manifold bot that you can easily use for this! See https://manifold.markets/FairlyRandom
https://beacon.nist.gov/home - generates publicly verifiable random numbers every 60 seconds. Example URL to view a past random value: https://beacon.nist.gov/beacon/2.0/chain/2/pulse/123244
In general, to avoid using the same random values for multiple different things, you would want to use hash(concat(public randomness beacon value, nonce)) where nonce is published in advance (e.g. the market ID).
Also, Manifold could provide an RNG service as a built-in feature! That would make things a lot easier. The simplest method I can think of is for Manifold to add a bot that comments with a random number upon request: e.g. I post a comment "@RngBot 1-20" and it replies with a random number between 1 and 20. (Even better would be if that bot sourced the random number from a public randomness beacon instead of just using a local RNG, so that you don't have to trust the bot's security. It can get the next value from a public beacon and combine it with the comment ID of the requesting comment as the nonce.)
Some related discussion:
https://manifold.markets/Yev/will-a-nuclear-weapon-be-detonated-535ab868d69a
https://manifold.markets/jack/what-are-the-best-ways-to-operate-a#f7X6mltyeV6QopuBs6MX7
Nuclear Risk Forecasting
How likely is Russia to use nuclear weapons in Ukraine, and how likely is escalation to other countries? How likely are test vs offensive detonations? Tactical vs strategic weapons? Military vs civilian targets? If one type of nuclear strike occurs, how likely is escalation?
A large collection of Manifold prediction markets seeks to forecast these questions to better understand the risks of nuclear conflict. This page is my attempt to organize some of them, to help readers and forecasters find the information they're looking for and see how the questions interrelate.
Note: The prediction market data shown below updates in real-time, but my commentary here is updated manually (most recently updated October 11), so it may not always match the real-time prediction market data.
How accurate should you expect these markets to be?
Expect error bars of a few percentage points, generally - don't expect particularly fine-grained accuracy around 1-5% and definitely don't expect much accuracy below 1%. More details on this at the end.
I've compared several of these Manifold markets against similar forecasts elsewhere (e.g. Metaculus, Samotsvety) to help check that they are in a reasonable ballpark and to inform my own forecasts.
Any nuclear detonation (including test detonations)
Forecasters predict a substantially elevated chance of nuclear weapons being used in the next couple of months. They predict that if a nuclear detonation occurs in the next few months, it will very likely be either a Russian detonation or a North Korean nuclear test.
(https://manifold.markets/embed/jack/will-a-nuclear-weapon-be-detonated-d8af7cf07475)
(https://manifold.markets/embed/jack/will-russia-detonate-a-nuclear-weap-1c2a001e2311)
North Korea has recently conducted a large number of missile tests, and satellite imagery and intelligence indicate that North Korea has completed preparations for a nuclear test at their underground test site. South Korean intelligence agencies are expecting a nuclear test in the next few weeks.
(https://manifold.markets/embed/jack/will-north-korea-conduct-a-nuclear-422f66ae0107)
Test vs offensive detonation
Forecasters predict a much lower chance of offensive use of nuclear weapons compared to test detonations. The forecasts predict both that the first nuclear weapon use would likely be a test, and that the chance of escalation from a test detonation to an offensive detonation by the end of the year is low.
(https://manifold.markets/jack/if-russia-detonates-a-nuclear-weapo-ff6d153f53db)
(https://manifold.markets/jack/if-russia-first-detonates-a-nuclear)
(https://manifold.markets/embed/jack/will-a-nuclear-weapon-be-detonated-6843759174bb)
(https://manifold.markets/embed/jack/will-an-offensive-nontest-nuclear-w)
Additionally, forecasters predict a moderate chance that the use of an offensive nuclear weapon will be somewhat predictable in the week leading up to it, as opposed to being a surprise.
(https://manifold.markets/embed/jack/if-an-offensive-nuclear-detonation)
Mass casualty events
The chance of nuclear conflict causing a large number of deaths is forecasted to be much lower than the chance of any offensive detonation, but still concerningly high. This is largely because forecasters predict that offensive nuclear detonations will most likely target military assets (see section Military vs civilian targets below).
(https://manifold.markets/embed/grid/will-at-least-50k-ukrainians-die-fr/will-at-least-3-million-americans-d/will-at-least-67000-people-in-the-u)
Reaction to a Russian nuclear weapon and potential escalation paths
Forecasters predict that if Russia uses a nuclear weapon in Ukraine, there is a high chance of direct military conflict between NATO and Russia, as well as a high chance of China stepping away from their partnership with Russia.
(https://manifold.markets/embed/jack/if-a-nuclear-weapon-is-detonated-in-456c43a66cb5)
(https://manifold.markets/embed/MarlonK/conditional-on-russia-using-nuclear)
They predict a much lower chance of escalation to use of nuclear weapons by NATO or full-scale nuclear war.
(https://manifold.markets/embed/jack/if-a-nuclear-weapon-is-detonated-in)
Nuclear weapons beyond Russia and Ukraine
Forecasters predict that if an offensive nuclear detonation occurs, it will most likely be a Russian nuclear strike on Ukraine. They predict a small but still concerningly high chance of escalation of nuclear conflict outside Ukraine.
(https://manifold.markets/embed/jack/will-a-nuclear-weapon-be-detonated-932e4533f811)
(https://manifold.markets/embed/jack/will-russia-detonate-a-nuclear-weap)
(https://manifold.markets/embed/jack/will-a-nuclear-weapon-detonate-in-n)
(https://manifold.markets/embed/jack/will-a-nuclear-weapon-detonate-in-n-6edbcd23a9f1)
(https://manifold.markets/embed/FRCassarino/if-a-nuclear-weapon-is-launched-in-2c427c814f98)
(https://manifold.markets/embed/IsaacKing/conditional-on-at-least-one-nuclear)
Tactical vs strategic weapons
Forecasters predict that the first nuclear weapons used are much more likely to be tactical rather than strategic weapons. They also predict a low chance of escalation from tactical to strategic weapons.
(https://manifold.markets/embed/jack/will-the-first-offensive-nuclear-we-e394defb3cf5)
(https://manifold.markets/embed/jack/will-a-tactical-nonstrategic-nuclea)
(https://manifold.markets/embed/jack/will-a-strategic-not-tactical-nucle)
(https://manifold.markets/embed/jack/if-a-nuclear-weapon-is-detonated-of-f890e2adc514)
Military vs civilian targets (counterforce vs countervalue)
Forecasters predict that nuclear conflict is likely to only target military assets (counterforce targeting), rather than targeting civilian populations.
Metaculus's description of countervalue vs counterforce targeting:
Countervalue targeting is "the targeting of an opponent's assets that are of value but not actually a military threat, such as cities and civilian populations". Compared to nuclear strikes against counterforce targets or battlefield targets, countervalue nuclear strikes would typically cause both many more immediate fatalities and much more smoke (increasing the risk of nuclear winter).
(https://manifold.markets/embed/FRCassarino/if-a-russian-nuclear-weapon-strikes)
(https://manifold.markets/embed/jack/will-a-countervalue-nuclear-weapon)
(https://manifold.markets/embed/jack/if-a-nuclear-weapon-is-detonated-of)
Deliberate vs accidental, unauthorized, or inadvertent
Forecasters predict that the next nuclear detonation will most likely be deliberate, but still assign a substantial chance to inadvertent, accidental, or unauthorized detonation.
(https://manifold.markets/embed/jack/will-the-first-offensive-nuclear-we)
Other nuclear risk forecasting
The Metaculus Nuclear Risk Tournament has a large series of forecasts on questions similar to the ones above, and I've used them heavily both in formulating the questions here and in making predictions on them.
A couple of superforecasting teams with strong track records have published detailed reports:
Samotsvety
Swift Centre
Cautionary notes on prediction market accuracy
Some of these markets synthesize the predictions of several forecasters with strong track records, while other markets are new and are based on a small number of unreliable data points. You can check out the comments in the markets to get a better sense of where the prediction is coming from.
Also, the market structure of prediction markets (both in general and Manifold in particular) means they tend to be somewhat inaccurate at forecasting events with probabilities near 0% and 100%. There are several reasons for this: partly, some types of trading (limit orders) only accept whole-number percentages, so the lowest they can go is 1% and they can't distinguish 1.5% from 2%; partly, it takes a large amount of funds to correct small mispricings close to the extremes.
Don't expect particularly fine-grained accuracy in the 1-5% range.
Definitely don't expect much accuracy below 1%. E.g. if the true probability is 1 in a million, I wouldn't be surprised for a market to sit at 0.1% or 1%. Some strategies to get better estimates of these probabilities include chaining multiple conditional probabilities together and using amplified odds markets.
Common definitions and resolution criteria used in my questions
I'm mostly using the same definitions as Metaculus, to lower ambiguity and increase consistency and comparability.
Other authors often use different definitions and criteria than I do. Read resolution criteria carefully, as wording like "in combat" can have very different meanings to different authors.
In general, nuclear detonations may include deliberate, inadvertent, or accidental/unauthorised detonations. Questions that do not explicitly specify otherwise include any of these potential causes of a nuclear detonation.
"Offensively" means detonations that are neither for testing purposes nor for civilian purposes (even if such detonations cause substantial damage).
In questions that do not specify "offensively", a test detonation would count towards question resolution.
See https://www.metaculus.com/questions/2797/no-non-test-nuclear-detonations-by-2024-01-01/
A strategic nuclear weapon is a weapon designed mostly to be targeted at the enemy interior (away from the war front) against military bases, cities, towns, arms industries, and other hardened or larger-area targets, while a tactical (non-strategic) nuclear weapon is a nuclear weapon designed mostly to be used on a battlefield, near friendly forces, or on or near friendly territory. There is no exact definition in terms of weapon yields or ranges. But note that this question is about the type of weapon, not the type of target; it's conceivable that a non-strategic weapon could be used against the sort of target strategic weapons are designed for, or vice versa.
See https://www.metaculus.com/questions/8584/nsnw-as-the-first-nuclear-detonation-by-2024/
Countervalue: A detonation is considered countervalue for the purpose of this question if credible media reporting does not widely consider a military or industrial target as the primary target of the attack (except in the case of strikes on capital cities, which will automatically be considered countervalue for this question even if credible media report that the rationale for the strike was disabling command and control structures). Counterforce is the opposite.
See https://www.metaculus.com/questions/7461/total-countervalue-detonations-by-2050/
Country borders:
For the purposes of this question, a country's territory will include the 12 nautical mile territorial sea.
For the purposes of this question, Ukrainian territory will be defined as internationally recognized prior to 2014 (that is, including Crimea, Donetsk, Luhansk, Kherson, and Zaporizhia).
For the purposes of this question, to qualify as within a country, the nuclear weapon must be detonated less than 100 kilometers above Earth's mean sea level.
Fatalities must be caused by the immediate effects of the detonation, so fatalities caused by things like fallout, rioting, or climate effects will not count towards question resolution.
Detonation unless otherwise specified means nuclear explosion. If a nuclear weapon were launched/dropped/etc but the nuclear weapon did not detonate (due to malfunction, interception, etc), that would not count as detonation. If a conventional explosion occurs but no nuclear explosion, that does not count.
More questions
More markets can be found in the Nuclear Risk group.
Since there are a ton of questions about related attributes/criteria, I made a spreadsheet to help organize some of them.
Market is not endorsement
Creating a market about a bad thing doesn't mean the author endorses the bad thing. Predicting that a bad thing will happen doesn't mean that you want it to happen. This tag is meant to help remind people of this fact when they see markets about controversial topics.