Which site will be most accurate at predicting the Electoral College results? (Manifold, Polymarket, Nate Silver, 538)
  • Manifold: 15%

  • Polymarket: 37%

  • Nate Silver: 33%

  • FiveThirtyEight: 15%

The resolution will be based on the expected (average) number of Electoral College votes each site predicts Harris will win. I believe this approach measures predictive accuracy better than a binary approach.

Specifically, predictions made for these 11 close states will be included in the calculation for a total of 150 votes:

  • Pennsylvania (19)

  • Wisconsin (10)

  • Michigan (15)

  • Nevada (6)

  • Arizona (11)

  • Georgia (16)

  • North Carolina (16)

  • New Hampshire (4)

  • Minnesota (10)

  • Virginia (13)

  • Florida (30)

The remaining 388 electoral votes from non-swing states are allocated according to the Cook Political Report's projections, regardless of whether there are upsets in those states.

Which site will come closest to the final electoral result with the lowest absolute difference? (Predictions will be taken on November 4, the day before the election.)

Example calculations (a code sketch of this arithmetic follows the list):

  • Nate Silver gives Harris a 52% probability in Pennsylvania: 19 electoral votes * 52% = 9.9 votes for Harris

  • Polymarket gives Harris a 38% probability in Pennsylvania: 19 electoral votes * 38% = 7.2 votes for Harris

  • Every site gets 54 votes for Harris from California (a non-swing state)
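A minimal sketch of the resolution arithmetic in Python, assuming hypothetical inputs: the state list and vote counts come from the description above, but the probabilities and the non-swing baseline in the code are placeholders, not figures from any site.

```python
# Expected Harris electoral votes under this market's resolution method:
# sum of (state EV x Harris win probability) over the 11 swing states,
# plus a fixed non-swing allocation taken from Cook Political Report.

SWING_STATE_EV = {
    "PA": 19, "WI": 10, "MI": 15, "NV": 6, "AZ": 11, "GA": 16,
    "NC": 16, "NH": 4, "MN": 10, "VA": 13, "FL": 30,
}  # 150 votes total

def expected_ev(prob_harris, non_swing_harris_ev):
    """Expected Harris EV given per-swing-state win probabilities and the
    fixed number of non-swing votes allocated to her (per Cook)."""
    swing = sum(SWING_STATE_EV[s] * p for s, p in prob_harris.items())
    return non_swing_harris_ev + swing

def absolute_error(prob_harris, non_swing_harris_ev, actual_harris_ev):
    """The resolution metric: |actual Harris EV - site's expected Harris EV|."""
    return abs(actual_harris_ev - expected_ev(prob_harris, non_swing_harris_ev))

# Illustrative only: 52% in PA contributes 19 * 0.52 = 9.88 votes, and so on.
example_probs = {state: 0.50 for state in SWING_STATE_EV}
example_probs["PA"] = 0.52
print(expected_ev(example_probs, non_swing_harris_ev=200))  # 200 is a placeholder
```

The site with the smallest absolute error on November 4's snapshot would resolve YES under this reading.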


As of today, here are the predicted EVs for each site based on the resolution methodology:

  • Nate Silver: 276.3 for Harris

  • Polymarket: 276.9 for Harris

  • Manifold: 278.5 for Harris

There isn't much divergence currently, but the spread may widen closer to the election. FiveThirtyEight and the betting markets had a roughly 40-vote divergence in 2020, which probably came down to the polls being too favorable for Biden (a +8 lead) while the markets were skeptical.

Hey, I think there’s a major methodological problem with this market. All of these sites have correlated probabilities, so it doesn’t make sense to evaluate these probabilities independently! You won’t get the average electoral count by simply summing the probabilities of each state; that will give you a completely meaningless value!

I suggest looking at markets like “will Trump get at least X electoral votes” and using the simulations that 538 and Silver Bulletin both run (which show how likely they consider various electoral-vote thresholds) rather than the approach you’re using! Otherwise it basically has no correlation with the true accuracy of any of these models.

To elaborate what I mean, let’s take an extreme example:

538 thinks that PA, MI, and WI each have a 66% chance of going Blue, but those are 100% correlated, i.e. if one goes Blue, all go Blue.

Silver Bulletin thinks PA, MI, and WI each have a 50% chance of going Blue, but treats them as essentially uncorrelated.

Now let’s say 2 of 3 states go Blue (let’s just pretend they all have the same electoral votes).

By your metric, 538 > Silver Bulletin, even though 538 gave basically zero chance of this outcome occurring!
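Putting that example into numbers (pretending, as the comment suggests, that the three states carry equal weight, say 10 EV each; every figure here is the commenter's hypothetical, not a real forecast):

```python
# Outcome: 2 of 3 equally weighted states (10 EV each) go Blue -> 20 EV.
actual = 20

# "538": 66% per state, but perfectly correlated (all-or-nothing),
# so it effectively assigns ~0% probability to a 2-of-3 split.
ev_538 = 3 * 10 * 0.66      # expected EV = 19.8

# "Silver Bulletin": 50% per state, roughly independent,
# so a 2-of-3 split is one of its most likely outcomes.
ev_silver = 3 * 10 * 0.50   # expected EV = 15.0

print(abs(actual - ev_538))    # ~0.2 -> "wins" under this market's metric
print(abs(actual - ev_silver)) # 5.0  -> "loses", despite treating the outcome as plausible
```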

let’s take a DIFFERENT example now:

Let’s say Manifold thinks there’s a 90% chance PA goes Blue and a 10% chance MI and NH go Blue.

Let’s say Polymarket thinks there’s a 50% chance of each going Blue.

If PA goes RED and the other two go Blue, both sites end up with essentially the same expected electoral count, even though Polymarket was MUCH more accurate than Manifold in predicting the outcomes of each state! That’s bad!
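The same arithmetic with the actual vote counts from the description (PA 19, MI 15, NH 4) and the comment's hypothetical probabilities:

```python
# Outcome: PA goes Red, MI and NH go Blue -> Harris gets 15 + 4 = 19 EV here.
actual = 15 + 4

# "Manifold": 90% PA, 10% MI, 10% NH (hypothetical)
ev_manifold = 0.9 * 19 + 0.1 * 15 + 0.1 * 4     # = 19.0

# "Polymarket": 50% on each (hypothetical)
ev_polymarket = 0.5 * (19 + 15 + 4)             # = 19.0

# Identical expected EV, hence identical (essentially zero) error under this metric,
# even though Polymarket's per-state probabilities were far closer to reality.
print(abs(actual - ev_manifold), abs(actual - ev_polymarket))   # both ~0 (up to float rounding)
```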

I addressed some of this in other replies and I think the issue is overstated. I did the calculations for Nate's 2020 and 2024 models, and the result came within 3 or 4 votes of his "average EV projection". Also, Nate and 538 assume similar state error correlations: I saw a comparison of the old Economist vs 538 correlation matrices from 2020, and their swing-state correlations are similar, so your example is unrealistic.

Because the markets don't publish an average EV projection as an output, I proposed this as a simple way to compare markets and statistical models on the same footing.

Note that this market is not judging accuracy in a normal way at all.

Assume some model predicts a 100% chance of winning Florida, Georgia, Pennsylvania, and Wisconsin, while predicting a 0% chance of winning any of the others. Now assume that prediction is exactly wrong: Harris loses Florida, Georgia, Pennsylvania, and Wisconsin, while winning every other close state. This market would consider this exactly wrong prediction to have been perfectly accurate at predicting the electoral college results, simply because it coincidentally got the number of electoral college votes correct. A model predicting exactly the opposite of the original model also would be considered perfectly accurate. Meanwhile a different (completely blind) model predicting a 50% chance of winning every state would also be considered to have made a perfectly accurate prediction.
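For what it's worth, the arithmetic in that example does work out exactly: the four "predicted" states and the seven "actual" states happen to carry the same 75 electoral votes, so the absolute error is zero.

```python
# Model: 100% for FL, GA, PA, WI; 0% for the other seven close states.
predicted_ev = 30 + 16 + 19 + 10                  # FL + GA + PA + WI = 75

# Reality (in this hypothetical): Harris instead wins MI, NV, AZ, NC, NH, MN, VA.
actual_ev = 15 + 6 + 11 + 16 + 4 + 10 + 13        # = 75

print(abs(actual_ev - predicted_ev))              # 0 -> "perfect" under this metric
```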

Taking that completely blind model example further: Assume it predicted a 50% chance of winning every state, and Harris won exactly half of the close states (yes, I realize there are 11 states, but just pretend half of a state's electors voted wrong or something). This market would bizarrely consider this to be a horribly inaccurate model (probably, assuming she didn't coincidentally win precisely the middle-sized states). This is significant because this is what these sites are actually trying to predict: individual state probabilities, regardless of how much each state contributes to total electoral results.

I hope it's clear now why this market is measuring something, but it definitely isn't measuring which of these sites is the most "accurate" at predicting things.

I understand the argument for an alternative measure like log-loss; see my response to the reply below for why this market is fine. I think you are overstating your case.

Your reply below would address the issue if all the sites were trying to predict the total electoral votes received by Harris, but none of the individual state predictions (which are what is actually being used here) are trying to predict that at all. My issue isn't about models being able to strategically alter their predictions to beat the other models in a game-theory sense; my issue is that none of these sites are trying to predict this in the first place. Even if every single individual Manifold market were less accurate than every individual Polymarket market (in the literal sense that the odds were not as close to the real outcome), Manifold might still be considered more accurate by this market's measure.

To concretely see how much of a difference this makes: Nate Silver's model predicts a total of ~257 electoral votes, but if you used your accuracy measurement of individual states, you would 'expect' a result of only ~239 electoral votes. That's a 7% difference even while including all of the states which are basically certain. Presumably the difference would be much greater if only taking into account the close states, whose results are significantly correlated.

(I believe this data is accurate. I'm basing it indirectly on a Sportsbook Review article because I refuse to give Nate money. And maybe Nate actually is biasing his individual state predictions to give him the final result that he wants, but Manifold and Polymarket obviously cannot do this. Consider it more for demonstration purposes.)

The metric of interest is often not directly optimized for in the modeling. Do we compare LLMs based on how accurately they predict the next word, or on downstream tasks like writing, coding, and reasoning? For the purpose of this market, the idea is to see who comes closest to the final electoral vote count.

To your point about using individual states vs the full model output, you're right that Nate and 538 both have explicit EV predictions across their simulations, but Manifold and Polymarket don't have liquid EV markets, so state forecasts are used to infer them for consistency.


Note that this metric rewards mild overconfidence, since it makes it easier to stand out against competitors.

Can you explain why that is the case?

There are more than two options, so the middle-of-the-road options will find themselves sandwiched between two others, while overconfident models get all of the blowout scenarios in addition to their own guess. Technically, log loss would probably be the best way to determine who did best.

@PlainBG So when you have multiple competitors (e.g. in a Kaggle contest with many teams), having the most accurate model will still get you the best score in expectation. But being overconfident trades expected value for variance, and once you have more than two teams, having more variance increases your odds of winning.

(At the extreme, imagine 2^50 teams: they could each guess one possible outcome with 100% certainty, and one would win while the others would all score negative infinity under log loss.)

This tradeoff gets better for the variance side (i.e. you should be more overconfident) the more competing models there are, and worse the more predictions there are (although this gets complicated with correlated predictions). Not sure what the optimal amount of overconfidence for four teams is (I'd guess not high, but otoh there aren't that many uncorrelated predictions involved in predicting states). The actual calculation is pretty gnarly.
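A rough Monte Carlo sketch of the sandwiching effect, using the swing-state vote counts from the description and otherwise made-up numbers: the 50% true per-state probability, the shared "national swing" shock, and the 70%/30% overconfident competitors are all assumptions for illustration, not taken from the real models.

```python
import random

EV = [19, 10, 15, 6, 11, 16, 16, 4, 10, 13, 30]   # the 11 swing states, 150 EV total
TRUE_P = 0.50      # assumed true Harris win probability per state (made up)
CORR = 0.8         # weight of the shared national shock (made up)

def simulate_outcome(rng):
    """One correlated election night; returns Harris's swing-state EV total."""
    national = rng.random()                        # shared shock across all states
    total = 0
    for ev in EV:
        draw = CORR * national + (1 - CORR) * rng.random()
        if draw < TRUE_P:
            total += ev
    return total

# Competitors reduced to the point predictions this market actually compares:
PREDICTIONS = {
    "honest 50%":           sum(EV) * 0.50,   # 75
    "overconfident Harris": sum(EV) * 0.70,   # 105
    "overconfident Trump":  sum(EV) * 0.30,   # 45
}

rng = random.Random(0)
N = 20_000
wins = {name: 0 for name in PREDICTIONS}
mean_error = {name: 0.0 for name in PREDICTIONS}
for _ in range(N):
    actual = simulate_outcome(rng)
    for name, pred in PREDICTIONS.items():
        mean_error[name] += abs(actual - pred) / N
    wins[min(PREDICTIONS, key=lambda n: abs(actual - PREDICTIONS[n]))] += 1

# With heavy correlation the outcome is usually a near-sweep one way or the other,
# so the two extreme predictions collect most of the wins and the honest model is
# sandwiched, even though its average absolute error is the lowest of the three.
print(wins)
print(mean_error)
```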

I understand the argument for using log-loss, since it is commonly used for performance evaluation. This is just sum/mean absolute error, which is fine for the following reasons (a small sketch contrasting the two metrics follows this list):

  • Only swing-state predictions are considered, and overly confident odds from any of these forecasts are rare in those states

  • The outcome of interest is not the individual state predictions but the total electoral vote count. I'm defining the latter as the benchmark, so the state predictions are just intermediate inputs

  • My definition is more intuitive and relevant than log-loss, which is too technical for the question of "who is best at predicting the EV count"
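For anyone curious about the difference, here is a small sketch contrasting the two scoring rules with purely hypothetical predictions and outcomes (none of these numbers come from the actual sites):

```python
import math

# Three of the swing states from the list above, with hypothetical numbers.
STATE_EV  = {"PA": 19, "WI": 10, "MI": 15}
PREDICTED = {"PA": 0.52, "WI": 0.55, "MI": 0.60}   # hypothetical Harris probabilities
OUTCOME   = {"PA": 1, "WI": 0, "MI": 1}            # hypothetical results (1 = Harris win)

# This market's metric: absolute error on the expected electoral vote total.
expected_votes = sum(STATE_EV[s] * PREDICTED[s] for s in STATE_EV)
actual_votes   = sum(STATE_EV[s] * OUTCOME[s] for s in STATE_EV)
ev_abs_error   = abs(actual_votes - expected_votes)

# The commenters' suggestion: log-loss over the individual state calls,
# which scores each state on its own regardless of its electoral weight.
log_loss = -sum(
    OUTCOME[s] * math.log(PREDICTED[s]) + (1 - OUTCOME[s]) * math.log(1 - PREDICTED[s])
    for s in STATE_EV
) / len(STATE_EV)

print(ev_abs_error, log_loss)
```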
