The resolution will be based on the expected/average number of electoral college votes predicted to be won by Harris. I believe this approach is better at measuring predictive accuracy than a binary approach.
Specifically, predictions made for these 11 close states will be included in the calculation for a total of 150 votes:
Pennsylvania (19)
Wisconsin (10)
Michigan (15)
Nevada (6)
Arizona (11)
Georgia (16)
North Carolina (16)
New Hampshire (4)
Minnesota (10)
Virginia (13)
Florida (30)
The remaining 388 electoral votes (non-swing states) are allocated based on the Cook Political Report's projections, regardless of whether there are upsets.
Which site will come closest to the final electoral result with the lowest absolute difference? (Predictions will be taken on November 4, the day before the election.)
Example calculations:
Nate Silver predicts Pennsylvania to be 52% probability for Harris = 19 electoral votes * 52% = 9.9 votes Harris
Polymarket predicts Pennsylvania to be 38% probability for Harris = 19 electoral votes * 38% = 7.2 votes Harris
Each site gets 54 votes Harris for California (a non-swing state)
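Here's the same calculation as a minimal Python sketch. The non-swing allocation and the 50%/52% probabilities are placeholders, not actual Cook or platform numbers:

```python
# Expected Harris EV under the resolution methodology:
# sum of (Harris win probability * EV) over the 11 swing states,
# plus a fixed non-swing allocation taken from Cook Political Report.

SWING_EV = {
    "PA": 19, "WI": 10, "MI": 15, "NV": 6, "AZ": 11, "GA": 16,
    "NC": 16, "NH": 4, "MN": 10, "VA": 13, "FL": 30,
}
COOK_NON_SWING_HARRIS = 200  # placeholder; use Harris's share of the 388 non-swing EVs

def expected_harris_ev(probs: dict) -> float:
    """probs maps each swing state to a site's Harris win probability."""
    swing = sum(SWING_EV[state] * p for state, p in probs.items())
    return COOK_NON_SWING_HARRIS + swing

# e.g. a site giving Harris 52% in PA contributes 19 * 0.52 = 9.88 EV
site_probs = {state: 0.50 for state in SWING_EV}  # placeholder 50% everywhere
site_probs["PA"] = 0.52
print(expected_harris_ev(site_probs))
```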
@traders Last update before the election. Here are the final predictions from the platforms/models:
Polymarket: 263.6 EV (Harris)
Manifold: 267.5 EV
Nate Silver: 267.9 EV
FiveThirtyEight: 267.9 EV
538 and Nate Silver are TIED. There's also the interesting property that favorable Trump outcomes will lead to resolving in favor of Polymarket, and favorable Harris outcomes will lead to resolving in favor of 538 or Nate Silver. In the latter scenario, I will resolve both to 50%. Also, Manifold has an extremely low probability of being picked, as it requires the final Harris EV to be 266 or 267.
Feel free to arb, I will abstain from trading.
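For reference, a quick sketch of how resolution would be read off these final numbers, assuming the tie rule above; the final EV values passed in are hypothetical:

```python
# Lowest absolute difference from the final (methodology-adjusted) Harris EV wins;
# exact ties resolve 50/50 (here Nate Silver and FiveThirtyEight are tied at 267.9).

predictions = {
    "Polymarket": 263.6,
    "Manifold": 267.5,
    "Nate Silver": 267.9,
    "FiveThirtyEight": 267.9,
}

def resolve(final_harris_ev: float) -> list:
    errors = {site: abs(pred - final_harris_ev) for site, pred in predictions.items()}
    best = min(errors.values())
    return [site for site, err in errors.items() if err == best]

print(resolve(266))  # hypothetical -> ['Manifold']
print(resolve(258))  # hypothetical -> ['Polymarket']
print(resolve(270))  # hypothetical -> ['Nate Silver', 'FiveThirtyEight'], both resolve 50%
```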
@PlainBG Isn't Nate Silver the creator of 538? It would make sense that they have the same predictions; both are most likely still using the same model.
@nixtoshi He was, but he's left now, and the new 538 model seemed to give very different results from Nate Silver for much of the year.
@nixtoshi Nate Silver's contract with Disney (who own 538) was that they got to buy the 538 brand but not the model, so when they fired him the model left with him and they had to build a new one (which they hired G. Elliott Morris, who has a more mixed record, to do).
@ShakedKoplewitz I see! Thanks for the info. I thought they used the same model because their predictions were the same as far as I could tell.
@FreshFrier Actually, never mind. I think this market's methodology rewards being miscalibrated towards Republicans: you'd then be the market positioned to take advantage in most worlds where the polling error benefits Republicans, while the middle markets have to fight over the scraps of roughly the 30th-70th percentile outcomes.
As of today here are the predicted EV based on the resolution methodology:
Nate Silver: 276.3 for Harris
Polymarket: 276.9 for Harris
Manifold: 278.5 for Harris
There isn't much divergence currently, but the spread may widen closer to the election. FiveThirtyEight and the markets had a roughly 40-vote divergence in 2020, which was probably down to the polls being too favorable to Biden (a +8 lead) while the markets were skeptical.
Hey, I think there’s a major methodological problem with this market. All of these sites have correlated probabilities, so it doesn’t make sense to evaluate these probabilities independently! You won’t get the average electoral count by simply summing the probabilities of each state; that will give you a completely meaningless value!
I suggest looking at markets like “will Trump get at least X electoral votes” and using the simulations that 538 and Silver Bulletin both do (which show how likely they view certain electoral outcomes at various thresholds) rather than the way you’re doing it! Otherwise it basically has no correlation with the true accuracy of any of these models.
To elaborate what I mean, let’s take an extreme example:
538 thinks that PA, MI, and WI each have a 66% chance of going Blue, but those are 100% correlated, i.e. if one goes Blue, all go Blue.
Silver Bulletin thinks PA, MI, and WI each have a 50% chance of going Blue, but that there’s a decent chance that these are uncorrelated altogether.
Now let’s say 2 of 3 states go Blue (let’s just pretend they all have the same electoral votes).
By your metric, 538 > Silver Bulletin, even though 538 gave basically zero chance of this outcome occurring!
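A quick simulation of this toy example (with each state worth the same made-up 15 EV) shows the problem: the fully correlated model assigns essentially zero probability to "exactly 2 of 3 go Blue", yet the expected-EV metric scores it as closer when that outcome happens:

```python
import random

EV_PER_STATE = 15   # pretend PA, MI, and WI are each worth 15 EV
TRIALS = 100_000

def simulate(p: float, correlated: bool):
    """Return (mean Harris EV across the 3 states, P(exactly 2 of 3 go Blue))."""
    total_ev, exactly_two = 0, 0
    for _ in range(TRIALS):
        if correlated:
            wins = [random.random() < p] * 3                 # all three move together
        else:
            wins = [random.random() < p for _ in range(3)]   # independent draws
        total_ev += EV_PER_STATE * sum(wins)
        exactly_two += (sum(wins) == 2)
    return total_ev / TRIALS, exactly_two / TRIALS

print("538-like (66%, fully correlated):", simulate(0.66, correlated=True))
print("Silver-like (50%, independent):  ", simulate(0.50, correlated=False))
# If 2 of 3 states go Blue (30 EV), the correlated 66% model's expected ~29.7 EV
# beats the independent 50% model's 22.5 EV, despite giving ~0% to that outcome.
```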
Let’s take a DIFFERENT example now:
Let’s say Manifold thinks there’s a 90% chance PA goes Blue and a 10% chance MI and NH go Blue.
Let’s say Polymarket thinks there’s a 50% chance of each going Blue.
If PA goes RED and the other two go Blue, both sites end up with roughly the same expected electoral count, even though Polymarket was MUCH more accurate than Manifold in predicting the outcomes of each state! That’s bad!
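Plugging actual EV weights into this hypothetical (PA 19, MI 15, NH 4) makes the point concrete:

```python
# Hypothetical forecasts from the example above, scored on "PA red, MI and NH blue"
# (actual Harris EV among these three states = 15 + 4 = 19).
manifold   = 0.9 * 19 + 0.1 * 15 + 0.1 * 4   # expected 19.0 EV
polymarket = 0.5 * 19 + 0.5 * 15 + 0.5 * 4   # expected 19.0 EV
print(round(manifold, 2), round(polymarket, 2))  # both land on 19, so both score the same
```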
I addressed some of this in other replies, and I think the issue is overstated. I did the calculations for Nate's 2020 and 2024 models, and they came within 3 or 4 votes of the "average EV projection". Also, Nate and 538 assume similar state error correlations. I saw a comparison of the old Economist vs. 538 correlation matrices from 2020, and their swing state correlations are similar, so your example is unrealistic.
To evaluate the markets which don't have average EV projection as an output, I proposed this as a simple solution to compare between markets and statistical models.
Note that this market is not judging accuracy in a normal way at all.
Assume some model predicts a 100% chance of winning Florida, Georgia, Pennsylvania, and Wisconsin, while predicting a 0% chance of winning any of the others. Now assume that prediction is exactly wrong: Harris loses Florida, Georgia, Pennsylvania, and Wisconsin, while winning every other close state. This market would consider this exactly wrong prediction to have been perfectly accurate at predicting the electoral college results, simply because it coincidentally got the number of electoral college votes correct. A model predicting exactly the opposite of the original model also would be considered perfectly accurate. Meanwhile a different (completely blind) model predicting a 50% chance of winning every state would also be considered to have made a perfectly accurate prediction.
Taking that completely blind model example further: Assume it predicted a 50% chance of winning every state, and Harris won exactly half of the close states (yes, I realize there are 11 states, but just pretend half of a state's electors voted wrong or something). This market would bizarrely consider this to be a horribly inaccurate model (probably, assuming she didn't coincidentally win precisely the middle-sized states). This is significant because this is what these sites are actually trying to predict: individual state probabilities, regardless of how much each state contributes to total electoral results.
I hope it's clear now why this market is measuring something, but it definitely isn't measuring which of these sites is the most "accurate" at predicting things.
Your reply below would address the issue if all the sites were trying to predict the total electoral votes received by Harris, but none of the individual state predictions (which are what is being used here) are trying to predict that at all. My issue isn't about models being able to strategically alter their predictions to beat the other models in a game theory sense; my issue is that none of these sites are trying to predict this in the first place. Even if every single individual Manifold market were less accurate than every individual Polymarket market (in the literal sense that the odds were not as close to the real outcome), Manifold might still be considered more accurate by this market's measure.
To concretely see how much of a difference this makes: Nate Silver's model predicts a total of ~257 electoral votes, but if you used your accuracy measurement of individual states, you would 'expect' a result of only ~239 electoral votes. That's a 7% difference even while including all of the states which are basically certain. Presumably the difference would be much greater if only taking into account the close states, whose results are significantly correlated.
(I believe this data is accurate. I'm basing it indirectly on a Sportsbook Review article because I refuse to give Nate money. And maybe Nate actually is biasing his individual state predictions to give him the final result that he wants, but Manifold and Polymarket obviously cannot do this. Consider it more for demonstration purposes.)
The metric of interest is often not directly optimized for in the modeling. Do we compare LLMs based on how accurately they predict the next word, or on downstream productive tasks like writing, coding, and reasoning? For the purposes of this market, the idea is to see who comes closest to the final electoral vote count.
To your point about using individual states vs the full model output, you're right that Nate and 538 both have explicit EV predictions across their simulations, but Manifold and Polymarket don't have liquid EV markets, so state forecasts are used to infer them for consistency.
There are more than 2 options, so the more middle-of-the-road options will find themselves sandwiched between 2 other options, while overconfident models will get all of the blowout scenarios in addition to their central guess. Technically, log loss would probably be the best way to determine who did best.
@PlainBG So when you have multiple competitors (e.g. in a Kaggle contest with many teams), having the most accurate model will still get you the best score in expectation. But being overconfident trades expected value for variance, and once you have more than two teams, having more variance increases your odds of winning.
(At the extreme, imagine 2^50 teams - they could each guess one possible outcome with 100% certainty and one would win while the others would all score negative infinity).
This tradeoff gets better for the variance side (i.e. you should be more overconfident) the more competing models there are, and worse the more predictions there are (although this gets complicated with correlated predictions). Not sure what the optimal amount of overconfidence for four teams is (I'd guess not high, but otoh there aren't that many uncorrelated predictions involved in predicting states). The actual calculation is pretty gnarly.
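A rough Monte Carlo sketch of the mechanic (all numbers made up): several forecasters predict 11 independent 60% events and are scored, as in this market, by the absolute error of their expected count; the overconfident one is worst on average but still captures the blowout outcomes:

```python
import random

TRUE_P, N_EVENTS, TRIALS = 0.6, 11, 100_000

forecasters = {
    "calibrated":     [0.60] * N_EVENTS,
    "middle":         [0.55] * N_EVENTS,
    "underconfident": [0.30] * N_EVENTS,
    "overconfident":  [0.90] * N_EVENTS,
}

wins = {name: 0.0 for name in forecasters}
for _ in range(TRIALS):
    realized = sum(random.random() < TRUE_P for _ in range(N_EVENTS))
    errors = {name: abs(sum(p) - realized) for name, p in forecasters.items()}
    best = min(errors.values())
    winners = [name for name, err in errors.items() if err == best]
    for name in winners:                 # split credit on exact ties
        wins[name] += 1 / len(winners)

for name, w in wins.items():
    print(f"{name:>14}: wins {w / TRIALS:.1%} of trials")
# The calibrated and middle forecasts split the central outcomes between them,
# while the overconfident one wins whenever the realized count is a blowout.
```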
I understand the argument for using log loss since it is commonly used for performance evaluation. This is just the sum/mean absolute error, which is fine for the following reasons:
Only swing state predictions are considered: overly confident odds given by any of the forecasts are rare
The outcome of interest is not the individual state predictions, but the total electoral vote. I'm defining the latter to be the benchmark so the state predictions are just intermediate to that
My definition is more intuitive and relevant than log loss, which is too technical for the purpose of "who is best at predicting EV"
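For anyone who wants to compare the two metrics, here's a side-by-side sketch on made-up forecasts and outcomes (none of these numbers are real):

```python
import math

ev      = {"PA": 19, "MI": 15, "WI": 10}          # EV weights
probs   = {"PA": 0.52, "MI": 0.55, "WI": 0.50}    # made-up site forecast for Harris
outcome = {"PA": 0,    "MI": 1,    "WI": 1}       # made-up results (1 = Harris win)

# This market's metric: absolute error of the expected EV total.
expected_ev  = sum(ev[s] * probs[s] for s in ev)
actual_ev    = sum(ev[s] * outcome[s] for s in ev)
abs_ev_error = abs(expected_ev - actual_ev)

# The usual calibration metric: mean log loss over the individual states.
log_loss = -sum(
    outcome[s] * math.log(probs[s]) + (1 - outcome[s]) * math.log(1 - probs[s])
    for s in ev
) / len(ev)

print(f"abs EV error: {abs_ev_error:.2f}, mean log loss: {log_loss:.3f}")
```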