The resolution will be based on **expected/average number** of electoral college votes predicted to be won by Harris. I believe this approach is better at measuring predictive accuracy than a binary approach.

Specifically, predictions for these 11 close states, totaling 150 electoral votes, will be included in the calculation:

- Pennsylvania (19)
- Wisconsin (10)
- Michigan (15)
- Nevada (6)
- Arizona (11)
- Georgia (16)
- North Carolina (16)
- New Hampshire (4)
- Minnesota (10)
- Virginia (13)
- Florida (30)

The remaining 388 electoral votes, from non-swing states, are allocated based on the Cook Political Report's projections, regardless of whether there are upsets.

**Which site will come closest to the final electoral result with the lowest absolute difference?** (Predictions will be taken on November 4, the day before the election.)

Example calculations:

- Nate Silver predicts a 52% probability for Harris in Pennsylvania: 19 electoral votes * 52% = 9.9 votes for Harris.
- Polymarket predicts a 38% probability for Harris in Pennsylvania: 19 electoral votes * 38% = 7.2 votes for Harris.
- Each site gets 54 votes for Harris in California (a non-swing state).
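The expected-vote arithmetic above can be sketched in a few lines of Python (a minimal illustration; the 52% figure is the example probability from the text, not a live forecast):

```python
# Electoral votes for the 11 close states listed above (150 total).
CLOSE_STATES = {
    "Pennsylvania": 19, "Wisconsin": 10, "Michigan": 15, "Nevada": 6,
    "Arizona": 11, "Georgia": 16, "North Carolina": 16,
    "New Hampshire": 4, "Minnesota": 10, "Virginia": 13, "Florida": 30,
}

def expected_close_state_ev(harris_probs):
    """Expected Harris electoral votes across the close states:
    each state's EV weighted by her win probability there."""
    return sum(CLOSE_STATES[state] * p for state, p in harris_probs.items())

# The Pennsylvania example from the text: 19 EV at 52% probability.
print(round(expected_close_state_ev({"Pennsylvania": 0.52}), 1))  # 9.9
```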


Note that this market is *not* judging accuracy in a normal way at all.

Assume some model predicts a 100% chance of Harris winning Florida, Georgia, Pennsylvania, and Wisconsin, and a 0% chance of her winning any of the others. Now assume that prediction is *exactly* wrong: Harris loses Florida, Georgia, Pennsylvania, and Wisconsin, while winning every other close state. This market would consider that *exactly wrong* prediction to have been perfectly accurate at predicting the electoral college results, simply because it coincidentally got the *number* of electoral college votes correct (Florida, Georgia, Pennsylvania, and Wisconsin total 75 votes, exactly half of the 150 at stake). A model predicting exactly the opposite of the original model would *also* be considered perfectly accurate. Meanwhile, a different (completely blind) model predicting a 50% chance of winning every state would *also* be considered to have made a perfectly accurate prediction.
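To make the pathology concrete, here is a quick numeric check (the three models are the hypothetical ones described above, not real forecasts):

```python
# Close-state electoral votes from the list in the question.
EV = {"Pennsylvania": 19, "Wisconsin": 10, "Michigan": 15, "Nevada": 6,
      "Arizona": 11, "Georgia": 16, "North Carolina": 16,
      "New Hampshire": 4, "Minnesota": 10, "Virginia": 13, "Florida": 30}

def expected_ev(probs):
    return sum(EV[s] * probs[s] for s in EV)

confident = {"Florida", "Georgia", "Pennsylvania", "Wisconsin"}
exactly_wrong = {s: 1.0 if s in confident else 0.0 for s in EV}
exact_opposite = {s: 1.0 - p for s, p in exactly_wrong.items()}
blind = {s: 0.5 for s in EV}

# All three models produce an expected total of 75.0 votes,
# so this market cannot tell them apart.
print(expected_ev(exactly_wrong), expected_ev(exact_opposite), expected_ev(blind))
```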

Taking that completely blind model example further: Assume it predicted a 50% chance of winning every state, and Harris won *exactly* half of the close states (yes, I realize there are 11 states, but just pretend half of a state's electors voted wrong or something). This market would bizarrely consider this to be a horribly *inaccurate* model (probably, assuming she didn't coincidentally win precisely the middle-sized states). This is significant because this is what these sites are actually trying to predict: individual state probabilities, regardless of how much each state contributes to total electoral results.

I hope it's clear now why this market is measuring *something*, but it definitely isn't measuring which of these sites is the most "accurate" at predicting things.

Your reply below addresses the issue as if all the sites were trying to predict the total electoral votes received by Harris, but none of the individual state predictions (which are what is actually being used here) are trying to predict that at all. My issue isn't about models being able to strategically alter their predictions to beat the other models in a game-theory sense; my issue is that none of these sites are *trying* to predict this in the first place. Even if every single individual Manifold market were less accurate than every individual Polymarket market (in the literal sense that the odds were not as close to the real outcome), Manifold might *still* be considered more accurate by this market's measure.

To concretely see how much of a difference this makes: Nate Silver's model predicts a total of ~257 electoral votes, but if you used your accuracy measurement of individual states, you would 'expect' a result of only ~239 electoral votes. That's a 7% difference even while including all of the states which are basically certain. Presumably the difference would be much greater if only taking into account the close states, whose results are significantly correlated.

(I believe this data is accurate. I'm basing it indirectly on a Sportsbook Review article because I refuse to give Nate money. And maybe Nate actually *is* biasing his individual state predictions to give him the final result that he wants, but Manifold and Polymarket obviously cannot do this. Consider it more for demonstration purposes.)
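A toy simulation can show how correlated state results pull a model's headline number away from the sum of per-state expected votes (the 45% per-state probability and the shared-swing mechanism here are illustrative assumptions, not anyone's actual model). The mean of the simulated totals always equals the sum of the marginals, but under strong correlation the distribution becomes lopsided, so a headline figure that reports a median or modal outcome can land far from this market's expected-vote measure:

```python
import random
import statistics

EV = [19, 10, 15, 6, 11, 16, 16, 4, 10, 13, 30]  # the 11 close states
P_WIN = 0.45  # illustrative per-state Harris probability (an assumption)

def simulate_totals(rho, n=50_000, seed=0):
    """Simulate Harris's close-state EV total. With probability rho all
    states swing together (correlated); otherwise each state is an
    independent draw. The per-state marginal stays P_WIN either way."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n):
        if rng.random() < rho:
            totals.append(sum(EV) if rng.random() < P_WIN else 0)
        else:
            totals.append(sum(v for v in EV if rng.random() < P_WIN))
    return totals

for rho in (0.0, 0.9):
    t = simulate_totals(rho)
    # Mean stays near 0.45 * 150 = 67.5 in both cases; the median drops
    # sharply once outcomes are correlated.
    print(f"rho={rho}: mean={statistics.mean(t):.1f}, "
          f"median={statistics.median(t):.1f}")
```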

The metric of interest is often not directly optimized for in the modeling. Do we compare LLMs based on how accurately they predict the next word, or on downstream, productive tasks like writing, coding, and reasoning? For the purpose of this market, the idea is to see who comes closest to the final electoral vote total.

To your point about using individual states vs the full model output, you're right that Nate and 538 both have explicit EV predictions across their simulations, but Manifold and Polymarket don't have liquid EV markets, so state forecasts are used to infer them for consistency.

There are more than two options, so the more middle-of-the-road entries will find themselves sandwiched between two other options, while overconfident models pick up all of the blowout scenarios in addition to their own guess. Technically, log loss would probably be the best way to determine who did best.

@PlainBG So when you have multiple competitors (e.g. in a Kaggle contest with many teams), having the most accurate model still gets you the best score in expectation. But being overconfident trades expected value for variance, and once you have more than two teams, having more variance increases your odds of winning.

(At the extreme, imagine 2^50 teams: they could each guess one possible outcome with 100% certainty, and one would win while the others would all score negative infinity.)

This tradeoff tilts toward the variance side (i.e. you should be more overconfident) the more competing models there are, and away from it the more predictions there are (although this gets complicated with correlated predictions). I'm not sure what the optimal amount of overconfidence is for four teams (I'd guess not high, but on the other hand there aren't that many uncorrelated predictions involved in predicting states). The actual calculation is pretty gnarly.
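The sandwiching effect is easy to see in a simulation (a sketch with fair-coin states and an assumed 80% chance of a uniform national swing; nothing here is a real forecast). Three entrants each guess a total, and whoever is closest to the realized total wins; the calibrated entrant guessing the mean loses every blowout to the overconfident guesses:

```python
import random

EV = [19, 10, 15, 6, 11, 16, 16, 4, 10, 13, 30]  # the 11 close states

def draw_total(rng, rho):
    """One simulated election: fair-coin states, but with probability
    rho every state swings the same way (a blowout either direction)."""
    if rng.random() < rho:
        return sum(EV) if rng.random() < 0.5 else 0
    return sum(v for v in EV if rng.random() < 0.5)

# One calibrated entrant (predicts the mean) vs. two overconfident ones.
guesses = {"calibrated": 75.0, "sweep": 150.0, "shutout": 0.0}
rng = random.Random(1)
wins = {name: 0 for name in guesses}
for _ in range(20_000):
    total = draw_total(rng, rho=0.8)
    winner = min(guesses, key=lambda name: abs(guesses[name] - total))
    wins[winner] += 1
print(wins)  # "calibrated" wins well under half the contests
```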

I understand the argument for using log loss, since it is commonly used for performance evaluation. This is just sum/mean absolute error, which is fine for the following reasons:

- Only swing-state predictions are considered, so overly confident odds from any of the forecasts are rare.
- The outcome of interest is not the individual state predictions but the total electoral vote. I'm defining the latter to be the benchmark, so the state predictions are just intermediate to it.
- My definition is more intuitive and relevant than log loss, which is too technical for the purpose of "who is best at predicting EV".
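For reference, the two scoring rules under discussion can be written side by side (a sketch with hypothetical numbers; `probs` are a site's per-state Harris win probabilities and `outcomes` the realized results). Absolute error on the expected total barely registers a single confident miss, while log loss punishes it heavily, which is the tradeoff this thread is debating:

```python
import math

def abs_error_on_total(probs, ev, actual_total):
    """The market's rule: |expected Harris EV - actual Harris EV|."""
    expected = sum(p * v for p, v in zip(probs, ev))
    return abs(expected - actual_total)

def mean_log_loss(probs, outcomes):
    """Per-state log loss: lower is better; a confident miss
    (e.g. 99% on a state that is lost) blows the score up."""
    return -sum(math.log(p if won else 1.0 - p)
                for p, won in zip(probs, outcomes)) / len(probs)

# Two-state toy example: 99% on a state that is lost, 50% on one that is won.
ev = [19, 10]
probs = [0.99, 0.5]
outcomes = [False, True]
print(abs_error_on_total(probs, ev, actual_total=10))
print(mean_log_loss(probs, outcomes))
```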