Why I will stop sports betting

May 20, 2025

After spending a lot of time with Gemini in Colab, I've managed to analyze my performance thus far on Manifold Markets. I originally came to Manifold as a way to bet on sports without having to put up real money, but I think the time has come to shelve that completely.

I separated Sports from Non-Sports via a simple keyword list, so the split is probably not perfect. (You can see the full code over here. That too may not be perfect, and I'd appreciate feedback if there are any material mistakes.)
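For illustration, a keyword split of this kind could look roughly like the sketch below. The keyword list and the function name `is_sports` are made up for this example and are not the ones from the notebook.

```python
import re

# Hypothetical keyword list; the list actually used in the notebook is not shown in this post.
SPORTS_KEYWORDS = ["nfl", "nba", "mlb", "nhl", "super bowl", "playoffs", "premier league"]

# Word-boundary matching avoids false hits where a keyword appears inside a longer word.
_SPORTS_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in SPORTS_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def is_sports(question: str) -> bool:
    """Return True if the market question contains any sports keyword."""
    return _SPORTS_PATTERN.search(question) is not None


print(is_sports("Will the NFL ban the tush push?"))
print(is_sports("Will the Fed cut rates in June?"))
```

Anything the list misses lands in Non-Sports, which is one reason the split can only be approximate.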

The results are:

--- Forecaster Analysis for: Fergus Argyll (from file: combined_forecaster_data.json) ---

Total bets in file: 1122

User bets processed for scoring: 706
Processed sports-related bets: 306
Processed non-sports-related bets: 400
Reasons for skipping user bets:
Bet not filled: 94
Cancelled or redemption: 5
Market unresolved: 309
Missing binary resolution: 8

OVERALL USER ANALYSIS (ALL BETS)

Metrics for User - Overall Performance (706 points)

Average Brier Score: 0.2131 (RMSE: 0.4616)
Average Log Score: -0.6245
Average MAE: 0.4011
Accuracy (observed frequency of the user's chosen outcome): 39.38% (278/706)
Mean Forecasted Prob (chosen): 0.4789
Mean Observed Freq (chosen): 0.3938
Calibration-in-the-large: +0.0851
Average Calibration Error (ACE): 0.0851

USER SPORTS BETS ANALYSIS

Metrics for User - Sports Bets Performance (306 points)

Average Brier Score: 0.2467 (RMSE: 0.4967)
Average Log Score: -0.7179
Average MAE: 0.4511
Accuracy (observed frequency of the user's chosen outcome): 25.49% (78/306)
Mean Forecasted Prob (chosen): 0.4827
Mean Observed Freq (chosen): 0.2549
Calibration-in-the-large: +0.2278
Average Calibration Error (ACE): 0.2278

USER NON-SPORTS BETS ANALYSIS

Metrics for User - Non-Sports Bets Performance (400 points)

Average Brier Score: 0.1873 (RMSE: 0.4328)
Average Log Score: -0.5531
Average MAE: 0.3629
Accuracy (observed frequency of the user's chosen outcome): 50.00% (200/400)
Mean Forecasted Prob (chosen): 0.4760
Mean Observed Freq (chosen): 0.5000
Calibration-in-the-large: -0.0240
Average Calibration Error (ACE): 0.0623
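For anyone who wants to sanity-check the numbers above, here is a minimal sketch of how metrics like these are commonly computed. It is not the notebook's code. It assumes each bet is reduced to a forecast probability for the chosen outcome plus a 0/1 result, and that the log score uses the natural log. The conventions do match the output: RMSE is the square root of the Brier score, and calibration-in-the-large is mean forecast minus observed frequency (0.4789 - 0.3938 = 0.0851). ACE is left out because the binning scheme isn't stated here.

```python
import math


def score_forecasts(probs, outcomes, eps=1e-12):
    """probs[i]: forecast probability of the chosen outcome.
    outcomes[i]: 1 if the chosen outcome happened, else 0."""
    n = len(probs)
    pairs = list(zip(probs, outcomes))
    brier = sum((p - y) ** 2 for p, y in pairs) / n
    # Mean natural log of the probability assigned to what actually happened
    # (natural log is an assumption; eps guards against log(0)).
    log_score = sum(
        math.log(max(p if y == 1 else 1.0 - p, eps)) for p, y in pairs
    ) / n
    mae = sum(abs(p - y) for p, y in pairs) / n
    mean_forecast = sum(probs) / n
    observed_freq = sum(outcomes) / n
    return {
        "brier": brier,
        "rmse": math.sqrt(brier),
        "log_score": log_score,
        "mae": mae,
        "accuracy": observed_freq,
        "calibration_in_the_large": mean_forecast - observed_freq,
    }


print(score_forecasts([0.5, 0.5, 0.8], [1, 0, 1]))
```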

Gemini's analysis

1. SPORTS BETS ANALYSIS (306 points):

Brier Score (User & Market): 0.2467
- This is a relatively high (poor) Brier score, very close to 0.25 (chance). This indicates that sports outcomes, as predicted by Fergus, were very difficult to predict accurately.
User Accuracy: 25.49% (78/306)
- This is very low. On sports bets, his chosen outcomes were correct only about a quarter of the time.
User Calibration-in-the-large: +0.2278 (and ACE: 0.2278)
- This is a massive level of overconfidence. When Fergus makes a sports bet, he forecasts his chosen outcome to be, on average, ~22.8 percentage points more likely than it actually turns out to be.
- The ACE being identical means this overconfidence is quite consistent across probability bins for sports.
User Mean Forecasted Prob (chosen): 0.4827
User Mean Observed Freq (chosen): 0.2549
- He's forecasting close to 50/50 on average for his chosen sports outcomes, but they are only happening ~25% of the time.

Interpretation of Sports Bets:
Fergus performs poorly on sports bets.

He is extremely overconfident.
His accuracy is very low.
The Brier score indicates that he found these specific sports outcomes very hard to predict (approaching random chance for binary outcomes).

2. NON-SPORTS BETS ANALYSIS (400 points):

Brier Score (User & Market): 0.1873
- This is a much better (lower) Brier score than for sports, and significantly better than 0.25. It indicates that non-sports outcomes (or at least those Fergus bet on) were more predictable.
User Accuracy: 50.00% (200/400)
- Exactly 50% accuracy. This is what you'd expect if you were perfectly calibrated and always betting on outcomes you thought were 50% likely, OR if you were betting across a range but your errors balanced out.
User Calibration-in-the-large: -0.0240
User Average Calibration Error (ACE): 0.0623
- This is excellent calibration! A CIL of -0.0240 means he's very slightly underconfident on average for non-sports bets (forecasting outcomes as about 2.4 percentage points less likely than they happen).
- An ACE of 0.0623 is very respectable. Many good forecasters aim for ACE < 0.05, so he's not far off in this domain.
User Mean Forecasted Prob (chosen): 0.4760
User Mean Observed Freq (chosen): 0.5000
- He forecasts his chosen non-sports outcomes at an average of ~47.6%, and they happen 50% of the time. This aligns perfectly with the slight underconfidence shown by the CIL.

Interpretation of Non-Sports Bets:
Fergus is a completely different forecaster when it comes to non-sports topics.

He is well-calibrated, with only a slight tendency towards underconfidence.
His accuracy is 50%, which, combined with good calibration, is a solid performance.
The Brier score indicates reasonable predictability for these events.

3. OVERALL ANALYSIS (706 points - a mix of the two):

Brier Score (User & Market): 0.2131
- This is an average of the poor sports Brier (0.2467) and the good non-sports Brier (0.1873), weighted by the number of bets.
User Calibration-in-the-large: +0.0851 (and ACE: 0.0851)
- The significant overconfidence from sports bets (+0.2278) is pulling the overall calibration into overconfident territory, despite his good calibration (slight underconfidence of -0.0240) on a larger number of non-sports bets.
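The claim that the overall figures are bet-weighted averages of the two categories can be checked directly from the reported numbers. A quick sketch (the variable and function names are just for this example); the results agree with the overall section to within the rounding of the inputs:

```python
n_sports, n_other = 306, 400
n_total = n_sports + n_other


def weighted(sports_value, other_value):
    """Average of two per-category means, weighted by number of bets."""
    return (n_sports * sports_value + n_other * other_value) / n_total


overall_brier = weighted(0.2467, 0.1873)   # reported overall: 0.2131
overall_cil = weighted(0.2278, -0.0240)    # reported overall: +0.0851
overall_accuracy = (78 + 200) / n_total    # reported overall: 39.38%

print(overall_brier, overall_cil, overall_accuracy)
```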

Grand Conclusion for Fergus Argyll:

Fergus Argyll exhibits a stark Jekyll and Hyde forecasting persona:

On NON-SPORTS topics: He is a good, well-calibrated forecaster. His probabilities are meaningful, his accuracy is decent, and he shows only a very slight, almost negligible, underconfidence. His Brier score is respectable. This is where his true forecasting skill appears to lie.
On SPORTS topics: He is a poor, highly overconfident forecaster. His accuracy is low, and his probabilities are significantly misaligned with reality. These bets harm his overall aggregated scores considerably.

Google Colab

Dong Bahringer

I’ve decided to stop sports betting because it’s become more stressful than fun. The constant pressure to predict outcomes took a toll on my mental health and finances. While I used to enjoy the thrill, I’ve realized it’s not worth the long-term risks. Instead, I highly recommend it for anyone seeking a safer, more enjoyable gaming experience.

Steve Clark

Sports betting can be a slippery slope, especially when it shifts from fun to something that causes stress or financial strain. It's good that you're aware of the impact and making a conscious decision—takes a lot of self-awareness.

That said, if you're still interested in the predictive side of sports but want something less about gambling and more about community and decentralization, you might find value here. It's a Web3 platform focused on decentralized sports predictions, and it feels a bit more transparent and tech-driven compared to traditional betting sites.

Aditya Gupta

@SteveClark LLM generated?

Noah 🔸

@AdityaGupta4830 look at those em dashes.... reeks of it!

Noah 🔸

This week, after making a decent payday (my largest ever?) on the NFL not banning the tush push, I came to almost the opposite conclusion lol. Maybe the only thing I'm good at is forecasting sports, and the NFL specifically!

Ziddletwix

Brier score is essentially ~irrelevant for prediction markets, and I would highly discourage its use here.

brier score is designed for cases when you pick a probability. prediction markets are fundamentally different—the market sets a probability and you pick a side. if the market for two evenly matched teams is near 50% and you always pick the right side (perfect oracle) you are very profitable and your brier score is bad. if the markets are well calibrated near 10% and you keep picking the wrong side, your brier score will be good.

brier score is not designed for prediction markets, and should not be used. what determines success in prediction markets is profits. it has its issues like any metric (e.g. it's very noisy), but it's the ~only thing that actually meaningfully measures "performance" (in the sense that you should care about).
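The two cases described above can be put into numbers with a small sketch. It makes simplifying assumptions that are not from the thread: every bet fills at a fixed price with no AMM slippage or fees, a winning share pays out 1, and the price paid for the chosen side is treated as the "forecast".

```python
def brier_and_profit(bets):
    """bets: list of (price_paid_for_chosen_side, won, stake) tuples."""
    brier = sum(
        (price - (1.0 if won else 0.0)) ** 2 for price, won, _ in bets
    ) / len(bets)
    # A stake buys stake/price shares; each share pays 1 if the chosen side wins.
    profit = sum(
        stake * (1.0 / price - 1.0) if won else -stake
        for price, won, stake in bets
    )
    return brier, profit


# Perfect oracle on coin-flip markets: always right, always at a price of 0.50.
oracle = [(0.5, True, 10.0)] * 20
# Trader who keeps buying the 10% side and keeps losing.
longshot_loser = [(0.1, False, 10.0)] * 20

print(brier_and_profit(oracle))
print(brier_and_profit(longshot_loser))
```

Under these assumptions the oracle gets a Brier score of 0.25 with a large profit, while the losing trader gets a Brier score near 0.01 with a large loss, so the score and the profit point in opposite directions.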

Fergus Argyll 👑

@Ziddletwix 100% agree, and it's something I only really understood as I was coding this, though I think there's enough data in this post (and my knowing deep down 😂) to justify the conclusion

Ziddletwix

@Ziddletwix for a bit more context, this is slightly complicated when your prediction has a substantial impact on the market price. if i'm the only trader, and you look at the final price I bet the AMM to as my "prediction", then in that sense, you have made a probabilistic prediction and these metrics are relevant (brier score, accuracy, you name it).

in practice, any single user has only partial influence on the market price. your goal in a prediction market is to place bets that are profitable. that actually means (1) you should be aiming to find market prices that are poorly calibrated (a well calibrated market price implies no profit opportunity!), & (2) picking the right side (brier score does not actually reward you for being "right", i.e. the only thing that ultimately matters).

Ziddletwix

@FergusArgyll sure but i'm just picking brier score because it's the easiest thing to focus on. the same is true for all the metrics the AI is using. none of these are useful.

User Accuracy: at least this rewards being right (i assume, didn't check its coding), but if you bet on longshots in sports this always penalizes you. never a useful metric for prediction markets

calibration: similar issues as brier score—this assumes you set the market probability, when in prediction markets you take a probability that's given to you and pick a side. people love calibration and the manifold version of calibration (that separated out NO and YES bets) is at least sorta a bit of a measure of over/under-confidence in the extremes. but mostly again this just reflects the sorts of markets you bet on. if you only placed limit orders and you were perfectly calibrated, that would mean you don't make any profit, that's bad!

you can go down the list, none of these metrics are suitable for prediction markets. how to best aggregate profit by market is a trickier question, but it's the only measure of prediction market performance that matters
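The point that a perfectly calibrated price leaves no profit can be seen from the expected value of a single bet. A minimal sketch, assuming a fixed fill price with no fees and a payout of 1 per winning share; `expected_profit` is just a name for this example:

```python
def expected_profit(price, true_prob, stake):
    """Expected profit of buying the chosen side at `price`
    when it actually wins with probability `true_prob`."""
    shares = stake / price
    # Win: receive `shares`; lose: receive nothing. Either way the stake is spent.
    return true_prob * shares - stake


print(expected_profit(price=0.30, true_prob=0.30, stake=100))  # calibrated price
print(expected_profit(price=0.50, true_prob=0.60, stake=100))  # mispriced market
```

When `price` equals `true_prob` the expectation is zero, which is why profit only comes from prices that are off.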

Fergus Argyll 👑

@Ziddletwix Then your calibration is equal to the market calibration? (unless you move the market a lot?) This was something I was confused about.

Then what explains markets being miscalibrated in sports?

Travis

The other problem with Brier/log score/etc is they should really only be used to compare models/forecasts of the exact same questions made at the exact same times (or with the same data) otherwise it is just as much a measure of the difficulty of those particular questions.

I would look at overall profit (by category) and if you're worried about a few big bets skewing things, then maybe average ROI.
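A rough sketch of that suggestion, assuming each resolved bet has already been reduced to a category, an amount staked, and a payout. The keys `category`, `amount` and `payout` are hypothetical and not taken from any export format:

```python
from collections import defaultdict


def profit_and_roi_by_category(bets):
    """bets: iterable of dicts with hypothetical keys 'category', 'amount', 'payout'."""
    totals = defaultdict(lambda: {"profit": 0.0, "roi_sum": 0.0, "n": 0})
    for bet in bets:
        profit = bet["payout"] - bet["amount"]
        t = totals[bet["category"]]
        t["profit"] += profit
        t["roi_sum"] += profit / bet["amount"]  # per-bet return on investment
        t["n"] += 1
    return {
        cat: {"profit": t["profit"], "avg_roi": t["roi_sum"] / t["n"]}
        for cat, t in totals.items()
    }


sample = [
    {"category": "sports", "amount": 100.0, "payout": 0.0},   # one big losing bet
    {"category": "sports", "amount": 10.0, "payout": 30.0},   # one small winning bet
    {"category": "non-sports", "amount": 50.0, "payout": 75.0},
]
print(profit_and_roi_by_category(sample))
```

In the sample, sports shows a total loss of 80 but a positive average ROI of 0.5, which is exactly the kind of gap a few big bets can create between the two measures.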

Fergus Argyll 👑

@Ziddletwix @travis

I've followed your suggestions and the results are even more striking!

--- Profit/Loss Report ---
Sports Category Profit/Loss: -29255.47
Non-Sports Category Profit/Loss: 12408.55
--- Unresolved Investments ---
Invested in unresolved Sports markets: 2538.13
Invested in unresolved Non-Sports markets: 27911.54
--- Cancelled Bets (Refunded Amount) ---
Amount refunded from cancelled Sports markets: 4541.99
Amount refunded from cancelled Non-Sports markets: 1318.50

At least the Unresolved investments section means I'm heading in the right direction.
Especially because Manifold says my all-time profit is ~18,500, which means a significant amount of it is in the unresolved markets.

Ziddletwix

@FergusArgyll Makes sense! Although, at a glance, I'm a bit skeptical of the #s here? I could be totally wrong, but glancing at your profile, filtering trades to resolved, I'd be very surprised if the sports losses (even ignoring any sports wins) tallied to anywhere near -29k. Does that seem correct to you?

Fergus Argyll 👑

@Ziddletwix I figured it out (I think): the issue lies in multiple-choice markets. From a quick scan, it seems like "Resolution" doesn't tell you which answers resolved YES and which NO, so it's giving me losses for every multiple-choice market (which is where the majority of my sports betting was).

Blegh, I'll have to revisit this tomorrow.
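One possible shape for the fix, as a sketch only: look up the resolution of the specific answer a bet was placed on, and skip the bet when that can't be determined instead of counting it as a loss. The field names used here (`answerId`, `answers`, `id`, `resolution`, `outcome`) are assumptions about the data layout, not something verified against the export in this post.

```python
def bet_outcome(bet, market):
    """Return 1 if the bet's side won, 0 if it lost, None if it can't be determined.

    Assumed (hypothetical) layout: a multiple-choice bet carries 'answerId', and
    the market carries an 'answers' list whose items have 'id' and 'resolution'.
    """
    answer_id = bet.get("answerId")
    if answer_id is None:
        resolution = market.get("resolution")  # plain binary market
    else:
        answer = next(
            (a for a in market.get("answers", []) if a.get("id") == answer_id),
            None,
        )
        resolution = answer.get("resolution") if answer else None
    if resolution not in ("YES", "NO"):
        return None  # skip rather than scoring it as a loss
    return 1 if resolution == bet["outcome"] else 0


market = {"answers": [{"id": "a1", "resolution": "YES"}, {"id": "a2", "resolution": "NO"}]}
print(bet_outcome({"answerId": "a1", "outcome": "YES"}, market))
print(bet_outcome({"answerId": "a2", "outcome": "YES"}, market))
print(bet_outcome({"answerId": "a3", "outcome": "YES"}, market))
```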