Duplicate of this old market, with some alterations.
My current estimate of P(Doom) is 1%. I think that ASI will be invented in the near future, but that alignment-by-default (in Yud's terms) is very probable, with actual misalignment taking the form of catastrophic risk from empowered threat-actors or by leading to sub-optimal value lock-in. I have read HPMOR and enjoyed it and am fairly familiar with Yudkowsky's arguments for doom. But I also found Will MacAskill's review compelling.
Any increase counts. For the "Increases___" questions, I'm using my current average estimates so that I can be easily swayed one way or another.
I don't expect outside events or arguments to affect my P(doom) before I finish reading the book, but if they do, I'll attempt to disentangle the effects from the book.
If commentators want to defend/criticize the arguments as presented in the book, I will consider this relevant.
And while I will not bet myself, bettors are welcome to ask for updates as I complete chapters. I will try to finish it by 6/16/2026, but I'll resolve earlier if I can.
Given the size and accessibility of the book, it isn't too surprising that there weren't many new arguments or much new evidence. Nonetheless, I had hoped for at least some attention to stronger criticisms. For instance, why exactly do they rule out merely catastrophic harms and claim zero corrigibility rather than imperfect corrigibility, leaping straight to IABIED? I am convinced that such an ASI could exist, but little attention is given to why it is the most plausible outcome. Other than a few brief comments in the online notes, they don't grapple with this. I am fully on board with the fragility of value thesis and with the immense difficulty of perfectly specifying human values and motivating an AI accordingly. But they leap to assuming ASI would aim to paperclip the entire lightcone. In general, while they admirably drive home the refrain about non-humanness and the alienness of purpose, they still have an awkwardly human presumption of how minds work. That's why we get the long "extinction scenario" story. What is the ASI doing in it? Plotting, like Dr. Evil in his secret volcano lair. A simple extrapolation of current AIs doesn't point to something converging on a human-like mind, or even a survival-driven, animal-like mind. Relatedly, the book is weak on justifying latent "goals". The point that drive and agency are of economic value does some argumentative work. But not much. Tons of things are of potential economic value!
On the positive side, I think some of the parables were good, if not new to me. "Humanity's Special Power", explaining the power of intelligence, and the "Correct Nest Aliens" were strong. "We'd Lose" was also strong, as was the Stockfish analogy. And they convincingly picked apart many commonly espoused weak arguments. The outline of successful forecasting in the intro, on how they can claim to predict anything, was alright - I just wish they had applied it with more care and less hubris. Taking the outside view isn't just something for other people to do.
There is very little on timelines, and the treaty section is relatively sparse in this regard too. They also peculiarly oversell doom when justifying the urgency of complete global compliance, saying "if [it] stays legal in South Africa, someone will do it in South Africa" and so on. In principle I understand the reasoning, but it is weird that they don't bother to clearly condition this on the particular timeline/TAI scenarios that make South Africa or North Korea even remotely plausible candidates for developing ASI - compute constraints are not trivial! And they seem to disregard the possibility that a treaty among just the US, UK and China would be sufficient. Perhaps that is because they put so little confidence in alignment being solved even then.