This year, I have gained a lot of optimism for AI safety, mostly due to finding various deploy-time interventions highly promising.
By EOY 2023, will I believe my current (at time of market creation) optimism to have been (at least one of) naive, miscalibrated, or wrecked by new evidence?
This market led to some fascinating conversations that I’m deeply thankful for.
I think I understand the arguments for doom better now than when I created this market. I think I could (>60% probability?) pass an ITT with someone who holds the traditional MIRI worldview and accompanying high p_doom. I continue to take the MIRI concerns seriously, but they are far from dominating my worldview.
I find myself just as optimistic about humanity’s capacity to retain control of our destiny as I was when I created this market.
I feel especially optimistic about:
- prosaic approaches (oversight techniques, probing, WTSG)
- preparedness work, RSPs, control, governance
- progress in interpretability
My credences in typical MIRI scenarios are substantial, but now mostly below 50%. I initially suspected that a lot of my early hesitance wrt the MIRI arguments was born of ignorance; I now feel more confident in that hesitance.
I am still working on improving my model of the alignment problem. I continue to be excited about many threads involving threat modeling and predicting FOOM.
I remain excited about having conversations about any and all of these topics, and plan to actively seek them out in 2024.