This year, I have gained a lot of optimism about AI safety, mostly due to finding various deploy-time interventions highly promising.
By EOY 2023, will I believe my current (at time of market creation) optimism to have been (at least one of) naive, miscalibrated, or wrecked by new evidence?
This market led to some fascinating conversations that I’m deeply thankful for.
I think I understand the arguments for doom better now than when I created this market. I think I could (>60% probability?) pass an ideological Turing test (ITT) with someone who holds the traditional MIRI worldview and its accompanying high p_doom. I continue to take the MIRI concerns seriously, but they are far from dominating my worldview.
I find myself just as optimistic about humanity’s capacity to retain control of our destiny as I was when I created this market.
I feel especially optimistic about:
- prosaic approaches (oversight techniques, probing, WTSG)
- preparedness work, RSPs, control, governance
- progress in interpretability
My credences in typical MIRI scenarios are substantial, but now mostly below 50%. I now feel more confident in much of my early hesitance about the MIRI arguments, which I had initially suspected was born of ignorance.
I am still working on improving my model of the alignment problem. I continue to be excited about many threads involving threat modeling and predicting FOOM.
I remain excited about having conversations about any and all of these topics, and plan to actively seek them out in 2024.