At the moment I don't know enough about the approaches of different alignment research organizations to have a strong opinion. My current impression of MIRI is generally favorable, but that's based only on seeing small snippets of the work they've put out and statements made by their researchers. I haven't looked into the matter in depth. I'd appreciate links to LW/AF/EA forum posts that argue in one direction or another.
Redwood has had much more research output recently than MIRI (across publications, technical/empirical forum posts, and privately circulated results)—and it’s been generally well-received and aimed at reducing catastrophic risks. LTFF also seems like a good way to hedge across many smaller bets/projects.
@Hedgehog ARC (incl. ARC-Evals) is taking donations and seems to be doing some of the best technical work aimed at enabling serious governance efforts now/soon.
@GarrettBaker I’d recommend trying to give your money to the LTFF to fund independent alignment researchers. That’s (imo) where most of the groundbreaking theory work is being done at the marginal dime.
@GarrettBaker ooo another good option: SERI MATS. They’ve probably produced the equivalent of 5 Evan Hubingers in 2 years by my count, while also accelerating existing research known to be very good, like Wentworth’s natural abstractions, Evan’s conditioning predictive models, or Turner’s shard theory.
@IsaacKing I don’t know. Possible they can’t legally, or they just don’t want to because they’re not regranters & don’t want to specialize in that direction. If SERI MATS or LTFF switched to working only on “The Sharp Left Turn” in a way MIRI judged productive, MIRI would probably give them money, but MIRI has been really cagey about what that would mean. MIRI does have a stream in SERI MATS, but that stream is very secretive because they’re worried about capability externalities.
@GarrettBaker If you want to support MIRI-like research, I suggest donating to MATS with your money earmarked for Vivek & Nate’s stream (if he’s around next iteration). At least that does something on the margin.
@GarrettBaker Does SERI MATS actually need money? Are they actively looking for money? Or do they believe they've saturated the space of work they're in?
@IsaacKing I’m pretty sure they do? I haven’t asked, but I’m 70% confident they got funded less than they were hoping for this current funding round.
An interesting thread I need to go through later when I have the time.
https://twitter.com/robbensinger/status/1632870507438800896
FWIW, personally, I decided to defer to the expertise of the Long-Term Future Fund https://manifold.markets/charity/long-term-future-fund. I am a big fan of leveraging deep expertise and research on where to best deploy funds that I could not come anywhere near with my limited free time and knowledge. (This does depend on them being reasonably aligned with me on values and beliefs, which I think is likely true.)
That said, I am still very interested to read discussions like these because I think they are valuable for improving our evaluations and knowledge.
@jack I don't value future lives the same as current lives, so I don't really care about the long-term future in the same way as many EAs do. I care about AI risk insofar as it poses a danger to currently-living people.
Taking a quick look at the LTFF's payout reports, it looks like they're concerned about similar things as I am, so that's likely not a very relevant issue.
Do you know of anywhere where the LTFF has written about why they don't donate to MIRI, or where MIRI has written about why they don't donate to the LTFF or the sorts of people/orgs the LTFF tends to donate to?
@IsaacKing LTFF has made grants to MIRI previously (see https://funds.effectivealtruism.org/funds/far-future#payout-reports). I don't know what their current evaluation of MIRI is.
MIRI doesn't do regranting, as far as I know.
what is your state of reading on the various posts shared here? what do you believe right now, given your existing knowledge and those posts? I'd love for this to be a multi-round debate rather than a single scattershot exchange; if this resolves yes, I really would like to get my money's worth of understanding about why, because if MIRI's view is more interesting than it appears, it might significantly change my plans. I'm familiar with a lot of their old work, so merely informing me what they believe and why won't change my views; you'd have to find a mistake in my interpretation, and I'd be interested to hear whether we've found any mistakes in your interpretation.
@L I haven't yet had a chance to read through much of this. Once I do, I plan to respond and try to engage in the discussion that I think you're looking for.
My current very uninformed belief is that a lot of people are more optimistic than I am that AI alignment will need only relatively minor tweaks to training methods, and this causes them to think MIRI's approach is lackluster. If I had to pick an AI alignment charity to donate to right now without the ability to do any more research, I would pick MIRI. But I also trust the intelligence of everyone who seems to think MIRI is not an effective organization, so I think it's likely that I'll be convinced some other charity is better after I do more research. (But I don't know which one, which is why I can't just update to it now.)
@IsaacKing it's that miri's approach is lackluster at its own metrics. for example, it is my view that deepmind's work on agent foundations has been better than MIRI's: https://arxiv.org/abs/2208.08345 - https://causalincentives.com/
@L also, it is necessarily true that we only need minor tweaks in training; it could not possibly be otherwise. no matter how correct MIRI is, the solution will fundamentally look similar to what we do now, because what we do is very close to the manifold of correct learning algorithms. the "minor tweaks" could easily be deep, sweeping, fundamental changes to how we design loss functions, update rules, hardware, testing systems, correctness assertions, acceptable reasoning steps in a verification procedure, etc., and they would still mathematically qualify as "minor tweaks" in key ways.