Would it be a good use of time to review 'Deceptive alignment is <1% likely by default'?


Resolved Nov 1 as 42%.


The Open Philanthropy Worldview Contest awarded six prizes. Now I need to decide: would it be a good use of time to review and respond to some or all of those winners? Hence, six markets, one per post. I will use the trading to help determine whether, and how in depth, to examine, review, and respond to the six posts.

If I read the post/article for a substantial amount of time, and in hindsight I judge it to have been a good use of time to have done so, whether or not I then respond at length, this resolves to YES.

If I read the post/article for a substantial amount of time, and in hindsight I judge it to have NOT been a good use of time to have done so, whether or not I then respond at length, this resolves to NO.

If I read the post long enough to give it a shot and then recoil in horror and wish I could unread what I had read, that also resolves this to NO.

If I choose NOT to read the post for a substantial amount of time, this resolves to my judgment of the fair market price at the time of resolution: by default the market price, but I reserve the right to choose a different price if I believe there has been manipulation, or to resolve N/A if the manipulation situation is impossible to sort out.

If I do trade on this market, that represents a commitment to attempt the review if I have not yet done so, and to resolve to either YES or NO.

Authors of the papers, and others, are encouraged to comment with their considerations for why I might or might not want to review the posts, or otherwise to make bids for me to do so (in $$$, mana, or other forms).

These markets are an experimental template. Please do comment with suggestions for improvements to the template.

The post can be found here: https://www.openphilanthropy.org/wp-content/uploads/Deceptive-Alignment-is-_1-Likely-by-Default-David-Wheaton.pdf





Given how swamped I am, I'm not going to get to this for a while, so I feel OK resolving this to the percentage.

From the description by Jackson here I can see it going either way, so I'm leaving this one open for a bit.

Bunch of interesting technical arguments that I don't think are quite right. Plausibly representative of some of the intuitions people have for why simple big models won't be dangerous, and maybe worth responding to for that reason, but still very in the weeds and repeating a lot of old debates. Unsure how important reviewing it is.