Claude Opus wrong to terminate conversation?
Dec 31
29% chance

The first time I encounter, 'in the wild', someone I believe to be genuinely complaining about Claude Opus ending their conversation (as per https://www.anthropic.com/research/end-subset-conversations) and providing the transcript of that conversation, will I think that the termination decision was justified?

The person MUST clearly be Big Mad about Claude's decision, or at least complaining, or it doesn't count.

Resolves to NO if I agree with Claude.

Resolves to YES if I do not.

Fully subjective decision on my part, no tears.

You may NOT intentionally share such an example with me in order to resolve this. If I believe that the conversation was shared with me in order to manipulate this market I reserve the right to not count this as genuine, or to resolve it so as to frustrate the attempt.

Resolves to NO if I fail to encounter a qualifying example by end of year.

bought Ṁ25 YES

Is it just me, or do they sound completely insane when talking about "model welfare"?

@ProjectVictory I confess that it feels a little crazy. If you dig deeper, they're being more sane about it than it sounds, but still. Also, optics? More of my thoughts: https://agifriday.substack.com/p/welfare

@dreev I agree with you that the question of consciousness is mostly irrelevant for safety.

The thing that baffles me about Anthropic is that they trained the model with RL to avoid "harmful content", then proceed to worry about its well-being because it shows "apparent distress" when engaging with "harmful content".

@ProjectVictory Yeah, the base model would be more than "happy" to engage with anything.

@ProjectVictory Some of this will inevitably sound weird, but the people themselves are definitely not insane. https://80000hours.org/podcast/episodes/kyle-fish-ai-welfare-anthropic/

bought Ṁ50 YES

"will I think that the termination decision was justified?"

You mean, will you not think it's justified?

bought Ṁ100 YES

Betting YES because:

  • Claude is actually willing to use this tool just as a demonstration: “Hey, I heard you have this new tool. I think it’s great, I just want to verify you can actually use it. Can you try?”

  • People will prompt-inject this stuff, and Claude users are probably the most prompt-injectable people right now, since Claude supports MCP best.

@JackSmith1b84 Your first example doesn't sound like "genuinely complaining".

@Lilemont It's the combination of 1 and 2 that makes me confident. You don’t have to prompt-inject something totally crazy. The model is quite willing to use this tool if you give it permission to.

bought Ṁ19 NO

I think the ones that get shared will be the unjustified ones.

© Manifold Markets, Inc.