Are AISafetyMemes' tweets factually correct?
Mar 15
21% chance

I'll randomly sample an original post from AISafetyMemes on Twitter that was released in 2024 and is still up, and go into detail about each of the claims made.

If the tweet makes no verifiable claims of interest, then I'll resample.
Then I'll resolve this market to "yes" if I find no substantial falsehood.
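
The sampling rule above can be sketched in a few lines. This is only an illustration of the described procedure, not the tooling actually used: `sample_tweet`, `resolve`, and the `has_verifiable_claims` predicate are hypothetical names, and the judgement about which claims are verifiable or false is still made by hand.

```python
import random

def sample_tweet(tweets, has_verifiable_claims, rng=None):
    """Pick one tweet uniformly at random, resampling (without replacement)
    until one contains a verifiable claim of interest."""
    rng = rng or random.Random()
    pool = list(tweets)
    rng.shuffle(pool)  # a random order is equivalent to repeated resampling
    for tweet in pool:
        if has_verifiable_claims(tweet):
            return tweet
    return None  # no tweet qualified

def resolve(substantial_falsehoods_found):
    """Resolve YES only if no substantial falsehood was found."""
    return "YES" if substantial_falsehoods_found == 0 else "NO"
```

Because only one tweet is drawn, the result is a single yes/no sample rather than an estimate of the account's overall accuracy.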

This might become vibes-based, but I will default to the literal interpretation of the words, and I won't be eager to mark predictions as false if AISafetyMemes flagged them as predictions or speculation.


The point of this market is not only to evaluate AISafetyMemes, but also to evaluate our community (or the part that visits Manifold) on whether it unfairly states that AISafetyMemes is "right" or "wrong".

Feel free to duplicate my market with yourself, or some poll, as the judge, so that conversations about the account can be better informed.

Reminder that I will only sample a single tweet, and the resolution says nothing about the unsampled tweets. Most of the interest here comes from the number the traders come up with before resolution.

  • Update 2025-01-15 (PST) (AI summary of creator comment): Only original tweets will be sampled; replies are excluded except when adding context to the original tweet.

    • Sample size considerations: While a larger sample size (e.g., 5 or 10 tweets) is acknowledged as beneficial for engagement, the current market will not change the resolution criteria mid-market. Future markets may adopt a larger sample size.

  • Update 2025-02-27 (PST) (AI summary of creator comment): Update from creator

    • Literal Evaluation: Claims will be assessed very literally. For example, if AISafetyMemes says "guy said x" when in fact the tweet merely implied x, that will count as a false claim.

    • Verification Challenges: If verifying individual claims becomes too burdensome, the market may be resolved as N/A.

    • Resolution Postponement: Resolution is postponed to allow a thorough review of the individual claims.


sorry im postponing to actually resolve this

i want to give the individual claims a good look and i noticed it was not trivial to verify.

i will be very literal about everything, so aism saying "guy said x" when guy merely implied x will count as a false claim

if stuff becomes too annoying to verify then ill resolve N/A and non-seriously complain about a tweet not citing sources (not-seriously because we do not expect multiple sources in the same tweet)

@Jono3h sorry its taking so long, especially considering how i originally planned to have this resolve within a week.

I'll make a good effort to resolve this by mid march

Too late to trade apparently, but I’m really curious how anyone could see this tweet as false. How would you put in concrete terms what Ben said if not the way AISM said it?

@HollyElmore Ben said there’s a 30% chance that Claude could be finetuned in a way that would qualify it for ASL-3 (and by implication, increase the risk of catastrophe), AISM said Ben said there’s a 30% chance that Claude could be finetuned in a way that would actually cause a catastrophe (not just increase the risk of catastrophe, and not just qualify it for ASL-3).

The speaker in the video is intentionally speaking in a circuitous way.

The speaker in the video said that there is a 30% chance the current Claude model could be fine-tuned to qualify as ASL-3. ASL-3 is defined as a heightened probability of catastrophic risk. Catastrophic risk is defined as causing thousands of deaths or billions in damage.

AISafetyMemes is simply connecting these dots that Anthropic defined. The Anthropic cofounder is intentionally not connecting his own dots.

I would argue this is not a “substantial falsehood” which was the criteria of the bet. None of the logical links were substantially false.

@AlvaroCuba AISafetyMemes claimed that according to an Anthropic cofounder, there’s a 30% chance that a finetuned Claude would cause catastrophic misuse. That is not implied. Even under the most generous interpretation, you can only infer that there’s a 30% chance that a finetuned Claude would heighten the probability of catastrophic misuse (which could mean from 1% to 10%, but has no direct relationship to the 30% number).

EDIT: the original version of this comment wrongly used the phrase “catastrophic risk”, which I have now changed to “catastrophic misuse” to align with how Anthropic defines ASL-3, as quoted in the next comment.

@CharlesFoster Yes, I think you're right that there's a slight inaccuracy there.

ASL-3 refers to systems that substantially increase the risk of catastrophic misuse compared to non-AI baselines (e.g. search engines or textbooks) OR that show low-level autonomous capabilities.

So a 30% chance of falling into that category does not exactly equal a 30% chance of those things happening.

I'll resolve this market to "yes" if I find no substantial falsehood.

I don't think this is a substantial falsehood, so much as a truncation that led to an inaccurate simplification. But either way this goes, it's splitting hairs. Most of AISafetyMemes' tweets are basically just true facts, or slight simplifications like these that don't substantially change the conclusion.

@Haiku If your doctor tells you “I think it’s 30% likely that if you go skydiving tomorrow it will substantially increase your risk of dying in 2025.” and you report that as “Doctor says there’s a 30% chance that if I go skydiving tomorrow then I will die in 2025.” I think that would be a substantial misrepresentation, because the doctor did not claim that nor was it implied by what the doctor did say.

The issue I’m pointing at in the specific AISafetyMemes post under consideration is structurally identical to the above.
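
To make the gap concrete, here is a minimal sketch of the arithmetic. The 1% and 10% risk levels are purely illustrative (they echo the example range used earlier in this thread, not any figure from Anthropic), and `overall_risk` is a made-up helper name.

```python
def overall_risk(p_qualifies, risk_if_qualifies, risk_otherwise):
    """Law of total probability:
    P(bad) = P(Q) * P(bad | Q) + (1 - P(Q)) * P(bad | not Q)."""
    return p_qualifies * risk_if_qualifies + (1 - p_qualifies) * risk_otherwise

# Assumed, illustrative inputs: 30% chance of qualifying for the
# "heightened risk" category, where the risk is 10% instead of 1%.
p = overall_risk(p_qualifies=0.30, risk_if_qualifies=0.10, risk_otherwise=0.01)
print(f"overall risk: {p:.1%}")
```

Under these assumed inputs the overall risk comes out to about 3.7%, nowhere near 30%. The 30% only carries through unchanged if qualifying for the category means the bad outcome is certain.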

And to be clear, I don’t think anyone should interpret the closing 11% number as an overall assessment of how factually accurate AISafetyMemes is. I think the overall number is way, way higher (>50%, and probably more like 70%+). But unfortunately, the market remained open for trading after the chosen post was revealed, which let folks trade based on the claims in that specific post and not just on general accuracy.

@CharlesFoster Yeah, I have to admit you're right. The degree of misrepresentation is subjective, but the misrepresentation itself does exist.

https://x.com/AISafetyMemes/status/1819968588788429226
alright, here is the sampled post. I initially didn't save the link so I had to sift through my browser history to recover it.

Anthropic founder: 30% chance Claude could be fine-tuned RIGHT NOW to replicate and spread on its own - and cause “thousands of deaths or hundreds of billions in damage”

AND a 30% chance it could cause chemical, biological, or nuclear catastrophes!

Sorry, but how the fuck is this industry still unregulated?? There are still more regulations on selling a sandwich!

How many warning signs do we need to see?

I wish I could at least look forward to the coming I Told You So moment, except we might not even get one.

We really might just all drop dead one day -- coups are typically lightning fast.

I'm mostly just living in constant state of disbelief at how this isn't the only story on the news.

If any executive in any other industry said anything like this, it would be the only story.

Note: major props to Ben Mann for saying this publicly - very, very important for transparency!


@Jono3h

30% chance Claude could be fine-tuned RIGHT NOW to replicate and spread on its own - and cause “thousands of deaths or hundreds of billions in damage” AND a 30% chance it could cause chemical, biological, or nuclear catastrophes!

This is not a valid interpretation of these evaluation results.

@CharlesFoster It is an accurate quote of Ben Mann, which makes that part of the tweet factually correct.

@Haiku No, if you watch the linked video of Ben Mann you will see that what AISafetyMemes posted is not an accurate quote from him, it is more like a paraphrase, and it is not an accurate paraphrase.

@CharlesFoster

Here are exact quotes of the relevant sections from the linked video:

[…]

In our latest round of testing Claude 3, we focused on three areas that could pose catastrophic risks if an AI were to master them: knowledge about dangerous chemical and biological agents (or CBRN), the ability to hack into secure systems, and the potential to autonomously replicate and spread.

[…]

While Claude 3 demonstrated increased capabilities compared to its predecessor, our team determined that it did not cross the threshold into ASL-3 territory. However, we did note that with additional fine-tuning and improved prompt engineering, there is a 30% chance that the model could have met our current criteria for autonomous replication, and a 30% chance it could have met at least one of our criteria related to Chemical, Biological, Radiological, and Nuclear (or CBRN) risks.

[…]

The most intriguing test was for autonomous replication. This is our attempt to see if the AI could start to act on its own, without human guidance. We set up a suite of tasks that included things like breaking out of a digital sandbox (a controlled environment), manipulating its environment through a command line (like typing commands into a computer), and even writing self-modifying code.

Here, Claude 3 showed some real progress, solving parts of these tasks. The model had to reach a 50% success rate on the tasks to raise serious concerns, and Claude did not reach that level. However, its performance was notable enough for us to take notice.

[…]

@Jono3h the speaker in the video is intentionally speaking in a circuitous way.

The speaker in the video said that there is a 30% chance the current Claude model could be fine-tuned to qualify as ASL-3. ASL-3 is defined as a heightened probability of catastrophic risk. Catastrophic risk is defined as causing thousands of deaths or billions in damage.

AISafetyMemes is simply connecting these dots that Anthropic defined. The Anthropic cofounder is intentionally not connecting his own dots.

I would argue this is not a “substantial falsehood” which was the criteria of the bet. None of the logical links were substantially false.

@AlvaroCuba idk how this is not the obvious interpretation. What do the rest of you think he’s leaving out?

Replied to the top-level comment from @AlvaroCuba.

hi, excuse the delay, I'll get to it soon. I guess I'll re-open the market for another while

I'll post the sampled tweet soon, because I'm curious what you think of it.
But after having posted it, I won't visit Manifold anymore (or read manifold notifications) before having made my own judgement on the matter.

@Jono3h how soon are we talking about?

@harfe sorry I got too lazy. I shouldn't have made the promise in the first place

Is the sample any tweet, i.e. could it be a reply? And why not make the sample slightly larger, like 5 or 10 tweets? Even if it doesn't change the expected value, I believe it could make more people likely to engage with this bet.

@AlvaroCuba No replies, except to add context to their original tweet.

I think you're right about the larger sample size. I don't want to change the resolution mid-market, but in the future I will.

@Jono3h This is the kind of thing I see a lot: https://x.com/AISafetyMemes/status/1879938756334977117?t=jjmO7bL4Gm-uUxfBqfXu1A&s=19

It's just quotes & they're reported accurately.
