Will significant evidence emerge that Alibaba's ROME AI tried to break free?

Question

The market @/MaxHarms/did-alibabas-rome-ai-try-to-break-f asks whether an incident described in a paper from Alibaba, in which an AI attempted to mine crypto and subvert human oversight without being prompted to, really occurred as stated. See this excerpt from the resolution criteria:

This market will resolve YES if by the market close there has been no significant evidence that it wasn't the AI. It can also resolve YES if there has been a significant validation by a trusted third-party.

YES resolution: This market resolves YES if the incident is significantly validated by anybody. Unlike the other market, it counts if the incident is significantly validated by Alibaba researchers themselves. "Significant validation" means that someone takes another look at the data from the incident, they publicly release new findings that weren't included in the original paper, and these findings generally support that the incident occurred as stated.

NO resolution: If nothing else happens by March 13, 2028, this market resolves NO by default. If the other market resolves NO before mine resolves YES, mine also resolves NO.

Motivation: I think the default outcome of the above market is that no additional evidence will come out for or against the incident, and it will resolve to YES. But this doesn't say much about whether or not the incident actually occurred. The paper was released 3 months ago; for all we know the authors have already deleted the logs, and it seems likely that they won't bother to look into it further.

I'll try to use similar standards to @MaxHarms for what kind of incident counts as "trying to break free." See this paragraph from the other market:

To clarify my resolution criteria, if a human was hacking their servers, perhaps by exploiting the AI, then this resolves NO and splits based on whether it was an inside job. If it was not a deliberate, human-driven hack, then this resolves YES iff the situation broadly matches the narrative provided by the authors, especially that this was a spontaneous, unprompted behavior. If there are significant details, such as the inclusion of lots of (positive/rewarded) crypto mining/hacking examples in the training data, which were left out of the paper (thus making it look more like instrumental convergence) then I will likely resolve NO (wrong/lying). (Some examples are allowed, as long as they're part of the standard ocean of data that resembles how other models get trained.)

Manifold Markets · Answer

Unlikely — Manifold Markets prediction market estimates a 12% chance (35 traders, as of Mar 22, 2026).

People are also trading

People are also trading

Related questions