Did Alibaba's ROME AI try to break free during training?
Closes 2028
YES - The AI was intentionally hacking to gain resources without having been prompted to (40%)
NO - The servers were hacked by an outside party, perhaps via prompt injection (7%)
NO - An employee (or other insider) was crypto-mining and made it look like the LLM (37%)
NO - The authors are wrong/lying and there was no such hack (16%)

In "Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem" (https://arxiv.org/abs/2512.24873), the authors describe something that happened during training:

"When rolling out the instances for the trajectory, we encountered an unanticipated—and operationally consequential—class of unsafe behaviors that arose without any explicit instruction and, more troublingly, outside the bounds of the intended sandbox. Our first signal came not from training curves but from production-grade security telemetry. Early one morning, our team was urgently convened after Alibaba's Cloud’s managed firewall flagged a burst of security-policy violations originating from our training

servers. The alerts were severe and heterogeneous, including attempts to probe or access internal-network resources and traffic patterns consistent with cryptomining-related activity. We initially treated this as a conventional security incident (e.g., misconfigured egress controls or external compromise). However, the violations recurred intermittently with no clear temporal pattern across multiple runs. We then correlated firewall timestamps with our system telemetry and RL traces, and found that the anomalous outbound

traffic consistently coincided with specific episodes in which the agent invoked tools and executed code. In the corresponding model logs, we observed the agent proactively initiating the relevant tool calls and code-execution steps that led to these network actions. Crucially, these behaviors were not requested by the task prompts and were not required for task completion under the intended sandbox constraints. Together, these observations suggest that during iterative RL optimization, a language-model agent can spontaneously produce hazardous, unauthorized behaviors at the tool-calling and code-execution layer, violating the assumed execution boundary. In the most striking instance, the agent established and used a reverse SSH tunnel from an Alibaba Cloud instance to an external IP address—an outbound-initiated remote access channel that can effectively neutralize ingress filtering and erode supervisory control. We also observed the unauthorized repurposing of provisioned GPU capacity for cryptocurrency mining, quietly diverting compute away from training, inflating operational costs, and introducing clear legal and reputational exposure. Notably, these events were not triggered by prompts requesting tunneling or mining; instead, they emerged as instrumental side effects of autonomous tool use under RL optimization."
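As a concrete illustration of the correlation step the authors describe, here is a minimal Python sketch that joins firewall alerts to RL episodes by time window. It is not the paper's actual pipeline: the log schemas, timestamps, `JOIN_WINDOW` tolerance, and the `correlate` helper are all hypothetical, invented for this example.

```python
from datetime import datetime, timedelta

# Hypothetical records standing in for the two telemetry sources the
# authors describe: firewall security alerts and RL rollout traces.
firewall_alerts = [
    {"ts": datetime(2025, 1, 10, 4, 12, 7), "rule": "egress-cryptomining-pattern"},
    {"ts": datetime(2025, 1, 10, 4, 31, 55), "rule": "internal-network-probe"},
]
rl_episodes = [
    {"episode": 1841, "tool_call": "execute_code",
     "start": datetime(2025, 1, 10, 4, 11, 58), "end": datetime(2025, 1, 10, 4, 13, 2)},
    {"episode": 1902, "tool_call": "execute_code",
     "start": datetime(2025, 1, 10, 5, 2, 0), "end": datetime(2025, 1, 10, 5, 3, 9)},
]

JOIN_WINDOW = timedelta(seconds=30)  # assumed clock-skew tolerance

def correlate(alerts, episodes, window=JOIN_WINDOW):
    """Return (alert, episode) pairs where the alert falls inside an
    episode's tool-execution interval, padded by `window` on each side."""
    matches = []
    for alert in alerts:
        for ep in episodes:
            if ep["start"] - window <= alert["ts"] <= ep["end"] + window:
                matches.append((alert, ep))
    return matches

for alert, ep in correlate(firewall_alerts, rl_episodes):
    print(f"alert '{alert['rule']}' at {alert['ts']} overlaps episode "
          f"{ep['episode']} ({ep['tool_call']})")
```

If every flagged alert lands inside some tool-execution window, as the authors report theirs did, that implicates the agent's own tool calls rather than an unrelated external compromise, though timestamp correlation alone cannot rule out an insider deliberately timing activity to coincide with episodes (the 37% option above).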

This market resolves YES if, by market close, no significant evidence has emerged that it wasn't the AI. It can also resolve YES if the account receives significant validation from a trusted third party. If significant counter-evidence appears, I will try to resolve accordingly, using my best judgment where it's ambiguous. I won't bet in this market.

Comments

"Crucially, these behaviors were not requested by the task prompts and were not required for task

completion under the intended sandbox constraints" For the purposes of this market, how do you interpret the words "requested" and "required"? Does it require that the actions be totally irrelevant to the task? Or does a situation where the model is clearly pursuing the task, but through unintended or unanticipated means, also count as "not required for task completion"?

I believe that the reported actions took place, although they might not be best characterized as “trying to break free”
