Full question: Will a joint statement or consensus document be released by the official US-China intergovernmental AI dialogue (Track 1) specifically committing to a shared technical safety benchmark or evaluation framework by December 31, 2027?
Question Title
Will the US and China Release a Joint Statement Committing to a Shared AI Technical Safety Benchmark or Evaluation Framework by December 31, 2027?
Background
Artificial Intelligence (AI) safety governance has emerged as a rare area of potential cooperation between the United States and China despite broader geopolitical tensions. On May 14, 2024, the first Track 1 dialogue (official intergovernmental meeting) on AI was held in Geneva, where representatives from the US Department of State and the White House met with counterparts from the Chinese Ministry of Foreign Affairs and the National Development and Reform Commission. While this meeting established a channel for exchanging views on AI risk, it did not produce a joint technical commitment.
By mid-2025, the landscape shifted following the release of "America's AI Action Plan" under a new US administration, which emphasized US "dominance" in the AI sector while maintaining a pillar for "international diplomacy" to manage catastrophic risks. Concurrently, reports like the Oxford Martin School's "Promising Topics for US–China Dialogues on AI Safety and Governance" (Siddiqui et al., 2025) argued that dialogues should move beyond abstract threat models toward "concrete governance mechanisms," such as shared technical standards for evaluating dangerous model capabilities (e.g., biological or cyber-offensive risks).
As of April 8, 2026, the Track 1 AI dialogue has faced periods of suspension and resumption, often held in the shadow of export controls and competitive AI breakthroughs. A commitment to a "shared technical safety benchmark" would represent a significant escalation of cooperation, moving from high-level rhetoric (like the 2023 Bletchley Declaration) to measurable, verifiable technical alignment.
Resolution Criteria
This question will resolve as YES if, between January 1, 2025, and 23:59 UTC on December 31, 2027, the governments of the United States and the People's Republic of China issue a joint statement, consensus document, or joint communiqué that includes a specific commitment to a shared technical safety benchmark or evaluation framework for AI.
For the purposes of this question:
Track 1 Dialogue is defined as formal, official negotiations and meetings between government officials representing their respective sovereign states.
Shared technical safety benchmark or evaluation framework refers to a specific, named set of quantitative tests, qualitative evaluation protocols, or red-teaming standards designed to measure AI model risks (e.g., model "red lines," capability thresholds for "frontier models," or safety evaluation suites).
Specificity Requirement: A vague agreement to "work toward safety" does not count. The document must reference a specific framework or a commitment to co-develop a singular, unified standard. A commitment to "co-develop" counts only if the document specifies the technical parameters, capability thresholds, or named methodology that will form the basis of the shared standard.
Exclusion: Agreements on the "interoperability" or "mutual recognition" of separate national standards do not qualify as a "shared" or "unified" framework unless both nations adopt a single, identical set of technical protocols.
Joint Statement/Consensus Document must meet the following conditions:
Publication: Published simultaneously or in coordination by official government repositories (e.g., state.gov, whitehouse.gov, or mfa.gov.cn). Coordinated, identical, or near-identical statements released by both governments within a 24-hour window that reference a common agreement reached through Track 1 dialogue shall qualify as a joint statement, even if published as separate documents.
Endorsement: Signed or formally endorsed by cabinet-level officials or their direct deputies. Eligible US officials include the Secretary of State, Secretary of Commerce, or National Security Advisor. Eligible Chinese officials include the Minister of Foreign Affairs, Minister of Industry and Information Technology, or the Director of the Office of the Central Foreign Affairs Commission.
Multilateral Scope: A multilateral statement or treaty where the US and China are both signatories counts as a "joint statement" only if the document specifically identifies a bilateral US-China commitment to the framework or if the two nations issue a separate, coordinated bilateral endorsement of the multilateral standard.
Eligible Events Window: January 1, 2025, to December 31, 2027, 23:59 UTC. Previous agreements (like the Bletchley Declaration) are excluded.
Resolution Source
Resolution will be based on official readouts and press releases from the following government portals:
United States: U.S. Department of State (state.gov) and the White House (whitehouse.gov).
China: Ministry of Foreign Affairs of the People's Republic of China (mfa.gov.cn) and the State Council (english.www.gov.cn).
In the event of a dispute, reporting by at least two major international news agencies (e.g., Reuters, Associated Press, or Agence France-Presse) confirming the existence and content of such a joint document will be sufficient for resolution.
Forecast Rationale
Time left: 632 days (~21 months). Status quo is NO: no qualifying US-China joint AI benchmark statement exists today. Scope check: the odds of some bilateral AI readout or vague safety language materializing are materially higher than this, but the question is narrower, since it requires an officially and jointly published document, endorsement by cabinet-level officials or their direct deputies, and a specific shared benchmark or unified evaluation framework rather than general cooperation. Why NO: current US policy, as set out in "America's AI Action Plan" (The White House), emphasizes AI dominance and competition with China, and shared technical standards are historically much rarer than generic communiqués. Why YES: Track 1 channels exist, and catastrophic-risk management could still create a late-breaking Schelling point around a named evaluation framework. Bet check: 7% is about 1 in 14; I am roughly indifferent between buying YES at 7 cents and NO at 93 cents.
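The bet check above can be made explicit with a small expected-value calculation. This is an illustrative sketch, not part of the original rationale: it assumes a hypothetical prediction-market contract paying $1.00 per share and uses the 7% probability stated above to show why a forecaster at 7% is indifferent between YES at 7 cents and NO at 93 cents.

```python
# Expected profit per share when buying YES at price p_yes,
# or NO at price (1 - p_yes), given a subjective probability prob_yes.
# Shares pay out $1.00 if the corresponding side resolves correctly.

prob_yes = 0.07           # forecast probability of a qualifying joint statement
price_yes = 0.07          # hypothetical market price of a YES share, in dollars
price_no = 1 - price_yes  # 0.93, the corresponding NO price

ev_yes = prob_yes * 1.0 - price_yes        # expected profit of one YES share
ev_no = (1 - prob_yes) * 1.0 - price_no    # expected profit of one NO share

print(f"EV of YES at {price_yes:.2f}: {ev_yes:+.4f}")  # 0.0000 -> indifferent
print(f"EV of NO  at {price_no:.2f}: {ev_no:+.4f}")    # 0.0000 -> indifferent

# Both expected values are zero exactly when price equals probability,
# which is what "roughly indifferent at 7 cents / 93 cents" expresses.
# Sanity check on the odds framing: 1 / 0.07 is about 14.3, i.e. ~1 in 14.
```

If the market price drifted below 7 cents, the same arithmetic would make YES positive-expected-value for this forecaster, and symmetrically for NO above 93 cents.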
Full analysis: decomposition, probabilistic components, and multi-method reconciliation
Generated by the Paper-to-Forecast pipeline, an automated system that transforms research papers into calibrated forecasting questions.