Top OSWorld score in 2025?
75.1% expected
  • Above 50%: 90%
  • Above 60%: 84%
  • Above 70%: 73%
  • Above 80%: 58%
  • Above 90%: 42%

Background

OSWorld is a benchmark for evaluating multimodal AI agents on real-world computer tasks in open-ended environments. It tests an AI's ability to navigate operating systems, use applications, and complete practical tasks through a combination of vision and text inputs/outputs.

As of January 24, 2025, the highest OSWorld score is held by OpenAI CUA (200 steps) with a score of 38.1. Other notable scores include:

  • UI-TARS-72B-DPO (50 steps): 24.6

  • UI-TARS-72B-DPO (15 steps): 22.7

  • Claude 3.5 Sonnet (50 steps): 22.0

Resolution Criteria

This market will resolve to the highest verified OSWorld score achieved by any AI model during the 2025 calendar year (January 1, 2025 to December 31, 2025). The score must be publicly reported and verifiable through official sources such as the OSWorld leaderboard, academic publications, or credible tech news outlets.

If multiple models achieve the same highest score, the market will resolve to that score. If scores are reported with different decimal precisions, they will be considered at their reported precision.
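
The resolution rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the market's actual resolution procedure: the `Report` type, the `resolve` and `thresholds_cleared` helpers, and the `verified` flag are hypothetical names, and the date attached to each sample score is simply the "as of" date given in the Background section rather than a confirmed publication date. Scores are stored as `Decimal` values built from strings so they keep their reported precision.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

# Calendar-year window stated in the resolution criteria.
WINDOW_START = date(2025, 1, 1)
WINDOW_END = date(2025, 12, 31)


@dataclass(frozen=True)
class Report:
    """One publicly reported OSWorld result (hypothetical record type)."""
    model: str
    score: Decimal          # kept at the precision it was reported with
    reported_on: date       # assumption: the date the score was observed
    verified: bool = True   # assumption: confirmed via an official source


def resolve(reports):
    """Return the highest verified score inside the window, or None."""
    eligible = [
        r for r in reports
        if r.verified and WINDOW_START <= r.reported_on <= WINDOW_END
    ]
    if not eligible:
        return None
    # Ties need no special handling: the market resolves to the shared score.
    return max(r.score for r in eligible)


def thresholds_cleared(score, thresholds=(50, 60, 70, 80, 90)):
    """Map each 'Above N%' option to whether the score strictly exceeds N."""
    return {t: score > t for t in thresholds}


# Scores listed in the Background section; the shared date is the
# "as of" date from that section, used here as a placeholder.
AS_OF = date(2025, 1, 24)
reports = [
    Report("OpenAI CUA (200 steps)", Decimal("38.1"), AS_OF),
    Report("UI-TARS-72B-DPO (50 steps)", Decimal("24.6"), AS_OF),
    Report("UI-TARS-72B-DPO (15 steps)", Decimal("22.7"), AS_OF),
    Report("Claude 3.5 Sonnet (50 steps)", Decimal("22.0"), AS_OF),
]

top = resolve(reports)
cleared = thresholds_cleared(top)
```

With only the January 24, 2025 scores as input, `top` is 38.1 and none of the "Above N%" options would be cleared. A later, higher verified score inside the window would replace it.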
