#### Resolution criteria
Saturation on OS-World Verified means a model can reliably execute simple, realistic tasks in Linux-based environments using popular open-source applications, such as adding page numbers to a document or exporting a CSV file from a spreadsheet. The market resolves YES when the highest-performing model on the official OS-World Verified leaderboard reaches a success rate of 95% or higher, as shown on that leaderboard at the time of resolution. If the benchmark is substantially modified or discontinued, the market resolves N/A.
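As an illustration only, the resolution rule can be written as a small function. This is a sketch under stated assumptions: the names `resolve`, `Resolution`, and `benchmark_intact` are hypothetical, the leaderboard is represented as a plain list of success rates in percent, and returning `None` means "criteria not yet met" because this section does not specify a NO condition.

```python
from enum import Enum
from typing import Iterable, Optional


class Resolution(Enum):
    YES = "YES"
    NA = "N/A"


# Threshold taken from the resolution criteria (percent success rate).
SATURATION_THRESHOLD = 95.0


def resolve(success_rates: Iterable[float],
            benchmark_intact: bool = True) -> Optional[Resolution]:
    """Apply the resolution rule to a list of leaderboard success rates.

    `benchmark_intact` is a hypothetical flag: set it to False if the
    benchmark has been substantially modified or discontinued.
    """
    if not benchmark_intact:
        return Resolution.NA
    # Only the highest-performing model matters for resolution.
    best = max(success_rates, default=None)
    if best is not None and best >= SATURATION_THRESHOLD:
        return Resolution.YES
    # Criteria not yet met; this section defines no NO condition.
    return None
```

For example, `resolve([60.8, 95.0])` returns `Resolution.YES`, while `resolve([94.9])` returns `None`.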
#### Background
OS-World was originally developed by researchers from the XLANG Lab at the University of Hong Kong and released in April 2024, with a major update dubbed OS-World Verified following in July 2025. Human performance is estimated at ~72%, and the best current systems have reached 84.4% of human capability. OS-World therefore remains far from saturated, with substantial headroom even to human-level performance.
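To make the headroom concrete, the figures above can be combined in a short calculation. This is a rough sketch, assuming that "84.4% of human capability" means the best system's success rate is 84.4% of the ~72% human success rate; if the figure is meant differently, the numbers below do not apply. The variable names are illustrative.

```python
# Figures taken from the text; all values are in percent unless noted.
HUMAN_SUCCESS_RATE = 72.0      # "~72%" estimated human performance
RELATIVE_TO_HUMAN = 0.844      # assumed reading: fraction of the human rate
SATURATION_THRESHOLD = 95.0    # YES threshold from the resolution criteria

# Implied absolute success rate of the best current system.
best_absolute = HUMAN_SUCCESS_RATE * RELATIVE_TO_HUMAN

# Remaining gap to human-level performance and to the saturation threshold.
gap_to_human = HUMAN_SUCCESS_RATE - best_absolute
gap_to_threshold = SATURATION_THRESHOLD - best_absolute

print(f"Implied best success rate: {best_absolute:.1f}%")
print(f"Gap to human level:        {gap_to_human:.1f} points")
print(f"Gap to 95% threshold:      {gap_to_threshold:.1f} points")
```

Under that reading, the best system sits at roughly 60.8% absolute, about 11 points below the human estimate and about 34 points below the 95% threshold, which is itself well above the human estimate.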
#### Considerations
Task difficulty changes over time in unpredictable ways. Many tasks can be completed with little or no GUI interaction, and the skill of interpreting the instruction is sometimes as important as the skill of using the computer. Uncontrollable factors, including anti-crawling mechanisms and CAPTCHAs on websites, can cause the benchmark's signal to gradually weaken over time.