What will be the best score on the WebArena benchmark before 2025?

Resolved Jan 29 as 57%.

This question will resolve to the state-of-the-art success rate (SR), without the UA (unachievable) hint, achieved on the WebArena benchmark by an AI system. Post-training enhancements count, but human assistance does not. Resolution will be based on credible, publicly available results released before January 1st, 2025. Credible sources include, but are not limited to, blog posts, arXiv preprints, and papers.

Background information:

See WebArena.
WebArena is a standalone, self-hostable web environment for building autonomous agents. It introduces a benchmark that tests whether agents can translate high-level, realistic natural-language commands into concrete web-based interactions. The authors provide annotated programs that programmatically validate the functional correctness of each task. See the paper, specifically Section 5.1, for results.
As of March 15th, 2024, the best publicly reported score is 14.41%, achieved by a GPT-4-based agent.
Be advised that this benchmark does not yet have an official leaderboard and is not widely reported by developers. However, we hope this will change soon, since it appears to be a high-quality and important benchmark.

Part of the AI Benchmarks series by the AI Safety Student Team at Harvard on evaluations of AI models against technical benchmarks. Full list of questions:

© Manifold Markets, Inc.