BASELINE (2025 leader): As of Aug 2025, the apparent leader is GPT-4.5 at ~90.2% on MMLU, with Claude 4 and Gemini 2.5 Pro at ~85-86%
To establish the 2025 baseline:
On Dec 31, 2025, identify the LLM with the highest average score across the "Core Benchmark Suite" (defined below)
That model's suite average becomes the baseline score for calculating the 10% improvement
CORE BENCHMARK SUITE (to avoid cherry-picking; an averaging sketch follows the list):
MMLU (general knowledge)
HumanEval (coding)
GSM8K (math reasoning)
ARC-Challenge (scientific reasoning)
GPQA (expert-level knowledge)
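For concreteness, a minimal Python sketch of the averaging step, assuming all scores are reported as percentages; the model scores below are hypothetical placeholders, not reported results.

```python
# Minimal sketch: simple arithmetic mean across the Core Benchmark Suite.
CORE_SUITE = ["MMLU", "HumanEval", "GSM8K", "ARC-Challenge", "GPQA"]

def suite_average(scores: dict) -> float:
    """Simple arithmetic mean across the five core benchmarks (percent)."""
    return sum(scores[b] for b in CORE_SUITE) / len(CORE_SUITE)

# Hypothetical scores, for illustration only.
example = {"MMLU": 88.0, "HumanEval": 90.0, "GSM8K": 95.0,
           "ARC-Challenge": 93.0, "GPQA": 59.0}
print(suite_average(example))  # 85.0
```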
RESOLUTION CRITERIA:
On Dec 31, 2026, identify the highest-scoring LLM on the same benchmark suite
Calculate the percentage improvement: (2026_score - 2025_score) / 2025_score × 100 (see the sketch after this list)
BET RESOLVES YES if improvement is less than 10%
BET RESOLVES NO if improvement is 10% or greater
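A minimal sketch of the resolution arithmetic, assuming both leaders' suite averages are expressed as percentages; the function names are illustrative only.

```python
def improvement_pct(avg_2025: float, avg_2026: float) -> float:
    """Percentage improvement of the 2026 leader over the 2025 baseline."""
    return (avg_2026 - avg_2025) / avg_2025 * 100

def resolve(avg_2025: float, avg_2026: float) -> str:
    """YES if improvement is less than 10%, NO if 10% or greater."""
    return "YES" if improvement_pct(avg_2025, avg_2026) < 10 else "NO"
```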
KEY DEFINITIONS:
"LLM": Text-based language models (excludes multimodal-only systems)
"Publicly available": Model must be accessible via API, open-source, or major consumer platform
"Score sources": Use official leaderboards (HuggingFace, Papers with Code) or company-reported figures
"Average": Simple arithmetic mean across the 5 benchmarks
EDGE CASES:
If a benchmark becomes saturated (>98% scores), substitute the most widely adopted replacement benchmark
If a benchmark is discontinued, use the closest equivalent as determined by academic consensus
Minimum of 3 valid benchmark scores required for inclusion (see the sketch after this list)
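One possible reading of the minimum-score rule in code, assuming a model's average falls back to the mean of whatever valid scores remain; that fallback is an interpretation, not part of the bet's text.

```python
from statistics import mean

def suite_average_with_gaps(scores: dict) -> float:
    """Mean over available benchmark scores; models with fewer than
    3 valid scores are excluded (fallback behavior is assumed)."""
    valid = [s for s in scores.values() if s is not None]
    if len(valid) < 3:
        raise ValueError("excluded: fewer than 3 valid benchmark scores")
    return mean(valid)
```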
Example calculation (also run as code below):
2025 leader: 85% average
2026 leader: 92% average
Improvement: (92 - 85) / 85 × 100 ≈ 8.2% → YES (less than 10%)
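The same worked example as a self-contained check, using the hypothetical averages above:

```python
# Reproducing the worked example (hypothetical averages).
avg_2025, avg_2026 = 85.0, 92.0
pct = (avg_2026 - avg_2025) / avg_2025 * 100
print(f"{pct:.1f}% -> {'YES' if pct < 10 else 'NO'}")  # 8.2% -> YES
```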