
By Nov 12 2025, will there be a model that meets all of these criteria:
>84.6% on the Artificial Analysis Quality Index
i.e. the average of benchmark scores on:
MMLU
GPQA
MATH
HumanEval
MGSM
with no regressions on any individual benchmark
Note:
does not need to be an OpenAI model
open weights or free models will count as cheaper
quantized/distilled versions count, as long as they also beat the same accuracy thresholds
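One way to read the quality criteria above, as a rough sketch: the index is the unweighted mean of the five benchmark scores, and a "regression" is any benchmark where the candidate scores below the reference model. The function names are made up for illustration, and the strict `<` comparison is an assumption (the question does not say whether ties or noise-level differences count).

```python
BENCHMARKS = ("MMLU", "GPQA", "MATH", "HumanEval", "MGSM")
INDEX_THRESHOLD = 84.6  # percent, taken from the question text


def quality_index(scores: dict[str, float]) -> float:
    """Unweighted mean of the five benchmark scores (in percent)."""
    return sum(scores[name] for name in BENCHMARKS) / len(BENCHMARKS)


def regressions(candidate: dict[str, float], baseline: dict[str, float]) -> list[str]:
    """Benchmarks where the candidate scores strictly below the baseline."""
    return [name for name in BENCHMARKS if candidate[name] < baseline[name]]


def meets_quality_criteria(
    candidate: dict[str, float],
    baseline: dict[str, float],
    threshold: float = INDEX_THRESHOLD,
) -> bool:
    """True if the index beats the threshold and no benchmark regresses."""
    return quality_index(candidate) > threshold and not regressions(candidate, baseline)


if __name__ == "__main__":
    # Placeholder numbers for illustration only, NOT real benchmark results.
    baseline = {name: 80.0 for name in BENCHMARKS}
    candidate = {name: 90.0 for name in BENCHMARKS}
    print(meets_quality_criteria(candidate, baseline))
```

Note that a model can clear the average and still fail: a high mean does not rescue a single benchmark that drops below the reference score.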
There's a model that fits your criteria: Gemini 2.5 Flash-Lite (September 2025 Preview).
https://artificialanalysis.ai/models/gemini-2-5-flash-lite-preview-09-2025-reasoning/
1. 756 tok/s per Artificial Analysis.
2. $0.1/Mtok input, $0.4/Mtok output.
3. As of the AAII v3 suite, its score is 48 vs. o1-preview's 45 (the old index used benchmarks that have since saturated, so aggregate scores have dropped across the board). Individually, as far as I can tell, there are no statistically significant regressions; 71% vs. 73% on GPQA Diamond probably shouldn't count as one in the spirit of the question.
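A quick back-of-the-envelope check of the GPQA Diamond gap, as a sketch rather than a rigorous analysis. It uses a pooled two-proportion z-test and assumes each score comes from a single independent pass over the 198 GPQA Diamond questions. Both are assumptions: the published numbers may be averaged over several runs, and since both models answer the same questions a paired test would be more appropriate.

```python
import math

N_GPQA_DIAMOND = 198  # assumed question count per model, single run


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """Pooled two-proportion z statistic for the difference p1 - p2."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_err


# 71% vs. 73% are the GPQA Diamond scores quoted in the comment above.
z = two_proportion_z(0.71, N_GPQA_DIAMOND, 0.73, N_GPQA_DIAMOND)
significant = abs(z) > 1.96  # two-sided test at the 5% level
print(f"z = {z:.2f}, significant at 5%: {significant}")
```

Under these assumptions z comes out at roughly -0.44, well inside ±1.96, so a 2-point gap on a benchmark this small is within noise.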
@JoshYou imo we’ve only just started realizing algorithmic speedups; there still seems to be plenty of low-hanging fruit. In fact, if there /isn’t/ a reasoning model that is faster than that (regardless of accuracy) by this time next year, I would be extremely surprised. Also, we can discuss whether speedups attributable to Blackwell hardware should count, i.e. whether speed should be measured relative to current H100s.