Resolution criteria
How the winner will be selected:
https://eqbench.com/index.html -> Elo Score -> Model -> Model Maker
The market resolves to the maker of the LLM with the highest Elo score on the EQ-Bench 3 leaderboard at https://eqbench.com on January 1, 2027. To resolve, navigate to the leaderboard, identify the top-ranked model by Elo score, and determine its maker.
If the top-ranked model comes from a maker not listed among the answer options (for example, a new company), the market resolves to "Other."
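The selection rule above can be sketched as a small function. This is only an illustration: the row layout `(model, maker, elo)`, the function name `resolve_market`, and all model and maker names are hypothetical, and no real leaderboard data is used. The criteria do not say how ties are handled; this sketch simply takes the first row with the maximum score.

```python
from collections.abc import Sequence


def resolve_market(
    leaderboard: Sequence[tuple[str, str, float]],
    answer_options: set[str],
) -> str:
    """Return the winning answer option.

    leaderboard: hypothetical rows of (model name, model maker, Elo score).
    answer_options: the makers listed as answers in the market.
    """
    if not leaderboard:
        raise ValueError("leaderboard is empty")
    # Pick the top-ranked model by Elo score (max() keeps the first row on a tie).
    _model, maker, _elo = max(leaderboard, key=lambda row: row[2])
    # Unlisted makers fall through to "Other".
    return maker if maker in answer_options else "Other"
```

For example, with hypothetical rows `[("model-a", "MakerA", 1500.0), ("model-b", "MakerB", 1600.0)]`, the result is `"MakerB"` if MakerB is an answer option and `"Other"` if it is not.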
Background
EQ-Bench 3's Elo score is calculated from pairwise model comparisons in which an LLM judge rates each response on eight core dimensions of emotional intelligence. The test set contains 45 scenarios spanning 3 turns: the user messages set up the scenario and inject conflict, and the evaluated model must reply in character, with introspection blocks exposing its reasoning and theory-of-mind understanding. Because EQ-Bench 3 is a subjective evaluation judged by an LLM (Sonnet 3.7), results should be considered roughly indicative rather than absolute truth.
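The Background does not say exactly how EQ-Bench 3 turns pairwise judgments into Elo scores. As an illustration only, the sketch below applies the classic sequential Elo update, with a logistic expected score on a 400-point scale, an assumed K-factor of 32 and an assumed starting rating of 1200. The leaderboard itself may use a different fitting procedure, and the model names are hypothetical.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the logistic Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update(ratings: dict[str, float], a: str, b: str, score_a: float, k: float = 32.0) -> None:
    """Apply one Elo update in place. score_a is 1.0 (A wins), 0.5 (tie), 0.0 (B wins)."""
    e_a = expected_score(ratings[a], ratings[b])
    delta = k * (score_a - e_a)
    # Points move from one model to the other, so the rating total is conserved.
    ratings[a] += delta
    ratings[b] -= delta


def run(comparisons: list[tuple[str, str, float]], initial: float = 1200.0, k: float = 32.0) -> dict[str, float]:
    """Process (model_a, model_b, score_a) comparisons in order and return final ratings."""
    ratings: dict[str, float] = {}
    for a, b, score_a in comparisons:
        # Models enter at the assumed starting rating the first time they appear.
        ratings.setdefault(a, initial)
        ratings.setdefault(b, initial)
        update(ratings, a, b, score_a, k)
    return ratings


if __name__ == "__main__":
    # Hypothetical judge outcomes: model-x beats model-y and model-z; model-y beats model-z.
    print(run([("model-x", "model-y", 1.0), ("model-y", "model-z", 1.0), ("model-x", "model-z", 1.0)]))
```

Because the expected score depends only on the difference between two ratings, a single rating means nothing in isolation; it is meaningful only relative to the other models evaluated.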
Considerations
Elo scores are relative and shift as new models are added, so the leaderboard's composition and rankings can change significantly as new models are evaluated. Additionally, because the benchmark is subjective, results are indicative rather than definitive and can vary with the judge model and evaluation methodology used.
