This market will resolve to the first 50% time horizon that METR reports for Moonshot AI's Kimi K3 Thinking. If METR evaluates a model in the Kimi K3 family that reasons before giving an answer, like a reasoning model, but whose name does not include "Thinking" (as Kimi K2 Thinking's did), it still counts as Kimi K3 Thinking for the purpose of this market. Variants such as Kimi K3 Code or Kimi K3 Heavy count if they are the first such model that METR evaluates and reports on.
The 50% time horizon is a measure of AI autonomy based on task length: roughly, it is the length of task (measured by how long humans take to complete it) that an AI system can successfully complete 50% of the time. See METR's "Measuring AI Ability to Complete Long Tasks" for the technical definition. In that report, Claude 3.7 Sonnet, released in February 2025, was the leading model, with a 50% horizon of 59 minutes.
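To make the definition concrete, here is a minimal sketch of how a horizon can be read off a fitted success curve. It assumes success probability has already been fit as a logistic function of log2(human task time), roughly as METR's paper describes; the function name `time_horizon` and the coefficients are hypothetical and chosen only for illustration, not taken from any METR result.

```python
import math


def time_horizon(intercept: float, slope: float, p: float = 0.5) -> float:
    """Return the human task length (minutes) at which predicted success equals p.

    Assumes a logistic fit with hypothetical coefficients:
        P(success | t) = 1 / (1 + exp(-(intercept + slope * log2(t))))
    where slope < 0, so longer tasks are less likely to succeed.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if slope >= 0:
        raise ValueError("slope must be negative for a decreasing success curve")
    logit_p = math.log(p / (1.0 - p))  # inverse of the logistic function
    # Solve intercept + slope * log2(t) = logit(p) for t.
    return 2.0 ** ((logit_p - intercept) / slope)


# Illustrative coefficients chosen so the 50% horizon lands at 59 minutes.
a, b = math.log2(59), -1.0
print(round(time_horizon(a, b), 1))         # 50% horizon
print(round(time_horizon(a, b, p=0.8), 1))  # 80% horizon, necessarily shorter
```

The same function gives the 80% horizon by changing `p`, which shows why an 80% horizon is always shorter than the 50% horizon on a decreasing success curve.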

Left bounds inclusive, right bounds exclusive.
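As an illustration of the half-open bucket rule, the sketch below maps a reported horizon to the bucket [lower, upper) that contains it. The bucket edges are hypothetical (in minutes) and are not this market's actual answer ranges.

```python
from bisect import bisect_right


def bucket_for(value: float, edges: list[float]) -> int:
    """Return i such that edges[i] <= value < edges[i + 1].

    Left bounds are inclusive and right bounds exclusive, so a value exactly
    on an edge belongs to the bucket that starts at that edge.
    """
    if value < edges[0] or value >= edges[-1]:
        raise ValueError("value falls outside all buckets")
    return bisect_right(edges, value) - 1


# Hypothetical bucket edges in minutes: [0, 60), [60, 120), [120, 240).
edges = [0, 60, 120, 240]
i = bucket_for(60, edges)
print(f"[{edges[i]}, {edges[i + 1]})")  # a horizon of exactly 60 lands in [60, 120)
```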
See also:
/jim/claude-45-opuss-metr50-horizon (jim's version)
/Bayesian/claude-opus-45s-metr50-time-horizon (my version)
/Bayesian/gemini-3s-50-time-horizon-per-metr
/Bayesian/grok-420s-metr-50-time-horizon
/Bayesian/grok-5s-50-time-horizon-per-metr
/Bayesian/r2s-50-time-horizon-per-metr
/Bayesian/kimi-k3-thinkings-metr-50-time-hori (this market)
Update 2025-12-20 (PST) (AI summary of creator comment): If METR tests Kimi K3 through a subpar inference provider (as happened with Kimi K2), the market will still resolve based on those results, even if they may not represent the model's true capabilities.
Members of the AI Futures Project have given an update, and they now appear to rely on METR's 80% time horizon graph for their predictions rather than the 50% time horizon graph. This suggests they consider the 50% time horizon insufficient on its own. While I think markets on 50% time horizons are useful, I now think more attention should be paid to 80% time horizons. I plan to create markets for 80% time horizons either tonight or later this week, unless someone beats me to it.