Will a GPT-5-worthy model be able to break 50% on the NYT Connections benchmark?

AI models are really bad at playing NYT Connections.

If GPT-5, or a model worthy of the name, breaks 50% according to Lech Mazur or some other source trusted by me or Gary Marcus, this market resolves to YES.

If a model is considered GPT-6 worthy, or is GPT-6 itself, it won't count.

Obviously, this is nuanced, and therefore I won't bet.

If no such model has broken 50% by then, this market resolves to NO on Dec 31st, 2026.

Does anyone know what the win rate for a top-skilled human player is?

@AnilJason Marcus says it's 90%

bought Ṁ90 YES

31% is pretty good imo, I don’t think it takes much to get from there to 50

@dominic I tried to make it so that it'd be another step change, like GPT-3.5 to GPT-4

@MP Oh yeah, that's fair. It would be a step change, but I think it's not as big as from 4% to 31% (which is why I bought YES)

So are we basically just waiting for a single model from OpenAI to be released, which is either called GPT-5 or which you believe to be equivalent to GPT-5 under a different name, and then we judge that single model on its Connections score? Something like Claude 4 or Gemini 3.0 would not be "GPT-5-worthy"?

@CDBiddulph A model like Claude 4 can be GPT-5 worthy.