Will AIs stay below 1453 Elo in 2024 on chat.lmsys.org/?leaderboard as predicted by Gary Marcus?
Dec 31
That is not the same statement at all.

Elo is a way to measure how much better an agent is than the rest of its competitors, based on the results of head-to-head matches. Gary is saying that we won’t have something 200 points higher than GPT-4. Gary’s statement could be correct if a few great chatbots emerge and GPT-4’s Elo rating drops as a result.
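To put a number on what a 200-point gap means, here is a minimal sketch using the standard Elo expected-score formula. The function names are made up for illustration, the 1253 figure is just 1453 minus the 200 points mentioned above (not a number read off the leaderboard), and the leaderboard's actual rating fit may differ from textbook Elo.

```python
import math


def expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook Elo expected score of A against B (win = 1, tie = 0.5, loss = 0)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rating_gap(score: float) -> float:
    """Inverse of expected_score: the rating gap implied by an average score in (0, 1)."""
    return 400.0 * math.log10(score / (1.0 - score))


# Hypothetical ratings: 1453 is the line in the question, 1253 is 200 points below it.
score = expected_score(1453, 1253)
print(f"Expected score at a 200-point gap: {score:.3f}")
print(f"Gap recovered from that score: {rating_gap(score):.1f}")
```

Under these assumptions, a model 200 points above another is expected to score about 0.76 per match against it, counting ties as half a win.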

Not a particularly bold prediction by Gary lol. He's trying to say that models are plateauing, but if we get a 1400 Elo model before the end of the year, that would not be a plateau at all, and it would still be about 50 points short of this line.

I'd bet no on a version of this that excluded ties and a few other things. However, I think that ties and the low quality of many questions (and the low quality of the evaluation of many answers) might mean that even enormous improvements are not enough to get a model to 1453. We are, I think, bumping against the ceiling of this particular method of evaluation, and while a fantastic enough model probably could reach 1453, it would not be the small-to-moderate improvement it may appear to be at first glance.
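A toy calculation of the tie effect, as a rough sketch only: it assumes a single opponent, a fixed tie rate, and ties scored as half a win as in textbook Elo. The function name and the tie rates are hypothetical, and the real leaderboard aggregates many opponents with its own fitting method.

```python
def required_decisive_win_rate(target_gap: float, tie_rate: float) -> float:
    """Share of non-tied matches a model must win to sustain `target_gap` Elo points.

    Assumes ties count as 0.5 and occur in a fixed fraction `tie_rate` of matches.
    Raises ValueError if the gap is unreachable even when every decisive match is won.
    """
    if not 0.0 <= tie_rate < 1.0:
        raise ValueError("tie_rate must be in [0, 1)")
    # Average score per match needed for the target gap (textbook Elo formula).
    target_score = 1.0 / (1.0 + 10 ** (-target_gap / 400.0))
    # Ties contribute tie_rate / 2; the rest has to come from outright wins.
    wins_needed = target_score - tie_rate / 2.0
    decisive = 1.0 - tie_rate
    if wins_needed > decisive:
        raise ValueError("target gap unreachable at this tie rate")
    return max(wins_needed, 0.0) / decisive


for tie_rate in (0.0, 0.3):
    rate = required_decisive_win_rate(200, tie_rate)
    print(f"tie rate {tie_rate:.0%}: must win {rate:.1%} of decisive matches")
```

In this toy model, a 200-point gap needs about 76% of decisive matches with no ties, but about 87% when 30% of matches are tied, and it becomes unreachable once half of all matches are ties.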


