Will GPT-4 with Code Interpreter and search score above 1350 on the LMSYS Chatbot Arena leaderboard?

Since Bard has ranked as high as 1215 on the LMSYS leaderboard, people are left wondering what GPT-4, with all the agent abilities currently served on the website, could achieve in terms of user preference.

Resolves N/A if it is not on the leaderboard by the end of 2025.


The tricky part about this question, though, is that I think "1350" will depend on the scores of other models; it isn't fixed what 1350 means. It might make more sense to say something like "will score 100 points more than model XXX".
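To illustrate why an absolute number like 1350 is slippery, here is a minimal sketch assuming a standard Elo-style model (the arena's actual rating procedure may differ). Under that model only rating differences determine predicted win rates, so shifting every model's rating by the same amount changes nothing meaningful. The model names and ratings below are hypothetical, chosen only for illustration.

```python
# Minimal sketch, assuming a standard Elo-style model: only rating
# differences matter, so an absolute score has no fixed meaning.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Hypothetical ratings, for illustration only.
ratings = {"model_a": 1350.0, "model_b": 1250.0}

# Shift every rating by the same offset, as if the leaderboard were re-anchored.
offset = -200.0
shifted = {name: r + offset for name, r in ratings.items()}

p_original = expected_score(ratings["model_a"], ratings["model_b"])
p_shifted = expected_score(shifted["model_a"], shifted["model_b"])

print(f"P(A beats B), original ratings: {p_original:.3f}")
print(f"P(A beats B), shifted ratings:  {p_shifted:.3f}")
```

Both lines print the same probability (about 0.64 for a 100-point gap), which is why a relative threshold such as "100 points above model XXX" is better defined than a fixed 1350.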

But even then, this might be tricky to resolve, because it's not clear that they'd add a search-enabled version built on an identical base model. So they might end up with old models competing against a new model that also has search enabled.

(Also, how the models score will depend on the mix of questions that end up being used in the arena, for instance if people start skewing their questions toward things where search helps. So maybe with today's question mix it would get 1275 Elo, but if people start specifically asking questions that exercise this functionality it would get 1400 Elo.)

I am talking about the search-enabled version.

I would assume the same base model, but with different instruction tuning to make better use of function calls?