Will general-purpose AI models beat the average score of human players in Diplomacy by 2028?

General-purpose language models (not trained for a specific task) have demonstrated chess-playing ability. They are also capable of deception and lie detection. Will language models or vision-language models* beat the average score of human players over a series of 40 games on webDiplomacy.net by 2028? (Question modeled after Meta's CICERO result.)
[EDIT: Please note that while "CICERO achieved more than 2x the average score of its opponents", this question requires only an above-average score.]

*Models or agents trained on other modalities (e.g. models capable of controlling a robotic arm, like PaLM-E) would also qualify, as long as they weren't trained specifically to play Diplomacy.
