Will general-purpose AI models beat the average score of human players in Diplomacy by 2028?
Ṁ11712027
60% chance
General-purpose language models (not trained for a specific task) have demonstrated chess-playing ability. They are also capable of deception and lie detection. Will language models or vision-language models* beat the average score of human players over a series of 40 games on webDiplomacy.net by 2028? (This question is modeled after Meta's Cicero result.)
[EDIT: Please note that while "CICERO achieved more than 2x the average score of its opponents", this question requires only an above-average score.]
*Models or agents trained on different modalities (e.g. models capable of controlling a robotic arm, like PaLM-E) would also qualify, as long as they weren't trained specifically to play Diplomacy.
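As a rough illustration of the criterion described above, the comparison could be sketched as follows. This is only a sketch under assumptions: the question does not specify the scoring system or how the human average is computed, so the function name `beats_human_average`, the per-game score lists, and the example numbers are all hypothetical.

```python
from statistics import mean

REQUIRED_GAMES = 40  # series length stated in the question


def beats_human_average(model_scores, human_scores, required_games=REQUIRED_GAMES):
    """Return True if the model's mean per-game score exceeds the humans' mean.

    model_scores: one score per game played by the model (hypothetical input).
    human_scores: per-game scores of the human players in those games
                  (hypothetical input; the scoring system is an assumption).
    """
    if len(model_scores) < required_games:
        raise ValueError(
            f"need at least {required_games} games, got {len(model_scores)}"
        )
    if not human_scores:
        raise ValueError("no human scores to compare against")
    # Only an above-average score is required, not 2x the average.
    return mean(model_scores) > mean(human_scores)


# Illustrative numbers only, not real results.
example_model = [20.0] * 40
example_humans = [13.0] * 240  # e.g. six human players per game
print(beats_human_average(example_model, example_humans))
```

A strict `>` is used here because the question asks whether the model "beats" the average; how a tie would be treated is not stated in the question.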
This question is managed and resolved by Manifold.
Related questions
Will an AI achieve >85% performance on the FrontierMath benchmark before 2028?
75% chance
Will a large language model beat a super grandmaster playing chess by 2028?
70% chance
In 2028, will an AI be able to play randomly selected computer games at human level without getting to practice?
63% chance
Will an AI score 1st place on International Math Olympiad (IMO) 2025?
42% chance
Will an AI be capable of achieving a perfect score on the Putnam exam before 2028?
74% chance
Will an AI by OpenAI beat a super grandmaster playing chess by 2028?
69% chance
Will an AI be capable of achieving a perfect score on the Putnam exam before 2026?
39% chance
Will an AI model outperform 95% of Manifold users on accuracy before 2026?
56% chance
In 2028, will an AI be able to play randomly-selected computer games at human level, given the chance to train via self-play?
79% chance
Will an AI be capable of achieving a perfect score on the Putnam exam before 2030?
73% chance