Will AI for Diplomacy be mildly superhuman by 2024?

Meta AI recently achieved 90th percentile Diplomacy play (no restrictions afaict): https://ai.facebook.com/blog/cicero-ai-negotiates-persuades-and-cooperates-with-people/.

Within one year, will AI be mildly superhuman at (full-press) Diplomacy in the sense of having a higher Elo rating than any human player? If Elo ratings are not available for some reason, I may accept an alternative such as winning a tournament against the best human players. I will not accept any alternative that does not involve some kind of direct, well-incentivized competition between the AI and the best human players.

Steven (@StevenK) is predicting NO at 71%

I made this related market to basically express the question of "will people try to make further progress on this at all", because I think getting a decent sample of test games against humans might be more annoying than people expect.

Jon Simon (@jonsimon)

@StevenK Exactly my thinking, whether this is something researchers will continue to compete on.

Steven (@StevenK) is predicting NO at 76%

Chess and Go AI both seem to have taken something like 10-15 years to go from 90th percentile human to mildly superhuman. Things are different now and I don't know how much that says about Diplomacy AI timelines, but it does seem like evidence for it taking more than a year.

Steven (@StevenK) is predicting NO at 71%

@StevenK Though my impression is Diplomacy has fewer players than Chess or Go by orders of magnitude, so maybe that means the gap between 90th percentile and top players is also not that big.

Steven (@StevenK) is predicting NO at 82%

I wonder to what extent sampling randomness affected Cicero's results. The probability of 10 wins in 40 games is 0.029 given a win rate of 1/7 (which would make Cicero just an average player), 0.144 given a win rate of 1/4, so a likelihood ratio of 5, which is okay but leaves some room for doubt. The score they give is 25.8%, which is a little bit higher and implies some of the games were draws, which complicates this calculation.
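For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation above. It assumes each game is an independent win-or-not trial (a plain binomial model), which ignores draws and the actual scoring system; the function and variable names are just illustrative.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# 10 wins in 40 games under two hypothetical underlying win rates.
p_average = binom_pmf(10, 40, 1 / 7)  # average player in a 7-player game
p_quarter = binom_pmf(10, 40, 1 / 4)  # win rate equal to the observed 10/40

# How much more strongly the data support a 1/4 win rate than a 1/7 win rate.
likelihood_ratio = p_quarter / p_average

print(f"P(10 of 40 | 1/7) = {p_average:.3f}")
print(f"P(10 of 40 | 1/4) = {p_quarter:.3f}")
print(f"likelihood ratio  = {likelihood_ratio:.1f}")
```

Under those assumptions this reproduces roughly 0.029 and 0.144, with a ratio just under 5.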

Steven (@StevenK) bought Ṁ30 of NO

@StevenK But note that Cicero could also be better than 90th percentile. Probability of 10 wins in 40 games would be 0.02 given an underlying Cicero win rate of 40%, which for all I know might be better than any human player; I don't know where to look for the data. And a lot of the human players that did better than 25.8% in the sample got lucky themselves. So I think instead of thinking of Cicero as 90th percentile, we should think of it as an unknown 50th-100th percentile.
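To see how wide the plausible range is, one can scan a few candidate win rates and compare each likelihood with the best-fitting one. This is only a rough sketch under the same simplification (independent win-or-not games, no draws, no score shares), and the particular grid of rates is an arbitrary choice for illustration.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


WINS, GAMES = 10, 40

# The likelihood is maximized when the assumed rate equals the observed rate.
best_fit = binom_pmf(WINS, GAMES, WINS / GAMES)

# Arbitrary grid of hypothetical underlying win rates.
candidate_rates = (1 / 7, 0.20, 0.25, 0.30, 0.40)
likelihoods = {rate: binom_pmf(WINS, GAMES, rate) for rate in candidate_rates}

for rate, like in likelihoods.items():
    print(f"win rate {rate:.3f}: likelihood {like:.3f}, "
          f"relative to best fit {like / best_fit:.2f}")
```

Rates well above and well below 25% keep a non-trivial relative likelihood, which is the point about 40 games being a small sample.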

Steven (@StevenK) is predicting NO at 62%

@StevenK Though apparently scoring for Cicero didn't work the way I thought: "For our experiments, games end at the end of 1908, and are scored according to the sum-of-squares scoring system, in which each player’s share of the score is proportional to the square of the number of SCs they control."
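As I read that quote, each player's share is the square of their supply-center (SC) count divided by the sum of squares over all players. A minimal sketch of that reading follows; the board position is made up for illustration, and the function does not model any special cases the real system may have.

```python
def sum_of_squares_shares(supply_centers: dict[str, int]) -> dict[str, float]:
    """Share of the score for each power, proportional to the square of its SC count."""
    squares = {power: count * count for power, count in supply_centers.items()}
    total = sum(squares.values())
    if total == 0:
        raise ValueError("at least one power must control a supply center")
    return {power: square / total for power, square in squares.items()}


# Hypothetical end-of-1908 position (34 SCs in total); not taken from any real game.
example_board = {
    "Austria": 0,
    "England": 6,
    "France": 10,
    "Germany": 5,
    "Italy": 4,
    "Russia": 3,
    "Turkey": 6,
}

shares = sum_of_squares_shares(example_board)
for power, share in shares.items():
    print(f"{power:8s} {share:.1%}")
```

So a 25.8% score is an average share of this kind rather than a win rate, which is why the binomial calculation does not map onto it directly.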
