Will AI for Diplomacy be mildly superhuman by 2024?
Resolved NO on Jan 1

Meta AI recently achieved 90th percentile Diplomacy play (no restrictions afaict): https://ai.facebook.com/blog/cicero-ai-negotiates-persuades-and-cooperates-with-people/.

Within one year, will AI be mildly superhuman at (full-press) Diplomacy in the sense of having a higher Elo rating than any human player? If Elo ratings are not available for some reason, I may accept an alternative such as winning a tournament against the best human players. I will not accept any alternative that does not involve some kind of direct, well-incentivized competition between the AI and the best human players.



It appears that there has been no work on this at all since Cicero.

predicted NO

I made this related market to basically express the question of "will people try to make further progress on this at all", because I think getting a decent sample of test games against humans might be more annoying than people expect.

@StevenK Exactly my thinking, whether this is something researchers will continue to compete on.

predicted NO

Chess and Go AI both seem to have taken something like 10-15 years to go from 90th percentile human to mildly superhuman. Things are different now and I don't know how much that says about Diplomacy AI timelines, but it does seem like evidence for it taking more than a year.

predicted NO

@StevenK Though my impression is Diplomacy has fewer players than Chess or Go by orders of magnitude, so maybe that means the gap between 90th percentile and top players is also not that big.

predicted NO

I wonder to what extent sampling randomness affected Cicero's results. The probability of exactly 10 wins in 40 games is 0.029 given a win rate of 1/7 (which would make Cicero just an average player) and 0.144 given a win rate of 1/4, so a likelihood ratio of about 5, which is okay but leaves some room for doubt. The score they give is 25.8%, which is a little bit higher and implies some of the games were draws, which complicates this calculation.
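For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation, assuming each game is an independent win/loss trial with a fixed win rate (which ignores draws, as noted above). The function name `binom_pmf` is just an illustrative helper.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

wins, games = 10, 40

# Assumed win rates: 1/7 for an average player in a 7-player game, 1/4 for a strong one.
p_average = binom_pmf(wins, games, 1 / 7)
p_strong = binom_pmf(wins, games, 1 / 4)

# How much more likely the observed record is under the stronger hypothesis.
likelihood_ratio = p_strong / p_average

print(f"P(10/40 | 1/7) = {p_average:.3f}")
print(f"P(10/40 | 1/4) = {p_strong:.3f}")
print(f"likelihood ratio = {likelihood_ratio:.1f}")
```

Swapping in other win rates for the second argument gives the same comparison for any other hypothesis about Cicero's strength.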

bought Ṁ30 of NO

@StevenK But note that Cicero could also be better than 90th percentile. Probability of 10 wins in 40 games would be 0.02 given an underlying Cicero win rate of 40%, which for all I know might be better than any human player; I don't know where to look for the data. And a lot of the human players that did better than 25.8% in the sample got lucky themselves. So I think instead of thinking of Cicero as 90th percentile, we should think of it as an unknown 50th-100th percentile.

predicted NO

@StevenK Though apparently scoring for Cicero didn't work the way I thought: "For our experiments, games end at the end of 1908, and are scored according to the sum-of-squares scoring system, in which each player’s share of the score is proportional to the square of the number of SCs they control."
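To make the quoted rule concrete, here is a rough sketch of sum-of-squares scoring as described in that sentence: each player's share is the square of their supply center (SC) count divided by the sum of squares over all players. The SC counts below are made up for illustration and are not from any Cicero game, and this ignores any special handling of solo wins that a real scoring system might have.

```python
def sum_of_squares_shares(sc_counts: list[int]) -> list[float]:
    """Return each player's share of the score, proportional to the square of their SC count."""
    squares = [n * n for n in sc_counts]
    total = sum(squares)
    if total == 0:
        raise ValueError("at least one player must control a supply center")
    return [s / total for s in squares]

# Hypothetical end-of-1908 board: seven powers holding 34 supply centers in total.
example_counts = [10, 8, 6, 5, 3, 2, 0]
shares = sum_of_squares_shares(example_counts)

for count, share in zip(example_counts, shares):
    print(f"{count:2d} SCs -> {share:.1%} of the score")
```

Under this rule a score like 25.8% is an average share across games rather than a win rate, so it does not map directly onto a count of wins out of 40.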
