
Meta AI recently achieved 90th percentile Diplomacy play (no restrictions afaict): https://ai.facebook.com/blog/cicero-ai-negotiates-persuades-and-cooperates-with-people/.
Within one year, will AI be mildly superhuman at (full-press) Diplomacy, in the sense of having a higher Elo rating than any human player? If Elo ratings are not available for some reason, I may accept an alternative such as winning a tournament against the best human players. I will not accept any alternative that does not involve some kind of direct, well-incentivized competition between the AI and the best human players.
@StevenK Though my impression is Diplomacy has fewer players than Chess or Go by orders of magnitude, so maybe that means the gap between 90th percentile and top players is also not that big.
I wonder to what extent sampling randomness affected Cicero's results. The probability of exactly 10 wins in 40 games is 0.029 given a win rate of 1/7 (which would make Cicero just an average player) and 0.144 given a win rate of 1/4, for a likelihood ratio of about 5, which is okay but leaves some room for doubt. The score they give is 25.8%, which is a little bit higher and implies some of the games were draws, which complicates this calculation.
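For anyone who wants to check the arithmetic, here is a minimal sketch of the binomial calculation. It assumes each game is an independent win/loss trial, which ignores draws and the actual scoring, and the names `binom_pmf`, `p_average`, and `p_strong` are just illustrative.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k wins in n independent games with win rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Figures taken from the comment: 10 wins in 40 games.
wins, games = 10, 40

# Assumption: an average player wins 1 game in 7 in a 7-player game.
p_average = binom_pmf(wins, games, 1 / 7)
# Assumption: a stronger player wins 1 game in 4.
p_strong = binom_pmf(wins, games, 1 / 4)

print(f"P(10 wins | win rate 1/7) = {p_average:.3f}")
print(f"P(10 wins | win rate 1/4) = {p_strong:.3f}")
print(f"likelihood ratio          = {p_strong / p_average:.1f}")
```

The ratio compares how well the two hypothesised win rates explain the same observation; it says nothing about how plausible either rate was beforehand.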
@StevenK But note that Cicero could also be better than 90th percentile. Probability of 10 wins in 40 games would be 0.02 given an underlying Cicero win rate of 40%, which for all I know might be better than any human player; I don't know where to look for the data. And a lot of the human players that did better than 25.8% in the sample got lucky themselves. So I think instead of thinking of Cicero as 90th percentile, we should think of it as an unknown 50th-100th percentile.
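A rough way to see how wide that range is: evaluate the same likelihood over a few candidate win rates. This is only a sketch under the independent win/loss assumption; the grid of candidate rates is an arbitrary choice for illustration, and `likelihood` is a hypothetical helper name.

```python
from math import comb


def likelihood(p: float, wins: int = 10, games: int = 40) -> float:
    """Binomial probability of exactly `wins` wins in `games` games at win rate p."""
    return comb(games, wins) * p**wins * (1 - p) ** (games - wins)


# Arbitrary illustrative grid, from "average player" (1/7) up to a 40% win rate.
candidates = [1 / 7, 0.20, 0.25, 0.30, 0.40]
best = max(likelihood(p) for p in candidates)

for p in candidates:
    value = likelihood(p)
    print(f"win rate {p:.3f}: likelihood {value:.3f}, relative to best {value / best:.2f}")
```

Both ends of the grid come out less likely than the middle, but neither is ruled out by 40 games.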
@StevenK Though apparently scoring for Cicero didn't work the way I thought: "For our experiments, games end at the end of 1908, and are scored according to the sum-of-squares scoring system, in which each player’s share of the score is proportional to the square of the number of SCs they control."
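To make the quoted scoring rule concrete, here is a small sketch of sum-of-squares shares as described in that sentence. The board below is invented for illustration and is not taken from Cicero's games; `sum_of_squares_shares` is a hypothetical name, and any special handling of solo victories is not modelled.

```python
def sum_of_squares_shares(supply_centers: dict[str, int]) -> dict[str, float]:
    """Each player's share is proportional to the square of their supply-center count."""
    squares = {power: count**2 for power, count in supply_centers.items()}
    total = sum(squares.values())
    if total == 0:
        raise ValueError("at least one player must control a supply center")
    return {power: square / total for power, square in squares.items()}


# Hypothetical end-of-1908 position (counts sum to the 34 SCs on the standard map).
board = {
    "Austria": 0,
    "England": 5,
    "France": 10,
    "Germany": 4,
    "Italy": 3,
    "Russia": 6,
    "Turkey": 6,
}

shares = sum_of_squares_shares(board)
for power, share in shares.items():
    print(f"{power:8s} {share:.1%}")
```

Under this rule a score like 25.8% is an average share of squared supply centers rather than a win rate, so it cannot be read directly as "wins out of games".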