Will AI for Diplomacy be strongly superhuman by 2024?
30% chance

Meta AI recently achieved 90th percentile Diplomacy play (no restrictions afaict): https://ai.facebook.com/blog/cicero-ai-negotiates-persuades-and-cooperates-with-people/.

Within one year, will AI be superhuman at Diplomacy, which for the purposes of this market means an Elo rating corresponding to a 90% win rate against the best human players?

Nov 22, 11:51pm: Will AI for Diplomacy be superhuman by 2024? → Will AI for Diplomacy be strongly superhuman by 2024?

StephenMalina avatar
Stephen Malina
bought Ṁ10 of NO

To me, this seems possible but unlikely given my very limited sense of how much people will continue to work on this. Maybe people are going to continue to push on it more than I realize though?

StevenK avatar
Steven
bought Ṁ100 of NO

@StephenMalina Even if researchers put their all into it, here's what I see as the basic argument why it won't happen. Diplomacy is a 7-player game where each player starts with 2 or 3 neighbors. Because conflict is mostly a matter of "more armies wins", any pair of players can defeat any one neighboring player early on if they so choose. Whether they so choose is substantially random and depends on whim and on tactical expediency that in turn depends on the way that moves unpredictably play out. If AI players are noticeably AI and not human, it also depends on how human players feel about allying with AI players. Throughout the game, it remains the case that success depends on your opponents not allying against you. Later on, there are stalemate lines, where if an opponent controls enough territory, there's nothing you can do to force a win. So it's hard to see how a 90% win rate is possible without highly reliable superhuman psychological manipulation. I would update a lot if someone who had played a significant amount of Diplomacy thought a 90% win rate was an attainable criterion. As it stands, I think people are just betting on the words "strongly superhuman" because they analogize it to Chess or Go in a way that I don't think is right.

StevenK avatar
Steven
bought Ṁ100 of NO

@StevenK Compare to whether AI will reach a 90% win rate against top human players at three player chess. No matter how good the AI is, it seems to me that sometimes its two opponents will gang up on it at key points, and to reliably avoid that, it would need a model of how the human mind responds to board positions that's deterministic enough that it can reliably steer into board positions that cause players to behave in a given way.

StevenK avatar
Steven
bought Ṁ30 of NO

a 90% win rate against the best human players

Given that it's a seven-player game, sometimes the other players happen to ally against you, and there's luck involved (in the same sense as there's luck in rock-paper-scissors, because people play mixed strategies), a 90% win rate sounds like it would almost require a mind-hacking level of persuasion, but maybe I'm missing something.

vluzko avatar

@StevenK Hmm, yeah, I didn't consider the impact of multiplayer for the criteria. I think I want something more like "90% probability of not losing" against any specific player.

StevenK avatar
Steven
bought Ṁ28 of NO

@StevenK My impression is also that the tactical part of the game isn't complex enough to support the possibility for much superhuman brilliance, though I haven't seen what top-level Diplomacy play looks like.

StevenK avatar
Steven
is predicting NO at 25%

@vluzko If half the time winning the game comes down to whether your neighbors like you, and if the AI doesn't pass the Turing test, then win rate also starts depending a lot on whether the world's top Diplomacy players prefer AI wins to human wins.

l8doku avatar
l8doku
bought Ṁ10 of YES

@StevenK I don't know much about Diplomacy specifically, but I've played some similar games, and I think a superhuman level is quite achievable. The problem is, I imagined superhuman levels as something like "Elo higher than any human player, with some margin", which is still a much lighter threshold than "90% probability of not losing".

StevenK avatar
Steven
is predicting NO at 25%

@l8doku Yes, Elo higher than all human players may well be achievable.

StevenK avatar
Steven
is predicting NO at 51%

@vluzko Are you currently intending to resolve according to a literal 90% win rate in multiplayer games or some other criterion that you're still thinking about? Multiplayer seems essential to the game and I don't see any way to measure Diplomacy skill in terms of a 90% chance of not losing against any individual player. Maybe someone who has played Diplomacy a lot could weigh in?

StevenK avatar
Steven
is predicting NO at 51%

@StevenK Note that Cicero got a 25.8% score. I think score is equivalent to win rate with a 2-way draw converted to a half win, a 3-way draw converted to a one-third win, and so on.

vluzko avatar

@StevenK I'm going to resolve with a literal 90% win rate for this market, and make a second market for 'weakly superhuman'. Do you have a suggestion for what 'about as superhuman at Diplomacy as AlphaGo was at Go' translates to? (Never mind if it's achievable/likely)

StevenK avatar
Steven
is predicting NO at 33%

@vluzko I don't have a suggestion, but I do have another difficulty, which is that Cicero's games weren't played all the way to a win/draw:

"For our experiments, games end at the end of 1908, and are scored according to the sum-of-squares scoring system, in which each player’s share of the score is proportional to the square of the number of SCs they control."

If future experiments also use this kind of blitz scoring, it means games will rarely play out all the way to someone winning by the standard rules.
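To make the quoted scoring rule concrete, here is a minimal sketch of sum-of-squares scoring. The function name and the supply-center counts are made up for illustration; only the rule itself (share proportional to the square of SCs held) comes from the quote above.

```python
def sum_of_squares_shares(sc_counts):
    """Return each player's share of the score, proportional to SCs squared."""
    squares = [n * n for n in sc_counts]
    total = sum(squares)
    return [s / total for s in squares]

# Hypothetical end-of-1908 supply-center counts for the seven powers.
# The standard board has 34 supply centers; a solo win needs 18.
counts = [12, 8, 6, 5, 3, 0, 0]
shares = sum_of_squares_shares(counts)

for n, share in zip(counts, shares):
    print(f"{n:2d} SCs -> {share:.1%} of the score")
```

In this made-up position the leader holds 12 SCs, well short of the 18 needed for a solo win, yet takes roughly half the score, so a score under this rule is not the same thing as a win rate.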

StevenK avatar
Steven
is predicting NO at 33%

@StevenK That is, even if the AI did very well, under this scoring, we wouldn't find out if it was actually going to win 90% of full games unless it did so in an unusually short amount of time.

StevenK avatar
Steven
is predicting NO at 33%

@StevenK It looks like the Diplodocus experiments for no-press Diplomacy AI used a similar scoring rule, with limited but slightly longer games and sum-of-squares scoring at the end. So a question it could make sense to ask is "If turn limit + sum-of-squares scoring is used for future full-press Diplomacy AI on a reasonably sized sample of games against top human players, will it score at least as well as Cicero (25.8%) or Diplodocus (26-27%) did in their respective games against a wider range of players?"

StevenK avatar
Steven
is predicting NO at 33%

@StevenK One could also ask about Elo directly. From the Diplodocus paper:

Elo ratings were computed using a standard generalization of BayesElo (Coulom, 2005) to multiple players (Hunter, 2004) (see Appendix I for details). This gives similar rankings as average score, but also attempts to correct for both the average strength of the opponents, since some games may have stronger or weaker opposition, as well as for which of the seven European powers a player was assigned in each game, since some starting positions in Diplomacy are advantaged over others. To regularize the model, a weak Bayesian prior was applied such that each player’s rating was normally distributed around 0 with a standard deviation of around 350 Elo.

The best-scoring Diplodocus, which scores 27% (compared to an average of 1/7), has an Elo of 181, where I think the median player has 0. I haven't looked into the details, but note:

400 points in Elo systems generally corresponds to a 10-fold increase in expected winning odds or expected average score

StevenK avatar
Steven
is predicting NO at 33%

@StevenK Maybe it's easier to score 25% against a population that scores 25% against a population that scores 25% against the general population of players than it is to score 90% against the general population of players, just because some bad luck can't be eliminated. So as another complication, maybe the assumptions behind Elo break down here.

StevenK avatar
Steven
is predicting NO at 33%

@StevenK If I'm not mistaken, getting a 90% score would require the AI to get 54 shares to 1 share for each of 6 human players, so it ends up with 54/60=0.9, so that's a log10(54) * 400 = 693 point Elo difference. 90% score at the end of a blitz game is probably even more stringent than an eventual 90% win rate, because it means the AI has to complete its wins faster, but on the other hand, people are claiming blitz games are relatively easy for AI.
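The arithmetic in this comment can be checked with a short sketch. It assumes, as the comment does, that one player faces six equally rated opponents and that expected score shares are proportional to 10^(Elo/400); the function name is hypothetical and this is not the BayesElo procedure from the Diplodocus paper.

```python
import math

def elo_gap_for_score(score, opponents=6):
    """Elo gap needed for one player to take `score` of the total,
    assuming each opponent holds 1 share and shares scale as 10**(gap/400)."""
    shares = score * opponents / (1 - score)  # e.g. 0.9 -> about 54 shares vs 1 each
    return 400 * math.log10(shares)

gap = elo_gap_for_score(0.9)
print(f"Elo gap for a 90% score: {gap:.0f}")
```

Under the same assumption, a score of exactly 1/7 corresponds to a gap of 0, which matches the idea that an average player sits at the baseline.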

StevenK avatar
Steven
is predicting NO at 33%

@StevenK To rephrase some of what I've said earlier in the thread: it seems much more likely to me that there will be a tower of 7 AIs on top of the best human, each of which scores points as if it had 100 more Elo when playing against the next lowest AI in the tower, than a single AI that scores points as if it had 700 more Elo when playing against the best human.
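As a rough illustration of the contrast, the sketch below computes the expected score share for one player who is a given number of Elo points above six equal opponents. It reuses the simplifying assumption from the 54-to-1 calculation (shares proportional to 10^(Elo/400)); the function name is hypothetical, and whether that assumption holds across such large gaps is exactly what is in doubt.

```python
def expected_share(elo_gap, opponents=6):
    """Expected score share for one player `elo_gap` Elo above
    `opponents` equally rated players, with shares ~ 10**(gap/400)."""
    strength = 10 ** (elo_gap / 400)
    return strength / (strength + opponents)

step = expected_share(100)    # one rung of the tower vs. the rung below
single = expected_share(700)  # a single AI 700 Elo above the best human

print(f"100-point gap: {step:.1%} share")
print(f"700-point gap: {single:.1%} share")
```

Each rung of the tower only has to score in the low-to-mid 20s percent against the rung below, comparable to what Cicero and Diplodocus scored against their opponents, whereas the single 700-point AI has to score about 90% in one field.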
