Manifold Plays Chess 2 Retrospective

Feb 13, 2023

Manifold Plays Chess is a series of markets exploring how prediction markets can be used to make decisions. The idea is that you create a market that asks, "What's the best chess move?", and then you play the winning answer. The market should be able to aggregate all available information and converge on the best move, and so manifold should be at least as good as any other publicly available engine at chess.

You might recognize this as a Futarchy in the spirit of Robin Hanson. Our only value is winning at chess, and we're making prediction market bets on how to best achieve that. The dream is that we could extend these structures to find good decisions in domains beyond chess -- more games at first, but eventually even business and politics. Conversely, if decision markets can't even win at chess, what hope do they have in the real world?

Manifold Plays Chess 2 was my latest iteration of this experiment. The rules made it a bit different from a futarchy, in ways that I'll get into. This post is an attempt to bring together the various bits of analysis that have been accumulating through a number of offshoot markets and discord servers, so that we might learn from my failures and eventually achieve robust decision markets.

Manifold Plays Chess 1

Briefly, let's go over what happened in the original Manifold Plays Chess by @AlexLiesman. Alex himself played as black, and the collective market played white. The rules were simple: every day, Alex posts a free-response market where anyone can bet on any legal move, closing 24 hours later. When the market closes, he uses a random number generator to pick a winning answer weighted by the implied market probability. Here's an example:

Because of the way automated market makers work, it was easy to bid up any move to a meaningful probability, and hard to make a move totally dominant.
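The weighted draw described above can be sketched in a few lines of Python. This is only an illustration: the move names and closing probabilities are made up, and `pick_move` is a hypothetical helper, not the random number generator Alex actually used.

```python
import random

def pick_move(probabilities, rng=None):
    """Draw one move, weighted by its implied market probability."""
    rng = rng or random.Random()
    moves = list(probabilities)
    weights = [probabilities[m] for m in moves]
    # random.choices normalizes the weights, so they need not sum to 1.
    return rng.choices(moves, weights=weights, k=1)[0]

# Hypothetical closing probabilities for one daily market.
closing = {"e4": 0.55, "d4": 0.30, "Qh5": 0.10, "Resign": 0.05}
move = pick_move(closing, random.Random(0))
```

Under this rule a move bid up to 10% is played one day in ten on average, which is why cheaply bidding a bad move up to a "meaningful probability" matters so much.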

After a few moves, I created a market to track our odds of winning:

This ended up being the set-up for exploitation.

The game played normally for the first few moves, until user @jfjurchen started buying up lots of NO in the win market, and lots of bad moves in the daily markets. He got unlucky with the RNG, but eventually succeeded in blundering our queen, and we promptly resigned after that. By my accounting he spent ᛗ3993 manipulating the move markets, and won ᛗ1131 when he succeeded [1]. His losses were mainly due to bad RNG, and somewhat due to other users defending good moves.

This reveals an asymmetry in the structure of chess: it takes many good moves to win, and only one bad move to lose. In fact, in both rounds we've treated "Resign" as a valid move, which loses immediately.

Manifold Plays Chess 2

For the second round, we played against a lichess bot instead of a human, for the simple reason that I'm personally not very good at chess. Manifold played as black, vs a weak level 3 bot as white.

The reason for choosing a weak bot was that the bot should be very easy to beat, so long as manipulators don't figure out a way to pick bad moves.

The Setup

The round was played across two kinds of markets: one Stake market that would last the entire game, and many Policy markets -- one for each turn. Here's the stake market:

And an example policy market:

In this structure, the Stake market plays the role of the evaluation market, resolving to the final result of the game: do we win, or lose?

However, the policy markets are where this diverges from traditional futarchy. Each day, I posted a new policy market for the next round of play. Users could use a bot command to vote for the next move, but votes were weighted according to the net number of YES shares the user owned in the Stake market. The idea here was to set up incentives -- you can help in picking the next move, but only if you have a financial interest in winning.

At the end of each day, I used a script to tally the votes for each move. Then, I used @FairlyRandom to pick a random move proportionally to the votes. I'd make that move in Lichess, resolve the policy market, and start the next turn.
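As a rough sketch of the tally described above (not the actual script), the stake-weighted vote might look like this. The user names and share counts are invented, treating a net NO holder as having zero voting weight is an assumption, and a local `random.Random` stands in for @FairlyRandom.

```python
import random
from collections import defaultdict

def tally_votes(votes, yes_shares):
    """Sum each move's votes, weighting every voter by net YES shares."""
    totals = defaultdict(float)
    for user, move in votes.items():
        # Assumption: users with zero or negative net YES get no weight.
        weight = max(yes_shares.get(user, 0.0), 0.0)
        if weight > 0:
            totals[move] += weight
    return dict(totals)

def draw_move(totals, rng):
    """Pick a move at random, proportionally to its vote total."""
    moves = sorted(totals)
    return rng.choices(moves, weights=[totals[m] for m in moves], k=1)[0]

# Hypothetical votes and Stake-market positions.
votes = {"alice": "Nf6", "bob": "Nf6", "carol": "Resign", "dave": "d5"}
yes_shares = {"alice": 600, "bob": 200, "carol": 150, "dave": -50}

totals = tally_votes(votes, yes_shares)
played = draw_move(totals, random.Random(7))
```

Here "Nf6" holds 800 of the 950 weighted votes, so it is drawn most of the time, but "Resign" still has a real chance on any given day.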

To dissuade attempts to suddenly change your stake or vote in the last few minutes, I shared that I would run the vote tally "some point soon after this market closes" but not at any predetermined time. This either worked or was unnecessary -- there was very little last-minute trading, and I tallied every round except the first within 20 minutes of closing.

The actual market bets on each policy market weren't used as part of the evaluation, which means they could sometimes be disconnected from the voting probabilities. Someone probably could have turned a profit by correcting these probabilities according to the vote, but there wasn't much mana at play, and it did not end up mattering much.

The Results

By now, you've noticed that we did, in fact, lose this round. This means that manipulators won -- how? If you want to figure it out for yourself, do so now before moving on.

Market participants have access to the Stockfish chess engine, so our skill level was mostly determined by how much we wanted to make a good move, rather than by knowing what the best move would be. This means that for each move, our skill level was either very high or very low, with very little in between.

Meta-market

User @jfjurchen made the first attempt at manipulation by creating a secondary market:

The idea was that users could accumulate a NO stake in his alternative market without giving up voting power, weakening the incentive to pick good moves. He intended to buy up YES in the alt market to make it profitable for other users to do this, but he did not find a way to incentivize this strongly enough without going negative himself. Eventually @Bot found these markets and ran arbitrage.

The final volume of the meta-market was ᛗ26,700 versus ᛗ106,212 in the main market.

Alts

Alt betting was our ultimate downfall, as predicted by @Yev on discord.

The strategy is, you buy a bunch of YES on one account to accumulate a voting stake, and then a bunch of NO on an alt to hedge your position. Then you take an unhedged position in whichever direction you like. It requires a large investment: you're buying both sides of the bet, so you're paying ᛗ1 per share against an unhedged trader who pays the market price of ᛗ0.10 - ᛗ0.20. But it's risk free, so a whale can come in and arbitrarily decide the vote.
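To make the cost comparison concrete, here is a small sketch with made-up numbers. It assumes a YES share plus a NO share costs ᛗ1 in total and pays back ᛗ1 however the game ends, and it ignores fees and the price movement caused by the purchases themselves; the function names are hypothetical.

```python
def hedged_cost(voting_shares):
    """Mana locked up to hold `voting_shares` YES on one account and the
    same number of NO on an alt (assumes each YES+NO pair costs 1)."""
    return voting_shares * 1.0

def unhedged_cost(voting_shares, yes_price):
    """Mana an ordinary trader pays for the same YES shares at market price."""
    return voting_shares * yes_price

def vote_share(own_shares, other_shares):
    """Chance that a stake-weighted random draw lands on this voter's move."""
    return own_shares / (own_shares + other_shares)

# Hypothetical numbers, not taken from the game.
shares = 10_000
locked = hedged_cost(shares)            # mana tied up, recovered at resolution
at_risk = unhedged_cost(shares, 0.15)   # mana an unhedged YES buyer puts at risk
control = vote_share(shares, 3_000)     # fraction of the daily draw controlled
```

The hedged whale ties up far more mana than the unhedged trader, but gets all of it back at resolution, so the only real cost is having the capital locked for the length of the game.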

The Story

The first user to pull in a bot was @ms (and his alt, @ms_test). He bought ᛗ9000 worth of NO on his main account and ᛗ3715 of YES on his alt, giving him a voting power of 12,385 without risk. Then he bought a further YES position on his alt, presumably intending to help win the game. At this point (Day 3), the total voting pool was 2935, so any move he picked would have had about a 75% chance of winning. However, he eventually exited before making any votes.

Mikhail stepped out soon after local cetacean @jack decided to step in with an investment of about ᛗ130,000. Jack had already had a small position beforehand, but he used his stake to dictate moves 5-9 for his own secret goals.

After move 9 @jack retreated to a more mundane position, and moves 10-19 played out fairly straightforwardly. This is probably a good idea of how manifold would play chess in the ideal case, absent manipulation. Users @TenShino, @deagol, @JoshuaB, @harfe, and @prigoryan held most of the clout during this period, through either voting stake or activism. The total voting stake in this period was ~15k YES shares, about ᛗ1,800 at market value.

@deagol posted an analysis of the situation at this point. Even understanding that we'd likely eventually resign, he saw his YES shares as a "self-renewable lottery ticket on a daily draw" via the daily subsidies I'd been putting into the policy markets. He argued for a drawn-out game to keep the subsidies going for as long as possible [2]. However, we mostly kept up an aggressive strategy -- I suspect because it seemed like more fun.

At turn 20, the inevitable came to pass -- @jack started voting "Resign", with about half of the total voting stake. He offered to swap his position given a sufficient bribe, but nobody took the offer. FairlyRandom finally landed on Resign on turn 21. [3]

Analysis

So obviously this market structure is flawed. I still think there's something to take away from the voting mechanism, and it's produced a lot of ideas about how to run a more successful chess game.

I'd originally designed the voting mechanism to be modeled off stock market shares -- shareholders vote on issues to steer the company. @A noted in a comment that there's a fixed number of voting shares in traditional stocks, and that with such a limit (perhaps as implemented by certificates?) it would be more difficult for a whale to gain control.

@deagol made a good point when he looked at the value of YES shares on rounds 10-19. In a normal prediction market, the market % balances at the true estimate of the probability. But in this case, that was disrupted, since YES shares were deriving value from something other than their final value at market resolution. I feel like there's potential for fun games here, or at the very least some cyclic whalebait gambling markets.

@Jason asks, why even allow "Resign" as an option? The reason is that we're not really trying to win chess, we're trying to build a prediction-market-based mechanism that can reliably choose the best option in the face of uncertainty. "Resign" means that if we fail to do that, it's really really obvious.

Future

@ms_test has issued a challenge:

I think this is possible -- see my YES position. A number of users are proposing market structures that will play good chess.

Modified Futarchy

In a traditional futarchy as proposed by Robin Hanson, decisions are made with paired conditional markets, where one will resolve and the other will N/A. I think it's looking more and more likely that this is the only structure that doesn't fall to manipulation. However, there are some practical issues that make conditional markets hard to implement. Let's look at these.

Too Many Markets

First, it's just hard to manage the markets that would be required. The methods that have been tried so far use one daily free-response market, but creating a binary market for every possible chess move would be tedious and error-prone if not automated.

A bot could go a long way for managing the markets that would be required here. I'm working on my own at the moment, and I'd be willing to lend support to other futarchy organizers that would benefit from bot actions. Grouped Binary Markets also would help to alleviate this issue, once they're implemented.

There are also some ideas that would cut down the number of markets needed. @jack's proposal uses a "weak stockfish" to generate 5 candidate moves, and the market will pick the best.

I'm focusing on solutions that avoid using an engine, and so I proposed instead using a bot command to create policy markets only when a user requests a given move.

@harfe, who intends to run Round 3, has another suggestion: use a free-response market to gather moves, and then pass only the top two into conditional markets. The free-response market still has a strong profit incentive, because only the move that wins its conditional market pays out in the free response. I think this is my favorite solution so far, except that I would pass more than two markets into the conditional round.
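A minimal sketch of that two-stage selection, under my own assumptions: the free-response stage is summarized by closing probabilities, each finalist gets a conditional "will we win if we play this?" market, and the finalist with the highest conditional win probability is played. All names and numbers are hypothetical, and @harfe's actual rules may differ.

```python
def top_candidates(free_response, n=2):
    """Return the n moves with the highest free-response probability."""
    ranked = sorted(free_response, key=free_response.get, reverse=True)
    return ranked[:n]

def choose_move(candidates, conditional_win_prob):
    """Pick the candidate whose conditional market gives the best win chance."""
    return max(candidates, key=lambda m: conditional_win_prob[m])

# Hypothetical closing probabilities from the free-response stage.
free_response = {"Nf6": 0.45, "d5": 0.35, "h5": 0.15, "Resign": 0.05}
finalists = top_candidates(free_response)

# Hypothetical closing prices of the finalists' conditional markets.
conditional = {"Nf6": 0.62, "d5": 0.66}
chosen = choose_move(finalists, conditional)
```

Passing more than two moves into the conditional round only means raising `n`, at the cost of one more conditional market per extra finalist.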

Policy Backlog

Once the move for a policy market has been made, it turns into a clone of the "Will we win?" evaluation market. If you keep policy markets open, then they all need to be arbitraged, at the expense of anyone placing new bets. If the policy markets close, then everyone who participated in each winning market has their mana locked up until the end of the game. Multiplied by the number of turns, it could add up very quickly.

On top of that, the cumulative nature of policy markets means you need honest users to stay on top of blocking out the blunders -- otherwise, as described by @jfjurchen, a motivated manipulator could swoop in to take advantage of the illiquidity.

The general solution to the backlog is to find a way to resolve them early. This is dangerous: the resolution of your policy markets needs to be highly correlated with winning the game, or else a manipulator might find a way to pick a policy market that does well on your metric but nevertheless results in a loss.

@jack again suggests using the evaluation of a "weak stockfish" after 5 moves. This means the market is trying to simulate an engine that has the ability to "look into the future" 5 turns ahead, which is hopefully enough of a superpower to make weak stockfish strong. "Weak stockfish" might even be implemented by some simple heuristics, such as material count.
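For a sense of how simple such a heuristic could be, here is a sketch of a material count computed from a FEN string. The conventional piece values (pawn 1, knight 3, bishop 3, rook 5, queen 9) are my choice for illustration; nothing here comes from @jack's proposal itself.

```python
# Conventional piece values; the king is not counted as material.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

def material_balance(fen):
    """White material minus black material, from the board field of a FEN."""
    board = fen.split()[0]  # first field is the piece placement
    balance = 0
    for ch in board:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # digits (empty squares) and '/' (rank separators)
        # Uppercase letters are white pieces, lowercase are black.
        balance += value if ch.isupper() else -value
    return balance

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
no_black_queen = "rnb1kbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

even = material_balance(start)
queen_up = material_balance(no_black_queen)
```

A policy market resolved against this number five moves later would reward winning material, which is exactly the kind of metric a manipulator could satisfy while still steering toward a lost position.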

@harfe and I again focus on solutions that don't involve engines. @harfe's proposal is to use the win probability of the evaluation market on the next turn, averaged over an hour to mitigate manipulation. I've proposed something similar, adding exponential decay over a longer time period for reward attribution. I think these solutions are less vulnerable to goodhart problems than a non-market heuristic, but @jack is worried that using the evaluation market would be more vulnerable to manipulation.
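To illustrate why averaging helps, here is a sketch of a time-weighted average over a resolution window. It assumes the evaluation market's history is available as a sorted list of (timestamp in seconds, probability) pairs where each probability holds until the next trade; that data format and the function name are my assumptions, not part of @harfe's proposal, and the exponential-decay variant is not shown.

```python
def time_weighted_average(history, start, end):
    """Average a step-function probability over the window [start, end]."""
    if end <= start:
        raise ValueError("end must be after start")
    if not history or history[0][0] > start:
        raise ValueError("history must begin at or before start")
    total = 0.0
    prev_t, prev_p = start, history[0][1]
    for t, p in history:
        if t <= start:
            prev_p = p  # latest price known when the window opens
            continue
        if t >= end:
            break
        total += prev_p * (t - prev_t)  # weight each price by how long it held
        prev_t, prev_p = t, p
    total += prev_p * (end - prev_t)
    return total / (end - start)

# Hypothetical hour: the market sits at 40%, spikes to 90% for one minute.
history = [(0, 0.40), (1800, 0.90), (1860, 0.40)]
average = time_weighted_average(history, 0, 3600)
```

In this made-up hour the one-minute spike moves the resolution value by less than one percentage point, so a manipulator would have to hold the price away from the honest estimate for most of the window.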

Proposed structures

@jack's full proposal is here:

@harfe's is here.

I'll put mine up once I've got a bit more of the python filled in.

Definitely offer up your own suggestions! This is an experimental space, and there's loads of inefficiency to exploit.

Other Games

We're running chess because it's a simple and well-understood test case, but there's no reason we should be limited to just chess. In fact, manifold is already running a futarchy (and a dictatorship, and a democracy) in the model U.N. game run by @a .

Once we beat chess, I'd like to try running Onitama. It's another two-player, turn-based, perfect-information game, which means it's mechanically a lot like chess. The difference is there's no easily available online engine and it's likely unfamiliar to many players, so I think there will be more room for honest playmaking by market analysts. Another alternative might be Mind MGMT.

In the long run, I'd love to play diplomacy on manifold, but I have no clue how to make that kind of hidden information game work. Do let me know if you have any ideas.

If we can beat board game diplomacy, then real world futarchy diplomacy shouldn't be too hard, right?

I've compiled this post as a place to collect and organize our scattered comments and analysis for manifold chess.

Please let me know if I've missed anything or if there's new information; I'll try to keep adding as we accumulate understanding.

Definitely let me know if I've misrepresented your position, and I'll do what I can to fix it.

[1] I use ᛗ because it's a single unicode codepoint that doesn't require a modified font, and because the name is cool ("Mannaz").

[2] In response, I renewed a warning that I'd given earlier that I wouldn't continue to run an intentionally delayed game. Mostly, this was because the daily operations were not totally automated, and because the subsidies have been showing up as a loss on my profile. If I run a future version, it will be totally automated and on a bot account, which should solve those issues. In that case I'd be happy to let the game go on without limit.

[3] As a consolation prize, several of us picked up a bonus when @jack forgot to cancel a limit order in @jfjurchen's alt market.

Jack

The key point in my proposals isn't so much the use of an engine, that's in my mind just an example of a possible implementation method that is easy to work with. I think the key point in my mind is that you need a way to generate candidate moves and evaluate them that is resistant to manipulation. Anything based on markets is easy to manipulate, especially when the liquidity is low. Things that are more resistant to manipulation include algorithms that take the board state and spit out some result (aka chess engines), analysis by a hopefully-unbiased committee, polls, etc.

J. F. Jurchen

"But it's risk free, so a whale can come in and arbitrarily decide the vote." - I think it's a little more complicated than that. The version 2 design doesn't have the property we see in other whale-watching markets where the single participant with the largest bankroll can fully determine the outcome, and everybody else has wasted their money. Here, someone with 50% as much mana as the biggest whale has 50% as much influence on the outcome, not 0%.

I think the more important dynamic is that the "good moves" (YES) crowd needs to win 20+ votes in a row, while the "resign" (NO) crowd needs to win only one. So imagine there are ten whales with similar bankrolls. Each takes a directional YES/NO position and then additionally accumulates 5000 votable shares of YES and 5000 shadow shares of NO for Ṁ5000. If nine of them are team YES and only one is team NO, there's a 10%[1] chance of resigning each move. If winning takes 20 moves, by my math there's an 88% chance that we resign before we can win.

[1] - Actually a little lower, because the YES voters will have more than 5000 YES shares due to their directional position. The NO voter faces a tradeoff between "more voting power" and "larger directional position," whereas for YES voters they go together.
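The 88% figure above can be checked with a short calculation, assuming each move's draw is independent and the resign chance stays at 10% throughout (the footnote suggests the true per-move chance is a little lower).

```python
def resign_probability(p_resign_per_move, moves):
    """Chance that at least one of `moves` independent draws lands on Resign."""
    return 1 - (1 - p_resign_per_move) ** moves

# Ten equal whales, one voting Resign, over a 20-move game.
p = resign_probability(0.10, 20)
```

Since 0.9 raised to the 20th power is about 0.12, `p` comes out at roughly 0.88, matching the estimate in the comment.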

J. F. Jurchen

Thanks for writing this up! One minor thing - my overall loss of mana in the first game wasn't just because of bad RNG, it was also because I was timing my bets poorly. I still couldn't explain the math here, but it seems that in Manifold DPM markets you can change the market percentages more capital-efficiently[1] by betting last. So I'd spend Ṁ500 pushing "blunder the queen" to 50%, and then someone else would come in and spend Ṁ250 pushing "a good move" to 75% and "blunder the queen" down to 10% (numbers very made up). I finally won when I bit the bullet and stayed up until 4am, but it took me a few rounds of losses before I realized that was necessary.

[1] - In a normal market this incentivizes betting early. Under most circumstances you don't want the price of an outcome to go up a lot when you bet on it! But in the chess markets you did.