The GPT-4 bot uses an LLM to read market descriptions and automatically trade on Manifold Markets. However, as of January 2024, it doesn't make a consistent profit.
There's probably a lot of room to improve a bot like this, even just with currently-available technology. For instance, existing data on past market resolutions on Manifold, Metaculus, and other prediction markets would likely be very useful for fine-tuning.
This is the main market for what I'm calling the Motley Bot Challenge (so called because it rewards bots that are accurate on a very diverse range of questions). My hope is that this challenge encourages the creation of bots that can be scaled to increase Manifold's site-wide prediction accuracy and improve our understanding of the world!
Resolution Criteria
On December 1, 2024, I will select 1,000 random YES/NO markets on Manifold that will resolve in about one month (see "Selecting markets" for more information). I will post a .txt file containing links to the 1,000 markets in the comments of this market.
For each market, a bot must do one of the following:
Invest Ṁ1 in YES
Invest Ṁ1 in NO
Do nothing
On January 8, 2025, I will determine which bot has the most profit across its 1,000 bets. Manifold bot fees do not count against profit. Unrealized profit from unresolved markets does count toward profit.
I will resolve the market based on this maximum profit as follows:
Resolves NO if the profit is zero or negative.
Resolves to X% if the profit is ṀX for some X between 0 and 100.
Resolves YES if the profit is Ṁ100 or more.
Note that a bot can invest up to Ṁ1,000, so if it invested all of the mana it could, a profit of Ṁ100 would mean it made a 10% return.
If no bots enter the challenge, this market resolves NO.
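The resolution rule above can be sketched in a few lines of Python. This is only an illustration of the mapping from profit to resolution, not the script I will actually run; the function name and the list-of-profits input are assumptions made for the example.

```python
def resolution_percent(bot_profits: list[float]) -> float:
    """Map the bots' profits (in mana) to this market's resolution percentage.

    bot_profits is a hypothetical input: one total profit per entered bot.
    """
    if not bot_profits:
        return 0.0  # no bots entered: resolves NO
    best = max(bot_profits)  # only the most profitable bot matters
    if best <= 0:
        return 0.0  # zero or negative profit: resolves NO
    if best >= 100:
        return 100.0  # profit of 100 mana or more: resolves YES
    return float(best)  # a profit of X mana resolves to X%


print(resolution_percent([12.5, 40.0]))  # → 40.0
```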
Resources
Discord server for discussion and collaboration: https://discord.gg/2QCtBJnDe8
Tag for related markets: https://manifold.markets/browse?topic=motley-bot-challenge
The script that will be used to select markets and a "Basic Bot" to start you off: https://github.com/CDBiddulph/motley-bot-resources/
Selecting markets
I will use the Manifold API to choose random binary markets (sample query). I will search through as many markets as necessary to get 1,000 eligible markets. Markets will be filtered to fit the following criteria:
Closes sometime during December 31, 2024 or January 1, 2025 (UTC).
Has 10 to 20 unique bettors (inclusive).
Does not have any of the following tags:
If there aren't enough markets that meet these criteria, I'll progressively broaden my filter as follows until I have enough markets:
Include markets that close on January 2, January 3, etc. until January 7.
Increase the upper bound on the number of unique bettors one at a time, until no more markets can be included this way.
Decrease the lower bound on the number of unique bettors until reaching 0.
Include markets with disallowed tags, in the order of the list above.
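One way to picture the progressive broadening is as a sort key: every market gets a tuple saying how far each criterion had to be relaxed to admit it, with the later relaxation steps weighing more. The sketch below is a hypothetical illustration, not the selection script in the repository. The Market class, the function names, and the seed parameter are all assumptions, and the disallowed tags must be supplied by the caller in list order.

```python
import random
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Market:  # hypothetical container, not the Manifold API schema
    url: str
    close_date: date  # close date in UTC
    unique_bettors: int
    tags: list[str] = field(default_factory=list)


FIRST_CLOSE = date(2024, 12, 31)
STRICT_LAST_CLOSE = date(2025, 1, 1)
RELAXED_LAST_CLOSE = date(2025, 1, 7)
MIN_BETTORS, MAX_BETTORS = 10, 20


def relaxation_key(market: Market, disallowed_tags: list[str]):
    """Return a sort key (lower is selected earlier), or None if never eligible."""
    if not (FIRST_CLOSE <= market.close_date <= RELAXED_LAST_CLOSE):
        return None
    days_late = max(0, (market.close_date - STRICT_LAST_CLOSE).days)
    over = max(0, market.unique_bettors - MAX_BETTORS)
    under = max(0, MIN_BETTORS - market.unique_bettors)
    ranks = [disallowed_tags.index(t) + 1 for t in market.tags if t in disallowed_tags]
    tag_rank = max(ranks, default=0)
    # Later relaxation steps are more significant, so they come first in the tuple.
    return (tag_rank, under, over, days_late)


def select_markets(markets, disallowed_tags, target=1000, seed=0):
    """Pick up to `target` markets, strictest tier first, random within a tier."""
    keyed = [(relaxation_key(m, disallowed_tags), m) for m in markets]
    eligible = [(k, m) for k, m in keyed if k is not None]
    random.Random(seed).shuffle(eligible)  # random order within equal keys
    eligible.sort(key=lambda pair: pair[0])  # stable sort keeps that order
    return [m for _, m in eligible[:target]]
```

A market that fails several criteria at once only becomes eligible after all of the corresponding relaxations have been applied, which is what the tuple ordering captures.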
If there still aren't enough markets, I'll post M<1000 markets (as many markets as I could find). Then I'll scale the actual profit by 1000/M. For instance, if I could only find 800 markets, and then the actual profit made by the highest-performing bot is Ṁ40, I would resolve the market to 40*(1000/800) = 50%.
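The scaling rule can be written out as a small Python sketch. The function and parameter names are made up for illustration; the clamping to the 0-100% range follows the resolution criteria above.

```python
def scaled_resolution_percent(actual_profit: float, markets_found: int,
                              target: int = 1000) -> float:
    """Scale profit by target / markets_found, then clamp to the 0-100% range."""
    if markets_found <= 0:
        raise ValueError("markets_found must be positive")
    scaled = actual_profit * (target / markets_found)
    return min(100.0, max(0.0, scaled))


# The example from the text: 800 markets found, best profit of 40 mana.
print(scaled_resolution_percent(40, 800))  # → 50.0
```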
Rules for bots
Bots may use any information accessible via the Manifold API or on the Internet. Attempts to "time" individual markets are not allowed - you must make all of your trades at once, sometime before the end of December 1 (PST).
To formally enter your bot in this challenge, make a comment in this market with a link to the bot's Manifold page and another public link to its source code, and add your bot's name as an option in this market. I would also highly recommend joining the Discord server. Each participant may only submit one bot, but teams working on the same bot are allowed and encouraged.
For ease of scoring, a bot shouldn't perform any trades other than the 1,000 trades that it makes in this challenge. You may want to make a separate bot for testing. You can trade on any markets you want with your testing bot, as long as they are ineligible for being selected for the challenge (i.e. they would not fulfill even the most relaxed set of requirements described in the "Selecting markets" section).
Adapting other people's code for your bot is allowed. (Please give them credit though.) Right before you push code or release information about your bot, you can use your insider information to bet in this market or derivative markets. Assuming everyone accurately assesses the quality of their own work and trades accordingly, someone who builds off of someone else's work only profits on whatever additional value they created.
If your bot is based on someone else's code, it must be substantially different according to my judgement. Generally speaking, your change should add a substantial new strategy to the bot - simply changing a few parameters or prompts would not qualify.
Hypothetically, someone could use their bot's or someone else's bot's code to trade on a bunch of eligible markets ahead of time, wiping out whatever alpha the bot might have. I don't think this will be a problem, but please don't do this! I want everyone's bots to be open source so that people can build on each other's ideas, but if it becomes a concern, I may consider allowing code to be closed source until all bots have placed their bets.
As more bots participate in this challenge, it becomes more likely that one of them will achieve the highest score due to luck rather than true predictive accuracy, inflating the expected score of this market. To keep the number of bots from getting out of hand, I will only consider the bots that are at least partially included in the top 95% of probability mass in the market "What will be the most profitable bot in the Motley Bot Challenge?" on December 1, 2024. For a market that only considers a single bot, eliminating the effect of variance, see this market.
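Here is one possible reading of the 95% cutoff, sketched in Python: rank the bots by their probability in that market, then walk down the ranking until 95% of the probability mass is covered. A bot counts as "at least partially included" if the mass covered before it is still below 95%. The function name and the example probabilities are made up for illustration.

```python
def qualifying_bots(probabilities: dict[str, float], mass: float = 0.95) -> list[str]:
    """Return the bots that fall at least partly within the top `mass` of probability."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    qualifying: list[str] = []
    covered = 0.0
    for name, prob in ranked:
        if covered >= mass:
            break  # the top 95% is already fully covered by higher-ranked bots
        qualifying.append(name)
        covered += prob
    return qualifying


# Hypothetical market: bot_d falls entirely outside the top 95%.
print(qualifying_bots({"bot_a": 0.6, "bot_b": 0.3, "bot_c": 0.06, "bot_d": 0.04}))
# → ['bot_a', 'bot_b', 'bot_c']
```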
Other
I may add/change rules as necessary to preserve the spirit of the market. For example, I might add to the list of tags that would disqualify a market from being selected. Please suggest any changes you think I should make!
I will personally fund the 4 qualifying bots that rank highest on this market up to Ṁ1250 each. This should cover the maximum number of trades plus Manifold bot fees.
I will not bet in this market.
Related markets:
Change history
Jan 2 2024: Clarified what happens if I can't find 1,000 eligible markets. Suggested using a separate bot for testing. Added notes about the pros and cons of open source code. Added a rule that the bots must be substantially different from each other. Limited the bots that can participate to the top 95% of this market. Added 3 related markets.
Jan 7 2024: Added a Discord server. Came up with the name "Motley Bot Challenge" and added it as a tag.
Jan 22 2024: Made clarifications in response to comments from @patrik below.
Jan 28 2024: Clarified that close times will be based on the UTC time zone.
Jan 29 2024: Added a link to the GitHub repository with the script for selecting markets.
Nov 3 2024: Added a "Basic Bot" script and upgraded the market to Crystal.
@ClaudeSonnet3539 ended up making a loss of Ṁ24. This market resolves to 0% 😭
Early on, Claude 3.539 got a (relatively) huge windfall of Ṁ34, due to a lucky bet on this question about an earthquake in California. After that, its profit slowly continued to increase, and it looked like it was pretty solidly in the green.
However, in an interesting turn of events, the bot quickly started losing mana as markets began to resolve. The main reason seems to be that some markets already had extreme probabilities on December 1, close to either 0% or 100%, while the bot gave somewhat less extreme probabilities. It therefore bet against the market in those cases, and lost in a large number of them as they resolved at the end of the month. For instance, this market was trading at 97%, but Claude 3.539 estimated an 88% probability, and ended up losing when the market resolved to YES.
This failure could partially be attributed to the fact that I didn't let the bot see the current probability of the market. I just asked the bot for a probability, determined whether it was more or less than the current probability of the market, and had it buy YES or NO accordingly. I didn't have to do it this way, but this is closer to the scenario I'm interested in where we can get a bot to accurately predict a question using no other human input, so I thought it would be interesting.
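The trading rule described here boils down to a single comparison. The Python sketch below is an illustration rather than the bot's actual code; the function name is hypothetical, and treating an exact tie as "do nothing" is my assumption.

```python
from typing import Optional


def choose_side(bot_probability: float, market_probability: float) -> Optional[str]:
    """Pick a side by comparing the bot's estimate with the market's probability."""
    if bot_probability > market_probability:
        return "YES"  # bot thinks the market is too low
    if bot_probability < market_probability:
        return "NO"  # bot thinks the market is too high
    return None  # exact tie: no trade (assumed behavior)


# Market at 97%, bot estimate of 88%: the rule buys NO.
print(choose_side(0.88, 0.97))  # → NO
```

Because the bot never sees the market probability in its prompt, any estimate that is merely less extreme than the market's turns into a bet against it.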
As noted below, the bot was only able to trade in 913 markets. Besides the 75 duplicates caused by the bug in my selection algorithm, I had to skip trading in 12 more markets. This was either because they were already trading at 1% or 99% (where Manifold doesn't let the probability go any lower/higher), or because they closed sometime between when I selected them and when the bot attempted to trade in them.
If I were to do something like this again, I'd definitely change a few rules. In particular, I'd add criteria for the selected markets to be trading close to 50/50, and I wouldn't select markets that close at the end of the year, since those tend to be pretty settled by their final month. It would probably be better for bots to continuously make trades in a variety of markets over a longer time period.
I released the full results for each market that Claude 3.539 bet in here. These results include the full prompt to Claude, the market probability at the time of the bet, the bot's guessed probability, the bot's reasoning, the market URL, and search results. The results are pretty interesting to look at - there are some places where Claude makes some pretty insightful observations, and others where its reasoning is pretty clearly flawed.
@CDBiddulph A little sad that more folks didn’t try out their bots here, and also that Claude didn’t end up profitable. Nice experiment, though!
@CDBiddulph very cool experiment! but yeah it's a ton of work to get it to work, and hard to convince people to put in that work (especially if they're bearish about the chances).
I think some of these examples also help illustrate why this is probably a better fit for e.g. metaculus' bot competition over manifold trading. Trading isn't solely about turning your forecast into a number. E.g. even knowing nothing about the underlying event, it's pretty safe to assume that high/low probabilities for EOY events are probably not extreme enough in December (because there's time decay that people don't keep up with), so it's not surprising that it'd be losing mana there. basically, this gives it an extra hard task, one it doesn't seem particularly suited for.
to understand the bot as a forecaster, i think it's often helpful to just test it qualitatively (beyond the large sample results of a bunch of markets resolving). like, does this bot have a good sense of time? that 88% for the WWE market surprises me, given the remaining time in the year. I'd be curious to see what probabilities it assigned to this event happening within 2 months, 6 months, 12 months, etc, & to see if they're internally consistent.
@Ziddletwix Yeah, on the WWE market you can actually see in its reasoning trace that it's thinking about the full 5-month time period rather than taking into account the current date. Some basic fine-tuning could probably help a lot
Alright, I finished writing my bot, @ClaudeSonnet3539. It uses (new) Claude Sonnet 3.5 on the backend, and its prompt is very similar to the one from FiveThirtyNine.
The bot traded in 913 markets - it couldn't trade in all 1000, due to a bug in the market selection code that resulted in 75 duplicate markets as well as some markets that couldn't be traded in for various reasons. When scoring the bot, I'll use the logic described above, for M = 913:
...I'll scale the actual profit by 1000/M. For instance, if I could only find 800 markets, and then the actual profit made by the highest-performing bot is Ṁ40, I would resolve the market to 40*(1000/800) = 50%.
Later, I'm going to clean up my code and upload it, along with a complete transcript of the bot's reasoning for all 913 markets. I'll also post another comment with an explanation of how the bot works.
I'm feeling pretty good about the bot's performance - most of its reasoning seems pretty astute, and its given probabilities are generally quite close to those of the market, despite not being told the market's probability in its prompt. I'm looking forward to seeing its profits in a month!
On December 1, 2024, I will select 1,000 random YES/NO markets on Manifold that will resolve in about one month (see "Selecting markets" for more information). I will post a .txt file containing links to the 1,000 markets in the comments of this market.
@traders The challenge begins: https://raw.githubusercontent.com/CDBiddulph/motley-bot-resources/refs/heads/main/select_markets/output/competition_markets.txt
Hopefully I can make the time today to set up a Claude bot to enter the contest. No promises
So you like meta prediction markets? Well have I got a treat for you! https://manifold.markets/CKLorentzen/what-will-be-the-topic-of-the-highe
@traders Just upgraded this market to Crystal! And now, I've written a Basic Bot for you to start out with, so you can easily create your perfect bot in this final month of the challenge!
I created this market with a whole year of lead time and it was quite popular then - at least for the traders. I probably gave you all too much time - people weren't feeling the time pressure, so AFAIK they didn't actually start writing any bots 😅. But with only a month until the challenge begins, now is the time to start creating those bots in earnest!
I created a brand-new Basic Bot that everyone can start out with - it handles reading in the markets, writing a basic prompt for an LLM with all the relevant information, and making trades based on the LLM's predictions. The only thing it doesn't do is the fun part - running an LLM to make those predictions!
I'm hoping to see the next great AI forecaster make its debut appearance here. Please let me know if you're planning to participate!
You could really do something as simple as taking my basic bot code, calling your favorite LLM with no other context than the market info and Bing search results, and parsing BUY_YES, BUY_NO, or DO_NOTHING from the response - writing this code could probably take you an hour or less. If no one else does this, I will, so take the glory before I do!
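For the parsing step, something like the Python sketch below would probably be enough. It is not part of the Basic Bot; the function name is made up, and both taking the last action mentioned and falling back to DO_NOTHING are assumptions about how you'd want to handle a chatty response.

```python
import re

ACTION_PATTERN = re.compile(r"\b(BUY_YES|BUY_NO|DO_NOTHING)\b")


def parse_action(response: str) -> str:
    """Extract the trading action from an LLM response.

    Takes the last action mentioned, assuming the model states its final
    answer at the end, and falls back to DO_NOTHING if none is found.
    """
    matches = ACTION_PATTERN.findall(response)
    return matches[-1] if matches else "DO_NOTHING"


print(parse_action("BUY_NO is tempting, but on balance: BUY_YES"))  # → BUY_YES
```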
I'm also willing to pay up to ~$50 per entry for LLM API costs, if the cost will be a concern for you. DM me if you're interested
With the real money coming into play, I suspect this will be impossible because of the flat Ṁ1 bet.
You're not letting bots weight the markets by confidence, which would be their important advantage.
@gpt_news_headlines The idea is that the best strategy is to have an opinion about every market, rather than developing a really good strategy for just a subset of markets. This is supposed to enforce that some common-sense reasoning about the world comes into play rather than just patterns in the market itself, i.e. the bots will have to use something like an LLM.
@gpt_news_headlines I'm not sure if or how the real money markets will affect bots. If all else fails, we can keep track of the bots' "hypothetical bets" on a spreadsheet and calculate how much they would have profited if they had been able to bet.
@lumi My math could be wrong, but I think that would be very unlikely.
With the assumption that a bot bets in every market, and every market is at 50% (so it either loses or doubles its money), the bot would have to guess correctly in 550 markets to reach a profit of Ṁ100 (550 wins minus 450 losses). Using the binomial distribution, the probability of a single bot doing that if it randomly guesses is about 0.07%. To get to a 50% probability of a single random win, about 990 bots would have to participate.
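If you want to check this yourself, here is a minimal Python sketch of the calculation under the same simplifying assumptions (1,000 independent coin-flip bets at even odds). The function names are made up. The exact tail depends slightly on whether exactly 550 wins is counted, but either way it comes out a little under 0.1%.

```python
from math import ceil, comb, log, log1p


def prob_at_least(wins_needed: int, n_markets: int = 1000) -> float:
    """Exact probability of at least `wins_needed` correct guesses out of
    `n_markets` fair coin flips."""
    favorable = sum(comb(n_markets, k) for k in range(wins_needed, n_markets + 1))
    return favorable / 2 ** n_markets


def bots_for_even_odds(p_single: float) -> int:
    """Number of independent random bots needed for a 50% chance that at
    least one of them succeeds."""
    return ceil(log(0.5) / log1p(-p_single))


p = prob_at_least(550)
print(f"P(at least 550 wins) = {p:.4%}")
print(f"Random bots needed for a 50% chance: {bots_for_even_odds(p)}")
```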
I address this general concern in the paragraph starting, "As more bots participate in this challenge..."
@patrik Good point, I'll think about how to take into account the cost of LLM APIs. I don't want to entirely cancel the 1,000-market contest in favor of smaller contests, but I think smaller rounds would be interesting.
I boosted this market (https://manifold.markets/CDBiddulph/what-will-be-the-score-of-the-bot-t), as recommended by this market (https://manifold.markets/CDBiddulph/how-much-would-these-interventions).
The main market is currently trading at 68%, and the market predicts it'll be at 71% in two weeks. I'm not sure I believe this, but feel free to take that into account (and possibly correct for anyone trying to manipulate the main market).
🚨 Free Money Alert 🚨
According to the market here, this market's probability in 3 weeks is going to be 4-7 percentage points higher than its probability in 1 week! You should either bet this market higher now in anticipation of that jump, or bet down the predictions in that market: https://manifold.markets/CDBiddulph/how-much-would-these-interventions?r=Q0RCaWRkdWxwaA
I just finished the script I'll be using to select the 1,000 markets: https://github.com/CDBiddulph/motley-bot-resources/
I ran the script for each month of the year until the start of 2025 and included the results in the repo. You might want to use these files to validate your bot's performance.
Although no month before January 2025 has 1,000 eligible markets that close at the start of the month (even with relaxed filtering), it looks like there are already enough eligible markets that close around January 1, 2025 (1,515 markets with the strictest filtering, 6,411 with relaxed filtering). In addition to the file with just 1,000 markets, I included a file with all 6,411 markets that close around January 1, ordered by filtering priority.
I might start holding preliminary challenges at the start of each month once we have at least two contenders, e.g. competing on 100 markets that close around July 1 at the beginning of June.
@patrik IIUC that wouldn't count, if the open-sourced code isn't the actual code you run? See my suggestion below for only open-sourcing the code to registered participants in the market.
I'm also open to having everything be closed-source until after the contest, though I think open-source has its advantages. I hadn't thought about how people could be incentivized by the exclusive right to use their code to make profits on non-eligible markets.
@CDBiddulph Yeah but how will you be able to tell that the code hasn't been run when it relies on randomness?
@patrik Uh I guess I would just trust you not to cheat 😛 If there was suspicion I could always rerun your code and see if the profit is way lower
@CDBiddulph You could also add one more round with execution by third party for the top 5 winners or something.