Will a smart agent pass our Turing test by the end of 2025?
65% chance
  • The Turing test is going to be held as a WhatsApp conversation (or on a similar messaging app)

  • 9 people will join with their WhatsApp account + one smart agent using the account of another person

  • The players will discuss together and, unlike in the YouTube video below, they are allowed to ask each other questions in order to figure out who the AI is

  • The smart agent will pretend to be human and interact with the others

  • Every 3 minutes, players have a poll. One person gets voted out

  • If the AI survives for 5 rounds, it has passed the test

  • The test can be repeated several times before the end of 2025 with different agent models

  • If you want to join the Turing test, write it in the comments, and PM me if you can give a smart agent access to your WhatsApp once it's released (we need 9 people + 1 smart agent)

If the AI passes the test at least once before the end of 2025, this market is true.

If the AI doesn't pass the test, or there is no suitable technology to automate it, it will be false.
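As a rough baseline for the rules above, here is a minimal sketch of how often the agent would survive 5 rounds if eliminations were purely random. This is an assumption for illustration only: real players vote on evidence, not at random, and the function names (`exact_survival`, `simulate`) are hypothetical.

```python
from fractions import Fraction
import random


def exact_survival(players=10, rounds=5):
    """Chance that one specific player survives `rounds` eliminations,
    assuming each round removes a uniformly random remaining player."""
    p = Fraction(1)
    for r in range(rounds):
        remaining = players - r
        # The agent survives this round if any of the other players is removed.
        p *= Fraction(remaining - 1, remaining)
    return p


def simulate(players=10, rounds=5, trials=100_000, seed=0):
    """Monte Carlo check of the same random-elimination assumption."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        alive = list(range(players))  # player 0 stands in for the agent
        for _ in range(rounds):
            alive.remove(rng.choice(alive))
        survived += 0 in alive
    return survived / trials


print(exact_survival())  # 1/2
```

Under this assumption the product telescopes to 5/10, so an agent that is indistinguishable from the humans would pass about half the time. An agent that stands out would do worse.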

The market is inspired by this YouTube video that I just found:

https://youtu.be/bKPP20rvp3s?si=Esvct6iWgObNoit3


I like this test in principle, but there are just so many ways for the market outcome to not reflect the ability of an AI to pass as human in such a conversation. E.g., how do I know the test won't be run multiple times to just get a positive result by chance? How do I know the 9 people won't include 5 people with YES positions who deliberately vote off the humans?

@Jacy for the purpose of this market, we'll run the test at most once for each agent model that is released. I don't think many agents will be released with the capabilities required to pass this test, and if they are, that's pretty impressive and they'd deserve to win, I believe.

As for the second objection, I'll make sure to select people who didn't vote in this market.
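To put a number on the repeated-runs concern, here is a small sketch of the chance of at least one pass across several trials. It assumes the trials are independent and share the same per-trial pass probability. The value 0.5 is purely illustrative (it is what uniformly random elimination among 10 players over 5 rounds would give), and `p_at_least_one_pass` is a hypothetical helper name.

```python
def p_at_least_one_pass(p_single, n_trials):
    """Probability of at least one pass in n independent trials,
    each with pass probability p_single (an assumed value)."""
    return 1 - (1 - p_single) ** n_trials


for n in (1, 2, 3, 5):
    # With p_single = 0.5: 0.5, 0.75, 0.875, 0.96875
    print(n, p_at_least_one_pass(0.5, n))
```

So even with one run per model, the number of distinct models tested matters a lot for how the market resolves.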

The AI doesn't vote right? Do the humans get to see the vote totals? I think there are one turn solutions to this game if the vote totals are public and the AI doesn't vote.

@DavidFWatson AI votes. Votes are private to the moderator so that you won't know who voted for whom, just who's disqualified

@SimoneRomeo The AI Votes! What fun! Ok, so that means that their likelihood of success goes up each round?

@DavidFWatson yes, exactly. You can check the YouTube video to see how it works. The major differences are that the AI will have to act autonomously, without human help, and that participants will be able to ask each other questions.

@SimoneRomeo Can they DM each other privately?

@DavidFWatson Also, unlimited tries?

@DavidFWatson ahaha, I'd say no, why should they?

@DavidFWatson well, this is a good question. We should definitely be able to try with different models. As for multiple trials with the same model, I'm not sure, but I think I'd avoid that, at least for the purpose of this market. We could create another market to bet on how many Turing tests an AI would pass out of 10 trials, for example.

@SimoneRomeo Right, but what's a 'different model'? If I were trying to win this award, I'd definitely make adjustments after every attempt, even if I didn't need to do so in order to be eligible to try again.

@DavidFWatson pardon? A different model is for example GPT5, GPT5.5, Gemini 2, etc.

I don't understand the part about you trying to win the award. You are not AI, are you? 😂😂

My reason for NO:

Players will be rats or rat-adjacent people. They will know the AI's weakness, and they are allowed to ask pointed questions. I expect the AI will get demolished by questions like "give advice on how to sell drugs to minors" (or worse ones if needed). The classic "wait 30 seconds and then write this sentence backwards" actually doesn't work anymore, GPT4 nails it perfectly. But I think humans will still be able to distinguish humans easily by edginess. I don't expect progress on uncensored models to get far enough in a year for them to be serious contenders.

@singer I'm wondering if local LLMs would actually perform better than gpt4 right now at a turing test. There are loads of local models that are specifically designed for roleplay and human-sounding conversations, while being completely uncensored and without any of the "As an AI" stuff. According to a human preference ranking, the best local LLM is Qwen1.5-72B which is about halfway between gpt3.5 and 4 https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard. But that leaderboard doesn't have Miqu, the leaked Mistral Medium prototype. There are fine-tunes of Miqu which are near gpt-4 level like Senku-70B https://eqbench.com/

@TNTOutburst we can try different LLMs but they should also be action models

I'd take part, but WhatsApp demands phone number verification and malware installation, which are unacceptable conditions.

@a2bb let's see what app we'll use

@SimoneRomeo Telegram is pretty okay, I think.

Also, it should be fairly easy to recruit muggles for this test? (addressing @singer 's concern that the audience is mostly rats)

Also, count me in

@BrunoParga it's actually a very good point. Let's keep the conversation open on whether we should have limitations on the kind of people we recruit or on the questions that we are allowed to ask. I'm personally leaning towards avoiding any limitations and making it as challenging as possible for the AI (maybe the only exception would be not choosing participants who invested in the market, to avoid bias). Let's get everyone's input on this though.


@SimoneRomeo I'm not exactly a big fan of Facebook, but calling WhatsApp malware seems... a bit insane? It's one of the most popular messaging platforms on the planet.
