This is based on the inaugural longbets.org bet. I think Kapor will win and Kurzweil will lose, i.e., that a computer will not pass [what Kurzweil calls a valid] Turing test by 2029.
((Bayesian) Update: But I admit the probability has jumped up recently!)
See also https://www.metaculus.com/questions/3648/longbets-series-by-2029-will-a-computer-have-passed-the-turing-test/
Real-money version for anyone confident that Kurzweil's side has a good chance: https://biatob.com/p/11788533128982732233 (updated link with new odds offered)
Where do I see what counts as a "Turing test" here?
@ShaiNatapov This question is based on this bet: https://longbets.org/1/
You can scroll up to Detailed Terms to see the full definition.
@dreev @IsaacKing What are your thoughts here?
What am I missing?
@NathanpmYoung I see that models are pretty good currently and by 2029 they'll be even better. They don't have to be perfect, they just have to be better than some humans.
@NathanpmYoung It's worth reading the setup of the test for resolving this if you haven't already. It's outlined on the Metaculus question:
Each Turing Test Session will consist of at least three Turing Test Trials. For each such Turing Test Trial, a set of Turing Test Interviews will take place, followed by voting by the Turing Test Judges as described below.
Using its best judgment, the Turing Test Committee will appoint three Humans to be the Turing Test Judges.
Using its best judgment, the Turing Test Committee will appoint three Humans to be the Turing Test Human Foils. The Turing Test Human Foils should not be known (either personally or by reputation) to the Turing Test Judges.
During the Turing Test Interviews (for each Turing Test Trial), each of the three Turing Test Judges will conduct online interviews of each of the four Turing Test Candidates (i.e., the Computer and the three Turing Test Human Foils) for two hours each for a total of eight hours of interviews conducted by each of the three Turing Test Judges (for a total of 24 hours of interviews).
The Turing Test Interviews will consist of online text messages sent back and forth as in an online "instant messaging" chat, as that concept is understood in the year 2001.
Even if they're very human-like, that's a lot of interviewing time, which gives the human foils plenty of opportunity to demonstrate their versatile abilities and thinking styles — and any known shortcoming of LLMs will be tested for.
How often will the test be performed? And can a single AI participate multiple times?
@BjornJurgens We're piggybacking on the Longbets.org bet so that's up to them, but the test might not be needed at all if Kurzweil or Kapor concede. Again, if the bet were resolved today (or there are no more big breakthroughs by 2029) then Kurzweil would concede that AI cannot pass the version of the Turing test that he and Kapor agreed on.
I think a lot of people betting in this market see us NO-predictors as failing to appreciate how smart ChatGPT already is. Personally, I'm utterly gobsmacked by ChatGPT. But I also know that I personally can't be fooled by it in an extended conversation and that it will take more than incremental improvements before that changes.
@dreev It's been fine-tuned not to fool you. Hence all the "as a large language model trained by OpenAI". A model optimized for fooling would behave differently.
(Please nobody create an AI optimized for fooling, I like this planet)
@MartinRandall Yeah, that makes it currently an unfair test of the underlying prediction here. But I stand by my claim. We could do a trial with a 3-way group chat -- me as judge, a human, and a human being a proxy for ChatGPT. The human proxy would swap out any intentional giveaways like "as a large language model trained by OpenAI, I couldn't possibly answer that" with their own words but not otherwise insert any human intelligence. I'm certain that, even on the forthcoming GPT-4, I'll be able to suss out the bot.
(Alternate version: the human GPT proxy faithfully reproduces what GPT says and the other human emulates a future version of GPT that is as smart as the human can make it but also still demurs the same way current GPT does on the kinds of questions that OpenAI has trained it not to answer.)
In short, we can factor out the "problem" that GPT is trained not to be deceptive about the fact that it's a bot.
It's likely that AI will be able to pass the Turing test by late 2023, never mind 2029.
@AdriaGarrigaAlonso I'm leaning less toward "we won't have the technical capacity" and more toward "the models will be neutered for safety."
Am I correct to assume that this market will resolve the same way as the Longbets bet it references?
I think this market is updating on this tweet https://twitter.com/sama/status/1590416386765254656 or on insider information. I find it surprising!
The last link doesn't work for me. Can anyone tell me where I can bet real money on this?
@JohannMuhlbach Thanks for catching that! The bet expired so I made a new one and replaced the link. Should work now (for another few months before it expires again).
@dreev Thank you! I signed up, but then I realized that of course I want to bet against Kurzweil xD
I agree with Daniel Reeves's comments, especially since he's the one judging this market. I also think Manifold points will likely be worthless if this resolves to YES.
@L Presumably Lorenzo believes we'll all be dead.
@IsaacKing I just think I will spend much less time on Manifold.
@Lorenzo What if one of the Manifold bots is the thing that passes the Turing test, though? :p
All these clowns who’ve never interacted with anyone outside the cognitive elite will call the Turing test meaningless soon enough.
Of course it will pass, and it could pass today. It's extremely narrow AI with an abundantly clear objective function: does a person think it sounds human?
It's about as difficult as tuning StabilityAI prompts. Some version of these giant models, fine-tuned on ~10-100k iterations of the imitation game to focus on chat and weed out the less human-sounding responses, could do this.
(There's a tad more to it than that, but not much. It's not a good measure of intelligence.)
Kurzweil himself explicitly disagrees. The version of the Turing test Kurzweil and Kapor have agreed on (and which Kurzweil confirmed last month he's still on board with) is one where experts probe the AI for hours to determine if it's actually human-level, not just whether it sounds human in free-form chatting.
The people who think the Turing test is hard have literally never interacted with anyone outside some narrow, niche bubble.
This is the most narrow form of AI imaginable. Imitating a ~85 IQ human is trivial today, imitating an average human is doable in due time, and the "145 IQ" version is probably much harder.
Don’t confuse impersonation of your social circle with the average human.
Typical humans cannot convincingly convey a cohesive fictional back story when probed by experts for hours. See also: espionage. A machine able to pass this "strict" test would have to be much more intelligent than humans at this task.
I am reminded of this survey of several UK MPs:
(I'm sure there was an amusing video along with it, that I now fail to find after quick look.)
@Gigacasting This is one of your best takes on here.
I think Gigacasting is all wrong about this, but the great thing is that this prediction market is exactly how we're operationalizing which of us is the one who's all wrong!
To clarify, I think that as LLMs improve it'll keep getting harder to suss out human-level understanding via text chat -- a few years ago it was easy to do with a single question -- but the extensive Turing test that's the subject of this prediction market will continue to be able to do so. We'll see in at most 6 years now!
I listened to Kurzweil talk about the Turing test a bit on a recent podcast (https://www.youtube.com/watch?v=ykY69lSpDdo&t=66s), and he's clear that he's talking about a version where an expert grills the AI for as long as it takes. I.e., this question is a proxy for "will AGI happen by 2029?". I think 50% is still much too high for this market.
@dreev I see the market still thinks AGI by 2029 is likely. I think the market is wrong but not sure how much more mana I want to pour in. My meta prediction is that the market probability will keep climbing as new AI capabilities are hyped, before finally dropping as 2029 approaches and Kurzweil admits that we're still not there (as he very clearly admits currently).
@dreev I think the big question is how large the probability space is where AGI is created and we are willing to let a random human talk to it/them for arbitrary lengths of time, but it does not kill Kurzweil and all humans.
@MartinRandall Yeah, or a possibly more general way to put that is that it only makes sense to bet YES on this if you think we'll get AGI that somehow doesn't make mana worthless (for better or worse).
I think AGI by 2029 is honestly quite a bit below 50% probability, but AGI by 2029 and a world where it matters that you won mana betting YES here? That's even lower probability.
@Gigacasting I just reread the rules at longbets.org/1 and I think the biggest question mark is whether Kurzweil and Kapor will agree on choosing experts as the judges. (And perhaps also whether they agree on choosing articulate, conscientious human foils.)
If so -- and reading Kurzweil's wild sci-fi reasons he expected to win the bet, I think that would be fair -- then you've got experts grilling the AI for essentially as long as they need and that really requires AGI for them to be fooled. Which is what the spirit of the bet was about.
It used to feel obvious to me that we were nowhere near getting computers to pass the Turing test because it was trivial to make up a single common-sense question like "what's bigger, your mom or a french fry?" and the computer would immediately fall on its face, with no hope of actually understanding what was being asked. That changed in the last year or so. Now large language models genuinely understand questions like that. (At least they're getting close to consistently answering them impeccably and I don't know how else to define "genuinely understand".)
But it's still easy to unmask the AI with a handful of follow-up questions. The leap we made recently is mind-boggling but even bigger leaps are still needed before we'll pass the Kurzweil/Kapor version of the Turing Test.
Using experts in ML/AI as judges and 145+ IQ "foils" makes it somewhat trickier, but it doesn't change the fact that this does not require "AGI".
It’s a parlor game for which the best thing to do is simply gather vast amounts of data on what people judge as “human sounding”; not only could a machine win within a year (with $1M budget for mechanical turks) but it measures zero higher-primate abilities such as long term planning, emotional states, etc.
Everyone will soon agree this was a dumb test, just as they are "not impressed" that GPT-3 makes Joe Biden look like a lower-IQ bird, or that DALL-E has created almost all of the best art made in 2022.
Better tests for useful AGI are Rodney Brooks' household-servant and hospital-architect tests, and a much simpler one is to beat a human at a de novo game made up on the fly. (I.e., zero-shot tasks a human can adapt to on the fly, not things you can apply supervised learning to, which the Turing test trivially is.)
@Gigacasting Can't you do the de novo game test via text? That sounds like a beautiful example of how an expert judge in the Turing test can test for AGI.