Seems like this resolves NO, unless @vluzko decides that the GPT + vision thing counts as being able to do this.
@VictorLevoso I am not going to resolve the 2029 market based on the GPT + vision thing
For people asking for clarification about what is needed, here is the original suggestion from the linked New Yorker article. TL;DR: can watch an arbitrary TV program or YouTube video and answer questions about its content. Examples given: war documentaries, Breaking Bad, the Simpsons, Cheers.
I'm not Gary or Vincent, but I'm going to go ahead and posit that Gary, at least, won't acknowledge this ability as achieved unless it can be done in real time, with the AI listening to the audio as a human would, without subtitles. Arbitrary videos don't have subtitles, in any case.
As I have learned from two decades of work as a cognitive scientist, the real value of the Turing Test comes from the sense of competition it sparks amongst programmers and engineers. So, in the hope of channelling that energy towards a project that might bring us closer to true machine intelligence—and in an effort to update a sixty-four-year-old test for the modern era—allow me to propose a Turing Test for the twenty-first century: build a computer program that can watch any arbitrary TV program or YouTube video and answer questions about its content—“Why did Russia invade Crimea?” or “Why did Walter White consider taking a hit out on Jessie?” Chatterbots like Goostman can hold a short conversation about TV, but only by bluffing. (When asked what “Cheers” was about, it responded, “How should I know, I haven’t watched the show.”) But no existing program—not Watson, not Goostman, not Siri—can currently come close to doing what any bright, real teenager can do: watch an episode of “The Simpsons,” and tell us when to laugh.
@firstuserhere This is impressive! It seems this market will need some clarifications, like what counts as a “movie” (minimum length, does it need to be released in theaters, does it need to work in multiple genres - e.g. maybe a nature clip is easy where sound is not important, but the average person hears “movie” and thinks of a feature film with lots of dialogue that is important to understanding the plot)
@JoshuaHedlund so my criterion for this one is basically "whatever Vincent Luczkow's criterion is unless it's straightforwardly wrong", so you'll have to ask on the original market.
@VictorLevoso Like maybe it takes longer than this (and now that @firstuserhere bought YES up to 50% I don't think I would buy more YES), especially if OpenAI is willing to take longer to publish a video GPT or whatever, but it also seems like something that could come out around December, for example, and I wouldn't be that surprised.
I mean, GPT-4 can already answer questions frame by frame.
Actually, I guess figuring out how to mix new modalities sounds more like a DeepMind-does-it-first thing.
Maybe they release a Flamingo 2 with video now.
I guess maybe we get a short-video version first and it takes one more year until something like that.
But at the same time, it seems like you should be able to cobble together something that resolves the market using a shorter-context version.
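To illustrate, here's a rough sketch of what that cobbling-together could look like: sample a few frames from the video with OpenCV and send them, along with the question, to a vision-capable chat model. The model name, frame budget, and function names here are placeholders I made up, audio is ignored entirely, and this isn't any lab's actual pipeline, just a minimal frame-by-frame approach.

```python
# Minimal sketch: sample evenly spaced frames from a video, base64-encode them,
# and ask a vision-capable chat model a question about the clip.
import base64
import cv2
from openai import OpenAI


def sample_frames(video_path: str, num_frames: int = 8) -> list[str]:
    """Grab num_frames evenly spaced frames as base64-encoded JPEGs."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(num_frames):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * total // num_frames)
        ok, frame = cap.read()
        if not ok:
            continue
        ok, buf = cv2.imencode(".jpg", frame)
        if ok:
            frames.append(base64.b64encode(buf.tobytes()).decode("utf-8"))
    cap.release()
    return frames


def ask_about_video(video_path: str, question: str) -> str:
    """Send the question plus sampled frames to a vision chat model."""
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    content = [{"type": "text", "text": question}]
    for b64 in sample_frames(video_path):
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
        })
    resp = client.chat.completions.create(
        model="gpt-4-vision-preview",  # placeholder model name
        messages=[{"role": "user", "content": content}],
        max_tokens=300,
    )
    return resp.choices[0].message.content


# e.g. ask_about_video("simpsons_clip.mp4", "When should the audience laugh, and why?")
```

Obviously this wouldn't satisfy the "listening to the audio, no subtitles" bar from the comment above, but it shows how little glue code the frame-by-frame version needs.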
@VictorLevoso Turns out video stuff was slower than I thought, I think partly because people can't buy enough GPUs and OpenAI was likely more focused on perfecting images than on video.