Resolves YES if there is a model that receives a natural language description (e.g. "Give me a video of a puppy playing with a kitten") and outputs a realistic-looking video matching the description.
It does *not* have to be *undetectable* as AI-generated, merely "realistic enough".
It must be able to consistently generate realistic videos >=30 seconds long to count.
DALL-E 2 (https://cdn.openai.com/papers/dall-e-2.pdf) counts as "realistic enough" *image* generation from natural language descriptions. (I am writing this before the model is fully available; if it turns out that all the samples are heavily cherry-picked, DALL-E 2 does not count, but a hypothetical model as good as the cherry-picked examples would.)
Duplicate of https://manifold.markets/vluzko/will-there-be-realistic-ai-generate
@MrLuke255 I am pretty sure Sora cannot consistently produce coherent 30-second videos even in chunks, but feel free to share examples
@vluzko What do you mean by “coherent”?
I don’t have a subscription, and if I bought one, I believe I would only be able to generate a single 30-second video(?). A single example wouldn’t suffice, I guess?
So, is this market conditional on someone providing proof you deem sufficient?
@vluzko additionally, you mentioned that there needs to be sufficient access to demonstrate through testing a success rate of at least ~66% at consistently generating such videos, and then that test needs to pass. There's cause for a lot of skepticism at each step, even if many published videos are sufficiently "realistic."
https://openai.com/sora Looking pretty DALL-E 2 quality 👀 to me (reasonable to wait for the dust to settle re: the possibility of cherry-picking, though)
@CalebW The examples on that page meet the quality bar and one of them is >30 seconds long. I think it is very likely that this will resolve YES, but I am going to wait to make sure they're not cherry-picked.
@vluzko thanks for the info. Can you say more about how you will determine this? E.g., at approximately what rate will the model need to take a prompt of this difficulty ("Give me a video of a puppy playing with a kitten"), specified to be over 30 seconds if that specification is possible, and produce a video of the quality of the "Tokyo street" example?
And does a YES resolution require third-party access such that you or a trusted person can test for cherry-picking?
@Jacy I'm going to go back to the informal evals done with DALL-E 2 when it was released to get a rough sense of what fraction of generated images were reasonable at different levels of prompt complexity. I'll accept a video generator if its success rate (for 30-second videos) is, say, >=66% of DALL-E 2's.
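(To make that concrete with purely made-up numbers: if DALL-E 2's informal success rate on prompts of this difficulty turned out to be, say, 75%, then a video generator would need roughly 0.66 × 75% ≈ 50% of its 30-second generations to succeed. The 75% is illustrative only; the actual baseline would come from those informal evals.)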
@vluzko thanks! I take that to mean third-party access will be required so you or someone you trust can run that test. Personally, I think the progress in text-to-video is really impressive, but I expect there to be major challenges in getting video of the quality of what's in the company's announcement showcase, similar to what we saw with Pika a few months ago.
We're getting there real quick.