Will there be realistic AI generated video from natural language descriptions by the start of 2024?
resolved Jan 9

Resolves yes if there is a model that receives a natural language description (e.g. "Give me a video of a puppy playing with a kitten") and outputs a realistic-looking video matching the description.

It does *not* have to be *undetectable* as AI generated, merely "realistic enough".

It must be able to consistently generate realistic videos >=30 seconds long to count.

DALL-E 2 (https://cdn.openai.com/papers/dall-e-2.pdf) counts as "realistic enough" *image* generation from natural language descriptions. (I am writing this before the model is fully available. If it turns out that all the samples are heavily cherry-picked, DALL-E 2 does not count, but a hypothetical model as good as the cherry-picked examples would.)

Duplicate of https://manifold.markets/vluzko/will-there-be-realistic-ai-generate

Dammit seems I missed by a few months and lost tons of mana because of that.

Should have also bought a lot of YES on the 2025 market, dunno why I didn't do that.

My predictions for this kind of stuff keep being wrong in the same direction by a few months; I should take that into account in the future, I guess.

I sold all my remaining YES because I think nothing fits the criteria, for the record. Most available models don't do coherent 30+ second videos and aren't really realistic.

Due to the size of this market I'm going to leave it unresolved for another few days. I am not going to accept or even review single videos or promotional material.

Stable Video Diffusion also doesn't produce long enough videos.

I cannot find any examples from Runway that are longer than a few seconds, so it also does not meet the length bar.

I've reviewed Pika - it does not meet the length criteria (also almost all of the examples I can find are of it editing existing video).

"Will there be realistic AI generated video from natural language descriptions by the start of 2024?"

That kinda already exists. It depends on your standards. There's one that can last an entire minute but it's not really super-coherent.

predicted YES

Does it have to be a single model or would you accept a solution which combines multiple tools, but from your perspective is a black box: receives a text prompt as input and outputs a 30 second video containing roughly what was described (a single concept or scene)?

predicted YES

What’s quite easily possible right now: https://www.instagram.com/p/C0eDxUhtO8a/

If a script were made that asks an LLM to create image descriptions based on the initial prompt, then uses them to generate scenes, which are then joined into a video of at least 30 seconds, should it count?

@MrLuke255 This would count as long as it met the other criteria. As far as I can tell this does not actually exist outside of demo land.
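
For concreteness, the script proposed in this exchange could be sketched roughly as follows. This is only an illustration of the black-box shape (text prompt in, one joined video out), not a working generator: `describe_scenes` and `render_clip` are hypothetical placeholders for an LLM call and a text-to-video model, the stubs below just produce labelled strings instead of frames, and the 3-second clip length is an assumption taken from the clip lengths mentioned elsewhere in the thread.

```python
import math
from typing import Callable, List

Frame = str  # placeholder type; a real pipeline would use image arrays


def scenes_needed(target_seconds: float, clip_seconds: float) -> int:
    """Number of fixed-length clips needed to reach the target duration."""
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    return math.ceil(target_seconds / clip_seconds)


def text_to_video(
    prompt: str,
    describe_scenes: Callable[[str, int], List[str]],  # hypothetical LLM step
    render_clip: Callable[[str], List[Frame]],         # hypothetical video model
    clip_seconds: float = 3.0,     # assumed length of one generated clip
    target_seconds: float = 30.0,  # the market's length bar
) -> List[Frame]:
    """Prompt -> scene descriptions -> short clips -> one joined video."""
    n = scenes_needed(target_seconds, clip_seconds)
    descriptions = describe_scenes(prompt, n)
    if len(descriptions) != n:
        raise ValueError("expected exactly one description per clip")
    video: List[Frame] = []
    for description in descriptions:
        video.extend(render_clip(description))  # join clips end to end
    return video


# Stub stand-ins so the sketch runs without any model (assumptions, not real APIs).
def fake_describe_scenes(prompt: str, n: int) -> List[str]:
    return [f"{prompt} (scene {i + 1} of {n})" for i in range(n)]


def fake_render_clip(description: str, frames_per_clip: int = 24) -> List[Frame]:
    return [f"{description} / frame {k}" for k in range(frames_per_clip)]


if __name__ == "__main__":
    frames = text_to_video(
        "a puppy playing with a kitten", fake_describe_scenes, fake_render_clip
    )
    print(len(frames), "frames;", frames[0])
```

Joining clips this way only addresses the length bar. Whether the result would be coherent and realistic across scene boundaries is exactly the open question in the reply above.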

I won’t lock my Mana for 10% so I won’t bet but I am 99.99% certain this will resolve as No

Pika and Runway are nowhere near satisfying the criteria.

How do you define realistic enough?

@HanchiSun there is information in the description and more information in the comment threads

@vluzko So a video generator better than DALL-E 2 would suffice?

predicted NO

if it could do clips that lasted more than 3 seconds with a few simple motions, they'd show them lol

@jacksonpolack it's a 54 second trailer

which has 15 different 3 second clips in it??

@jacksonpolack yeah I was saying they're trying to show off a variety of video styles in a short time so no long clips

@derikk @jacksonpolack and, just to be clear, we have no way of knowing how those promotional clips were made, and that sort of capabilities jump is implausible. The content users are producing with their discord seems way worse.

predicted NO

The gap between still DALLE-3 videos and those short clips is big, but not insurmountable, and I entirely believe those clips are AI. I've also seen better clips than those. It makes sense that AI video will master short clips where not much changes other than camera angle a year before it masters long scenes where people act purposefully, interact, etc.

@jacksonpolack I'd agree that predictions a year in advance are tough. It's these one-month predictions for which I don't see a tenable argument.

@cherrvak cool but it looks like they only generate a few frames?

predicted YES

@vluzko well do that a couple times in succession and you’ll get there

@vluzko i would probably accept this quality of video

predicted YES

@vluzko So, you haven’t answered yet whether first generating an image using, for example, Stable Diffusion and then using Stable Video Diffusion a couple of times would be enough to resolve YES. I assumed it would.

@MrLuke255 Are you asking if doing that will resolve the market, or could resolve the market? I would not reject that procedure (I am fine with, say, a composite model that first generates major frames and then interpolates between them). However I seriously doubt that the procedure you described would actually work.

predicted YES

@vluzko If you gave me time until the end of week I could try and assemble such a pipeline

predicted YES

If it's too much time, do what you have to do 😅
