Will there be realistic AI generated video from natural language descriptions by the start of 2025?
86 · Ṁ17k · Jan 2 · 72% chance

Resolves YES if there is a model that takes a natural language description (e.g. "Give me a video of a puppy playing with a kitten") and outputs a realistic-looking video matching the description.

It does *not* have to be *undetectable* as AI generated, merely "realistic enough".

It must be able to consistently generate realistic videos >=30 seconds long to count.

DALL-E 2 (https://cdn.openai.com/papers/dall-e-2.pdf) counts as "realistic enough" *image* generation from natural language descriptions. (I am writing this before the model is fully available; if it turns out that all the samples are heavily cherry-picked, DALL-E 2 itself does not count, but a hypothetical model as good as the cherry-picked examples would.)

Duplicate of https://manifold.markets/vluzko/will-there-be-realistic-ai-generate

bought Ṁ10 YES

Sora.

bought Ṁ50 NO

Still not released. I've thought from the start that Sora is an extremely expensive-to-operate tech demo aimed at partners in the movie industry, NOT a consumer product.

Furthermore, the example prompt of "puppy playing with kitten" is beyond Sora's demonstrated capabilities.

bought Ṁ200 YES

https://openai.com/sora Looking pretty DALL-E 2 quality to me 👀 (reasonable to wait for the dust to settle re: the possibility of cherry-picking, though)


@CalebW The examples on that page meet the quality bar, and one of them is >30 seconds long. I think it is very likely that this will resolve YES, but I am going to wait to make sure they're not cherry-picked.

@vluzko thanks for the info. Can you say more about how you will determine this? E.g., at approximately what success rate will the model need to take a prompt of the difficulty of "Give me a video of a puppy playing with a kitten" (specified to be over 30 seconds, if that specification is possible) and produce a video of the quality of the "Tokyo street" video?

And does a YES resolution require third-party access such that you or a trusted person can test cherry-picking?

@Jacy I'm going to go back to the informal evals done with DALL-E 2 when it was released to get a rough sense of what fraction of generated images were reasonable at different levels of prompt complexity. I'll accept a video generator if its success rate (for 30-second videos) is, say, >=66% of DALL-E 2's.
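As a sketch of the threshold arithmetic in that rule (the rates below are hypothetical for illustration; the comment does not give DALL-E 2's actual measured success rate):

```python
def meets_bar(video_success_rate: float, dalle2_success_rate: float,
              fraction: float = 0.66) -> bool:
    """Resolution rule sketched above: the video model qualifies if its
    success rate on 30-second generations is at least `fraction` of
    DALL-E 2's success rate at comparable prompt complexity."""
    return video_success_rate >= fraction * dalle2_success_rate

# Hypothetical example: if DALL-E 2 succeeded on 80% of prompts, the
# video model would need a success rate of at least 0.66 * 0.80 = 52.8%.
print(meets_bar(0.55, 0.80))  # passes the bar
print(meets_bar(0.50, 0.80))  # falls short
```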

@vluzko thanks! I take that to mean third-party access will be required so you or someone you trust can run that test. Personally, I think the progress in text-to-video is really impressive, but I expect there to be major challenges in getting video of the quality shown in the company's announcement showcase, similar to what we saw with Pika a few months ago.

@Jacy Third party access (but not necessarily general/public access) will be required
