Will there be realistic AI generated video from natural language descriptions by the start of 2025?
86 · Ṁ17k · Jan 2 · 72% chance

Resolves YES if there is a model that takes a natural language description (e.g. "Give me a video of a puppy playing with a kitten") and outputs a realistic-looking video matching the description.

It does *not* have to be *undetectable* as AI generated, merely "realistic enough".

It must be able to consistently generate realistic videos >=30 seconds long to count.

DALL-E 2 (https://cdn.openai.com/papers/dall-e-2.pdf) counts as "realistic enough" *image* generation from natural language descriptions. (I am writing this before the model is fully available; if it turns out that all the samples are heavily cherry-picked, DALL-E 2 itself does not count, but a hypothetical model as good as the cherry-picked examples would.)

Duplicate of https://manifold.markets/vluzko/will-there-be-realistic-ai-generate

bought Ṁ10 YES

Sora.

bought Ṁ50 NO

Still not released. I've thought from the start that Sora is an extremely expensive-to-operate tech demo aimed at partners in the movie industry, NOT a consumer product.

Furthermore, the example prompt of "puppy playing with kitten" is beyond Sora's demonstrated capabilities.

bought Ṁ200 YES

https://openai.com/sora Looking pretty DALL-E 2 quality to me 👀 (reasonable to wait for the dust to settle re: the possibility of cherry-picking, though)


@CalebW The examples on that page meet the quality bar, and one of them is >30 seconds long. I think it is very likely that this will resolve YES, but I am going to wait to make sure they're not cherry-picked.

@vluzko thanks for the info. Can you say more about how you will determine this? E.g., at approximately what success rate will the model need to take a prompt of the difficulty of "Give me a video of a puppy playing with a kitten" (specified to be over 30 seconds, if that specification is possible) and produce a video of the quality of the "Tokyo street" video?

And does a YES resolution require third-party access such that you or a trusted person can test cherry-picking?

@Jacy I'm going to go back to the informal evals done with DALL-E 2 when it was released to get a rough sense of what fraction of generated images were reasonable at different levels of prompt complexity. I'll accept a video generator if its success rate (for 30-second videos) is, say, >=66% of DALL-E 2's.
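As a sketch of the threshold arithmetic in that rule (the rates below are hypothetical for illustration; the comment does not give DALL-E 2's actual measured success rate):

```python
def meets_bar(video_success_rate: float, dalle2_success_rate: float,
              fraction: float = 0.66) -> bool:
    """Resolution rule sketched above: the video model qualifies if its
    success rate on 30-second generations is at least `fraction` of
    DALL-E 2's success rate at comparable prompt complexity."""
    return video_success_rate >= fraction * dalle2_success_rate

# Hypothetical example: if DALL-E 2 succeeded on 80% of prompts, the
# video model would need a success rate of at least 0.66 * 0.80 = 52.8%.
print(meets_bar(0.55, 0.80))  # passes the bar
print(meets_bar(0.50, 0.80))  # falls short
```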

@vluzko thanks! I take that to mean third-party access will be required so you or someone you trust can run that test. Personally, I think the progress in text-to-video is really impressive, but I expect there to be major challenges in getting video of the quality shown in the company's announcement showcase, similar to what we saw with Pika a few months ago.

@Jacy Third party access (but not necessarily general/public access) will be required
