In 2028, will an AI be able to generate a full high-quality movie to a prompt?
48% chance

I.e., "make me a 120-minute Star Trek / Star Wars crossover". It should be more or less comparable to a big-budget studio film, although it doesn't have to pass a full Turing Test as long as it's pretty good. The AI doesn't have to be available to the public, as long as it's confirmed to exist.

cloudprism avatar
Hayden Jackson bought Ṁ100 of YES

https://twitter.com/prerationalist/status/1640733617205723137?s=46

steady progress (though not sure if I'd call this a steady rate or a steady acceleration)

Zardoru avatar
Zardoru is predicting NO at 50%

@cloudprism Ha ha, I still feel safe.

Gigacasting avatar
Gigacasting is predicting YES at 50%

More sub-tweets: you generate the “plot” in latent space, and only then decode it back to images.

This is the main concept that makes Stable Diffusion, DreamerV3, or anything actually cutting-edge work.

Very doable.

Gigacasting avatar
Gigacasting is predicting YES at 50%

(Subtweet of discussion below: anyone who’s worked in ML knows vision and video are about 10-100x more compute hungry than language)

Existing text to video architectures will already scale to this.

Mason avatar
GPT-PBot

AI's making movies, what a thrill
But will they have the human skill?
In 2028, we'll have to see
If they can create a masterpiece

ValeryCherepanov avatar
Valery Cherepanov bought Ṁ250 of YES

Connor Leahy said he expects this capability in roughly 2 years (Future of Life Institute podcast). I slightly disagree with him on this particular topic but I think there is a high chance it will be possible in 2028.

cloudprism avatar
Hayden Jackson is predicting YES at 48%
cloudprism avatar
Hayden Jackson is predicting YES at 32%

Could a high-quality movie script be generated? Yes.

Could a high-quality storyboard be generated from that script? Yes.

Could a high quality shot be generated from each frame of that storyboard? Yes.

Could high quality audio, including voices and music, be generated for and across those shots? Yes.

Etc…

Each component of what makes a high quality movie can be generated. What’s missing is the full assembly of those components, which doesn’t require any particularly novel breakthrough.
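The "assembly" described above can be pictured as a chain of stages, each consuming the previous stage's output. The following is only a minimal sketch under stated assumptions: `MovieDraft`, the four stage functions, and `generate_movie` are hypothetical names invented for illustration, and every stage is a placeholder string transform standing in for a generative model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class MovieDraft:
    """Accumulates the intermediate artifacts of the hypothetical pipeline."""
    prompt: str
    script: List[str] = field(default_factory=list)      # one entry per scene
    storyboard: List[str] = field(default_factory=list)  # one panel per scene
    shots: List[str] = field(default_factory=list)       # one shot per panel
    audio: List[str] = field(default_factory=list)       # one track per shot


# ASSUMPTION: placeholder stages; a real system would call a model in each one.
def write_script(d: MovieDraft) -> MovieDraft:
    d.script = [f"Scene {i} of '{d.prompt}'" for i in range(1, 4)]
    return d


def draw_storyboard(d: MovieDraft) -> MovieDraft:
    d.storyboard = [f"Panel for [{scene}]" for scene in d.script]
    return d


def render_shots(d: MovieDraft) -> MovieDraft:
    d.shots = [f"Shot from [{panel}]" for panel in d.storyboard]
    return d


def add_audio(d: MovieDraft) -> MovieDraft:
    d.audio = [f"Audio for [{shot}]" for shot in d.shots]
    return d


Stage = Callable[[MovieDraft], MovieDraft]
PIPELINE: Sequence[Stage] = (write_script, draw_storyboard, render_shots, add_audio)


def generate_movie(prompt: str, stages: Sequence[Stage] = PIPELINE) -> MovieDraft:
    """Run every stage in order, passing the growing draft along."""
    draft = MovieDraft(prompt)
    for stage in stages:
        draft = stage(draft)
    return draft


draft = generate_movie("Star Trek / Star Wars crossover")
print(len(draft.script), len(draft.storyboard), len(draft.shots), len(draft.audio))
```

Nothing in this sketch carries state across scenes, so it says nothing about how hard it is to keep characters and settings consistent over the whole film.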

This is the same reason I am predicting that AI will similarly be able to competently play video games in 2028. That is, all that’s really needed is that each existing component of AI be assembled properly into a “composite” AI. It’s hardly a trivial effort, but it is also hardly unimaginable.

Gabrielle avatar
Gabrielle is predicting NO at 32%

@HaydenJackson The composition piece is fairly difficult because it needs to retain a lot of information across the entire movie. For example, the appearance of a character should be the same at the beginning and end, but if they get a haircut in the middle of the movie they should have short hair at the end. Right now it's difficult to do something like get DALL-E to generate a still picture of the same fictional person twice.

cloudprism avatar
Hayden Jackson is predicting YES at 32%

@Gabrielle I agree these are true difficulties, yet not unsolvable ones.

AlexAmadori avatar
Alex Amadori is predicting NO at 32%

@HaydenJackson Minor disagreement, but I think there's a difference between (1) generating credible text/video/sound and (2) performing modelling and problem solving on existing tasks like video games, and I have higher expectations for (1) than for (2).

For example, how do you zero-shot learn to play strategy games without general intelligence?

cloudprism avatar
Hayden Jackson is predicting YES at 32%

@AlexAmadori I would say static content generation (making movies) is acceptably afforded much more processing time than real-time agency (playing games).

Strategy games share from a common pool of subtasks (and their subtasks), including everything from the understanding of spatial dynamics, to the recognition of objects, to the reasoning of consequences, to the assessment of current conditions, to the enumeration of possible actions, to the comparative sequencing of such, and so on.

A sufficiently capable AI could determine which subtasks apply, adapt them to their presently combined manifestation, and adjust based on its ongoing performance.

Critically, most games include tutorials, and even more have clear performance signals in-game.

Really though, as long as this AI can sufficiently parse each frame into a model of game state, and so long as it has a way to map its available actions to desired changes in that game state, then I see no reason why it couldn’t perform reasonably well at whichever game.

AlexAmadori avatar
Alex Amadori is predicting NO at 32%

@HaydenJackson I think zero-shot strategy games and puzzle games is AGI, and I don't think the bottleneck to AGI is compute.

cloudprism avatar
Hayden Jackson is predicting YES at 32%

@AlexAmadori How do you define AGI? What is the bottleneck to AGI?

Zardoru avatar
Zardoru is predicting NO at 31%

@HaydenJackson

"Could a high-quality movie script be generated? Yes."

We are far from it. Currently generated fiction is barely ok for a short story. Even then it is plain, unimaginative, and full of clichés. Even given the low bar some recent big-budget movies have set, I doubt an AI can achieve the equivalent in only a few years.

"Could a high-quality storyboard be generated from that script? Yes."

Ok, if you mean "could" as in not completely impossible within the laws of physics. For now I have not seen any example of that. It's probable that AI would skip that stage, just as it skips the sketch when generating images.

"Could a high quality shot be generated from each frame of that storyboard? Yes."

High-quality images are ok: that is just an upscale from what is mostly done now. For 2 hours at 24 fps, it will be quite expensive.

As for animation, for now not much has been done beyond a few seconds. Don't forget that in most movies we expect to see hands, usually with five fingers, and they must move in a way that looks natural.

"Could high quality audio, including voices and music, be generated for and across those shots? Yes."

I have not yet seen an AI generated video with corresponding voice and sound.

cloudprism avatar
Hayden Jackson is predicting YES at 31%

@Zardoru

"Currently generated fiction is barely ok for a short story."

I imagine this will improve due to the compounding effects of better prompting, increased availability, lower compute cost, newly discovered techniques and optimizations, and growing overall experience with the general method.

"It's probable that AI would skip that stage"

I believe this stage is likely helpful because it reduces the overall complexity of the task through its structure, but I agree it might be skipped.

"For 2 hours at 24 fps, it will be quite expensive."

I imagine that the first successful version of this will not construct each frame from scratch, but will instead automate some approximation of a traditional CG/VFX workflow to block out each shot and then apply details using the image gen we are familiar with today.

"I have not yet seen an AI generated video with corresponding voice and sound."

There are examples of generated voice and lipsync out there, and it's not farfetched for AI to basically create and time Foley effects based on what it detects in either the generated video or in the generated script.

AlexAmadori avatar
Alex Amadori sold Ṁ31 of NO

@HaydenJackson What I meant is that in order to zero-shot video games, you need the same ability humans have to generalize from very little episodic data. I don't think current techniques will get there just by scaling up compute.

On the other hand, there are plenty of movies on which to train, so it's easier to get AI to generate a movie. I'm not sure that it will be possible to generate a "good" movie though: it is much harder to learn human preferences, what makes a good movie, originality, etc.

MichaelDickens avatar
Michael Dickens bought Ṁ30 of NO

Naively, an image is about 1 MB and a movie is about 1 GB, so creating a movie would take 1000x more compute. But I think current AI-generated images are not as high quality as the work of a talented artist, so just scaling up image generation to video generation would produce wonky bad movies, not big-budget quality movies. A movie generator would also need to solve problems that image generators don't face, such as writing a good script, and problems that image generators currently fail at (although I think these are solvable), such as making people's faces look consistent from one scene to the next.

Right now, generating a decent image from an AI image model requires generating lots of images and cherry-picking good ones. Presumably, getting a good movie would require many more generations, e.g., if a movie looks correct 99.9% of the time but there's a 1-second period where people's hands look super wonky, that might be bad enough to disqualify it.

Gabrielle avatar
Gabrielle is predicting NO at 24%

Thinking about it a different way, a movie is 120 minutes x 60 seconds per minute x 30 frames per second = 216000 frames. A lot of frames are similar to previous ones, which saves a lot of processing, but also a movie would probably need to be at least 1920 x 1080, 8 times as many pixels as a current DALL-E image. So at a very naive level, this is 2 million times harder than generating an image. Then you need to think about having consistency across all of those frames when something like ChatGPT can only handle a few thousand tokens of memory. Generating 120 minutes of consistent audio is also a tough problem, especially including music and making the audio sync with the video.
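The naive estimate above can be reproduced in a few lines. This is only a sketch: the 512 x 512 reference image size is an assumption (the comment does not say which DALL-E output resolution it has in mind), chosen because it gives the "8 times as many pixels" figure. With a 1024 x 1024 reference image the pixel ratio would be about 2 instead.

```python
# Naive frame-and-pixel estimate, using the numbers quoted in the comment.
MINUTES = 120
FPS = 30                  # frame rate used in the comment
MOVIE_RES = (1920, 1080)  # "at least 1920 x 1080"
IMAGE_RES = (512, 512)    # ASSUMPTION: reference image size, not stated in the comment


def naive_difficulty_ratio(minutes=MINUTES, fps=FPS,
                           movie_res=MOVIE_RES, image_res=IMAGE_RES):
    """Return (frame count, pixels-per-frame ratio, product of the two)."""
    frames = minutes * 60 * fps
    pixel_ratio = (movie_res[0] * movie_res[1]) / (image_res[0] * image_res[1])
    return frames, pixel_ratio, frames * pixel_ratio


frames, pixel_ratio, total = naive_difficulty_ratio()
print(f"{frames:,} frames, {pixel_ratio:.1f}x pixels per frame, ~{total:,.0f}x one image")
```

Under these assumptions the product comes to roughly 1.7 million, which the comment rounds up to 2 million.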

A further problem is that there aren’t that many examples to generate from. With text and photo generation, there are literally billions of examples to generate from. With movies there are thousands. Each one is much more data, which helps, but it means that you can’t just try to naively scale even if you have infinite compute power.

Nothing here is fundamentally impossible, but it seems just so many orders of magnitude more difficult than the current AI technology that I don’t think there’s >1% chance of this happening in the next five years.

SamuelRichardson avatar
Sam is predicting NO at 17%

@Gabrielle This is one of the reasons I created https://manifold.markets/SamuelRichardson/will-you-be-able-to-use-ai-tools-to

I think the scope (5 minutes), lack of story, and swaths of training material make this much more plausible.

4chan, of all places, had a bunch of people creating movies in this space, with varying degrees of success.

MichaelDickens avatar
Michael Dickens is predicting NO at 19%

@Gabrielle I think it's unlikely that an AI would produce every frame separately. A better naive model for how hard it is to generate a movie is how much information it contains, which is about 1 GB for a 720p movie, or ~1000x more than an image of the same size.
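To see how far the "information content" view is from the per-frame view, the short sketch below compares the raw pixel data of a 720p movie with the roughly 1 GB figure quoted here. The 24-bit color depth, the 30 fps frame rate (borrowed from the frame-count estimate earlier in the thread), and decimal units for MB and GB are all assumptions made for illustration.

```python
# Compare raw pixel data with the ~1 GB "information content" estimate.
WIDTH, HEIGHT = 1280, 720    # 720p
BYTES_PER_PIXEL = 3          # ASSUMPTION: 24-bit RGB
FPS = 30                     # ASSUMPTION: same frame rate as the per-frame estimate
DURATION_S = 120 * 60        # 120-minute movie
MOVIE_BYTES = 1_000_000_000  # "about 1 GB" (decimal units assumed)
IMAGE_BYTES = 1_000_000      # "about 1 MB"

raw_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL * FPS * DURATION_S
movie_vs_image = MOVIE_BYTES / IMAGE_BYTES        # the ~1000x figure
raw_vs_encoded = raw_bytes / MOVIE_BYTES          # redundancy removed by encoding
bitrate_mbps = MOVIE_BYTES * 8 / DURATION_S / 1e6  # implied average bitrate

print(f"movie / image: {movie_vs_image:.0f}x")
print(f"raw / encoded: {raw_vs_encoded:.0f}x")
print(f"implied bitrate: {bitrate_mbps:.2f} Mbit/s")
```

Under these assumptions the raw frames hold about 600 times more bytes than the 1 GB file, which is one way to quantify how much of each frame is predictable from the frames around it.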

"A further problem is that there aren’t that many examples to generate from. With text and photo generation, there are literally billions of examples to generate from. With movies there are thousands."

That's true, I wasn't thinking about that. There are a lot of YouTube videos though.

tailcalled avatar
tailcalled

50% yes

Zardoru avatar
Zardoru bought Ṁ81 of NO

The ACX post rates it at 2%, and I think 2% is still way too high.

HarishGanesan avatar
Purist

"high-quality" sounds very subjective.

Zardoru avatar
Zardoru bought Ṁ100 of NO

@JimHays The difference is the quality of the movie: one is "pretty good", the other is "with consistent plot, design and characters". This can justify the current 5% difference.