In early 2028, will an AI be able to generate a full high-quality movie to a prompt?
2028 · 36% chance

EG "make me a 120 minute Star Trek / Star Wars crossover". It should be more or less comparable to a big-budget studio film, although it doesn't have to pass a full Turing Test as long as it's pretty good. The AI doesn't have to be available to the public, as long as it's confirmed to exist.


Seeing the progress on three fronts in just the past 3 months really moved me from a soft yes to a strong yes on this one:
1. GPT-4.5 has improved in writing quality to such a massive degree that I find some paragraphs it generates actually compelling and interesting. (I publicly shamed 4.5 when it came out for its poor benchmarks and high cost, but it seriously is good for writing.)
2. Runway Gen-4, Veo, and the Chinese video models have shown there are still a ton of gains to be made, and past problems like lip-sync, consistent characters, and camera motion are diminishing. There was also a recent jump forward in this domain, improving story and character consistency for much longer cohesive shots: https://test-time-training.github.io/video-dit/assets/ttt_cvpr_2025.pdf
3. Agentic systems like Deep Research are showing that more advanced orchestration of models can achieve much larger and more impressive outputs than individual models with no tool assistance.

One can imagine a system that reinterprets the simple prompt and generates a few high-level storylines, then turns the one that scores highly on some heuristic into a screenplay, refines it through a few passes, and then generates the shots and audio accordingly and strings them together.
I'm not saying this will be great; it will almost certainly be slop. But the pipeline can be assembled today; the parts just need to get better. The standard here is "more or less comparable to a big-budget studio film", and that is a surprisingly low bar if you look at some of the slop films on Netflix, especially for kids. Those have big budgets and are made by studios.
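A minimal sketch of the kind of orchestration imagined above, with every model call stubbed out (none of these functions is a real API; they stand in for whatever LLM and video-model endpoints you'd actually wire up):

```python
# Minimal sketch of the orchestration imagined above. Every model call is a
# stub; swap in real LLM / video-model APIs. Nothing here is a real endpoint.
import random

def generate_text(prompt: str) -> str:
    return f"[LLM output for: {prompt[:40]}]"   # stub LLM call

def score_storyline(storyline: str) -> float:
    return random.random()                       # stub heuristic (coherence, pacing, ...)

def generate_shot(description: str) -> str:
    return f"<clip: {description}>"              # stub video-model call

def make_movie(prompt: str, n_storylines: int = 5, refine_passes: int = 3) -> list[str]:
    # 1. Reinterpret the short prompt into a few high-level storylines.
    storylines = [generate_text(f"Story outline #{i} for: {prompt}") for i in range(n_storylines)]
    # 2. Promote the highest-scoring one to a screenplay.
    screenplay = generate_text(f"Screenplay from: {max(storylines, key=score_storyline)}")
    # 3. Refine the screenplay over a few passes.
    for _ in range(refine_passes):
        screenplay = generate_text(f"Tighten this screenplay: {screenplay}")
    # 4. Generate the shots (and, in a real system, audio) and string them together.
    shot_list = generate_text(f"Shot list for: {screenplay}").split(";")
    return [generate_shot(shot) for shot in shot_list]
```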

If we don't see reasonably strong iterations on all of the fronts mentioned above over 3 years (it's been 2.5 years since ChatGPT dropped) I will be very surprised. I'm not even banking on GPT-3 -> GPT-4 type jumps, just 4-5 rounds of refinement to the biggest issues in these component models and some effort to tune the overarching system that stitches them together.

@FraserPaine
> 1. GPT-4.5 has improved in writing quality to such a massive degree that I find some paragraphs it generates actually compelling and interesting. (I publicly shamed 4.5 when it came out for its poor benchmarks and high cost, but it seriously is good for writing.)

Yeah, I have been using Gemini and the script-writing portion is pretty much solved at this point.

> 2. Runway Gen-4, Veo, and the Chinese video models have shown there are still a ton of gains to be made, and past problems like lip-sync, consistent characters, and camera motion are diminishing. There was also a recent jump forward in this domain, improving story and character consistency for much longer cohesive shots: https://test-time-training.github.io/video-dit/assets/ttt_cvpr_2025.pdf

I agree that there are gains to be had. But progress has been much slower than I expected. As a reminder, Sora was announced over a year ago and it remains SOTA.

> 3. Agentic systems like Deep Research are showing that more advanced orchestration of models can achieve much larger and more impressive outputs than individual models with no tool assistance.

Just for fun, I tried using a browser agent to generate a movie.

Attempt 1: told it to use Sora/Clipchamp. It failed to log in and immediately quit.
Attempt 2: told it to use my custom movie-making app. It opened the app, changed the title to "star trek vs star wars", clicked "create movie", and then exited.
Attempt 3: wrote detailed step-by-step instructions on how to use the movie editor. It opened the movie editor, wasn't able to interact with the timeline object, and went into an infinite loop.

I'm sure that with a sufficient amount of UX finagling and prompt engineering I could get it to create something, but I sincerely doubt it would outperform my custom Python script for making movies. This has been my experience with agents in general: if the base model isn't already capable of doing something, you don't get any extra oomph from using an agent framework.

@LoganZoellner Sorry, to clarify: I didn't mean that any of the agent frameworks today unlock this capability; Deep Research is almost certainly all done in code, not some drag-and-drop low-code orchestrator. I agree with everything else you've said, except that Sora is SotA. I think Veo is much better than even the original unreleased Sora for standard shots, before they lobotomised it, and Runway Gen-4, which just came out, is less smooth sometimes but gets some phenomenal Midjourney-style artistic shots nearly perfect.

People are saying that the performance on shots over a minute isn't good enough, but you should look up "average shot duration in films": it's been falling since the 60s and is now around 5 seconds on average, with almost no unbroken shots running longer than 45 seconds.

@FraserPaine I don't think anyone is saying "Yeah, AI video is great for 5 second shots but it can't do a 60 second unbroken take". The problem is that AI can't make a 60 second *scene* without humans putting in a large amount of scaffolding.
The Tom and Jerry videos required almost 800k words of human annotation, specifically commissioned for that project, to fine-tune the model. The Coke ad required a whole team of editors behind the scenes. Jason Zada said his "Fade Out" film took "a few days, a few hours here and there", and even with that time investment and access to Veo 2, it still has crap continuity.

@FraserPaine

> I think Veo is much better than even the original unreleased Sora

I agree that Veo is better than Sora, but it is not 10x better. You cannot, for example, give Veo a 10k-word prompt and get out a coherent 10-minute video (without significant human editing).

> People are saying that the performance on shots over a minute isn't good enough, but you should look up "average shot duration in films": it's been falling since the 60s and is now around 5 seconds on average, with almost no unbroken shots running longer than 45 seconds.

The problem is not literally "the video model cannot create a 10m unbroken shot". In fact, a human can create a 10m unbroken shot with any of the SOTA models (Veo, Wan, Kling) by extending shorter clips.

The problem is that AI agents are all terrible at long-term planning, and the rate at which they are becoming less-terrible is not fast enough (absent some breakthrough) for a 2028 resolution.

Deep Research is the exception that proves the rule. By breaking research down into a map-reduce framework, each of the individual "threads" of the agent only has to stay on task for a short period of time.

Unfortunately, there does not appear to be a way to break down a movie into such a map-reduce framework because movies have causality. If there is a spinning top in scene 1 of Inception, it needs to be the same top in the final scene.

Now, it's possible that there is a clever way to break a full movie into a set of tasks, each of which requires only a minute or two of "attention". Maybe you can write a script overview, then generate a set of characters, props, and settings, then combine these into a sequence of clips. I and others have been trying to do this and will keep trying. But the models simply aren't ready yet. We need something like 4o imagegen but for video. No one is building such a model, however, because inference costs would be astronomical.
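For concreteness, here is a hedged sketch of that decomposition: generate a canonical "bible" of characters, props, and settings once, then condition every clip-generation worker on it, map-reduce style. All calls below are stubs, and the catch remains exactly the one described above, that real causality is richer than a static asset list:

```python
# Sketch of a map-reduce movie pipeline (all generation calls are stubs).
# The shared "bible" is the reduce-once step: every parallel worker reads the
# same character/prop specs, so the spinning top in scene 1 is, on paper at
# least, the same top in the final scene.
from concurrent.futures import ThreadPoolExecutor

def build_bible(prompt: str) -> dict:
    # Stub: one LLM pass producing canonical character/prop/setting specs.
    return {"characters": ["Kirk", "Luke"], "props": ["spinning top"]}

def render_clip(scene: str, bible: dict) -> str:
    # Stub: a video-model call conditioned on the scene AND the shared bible,
    # so each worker only needs a minute or two of "attention".
    return f"<clip: {scene}, props held fixed: {bible['props']}>"

def make_movie(prompt: str, scenes: list[str]) -> list[str]:
    bible = build_bible(prompt)
    with ThreadPoolExecutor() as pool:           # the "map" step
        return list(pool.map(lambda s: render_clip(s, bible), scenes))

print(make_movie("Star Trek / Star Wars crossover", ["opening", "finale"]))
```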

So, basically, we won't have the tools we need until prices come down (by at least 2 orders of magnitude). But by that time, the bitter lesson suggests we probably won't need any of this scaffolding. In ~5 years (assuming the current rate of improvement continues) we will just be able to prompt GPT-9 "make a movie" and it will just work.

Now, if something is possible in 5 years, then it will be almost possible in 4 years, and probably someone will come up with some clever way to do it (scaffolding, dedicated movie model). But we need at least one "breakthrough" for this to happen before 2028.

@LoganZoellner I think we're on the same page about all of this; I just think the scaffolding is more achievable than is being assumed, and your point about generating characters, props, and settings is exactly right. You can use GPT-4o to generate the first frame of each shot for improved consistency of characters, objects, etc.
My main argument here is not that there will be a breakthrough to Avatar 2 level filmmaking from a single 10-word prompt; it's that the threshold being set, "big budget studio film", is actually quite low. If it's an animated film then photo-realism is removed, lip-sync is easier, and story quality drops down to what a kid would accept on Netflix.
I suspect that you could, today, have a 4.5-type model generate a structured shot list, iterate through those shots, retrieve the relevant objects and characters and prompt the first frame from 4o, then pass those frames and the shot description to a SotA video model for ~3 variations. You would likely get a usable sequence of shots for a ~5 minute scene with a person picking the best of 3 shots; sound would still be a challenge.
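A sketch of that per-shot loop, under the same caveats (`first_frame` and `animate` are hypothetical stand-ins for a 4o-style image model and a SotA video model; the one human step is picking the best of ~3 variations):

```python
# Hedged sketch of the per-shot loop described above; all model calls stubbed.

def first_frame(shot: str, assets: dict) -> str:
    return f"<frame: {shot} with {assets}>"       # stub: anchors character/prop look

def animate(frame: str, shot: str) -> str:
    return f"<clip from {frame}>"                 # stub image-to-video call

def human_pick(variations: list[str]) -> str:
    return variations[0]                          # stub for the human choosing the best of 3

def render_scene(shot_list: list[str], assets: dict, n: int = 3) -> list[str]:
    clips = []
    for shot in shot_list:
        frame = first_frame(shot, assets)         # consistent first frame per shot
        clips.append(human_pick([animate(frame, shot) for _ in range(n)]))
    return clips                                  # sound is still a separate problem
```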
My argument is that, without a breakthrough, the shot quality will go up, the planning will improve, the audio will get more tightly integrated, and the orchestration will get more advanced as people test it.
I think the films produced will be slop, and will have obvious faults, but again, the criteria of this market include "it doesn't have to pass a full Turing Test as long as it's pretty good", which is extremely subjective. I can very easily picture this approach being better than the garbage on Netflix for Kids right now, and that would tick this box under many interpretations.

If the maker of this market wants to update and clarify that the bar is much higher than I'm assuming, then I'll change my bet. But 3 years is a long time; it's been 1 year since Sora. Also, there's an unfortunately large incentive for someone to solve slop generation for making YouTube Kids content.

@FraserPaine
> My main argument here is not that there will be a breakthrough to Avatar 2 level filmmaking from a single 10-word prompt; it's that the threshold being set, "big budget studio film", is actually quite low. If it's an animated film then photo-realism is removed, lip-sync is easier, and story quality drops down to what a kid would accept on Netflix.

When Scott Alexander introduced this prediction on his blog, he specifically said that mediocre cartoons were not what he had in mind.

He also specified that the AI should be able to make a motion picture "to your specifications", such as a Star-Trek/Star-Wars crossover. That would preclude a model that could not make photorealistic films like Star Trek and Star Wars.

@GG "big budget studio film" is a much higher bar than you think. Think about the big budget animated films, like Pixar and Disney blockbusters. They're not "photorealistic" but they still have stunningly high visual quality. I don't think that making a movie that is of the level of quality of a Pixar or Studio Ghibli movie (even given the recent Ghiblification trend , lol) is a much lower bar than making a normal live action movie.

@FraserPaine

> I suspect that you could, today, have a 4.5-type model generate a structured shot list, iterate through those shots, retrieve the relevant objects and characters and prompt the first frame from 4o, then pass those frames and the shot description to a SotA video model for ~3 variations. You would likely get a usable sequence of shots for a ~5 minute scene with a person picking the best of 3 shots; sound would still be a challenge.

This is almost an exact description of the process I used to produce this video.

The caveats being: I used Gemini 2.5 Pro instead of GPT-4.5 (but I don't think that was the problem, since Gemini is a stronger model) and I used Gemini 2.0 Flash instead of 4o imagegen (because 4o imagegen does not have an API).

When I say "the tools are currently not ready", it is because I use them on a regular basis. I think even to get to 5 minutes (with best out of 3), we are going to need something like 4o imagegen but for video.

OpenAI generally "teases" features about a year before they release them. That means that if 4.5-level videogen were on its way in 2026, we would at least have seen a preview of it by now. The best model currently (if you need consistent characters/objects) is Runway Gen-4, but if you look at one of their shorts you will see that even a professional artist spending (what I assume was) tens of hours cannot put together a 2-minute clip with rock-solid character consistency. There is absolutely no way that an AI agent would do better than the human who created this short.

@GG This is useful clarity; they should include that kind of specification in the market description, tbh.
If they mean any movie can be made from one system with a single prompt, the probability plummets. If the bar is what they've written here, a "pretty good" "big budget studio" film, then it could be a passable 60-minute kids' movie on Netflix, which I give roughly 70% odds.

I think people have shared good thoughts in this thread and I take them into consideration, but I also think 3 years is a pretty long time.
Progress often seems slow when development is happening behind the scenes. The issues are relatively well understood and don't require a breakthrough; they require continued refinement, integration, and greater reliability. This will all happen incrementally across many companies, all competing for one of the largest attention markets on earth (short- and long-form video).

@FraserPaine I absolutely wish we had better resolution criteria. That's one of my main problems with many markets. Ultimately this market resolves to Scott Alexander's subjective opinion of "pretty good", since there will no doubt be AI-generated movies in 2028.

If I had written it I would have included specific criteria like:

A casual viewer does not notice AI artifacts such as:

  1. plot points that are missing, repeated or otherwise incoherent

  2. characters whose appearance/outfit shifts spontaneously

  3. the gooey "warping" feeling most AI generated videos have

Essentially a weaker Turing test. If you took your kid to see this movie at the theatre, would your first reaction be "this is AI-generated slop" or not?

Alternatively, it could just resolve on the highest-grossing AI movie having a worldwide box office >$1B (inflation-adjusted).

I know people have repeatedly asked the market creator or SA to add better resolution criteria... I guess I'm one of those people.

I'm so confused about how low this market is

@AndrewG

There are basically 2 types of people voting "no".

1. people who think AI progress is going to slow down or stop. Every time a new model comes out, they will post "haha, it still can't [insert problem here]" and assume this is the best it will ever get.

2. people who think a "yes" resolution is possible in principle but that the current rate of improvement is too slow. When I first started really paying attention to AI-generated video, I assumed progress would follow a 10x/year cost reduction, but the rate of progress hasn't been nearly that good. The original Sora announcement was over a year ago. Video models have gotten better since then, but not 10x better. AI agents in general have been improving more slowly than many expected: 2024 was originally supposed to be the "year of the AI agent", with GPT-5 able to follow instructions like "order a pizza with toppings X, Y, Z" nearly flawlessly in a web browser. Instead of 10x/year, AI agents appear to be improving at a rate of 2x every 7 months. The current best video models can produce clips of ~30 seconds to 1 minute with a 50% success rate. Unfortunately, the math of 30s times 5 seven-month doublings does not get us to a full movie by early 2028.
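The doubling arithmetic is easy to check (the ~30-second clips and 2x-every-7-months rate are the figures above; the ~33-month gap from mid-2025 to early 2028 is an assumption):

```python
# Napkin check of the doubling math above: ~30-second clips, doubling every
# 7 months. Mid-2025 to early 2028 is ~33 months, i.e. about 5 doublings.
clip_seconds = 30
doublings = round(33 / 7)                # ~4.7 -> 5
print(clip_seconds * 2**doublings / 60)  # 16.0 minutes; a movie needs ~120
```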

I'm not fully in camp 2 yet (and originally I was fully in camp "yes" based on a more optimistic 10x/year) but if another year passes and AI is 50/50 at producing clips of length 2-4 minutes I might get there.

Of course there's still plenty of room for a breakthrough (fully multimodal LLMs, a massive investment push, new model architectures, new hardware, AI entering RSI), but a 2028 resolution is no longer the default case. It requires at least one surprising thing to happen between now and then.

@AndrewG I'm so confused about how high this market is XD

@bens

Would you describe yourself as camp 1 or camp 2?

Do you agree or disagree with the statement "there are a lot more ways the progress/year curve can bend up than ways it can bend down"?

@LoganZoellner I'm in the camp of:

-progress seems to be slowing rather than speeding up, or at least the rate of increase of progress has absolutely been slowing

-2.7 years is not a lot of time

-the amount of compute required to do such a task -- even in 2.7 years -- is so high as to make the existence of a single prompt window --> 2 hr movie pipeline very unrealistic

-there are multiple hard problems lying between what we currently have (generation of ~1 minute of loosely connected video clips of poor-to-average quality) and where it needs to be (generation of a Hollywood-quality movie, which takes hundreds of thousands of human-hours to produce with current methods and technology). These multiple hard problems cannot just be hand-waved away with "scaling laws" and "exponential lines drawn on a crude plot".

even SO... I do think some sort of intelligence-takeoff scenario or breakthrough could lead to something like this... it's just not in the 39% ballpark, but more like the 3.9% ballpark, idk

@AndrewG the rules really set a very high bar. Scott Alexander gave this 2%, and despite how cool the video AIs are, I still don't really think we're anywhere near this.

bought Ṁ1,000 NO

@AndrewG This market should be at 3% or near that.

@elf this is fascinating! First positive update on this market I’ve seen in about a year, tbh!

@elf Still think there's a long way to go here, but this is definitely the type of thing that puts this way closer to possible than it was a week ago.

sold Ṁ38 NO

@elf I was thinking of selling my no shares anyways due to ~AI 2027 and other stuff updating my timelines a bit shorter in general, but this is a good specific nudge. XD

A lot of human effort went into fine-tuning that model.

So humans had to write annotations for each 3-second segment of training data. With 81 five-minute episodes, that's 8,100 annotations. Each annotation was ~98 words, so that's 793,800 words of human effort to fine-tune the model.
Naively extrapolating from a 1-minute Tom and Jerry movie to a 120-minute Star-Wars/Star-Trek crossover, we would need 810 hours of training footage, broken down into 3-second segments, each annotated with ~98 words, requiring 95 million words of human annotation in sum. (I'm not sure if the training footage could be a mix of Star Wars and Star Trek movies, or if it would have to be specifically from crossover movies.) Once you had that model, you'd be able to generate more 2-hour Star-Trek/Star-Wars crossovers with very little human work. But it wouldn't generalize into, say, a new Fast and Furious movie.
So, napkin math: this method needs 23 doublings in efficiency before it can generate arbitrary 2-hour movies from 10 words of human prompting, e.g. "make me a 120 minute Star Trek / Star Wars crossover".
HOWEVER, even though 95 million words is way more effort than a consumer would put into a movie for their own enjoyment, it would be a trivial cost for a Disney budget: 100 people, working in parallel, could do it in a few months. So if we're asking "When will Hollywood make movies with AI, with no human actors, cameras, or 3D modeling?", then human effort is less of a constraint than training footage. We don't have 810 hours of Star Wars movies. You can get about 200 hours if you include all movies, spinoff movies, and TV shows, but if the AI were fine-tuned on all of those it would output a jarring mix of styles. If we want to maintain the vintage original trilogy style, we'd have to limit the training data to the 7 hours available in the OT. That would mean the Tom and Jerry model would need to double in efficiency 6.8 times before it could make Star Wars movies.
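The napkin math above checks out; a quick verification:

```python
# Quick verification of the napkin math above.
import math

segments = 81 * 5 * 60 // 3               # 3-second segments -> 8,100
annotation_words = segments * 98          # 793,800 words to fine-tune

target_hours = 810                        # extrapolated footage for a 120-minute film
target_words = target_hours * 3600 // 3 * 98   # 95,256,000 ~= 95 million words

print(segments, annotation_words, target_words)
print(math.log2(target_words / 10))       # ~23.2 doublings down to a 10-word prompt
print(math.log2(target_hours / 7))        # ~6.8 doublings to fit the OT's 7 hours
```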

The market asks if an AI can generate a movie in response to "a prompt". The sample prompt listed is 10 words long. If someone makes an "AI movie" by splicing together hundreds of different videos on a model they had to pre-train and micromanage, that doesn't meet the standard gestured towards in the market description.
To truly see how far we are from meeting the market resolution, ask yourself: "What's the best movie people can make today with 10 words of prompting? No pretraining the model beforehand, off-the-shelf software only. Ignore videos made by humans who invested hours behind the scenes."

@GG They can use a longer single prompt than 10 words, but yeah, the general idea sounds correct. Once again, this market looks to me like "will we get noticeable RSI in AI research in the next couple of years, sufficient to greatly accelerate AI progress overall, thereby enabling movie-generating AI as a side effect".

If AI task-horizon trends and reasoning-model math/coding/etc. improvements continue at current rates, plus there aren't any multi-year unexpected delays due to difficulties in agent scaffolding for unpredictable reasons, the data wall isn't insurmountable in the near term, etc., it could totally happen. But that all seems less than 1/3 likely by the resolution date.

@GG Why does it matter if it's a 10 word prompt?

I mean, you could literally have an LLM take a 10-word prompt and turn it into a 100-word prompt. Even some image generators can do that: take your prompt, and then improve it.
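A trivial sketch of that hand-off, with `llm` and `video_model` as hypothetical stubs (the point being that the human's contribution stays at 10 words):

```python
# The human types 10 words; everything downstream is machine effort.
def llm(prompt: str) -> str:
    return f"[detailed treatment of: {prompt}]"   # stub text model

def video_model(prompt: str) -> str:
    return "<feature-length movie>"               # stub video model

user_prompt = "make me a 120 minute Star Trek / Star Wars crossover"
movie = video_model(llm(f"Expand into a full film treatment: {user_prompt}"))
print(movie)
```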

@JonTdb03 You're correct: an AI could take the end-user's 10-word prompt, feed it into an LLM that expands on the prompt, and then send the expanded prompt to a different AI that makes the video. The important thing is that the human asking for a movie only has to put in 10 words of work. The rest of the work needs to be done by AI.
The "AI movies" that go viral these days still require tons of human work behind the scenes. Example 1, example 2, example 3. These movies, even if they were feature length and Hollywood quality, would not resolve this market YES, because they require far more human collaboration than the 10-word prompt given in the example.
ChatGPT 03-mini-high estimates producing a Hallmark movie takes ~20,000 person-hours of work, and producing a Hollywood romantic comedy takes ~120,000 hours. An AI that automated 90% of that labor would reek havoc on those employed in the industry, but still wouldn't meet the criteria to resolve this market YES.

@GG Again, your general point is correct, but I really don't think Scott meant the literal 10-word portion of the resolution criteria to be load-bearing. One prompt's worth of work, where the prompt is a 150-word exercise in clever prompt engineering, would almost certainly not prevent a YES resolution, and I'm happy to make a market to bet on it if you disagree. ;P

@DavidHiggs Agreed, and maybe we're talking past each other, but the line needs to be drawn somewhere. The difference between writing 10 words and writing 150 words is small, logarithmically, compared to the amount of work that needs to be done now to make a 5-minute AI movie, to say nothing of a 2-hour movie. The window in which AI can make a movie with 150 words of human help, but not 10 words, is short, so I doubt Scott will have to make that determination. (I suspect he would come down on your side, though.)
My bigger concern is a scenario where AI video gets good enough to eliminate 90% of human labor demand but still falls far short of "ChatGPT for feature films". Some people would think that should resolve the market YES, when I think it would clearly be NO.
