EG "make me a 120 minute Star Trek / Star Wars crossover". It should be more or less comparable to a big-budget studio film, although it doesn't have to pass a full Turing Test as long as it's pretty good. The AI doesn't have to be available to the public, as long as it's confirmed to exist.
@JonasVollmer my thoughts exactly! Generating a Hollywood-quality movie from a single prompt is certainly something that far exceeds the baseline of “AGI”. It requires the combined efforts of hundreds of humans working for several years.
@Joshua You can also short humanity/money, and then shorting Warner Brothers doesn't make sense.
@Maloew I think I disagree with you about everything here. The main thing is that I don't think there are any promising leads on the taste problem - transformers, even with RL, haven't broken past "hardworking, dedicated, but wholly untalented person" in any task I feel qualified to evaluate them on, including coding and math. There are maybe worlds where we get a kind of superintelligence on this timeline, but those are worlds where the taste problem turns out to be irrelevant and RL on coding is all you need, which are worlds where nobody is even remotely interested in making movies with AI, least of all the AI.
@speck "hardworking, dedicated, but wholly untalented person" is probably a higher benchmark than AGI under my preferred definition of "able to automate 90% of economically valuable work".
@LoganZoellner I agree. I don't think we're all that close to saturating that metric; multimodality is probably needed for that, and progress there has been slower than I expected. But it absolutely seems possible to me that we achieve some form of AGI within the next decade, maybe even early in the decade. That still leaves a lot of creative or systems design work un-automated. I think the main point I want to make is that even if you think some sort of AGI is possible, there are AGIs which are not superintelligent, and there are some reasons to believe that that's what we're on trajectory for.
@speck
> there are AGIs which are not superintelligent
I think it's more accurate to say that intelligence is "spiky". AI is already superhuman in many domains (chess, go, poker). At some point we will cross the line of "well, 90% of tasks can now be automated" but that won't magically imply the other 10% has been.
@Maloew You are missing everything; the whole statement is false.
"Generate a full high-quality movie to a prompt" is already superintelligence. Obviously no human can do that.
@robm is it though?
I watched the example short films on their website. Despite having a human "director" who prompted, aggregated, and added a soundtrack to the clips, none of them has even what I'd call a plot, or dialogue, or really... anything that we haven't seen before?
@bens I'm talking specifically about consistent characters. You can find my comment from a few weeks ago criticizing the earlier gen4 release for missing on this. Any demos on their website from before today are irrelevant to my point, this is a new feature dropped today.
imho, the 3 hardest things remaining are:
1. writing great scripts
2. audio generation and sync
3. consistent characters
If this is as good as the samples they posted today, then one of those 3 barriers is down.
@robm the example they give of consistent characters on their website has like 3 outfit changes across the same clip, and the character only vaguely looks the same in that she is a girl with dark hair and bangs in every clip. I think this is pulling the wool over our eyes at best, and more likely selective editing.
@bens this one?
Here is me in this thread literally making the same critique when that was posted 4 weeks ago.
https://manifold.markets/ScottAlexander/in-2028-will-an-ai-be-able-to-gener#vfe32gtb1nn
Check the stuff from their yt channel today. Things have changed.
Seeing the progress on three fronts in just the past 3 months really moved me from a soft yes to a strong yes on this one:
1. GPT 4.5 has improved in writing quality to such a massive degree that I find some paragraphs it generates actually compelling and interesting. (I publicly shamed 4.5 when it came out for its poor benchmarks and high cost, but it seriously is good for writing)
2. Runway Gen-4, Veo and the Chinese video models have shown there's a ton of gains to still be made there and past problems like lip-sync, consistent characters, and camera motion are diminishing. Also there was a decent jump forward in this domain just recently improving story and character consistency for much longer cohesive shots: https://test-time-training.github.io/video-dit/assets/ttt_cvpr_2025.pdf
3. Agentic systems like Deep Research are showing that more advanced orchestration of models can achieve much larger and more impressive outputs than individual models with no tool assistance.
One can imagine a system that reinterprets the simple prompt and generates a few high level story lines, then turns one that scores highly on some heuristic into a screenplay, then refines this through a few passes, and then generates the shots and audio accordingly and strings them together.
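A minimal sketch of that orchestration, in Python. To be clear, everything below is hypothetical: llm() stands in for any text-model API and generate_clip() for any text-to-video API; neither is a real library call.

```python
# Hypothetical orchestration sketch: prompt -> storylines -> screenplay -> shots.

def llm(prompt: str) -> str:
    """Placeholder for a text-model call (e.g. a chat completion)."""
    raise NotImplementedError

def generate_clip(shot_description: str) -> str:
    """Placeholder for a video-model call; returns a path to a clip."""
    raise NotImplementedError

def score(storyline: str) -> float:
    """Placeholder heuristic, e.g. an LLM-as-judge rating."""
    raise NotImplementedError

def make_movie(user_prompt: str, n_candidates: int = 5, passes: int = 3) -> list[str]:
    # 1. Expand the short prompt into a few high-level storylines.
    storylines = [llm(f"Write a one-page story outline for: {user_prompt}")
                  for _ in range(n_candidates)]
    # 2. Keep the one that scores highest on some heuristic.
    best = max(storylines, key=score)
    # 3. Turn it into a screenplay and refine it through a few passes.
    screenplay = llm(f"Write a screenplay from this outline:\n{best}")
    for _ in range(passes):
        screenplay = llm(f"Revise for pacing and continuity:\n{screenplay}")
    # 4. Break the screenplay into shots and generate them in order.
    shots = llm(f"Split into a numbered shot list:\n{screenplay}").splitlines()
    return [generate_clip(s) for s in shots if s.strip()]
```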
I'm not saying this will be great, it will almost certainly be slop, but the whole pipeline can be assembled today; the individual parts just need to get better. The standard here is "more or less comparable to a big-budget studio film", and that is a surprisingly low bar if you look at some of the slop films on Netflix, especially for kids. These have big budgets and are made by studios.
If we don't see reasonably strong iterations on all of the fronts mentioned above over 3 years (it's been 2.5 since ChatGPT dropped) I will be very surprised. I'm not even banking on GPT3 -> GPT4 type jumps, just 4-5 rounds of refinement to the biggest issues in these component models and some effort to tune the overarching system that stitches them together.
@FraserPaine
>1. GPT 4.5 has improved in writing quality to such a massive degree that I find some paragraphs it generates actually compelling and interesting. (I publicly shamed 4.5 when it came out for the poor benchmarks and high cost, but it seriously is good in writing usage)
Yeah, I have been using Gemini and the script-writing portion is pretty much solved at this point.
> 2. Runway Gen-4, Veo and the Chinese video models have shown there's a ton of gains to still be made there and past problems like lip-sync, consistent characters, and camera motion are diminishing. Also there was a decent jump forward in this domain just recently improving story and character consistency for much longer cohesive shots: https://test-time-training.github.io/video-dit/assets/ttt_cvpr_2025.pdf
I agree that there are gains to be had. But progress has been much slower than I expected. As a reminder, Sora was over a year ago and it remains SOTA.
> 3. Agentic systems like Deep Research are showing that more advanced orchestration of models can achieve much larger and more impressive outputs than individual models with no tool assistance.
Just for fun, I tried using a browser agent to generate a movie.
Attempt 1: told it to use Sora/Clipchamp. It failed to log in and immediately quit
Attempt 2: told it to use my custom movie-making app. It opened the app, changed the title to "star trek vs star wars", clicked "create movie" and then exited
Attempt 3: wrote detailed step-by-step instructions on how to use the movie editor. It opened the movie editor, wasn't able to interact with the timeline object, and went into an infinite loop
I'm sure that with a sufficient amount of UX finagling and prompt engineering I could get it to create something, but I sincerely doubt it would outperform my custom python script for making movies. This has been my experience with agents in general: if the base model isn't already capable of doing something, you don't get any extra oomph from using an agent framework.
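(For reference, a "custom python script for making movies" in this sense need not be exotic. A minimal sketch of a clip-stitching script using ffmpeg's concat demuxer; the clips/ folder and output name are made up for illustration:)

```python
# Stitch pre-generated shots into one file. Assumes ffmpeg is installed
# and all clips share the same codec/resolution.
import pathlib
import subprocess

clips = sorted(pathlib.Path("clips").glob("*.mp4"))  # hypothetical folder of shots

with open("shots.txt", "w") as f:
    for clip in clips:
        f.write(f"file '{clip.as_posix()}'\n")

# -c copy concatenates without re-encoding.
subprocess.run(
    ["ffmpeg", "-f", "concat", "-safe", "0", "-i", "shots.txt",
     "-c", "copy", "movie.mp4"],
    check=True,
)
```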
@LoganZoellner sorry, to clarify: I didn't mean any of the agent frameworks today unlock this capability; DeepResearch is almost certainly all done in code, not some drag-and-drop low-code orchestrator. I agree with everything else you've said except that Sora is SotA. I think Veo is much better than even the original unreleased Sora for standard shots, before they lobotomised it, and Runway Gen 4, which just came out, is less smooth sometimes but gets some phenomenal Midjourney-style artistic shots nearly perfect.
People are saying that the performance on shots over a minute isn't good enough, but you should look up "average shot duration in films": it's been falling since the 60s and is now around 5 seconds on average, with almost zero unbroken shots running longer than 45 seconds.
@FraserPaine I don't think anyone is saying "Yeah AI video is great for 5 second shots but it can't do a 60 second unbroken take". The problem is that AI can't make a 60 second *scene* without humans putting in a large amount of scaffolding.
The Tom and Jerry videos required almost 800k words of human annotation, specifically commissioned for that project, to fine-tune. The Coke ad required a whole team of editors behind the scenes. Jason Zada said his "Fade Out" film took "a few days, a few hours here and there" and even with that time investment and access to Veo 2, it still has crap continuity.
@FraserPaine
> I think Veo is much better than even the original unreleased Sora
I agree that Veo is better than Sora, but it is not 10x better. You cannot, for example, give Veo a 10k word prompt and get out a coherent 10-minute video (without significant human editing)
> People are saying that the performance on shots over a minute isnt good enough but you should look up "average shot duration in films" its been falling since the 60s and its now around 5 seconds on average with almost 0 unbroken shots running longer than 45 seconds.
The problem is not literally "the video model cannot create a 10m unbroken shot". In fact, a human can create a 10m unbroken shot with any of the SOTA models (Veo, Wan, Kling) by extending shorter clips.
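A minimal sketch of that extension loop; both helpers are hypothetical stand-ins for whatever the model actually exposes, not real APIs.

```python
# Clip-extension trick: condition each new segment on the final frame
# of the previous one.

def last_frame(clip_path: str) -> bytes:
    """Placeholder: extract the final frame (e.g. with ffmpeg or OpenCV)."""
    raise NotImplementedError

def extend_clip(prompt: str, start_frame: bytes | None) -> str:
    """Placeholder image-to-video call; returns a path to a ~5s clip."""
    raise NotImplementedError

def long_take(prompt: str, segments: int = 120) -> list[str]:
    clips: list[str] = []
    frame: bytes | None = None
    for _ in range(segments):  # 120 x ~5s segments is roughly a 10m "shot"
        clip = extend_clip(prompt, start_frame=frame)
        frame = last_frame(clip)  # drift accumulates here, segment by segment
        clips.append(clip)
    return clips
```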
The problem is that AI agents are all terrible at long-term planning, and the rate at which they are becoming less-terrible is not fast enough (absent some breakthrough) for a 2028 resolution.
Deep Research is the exception that proves the rule. By breaking research down into a map-reduce framework, each of the individual "threads" of the agent only has to stay on task for a short period of time.
Unfortunately, there does not appear to be a way to break down a movie into such a map-reduce framework because movies have causality. If there is a spinning top in scene 1 of Inception, it needs to be the same top in the final scene.
Now, it's possible that there is a clever way to break a full movie into a set of tasks, each of which only requires a minute or two of "attention". Maybe you can write a script overview, then generate a set of characters, props and settings, then combine these into a sequence of clips; a sketch of that idea follows below. I and others have been trying to do this and will continue trying to do this. But the models simply aren't ready yet. We need something like 4o imagegen but for video. No one is building such a model, however, because inference costs would be astronomical.
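To make the idea concrete, here is a hedged sketch of what such a decomposition might look like. plan() and render_shot() are placeholder model calls, and the "bible" of characters/props/settings is the only shared state between shot tasks.

```python
# Hypothetical map-reduce decomposition of a movie. Causality is pushed into
# a shared "bible" that every shot task reads, so each task only needs a
# minute or two of "attention".
from dataclasses import dataclass, field

@dataclass
class Bible:
    characters: dict[str, str] = field(default_factory=dict)  # name -> visual description
    props: dict[str, str] = field(default_factory=dict)       # e.g. "spinning top" -> description
    settings: dict[str, str] = field(default_factory=dict)

def plan(prompt: str) -> tuple[Bible, list[str]]:
    """Map step: one long-context call writes the bible plus a shot list."""
    raise NotImplementedError  # placeholder LLM call

def render_shot(shot: str, bible: Bible) -> str:
    """Each shot is generated independently, conditioned only on the bible."""
    raise NotImplementedError  # placeholder video-model call

def make_movie(prompt: str) -> list[str]:
    bible, shots = plan(prompt)
    # Reduce step: shots share no state beyond the bible, which is exactly
    # where this strains; real causality (the same top in scene 1 and the
    # finale) has to survive being flattened into text descriptions.
    return [render_shot(s, bible) for s in shots]
```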
So, basically, we won't have the tools we need until prices come down (by at least 2 orders of magnitude). But by that time, the bitter lesson suggests we probably won't need any of this scaffolding. In ~5 years (assuming the current rate of improvement continues) we will just be able to prompt GPT-9 "make a movie" and it will just work.
Now, if something is possible in 5 years, then it will be almost possible in 4 years, and probably someone will come up with some clever way to do it (scaffolding, dedicated movie model). But we need at least one "breakthrough" for this to happen before 2028.
@LoganZoellner I think we're on the same page about all of this; I just think the scaffolding is more achievable than is being assumed, and your point about generating characters, props, and settings is exactly right. You can use GPT 4o to generate the first frame of each shot for improved consistency of characters, objects, etc.
My main argument here is not that there will be a breakthrough to Avatar 2 level film making from a single 10 word prompt, it's that the threshold being set of "big budget studio film" is actually quite low. If it's an animated film then photo-realism is removed, lip-sync is easier, and story quality drops down to what a kid would accept on Netflix.
I suspect that you could, today, have a 4.5-type model generate a structured shot list, iterate through those shots, retrieve relevant objects and characters and prompt the first frame from 4o, then pass those frames and the shot description to a SotA model for ~3 variations. With a person picking the best of 3 shots, you would likely get a usable sequence for a ~5-minute scene; sound would still be a challenge.
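A rough sketch of that loop; every function is a placeholder for the corresponding (hypothetical) model call, and a person stays in the loop to pick the winning take for each shot.

```python
# Best-of-k shot generation with first-frame conditioning, human in the loop.

def shot_list(scene_brief: str) -> list[str]:
    """Placeholder: a 4.5-type model emits a structured shot list."""
    raise NotImplementedError

def first_frame(shot: str, assets: str) -> bytes:
    """Placeholder: a 4o-style image model renders the opening frame."""
    raise NotImplementedError

def video_from_frame(frame: bytes, shot: str) -> str:
    """Placeholder: a SotA video model animates the frame into a clip."""
    raise NotImplementedError

def human_pick(takes: list[str]) -> str:
    """Placeholder for a person choosing the best of ~k variations."""
    raise NotImplementedError

def render_scene(scene_brief: str, assets: str, k: int = 3) -> list[str]:
    scene = []
    for shot in shot_list(scene_brief):
        frame = first_frame(shot, assets)  # the frame anchors character consistency
        scene.append(human_pick([video_from_frame(frame, shot) for _ in range(k)]))
    return scene
```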
My argument is that, without a breakthrough, the shot quality will go up, the planning will improve, the audio will get more tightly integrated, and the orchestration will get more advanced as people test it.
I think the films produced will be slop and will have obvious faults, but again, the criteria of this market include "it doesn't have to pass a full Turing Test as long as it's pretty good", which is extremely subjective. But I can very easily picture this approach being better than the garbage on Netflix for Kids right now, and that would tick this box for many interpretations.
If the maker of this market wants to update and clarify that the bar is much higher than I'm assuming, then I'll change my bet, but 3 years is a long time; it's been 1 year since Sora. Also, there's an unfortunately large incentive for someone to solve slop generation for making YouTube Kids content.
@FraserPaine >My main argument here is not that there will be a breakthrough to Avatar 2 level film making from a single 10 word prompt, it's that the threshold being set of "big budget studio film" is actually quite low. If it's an animated film then photo-realism is removed, lip-sync is easier, and story quality drops down to what a kid would accept on Netflix.
When Scott Alexander introduced this prediction on his blog, he specifically said that mediocre cartoons were not what he had in mind.
He also specified that the AI should be able to make a motion picture "to your specifications", such as a Star-Trek/Star-Wars crossover. That would preclude a model that could not make photorealistic films like Star Trek and Star Wars.
@GG "big budget studio film" is a much higher bar than you think. Think about the big budget animated films, like Pixar and Disney blockbusters. They're not "photorealistic" but they still have stunningly high visual quality. I don't think that making a movie that is of the level of quality of a Pixar or Studio Ghibli movie (even given the recent Ghiblification trend , lol) is a much lower bar than making a normal live action movie.
@FraserPaine
> I suspect that you could, today; have a 4.5 type model generate a structured shot list, iterate through those shots, retrieve relevant objects and characters and prompt the first frame from 4o, then pass those frames and the shot description to a SotA model for ~3 variations. You would likely get a usable sequence of shots for a ~5 minute scene with a person picking the best of 3 shots, sound would still be a challenge.
This is almost an exact description of the process I used to produce this video
The caveats being: I used Gemini 2.5 pro instead of GPT 4.5 (but I don't think that was the problem since Gemini is a stronger model) and I used Gemini 2.0 flash instead of 4o imagegen (because 4o imagegen does not have an API).
When I say "the tools are currently not ready", it is because I use them on a regular basis. I think even to get to 5 minutes (with best out of 3), we are going to need something like 4o imagegen but for video.
OpenAI generally "teases" features about 1 year before they release them. That means that if 4.5 videogen was on its way in 2026, we would at least have seen a preview of it by now. The best model currently (if you need consistent characters/objects) is Runway Gen 4,
but if you look at one of their shorts, you will see that even a professional artist spending what I assume was tens of hours cannot put together a 2-minute clip with rock-solid character consistency. There is absolutely no way that an AI agent would do better than the human who created this short.
@GG This is useful clarity, they should include that kind of specification in the market description tbh.
If they mean any movie can be made from one system with a single prompt, the probability plummets. If the bar is what they've written here, a "pretty good" "big budget studio" film, then it could be a passable 60-minute kids' movie on Netflix, which I give roughly 70% odds.
I think people have shared good thoughts in this thread and I take them into consideration, but I also think 3 years is a pretty long time.
Progress often seems slow when development is happening behind the scenes. The issues are relatively well understood and don't require a breakthrough; they require continued refinement, integration, and greater reliability. This will all happen incrementally across many companies, all competing for one of the largest attention markets on earth (short- and long-form video).
@FraserPaine I absolutely wish we had better resolution criteria. One of my main problems with many markets. Ultimately this market resolves to Scott Alexander's subjective opinion of "pretty good", since there will no doubt be AI-generated movies in 2028.
If I had written it I would have included specific criteria like:
A casual viewer does not notice AI artifacts such as:
- plot points that are missing, repeated, or otherwise incoherent
- characters whose appearance/outfit shifts spontaneously
- the gooey "warping" feeling most AI-generated videos have
Essentially a weaker Turing test. If you took your kid to see this movie at the theatre, would your first reaction be "this is AI generated slop" or not?
Alternatively, the market could just resolve on the highest-grossing AI movie having a worldwide box office >$1B (inflation-adjusted).
I know people have repeatedly asked the market creator or SA to add better resolution criteria... I guess I'm one of those people.