I.e., it is generating or loading 3D assets and animations, then "filming" them, as would occur in a 3D physics simulation like Roblox.
It can still be doing some image generation, but the big things, like how the camera moves and how object permanence is handled, are mainly done by generating them once as an asset and then filming that asset to make the video. Of course, the model is still writing the script, planning cuts and camera angles, choosing character designs, etc.
https://en.wikipedia.org/wiki/Machinima
And this will be known and basically accepted by mid-2024.


(speaking personally and NOT as a moderator) Can we resolve this NO? Their public claims about how it works are extremely inconsistent with this (as are many of the videos they've released). They'd have to be lying which seems very unlikely.
"Sora is a diffusion model, which generates a video by starting off with one that looks like static noise and gradually transforms it by removing the noise over many steps.
Sora is capable of generating entire videos all at once or extending generated videos to make them longer. By giving the model foresight of many frames at a time, we’ve solved a challenging problem of making sure a subject stays the same even when it goes out of view temporarily."
It's in the research paper: it's still a diffusion model.
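For anyone unsure what "removing the noise over many steps" looks like, here is a toy sketch of the idea. It is not how Sora is implemented: a real diffusion model uses a learned network to predict the noise, whereas this stand-in cheats with an oracle that already knows the clean video. The function name `toy_denoise` and the list-of-frames representation are made up for illustration. The one thing it does show is that every frame is updated in the same step, which is the "entire videos all at once" part of the quote.

```python
import random


def toy_denoise(clean, steps=50, seed=0):
    """Toy illustration only: start from pure noise and remove part of it each step.

    `clean` is a list of frames, each frame a list of floats.
    ASSUMPTION: an oracle that knows the clean video replaces the learned denoiser.
    """
    rng = random.Random(seed)
    # Start from "static": every value in every frame is Gaussian noise.
    video = [[rng.gauss(0.0, 1.0) for _ in frame] for frame in clean]
    errors = []
    for step in range(steps):
        remaining = steps - step
        # Remove 1/remaining of the leftover noise from ALL frames in this step,
        # so the whole clip is denoised jointly rather than frame by frame.
        video = [
            [x + (c - x) / remaining for x, c in zip(frame, clean_frame)]
            for frame, clean_frame in zip(video, clean)
        ]
        # Track the worst remaining deviation from the clean video.
        errors.append(
            max(abs(x - c) for f, cf in zip(video, clean) for x, c in zip(f, cf))
        )
    return video, errors


if __name__ == "__main__":
    clean_video = [[0.0, 1.0, 0.5], [0.2, 0.8, 0.4]]  # two tiny made-up "frames"
    _, errs = toy_denoise(clean_video, steps=10)
    print([round(e, 3) for e in errs])
```

The printed errors shrink step by step as the noise is removed.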
@ImDaniel interesting, so object permanence is done via it recognizing when an item reappears later on. I wonder what happens when that fails?
@Ernie to clarify, in the OpenAI article there is a "mishaps" section (the list starting with the backwards treadmill). 3-4 of those videos show exactly what you're talking about.