This is a duplicate of the market that resolved NO on 12/31/2024: https://manifold.markets/Jacy/will-ai-generate-realistic-video-of
Note: This is an effort to make relatively objective, transparent Manifold markets that predict AI capabilities. I won't trade in these markets because there will inevitably be some subjectivity, and I'll try to be responsive with clarifications in the comments (which I will add to the market description). Feedback welcome.
Specifications:
The background of the video doesn't matter (e.g., there can be unrealistic animals or scenery in the background).
The AI-generated video needs to be indistinguishable from a 5-second clip of a single nonhuman animal in action (not just standing, walking, or sitting down). Examples could include a moth flapping its wings, a snake slithering, a cheetah making a sharp turn, or a whale jumping. [Note: Running will probably not count, but I think an exceptionally good running video (e.g., 10 seconds, animal starts from walking then begins running, close-up details of muscle and joint movement) would count.]
This requires more than one example (i.e., not just a fluke), but it doesn't require robustness or high success rates. If a company releases a handful of examples and reliable evidence that they can make videos like this without human assistance (i.e., text-to-video), that's sufficient for YES even if the examples are cherry-picked; the idea here is that even if videos like these take 10 tries each, they could still be commercially viable, and they indicate that the model isn't just getting lucky—even if it still has a lot of hallucination problems.
Indistinguishability approximately means that in a YouTube compilation of 20 clips presented as real animal footage, fewer than 10% of casual, attentive viewers would suspect the AI-generated clip wasn't a real animal. It should be a real animal species, but it doesn't need to pass expert review. (The human observer test isn't a strict or precise requirement, in part because the results would depend a lot on how much people are thinking about AI at the time of the test.)
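The viewer threshold above is informal, but it can be made concrete. Purely as an illustration (the function name and survey setup here are hypothetical, not part of the market criteria), a survey result could be checked like this:

```python
def passes_viewer_test(suspected_count: int, total_viewers: int,
                       threshold: float = 0.10) -> bool:
    """Return True if the share of viewers who suspected the clip was
    AI-generated falls below the threshold (fewer than 10% by default).

    This is an illustrative sketch of the market's informal criterion,
    not an official resolution procedure.
    """
    if total_viewers <= 0:
        raise ValueError("total_viewers must be positive")
    return suspected_count / total_viewers < threshold

# E.g., 7 of 100 attentive viewers flagging the clip would pass;
# 12 of 100 would not.
print(passes_viewer_test(7, 100))   # True
print(passes_viewer_test(12, 100))  # False
```

As the description notes, any real survey result would also depend heavily on how primed viewers are to look for AI, so this check is a rough guide rather than a strict requirement.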
Most of the animal should be in the video and shouldn't be obscured (e.g., by smoke, a blizzard, a dirty camera lens, or excessive hair or fur). If the animal is moving quickly, the frame rate needs to be good enough to tell that the animal's movement is realistic.
The model needs to be generating novel content. It can't just regurgitate real footage, even if it does some adjustment or recombination. (Thanks to @pietrokc for raising this in the comments. There may be quite a bit of subjectivity here, particularly because there tends to be little public information about the training data of SOTA models these days.)
The spirit of this market (which will be used to resolve ambiguities that aren't resolved by explicit criteria) is whether the AI seems to have a world model of how animals look and move. YES resolution doesn't require the detailed knowledge of a scientist or sculptor but the general, intuitive understanding that almost all human adults have.
The resolution criteria seem to imply that a good bit of work needs to be done to determine how this will resolve. @Jacy - who do you anticipate will do this work?
@WilliamGunn I'm not sure what you mean. The resolution only requires looking at the videos available, then assessing them against the criteria. If anyone thinks I've missed videos that would qualify, they can post a link in the comments.
Note that there is still inevitably some subjectivity here, so I'm not taking a position. I'll do my best at resolving because I think these milestones are important to track for understanding the societal impact of AI.