Will AI generate realistic video of animal movement before 2025?


resolved Jan 4


Note: This is an effort to make relatively objective, transparent Manifold markets that predict AI capabilities. I won't trade in these markets because there will inevitably be some subjectivity, and I'll try to be responsive with clarifications in the comments (which I will add to the market description). Feedback welcome.

Specifications:

The background of the video doesn't matter (e.g., there can be unrealistic animals or scenery in the background).
The AI-generated video needs to be indistinguishable from a 5-second clip of a single nonhuman animal in action (not just standing, walking, sitting down, etc.). Examples could include a moth flapping their wings, a snake slithering, a cheetah making a sharp turn, or a whale jumping.
This requires more than one example (i.e., not just a fluke), but it doesn't require robustness or high success rates. If a company releases a handful of examples and reliable evidence that they can make videos like this without human assistance (i.e., text-to-video), that's sufficient for YES even if the examples are cherry-picked; the idea here is that even if videos like these take 10 tries each, they could still be commercially viable, and they indicate that the model isn't just getting lucky—even if it still has a lot of hallucination problems.
Indistinguishability approximately means that in a YouTube compilation of 20 clips presented as real animal footage, fewer than 10% of casual, attentive viewers would suspect the AI-generated clip wasn't a real animal. It should be a real animal species, but it doesn't need to pass expert review. (The human observer test isn't a strict or precise requirement, in part because the results would depend a lot on how much people are thinking about AI at the time of the test.)
Most of the animal should be in the video and shouldn't be obscured (e.g., smoke, a blizzard, a dirty camera lens, excessive hair or fur). If the animal is moving quickly, the frame rate needs to be good enough to tell that the animal movement is realistic.
The model needs to be generating novel content. It can't just regurgitate real footage, even if it does some adjustment or recombination. (Thanks to @pietrokc for raising this in the comments. There may be quite a bit of subjectivity here, particularly because there tends to not be much public information about the training data of SOTA models these days.)

The spirit of this market (which will be used to resolve ambiguities that aren't resolved by explicit criteria) is whether the AI seems to have a world model of how animals look and move. YES resolution doesn't require the detailed knowledge of a scientist or sculptor but the general, intuitive understanding that almost all human adults have.



Anyone want to make a case for YES? The closest examples I've seen are videos of cats walking, but "walking" was explicitly excluded. In hindsight I should have explicitly addressed running, but I don't think that edge case has come up. Also note that background doesn't matter for this market. I think a YES resolution was certainly plausible, particularly with Veo 2!

For the record, I have not yet seen a 5-second video of something like "a moth flapping their wings, a snake slithering, a cheetah making a sharp turn, or a whale jumping." I have seen pretty good videos of common animals (e.g., cats) walking, but the criteria explicitly say "not just standing, walking, sitting down, etc."

I would be very interested in any state-of-the-art examples people have in mind.

This is what a cheetah turning looks like: https://www.nytimes.com/video/science/100000002276663/a-cheetahs-turn.html

This is a really good attempt at an unambiguous market but I think the following case is not dealt with.

Imagine I take one real 30s video of a dog walking. Like, a real video of a real dog. Then I train an AI to output that exact video regardless of text prompt. That's very easy to do. So, would that count? What if I do the same but with 5 real videos, and the AI has a 20% chance of outputting each of the 5?

What I'm getting at is that it's very easy to train a model to output any fixed thing. So maybe your question really is, "will AI generate realistic video of animal movement to most reasonable prompts?" With all your (very well thought-out) conditions.

@pietrokc that concern makes sense. The challenges with "most reasonable prompts" are: (i) I'm trying to get at whether the model has understanding, even if that understanding is unreliable, and failing/hallucinating frequently doesn't seem to curtail understanding, (ii) pragmatically, a model that fails even 99 out of 100 times at this task could still be very impactful, especially if selecting the 1/100 is easy; e.g., Hollywood studios could incorporate it even if they have to do that selection, (iii) knowing whether the realism works for particular prompts requires public access, and I'm trying to capture model capabilities rather than, e.g., when companies decide it's advantageous to do a public release.

But I think I can exclude your case with something like, "The model needs to be generating novel content. It can't just regurgitate real footage, even if it does some adjustment or recombination." Of course that's subjective, but I think we'll have a good sense of the degree of the limitation—except for when we don't know if it's doing things very similar to proprietary training data; that's a pretty intractable issue, but probably we should have a higher bar for, say, a no-name start-up that shows 20 examples but we have no idea if those were just the only 20 animal movements for which they had robust training data. What do you think?

@Jacy That all makes sense and I agree with your proposal. It's a tough market to define but so are most (?) interesting markets! Thanks for thinking it through.


How about this?

https://twitter.com/bennash/status/1758200859547025779/video/1

@CertaintyOfVictory Already at 3-4 seconds, he takes two steps with the same foot.

@SophusCorry I do that all the time.

@CertaintyOfVictory As @SophusCorry mentions, the legs aren't really in sync with the torso—among other issues. Also, the cat is merely walking, and this is about a wider range of animal motion (e.g., leaping, turning, rolling, playing).

I'll try to avoid repeating myself in the comments to avoid clutter, but I'll say again that none of the Sora examples meet the bar of this market in my opinion—so a YES resolution would require Sora to become much better or a competitor to take the lead. Personally I would not put the probability of this at 93% (the current market price).

Would, e.g., this video pass your judgement? https://twitter.com/AngryTomtweets/status/1758265847334732288

@Lion Or this one? https://twitter.com/AngryTomtweets/status/1758265864896344290

@Lion no, none of the current Sora clips would. The golden retriever puppies aren't showing enough movement, aren't shown fully enough to tell if the motion is realistic, are obscured by snow, and have various warping/shimmering/etc. artifacts that I think would at least produce an uncanny valley effect. The dalmatian is only walking and passes through the blinds, but I actually think that is the best Sora example of animal motion so far.