Minimum video length of 2 minutes, and it must maintain coherence. "Sound" means dialogue and background noise.
The visuals, any dialogue, and the sound must all be of "reasonable" quality: it does not need to be indistinguishable from human-made video, but there shouldn't be significant artifacts.
🏅 Top traders
# | Name | Total profit |
---|---|---|
1 | | Ṁ224 |
2 | | Ṁ190 |
3 | | Ṁ55 |
4 | | Ṁ41 |
5 | | Ṁ26 |
I think it's unlikely that we won't have the video part, since we already have low-quality video of the required length and it looks like it's just a question of scaling things up.
For audio, it should be possible to either build a multimodal model that does both somehow, train a video-to-audio model on YouTube, generate the audio separately with a text-to-audio model, or generate it separately with a text-to-voice model plus a video-to-background-sound model, or something like that.
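The "generate audio separately, then combine" option above can be sketched as a simple pipeline. This is a minimal illustration, not a real API: `text2video` and `text2audio` are hypothetical stand-ins for whatever models end up existing, and the "mux" step just checks that the two tracks cover the same duration before pairing them.

```python
# Hypothetical two-stage pipeline: video and audio generated separately, then muxed.
# text2video / text2audio are stubs standing in for real generative models.

def text2video(prompt, seconds, fps=24):
    # stub: a real model would return pixel frames; here, just frame indices
    return list(range(seconds * fps))

def text2audio(prompt, seconds, sample_rate=16000):
    # stub: a real model would return a waveform; here, a silent track
    return [0.0] * (seconds * sample_rate)

def mux(frames, audio, fps=24, sample_rate=16000):
    # verify both tracks cover the same duration before combining them
    video_s = len(frames) / fps
    audio_s = len(audio) / sample_rate
    assert abs(video_s - audio_s) < 1e-6, "track durations must match"
    return {"frames": frames, "audio": audio, "duration": video_s}

# 120 seconds = the 2-minute minimum from the resolution criteria
clip = mux(text2video("a dog on a beach", 120),
           text2audio("waves, distant barking", 120))
print(clip["duration"])  # 120.0
```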
@VictorLevoso What is your analysis 11 months later as to where things are at? Particularly with the audio integration
@JoshuaHedlund So things have been much slower in terms of video generation than I expected (and other things too), probably partly due to a lack of GPUs slowing things down (the bottleneck seems to be Nvidia rather than money).
I still think we might see good enough video towards the end of the year.
The problem is audio: we definitely have good voice generation by now, but I haven't seen any background-noise generation, and it seems less likely for a company to do both before the end of the year, as opposed to just doing high-quality video and figuring out audio later.
I think the main problem rn is probably that most people don't have enough compute to train models on video.
That said, we still have some conference deadlines left, and last year a lot of the video generation stuff came out at the end of the year, so there's still time for that.