Will there be realistic AI-generated video with dialogue by 2024?
Minimum video length of 2 minutes, and must maintain coherence. The visuals, dialogue, and sound must all be of "reasonable" quality: it does not need to be indistinguishable from human-made video, but there shouldn't be significant artifacts.

"Significant artifacts" is the kicker. Is obvious uncanny valley an "obvious artifact"? What about uncanny valley that's only obvious to people who pay close attention to video quality? Or is the bar for "realistic" much lower than I'm assuming?

@JonathanElliott If it's obviously uncanny valley to me, that would not count as realistic.

Does "by 2024" mean the start of 2024?

What counts as "maintaining coherence"? What prevents the AI system or the user from stitching multiple videos together?

@VincentLuczkow Does using multiple AIs count?

Is this from a prompt like "A video of two people having a conversation", or from something with more input data, such as the transcript of the dialogue and a starting picture of two people talking to each other?

@Nikola I will accept either of those.

@vluzko https://www.youtube.com/watch?v=jz78fSnBG0s In what ways does this not pass the test? Because of the video creator splicing the clips together?

@Nikola The splicing hurts it, but the main thing is that this is a question about being able to generate many kinds of video, not just one particular video. Think DALL-E 2 but for video with sound (although I do not require the inputs to be purely text).

@vluzko Two minutes is a really long time, and we don't even have a good DALL-E 1 equivalent for long video, let alone video + sound.