Will real-time text-to-video generation be viable by 2027?

This market resolves YES if, by January 1, 2027, I can prompt a text-to-video generator to create a 1-minute video and get a result meeting all of the following criteria:

  • I have to wait no more than one second for the video to load (either at the beginning or anywhere in the middle)

  • At least 1080p resolution

  • At least 24 frames per second

  • Subjectively, at least as realistic and visually appealing as the "stylish woman walks down a Tokyo street" example from Sora

  • If I wanted to, I could do this 100 times on the same day and pay less than $20 (this accounts for various pricing schemes, e.g. subscription-based or pay-per-video)

It's fine if the entire video isn't generated by the time it starts playing - it can "stream" to my device. I will make sure I have good Wi-Fi and try up to 5 times if necessary.
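
For reference, here is a rough sketch of the thresholds these criteria imply. It is only arithmetic on the numbers above, not a measurement of any real generator. The constant names and the `streams_without_stalling` helper are hypothetical, "1080p" is assumed to mean 1920x1080, and the streaming check assumes a constant generation rate and ignores network jitter.

```python
# Thresholds implied by the resolution criteria (derived, not measured).
VIDEO_SECONDS = 60                   # "1 minute video"
MIN_FPS = 24                         # "at least 24 frames per second"
MIN_WIDTH, MIN_HEIGHT = 1920, 1080   # assumption: "1080p" means 1920x1080
VIDEOS_PER_DAY = 100
DAILY_BUDGET_USD = 20.0              # must pay *less* than this
MAX_WAIT_SECONDS = 1.0               # longest allowed wait to load

frames_per_video = VIDEO_SECONDS * MIN_FPS
pixels_per_second = MIN_WIDTH * MIN_HEIGHT * MIN_FPS
cost_cap_per_video = DAILY_BUDGET_USD / VIDEOS_PER_DAY
cost_cap_per_video_second = cost_cap_per_video / VIDEO_SECONDS


def streams_without_stalling(first_frame_latency_s: float, generation_fps: float) -> bool:
    """Simplified model: the first frame arrives within the wait limit and
    frames are then produced at least as fast as they are played back."""
    return first_frame_latency_s <= MAX_WAIT_SECONDS and generation_fps >= MIN_FPS


print(f"frames per video:          {frames_per_video}")
print(f"pixels per second:         {pixels_per_second:,}")
print(f"cost cap per video:        ${cost_cap_per_video:.2f}")
print(f"cost cap per video second: ${cost_cap_per_video_second:.4f}")
```

Under these assumptions, a qualifying generator has to produce 1,440 frames per video at a sustained 24 frames per second or faster, for under $0.20 per video.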

I will use the same prompt as the original Sora video:

A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.

Similar market for 2030:


I'd vote YES on "before 2029".

All of these except the 1-second turnaround seem very likely to me. The turnaround looks hard just because of the demand to batch multiple users' inputs. (I haven't actually worked through any calculations on the 1-second turnaround, so maybe it's easier than my intuition says.)

