
Resolves yes if there is a model that receives a natural language description (e.g. "Give me a video of a puppy playing with a kitten") and outputs a realistic-looking video matching the description.
It does *not* have to be *undetectable* as AI generated, merely "realistic enough".
It must be able to consistently generate realistic videos >=30 seconds long to count.
DALL-E 2 (https://cdn.openai.com/papers/dall-e-2.pdf) counts as "realistic enough" *image* generation from natural language descriptions. (I am writing this before the model is fully available; if it turns out that all the samples are heavily cherry-picked, DALL-E 2 does not count, but a hypothetical model as good as the cherry-picked examples would.)
Duplicate of https://manifold.markets/vluzko/will-there-be-realistic-ai-generate
big arbitrage opportunity!
https://manifold.markets/GeorgeVii/will-this-yudkowsky-tweet-hold-up-2
@DanW The tweet (similar to this market's criteria):
https://twitter.com/ESYudkowsky/status/1563282607315382273

@DanW Note that this market resolves one year earlier than the linked market and requires consistency.
https://twitter.com/emollick/status/1650213239559421953?t=zarQeYTu1_Qxzn40UdNUJQ&s=19
Thoughts on this?

@Harry_ I wouldn't consider this close to realistic. I think there's still a lot of work to be done.

In addition to the softball prompts and the lack of realism, I think people are taking these "panning" and "zooming" videos way too seriously. Multiple shots, complex actions (e.g., a puppy playing with a kitten), etc. are clearly so much harder than cropping or zooming an image.
@JacyAnthis We've had machine-generated panning and zooming that look a lot better than this for a while now, too.
The same goes for effects like fire, water splashes, glowing, etc., where there doesn't need to be much coordination of fine details over time.


This convinced me AI video is coming: "Pepperoni Hug Spot - AI Made TV Commercial" (YouTube)

@TimothyCurrie That is definitely the creepiest thing I've seen this week 😆

More concretely: for me, this is definitely NEGATIVE evidence for this market. The video is terrible, and I'm sure each prompt produced just a couple of seconds of video that the creator had to stitch together.
@TimothyCurrie I seriously doubt that script was machine-generated. It has that same affectation as a bunch of fake "AI generated comedy" where it has the big picture correct but screws up a lot of little details, which is exactly the opposite of the mistakes that a text generator will make. GPT doesn't say things like "are you ready for best pizza of life" because those small-scale errors are the easiest to avoid. When GPT makes a mistake it's a larger-scale one, where fixing it would require looking at a much larger section of text; for example, when it repeats itself over the course of a few paragraphs.
This can be pretty easily verified by just going to ChatGPT, asking it for a pizza commercial script, and seeing that it looks nothing like this.
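For the curious, the same check works against the API as well as the ChatGPT UI. A minimal sketch, assuming the pre-1.0 `openai` Python client with an API key in the `OPENAI_API_KEY` environment variable; the prompt is just an illustration:

```python
# Minimal sketch: ask the model for a pizza commercial script and eyeball
# whether its mistakes look anything like the small-scale ones in the video.
# Assumes the pre-1.0 openai client, which reads OPENAI_API_KEY from the env.
import openai

resp = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": "Write a 30-second TV commercial script for a pizza restaurant.",
    }],
)
print(resp.choices[0].message.content)
```

The output tends to be bland but grammatical, which is the point: small-scale errors like "best pizza of life" are exactly the kind it doesn't make.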
Clarification: if the model fails in a large majority of cases but succeeds on some subset of particularly easy prompts (probably ones without much movement beyond camera panning), this should still count as NO, correct? Since that would be equivalent to the "cherry-picking" you describe for DALL-E 2?
I.e., the model must succeed across a wide variety of prompts.

Q1 of 2023 is coming to a close, and we got this today.
Hugging Face:
https://huggingface.co/damo-vilab/modelscope-damo-text-to-video-synthesis/tree/main
## Example clips:
Star Wars clip using the text-to-video model:
https://twitter.com/victormustar/status/1637461621541949441
Is this a llama clip xD? https://twitter.com/justincmeans/status/1637517337426550785
Shark skiing across the desert! https://twitter.com/hanyingcl/status/1637424841950392321
Astronaut riding a horse, surfing Spiderman: https://huggingface.co/spaces/damo-vilab/modelscope-text-to-video-synthesis
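If anyone wants to reproduce clips like these locally, here's a minimal sketch using the `diffusers` port of the same ModelScope weights (the `damo-vilab/text-to-video-ms-1.7b` repo; the link above is the original ModelScope release). The exact API may shift between diffusers versions:

```python
# Minimal text-to-video sampling sketch, assuming diffusers >= 0.15 and the
# damo-vilab/text-to-video-ms-1.7b port of the ModelScope weights.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # keeps VRAM usage manageable on consumer GPUs

# One of the prompts from the clips above.
video_frames = pipe("Spiderman is surfing", num_inference_steps=25).frames
print(export_to_video(video_frames))  # path to the rendered mp4
```

Worth noting for the market: by default this produces only a short 16-frame clip (a couple of seconds, at low resolution), so the 30-second consistency bar is nowhere in sight.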
@firstuserhere It's hard to say what "realistic enough" means, but to me those are all pretty far from the bar still. There's a ton of blur and shifting motion that seems like it should be theoretically easy to fix (maybe an extra processing step?), but right now it's definitely not there.

@Gabrielle Oh yeah, no way it's realistic enough, but it's only March :) March of the Stable Diffusion year felt quite similar to me.