Based on this tweet https://twitter.com/ArthurB/status/1528991584309624832 Question resolves positive if a model is capable of generating arbitrary videos of reasonable quality from text prompts and demonstrates "object permanency" in the sense that it can resolve full object occlusions correctly (for example a mountain being temporarily hidden by a low cloud should still be there after the cloud moves) most of the time. If it's unclear whether some existing model has the capabilities by the deadline, I'll use my judgment to decide how to resolve the market, and will lean towards YES in cases where the model mostly does it correctly for simple videos but fails at cherry-picked edge cases.
Edit:
For extra clarification, "published" means that at least a paper or some official announcement has to come out about it before that date.
Also, the fact that I haven't resolved YES doesn't necessarily mean that I think none of the stuff that is out yet counts. I'm likely going to wait until June in any case and resolve based on the best model available by then, unless something pretty clearly counts before then (in case someone was updating on how harsh my judgment is based on seeing that I haven't resolved despite whatever model coming out).
Edit2:
Also note that the criterion isn't based on the original tweet, and I might resolve YES based on my own interpretation of what counts as Dall-E equivalent and "object permanence", despite Eliezer and Arthur considering that Eliezer was right about the original spirit of that question.
Edit3:
By reasonable quality and non-crappy I mean something like "Dall-E"-level quality, not "gets hands and details perfectly right and there are no weird artifacts" levels of quality.
Stuff like Gen-2 might qualify, but I'll have to play around with it, or whatever better model replaces it by the deadline, to decide.
Edit4:
I'm going to wait until I have access to Gen-2 to decide whether it fits the object permanence criterion (imo it fits the non-crappy requirement).
@EsbenKran After spending some time playing with it, I decided it didn't fit the object permanence criterion.
It is good enough for the "generating arbitrary videos of reasonable quality from text prompts" criterion imo.
Sorry if the title implied a different criterion.
I got a subscription and spent some time trying prompts, and was pretty undecided.
I thought of resolving 50% instead, but saw failures bad and consistent enough that I decided to resolve NO.
Part of the problem is that the videos generated are short and it was hard to get actual examples of object permanency.
But even the example they have on their webpage has mountains magically appearing after a balloon hides a chunk of the image.
If my criterion had been "the Runway ML model is as good a video model as Dall-E 1 is an image model" I might have resolved YES, and maybe I should have picked that instead of committing to something specific, but now I have to resolve based on the criterion I wrote in the description, which lots of people were basing their decisions on.
@VictorLevoso I do think we'll soon have models that would have resolved YES; it probably takes a bit of improvement from where the Runway models are, and I don't think we are that far from even moth video level quality.
Although I want to note that, from what I tried, the Runway model is significantly worse than it looks on cherry-picked promo videos, unless you spend lots of time trying and finding good prompts.
Though maybe I'm just bad at prompting, dunno; I couldn't try that much because it's pretty expensive.
I wish the Runway models were a bit better or a bit worse so resolving was easier and less contested. I was originally inclined to resolve YES based on the things they showed before trying it myself, and now I'm wondering whether I should have resolved to some percentage instead.
@twink_joan_didion I don't think it's clear. Maybe NO, maybe YES; if it were me I'd just resolve PROB, I think. It's very subjective.
@Sky Not yet, only a bit with free credits; I'll have to pay to try it a bit more.
Sorry, I've been busy with other stuff these days (mostly working on interpretability and running an interpretability Discord) and have been procrastinating on this a bit. I'll try to do it before the end of this week, I guess.
I attempted some more in-depth research on this topic in a blog post. I don't think we are there, and I think this will give a good case for why not. While we're at the point of useful 4-second clips, which can be used by editors and cherry-picked to put together longer videos, I just don't think we're anywhere near the original level of Dall-E 2, and I'm fairly confident, like at least 30% confident, that the market resolver will come to this realization when they eventually get around to using Gen-2. https://patdel.substack.com/p/how-far-away-are-we-from-non-crappy
@PatrickDelaney "Non-crappy" is subjective, and the person said they think it's non-crappy. Dall-E 2 wasn't even mentioned, they were talking about dalle1 and dalle1 gets things wrong a lot.
@ShadowyZephyr "Non-crappy" for this market is "object permanency". Gen-2 video (that are currently the best among AI generation) are definitely crappy in the usual sense. They trigger either laugh, disgust or both. They are so short it's difficult to demonstrate successful object permanency. Most examples I have seen, even by yes better, displays errors in object permanency. Clouds that appears from nowhere, things that morph into another things, people doing weird movements so there limbs seems to change sizes.
@ShadowyZephyr Video is not a new technology. We can estimate the quality of a video, AI-generated or not. Anyway, I have only 16 shares on this; I sold the others because I was sensing the resolution was potentially problematic.
@ShadowyZephyr I can't bet anymore on this market as I already sold out (lost money), but if I could, I would vote NO because I think the market maker can be convinced that we're at a NO. Check out my article; if I have written something wrong in there, I would like to know.
@PatrickDelaney (No one can buy or sell on this market because it's closed. If it were open, you could still bet as normal.)
@PatrickDelaney The thing you wrote wrong was comparing to Dall-E 2 when it wasn't mentioned at all.
Also 30fps is not necessary. Most animators use 12fps.
@PeterBuyukliev Uhhh, no it does not? Object permanence means something in the background can be obscured temporarily, and then it will still be there afterwards. A camel growing a second head doesn't prove anything. I think it can show object permanence sometimes.
Looking at some of the examples, and how mixed it can be, I think resolving 50% is fair.
@ShadowyZephyr it literally does not even have object permanence for objects in the foreground.
@PeterBuyukliev Object permanence has literally nothing to do with the foreground. It's the ability to know objects exist even when they are behind something/not shown. Yes, it can make mistakes in the foreground, but it doesn't always do that, and that isn't one of the conditions.
@ShadowyZephyr The other condition is that the videos generated should be "of reasonable quality." If it regularly adds extra heads to camels, that condition doesn't really seem to be met to me.
@NLeseul The other videos it produces that I've seen are of reasonable quality, it seems like that's a one off scenario
@ShadowyZephyr I cite: "generating arbitrary videos of reasonable quality from text prompts and demonstrates "object permanency" in the sense that it can resolve full object occlusions correctly (for example a mountain being temporarily hidden by a low cloud should still be there after the cloud moves) most of the time."
Reasonable quality is not there. You get somewhat reasonable quality only by cherry-picking.
As for "object permanency" in the sense that it can resolve full object occlusions correctly. It generally fails here. But "full object occlusions" is only the difficult case of object permanency. If we expect the object to be permanent after being temporarily hidden, that means it also should be permanent in the easy case where it stays in plain sight. So it should not disappear or morph with no reason.