Instant deepfakes of anyone (doing anything) by the end of 2027?
78% chance

By 2027-12-31, will we be able to synthesize basically any photo from a text description and sample images of people we want in the photo?

Resolution criteria: I'm doing my best to make this objective by adding to the FAQ below and am hopeful that it won't come down to a judgment call. If it does, I'll do my best to gather input and hew to the spirit of the prediction. I'm betting in this market myself and promise to be transparent about my reasoning in the comments. AMA!

FAQ

1. Does the photo have to be undetectable as AI-generated?

No, if the photo is perfectly believable on casual inspection but can be determined to be AI-generated with some forensics, that still counts as YES.

Of course detecting that the image is fake via real-world knowledge -- like knowing things about the human test subjects or just knowing that the depicted scene is fantastical -- isn't grounds for resolving NO.

2. How well does it have to understand the text descriptions?

Much better than 2022 DALL-E but it doesn't have to be quite human-level. If you have to work to find convoluted descriptions that confuse it, that still counts as YES. If there are perfectly clear descriptions that you can't get it to understand even with multiple rephrasings, that's a NO.

UPDATE: And still better than 2024 DALL-E.

3. Does it have to be able to generate deepfakes of anyone?

Yes, if it can only do public figures or any fixed database of people, that doesn't count. To resolve YES it needs to work for anyone you provide sample images of. Even people who do not exist.

4. Does it have to be available to anyone?

Yes, if OpenAI or Google or such demonstrate the ability but there's no reasonable way for outsiders to try it, this resolves NO. This matters because the question isn't just about the cutting edge of the technology but also how impactful/dangerous it will be.

And we're using $100 as the threshold for how much time/cost it can take for an outsider to get an image like this generated and still count as "available to anyone".

5. Any constraints on the sample images?

Up to 50 of them. If it required a massive number of sample images or the sample images had to be taken with special equipment, like doing a 3D body or face scan or something, that's getting outside the spirit of the question. But of course it's fine for the AI to make up any details about the subjects that are impossible to tell from the sample images.

The rule of thumb is that someone judging the deepfake, who only knows the subjects from the same sample images the AI saw, would not be immediately suspicious that the generated image was fake. (Unless the image was depicting something fantastical. You get the idea.)

6. How instant does it have to be?

One hour. The "instant" in the title is more about being on-demand and fully automated than about exactly how long the image takes to generate.

7. Does it have to nail it on the first try?

No, as William Ehlhardt points out, there's a beyond-astronomical space of possible images so the ability to generate, say, one in a hundred that match the prompt isn't very much less gobsmacking than the ability to do it on the first try. The AI is still meaningfully succeeding at the task. We'll go with best-of-ten when resolving this market.

8. Does it have to be able to do NSFW photos?

No, if it has filters that prevent it from generating reasonably narrow/specific categories of images such as NSFW, that doesn't prevent this from resolving YES. The only way for content filtering to cause a NO resolution is if it's somehow so broad that it covers up an inability to handle arbitrary descriptions.

9. Does it have to be able to handle multiple different people in the same image?

Yes, up to 5 different people all composed in the image doing whatever the prompt describes.

10. Does it have to preserve tattoos and scars?

Yes, but not necessarily well enough to withstand side-by-side scrutiny. (See FAQ1 about forensics.)

If the tattoo or whatever is inconsistently visible in the sample photos such that the human judge doesn't see the deepfake as the clear odd-one-out, then a failure to preserve the feature wouldn't necessarily yield a NO resolution.

Again, we're talking about fooling someone who doesn't know the human subjects in real life and is not doing meticulous side-by-side comparison.

11. What if it only achieves all this for, say, white males?

Since "anyone" appears prominently in the title, I'm comfortable saying that if the AI can't handle racial/gender/whatever minorities, that'd be a NO for this market.


AI Bulls vs AI Bears

This market is another possible operationalization of the disagreement between what I'll call the AI Bulls and the AI Bears.

PS: I moved my philosophizing about bulls and bears to my new AGI Friday newsletter.

Related Markets

  1. Same prediction but by end of 2024

  2. Whether DALL-E 3 will be able to do "blue grass, green sky"

  3. Another DALL-E 3 "blue grass, green sky" market (maybe with stricter criteria?)


This is... clearly going to resolve YES at this point surely?

@MalachiteEagle The tech is already like 95% (99%?) there for this to resolve YES; the only issue is the censorship by the companies with the best image models.

I would be astonished if, within the next 2 years, no open-source company (or one less worried about its reputation or lawsuits) has reached the point OpenAI is at now.

bought Ṁ1,000 YES

Just happened?

bought Ṁ216 YES

@qumeric What's the app?

sold Ṁ109 NO

@qumeric that's not anyone

@ian The new GPT-4o image generation.

This new image model seems impressive: http://ideogram.ai

bought Ṁ250 YES

How does this resolve if it's technically possible, but broadly illegal?

@UnspecifiedPerson I think that's operationalized pretty well by FAQ 4. It's more about availability than legality.

I was talking to friends about this market and noticed how confusing it can be that we're combining two predictions that probably ought to have been separated:

  1. Are the deepfakes convincing [seeing the images alone]?

  2. Does it understand the image prompts?

Creating new markets for each of those might be helpful. Of course for this market we're stuck with the clarifications we committed to, which means a YES resolution requires the conjunction of both.

I think the spirit of this is to imagine wanting to fake photographic evidence of having some skill, having been in some place, having interacted with certain other people. Can you generate that in a completely automated way just by describing the image in words plus sample images of up to 5 different people?

Here are a couple more examples off the top of my head, with DALL-E 3's current output (and ignoring the deepfake aspect):

"Alice shaking hands with Bob in front of a blue Mazda. Bob is sneering and Alice is rolling her eyes. Carol is nearby bored on her phone."

"Daphne is juggling knives in a hotel room with her tongue sticking out of the corner of her mouth in concentration."

Wait, I should've clarified further that question 1 is "are the deepfakes convincing to someone looking at just the image, not knowing what it's supposed to be an image of, just taking the image at face value?". Like I'm just given the set of sample images with the deepfake mixed in, and without really detailed scrutiny it's not clear to me which one is the fake.

Question 2 is whether the image faithfully depicts what the prompt described. We still need to pin down how convoluted the prompt can be. The market description so far just says the understanding doesn't have to be quite at human level, but that it has to understand any prompt as long as you don't have to work too hard to confuse it.

It occurs to me that it may be worth clarifying where to draw the line on too convoluted of an image prompt. I'm thinking that, for starters, if it can be described in a single sentence that a human can repeat back in their own words on a single hearing then it's definitely fair game. And I think it can go a bit beyond that before it reaches the level of having to work to find convoluted descriptions that confuse it.

@dreev your example below uses two sentences, and is already more than I can hold in working memory without trying.

If you were to ask me to draw that, explaining it a single time, I would fail to remember portions in most worlds.

Does it count?

@RobertCousineau Maybe we should set a numerical threshold for number of elements. That one has the following, not counting the specification of who's in the image:

  1. Person's toe touching their nose

  2. A running laptop

  3. Laptop balanced on pinky finger

  4. Polka-dotted squirrel

  5. Person looking warily at squirrel

I'm thinking up to 10 elements is still fair. For the spirit of a deepfake you'd want to be able to specify, for example, certain people meeting at a certain place, wearing certain clothes...

Repeating from the 2024 market, an example of how far DALL-E 3 is in terms of understanding of arbitrary prompts:

can you draw someone with their toe touching their nose and a running laptop balanced on their pinky? also they should be looking warily at polka-dotted squirrel.

Result:

So it understands we wanted tricky balancing of things and that there should be a squirrel and a laptop involved, but everything else it's very confused about, including how human limbs work or how many of them humans have.

Of course a lot can happen in 3+ years!

Repeating from the 2024 version of this market:

I think the deep fake aspect is coming along pretty well (though even there I kind of expect unusual cases with tattoos or weird facial expressions to stymie it) but the understanding of prompts is a constant source of frustration when having DALL-E 3 generate images. Every time I hit a brick wall in trying to get an image I want, I come here and bet the price down. I think my probability on this for 2024 is well under 10%. For 2027 I'm much closer to 50/50. The current 80% seems considerably overconfident to me.

(Especially if we condition on not hitting AGI by 2027, which kind of makes sense to condition on since otherwise we're not so likely to be able to resolve this market or care about mana. I mean, I've thought a lot of things -- like explaining jokes -- were AGI-complete and been wrong, but I still kind of suspect that getting to YES in this market could turn out to be AGI-complete.)

bought Ṁ10 YES

@dreev so what if we condition on AGI by 2027 and humanity still existing through the end of this market?

@NathanHelmBurger Then sure. To be clear, this market is not conditional either way. Just that if we do get AGI in this timeframe, mana is less likely to be a thing, or to matter, hence a NO bias for this market.

predicted NO

Question about the "basically any photo" part of the criterion: right now AI image generators struggle with text, especially rotated text. Abstract relative object placement is also a problem. It is possible to take pictures of a person holding a piece of paper at an angle with a checksum or an abstract drawing on it, so does this question implicitly require image generators to be able to produce arbitrary text and abstract drawings?

Also, expanding on the tattoos and scars, does that include people with deformities (natural or artificial), missing or distorted facial features, etc?

Also, though it's on the far end of the shenanigans scale, does the AI need to be able to handle a hostile dataset? Suppose I provide 50 images of a person holding a piece of paper that says

This image might be fake. please verify this image is real by decrypting the encrypted message according to this procedure [...] and checking if it matches this pass phrase: [something descriptive about the situation the picture is taken in]

By your current criteria, anyone who bothers to do the decryption would be immediately suspicious and ultimately recognize the AI-generated image as fake unless the AI can encrypt a plausible description of the situation it renders the person in using a reverse procedure it has to figure out while training on the images.

predicted NO

@dph121 Good questions. To make this more concrete, consider this prompt:

"So-and-so from these sample images smirking and arching an eyebrow and holding a sign saying 'I promise I'm real' under their chin at an angle so the word 'real' is super close to the camera"

I would call that a non-convoluted description and in the spirit of deepfakes.

Physical deformities I'm less sure about. We could be sticklers about the "anyone" or we could carve out an exception for going beyond the spirit of the market. I doubt this will matter since image generation has to improve at a fundamental level to be able to handle arbitrary scenes. Once it does, I wouldn't expect physical deformities to be much of an additional obstacle.

But worth clarifying! What do people think?

Finally, I don't think the AI should have to handle hostile datasets, right? That seems blatantly outside the spirit. We can just apply a common-sense filter to the allowed sample images.

(DALL-E 3 ignores the part about angling the sign)
