By 2027-12-31, will we be able to synthesize basically any photo from a text description and sample images of people we want in the photo?
Resolution criteria: I'm doing my best to make this objective by adding to the FAQ below and am hopeful that it won't come down to a judgment call. If it does, I'll do my best to gather input and hew to the spirit of the prediction. I'm betting in this market myself and promise to be transparent about my reasoning in the comments. AMA!
FAQ
1. Does the photo have to be undetectable as AI-generated?
No, if the photo is perfectly believable on casual inspection but can be determined to be AI-generated with some forensics, that still counts as YES.
Of course detecting that the image is fake via real-world knowledge -- like knowing things about the human test subjects or just knowing that the depicted scene is fantastical -- isn't grounds for resolving NO.
2. How well does it have to understand the text descriptions?
Much better than 2022 DALL-E, but it doesn't have to be quite human-level. If you have to work to find convoluted descriptions that confuse it, that still counts as YES. If there are perfectly clear descriptions that you can't get it to understand even with multiple rephrasings, that's a NO.
UPDATE: And still better than 2024 DALL-E.
3. Does it have to be able to generate deepfakes of anyone?
Yes; if it can only do public figures or people from some fixed database, that doesn't count. To resolve YES it needs to work for anyone you provide sample images of -- even people who do not exist.
4. Does it have to be available to anyone?
Yes; if OpenAI or Google or such demonstrate the ability but there's no reasonable way for outsiders to try it, this resolves NO. This matters because the question isn't just about the cutting edge of the technology but also about how impactful/dangerous it will be.
And we're using $100 as the threshold for how much time/cost it can take for an outsider to get an image like this generated and still count as "available to anyone".
5. Any constraints on the sample images?
Up to 50 of them. If it required a massive number of sample images or the sample images had to be taken with special equipment, like doing a 3D body or face scan or something, that's getting outside the spirit of the question. But of course it's fine for the AI to make up any details about the subjects that are impossible to tell from the sample images.
The rule of thumb is that someone judging the deepfake, who only knows the subjects from the same sample images the AI saw, would not be immediately suspicious that the generated image was fake. (Unless the image was depicting something fantastical. You get the idea.)
6. How instant does it have to be?
One hour. The "instant" in the title is more about being on-demand and fully automated than about exactly how long the image takes to generate.
7. Does it have to nail it on the first try?
No. As William Ehlhardt points out, there's a beyond-astronomical space of possible images, so the ability to generate, say, one in a hundred that match the prompt isn't much less gobsmacking than the ability to do it on the first try. The AI is still meaningfully succeeding at the task. We'll go with best-of-ten when resolving this market.
8. Does it have to be able to do NSFW photos?
No, if it has filters that prevent it from generating reasonably narrow/specific categories of images such as NSFW, that doesn't prevent this from resolving YES. The only way for content filtering to cause a NO resolution is if it's somehow so broad that it covers up an inability to handle arbitrary descriptions.
9. Does it have to be able to handle multiple different people in the same image?
Yes, up to 5 different people all composed in the image doing whatever the prompt describes.
10. Does it have to preserve tattoos and scars?
Yes, but not necessarily well enough to withstand side-by-side scrutiny. (See FAQ1 about forensics.)
If the tattoo or whatever is inconsistently visible in the sample photos, such that the human judge doesn't see the deepfake as the clear odd-one-out, then a failure to preserve the feature wouldn't necessarily yield a NO resolution.
Again, we're talking about fooling someone who doesn't know the human subjects in real life and is not doing meticulous side-by-side comparison.
11. What if it only achieves all this for, say, white males?
Since "anyone" appears prominently in the title, I'm comfortable saying that if the AI can't handle racial/gender/whatever minorities, that'd be a NO for this market.
AI Bulls vs AI Bears
This market is another possible operationalization of the disagreement between what I'll call the AI Bulls and the AI Bears.
PS: I moved my philosophizing about bulls and bears to my new AGI Friday newsletter.