Will there be an image AI by 2025 that understands which subjects are doing which actions and in which positions?

Question

Note: this is for image generating AIs, not AIs that describe what is happening in an existing image.

With every new DALL-E and Midjourney version, I have tried the following prompt:

"A red sports car with an old lady driving it and eating a live octopus, with a blue footed booby eating a hamburger in the passenger seat."

So far all versions fail to get all of the details right. They mix up who is eating the octopus vs who is eating the hamburger, who is driving vs in the passenger seat, who is eating vs getting eaten, etc.

This resolves YES if an image-generating AI can be given 10 random variations on this prompt consecutively (substituting random variants of the vehicle, person, animals, seat, etc.) and gets all subjects in the correct places doing the correct actions all 10 times.
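
As a rough illustration of the test described above, here is a minimal Python sketch that builds random prompt variations and applies the all-10-correct rule. The substitution lists and the names `make_prompt` and `resolves_yes` are hypothetical and are not part of the question. Judging whether a generated image is correct is assumed to be done by a human, so the code only takes a list of pass/fail judgments.

```python
import random

# Hypothetical substitution pools; the question does not specify any.
VEHICLES = ["red sports car", "yellow school bus", "green pickup truck"]
PEOPLE = ["an old lady", "a young boy", "a bearded man"]
ANIMALS = ["blue footed booby", "raccoon", "penguin"]
FOODS = ["a live octopus", "a hamburger", "a slice of pizza", "a banana"]
SEATS = ["passenger seat", "back seat"]


def make_prompt(rng):
    """Build one random variation of the prompt plus the expected role assignments."""
    vehicle = rng.choice(VEHICLES)
    person = rng.choice(PEOPLE)
    animal = rng.choice(ANIMALS)
    seat = rng.choice(SEATS)
    # Two different foods, so mixing up who eats what is detectable.
    driver_food, animal_food = rng.sample(FOODS, 2)
    prompt = (
        f"A {vehicle} with {person} driving it and eating {driver_food}, "
        f"with a {animal} eating {animal_food} in the {seat}."
    )
    expected = {
        "driver": person,
        "driver_food": driver_food,
        "passenger": animal,
        "passenger_food": animal_food,
        "passenger_seat": seat,
    }
    return prompt, expected


def resolves_yes(judgments):
    """YES only if there are exactly 10 consecutive trials and every one is correct."""
    return len(judgments) == 10 and all(judgments)


if __name__ == "__main__":
    rng = random.Random()
    for i in range(10):
        prompt, _ = make_prompt(rng)
        print(i + 1, prompt)
```

A single wrong image among the 10 is enough for `resolves_yes` to return `False`, which matches the strict "all 10 times" wording.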

Manifold Markets · Accepted Answer

No — resolved on Jan 22, 2025 by Manifold Markets prediction market.

