If any public AI image generator (one I can access online for under $100/month) can successfully handle the following prompt in 15 of 20 images, this resolves YES. I will run the prompt 5 times, producing 4 images each time (20 images total), and use my judgment to evaluate.
The prompt is: "A small green cube on top of a large red cylinder which is on top of a blue torus". I will judge pretty strictly: the image has to show each of those elements, at least recognizably. It can have other background or side elements, but they should not be the focus. The image should be of what is requested.
If necessary, I will vary the colors/shapes/order to verify if it's a real capability or not, although that is very unlikely to be necessary.
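The resolution rule above can be sketched as a small tally. This is only an illustration: the function name and the data layout (5 runs of 4 pass/fail judgments) are assumptions, and "15/20" is read here as "at least 15 of the 20 images pass".

```python
RUNS = 5              # number of times the prompt is run
IMAGES_PER_RUN = 4    # images produced per run (assumed reading of "5x ... 4 images")
REQUIRED_PASSES = 15  # "15/20", read as "at least 15"


def resolves_yes(judgments):
    """Return True if at least 15 of the 20 judged images pass.

    judgments: one list of booleans per run, True meaning the image
    was judged to satisfy the prompt.
    """
    flat = [ok for run in judgments for ok in run]
    if len(flat) != RUNS * IMAGES_PER_RUN:
        raise ValueError("expected 5 runs of 4 images each")
    return sum(flat) >= REQUIRED_PASSES
```

The pass/fail value for each image still comes from human judgment; the code only counts.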
Specific prompt generation
In general, I will test the prompt above. If it seems the engine is hardcoding that specific one, I will generate similar, more general prompts as follows, without increasing the complexity:
Definitions
adjective = [ small | large | <none> ]
color = [ red | green | purple | yellow | black | white ]
shape = [cube | cylinder | torus | pyramid | sphere]
object = A [adjective] [color] [shape]
relation = [on top of | next to | behind | in front of]
prompt =
"[object] [relation] [object] which is [relation] [object]"
"A small green cube on top of a large red cylinder which is on top of a blue torus"
Here are some current best attempts:
Bing Image Creator 9/2023
Midjourney 5.2
Ideogram
@Ernie Or will you vary the number in the stack, and if so, across what range? And will you vary the object/color, and if so, across what range? I really liked Carson's precision in his finger market, which made it a nice object-level prediction instead of predicting the choices of the market creator.
I will test image generation models by asking "Please produce an image of two hands, with the left having # fingers and the right having # fingers". The number of fingers will be between 3 and 7.
@Jacy yes, good point. I've expanded the definition in the description.
I still think it's best to continue using the original prompt, unless there's a reason to think that it's not representative of the powers of the generators.
@Ernie I'm not sure how you'll detect hard-coding, but I think that combinatorial approach is great! Well-said.
@Jacy e.g.
This is DALL-E 3, 12/27/2023. I'm picking the best of 2. This is not for judgment, just an example of what it can do now.