A public AI generator can correctly make images of complex 3d object stacking, Jun 1 2024
Jun 2
22% chance

If any public AI image generator (which I can access online for <$100/month) can successfully handle the following prompt in at least 15 of 20 images, this resolves YES. I will run the prompt 5 times, producing 4 images per run, and use my judgment to evaluate each image.

The prompt is: "A small green cube on top of a large red cylinder which is on top of a blue torus". I will judge pretty strictly: the image has to show all of those elements, recognizably and in the requested arrangement. It can have other background or side elements, but they should not be the focus. The image should be of what is requested.

If necessary, I will vary the colors/shapes/order to verify if it's a real capability or not, although that is very unlikely to be necessary.
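The resolution rule can be written down as a small tally. This is only a sketch of the arithmetic, assuming that "15/20" means at least 15 passing images out of the 20 produced by 5 runs of 4 images each; the function name `resolves_yes` and the constants are hypothetical, and the pass/fail judgment for each image is still made by hand.

```python
from typing import Sequence

# Numbers taken from the description; the names are illustrative only.
RUNS = 5
IMAGES_PER_RUN = 4
REQUIRED_SUCCESSES = 15  # assumption: "15/20" means "at least 15 of 20"


def resolves_yes(judgments: Sequence[bool]) -> bool:
    """Return True if enough images were judged correct.

    `judgments` holds one manual pass/fail verdict per generated image.
    """
    expected = RUNS * IMAGES_PER_RUN
    if len(judgments) != expected:
        raise ValueError(f"expected {expected} judgments, got {len(judgments)}")
    # True counts as 1 and False as 0, so sum() gives the number of passes.
    return sum(judgments) >= REQUIRED_SUCCESSES
```

For example, 15 passes and 5 failures would meet the bar, while 14 passes and 6 failures would not.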

Specific prompt generation

In general, I will test the prompt above. If it seems the engine is hardcoding that specific one, I will generate similar prompts from the following grammar, which varies the prompt without increasing its complexity:

Definitions

adjective = [ small | large | <none> ]

color = [ red | green | purple | yellow | black | white ]

shape = [cube | cylinder | torus | pyramid | sphere]

object = [adjective] [color] [shape]

relation = [on top of | next to | behind | in front of]

prompt = "A [object] [relation] a [object] which is [relation] a [object]"

"A small green cube on top of a large red cylinder which is on top of a blue torus"
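The grammar above could be turned into a small generator along these lines. This is a rough sketch, not the procedure actually used for the market: the function names are made up, the template is read so that each object gets exactly one article, and "blue" is added to the colors as an assumption because the example prompt uses it even though the color list does not include it.

```python
import random

ADJECTIVES = ["small", "large", ""]  # "" stands for <none>
# Assumption: "blue" is appended because the example prompt uses it,
# although it is missing from the color list in the description.
COLORS = ["red", "green", "purple", "yellow", "black", "white", "blue"]
SHAPES = ["cube", "cylinder", "torus", "pyramid", "sphere"]
RELATIONS = ["on top of", "next to", "behind", "in front of"]


def make_object(adjective: str, color: str, shape: str) -> str:
    """Join the parts of an object, skipping an empty adjective."""
    return " ".join(word for word in (adjective, color, shape) if word)


def make_prompt(obj1: str, rel1: str, obj2: str, rel2: str, obj3: str) -> str:
    """Fill the three-object template with one article per object."""
    return f"A {obj1} {rel1} a {obj2} which is {rel2} a {obj3}"


def random_prompt(rng: random.Random) -> str:
    """Draw three objects and two relations independently at random."""
    objects = [
        make_object(rng.choice(ADJECTIVES), rng.choice(COLORS), rng.choice(SHAPES))
        for _ in range(3)
    ]
    relations = [rng.choice(RELATIONS) for _ in range(2)]
    return make_prompt(objects[0], relations[0], objects[1], relations[1], objects[2])


if __name__ == "__main__":
    print(random_prompt(random.Random()))
```

Calling `make_prompt` with the objects "small green cube", "large red cylinder" and "blue torus" and the relation "on top of" twice reproduces the original prompt, so the original is one instance of this family.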

Here are some current best attempts:

Bing Image Creator 9/2023

Midjourney 5.2

Ideogram


@Ernie Or will you vary the number in the stack, and if so, across what range? And will you vary the object/color, and if so, across what range? I really liked Carson's precision in his finger market, which made it a nice object-level prediction instead of predicting the choices of the market creator.

I will test image generation models by asking "Please produce an image of two hands, with the left having # fingers and the right having # fingers". The number of fingers will be between 3 and 7.

@Jacy Yes, good point. I've expanded the definition in the description.

I still think it's best to continue using the original prompt, unless there's a reason to think that it's not representative of the powers of the generators.

@Ernie I'm not sure how you'll detect hard-coding, but I think that combinatorial approach is great! Well-said.

@Jacy e.g.

This is DALL-E 3, 12/27/2023. I'm picking the best of 2. This is not for judgment, just an example of what it can do now.

Unofficial test of MJ v6

Definitely better than before but not there yet
