Will inference-time scaling improve the generation of images with correct geometric shapes? (in generative AI)

76% chance

Resolves as YES if there is strong evidence that inference-time scaling methods significantly improve the generation of images with correct geometric shapes before January 1st 2027.

This must have been demonstrated in a leading/frontier text-to-image model, and clearly indicated by the developers/researchers (e.g. in a blog post, podcast or research paper).


I think this question resolves YES, given the image reasoning stack in the latest gpt-image-2 model.

@mods this question resolves YES

@0xseraphim I also did a test with ChatGPT (free tier):

I'm interested in verifying that inference-time scaling improves the generation of images with correct geometric shapes. Your image generation tools are approximately state of the art, so I would like to come up with a sort of stress test image that would prove that geometric correctness has been achieved. I think we can either workshop a test image prompt or you can just come up with the prompt and generate the image yourself if you feel confident. This should really test the boundary of geometric understanding while also being visually simple enough to verify (e.g., no wildly complex curves that are technically correct but hard to verify against intent).

I didn't measure anything, but the square looks a little squished horizontally, the dashed lines don't meet the corners at C and D exactly, the large circle isn't tangent at R, and the hexagon appears to be mis-specified.

To give it another shot, I workshopped a second prompt:

Draw a square. Inscribe a circle tangent to all four sides. Place a regular octagon on the circle. Draw both diagonals. Mark all intersections.

And that one failed pretty badly.
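For reference, the second prompt has a fully determined answer once a few ambiguities are pinned down, so the main points a correct image should show can be computed directly. This is a minimal sketch with a hypothetical helper, `reference_geometry` (not from any library). It assumes an axis-aligned square centered at the origin, an octagon whose vertices lie on the circle, and "both diagonals" meaning the square's diagonals; the prompt does not fix the octagon's rotation, so that is a parameter.

```python
import math


def reference_geometry(side: float = 2.0, octagon_offset_deg: float = 0.0) -> dict:
    """Expected points for: 'Draw a square. Inscribe a circle tangent to all four
    sides. Place a regular octagon on the circle. Draw both diagonals.'

    Assumptions (the prompt leaves these open): axis-aligned square centered at
    the origin; "on the circle" means the octagon's vertices lie on the circle;
    "both diagonals" means the square's diagonals; octagon_offset_deg is the
    angle of the first octagon vertex.
    """
    h = side / 2.0
    r = h  # the inscribed circle's radius is half the side length

    def polar(radius: float, deg: float) -> tuple:
        t = math.radians(deg)
        return (radius * math.cos(t), radius * math.sin(t))

    # The circle touches each side at that side's midpoint.
    tangent_points = [polar(r, 90 * k) for k in range(4)]
    # Octagon vertices sit on the circle, 45 degrees apart.
    octagon = [polar(r, octagon_offset_deg + 45 * k) for k in range(8)]
    # The two diagonals, treated as four rays from the center.
    diagonal_dirs = [45 + 90 * k for k in range(4)]
    # Each diagonal meets the circle at distance r from the center.
    diagonal_circle = [polar(r, d) for d in diagonal_dirs]
    # A ray from the center hits the octagon at apothem / cos(alpha), where alpha
    # is the angle between the ray and the nearest edge normal.
    apothem = r * math.cos(math.pi / 8)
    diagonal_octagon = []
    for d in diagonal_dirs:
        alpha = abs(((d - octagon_offset_deg) % 45) - 22.5)
        diagonal_octagon.append(polar(apothem / math.cos(math.radians(alpha)), d))

    return {
        "corners": [(h, h), (-h, h), (-h, -h), (h, -h)],  # also where diagonals end
        "tangent_points": tangent_points,
        "octagon": octagon,
        "diagonal_crossing": (0.0, 0.0),
        "diagonal_circle": diagonal_circle,
        "diagonal_octagon": diagonal_octagon,
    }
```

With the default rotation (a vertex at 0°), the diagonals pass exactly through four octagon vertices, one of the coincidences a generator has to get right. With `octagon_offset_deg=22.5` they instead cross edge midpoints at r·cos(22.5°) ≈ 0.924·r from the center.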

So: I think we can agree that inference-time scaling methods have been employed for image models like the one ChatGPT uses. The question now is whether this has managed to "significantly improve the generation of images with correct geometric shapes." My take is that it has improved compared to a 2025-01-01 baseline, but not dramatically so. Without new evidence, I would recommend leaving this open until the end date and seeing whether any new developments occur.

@wasabipesto Hmm, this looks like a good test. Is the free-tier image model the same as the paid-tier image model?

Apparently the free-tier model doesn't have the "images with thinking" feature.

First example using the Plus-tier model in ChatGPT:

Second example using the Plus-tier model in ChatGPT:

@wasabipesto It almost got the second one, apart from the octagon.

@0xseraphim Thanks for running those. My instinct is still to wait until the end of the year, when it will hopefully be clearer whether this has crossed a significant threshold. If the market closed June 1, I would probably lean towards resolving YES (or maybe ~75%), but I think the bar for resolving a market 6+ months early should be pretty high.