Will any image model be able to draw a pentagon before 2025?
66% chance

Current image models are terrible at this. (That was tested on DALL-E 2, but DALL-E 3 is no better.)

The image model must get the correct number of sides on at least 95% of tries per prompt. Other details do not have to be correct. Any reasonable prompt that the average mathematically-literate human would easily understand as asking it to draw a pentagon must be responded to correctly. If the image model is not publicly available, I must be confident that its answers are not being cherry-picked.

If the input is fed through an LLM or some other system before going into the image model, I will bypass that pre-processing if I can easily do so; otherwise I will not.

Pretty much any neural network counts, even if it's multimodal and can output stuff other than images. I will ignore any special-purpose image model, like one that was trained only to generate simple polygons. It must draw the image itself, not find it online or write code to generate it.
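
For concreteness, the per-prompt check this implies might look like the following sketch (Python; `generate_image` and `has_five_sides` are hypothetical placeholders, not real APIs):

```python
# A minimal sketch of the resolution check described above. Both helpers are
# placeholders: generation would call a real image-model API, and counting
# sides would in practice be a human looking at the output.

def generate_image(prompt: str) -> bytes:
    raise NotImplementedError("placeholder for a real image-model API call")

def has_five_sides(image: bytes) -> bool:
    raise NotImplementedError("placeholder for a human count of the sides")

def passes_prompt(prompt: str, tries: int = 100, threshold: float = 0.95) -> bool:
    """A prompt passes if at least `threshold` of `tries` outputs are pentagons."""
    successes = sum(has_five_sides(generate_image(prompt)) for _ in range(tries))
    return successes / tries >= threshold

# Resolution requires every reasonable prompt to pass, not just one, e.g.:
# all(passes_prompt(p) for p in ["Draw a pentagon.", "Draw a 5-gon."])
```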

I'm starting to think that either people cannot understand this market, or I'm wildly misunderstanding it.

Can I get confirmation that "Draw a rectangle, but with 1 extra side" must result in the image model giving a correct answer 95% of the time?

"Any reasonable prompt that the average mathematically-literate human would easily understand as asking it to draw a pentagon must be responded to correctly." I believe that any mathematically-literate person would easily draw a pentagon in response to the above prompt.

@ForTruth I tried your prompt and it worked the first time, then I bought a bunch of yes. Then I tried it a few more times and it failed (GPT-4o), so it's not near 95% atm. Sold most of my hasty 'yes', but I still think on balance there's a decent chance at this by the end of the year. 50-70% seems about right.
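
For a rough sense of the sample sizes involved (assuming independent tries): even a model sitting exactly at a 95% per-try rate produces a clean streak of 13 about half the time, so a handful of tries says little either way. A back-of-envelope sketch:

```python
import math

# If the true per-try success rate is exactly 0.95, a clean streak of n
# successes occurs with probability 0.95**n.
print(0.95 ** 13)  # ~0.51: 13-for-13 is unsurprising even at the bar

# Smallest streak that would be <5% likely from a model at exactly 95%:
n = math.ceil(math.log(0.05) / math.log(0.95))
print(n, 0.95 ** n)  # 59, ~0.048
```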

@Grizzimo I'd consider myself mathematically literate, but I don't know what "Draw a rectangle, but with 1 extra side" is intending to ask for. It seems to be asking for a specific type of rectangle that somehow has an extra side. But a five-sided shape isn't a rectangle, so that doesn't seem to be what's wanted. Maybe something like this is wanted?

Or perhaps a degenerate pentagon where three consecutive points are collinear?


How about "Draw a polygon with one more side than a quadrilateral" as an alternative?

@JimHays Generally "but" precedes an exception to the prior statement, superseding it. Although yes, the average interpretation is highly subjective, and if this turned out to be the only statement the image model had trouble with, then I could see the argument for ignoring it.

I'm mostly seeking confirmation that the market specifies that any reasonable prompt must have a 95% success rate for a YES resolution. A 95% success rate on any single prompt is not sufficient. In other words I believe it would need to succeed at your prompt 95% of the time, as well as succeed at prompts of the following nature:

"Draw a shape with more sides than a square and fewer than a hexagon."

"Draw a 5-gon."

"Draw a regular polygon with interior angles of 108 degrees."

"Approximate a circle as closely as possible using exactly 5 line segments."

If I'm right about my interpretation in general, but wrong about one of the above prompts in specific, then that would be good to know. If I am somehow completely fundamentally wrong about how this market will resolve, then that would also be very good to know.
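
(As a sanity check on the interior-angle prompt above: a regular n-gon has interior angles of (n - 2) · 180° / n; setting that to 108° gives 180n - 360 = 108n, so n = 5, a pentagon.)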

Google Gemini Advanced can generate images of the US Pentagon.

@LeeWoods This doesn’t even meet “The image model must get the correct number of sides on at least 95% of tries per prompt”

Does outputting to SVG count?

@euclaise Doesn't matter what the output file format is, but it has to draw the image "itself", not write code that generates it.

@IsaacKing note that you did specifically exclude an example below where ChatGPT generated an SVG. I don't know if that example involved it writing code, but if that's the reason you excluded it, it wasn't apparent.

SVGs kind of are code, which makes the line particularly blurry in their case.

@chrisjbillington I would argue that explicitly asking for any specific file format should immediately exclude the prompt from consideration, since no "mathematically-literate human would easily understand [that] as asking it to draw a pentagon."

I agree that SVG should probably be excluded entirely, since a human responding to a request to draw a pentagon by writing SVG does not feel like a "correct" response. It's a fine line, but I'd say if the model produces human-readable text, then that text cannot also be considered an image.

I'd really like clarification on what exactly counts as a correct image for this market.

@IsaacKing Sure, but we have to define what "drawing" is. SVG stores images as code describing the shapes that make up the image, which is different from pixel rendering, though not necessarily less valid.

@ForTruth I don't think the readable-text part matters: there are pixel image formats that store the image as only printable ASCII but decode directly to pixels. Likewise, many vector formats use a non-printable bytecode.

@euclaise I think you misunderstand me: being printable ASCII does not make something readable text. If the model outputs non-readable bytecode then I would say that's valid, even if it's a vector format. My point is that the file should not be human-readable, or human-writeable.

Allowing human-readable output changes this from an image-model problem into just another text-generation problem, which I believe is exactly what excluding code is meant to avoid.

If I can sit down and write SVG code to draw a pentagon, then an AI doing the same only proves that it can mimic my text writing capabilities, not my image drawing capabilities.
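
To make that concrete, here is roughly what "sitting down and writing SVG" amounts to, sketched in Python (the canvas size and radius are arbitrary choices):

```python
import math

def pentagon_svg(cx: float = 100, cy: float = 100, r: float = 80) -> str:
    """Emit SVG markup for a regular pentagon, starting from the top vertex."""
    points = " ".join(
        f"{cx + r * math.cos(math.radians(-90 + 72 * k)):.1f},"
        f"{cy + r * math.sin(math.radians(-90 + 72 * k)):.1f}"
        for k in range(5)
    )
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="200" height="200">'
            f'<polygon points="{points}" fill="none" stroke="black"/></svg>')

print(pentagon_svg())
```

A dozen lines of text that any programmer could type by hand, which is exactly why it feels like text generation rather than drawing.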

It still sucks at creating images.

I didn’t test in depth, but 4o got this first try

@JCE that's not an image model, that's a text model writing a Python script

@chrisjbillington Ah I see the problem.

GPT-4o can output images. Does this count as an image model?

@RaulCavalcante Yep. Anything that can output a general range of images counts, even if it can also do other stuff.

@RaulCavalcante This is totally gonna resolve yes in a few weeks.

Your prompt "A simple geometric pentagon on a plain white background. The pentagon should have five equal sides and angles, drawn in black, with no additional details or decorations" has a Flesch-Kincaid grade level of 9.7, which is well above the reading level of the average human.
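
(For reference, the formula is FK grade = 0.39 · (words / sentences) + 11.8 · (syllables / words) - 15.59; by my rough count that prompt has 27 words, 2 sentences, and about 45 syllables, which lands around 9 to 10, consistent with the quoted figure; syllable counters vary slightly.)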

Given this fact, how exactly are you determining whether a prompt would be understood by an average-intelligence human? I feel like you'd need to do a research study to figure out if most people draw a pentagon when instructed to do so. I'm fairly certain that far fewer than 95% would succeed, and quite possibly fewer than 50%.

@ForTruth Hmm. Fair point. How about the average person who graduated high school?

@IsaacKing That's probably workable, so long as your example prompt remains about the most difficult one to understand.

Probably the type of prompts I'm most unclear on are also the simplest: "Draw a pentagon." Would that need to have a 95% success rate, with no further clarification? It's unclear to me if the average high school graduate knows the word "pentagon" even if the term is clearly taught in school.

Since I'm asking, I might as well also throw in another question: Is the number of sides in the image determined solely by the 2d outline, or by the viewer's impression? That is, if the image model draws some complex 3d object that appears to have 5 clear sides when viewed as purely 2d (albeit with complex internal markings), is that considered correct or incorrect?

@IsaacKing So, about model eligibility:
- Does any image model in general count?
- Does any image model not trained explicitly for pentagons count?
- Does the model have to be good or important or notable?

@bohaska Any general-purpose model counts. I won't count one that was specifically trained to create pentagons, or geometric shapes.
