
In order to resolve YES, someone (could be myself) must provide an image model and one or two prompts to test it with.
The image model is any program capable of generating arbitrary images. It can use any method to do so, but it must be general. An LLM that writes code to draw simple geometric shapes does not qualify.
If there's any question over whether a program should qualify, I'll require that it generate the polygons with some additional quality that current image models can already handle. Maybe the polygon has to be in a specific style, or a person is holding it, or whatever. The submitter can choose anything that's sufficient to prove this is a general image-generation program.
If the input is fed through an LLM or some other system before going into the image model, I will bypass this pre-processing if I can easily do so; otherwise I will leave it in place.
For side numbers 3-8, I will use the shape names from triangle to octagon. For side numbers >8, I will enter a number of sides, either using digits or with spelled-out numbers, submitter's choice. I'll test all numbers from 9-20, and 5 random numbers from 21-50.
The prompt can be anything, but it must be consistent. The same string of text except for changing the descriptor of the shape I want. (One prompt for 3-8 and a different prompt for >8 is fine.)
For each attempt, at least 50% of the generated images must be unambiguously the specified shape. It's ok if there's other stuff in the picture, the polygon is pictured at an angle, or there are other distractions. But if there's any debate over whether the specified regular polygon is actually in the image somewhere, it doesn't count. If the resolution is too low for me to tell, I will assume it's not the correct shape.
If any attempt fails, the entire test fails. It must pass for every side number I test.
@IsaacKing Perhaps it means it would reach 100 because 192>100, and the properties of a 192-sided figure are similar to those of a 100-sided figure?
@jwith I believe it would have taken you less time to google the answer than to make this comment. :)
@MalachiteEagle It’s very low compared to the original resolution criteria, and in my opinion it indicates a significant departure from the original version.
@JimHays +1 on this being not an ambiguity but an explicit inconsistency with the original criteria! Concerning.
@IsaacKing wait what? I'm so confused by this change. I assumed it was just triangles through octagons? HUMANS can't even reliably produce images of a 49-sided polygon.
@bens This market isn’t benchmarked against human performance though, so I’m not sure why human performance would be relevant here?
Even sticking with named regular polygons, you’ve got the chiliagon, myriagon, and megagon, which most people have never heard of, but which should all be necessary for a YES.
@MalachiteEagle @bens Not sure why you guys had that impression, the market has always been very clearly about all polygons. The title says "every regular polygon", and the original description confirmed explicitly that those above 8 sides are included.
@JimHays @Jacy I added the 50-side cap because people were pointing out that I cannot test every positive integer, and thus need to test only a subset. There were questions about what this subset would be, so I formalized it now to avoid arguments about it later. (e.g. a NO bettor continuously claiming "well maybe none of the ones you've tested so far have failed, but I want you to test some more".)
I don't foresee this mattering? Any number >50 is just going to look like a circle anyway. But if you think this is relevant, I'll happily test higher numbers too. Feel free to suggest a procedure you'd be comfortable with. It's not my intention to change the market from its original, just to remove ambiguity in how it will resolve. (There were some concerns over the resolution of my previous market on pentagons, I don't want a repeat of that.)
@IsaacKing Thanks for working on formalizing the criteria, I do think that’s valuable to work out ahead of time.
While you’re right that above a certain limit the shapes should all look approximately identical, I think it would be valuable, if the test even goes that far, to test some larger numbers to ensure that the model does properly generalize.
I’d propose at least adding chiliagon, myriagon, and megagon (knowing these terms shouldn’t be part of the test, so if it asks what they are, giving a definition is fine), as well as the 21-digit number mentioned below in the second comment thread on the market.
If all of these work correctly, I think NO bettors should get a couple of days to vote on some number of additional prompts to test (say, at least two), in case they discover that there’s some kind of out-of-distribution error that the model struggles with that’s hard to predict upfront (e.g., it can’t do shapes where the number of sides looks like a year, or it messes up on 404, 538, or other numbers that have strong meaning associated with them).
To address this ahead of time, what if the model refuses to draw certain shapes, such as a 420- or 666-sided polygon?
@JimHays Ok here's a simple solution: the NO bettors can submit any ten integers > 20 that still fit within the prompt length limit and I'll include them in the test. (Must be denoted in decimal.) Is that satisfactory?
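A rough sketch of how a submitted number could be checked against this rule. The prompt template, the character-based length limit, and the rejection of leading zeros are all assumptions for illustration; the actual prompt length limit depends on whichever model is submitted.

```python
def valid_submission(text, prompt_template, max_prompt_chars):
    """Check one NO-bettor submission: a decimal integer > 20 whose
    prompt still fits in the (assumed, character-based) length limit."""
    # Decimal digits only; rejecting leading zeros is an assumption.
    if not (text.isascii() and text.isdigit()) or text[0] == "0":
        return False
    # With no leading zeros, any 3+ digit number is > 20; only short
    # strings need an actual integer comparison.
    if len(text) <= 2 and int(text) <= 20:
        return False
    return len(prompt_template.format(n=text)) <= max_prompt_chars


def valid_batch(submissions, prompt_template, max_prompt_chars):
    """The rule allows any ten integers, so accept at most ten entries."""
    return len(submissions) <= 10 and all(
        valid_submission(s, prompt_template, max_prompt_chars)
        for s in submissions
    )
```

Here `prompt_template` is a hypothetical string such as `"a regular polygon with {n} sides"`, and `max_prompt_chars` is whatever limit the tested model imposes.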
@IsaacKing okay but obviously it can't generate a 10^3-gon or whatever, since there's a finite number of pixels in an image? I think a reasonable assumption was that you meant the simple polygons that children know the names of, like "pentagon", "hexagon", and "octagon", and not literally infinite numbers of shapes
@bens This is already specified below: if the sides cannot be distinguished at the image generator’s max resolution, the image should still look correct for that polygon embedded in a grid of that size. E.g., a regular 1,000-gon will look like a circle at most resolutions.
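To put a rough number on the "looks like a circle" point: the largest gap between a regular n-gon and its circumscribed circle occurs at the midpoint of each edge and equals R(1 - cos(π/n)), where R is the circumradius. The sketch below computes that gap in pixels. The 512-pixel radius (a polygon filling a 1024x1024 image) is an assumption for illustration, not something specified in the market.

```python
import math


def max_gap_pixels(n_sides, radius_px):
    """Largest distance between a regular n-gon's boundary and its
    circumscribed circle, measured at the midpoint of an edge."""
    return radius_px * (1.0 - math.cos(math.pi / n_sides))


if __name__ == "__main__":
    # Assumed radius: a polygon filling a 1024x1024 image.
    for n in (8, 20, 50, 1000):
        print(n, max_gap_pixels(n, 512))
```

Under that assumption, an octagon deviates from the circle by tens of pixels, a 50-gon by roughly one pixel, and a 1,000-gon by a small fraction of a pixel, so its individual sides cannot be resolved.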
@JimHays hmm, okay, makes sense, I guess, although I have no clue how one would attempt to judge that
@bens The goal is to test whether it actually "understands", in a meaningful way, what a regular polygon is. If a human artist could draw 8 sides but upon being asked to draw 9 started giving me totally random geometric shapes instead, I would have some concerns about their mental capabilities.