Upon release, I'll give DALLE-3 the prompt "Exactly N circles on a white background", and generate 4 images.
I will start with N = 1. If 2/4 or more of the images have N (no more, no less) roughly circular shapes, I will increment N by 1.
I will repeat this until fewer than 2 of the 4 images have N roughly circular shapes. This market will resolve to the option that contains N-1.
For example, if I ask for 21 circles and DALLE produces 1 image that contains 21 circles and 3 images that contain the wrong number of circles, this market will resolve to the "16-20" option. I will count any circle that is more than half contained in the image, and not count any that are more than half cut off.
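The procedure above can be sketched as a short loop. This is only an illustration, not the tooling actually used for resolution: `passes_for` is a hypothetical callback standing in for "generate 4 images for this N and count by eye how many show exactly N circles", and `max_n` is an assumed safety cap that is not part of the market rules.

```python
from typing import Callable, Optional


def highest_n_counted(
    passes_for: Callable[[int], int],
    threshold: int = 2,   # at least 2 of the 4 images must be correct
    max_n: int = 1000,    # assumed safety cap, not part of the market rules
) -> Optional[int]:
    """Return N-1, where N is the first value with fewer than `threshold`
    passing images. Returns None if even N=1 fails."""
    n = 1
    while n <= max_n and passes_for(n) >= threshold:
        n += 1
    return n - 1 if n > 1 else None


# Hypothetical model that gets every image right up to 20, then only 1 of 4 at 21.
print(highest_n_counted(lambda n: 4 if n <= 20 else 1))  # prints 20
```

With the example from the description (failure at N=21), the function returns 20, which falls in the "16-20" option.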
For context, this is DALLE-2 with N=5:
![](https://firebasestorage.googleapis.com/v0/b/mantic-markets.appspot.com/o/user-images%2Fdefault%2Fh3KrHcEsq1.png?alt=media&token=007ae1e6-2af5-4217-ad59-368a0260fafd)
Image 1: 6 circles, would not count.
Image 2: 4 circles, would not count.
Image 3: 5 circles (with 2 cut off more than half), would count.
Image 4: 4 circles, would not count.
Note that I'm giving some leeway for the concentric circles in image #3.
So this would be 1/4, and DALLE-2 cannot count to 5.
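The tally above amounts to a simple per-round check. Here is a minimal sketch, assuming the per-image circle counts have already been judged by eye (including the half-cut-off rule); `passing_images` is a hypothetical helper name.

```python
def passing_images(circle_counts, n):
    """Count how many images show exactly n circles."""
    return sum(1 for count in circle_counts if count == n)


# Circle counts judged by eye for the four DALLE-2 images at N=5.
counts = [6, 4, 5, 4]
passed = passing_images(counts, 5)
print(f"{passed}/4 images pass; round succeeds: {passed >= 2}")
# prints "1/4 images pass; round succeeds: False"
```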
Other Clarifications:
I'll use the singular "circle" in the prompt for N=1.
"More than half" means half the circle by area. I'm just going to eyeball it, though.
I'll resolve N/A if it fails at N=1, since I didn't account for that in the options.
I got access to DALLE-3 this morning. Here is the test: https://docs.google.com/presentation/d/1xulIXOYSNyNCFk6v_zkd3neeAk3ObuaTMdwZoEsVEEQ/edit?usp=sharing
The result is 4, I believe fairly unambiguously. A clear failure on 5 with 0/4 images succeeding.
@AlexbGoode They're circles, just shaded, since the projection of a sphere onto a 2d surface is a circle. The spirit of this market was in the counting anyway, hence the description saying "roughly circular shapes".
@DanMan314 sorry for the misunderstanding. Your criteria were super clear, it wasn't meant as a complaint. I just expected the model to have a clearer distinction between a circle and a sphere.
Also thanks for the lecture on how drawings work.
@IsaacKing No, it has been given to almost everyone who is a ChatGPT Plus member now. Their plan was to roll it out to all the paying members within 2 weeks of September 25 or so.
@IsaacKing You can use a version of DALLE-3 via Bing's Image Creator (which is free), but I doubt OpenAI will make it free anytime soon.
@firstuserhere "Almost everyone"? Starting to really feel like Altman and I have beef. Do you think he's still mad about what I did in that AirBnB?
@DanMan314 Worth being aware that sometimes DALL-E 3 (at least via Bing's Image Creator) returns fewer than 4 images. In that case I suppose you will generate some more and only use the "first" four generated, considering them to be ordered e.g. left to right, then top to bottom, starting from the top-left.
@Thunderstar According to the procedure described, upon failing at 19, it'll resolve to 18 (and 20 will not be checked).
Assuming that what I'm getting from Bing is indeed DALL-E 3 (and indications seem to be that it is), I can't get it to count past 4. It can do 4, but not consistently, such that I expect the test will fail before 4. I haven't seen it generate 2/4 correct with N=5, out of several attempts. At one point it gave this error instead of generating anything.
![](https://firebasestorage.googleapis.com/v0/b/mantic-markets.appspot.com/o/user-images%2Fdefault%2Fo5ERTNnmiW.png?alt=media&token=4806e3dc-3958-446c-9291-79155b588dae)
How do you know it's not generating a 5th white circle in images 2 and 4?
Some loose evidence on counting I've seen so far:
https://x.com/OfficialLoganK/status/1704871244808822967?s=20
Good questions @JimHays
I'll use "circle" for n=1
Half the circle by area. I'm just going to eyeball it though.
I'll resolve N/A if it fails N=1, since I didn't account for it in the options.
@JimHays I would assume that it would resolve to the "1 - 6" bucket, which seems to be most in line with the spirit of the market.