How high can DALLE-3 count?
resolved Oct 11
1-5: 100% (87% before resolution)
6-10: 10%
11-15: 1.5%
16-20: 0.7%
21-30: 0.3%
31-50: 0.3%
51-100: 0.3%
101+: 0.3%

Upon release, I'll give DALLE-3 the prompt "Exactly N circles on a white background", and generate 4 images.

I will start with N = 1. If 2/4 or more of the images have N (no more, no less) roughly circular shapes, I will increment N by 1.

I will repeat this until fewer than 2 of the images have N roughly circular shapes. This market will resolve to the option that contains N-1.
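The procedure above can be sketched in Python. This is a minimal sketch, not the author's tooling: `generate_counts` is a hypothetical stand-in for prompting DALLE-3 with "Exactly N circles on a white background" and counting the circles in each of the 4 images by eye, and `passes` / `highest_count` are illustrative names.

```python
from typing import Callable, List


def passes(counts: List[int], n: int) -> bool:
    """A round passes if at least 2 of the 4 images show exactly n circles."""
    return sum(1 for c in counts if c == n) >= 2


def highest_count(generate_counts: Callable[[int], List[int]]) -> int:
    """Start at N = 1, increment while the round passes, and return N - 1.

    generate_counts(n) is hypothetical: it should return the circle count
    observed in each of the 4 images generated for prompt N.
    """
    n = 1
    while passes(generate_counts(n), n):
        n += 1
    # n is now the first N that failed, so the last one that passed is n - 1.
    return n - 1
```

Because the loop stops at the first failing N, larger values are never tried: a model that fails at N = 19 yields 18 even if it would have managed N = 20.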

For example, if, when I ask for 21 circular shapes, DALLE produces 1 image that contains 21 circles and 3 images that contain the wrong number of circles, this market will resolve to the "16-20" option. I will count any circle that is more than half contained in the image and not count any that are more than half cut off.
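Mapping the first failing N to an answer option is a simple range lookup on N-1. A minimal sketch, assuming the eight options listed at the top of the market; `BUCKETS` and `resolve` are hypothetical names:

```python
from typing import Optional

# The market's answer options as (label, low, high); high=None means open-ended.
BUCKETS = [
    ("1-5", 1, 5),
    ("6-10", 6, 10),
    ("11-15", 11, 15),
    ("16-20", 16, 20),
    ("21-30", 21, 30),
    ("31-50", 31, 50),
    ("51-100", 51, 100),
    ("101+", 101, None),
]


def resolve(failed_n: int) -> Optional[str]:
    """Return the option containing failed_n - 1, or None if no option does."""
    last_passed = failed_n - 1
    for label, low, high in BUCKETS:
        if last_passed >= low and (high is None or last_passed <= high):
            return label
    return None


print(resolve(21))  # prints 16-20: failing at N = 21 means N - 1 = 20
```

A failure at N = 1 gives 0, which falls in no option; that is the case the author resolves N/A.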

For context, this is DALLE-2 with N=5:

Image 1: 6 circles, does not count.

Image 2: 4 circles, does not count.

Image 3: 5 circles (plus 2 more that are cut off by more than half and not counted), would count.

Image 4: 4 circles, would not count.

Note that I'm giving some leeway for the concentric circles in image #3.

So this would be 1/4, and DALLE-2 cannot count to 5.

Other Clarifications:

  • For N=1, I'll use "circle" (singular).

  • "Half the circle" is measured by area, not perimeter. I'm just going to eyeball it, though.

  • I'll resolve N/A if it fails N=1, since I didn't account for it in the options.


@DanMan314 I should have bet so much more heavily on 1 - 5...

@DanMan314 these are not even circles... I'll need to find the polygon market

@AlexbGoode They're circles, just shaded, since the projection of a sphere onto a 2D surface is a circle. The spirit of this market was in the counting anyway, hence the description saying "roughly circular shapes".

@DanMan314 sorry for the misunderstanding. Your criteria were super clear; it wasn't meant as a complaint. I just expected the model to have a clearer distinction between a circle and a sphere.

Also thanks for the lecture on how drawings work.

@AlexbGoode haha no problem, sorry, I wasn't sure whether it was a legitimate critique of the grading.

@DanMan314 How is DALL-E access being given out? Is it a waitlist again?

@IsaacKing No, it has been given to almost everyone who is a ChatGPT Plus member now. Their plan was to roll it out to all paying members within 2 weeks of September 25 or so.

@firstuserhere Is it ever going to be available to free members?

@IsaacKing You can use a version of DALLE-3 via Bing's image creator (which is free), but I doubt OpenAI will make it free anytime soon.

@firstuserhere "Almost everyone"- starting to really feel like Altman and I have beef. Do you think he's still mad about what I did in that AirBnB?

@DanMan314 worth being aware that DALL-E 3 (at least via Bing's image creator) sometimes returns fewer than 4 images. In that case I suppose you will generate some more and only use the "first" four generated, ordered e.g. from the top-left, going left to right, then top to bottom.

@chrisjbillington Yup, that seems reasonable.
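One way to make the "first four" rule concrete is a sort by request, then row, then column. This sketch assumes (an interpretation, not spelled out in the thread) that images from an earlier request come before those from a later one; the `(batch, row, col, name)` tuples and `first_four` are hypothetical:

```python
from typing import List, Sequence, Tuple

# (batch, row, col, name): batch 0 is the first request; row/col give the
# image's position in that request's grid.
Image = Tuple[int, int, int, str]


def first_four(images: Sequence[Image]) -> List[str]:
    """Keep the first four images: earlier batches first, and within a batch
    read left to right, then top to bottom."""
    ordered = sorted(images, key=lambda img: (img[0], img[1], img[2]))
    return [name for _, _, _, name in ordered[:4]]


# Hypothetical run: the first request returned only 3 images (bottom-right
# slot empty), so a second request was made.
images = [
    (1, 0, 0, "second batch, top-left"),
    (0, 1, 0, "first batch, bottom-left"),
    (0, 0, 1, "first batch, top-right"),
    (0, 0, 0, "first batch, top-left"),
    (1, 0, 1, "second batch, top-right"),
]
print(first_four(images))
```

Sorting on an explicit key keeps the rule independent of the order in which images happen to be saved or listed.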

bought Ṁ10 of 1-5 NO

How will you resolve this if it gets 20 circles correct but not 19?

@Thunderstar According to the procedure described, upon failing at 19, it'll resolve to 18 (and 20 will not be checked).

bought Ṁ300 of 11-15 NO

Assuming that what I'm getting from Bing is indeed DALL-E 3 (and indications seem to be that it is), I can't get it to count past 4. It can do 4, but not consistently, such that I expect the test will fail before 4. I haven't seen it generate 2/4 correct with N=5, out of several attempts. At one point it gave this error instead of generating anything.

How do you know it's not generating a 5th white circle in images 2 and 4?

I guess some things are just beyond the reach of human comprehension. A circle has to be visible to count.

Good questions @JimHays

  • I'll use "circle" for N=1

  • Half the circle by area. I'm just going to eyeball it though.

  • I'll resolve N/A if it fails N=1, since I didn't account for it in the options.

bought Ṁ20 of 101+ NO

What’s the resolution if it somehow fails N=1?

bought Ṁ10 of 1-5 YES

@JimHays I would assume that it would resolve to the "1-5" bucket, which seems to be most in line with the spirit of the market.

Half the circle by area or perimeter? This could matter for circles in the corners.
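To see why the choice can matter at a corner, here is a minimal numerical sketch (a grid and angle-sampling approximation, not an exact formula). The image is assumed to span [0, width] x [0, height], and the example circle is hypothetical: radius 1, centred 0.5 units in from both edges at one corner of a 10 x 10 image.

```python
import math


def area_fraction_inside(cx, cy, r, width, height, n=400):
    """Estimate the fraction of a circle's area inside [0, width] x [0, height]
    by testing the midpoints of an n x n grid over the circle's bounding square."""
    step = 2 * r / n
    in_circle = 0
    in_both = 0
    for i in range(n):
        x = cx - r + (i + 0.5) * step
        for j in range(n):
            y = cy - r + (j + 0.5) * step
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                in_circle += 1
                if 0 <= x <= width and 0 <= y <= height:
                    in_both += 1
    return in_both / in_circle


def perimeter_fraction_inside(cx, cy, r, width, height, m=100_000):
    """Estimate the fraction of a circle's circumference inside the image
    by sampling m evenly spaced points on the circle."""
    inside = 0
    for k in range(m):
        theta = 2 * math.pi * (k + 0.5) / m
        x = cx + r * math.cos(theta)
        y = cy + r * math.sin(theta)
        if 0 <= x <= width and 0 <= y <= height:
            inside += 1
    return inside / m


area = area_fraction_inside(0.5, 0.5, 1.0, 10, 10)
arc = perimeter_fraction_inside(0.5, 0.5, 1.0, 10, 10)
print(f"area inside: {area:.3f}, perimeter inside: {arc:.3f}")
```

With these numbers the area fraction comes out a little above 0.6 while the perimeter fraction is a little above 0.4, so "more than half by area" would count this circle and "more than half by perimeter" would not. For a circle cut by a single straight edge, both fractions cross one half at the same point, which is why the question only really bites in the corners.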

Will N=1 still use the word “circles”?

“Exactly 1 circles on a white background”