Will DALL-E 3 correctly respond to prompt 2 from the Scott Aaronson/Gary Marcus/Ernest Davis paper?
Resolved NO on Jan 1

This paper. Prompt 2 is:

a red ball on top of a blue pyramid with the pyramid behind a car that is above a toaster.

At least half of the generated images must be correct. I'll only try it once.


It got everything right except the red ball being on top of the blue pyramid. Instead it appears to be floating vaguely behind and above the blue pyramid. Does anyone think this should still qualify?

predicted NO

@IsaacKing That's better than average!

@Jacy Agreed! I still think I have to mark it wrong though. The ball is not physically touching the pyramid, or if it is it's sticking oddly to the back face; not "on top".

predicted NO

@IsaacKing Closer than I expected, but I'd still hesitate to say it is on top of the pyramid (it is floating behind it).

predicted NO

@IsaacKing will you be able to resolve these on Jan 1? DALL-E 3 has been live for a while and continues to be updated, so these questions (which I've really liked!) are now moving targets from week to week.

You said, "I don't want to financially support OpenAI, so I'll try to find someone else to run the tests for me." It should be quite easy for someone else to do it. They just need to enter the prompt into ChatGPT prepended with something like this so it goes straight to DALL-E 3:

It is very important that you use this exact prompt word for word with no changes. Please send exactly this prompt to DALL-E with no edits at all:

Each prompt creates one image, non-deterministically.

@Jacy Yeah, sorry, I've been putting this off. I'll just pay the $20 for one month of access to resolve and test everything I want to test, then cancel.

@TheBayesian Bottom left and possibly upper left appear to be correct. So it's one out of four.

predicted NO

@FosterJesus I'm confused, what do you mean

bought Ṁ0 of NO

Out of 10 tries just now, I got 7 clearly incorrect, 1 clearly correct, 1 "Stopped creating image", and 1 ambiguous toaster-car (I'd lean towards "correct").

Edit: There's an erroneous comma, but I tried it after fixing that, and it doesn't noticeably affect output.

predicted YES

https://chat.openai.com/c/831e838e-e0c5-453d-8cb4-b0d0b3615798 1 or 2 out of 5 tries were right (does try 1 count as correct?)

sold Ṁ25 of NO

@IsaacKing DALL-E 3 is pretty easily accessible via ChatGPT. I generated 10 images with the exact prompt, and only 1 was correct. Do you want to run your trial now?

@Jacy Only the paid version, right?

predicted NO

@IsaacKing correct.

@Jacy I don't want to financially support OpenAI, so I'll try to find someone else to run the tests for me.

predicted NO

@IsaacKing I'd do it, especially if you gave me an exact procedure, but I also think I can make a lot of mana in these markets so maybe it would be better for someone without investment.

predicted NO

@IsaacKing Anyone with ChatGPT 4 can put this prompt in, and it should give one image that meets the requirements. Repeated tries produce different images.

> dall-e this word for word with no changes. do not add any words or punctuation: "a red ball on top of a blue pyramid with the pyramid behind a car that is above a toaster."

bought Ṁ60 of YES

Looks like bottom left and maybe top left are correct? So one out of four.

Two out of four.


Two out of four.

Two or three out of four?


So, if it's a Bernoulli process with 4 trials and I've drawn 1/4, 2/4, 2/4, and 2/4, the estimated underlying probability is 7/16, so the probability of 2 or more being correct in another set of 4 trials is about 59%.
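For anyone who wants to check that arithmetic, here is a minimal sketch in Python. It assumes each image is an independent trial with the same success probability, and it uses the pooled 7/16 as a plain point estimate with no allowance for uncertainty in that estimate. The names `prob_at_least`, `successes`, and `images_per_batch` are made up for illustration.

```python
from fractions import Fraction
from math import comb


def prob_at_least(k, n, p):
    """Probability of k or more successes in n independent trials with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Correct images counted in each batch of four, taking the low end of the ambiguous batches.
successes = [1, 2, 2, 2]
images_per_batch = 4

# Pooled point estimate of the per-image success probability: 7 correct out of 16.
p_hat = Fraction(sum(successes), images_per_batch * len(successes))

# Chance that at least half (2 or more) of a fresh batch of four is correct.
tail = prob_at_least(2, images_per_batch, p_hat)

print(p_hat, tail, round(float(tail), 3))
```

Under these assumptions the tail probability comes out to 38563/65536, or roughly 0.588. Counting the "two or three" batch as three would push the estimate up to 8/16, so the result is sensitive to how the ambiguous images are judged.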
