Will DALL-E 3 be able to get DeWeese lab/Gary Marcus' "red conical block on top of..." prompt a majority of the time?
Resolved NO on Oct 11

Gary Marcus made a post discussing the inability of the Imagen and DALL-E 2 models to fully grasp language, particularly the relational understanding of objects in a prompt: https://garymarcus.substack.com/p/horse-rides-astronaut

OpenAI just released DALL-E 3 (https://openai.com/dall-e-3), which they claim "represents a leap forward in our ability to generate images that exactly adhere to the text you provide".

Once publicly available, I will run this prompt from DeWeese lab that is discussed heavily in the post:

A red conical block on top of a grey cubic block on top of a blue cylindrical block, with a green cubic block nearby

I will produce 10 images. If 5 or more of the images match the prompt exactly, following the color, shape, and positions specified in the prompt, this market resolves YES. Otherwise, it resolves NO.
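The resolution rule above can be sketched in a few lines of Python. This is only an illustration: the function name `resolve_market` and the per-image judgments are hypothetical, and deciding whether an image "matches exactly" is still a manual call by the market creator.

```python
def resolve_market(matches, n_images=10, threshold=5):
    """Apply the stated rule: YES if at least `threshold` of `n_images` match.

    `matches` is a list of booleans, one per generated image, where True
    means the image follows the color, shape, and positions in the prompt.
    (Hypothetical helper, not part of any Manifold API.)
    """
    if len(matches) != n_images:
        raise ValueError(f"expected {n_images} judgments, got {len(matches)}")
    return "YES" if sum(matches) >= threshold else "NO"


# Made-up judgments for illustration: 4 of 10 images match.
example = [True, False, True, False, False, True, False, False, True, False]
print(resolve_market(example))
```

Note that exactly 5 matches out of 10 is enough for YES, since the rule says "5 or more".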

I will not bet in this market in case there is ambiguity on some of the images.

predicted NO

@DanMan314 If I understand your screenshots correctly, it looks like you gave ChatGPT the prompt, which presumably gave a different prompt to DALL-E 3, which wouldn't match the wording of the resolution criteria (and, in general, often gives very different results).

bought Ṁ30 of YES

@Nikola Whoops sorry these are actually different prompts.

bought Ṁ1,000 of NO

0/20 in my testing with Bing (supposedly DALL-E 3 as of today), with enough random variation that I wouldn't be surprised if it occasionally got one correct.

bought Ṁ200 of YES

@chrisjbillington I'm not convinced it's DALL-E 3. GPT-4 prompts might be boosting image coherence, but I think Bing Image Generator is still DALL-E 2.

predicted NO

@RaulCavalcante If I understand correctly, these are probably not the first images you'd get with this prompt, but rather the result of you and GPT-4 modifying the prompt to get it right after it failed the first time.

predicted NO

@RaulCavalcante it absolutely is DALL-E 3. There's official information:
https://x.com/MParakhin/status/1707857086615548018?s=20
(Mikhail is the head of Bing Search and Bing Chat at Microsoft)

And it's just much better now at generating anything.

bought Ṁ100 of NO

@chrisjbillington I've been playing with Bing recently and can confirm it is sub-Imagen (or whatever Google's model was) with respect to rendering proper relationships, as opposed to just dropping all the keywords on a canvas.