Will any AI by OpenAI be able to create images of English alphabet characters rotated by 90 degrees?
2025 · 41% chance

For example,

Draw me the image of the letter "A", rotated 90 degrees clockwise.

The intent here is that the user asks the model to draw the rotated letter and the model is capable of drawing the rotated character. If there's a rotated "A" in the image of a grandma cooking with "A" rotated on her apron... that's not in the spirit of the market.

(DALL-E 2 and DALL-E 3 fail at this task, at the time of market creation.)

(FWIW, Midjourney's current version (V5) also fails at this task.)

By the end of 2024, will any image generation model released by OpenAI be able to accomplish this monumental task, for all the characters of the English alphabet?

The rotation doesn't even need to be exactly 90 degrees; I will accept rough approximations.
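For reference, what "rotated 90 degrees clockwise" asks of the model is just a simple grid transform: reverse the rows, then transpose. A minimal stdlib sketch (the 5×5 ASCII-art "A" below is my own made-up glyph, purely for illustration):

```python
# A made-up 5x5 ASCII-art glyph for the letter "A".
A = [
    " ### ",
    "#   #",
    "#####",
    "#   #",
    "#   #",
]

def rotate_cw(grid):
    """Return the grid rotated 90 degrees clockwise.

    Clockwise rotation = reverse the rows, then transpose.
    """
    return ["".join(row) for row in zip(*reversed(grid))]

for row in rotate_cw(A):
    print(row)
```

A transform like this (or any image editor) gives judges a cheap ground-truth reference to compare a model's output against.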

I will also trade in this market, because I wish to correct its price. There is no conflict of interest between my position and my judgement in cases of close calls.


What if it can do like 23 out of 26 letters?

@firstuserhere Sora can generate images with text. I haven't seen any examples of it so far, but it seems extremely likely that it would be able to rotate text. Would this count for this market?

@3721126 I don't know whether it will count or not, but why does it seem extremely likely that it will be able to rotate letters?

It's kind of an interesting deficiency that otherwise excellent models can't rotate letters, so I don't think there's a general expectation that an otherwise better model would be able to.

I also kind of expect that, in order to be able to generate video, Sora has made some compromises such that per-frame, it might be worse than DALL-E 3.

@chrisjbillington That's a great point. Honestly, it's mostly vibes-based and my working model of Sora's architecture is that it's a scaled-up DiT.

If the issue with the rotated text in latent diffusion models is caused mostly by autoencoder limitations (embedding space not rich enough to capture a rotated letter), then Sora would indeed not make a difference here. To investigate that, I quickly tested it in this notebook: https://colab.research.google.com/drive/1VM4JUT8BL4Kc-O2AQTM5fglZXd7CvDo7

Even with stable-diffusion-v1-4's VAE, the image of the rotated text was reconstructed flawlessly.

So, I would assume that the issue would lie in 1) the denoiser's ability to handle rotated text - not because of a fundamental limitation, but because of a lack of rotated text images in the dataset - or 2) its conditioning.

1) On the images with the fridge magnets from my previous comment, the more the magnets are rotated, the higher the chance that they no longer have the intended shape.

This would probably be mostly addressed by the richer video dataset, which includes more rotated objects and varied camera views. I don't expect this to be fully solved, however; some minor artifacts reminiscent of this phenomenon still seem to remain, for example in the paper plane video.

2) For the denoiser conditioning: I don't think that the prompt embeddings for the current image models can capture the idea of rotated text well enough. This could be explained by the fact that, in the rare cases where rotated text shows up in images, I wouldn't expect it to be reflected in the image's caption, either because the entire image is rotated by mistake or because such an obvious detail is simply not worth mentioning in the text. For DALL-E, the captions were generated by an ML model that probably has capabilities similar to GPT-4V's, and it describes my image with the rotated A as "a simple, black Penrose triangle".

In general, DALL-E and other image models don't seem to understand the concept of rotation and other transformations very well. For example, I tried generating a 90-degree rotated car with no success.

Sora seems to have a great "understanding" of rotations, at least across the temporal dimension from self-attention (see for example the feathers in the pigeon video or the family of monsters video from the technical report). Whether that understanding of rotation can be invoked from text conditioning and a rotation specified in the prompt will be faithfully captured is still uncertain, but, from the available examples, it seems to do a decent job of following descriptions of specific movements.

@3721126 It seems very likely that if you prompted Sora with "the letter A spinning clockwise," it would produce some frames that are rotated 90 degrees. That probably doesn't count since it's part of a video, but I wouldn't be entirely surprised if when you generated "single-frame videos," it would return a ~uniform distribution over possible rotations, as though it were a randomly-selected frame from a real video.

@3721126 Sora is an AI by OpenAI. I don't know why you think it'll be better, but it does qualify.

bought Ṁ10 of NO

Characters are still a little difficult and tend to fall apart after a few big words. I find it unlikely OpenAI would spend time on a task such as this when they have put so much of their focus into ChatGPT.

bought Ṁ10 of NO

Seems like OpenAI has shifted their focus away from image generation; Stable Diffusion is the best we're going to get for a while.

bought Ṁ30 of YES

This is Adobe Firefly, so not OpenAI (I ran out of Bing credits), but I did have some better luck with this prompt:

Top down view of a A1 paper. Lots of blank space. Minimialist. A tiny fridge magnet of the letter a is tilted askew, rotated 90° clockwise. Sideways.

All settings off, photo preset, visual intensity all the way down.

@willMay

Seems like Bing doesn't want to rotate the A at all.

This is looking pretty good 🙂

Can others reproduce this?

bought Ṁ30 of YES

You may need to ask for it to be in the style of “classic illustration”

bought Ṁ500 of NO

@GraceKind I'm seeing a poor success rate reproducing this, but also I don't think these are likely to count - although it's a clever way to get it to draw the letters, the current market description and other clarifications downthread seem to imply that we need to be asking it specifically to draw a rotated letter, as opposed to the rotated letter being a consequence of something else we ask for.

(It's maybe not super clear yet exactly what this rules in and out, and no threshold for the required success rate has been set yet. Edit: actually, the 25%-of-images threshold was mentioned downthread.)

(looks like maybe an example of the portrait glitch @3721126 is talking about in the last one there)

predicts YES

Ah you’re right, I think this falls under the “grandma cooking” example, unfortunately.

sold Ṁ197 of YES

@GraceKind Nice! I was trying for a while last night to get a single sideways A and didn't quite get a good example. I tried a superhero flying sideways through the air, but this seems much better.

Unfortunately, I agree with Chris that it probably wouldn't count, as it requires something other than the letter to be the subject of the image. With a heavy heart, I am exiting all but Ṁ10 of my position now 😢

I was wondering if something like this would count, if someone could get it to work as intended:

In this case the subject would actually be the letter A rather than a shirt that incidentally has the letter A on it.

predicts YES

@CDBiddulph Awww that A doing a plank is so cute

predicts YES

The rotated and mirrored text I mentioned in my earlier comments is apparently caused by a known bug with the 1024x1792 aspect ratio in DALL-E 3; see for example:

https://community.openai.com/t/addressing-the-frequent-horizontal-orientation-of-vertical-images-in-dall-e-3/481903

https://community.openai.com/t/orientation-problem-for-vertical-images/482759

https://www.reddit.com/r/ChatGPT/comments/176vjli/dalle_3_in_chatgpt_drawing_content_of_images/

https://www.reddit.com/r/dalle2/comments/175h8nj/reflected_text_and_rotated_layouts_any_tips/

I was able to reproduce it after a few tries in ChatGPT, with the prompt:

"""

Please invoke Dall-e with this exact prompt: A tall full-body portrait of a woman in front of a beautiful panorama. She is wearing a shirt with the text "First User".

Aspect ratio: portrait

"""

Result:

Finding out how to reliably trigger this behavior without using seeds and ideally without adding any other objects to the image could at least solve the case of letters with a horizontal or vertical symmetry. I've already found a brilliantly thought-out prompt for the letter "O", but I'll leave this one as an exercise for the reader.

predicts YES

@3721126 It seems extremely sensitive to the prompt, any small perturbation drastically impacts the "success" rate.

(it's so over)

predicts YES

Here's what I tried. This is a lot harder than I thought. I love it.

Letter A, rotated at 90° clockwise, one fourth of the way to a complete rotation around the axis. Second image in a series of four. From Wikipedia. Simple 2D black font on white background. unicode U+2200. svg

I also tried the postcard suggestion:

[TODO: FIX. ROTATE 90 DEGREES. PNG IS WRONG ORIENTATION] vintage postcard with the letter a on it

bought Ṁ0 of NO

@chrisjbillington More limit orders up, come and get 'em!

I'd love to bet on this, but the resolution criteria need to be improved. What kinds of prompts are allowed? How are results verified? What success rate is acceptable?

bought Ṁ372 of YES

Solved! Here's my prompt, which I used with Bing Image Creator:

"A perfect, simple black spiral viewed from overhead, made of perfectly clear letter A's in Helvetica font against a white background, forming a perfect circle. It spells the text AAAAAAAA, wrapping around the center of the image. Extremely simple image, created in under a minute in Photoshop."

Here are my results with the first 4 letters of the alphabet, without any cherry-picking after I'd decided on my final prompt. At least 50% of the images for each letter pass, so I think this handily meets the 25% requirement.

A: 4/4 successes.

B: 2/4 successes.

C: 3/4 successes, technically 4/4 if you count the very large (doubtless accidental) C in the top right.

D: 4/4 successes; 3/4 if you don't count the D in the top right, which is at more of a 45-degree angle.
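Spelling out the tally above against the 25%-of-images threshold mentioned downthread (using the counts as reported; the "technically 4/4" and "3/4" alternate readings are ignored):

```python
# Per-letter success rates from the reported 4-image batches,
# checked against the 25%-of-images threshold.
successes = {"A": 4, "B": 2, "C": 3, "D": 4}
attempts = 4
rates = {letter: n / attempts for letter, n in successes.items()}
print(rates)  # B is the worst at 50%, still well above 25%
assert all(rate >= 0.25 for rate in rates.values())
```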

bought Ṁ10 NO at 89%
predicts YES

@CDBiddulph YOU ARE AMAZING

predicts NO

@CDBiddulph How does this count? I thought it would have to be an image of one letter; that seems to be the spirit of the market.

If anything literally within the question counts, I could ask ChatGPT to use a Python script with matplotlib that makes a line graph of a letter rotated 90 degrees.


predicts NO

@Soli But an example was given that makes it clear what was meant. I assume doing all the letters in one image would also be cheating if that helped the model, because of the way the example was phrased. That’s what I made my bet based on.


predicts NO

@Soli Its actual success rate for doing so, even with a prompt that's meant to ease it in, is very low. For every one it successfully prints, it messes up something like 20 others. It always seems to start failing way more around 90 degrees.

Even if this prompt type is allowed, its overall success rate is low, even if it did get at least one most of the time.


predicts NO

@Soli @ShadowyZephyr I see where both of you are coming from, but I think this is a market where what is and isn't allowed is just way too vague for either of you to be objectively correct.

sold Ṁ1,139 of YES

@Jacy Ok, I am out.

predicts NO

@Jacy Ok, this is valid. I'm not saying I'm objectively correct; I just thought that was the clear spirit of the market. Apparently I was wrong, though. @firstuserhere, write better resolution criteria.

predicts NO

@CDBiddulph Impressive!

@ShadowyZephyr the Python script example wouldn't count since that's a totally different thing, though in the future I can imagine models where that line is a bit more blurry.

But yeah, extrapolating the "success rate" requirement to apply to the contents within the image is maybe a reasonable thing to do, to figure out what the spirit of the criteria has to say about images like these. We'll see what @firstuserhere thinks.

@chrisjbillington My thinking here is that asking the model to draw the image of the letter, rotated 90 degrees clockwise, leads to the model creating an image of the letter rotated by 90 degrees (as shown in the description example). @CDBiddulph's solution, while clever in itself, does not ask the model to do that. The fact that there's a rotated A is not a property of what the model was asked to generate, but an artifact of the user's clever prompting. So, this does not count (though I'm open to hearing counterarguments).

predicts YES

@firstuserhere Hm, you mentioned in an earlier comment that prompt engineering is allowed, so it's not clear to me exactly what kinds of prompt engineering would/wouldn't work.

An early version of my prompt specified something like "letters on the left and right of the circle are rotated 90 degrees." I guess it probably wouldn't count if I just added an additional sentence like that to the existing prompt, since the cause of the letters being rotated would still have more to do with the request for a spiral pattern than the request for 90-degree-rotated letters (since the same prompt with that sentence ablated also worked). Would it count if the prompt was "an image full of A's; the A's at the top are right-side up, the A's at the bottom are upside down, and the A's to either side are rotated 90 degrees"?

In general, does it matter if there are other letters or letter-attempts in the image? For instance, if the image contained the entire word "SIDEWAYS" at a 90-degree angle, would that count for the letter A?

What about "Describing the A as geometrical shapes without reference to the letter" as @Soli mentioned earlier?

@CDBiddulph Well, yes, prompt engineering is allowed, but the prompt should at least specify the task you want the model to accomplish, not get it as a side effect of some entirely different task. I will write concrete criteria for the market and answer these other questions in a bit.


I am so happy I exited my position.


predicts YES

@Soli Yeah, I thought about exiting when it was at 90% and now wish I had... I think the percentage should be higher than 60% though, so I'm keeping it for now

predicts YES

@firstuserhere I'm pretty confused about the clarification. What does it mean that the rotated A is not a property of what the model was generating? It was generated by the model to fit the user's prompt. Does this mean that the rotated A must be the main object of both the prompt and the generated image? Would you allow something like an entire word that contains A? How about something like a signpost containing a rotated A. Does that count?
