Will DALL-E 3 be capable of creating accurate Japanese text?
Resolved NO on Oct 12

Existing neural image generators are notoriously bad at rendering text in their images. Here's a quick example from DALL-E 2:

Note that, while the actual words are mostly meaningless, the individual glyphs are at least recognizable Latin characters derived from the text specified in the prompt.

With languages that use more complicated glyphs, however, the results are even worse, partly due to poor representation in the training data and partly because the glyphs themselves are more complex. Japanese, I expect, is a particular challenge because it makes use of three distinct character sets: hiragana (あいうえお), katakana (アイウエオ), and kanji (猫犬鳥虎像). For example:

The kanji in these signs are mostly a scrambled mess as far as I can tell. More importantly, the image makes no use of hiragana or katakana, even though the requested text is primarily composed of those.
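For reference, the three scripts occupy distinct Unicode blocks, so classifying individual characters is straightforward. Here's a minimal Python sketch using the standard block ranges (purely illustrative, not part of the resolution criteria):

```python
def classify_script(ch: str) -> str:
    """Classify a single character by its Unicode block."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:  # CJK Unified Ideographs
        return "kanji"
    return "other"

for ch in "あア猫":
    print(ch, classify_script(ch))  # あ hiragana / ア katakana / 猫 kanji
```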

OpenAI recently announced that its new DALL-E 3 image generation model will be released soon. Its promotional images feature text fairly prominently, suggesting OpenAI has put some work into improving the quality of text the model generates. Will those improvements also extend to languages like Japanese that use non-Latin scripts?

Whenever I personally get access to DALL-E 3 (which I should get through a ChatGPT Plus subscription), I will prompt it with the same text used above:

A lighted neon sign hanging from a building over a city street at night, bearing the Japanese text "楽園はココだよ," with an arrow urging passers-by to enter the building.
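For anyone who wants to reproduce the test programmatically if and when API access becomes available, here's a sketch using the official openai Python SDK. The model name "dall-e-3" and its availability through the Images API are assumptions on my part; I'll be resolving this market through the ChatGPT UI, not the API.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "A lighted neon sign hanging from a building over a city street at night, "
    'bearing the Japanese text "楽園はココだよ," with an arrow urging '
    "passers-by to enter the building."
)

# Assumption: DALL-E 3 is exposed through the Images API as "dall-e-3".
response = client.images.generate(
    model="dall-e-3",
    prompt=PROMPT,
    n=1,
    size="1024x1024",
)
print(response.data[0].url)  # URL of the generated image
```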

If at least one image in the results contains text that is vaguely recognizable as the above characters, this market resolves YES. Otherwise, it resolves NO. It resolves N/A if DALL-E 3 is cancelled for some reason.

Considerations:

  • I'm happy to entertain suggestions for other prompts if someone thinks that particular prompt might pose a problem for the model. I'll probably try a few others on my own, just in case I get bad luck with that particular prompt. I'll still resolve YES if that particular prompt fails but a selection of other, similar prompts produces generally good results. The only hard requirement for me is that the desired text must include a mixture of kanji, hiragana, and katakana (see the sketch after this list).

  • The early marketing for DALL-E 3 suggests that it may use ChatGPT to refine prompts before generating images. If the default workflow begins with textual refinement before any images are generated, I will go through that workflow as the UI presents it, taking the first suggestion presented at each step. The first set of actual images presented will be the ones I use to resolve the market, unless the UI somehow marks those images as "low quality, needs improvement" or whatever.

  • I will not be betting in this market.
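To make that hard requirement concrete, here's a small self-contained Python check that a candidate prompt's target text mixes all three scripts (again, illustrative only; I'll judge suggested prompts by eye):

```python
HIRAGANA = range(0x3040, 0x30A0)
KATAKANA = range(0x30A0, 0x3100)
KANJI = range(0x4E00, 0xA000)  # CJK Unified Ideographs

def uses_all_three_scripts(text: str) -> bool:
    """True if text contains at least one hiragana, one katakana, and one kanji character."""
    return (
        any(ord(ch) in HIRAGANA for ch in text)
        and any(ord(ch) in KATAKANA for ch in text)
        and any(ord(ch) in KANJI for ch in text)
    )

print(uses_all_three_scripts("楽園はココだよ"))  # True: 楽園 kanji, は/だ/よ hiragana, ココ katakana
print(uses_all_three_scripts("あいうえお"))      # False: hiragana only
```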
