Will OpenAI release an image model better than DALL-E 3 in 2024?
77% chance

Resolves YES if, in 2024, OpenAI releases a new text-to-image model that is generally considered by OpenAI and commentators to produce better images than DALL-E 3 over a broad range of prompts.

It could be called DALL-E 4, or even DALL-E 3.5, as long as it is marketed as a distinct product. Minor improvements to DALL-E 3 that aren't marketed as a separate product and are still called "DALL-E 3" won't count.

It doesn't even have to be in the DALL-E series of models, as long as it is generally better than DALL-E 3 and not, e.g., a deliberately smaller, faster model, or a model with some other purpose that is not a broad improvement over DALL-E 3, even if it's better at some specific tasks.

  • "Release" means accessible to end users without a waitlist or other restrictions other than payment or geography

  • A limited release in any one country is enough to resolve this YES.

  • A requirement to have already used OpenAI's other products, or any other gatekeeping like that, will be considered akin to a closed beta and will not count: a fresh signup to OpenAI in the right geography with a credit card in hand should be able to use it.

  • Rate limits are OK.

  • Access via ChatGPT or the OpenAI API or anything else is fine.

  • A release by a partner such as Microsoft won't count; it has to be something marketed by OpenAI and accessed via a website or API endpoint belonging to OpenAI.

"2024" means pacific time.

Edit: a video model does count as an image model for the purposes of this market, since if you prompt one to produce a relatively static scene, it could be used as an image model. Even a dynamic scene is kind of like an image model with a lot of output images.

But be warned: human judgement in choosing which video frame(s) to compare to DALL-E 3 is not allowed, so pretty much every frame has to be superior to DALL-E 3 for a video model to count as clearly superior. And remember, a model must be superior across a broad range of inputs. If a video model like Sora can only produce photorealistic output, then it is too specialised and will not be superior to DALL-E 3 over a broad range of prompts.


Is a video model also an image model?

@volcanicguitar Hm.

I'd like to say yes, and just warn traders that to count as "broadly better", you'd have to be able to generate a scene that's somewhat static, such that pretty much every frame was better than a single image from DALL-E 3. Otherwise, having a human pick which frame to use for the comparison wouldn't be fair.

To reduce the chance of a messy resolution, I probably should just rule out video models. But I think that would be unfair, since Sora looks like it actually generates some pretty high-quality stuff. I would have expected it to make some serious compromises in order to be able to do video, and maybe it does and the demos we've seen aren't representative. But it would be cool if people could bet over that, and a shame to rule it out if there's a reasonable possibility it might actually be competitive with DALL-E 3.

So: ruling for now at least: video models do count.

But be warned: human judgement in choosing which frame to compare to DALL-E 3 is not allowed, so pretty much every frame has to be superior to DALL-E 3. And remember, a model must be superior across a broad range of inputs. If a video model like Sora can only produce photorealistic output, then it is too specialised and will not be superior to DALL-E 3 over a broad range of prompts.
