Will OpenAI release true multimodal image generation for GPT-4.5 before 2026?
Dec 31 · 17% chance

Resolves as YES if there is strong evidence that OpenAI has released true multimodal image generation features for GPT-4.5 before January 1st 2026.

The GPT-4.5 model needs to be used to produce the image tokens in order for this question to resolve as YES. Using the GPT-4.5 model to produce an intermediate text representation does not qualify. Dispatching to a surrogate model such as GPT-4o also does not qualify.

Does 4.5 not already have image output?

@Bayesian not that I'm aware

@Bayesian I think it has it but it's not been released

@MalachiteEagle I’ve been told you can use image output while talking to gpt4.5

opened a Ṁ1,000 NO at 50% order

@Bayesian

https://openai.com/index/introducing-4o-image-generation/

OpenAI just released 4o multimodal image generation (distinct from the previous Dalle-3 image gen) yesterday and isn't even done rolling it out.

https://openai.com/index/hello-gpt-4o/

Since 4o is innately multimodal (the o stands for omni), they've had some form of it for many months and just released it now to make sure Google gets no hype.

---

https://x.com/sama/status/1889755723078443244

On the other hand, 4.5 (notice the lack of o) is not a multimodal model. They probably could tack it on if they tried hard and spent a bunch of time and effort doing so, but it feels very unlikely when 5 (which will also be multimodal, and who knows if it will have an o, because OpenAI is trash at naming) should be out in a few months and will natively support it.

@lemon10 I think you've got a few things confused there. GPT-4.5 was probably pre-trained as a multimodal model, just like GPT-4o; they just haven't released the image generation features yet. OpenAI hasn't said either way whether it supports output image tokens, but the likelihood that it does is pretty high. Furthermore, it appears that the GPT-5 release will use the GPT-4.5 base model initially. But again, they haven't been all that clear on this point.

@Bayesian uh huh

@Bayesian Ok, now I'm confused. Is it possible that GPT-4.5 is calling the image generation capabilities from the GPT-4o model?

@Bayesian seems very weird that they haven't said anything about this

@Bayesian they said they released gpt-4o multimodal generation and silently released gpt-4.5 image generation as well???

@Bayesian this is the release blog: https://openai.com/index/introducing-4o-image-generation/

I'm so confused now. It doesn't say anything about 4.5

@MalachiteEagle

Models as early as Bing called DALL-E when you asked them for an image, so there's plenty of precedent for using different models for image generation and text generation.

Anyway, you should be able to check by seeing whether you get very different image results from 4o and 4.5.
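A rough sketch of what that comparison could look like with the OpenAI Python SDK. The model IDs passed to the Images API are guesses (neither is documented as an Images API model), so treat this as illustrative only:

```python
# Sketch: generate the same prompt with two models and save the results for a
# side-by-side comparison. The model IDs below are placeholders / assumptions;
# OpenAI has not documented image generation for either of them via this API.
import base64
from openai import OpenAI

client = OpenAI()
prompt = "a tabby cat reading a newspaper, watercolor style"

for model_id in ("gpt-4o", "gpt-4.5"):  # hypothetical identifiers
    result = client.images.generate(
        model=model_id,
        prompt=prompt,
        n=1,
        response_format="b64_json",
    )
    image_bytes = base64.b64decode(result.data[0].b64_json)
    with open(f"{model_id}-sample.png", "wb") as f:
        f.write(image_bytes)
```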

@SaviorofPlant yeah, that's what I thought initially when it generated the cat, but @Bayesian is right that there are people on Twitter claiming that 4.5 image generation is better...

I'm 80-90% sure this is dispatching to gpt-4o image generation

@MalachiteEagle a multimodal 4.5 would be the single biggest image generation model ever created, and it wouldn't even be close; you'd expect some pretty stunning quality

@SaviorofPlant Yeah I agree, it would be a big deal if they released that

@SaviorofPlant One thing I think might be happening is that the LLM is tasked with generating a more complex prompt for the image generation. So maybe if you give the same simple prompt to GPT-4o vs GPT-4.5, the expanded prompt may be of better quality when it's coming from GPT-4.5. But I have doubts that this is what's happening.
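A minimal sketch of that hypothesis, assuming the text model only rewrites the prompt and a separate image model does the actual generation. The pipeline and the image model ID here are guesses for illustration, not anything OpenAI has documented:

```python
# Sketch of the "prompt expansion" hypothesis: the text model (4o or 4.5) turns
# a short request into a detailed image prompt, then a fixed image model renders
# it. Model names and the overall flow are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()

def expand_prompt(user_prompt: str, text_model: str) -> str:
    """Ask the text model to rewrite the request as a detailed image prompt."""
    response = client.chat.completions.create(
        model=text_model,
        messages=[
            {"role": "system",
             "content": "Rewrite the user's request as a detailed image-generation prompt."},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content

def generate_image(user_prompt: str, text_model: str = "gpt-4.5-preview"):
    detailed = expand_prompt(user_prompt, text_model)
    # The same image model is used regardless of which text model expanded the
    # prompt, so any quality difference would come from the expanded prompt.
    return client.images.generate(model="dall-e-3", prompt=detailed)
```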

@Bayesian either way this does not count as strong evidence as per the question criteria

But with image-to-image instruction following, how is that supposed to work? Does GPT-4.5 also dispatch the input image to GPT-4o for these sorts of tasks? Very confusing

@MalachiteEagle I don't know if the criteria make it clear that GPT-4.5 needs to be the model doing the image generation

Yeah, I guess I got deceived by OpenAI, but I agree the model likely switches to 4o when generating image output. They can swap them no problem. I might be misunderstanding what you're saying, but with the image output model there's no sub-prompting of the image step by the text model or anything; the omni model takes the user prompt and creates the image directly, and I strongly suspect 4.5 just doesn't participate in the interaction when they detect you're asking for an image
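A minimal sketch of that routing idea, with the intent check and model names invented for illustration (this is not a description of how ChatGPT is known to work):

```python
# Sketch of the routing hypothesis: a front-end check decides whether the
# request needs image output and, if so, hands the whole turn to the omni model
# instead of GPT-4.5. The keyword check is a stand-in for whatever intent
# classifier the product actually uses.

def wants_image(user_prompt: str) -> bool:
    keywords = ("draw", "image", "picture", "illustration", "generate a photo")
    return any(k in user_prompt.lower() for k in keywords)

def route(user_prompt: str) -> str:
    if wants_image(user_prompt):
        # The omni model takes the user prompt and emits image tokens directly;
        # there is no intermediate "expanded prompt" step by the text model.
        return "gpt-4o"
    return "gpt-4.5"
```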

@MalachiteEagle

I could easily be wrong about some stuff. Hard to say given how tight-lipped they are about a bunch of the technical details.

Re: Using 4.5 for 5

https://i.imgur.com/GqZTv93.png

It sounds like 5 is going to work pretty differently. I think they are going to do a whole new training run. But shrugs, I don't work at OpenAI.

---

Yeah, 4.5 almost certainly just sends the instructions to 4o the same way that it used to just send the instructions to DALL-E.

@lemon10 ah ha! I hadn't seen that tweet. That's interesting for sure

link in case others are interested: https://x.com/swyx/status/1889857673379852493

@MalachiteEagle oh god, are they doing some kind of cursed model merge? didn't expect that

@SaviorofPlant one explanation that fits what they seem to have said so far would be for them to have multiple unified models of different sizes, then a router model to switch between the different sizes dynamically.

And the simplest way for them to achieve that without doing a new pretraining run is to take the base 4.5 model, add inference-time scaling in a unified way (similar to Sonnet 3.7), then distill this down to multiple smaller sizes, and add a router model on top.
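As a sketch of what that could look like (all names and thresholds are invented; this is just to illustrate the router-over-distilled-sizes idea, not OpenAI's actual system):

```python
# Speculative sketch: one full-size unified model, several distilled smaller
# variants, and a router that picks a tier per request based on estimated
# difficulty. Model names and the heuristic are made up for illustration.

MODEL_TIERS = {
    "small": "unified-distill-s",   # hypothetical distilled model
    "medium": "unified-distill-m",  # hypothetical distilled model
    "large": "unified-4.5-base",    # hypothetical full-size base model
}

def estimate_difficulty(prompt: str) -> float:
    # Placeholder heuristic; a real router would be a learned model.
    return min(len(prompt) / 2000.0, 1.0)

def pick_model(prompt: str) -> str:
    score = estimate_difficulty(prompt)
    if score < 0.3:
        return MODEL_TIERS["small"]
    if score < 0.7:
        return MODEL_TIERS["medium"]
    return MODEL_TIERS["large"]
```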
