Will GPT-4 be multimodal?
Resolved YES (Mar 14)

This question resolves to YES if OpenAI's GPT-4 is trained on at least 2 distinct data modalities, such as images and text. Otherwise, it resolves to NO.

If GPT-4 is not released before 2024, this question will resolve to N/A.

Clarification: This question will resolve to YES if any single model consistently called "GPT-4" by the OpenAI staff is multi-modal. For example, if there are two single-modality models: one trained only on images, and one trained only on text, that will not count.

Clarification 2: This question will resolve on the basis of all of the models that are revealed to have GPT-4 in their name within 24 hours of the first official announcement from OpenAI of GPT-4.



Physiognomy check

sold Ṁ3 of NO

«We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while worse than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks»

from: https://openai.com/research/gpt-4

bought Ṁ400 of YES
predicted NO

I think a slightly more interesting question is whether the first non-OpenAI users of GPT-4 will be able to produce text from images or audio, or produce images or audio from text. We often don't know many training details, and the modalities of the training data could be ambiguous or unintuitive; for example, the model could be trained on a lot of transcribed audio data without any waveforms being encoded in the GPT-4 model itself.

I wouldn't rephrase this market or make a new one, but I just want to flag this.

predicted YES

https://twitter.com/_SilkeHahn/status/1634230731265196039

Update on the Heise story from the journalist responsible.
She states on Twitter that Microsoft Germany's Chief Technologist (not CTO) contacted her with a request to correct his name in the article, but no other changes. She interprets this as essentially a confirmation that the claim in her article's title, that GPT-4 is multimodal, is correct.

bought Ṁ10 of YES

https://github.com/microsoft/visual-chatgpt

Not quite sure how this would resolve. Bought YES to get towards 50%…

predicted NO

@MaxPayne This is not GPT-4, and this is not even a single model. That's multiple models connected together via LangChain.

bought Ṁ100 of NO

My main guess about what's going on:
The CTO mentioned that GPT-4 would be announced next week and then separately talked about all the AI services that will be/are available.

He was just emphasizing how AI APIs will allow for increasingly multimodal applications, e.g. Whisper enables multimodality because it does speech-to-text well. He might have claimed that text-to-video models are in the works or are about to be announced, as Sam Altman has previously mentioned, but he might also have just been talking about how video can be transcribed with Whisper. The embeddings he mentioned are maybe just the latest Ada embeddings.

The focus on multimodality at the same time as the comment about GPT-4 was at some point incorrectly interpreted by a journalist as a claim that GPT-4 would be a multimodal model.

bought Ṁ20 of NO

I would be very, very surprised if GPT-4 was trained to generate videos.

predicted NO

(and released this month)

predicted YES

@NoaNabeshima Small input as a German speaker (assuming you, or whoever is reading, are not one):

The original quote is:
"Wir werden nächste Woche GPT-4 vorstellen, da haben wir multimodale Modelle, die noch ganz andere Möglichkeiten bieten werden – zum Beispiel Videos"

Heise's translation:
"We will introduce GPT-4 next week, there we will have multimodal models that will offer completely different possibilities – for example videos"

That translation is very literal, but also very accurate. The German quote is equally imprecise.
My interpretation is this:

"We will introduce GPT-4 next week. (At some point) we will have multimodal models that will be capable of handling videos. We may already have multimodal models (remember, this is Microsoft saying this, not OpenAI, so it could be referring to Kosmos-1), but future ones will be more capable."

Wish there were a recording of the stream referenced in the article to confirm this. But yeah, I agree with the conclusion that video is unlikely. Torn on whether GPT-4 is multimodal or not.

bought Ṁ100 of NO

Wait the article doesn't even claim GPT-4 will be multimodal:

"We will introduce GPT-4 next week, we will have multi-modal models that will offer completely different capabilities - for example, video"

bought Ṁ100 of NO

@NoaNabeshima Except in the title, sorry. Maybe I should say that the CTO doesn't claim GPT-4 will be multimodal in a direct quote.

bought Ṁ100 of NO

P(the message about multimodality was garbled between OpenAI/Microsoft and the time it arrived in this news article) = 40%
P(multimodal, before the article's evidence) = 30%
P(multimodal | the message wasn't garbled) = 100%

P(multimodal before evidence)*P(garbled) + 1*(1-P(garbled)) = 0.3*0.4 + (1-0.4) = 0.72
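The estimate above is an application of the law of total probability over whether the article's claim was garbled. A minimal sketch to check the arithmetic (the probabilities are the commenter's own guesses, and the variable names are mine):

```python
# Law of total probability over whether the multimodality claim was
# garbled on its way from OpenAI/Microsoft to the news article.
p_garbled = 0.40     # P(message was garbled in transit)
p_prior = 0.30       # P(GPT-4 multimodal), before the article's evidence
p_if_intact = 1.00   # P(multimodal | the claim was not garbled)

# If garbled, fall back to the prior; if intact, the claim is decisive.
p_multimodal = p_prior * p_garbled + p_if_intact * (1 - p_garbled)
print(round(p_multimodal, 2))  # 0.72
```

Note the structure: the article only moves the estimate in proportion to how much you trust the reporting chain, which is why the follow-up question "should P(garbled) be higher?" matters so much to the final number.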

predicted NO

@NoaNabeshima Probably P(garbled) should be higher?

bought Ṁ50 of YES

@NoaNabeshima was OA involved at all in the article? I don't think so

bought Ṁ100 of YES

Ah fast bionic fishes

bought Ṁ65 of NO

Does this market resolve based on the base model? As in, if the GPT-4 base model is text-only, but they release as well a “visual GPT-4” model that deals with images, does this market resolve positively or negatively?

predicted NO

@MatthewBarnett can you clarify this? Seems to plausibly make tens of % of difference in expectation.

predicted NO

@JacyAnthis I will add to the description that this question will resolve to YES if any single model consistently called "GPT-4" by the OpenAI staff is multi-modal. If there are two single-modality models: one trained only on images, and one trained only on text, that will not count.

sold Ṁ104 of NO

@MatthewBarnett Do they have to be released at the same time? For example, if this question was about ChatGPT, and they called visual-ChatGPT just ChatGPT, would this resolve negatively? I’m guessing so, or this question can never really resolve.

predicted NO

@BionicD0LPH1N I will resolve based on the first announcement of GPT-4, and on the basis of all the models that are announced or revealed to have GPT-4 in their name within 24 hours of that first official announcement. I will update the description accordingly.