MANIFOLD
Will OpenAI release a multimodal model with > 2 native modalities before 2025?
Resolved YES on May 13

GPT-4 was announced on March 14, 2023. The corresponding release paper references GPT-4 as a "large multimodal model (accepting image and text inputs, emitting text outputs)".

Text and images count as one modality each. Code interpreter is considered part of the text modality. Future modalities may include:

  • Audio

  • Video

  • Sensor data (Radar/Lidar)

Resolves YES if

  • OpenAI gives at least some portion of the general public access to a model with more than two modalities by December 1, 2025. Access may come through a full release or a staggered rollout, and through either the API or the UI.

  • The new modalities should be "native" to the model. Generally, this means that OpenAI accepts the modality as a default input, in the same way it now accepts images.

  • Negative Example: Using Whisper to transcribe audio into text and feeding the result into the text input does not count.

  • Negative Example: Leveraging the text interface to interpret raw sensor data does not count, unless OpenAI has specifically announced they would support this interface. Using code interpreter to accomplish this does not count.

  • Positive Example: The model (along with text and images) accepting raw audio as an input (potentially understanding tone and inflection) would count.

Resolves NO otherwise.
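
The criteria above can be read as a simple decision rule. The following is a minimal sketch of that rule, not anything published by the market creator or by OpenAI: the `ModelRelease` type, its field names, and `resolves_yes` are hypothetical names introduced for illustration. The deadline check is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRelease:
    """Hypothetical record describing a model release (illustrative only)."""
    name: str
    # Modalities the model itself accepts as default inputs.
    # Audio transcribed to text by a separate tool is NOT listed here.
    native_inputs: frozenset
    # True if at least some portion of the general public has access,
    # through either the API or the UI.
    publicly_available: bool


def resolves_yes(release: ModelRelease) -> bool:
    """Return True if the release meets the stated YES criteria."""
    return release.publicly_available and len(release.native_inputs) > 2


# A pipeline that transcribes audio before the model sees it still has
# only two native modalities, so it does not qualify.
transcription_pipeline = ModelRelease(
    "text+image model behind a transcriber", frozenset({"text", "image"}), True
)
native_audio_model = ModelRelease(
    "native audio model", frozenset({"text", "image", "audio"}), True
)
```

Under this reading, `resolves_yes(transcription_pipeline)` is false and `resolves_yes(native_audio_model)` is true, which matches the negative and positive examples in the description.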


🏅 Top traders

#  Total profit
1  Ṁ147
2  Ṁ62
3  Ṁ40
4  Ṁ38
5  Ṁ24
bought Ṁ128 YES

Confirmed that GPT-4o, released today, understands audio/voice/speech natively at the model level. See the positive example in the market description.