Will OpenAI release a multimodal model with >2 native modalities before 2025?
82% chance

GPT-4 was announced on March 14, 2023. The corresponding release paper describes GPT-4 as a "large multimodal model (accepting image and text inputs, emitting text outputs)".

Text and images count as one modality each. Code interpreter is considered part of the text modality. Future modalities may include

  • Audio

  • Video

  • Sensor data (Radar/Lidar)

Resolves YES if

  • Whether by full release or staggered rollout, via the API or the UI, OpenAI gives at least some portion of the general public access to a model with more than two native modalities by December 1, 2025.

  • The new modalities must be "native" to the model. This generally means that OpenAI accepts that specific modality as a default input, in the same way it now accepts images.

  • Negative Example: using Whisper to transcribe audio into text and feeding the transcript into the text input does not count.

  • Negative Example: leveraging the text interface to interpret raw sensor data does not count unless OpenAI has specifically announced support for that interface. Using Code Interpreter to accomplish this does not count either.

  • Positive Example: the model (along with text and images) accepting raw audio as an input (potentially understanding tone and inflection) would count; a sketch contrasting this with the Whisper pipeline appears below the criteria.

Resolves NO otherwise.
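
For concreteness, here is a minimal Python sketch of the distinction the criteria draw, using the OpenAI SDK. The Whisper transcription call and the chat-completions call are existing API surface; the native-audio request (the model name, the `input_audio` content part, and its payload shape) is an assumption for illustration only, since resolution does not depend on any particular API design.

```python
# Sketch contrasting "pipelined" audio (does NOT count) with "native" audio
# (would count). Model names and the audio payload shape in the second call
# are illustrative assumptions, not a claim about what OpenAI ships.
from openai import OpenAI

client = OpenAI()

# Negative example: audio is transcribed by Whisper first, and only the
# resulting TEXT reaches the chat model. The model never sees audio.
with open("clip.wav", "rb") as f:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

pipelined = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)

# Positive example (hypothetical): raw audio is passed directly as an input
# part alongside text, so tone and inflection reach the model itself.
native = client.chat.completions.create(
    model="hypothetical-multimodal-model",  # assumption for illustration
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is the speaker's mood?"},
            {"type": "input_audio",
             "input_audio": {"data": "<base64-encoded wav>", "format": "wav"}},
        ],
    }],
)
```

Only something shaped like the second call, where the model itself ingests the new modality, would satisfy the "native" requirement.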

