
Will Llama 3-multimodal be natively mixed-multimodal? (VQ-VAE+next token prediction)
50% chance
Vision-language models currently follow two common paradigms.
The first is the LLaVA approach, in which a CLIP-like vision encoder is attached to an LLM through a projection layer.
The second is the Gemini/LVM approach, in which a VQ-VAE compresses images into discrete tokens and the model is then trained with plain autoregressive next-token prediction. GPT-4o is suspected to be trained this way as well, which would explain why it can generate images with excellent text rendering.
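The difference between the two paradigms can be sketched in a few lines of plain Python. This is only a toy illustration, not how LLaVA, Gemini, LVM, or any Llama model is actually implemented: the function names, the tiny 2-dimensional patch features, the hand-written codebook, and the vocabulary offset are all assumptions made for the example.

```python
def project_patches(patch_features, weight):
    """First paradigm (toy): linearly project continuous vision features
    into the LLM embedding space. The features stay continuous."""
    columns = list(zip(*weight))  # columns of the projection matrix
    return [
        [sum(p * w for p, w in zip(patch, col)) for col in columns]
        for patch in patch_features
    ]


def quantize(patch_features, codebook):
    """Second paradigm (toy): replace each patch feature with the index
    of its nearest codebook entry, giving discrete image tokens."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(patch, codebook[k]))
        for patch in patch_features
    ]


def mixed_sequence(text_ids, image_codes, text_vocab_size):
    """Shift image codes past the text vocabulary so text and image
    tokens share one token space (an assumed, simplified scheme)."""
    return text_ids + [text_vocab_size + c for c in image_codes]


def next_token_pairs(sequence):
    """Training pairs for autoregressive prediction: (prefix, next token)."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


# Hypothetical inputs: three image patches with 2-dimensional features.
patches = [[0.1, 0.9], [0.8, 0.2], [0.0, 1.0]]

# First paradigm: a 2 -> 3 projection yields continuous embeddings.
weight = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
projected = project_patches(patches, weight)

# Second paradigm: a 2-entry codebook yields discrete token ids.
codebook = [[0.0, 1.0], [1.0, 0.0]]
codes = quantize(patches, codebook)
sequence = mixed_sequence([5, 7], codes, text_vocab_size=10)
pairs = next_token_pairs(sequence)
```

In the first case the LLM receives continuous vectors it can read but not emit, while in the second case image tokens sit in the same sequence as text tokens, so the same next-token objective can in principle also generate images.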
Note that Meta has just announced Chameleon: Mixed-Modal Early-Fusion Foundation Models.
Will Llama 3 multimodal or Llama 3 vision be trained using the second approach?
This question is managed and resolved by Manifold.
What is this?
What is Manifold?
Manifold is the world's largest social prediction market.
Get accurate real-time odds on politics, tech, sports, and more.
Or create your own play-money betting market on any question you care about.
Are our predictions accurate?
Yes! Manifold is very well calibrated, with forecasts on average within 4 percentage points of the true probability. Our probabilities are created by users buying and selling shares of a market.
In the 2022 US midterm elections, we outperformed all other prediction market platforms and were in line with FiveThirtyEight’s performance. Many people who don't like betting still use Manifold to get reliable news.
Why use play money?
Mana (Ṁ) is the play-money currency used to bet on Manifold. It cannot be converted to cash. All users start with Ṁ1,000 for free.
Play money means it's much easier for anyone anywhere in the world to get started and try out forecasting without any risk. It also means there's more freedom to create and bet on any type of question.
Related questions
Top 3 Multimodal Vision2Language Model by EOY 2024? (by Organization/Company)
Will Llama 4 be the best LLM in the chatbot arena?
10% chance
Will OpenAI announce a multi-modal AI capable of any input-output modality combination by end of 2025? ($1000M subsidy)
83% chance
Will Llama 4 use mixture of experts?
66% chance
Will OpenAI's next major LLM release support video input?
48% chance
Will a Mamba 7b model trained on 2 trillion tokens outperform Llama2-13B
66% chance
Will Llama-3 (or next open Meta model) be obviously good in its first-order effects on the world?
88% chance
By 2030 will we have video-to-video where an LLM can continue any video prompt in any way you like?
76% chance
Will a SOTA open-sourced LLM forecasting system make major use of quasilinguistic neural reps (QNRs) before 2027?
19% chance