Will the next major LLM by OpenAI use a new tokenizer?
48 · Ṁ1k · Ṁ2.4k · Dec 31
89% chance
The GPT-2 model used r50k_base: vocab size = 50k
The GPT-3 model used r50k_base: vocab size = 50k
The GPT-3.5 model used cl100k_base: vocab size = 100k
The GPT-4 model used cl100k_base: vocab size = 100k
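One possible reading of the question is that it resolves YES when the next major model's tokenizer is not among those listed above. The sketch below encodes only that reading. The function name `resolves_yes` and the encoding name `hypothetical_new_base` are made up for illustration, and the rule itself is an assumption, not a resolution criterion confirmed by the market creator.

```python
# Model -> tokenizer pairs, copied from the list in the market description.
KNOWN_TOKENIZERS = {
    "GPT-2": "r50k_base",
    "GPT-3": "r50k_base",
    "GPT-3.5": "cl100k_base",
    "GPT-4": "cl100k_base",
}


def resolves_yes(new_tokenizer: str) -> bool:
    """Assumed rule: YES if the tokenizer is not already on the list."""
    return new_tokenizer not in set(KNOWN_TOKENIZERS.values())


if __name__ == "__main__":
    # "hypothetical_new_base" is a placeholder, not a real encoding name.
    for name in ("cl100k_base", "hypothetical_new_base"):
        print(name, "->", "YES" if resolves_yes(name) else "NO")
```

Under this reading, a model that reuses cl100k_base would resolve NO, and any encoding name not on the list would resolve YES.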
This question is managed and resolved by Manifold.
People are also trading
Will OpenAI release another open source LLM before end of 2026?
70% chance
Will OpenAI, Deepmind, or Anthropic be the next to release a frontier LLM?
Will there be a state-of-the-art LLM that is NOT based on next raw token prediction before 2029?
55% chance
Will OpenAI announce AGI before 2028 conditional on it centrally being an LLM?
48% chance
Before 2029, will OpenAI provide API access to a frontier LLM with 100,000,000+ context length?
49% chance
How much time will pass between an LLM being released that beats GPT4 and the next OpenAI LLM being released? (+ANSWERS)
Will any widely used LLM be pre-trained with abstract synthetic data before 2030?
72% chance
@firstuserhere So YES if there's a GPT-4.5/5 that uses a tokeniser not on this list, and NO if there's a GPT-4.5/5 that uses a tokeniser that is on this list?