Will the next major LLM by OpenAI use a new tokenizer?
44 traders · Ṁ1,273 volume · closes Dec 31
77% chance
The GPT-2 model used r50k_base: vocab size ≈ 50k
The GPT-3 model used r50k_base: vocab size ≈ 50k
The GPT-3.5 model used cl100k_base: vocab size ≈ 100k
The GPT-4 model used cl100k_base: vocab size ≈ 100k
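These vocab sizes can be verified with OpenAI's tiktoken library; a minimal sketch (the exact counts include special tokens, so they sit slightly above the round figures cited above):

```python
# Check the vocab size of each tokenizer named above,
# using OpenAI's tiktoken library (pip install tiktoken).
import tiktoken

for name in ["r50k_base", "cl100k_base"]:
    enc = tiktoken.get_encoding(name)
    print(f"{name}: {enc.n_vocab} tokens")
# r50k_base: 50257 tokens   (the "50k" figure)
# cl100k_base: 100277 tokens (the "100k" figure)

# tiktoken also maps model names to their tokenizer:
print(tiktoken.encoding_for_model("gpt-4").name)  # cl100k_base
```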
This question is managed and resolved by Manifold.
Related questions
Will OpenAI give their new LLM an anthropomorphic name?
13% chance
Will OpenAI release a tokenizer with more than 210000 tokens before 2026?
24% chance
Will OpenAI's next major LLM (after GPT-4) feature natural and convenient speech-to-speech capabilities?
80% chance
Will OpenAI's next major LLM release support video input?
37% chance
Will a flagship (>60T training bytes) open-weights LLM from Meta which doesn't use a tokenizer be released in 2025?
20% chance
Will OpenAI's next major LLM (after GPT-4) solve more than 2 of the first 5 new Project Euler problems?
45% chance
Will OpenAI's next major LLM (after GPT-4) achieve over 50% resolution rate on the SWE-bench benchmark?
65% chance
OpenAI to release model weights by EOY?
83% chance
What will be true of OpenAI's best LLM by EOY 2025?
Will OpenAI's next major LLM (after GPT-4) surpass 70% accuracy on the GPQA benchmark?
75% chance