Will the next major LLM by OpenAI use a new tokenizer?
2025 · 76% chance
  1. The GPT-2 model used r50k_base: vocab size = 50k

  2. The GPT-3 model used r50k_base: vocab size = 50k

  3. The GPT-3.5 model used cl100k_base: vocab size = 100k

  4. The GPT-4 model used cl100k_base: vocab size = 100k


What if there are significantly more new tokens, e.g. representing images or audio, but the tokens representing text are pretty much unchanged?

@firstuserhere So YES if there's a GPT-4.5/5 that uses a tokeniser not on this list, and NO if there's a GPT-4.5/5 that uses a tokeniser that is on this list?

Do you consider GPT-4 Turbo to be a new iteration? What counts as the "next major LLM"?

@oh No, GPT-4 Turbo is part of the same family; it does not qualify as the next major LLM release.