What is the main reason behind GPT-4o's speed improvement relative to the GPT-4 base model?
Smaller model size (hence, architecture/algorithm improvements): 69%
Something related to low-level computation efficiency (for example, optimized frameworks): 40%
More/better hardware allocated: 27%
Other: 23%
Better coarser-grained tokenizer: 15%

Did they answer this for gpt-4-turbo?

"Main reason" implies only one of these resolves YES. If the coarse tokens help but aren't the central boost, I assume that option resolves NO?

@MaxHarms Correct. In very ambiguous situations, two options may resolve YES, but I expect this to happen only in extreme cases.

It's likely a combination of a smaller model and low-level optimisations (these happen all the time, judging by open-source solutions). However, I find it unlikely that "open" AI will share the exact numbers needed to determine which factor played the biggest role.

What does quantization count towards?

@Sss19971997 Quantization itself would count as "something related to low-level computation efficiency".
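
For readers unfamiliar with the term, here is a minimal sketch of what quantization means in general: storing weights as small integers plus a scale factor, so the same model runs with cheaper arithmetic and less memory traffic. This illustrates the generic technique only (symmetric per-tensor int8); nothing here is known about how GPT-4o is actually served, and the function names and sample weights are made up for illustration.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (illustrative, hypothetical helper).

    Returns a list of integers in [-127, 127] and the float scale
    needed to map them back to approximate original values.
    """
    max_abs = max(abs(w) for w in weights)
    # Avoid division by zero when every weight is 0.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate floats."""
    return [v * scale for v in q]


# Made-up example weights, not taken from any real model.
weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

The model architecture and parameter count are unchanged by this step; only the numeric representation is, which is why it would fall under the low-level efficiency option and not the "smaller model" option.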

Which base model?

@StephenMWalkerII The first publicly available GPT-4 model, released on March 14, 2023.