Will Apple market its NPU in tokens/s at its September iPhone 16 announcement?

When running an LLM locally, tokens per second (T/s) is an emerging performance metric. For example, when running the open-source Llama 3 8B:

- A typical multi-core Xeon server processor reaches about 10 T/s.
- An RTX 2070 GPU reaches 20 T/s.
- An RTX 4070 GPU reaches 40 T/s.
- 8 of Groq's specialized LPUs connected together reach 750 T/s.
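To make these throughput figures more tangible, the sketch below converts the quoted rates into per-token latency and the time needed for a reply. The 200-token reply length, the `tokens_per_second` helper and the `fake_generate` stand-in for a model's token stream are all hypothetical illustrations, not part of any vendor's benchmark. Real measurements also depend on prompt length, quantization and batch size.

```python
import time
from typing import Callable, Iterable

# Llama 3 8B throughput figures quoted above, in tokens per second.
REPORTED = {
    "Multi-core Xeon server CPU": 10,
    "RTX 2070": 20,
    "RTX 4070": 40,
    "8x Groq LPU": 750,
}


def reply_time_s(tokens: int, rate: float) -> float:
    """Seconds needed to generate `tokens` tokens at `rate` tokens/s."""
    return tokens / rate


def tokens_per_second(generate: Callable[[], Iterable[str]]) -> float:
    """Consume a token stream, timing it with a wall clock, and return tokens/s."""
    start = time.perf_counter()
    count = sum(1 for _ in generate())
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def fake_generate() -> Iterable[str]:
    """Hypothetical stand-in for a model: emits 50 tokens, ~10 ms apart."""
    for i in range(50):
        time.sleep(0.01)
        yield f"tok{i}"


if __name__ == "__main__":
    for hw, rate in REPORTED.items():
        ms_per_token = 1000 / rate
        print(f"{hw:28s} {rate:4d} T/s  {ms_per_token:6.1f} ms/token  "
              f"{reply_time_s(200, rate):5.1f} s for 200 tokens")
    # Measured rate is at most ~100 T/s because each fake token sleeps 10 ms.
    print(f"measured: {tokens_per_second(fake_generate):.0f} T/s")
```

At 10 T/s a 200-token answer takes 20 seconds, while at 750 T/s it takes well under one second. This gap is why a tokens/s figure would be a meaningful way to market an on-device NPU.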

I haven't yet seen a phone maker market its neural processor (NPU) in tokens/s.
