Will Apple market its NPU by token/s in its September iPhone 16 announcement?
Oct 1
3% chance

When running an LLM locally, tokens per second (token/s) is a new performance metric. For example, when running the open-source Llama 3 8B:

-a typical multi-core Xeon server processor reaches about 10 T/s

-an RTX 2070 GPU reaches 20 T/s

-an RTX 4070 GPU reaches 40 T/s

-eight of Groq’s specialized LPUs connected together reach 750 T/s
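To make the metric concrete, here is a minimal sketch of how token/s could be computed and what the figures above would mean for a reply of a given length. The function names, the 500-token reply length, and the stand-in generator are assumptions made for illustration; they are not taken from any benchmark tool.

```python
import time
from typing import Callable, Iterable


def tokens_per_second(num_tokens: int, elapsed_seconds: float) -> float:
    """Throughput: generated tokens divided by wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_tokens / elapsed_seconds


def measure(generate: Callable[[], Iterable[str]]) -> float:
    """Time any token generator (hypothetical stand-in for a model) in token/s."""
    start = time.perf_counter()
    count = sum(1 for _ in generate())  # consume and count every token
    elapsed = time.perf_counter() - start
    return tokens_per_second(count, elapsed)


def seconds_for_reply(reply_tokens: int, rate: float) -> float:
    """Time needed to produce reply_tokens at a given token/s rate."""
    return reply_tokens / rate


# Approximate rates quoted in the list above (Llama 3 8B), in token/s.
QUOTED_RATES = {
    "multi-core Xeon": 10,
    "RTX 2070": 20,
    "RTX 4070": 40,
    "8 Groq LPUs": 750,
}

if __name__ == "__main__":
    # 500 tokens is an assumed reply length, chosen only for illustration.
    for name, rate in QUOTED_RATES.items():
        print(f"{name}: {seconds_for_reply(500, rate):.1f} s for 500 tokens")
```

In practice the measured rate also depends on prompt length, quantization, and batch size, so a single token/s number is only a rough comparison.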

I haven’t seen a mobile phone brand market its neural processor by token/s.
