GPT-4-0314
For the locally run model, we refer to the language model alone, not augmented with search, RAG, or function calling. It must sustain a minimum throughput of 4 tokens/second.
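A rough way to check the 4 tokens/second bar could look like the sketch below. This is only an illustration: `generate` is a hypothetical stand-in for whatever inference call the on-device runtime actually exposes, and the stub simulates generation timing.

```python
import time

def tokens_per_second(generate, prompt, n_tokens=64):
    """Time a token-generation callable and return its throughput.

    `generate` is a placeholder (assumption) for the phone runtime's
    real inference call; it should produce `n_tokens` tokens.
    """
    start = time.perf_counter()
    generate(prompt, n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stub standing in for a real on-device model (hypothetical):
def fake_generate(prompt, n_tokens):
    for _ in range(n_tokens):
        time.sleep(0.01)  # pretend each token takes ~10 ms

rate = tokens_per_second(fake_generate, "Hello", n_tokens=32)
print(f"{rate:.1f} tokens/s, meets 4 tok/s bar: {rate >= 4}")
```

In practice you would time the real model's decode loop on the phone itself, since prefill and decode speeds differ.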
Not sure what benchmarks people will use in 2026, but let's say LMSYS Arena for the moment. This may change depending on the trend.
Current SOTA:
I am not sure Phi-3 (3.8B) can fit on a phone. If not, the current best options are MiniCPM and Gemma 2B.
Gemma 2 9B (instruction-tuned) is already higher on the LMSYS Arena, though only by 2 points.
Update: oops, sorry, I was thinking the criterion was <10B; I confused this with a different question.
I'm guessing this means any consumer cellphone? E.g., if a model that fits in 32 GB of RAM beats GPT-4 and there are only 1-2 phones with that much RAM in 2026 (the current record is 24 GB), this resolves Yes.
@Magnus_ Great question...
How should we specify this? I am thinking it can do RAG over everything stored on the phone, but with no internet connection. What do you think?
Currently, a phone can hold up to 512 GB of storage. That is a lot of information, but not the whole internet.
This criterion captures the "local" aspect.
What do you think?
@Magnus_ I thought about it again. For a fair comparison, we should hold the local mobile LM to the same standard, since GPT-4 is not using any RAG, search, or tools. I have updated the criterion.