Will a language model that runs locally on a consumer cellphone beat GPT-4 by EOY 2026?


For the locally run model, we refer to the language model alone, not augmented with search, RAG, or function calling. It needs a minimum throughput of 4 tokens/second.
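
As a rough illustration of the throughput bar, here is a minimal sketch of how the 4 tokens/second check could be computed. The helper names (`tokens_per_second`, `meets_throughput`, `measure`) are made up for this example, and timing a full generation run with a wall clock is an assumption, not an agreed measurement protocol.

```python
import time
from typing import Callable, Iterable

# Threshold taken from the resolution criterion above.
MIN_TOKENS_PER_SECOND = 4.0


def tokens_per_second(num_tokens: int, elapsed_seconds: float) -> float:
    """Average generation rate over one run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_tokens / elapsed_seconds


def meets_throughput(num_tokens: int, elapsed_seconds: float,
                     minimum: float = MIN_TOKENS_PER_SECOND) -> bool:
    """True if the run reached at least `minimum` tokens per second."""
    return tokens_per_second(num_tokens, elapsed_seconds) >= minimum


def measure(generate: Callable[[], Iterable[str]]) -> float:
    """Time a hypothetical token generator and return its tokens/second.

    `generate` stands in for whatever on-device inference call is used;
    it only needs to yield one item per generated token.
    """
    start = time.perf_counter()
    count = sum(1 for _ in generate())
    elapsed = time.perf_counter() - start
    return tokens_per_second(count, elapsed)
```

For example, 240 tokens in 60 seconds is exactly 4 tokens/second and would pass, while one token per minute would not.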

I'm not sure what benchmarks people will use in 2026, but let’s say LMSYS Arena for the moment. This may change depending on the trend.

Current SOTA:

I am not sure Phi-3 (3.8B) can fit on a phone. If not, the current best are MiniCPM and Gemma 2B.
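
For a back-of-the-envelope sense of whether a model "fits", the weights alone take roughly (parameter count × bits per weight ÷ 8) bytes. The sketch below works that out for a few precisions. It is only an estimate under stated assumptions: it ignores the KV cache, activations, and OS overhead; the `usable_fraction` parameter is a made-up knob for how much of the phone's RAM an app can actually claim; and the parameter counts are the nominal ones from the model names.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def fits_in_ram(num_params: float, bits_per_weight: int, ram_gb: float,
                usable_fraction: float = 0.5) -> bool:
    """Rough check: do the weights fit in the usable share of RAM?

    `usable_fraction` is an assumption for illustration, not a measured value.
    """
    return weight_memory_gb(num_params, bits_per_weight) <= ram_gb * usable_fraction


if __name__ == "__main__":
    # Nominal parameter counts, taken from the model names.
    models = [("Phi-3 (3.8B)", 3.8e9), ("Gemma 2B", 2.0e9)]
    for name, params in models:
        for bits in (16, 8, 4):
            size = weight_memory_gb(params, bits)
            ok = fits_in_ram(params, bits, ram_gb=8)
            print(f"{name} @ {bits}-bit: {size:.2f} GB, fits in 8 GB phone: {ok}")
```

Under these assumptions, a 3.8B model needs about 7.6 GB of weights at 16-bit but about 1.9 GB at 4-bit, so quantization largely decides the answer.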

bought Ṁ20 NO

I'm guessing this means any consumer cellphone? E.g. if a model that fits in 32GB RAM beats GPT-4 and there's only 1-2 phones with that much RAM in 2026 (current record is 24GB), this resolves Yes.

@JoshYou yes. Any consumer cellphone

bought Ṁ10 NO

Runs at what rate? If it's one token per minute, does it count?

@0482 Let’s say 4 tokens/s

If they are allowed to browse the internet, then for sure. If we are talking about encoding all the knowledge, then probably not.

@Magnus_ Great question...

How should we specify this? I am thinking that it can do RAG for everything inside the phone but no internet connection. What do you think?

Currently, a phone can hold up to 512GB. It is a lot of info, but not the whole internet.

This criterion captures the "local" aspect.

What do you think?

@Magnus_ Another option is to say language model only, no local RAG

@Magnus_ I thought about it again. For a fair comparison, we should have the same standard for the local mobile LM since GPT-4 is not using any RAG/search/tools. I have updated the criterion.
