GPT-4-0314
For the locally run model, we refer to the language model alone, not augmented with search, RAG, or function calling. It must sustain a minimum throughput of 4 tokens/second.
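A rough way to check the 4 tokens/second bar could look like the sketch below. This is only an illustration: `generate` is a hypothetical stand-in for whatever inference call the on-device runtime actually exposes, and the stub simulates generation timing.

```python
import time

def tokens_per_second(generate, prompt, n_tokens=64):
    """Time a token-generation callable and return its throughput.

    `generate` is a placeholder (assumption) for the phone runtime's
    real inference call; it should produce `n_tokens` tokens.
    """
    start = time.perf_counter()
    generate(prompt, n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stub standing in for a real on-device model (hypothetical):
def fake_generate(prompt, n_tokens):
    for _ in range(n_tokens):
        time.sleep(0.01)  # pretend each token takes ~10 ms

rate = tokens_per_second(fake_generate, "Hello", n_tokens=32)
print(f"{rate:.1f} tokens/s, meets 4 tok/s bar: {rate >= 4}")
```

In practice you would time the real model's decode loop on the phone itself, since prefill and decode speeds differ.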
Not sure what benchmarks people will use in 2026, but let's say LMSYS Arena for the moment. This may change depending on the trend.
Current SOTA:
I am not sure Phi-3 (3.8B) can fit on a phone. If not, the current best options are MiniCPM and Gemma 2B.
Gemma 2 9B (instruction-tuned) is already higher on the LMSYS Arena, though only by 2 points.
Update: oops, sorry, I was thinking the criterion was <10B; I confused this with a different question.
I'm guessing this means any consumer cellphone? E.g., if a model that fits in 32 GB of RAM beats GPT-4 and there are only 1-2 phones with that much RAM in 2026 (the current record is 24 GB), this resolves Yes.
@Magnus_ Great question...
How should we specify this? I am thinking it can do RAG over everything stored on the phone, but with no internet connection. What do you think?
Currently, a phone can hold up to 512 GB of storage. That is a lot of information, but not the whole internet.
This criterion captures the "local" aspect.
What do you think?
@Magnus_ I thought about it again. For a fair comparison, we should hold the local mobile LM to the same standard, since GPT-4 is not using any RAG, search, or tools. I have updated the criterion.