Will any LLM have a context window of at least 1 million characters by the end of 2028?


Resolved YES on May 11


Using characters instead of tokens because token size can be changed, and characters are what humans actually care about. If they advertise a context window in tokens, I'll convert it to characters at the average rate of that tokenizer on representative human text.

Something "cheaty" doesn't count; it has to be, say, at least as smart as GPT-3 on similar inputs.
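The token-to-character conversion described above can be sketched roughly as follows. This is only an illustration of the arithmetic, not the market creator's actual procedure: the function names are hypothetical, and the whitespace splitter stands in for a real tokenizer, whose average rate would have to be measured on representative human text.

```python
from typing import Callable, Sequence

THRESHOLD_CHARS = 1_000_000  # the bar named in the market title


def average_chars_per_token(sample: str, tokenize: Callable[[str], Sequence]) -> float:
    """Average characters per token for `tokenize` on a sample of human text."""
    tokens = tokenize(sample)
    if not tokens:
        raise ValueError("sample produced no tokens")
    return len(sample) / len(tokens)


def context_window_chars(context_tokens: int, chars_per_token: float) -> float:
    """Approximate character capacity of a window advertised in tokens."""
    return context_tokens * chars_per_token


# Assumption: whitespace splitting is a stand-in, NOT a real LLM tokenizer.
sample = "the quick brown fox jumps over the lazy dog"
rate = average_chars_per_token(sample, str.split)

# A window advertised as 1 million tokens, converted at the measured rate.
approx_chars = context_window_chars(1_000_000, rate)
print(f"{rate:.2f} chars/token, {approx_chars:,.0f} chars,",
      "qualifies" if approx_chars >= THRESHOLD_CHARS else "does not qualify")
```

Any tokenizer that averages at least one character per token turns a 1-million-token window into at least 1 million characters, so the exact rate only matters for windows advertised below that size.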

Nice time capsule of a market... A year ago GPT-4 was 8k, or a ridiculously expensive 32k that wasn't even that good. 76% in April 2023 is not a strong vote of confidence given the "end of 2028" timeline. That probability means "speculative, far in the future, but maybe somebody will get it working".

Today I put my codebase into a 1 million context LLM on my Mac to ask it questions, write code, and use tools. You can just download them and run them on any Mac. It's not some heroic feat like "76% by end of 2028" makes it sound.

resolve this….

@Hazel What LLM do you believe qualifies?

@IsaacKing Gemini 1.5.

“We’ve been able to significantly increase the amount of information our models can process — running up to 1 million tokens consistently, achieving the longest context window of any large-scale foundation model yet.”

1 million tokens > 1 million characters.

https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#sundar-note

predicted YES

Let's keep in mind that LLMs are not purely generative in nature and don't have to be based upon a GPT or any pre-determined architecture; they are merely a probability distribution over sequences of words. So as written, this question has a wide interpretation. I would almost advocate for narrowing down the definition further to make it more interesting.

predicted YES

@PatrickDelaney It should probably be "equivalent to GPT-3 on some benchmarks", otherwise a random tree search or Markov chain would qualify. (Well, a "large" Markov chain)

Maybe the "Evals" repo that was introduced with GPT-4 would be a good one? openai/evals (github.com): Evals is a framework for evaluating OpenAI models and an open-source registry of benchmarks.

I'm advancing the idea (which of course helps my YES bet) that this must include private LLMs, not only publicly disclosed ones, so if there is a leak or a news report about any kind of LLM with said context window, it qualifies.

Not with transformers, since they scale quadratically, but I'm sure somebody will train a test model using e.g. Hyena operators just to test its limits.

@Mira the standard transformer* with dot product attention* scales quadratically
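To put a rough number on the quadratic scaling discussed here: standard dot-product attention compares every token with every other token, so the score matrix for a sequence of n tokens has n × n entries per head per layer. The sketch below only counts those entries; the function name is hypothetical, and it ignores implementation tricks that avoid storing the full matrix, which reduce memory but not the number of comparisons.

```python
def attention_score_entries(seq_len: int) -> int:
    """Entries in one seq_len x seq_len attention score matrix (one head, one layer)."""
    return seq_len * seq_len


# Token counts: 8k is the GPT-4 window mentioned in this thread, 1M is the market's scale.
short, long = 8_000, 1_000_000
growth = attention_score_entries(long) / attention_score_entries(short)
print(f"{long // short}x longer context -> {growth:,.0f}x more score entries")
```

So a 125x longer context means 15,625x more pairwise scores, which is why the non-quadratic alternatives come up in this thread.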

No, because it would be totally superfluous... No reason to waste compute like that

@jonsimon Humans have an unlimited context window. An AGI probably would too.

predicted NO

@IsaacKing would be much easier to use some kind of external knowledge/state store rather than a massive context window

predicted YES

@jonsimon No reason? How do you know that a priori, do you know every industry? Do you know every possible architecture that might come out in 6 years? 1 million characters is 600 to 800 pages. I could imagine some odd esoteric application existing for that. If it was 100,000 or 1 million pages, increasingly less likely. But what about...government/intelligence summarization?
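As a quick check on the "600 to 800 pages" figure, the implied page density can be computed directly. This is plain arithmetic on the numbers in the comment; the function name is hypothetical, and whether the resulting density matches any particular page layout is left as an assumption.

```python
TOTAL_CHARS = 1_000_000  # the market's threshold


def implied_chars_per_page(total_chars: int, pages: int) -> float:
    """Characters per page implied by spreading total_chars over `pages` pages."""
    return total_chars / pages


for pages in (600, 800):
    density = implied_chars_per_page(TOTAL_CHARS, pages)
    print(f"{pages} pages -> about {density:,.0f} characters per page")
```

The estimate therefore assumes roughly 1,250 to 1,667 characters per page.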

predicted YES

@IsaacKing I don't want to argue too harshly because this is all metaphors we're talking about but...humans do not have an unlimited context window analogous to an autoregressive GPT's context window...right? You would have to be able to "remember," i.e. "tokenize," every conversation you ever had in detail, including every word, to fulfill the condition "*unlimited* context window." I think you might mean...humans can have a context window that stretches back selectively for years if not decades...something like that? To me, "unlimited" means a massive corpus, billions or trillions of words long, including everything one ever heard, read, wrote or spoke. I can hardly remember what I did 10 minutes ago.

Relevant: "GPT-4 is capable of handling over 25,000 words of text, allowing for use cases like long form content creation, extended conversations, and document search and analysis."