What will GPT-5's context size be? (2025)
106 traders · Ṁ66k volume
resolved Aug 7

512k: resolved 61% (final probability 73%)
256k: resolved 39% (final probability 20%)
0: 0.0%
2k: 0.0%
4k: 1.4%
8k: 0.0%
16k: 0.0%
32k: 0.0%
64k: 0.0%
128k: 0.2%
1024k: 3%
2048k: 1.3%
4096k: 1.4%

When OpenAI announces GPT-5, this market resolves to the largest context size measured in tokens that they announce support for.

GPT-3: 2048 tokens

GPT-3.5: 4096

GPT-4: 8k, 32k

GPT-5: ???

Anthropic has announced a 100k-context Claude variant, there are rumors of upcoming 1-million-token context models, and surely OpenAI will want the most impressive-sounding model at release.

In the unexpected case that they don't mention a specific context size, or their architecture changes so that fixed context sizes no longer make sense, I'll wait until I have access and test its recall on very large documents.

If the largest context size isn't in this table, then this market resolves to a weighting of the two surrounding entries on a log2 scale. k is a multiplier of 1024. GPT-4 would resolve "32k". Claude would resolve "log2(100000) = 16.61, so 2^16 = 64k would get weight 39% and 2^17 = 128k would get weight 61%".
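The interpolation rule can be sketched in Python. This is an illustrative sketch of the stated rule, not official resolution code, and the function name is mine:

```python
import math

def resolution_weights(context_tokens):
    """Split resolution between the two surrounding power-of-two options.

    The fractional part of log2(tokens) becomes the weight (in %) of the
    larger option; the remainder goes to the smaller one. 'k' entries are
    multiples of 1024.
    """
    x = math.log2(context_tokens)
    lo = math.floor(x)
    upper = round((x - lo) * 100)  # % weight on the larger option
    return {f"{2**lo // 1024}k": 100 - upper,
            f"{2**(lo + 1) // 1024}k": upper}

# Claude at 100,000 tokens -> {'64k': 39, '128k': 61}
# An exact power of two, e.g. 32 * 1024, resolves 100% to that option.
```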



@mods will this resolve percentage (64% 512k) or closest (100% 512k)?

@Fynn

If the largest context size isn't in this table, then this market resolves to a weighting of the two surrounding entries on a log2 scale. k is a multiplier of 1024. GPT-4 would resolve "32k". Claude would resolve "log2(100000) = 16.61, so 2^16 = 64k would get weight 39% and 2^17 = 128k would get weight 61%".

not an official ruling, but I'm pretty sure this is saying percentage (plus, anecdotally, other Mira multiple-choice numeric markets have almost always followed a similar system)

@Ziddletwix I agree with that and am making it official. If I screwed up let me know!

I get log2(400000) = 18.6096, which rounds to 18.61: 61% on 512k and 39% on 256k.

Note a units inconsistency: previous models have had power-of-two contexts, and Mira's description makes it clear that 512k = 2^19 exactly in this context. However, the GPT-5 context length is 400,000 tokens, not 400 × 1024.

@EvanDaniel 400,000 value from here:
https://platform.openai.com/docs/models/gpt-5

The front page says 400k, but I think the 400,000 value is correct. If it's actually 400 × 1024, then the 64/36 split is the correct resolution.
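The two candidate calculations, as a quick sketch (assuming 2^18 = 256k as the lower surrounding option):

```python
import math

def weight_512k(tokens):
    # Fractional part of log2(tokens) above 2**18 (256k),
    # expressed as a percentage weight on the 512k option.
    return round((math.log2(tokens) - 18) * 100)

# 400,000 tokens  -> 61  (a 61/39 split)
# 400 * 1024 tokens -> 64  (a 64/36 split)
```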

bought Ṁ450 YES

looks like gpt-5-nano has 1M, and surely bigger models won't have a longer context length

if anyone wants to bet against this ping me

bought Ṁ20 YES

non-GDM companies have been surprisingly slow to adopt long context. I think it will happen soon-ish but it's not a top priority.

bought Ṁ1,000 NO

Wild that 4k is still at 3%.


Google Gemini is already at 2 million tokens, with tests up to 10 million. If OpenAI competes with Google, I might have needed more levels on my scale...

It resolves 100% 4096k (4 million) no matter how large the final context is, since I can't add options.

@Gen Is it possible for an admin to add levels to this? 8192k and 16384k would be nice to have. Anything larger would be basically infinite. Context sizes grew faster than I expected.

There's now a standard test suite for the kind of recall test I was thinking of doing: "How Long Can Open-Source LLMs Truly Promise on Context Length?" (LMSYS Org)

Leaving it as a comment here so I can remember to find it again in 2 years, if it's needed.

This should be a higher/lower market with a log scale IMO

@ShadowyZephyr This is an intentional choice because it allows higher leverage if you have a strong opinion on a narrow numerical range.

@Mira I've heard from other people that the math of multiple-choice markets is not good at compensating people who make correct bets early on.

@ShadowyZephyr I would disagree with them, but I usually don't debate people. You can bet on @firstuserhere 's binary markets if you like, since I stole his market idea for this.

For the recall test: how would that work for an RNN, which theoretically has infinite context?

@dmayhem93 It would resolve to the largest entry if it can pass the test at any size without errors, in a single API call.

If it has an increasing error rate, as RNNs often do, I'll resolve to the highest size at which I get at least 50% successful recall.

It will be a simple "locate a matching entry" task, so even if its performance degrades on more complex reasoning, it's likely to pass as having a high context size.
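A "locate a matching entry" harness could look something like this. This is my own hypothetical sketch, not the actual test; `ask_model` stands in for whatever API call is used:

```python
import random

def make_recall_prompt(n_pairs, seed=0):
    """Build a synthetic haystack of key/value lines plus one lookup question.

    Returns (prompt, expected_answer). Growing n_pairs grows the context
    being tested.
    """
    rng = random.Random(seed)
    pairs = {f"key-{i}": f"value-{rng.randrange(10**6)}" for i in range(n_pairs)}
    needle = rng.choice(sorted(pairs))
    doc = "\n".join(f"{k}: {v}" for k, v in pairs.items())
    question = f"What is the value for {needle}? Reply with the value only."
    return f"{doc}\n{question}", pairs[needle]

def recall_rate(ask_model, n_pairs, trials=20):
    """Fraction of trials where ask_model(prompt) returns the right value.

    Per the rule above, resolve to the largest size whose rate is >= 0.5.
    """
    hits = 0
    for seed in range(trials):
        prompt, answer = make_recall_prompt(n_pairs, seed)
        hits += ask_model(prompt).strip() == answer
    return hits / trials
```

A model with perfect recall scores 1.0; a model that never finds the entry scores 0.0, so the 50% threshold falls cleanly between them.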
