@teortaxesTex's DeepSeek V4 predictions thread

Jan 1

37%

>=1.5T parameters

58%

>=52B active parameters

61%

>=25T pretraining tokens

51%

uses some non-AdamW optimizer

35%

DS-MoE with adaptive expert count

41%

intra-expert communication

51%

>=512 experts

52%

>=16 active experts

59%

>= 2 shared experts

72%

Some variation of NSA (Native Sparse Attention)

47%

1M+ Context

22%

Gemini 2.5 Pro tier or higher on FictionBench (90.6%+ at 192k)

15%

>= 44% on Humanity's Last Exam (text only) at scale.com leaderboard

29%

>= 73% on SWE-Bench Verified (according to epoch.ai)

21%

>= 60% on BrowseComp (https://www.kaggle.com/benchmarks/openai/browsecomp)

35%

>= 50% on TerminalBench (https://www.tbench.ai/leaderboard)

38%

Some image input (multimodality)

20%

DeepSeek reports some results with a full-blown deep research agent, and emphasizes that this is the intended use-mode

Teortaxes gave some point estimates. Point estimates are not very amenable to prediction market forecasting, so I turned them into over/under forecasts. I may add forecasts from other commenters in the thread later on, so these may not all be forecasts by Teo.

See post for more (including forecasts I wasn't able to turn into market options):

Arguments for why DSA is not a variant of NSA:

1. The "Compression vs. Selection" Argument (Technical)

NSA (Native Sparse Attention) is fundamentally a compression technique. Its defining characteristic is compressing blocks of Keys/Values into coarse-grained summary vectors to reduce memory footprint.
DSA (DeepSeek Sparse Attention) does not compress the KV cache. It maintains full-resolution tokens but uses sparse retrieval (top-k or similar) to select which ones to attend to.
Conclusion: A mechanism that summarizes data (lossy) is not a "variant" of a mechanism that filters data (lossless). They operate on opposing principles: NSA reduces the size of the cache; DSA reduces the compute over the cache.
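
To make the distinction in argument 1 concrete, here is a toy sketch of the two principles as this comment describes them. It is not the actual NSA or DSA implementation: mean pooling stands in for whatever compression a real model learns, a plain dot product stands in for whatever scoring a real sparse-retrieval step uses, and all function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(q, keys, values):
    """Plain softmax attention for a single query vector."""
    scores = [dot(q, k) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]

def mean_pool(vectors):
    """Assumed compression: average a block of vectors into one summary."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def compressed_attention(q, keys, values, block_size):
    """Lossy: attend over per-block summaries, so fewer entries are kept."""
    ck = [mean_pool(keys[i:i + block_size])
          for i in range(0, len(keys), block_size)]
    cv = [mean_pool(values[i:i + block_size])
          for i in range(0, len(values), block_size)]
    return attend(q, ck, cv), len(ck)

def topk_attention(q, keys, values, k):
    """Selective: attend over the k best-scoring original tokens, unmodified."""
    ranked = sorted(range(len(keys)),
                    key=lambda i: dot(q, keys[i]), reverse=True)
    chosen = sorted(ranked[:k])
    return attend(q, [keys[i] for i in chosen],
                  [values[i] for i in chosen]), chosen

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
q = [1.0, 0.0]

out_c, n_summaries = compressed_attention(q, keys, values, block_size=2)
out_s, chosen = topk_attention(q, keys, values, k=2)
print(out_c, n_summaries)  # 4 tokens reduced to 2 summary entries
print(out_s, chosen)       # 2 of the 4 original tokens attended to
```

In the compressed path the query never sees an original token, only block averages. In the top-k path every attended value is an original token, but tokens outside the top k contribute nothing to the output.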

2. The "Summary vs. Speed-Reading" Argument (Non-Technical)

NSA is like reading a summary of a book. You lose the specific words but get the gist significantly faster and with less memory.
DSA is like speed-reading. You skip many pages, but when you do stop to read a page, you read every single word exactly as it was written.
Conclusion: You wouldn't call "skipping pages" a variant of "writing a summary." One preserves the original text (DSA), and the other fundamentally alters it (NSA). If DSA doesn't compress the text, it cannot be NSA.

Would DeepSeek V3.2's DSA (DeepSeek Sparse Attention), if used in V4, resolve as “some variant of NSA” for the purposes of this market?

@ookina_inu hmmm dunno the details enough to evaluate this. i'd default to asking teo maybe. if you know the details of both DSA and NSA and have an opinion one way or another lmk

@Bayesian Gotcha. I honestly think this could go either way. Seems sufficiently different from NSA to not literally be NSA, but plausibly could fit in “some variant of.” Will update if I form a stronger opinion