Will Transformer-Based LLMs Make Up ≥75% of Parameters in the Top General AI by 2030?
41% chance · closes 2029

As of December 31, 2029, will a large language model (LLM)—defined as a transformer-based, next-token prediction model—comprise at least 75% of the activation parameter count of the most capable, publicly-known general-purpose AI system?

Definitions:

  • LLM: A model whose main pre-training objective is next-token prediction and whose architecture is based primarily on transformers (including dense or sparse, MoE, or similar variants).

  • Activation parameters: The total number of trainable weights that are loaded in memory during a maximum-capability inference pass. For MoE models, count the union of all experts that could be active in any inference pass (not just the average active subset).

  • ≥ 75% rule: If one or more LLMs, combined, comprise at least 75% of all activation parameters (across all neural modules, including vision, planning, and others), the criterion is met; a counting sketch follows this list.

  • Most capable general-purpose AI: The system that, as of December 31, 2029, demonstrates the highest publicly documented cross-domain performance (as measured by recognized AGI or multitask benchmarks) or is acknowledged as top-tier by broad consensus.

  • Backbone: The neural component(s) that provide broad reasoning and general knowledge. Symbolic planners or retrieval databases without trainable weights are not counted.

  • Publicly-known: The system must be openly released or credibly leaked with reproducible technical details, such as model card, parameter count, architecture, or benchmark results.
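
To make the counting rule concrete, here is a minimal sketch in Python of how the ≥75% check could be applied, assuming a system's per-module activation-parameter counts are publicly documented. The module names and parameter figures below are hypothetical illustrations, not claims about any real system.

```python
from dataclasses import dataclass


@dataclass
class NeuralModule:
    name: str
    activation_params: int    # trainable weights loaded for a maximum-capability inference pass
    is_transformer_llm: bool  # transformer-based, next-token-prediction model?


def llm_fraction(modules: list[NeuralModule]) -> float:
    """Fraction of all activation parameters belonging to transformer LLMs."""
    total = sum(m.activation_params for m in modules)
    llm = sum(m.activation_params for m in modules if m.is_transformer_llm)
    return llm / total


# Hypothetical example system (all numbers are made up for illustration).
system = [
    NeuralModule("language backbone (MoE, all experts counted)", 1_200_000_000_000, True),
    NeuralModule("vision encoder", 20_000_000_000, False),
    NeuralModule("learned planner", 50_000_000_000, False),
]

frac = llm_fraction(system)
print(f"LLM share of activation parameters: {frac:.1%}")
print("Criterion met" if frac >= 0.75 else "Criterion not met")
```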

Edge-case clarifications:

  • Mixture-of-Experts (MoE) LLMs: All possible experts count toward the parameter total, even if only a subset are active per token (see the sketch after these clarifications).

  • Retrieval-Augmented Generation (RAG) or external databases: Non-parametric resources (e.g., vector DBs) are ignored for parameter counting; only neural weights matter.

  • Controller LLM plus a non-LLM core (e.g., physics simulator): If the non-LLM neural weights exceed 25%, the criterion is not met.

  • Systems distilled from an LLM into a non-transformer architecture (e.g., Mamba, RWKV): Do not count, even though they were originally derived from an LLM.

  • Neuro-symbolic or hybrid systems: Only count neural parameters. If LLMs make up less than 75%, the answer is “No.”

  • Multiple LLM agents: Combine all LLM weights for the total.

  • Quantized or adapted LLMs: Count the original trainable weights; changes in numeric precision do not affect the count.

  • Leaked systems without parameter evidence: If parameter count cannot be established, the answer is “No” (burden of proof on “Yes”).
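
As a companion to the sketch above, the following illustrates two of these edge cases with hypothetical numbers: an MoE LLM counted as its shared weights plus the union of all experts (not the per-token active subset), and multiple LLM agents whose weights are combined into one total. Quantization is irrelevant to this count, since only the number of original trainable weights matters.

```python
def moe_activation_params(shared_params: int,
                          params_per_expert: int,
                          num_experts: int,
                          experts_active_per_token: int) -> int:
    """Count shared weights plus ALL experts, per the market's MoE rule.

    How many experts fire per token is deliberately ignored, and so is
    numeric precision: a 4-bit quantized copy has the same count.
    """
    del experts_active_per_token  # unused under this counting rule
    return shared_params + params_per_expert * num_experts


# Hypothetical MoE LLM: 30B shared weights, 64 experts of 15B each,
# only 2 of which are active per token.
moe_llm = moe_activation_params(
    shared_params=30_000_000_000,
    params_per_expert=15_000_000_000,
    num_experts=64,
    experts_active_per_token=2,
)

# Hypothetical second, smaller LLM agent used by the same system.
small_agent_llm = 70_000_000_000

# Multiple LLM agents: combine all LLM weights toward the numerator.
combined_llm_params = moe_llm + small_agent_llm
print(f"Combined LLM activation parameters: {combined_llm_params:,}")
```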


Super unclear and ambiguous market; here are some clarifying questions:

  • We are comparing the internal composition of modules within a SINGLE system, not comparing multiple systems to each other, right?

  • What counts as a general AI system? For example, YouTube’s collaborative filtering system for video recommendation must have at least (number of users) times (number of videos) times (dimensionality) trainable parameters, which dwarfs current LLMs by several orders of magnitude.

  • Sometimes the definition of “transformer-based LLM” isn’t so clear. For illustration, Gemma-3 has a transformer-based text LLM and a transformer-based image LLM that’s also trained to perform masked token prediction using image tokens, though these aren’t inferred at runtime. What do you consider the composition of this model to be? If this were the largest model by market resolution, how would you score it?

  • You define “backbone” but never use it; how is it relevant to this question?

  • What counts as transformer-based? If descendant methods in the literature count, would you have considered transformers to be “key/value-network-based”, for example? Does FlashAttention count as transformer-based? Is a transformer network “ResNet-based”?
