Probably not — Manifold Markets prediction market estimates a 31% chance (13 traders, as of Jun 9, 2026).

Will ASI be achieved less than a year after continual learning?

MANIFOLD

2033

31% chance

"Humans begin using speech to pass on what they've learned within a lifetime and then immediately become superintelligent (compared to other animals)" and "AI begins using continual learning to pass on what they've learned in-context within RL and deployment and then immediately becomes superintelligent" don't analogize perfectly, but it's close.

Will ASI happen less than 365 days after a frontier-ish AI company deploys better-than-nothing continual learning?

N/A if ASI happens first

Update 2026-05-29 (PST) (AI summary of creator comment): Continual learning is defined as models being able to learn new things at the weights level without being retrained from scratch. Key distinguishing features:
- Current training loops (retraining from scratch or from base model) do not qualify
- A rough indicator: continual learning would reduce the time between models knowing new things to under ~10 days (vs. current ~40-day release cycles)
- Creator will go with community consensus on whether a specific system qualifies
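As a rough illustration of the cadence indicator above, here is a minimal sketch of how many weight-level knowledge updates would fit inside the market's 365-day window at each interval. The ~40-day and ~10-day figures come from the creator's comment; the function name and the use of integer division are illustrative assumptions, not part of the resolution criteria.

```python
# Hypothetical helper: counts whole update cycles that fit in the window.
WINDOW_DAYS = 365  # the market's "less than 365 days" window


def updates_in_window(interval_days: int, window_days: int = WINDOW_DAYS) -> int:
    """Number of complete update cycles of `interval_days` within the window."""
    return window_days // interval_days


current_cadence = updates_in_window(40)    # ~40-day release cycles
continual_cadence = updates_in_window(10)  # under ~10 days with continual learning

print(current_cadence, continual_cadence)  # 9 36
```

Under these assumptions, the indicator amounts to roughly four times as many knowledge updates inside the window.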

Update 2026-05-29 (PST) (AI summary of creator comment): Continual learning does not require per-user weight modification — it can still qualify even if all users receive the same set of weights from the provider. Provider-level updates are sufficient.

Update 2026-05-29 (PST) (AI summary of creator comment): The creator clarifies that solving catastrophic forgetting and converting in-context learned knowledge into weights-level knowledge are distinct concepts. The latter would require better sample efficiency and is not the same as continual learning as defined for this market.

I have the sense that in 2033, if people use the term "continual learning", they'll expect it to refer to some form of continual learning within user projects.

However, it's unclear whether this requires weight modification, since we don't have obvious frontier technology examples to refer to in June 2026.

Generally speaking, the more complex and novel the training mechanism, the larger the gap between its perceived or theoretical performance and its performance in practice. The rate of ground-breaking discoveries hasn't slowed much, but on a decade-by-decade timeline it has stalled or isn't distinguishable from natural variation (noise) above the baseline. What this suggests is that ASI, assuming it relies on continual learning under an AGI model, will be a standard step function up from AGI. If performance improvement becomes harder with time, as appears to be the case, then even with a continually learning model the best time-to-ASI we can expect once continual learning is mainstream is linear. In other words, the odds are better than even that it will take an average of 5-10 years after CL is fully understood and exploited.

You ever notice that biological brains can only do continual learning because they sleep? Just a shower thought.

In what way does the current training loop not match your definition of continual learning? Where do you draw the line between what's continual learning and what isn't?

@0xseraphim I'll go with whatever the consensus is. Right now companies retrain models from scratch (or at least from the base model) in order to add new data; the main feature of continual learning is that models can learn new things on the weights level without having to be retrained from scratch. Plus current model releases are ~40 days apart; continual learning would reduce the time between models knowing new things down to under ~10 days.

@Interrobang so continual learning / weight modification within a user project would not be a requirement? Provider-level RL post-training and releases are sufficient?

@0xseraphim Correct; it can still be continual learning even if everyone gets the same set of weights from the provider.

@0xseraphim The problem with standard MLPs is catastrophic forgetting. LoRA and DoRA help, but they have their own problems. That is partly why mixture-of-experts has become mainstream, as have agent-collaboration frameworks.
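For readers unfamiliar with the adapters mentioned here: LoRA keeps the pretrained weight matrix W frozen and learns a low-rank update on top of it, W' = W + BA. Below is a minimal sketch of the parameter arithmetic only. The layer size and rank are illustrative assumptions, not taken from any particular model, and the function names are hypothetical.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a rank-`rank` update W' = W + B @ A,
    where B is d_out x rank and A is rank x d_in (W itself stays frozen)."""
    return d_out * rank + rank * d_in


d, r = 1024, 8  # assumed layer width and adapter rank, for illustration only
full = full_update_params(d, d)
lora = lora_update_params(d, d, r)

print(full, lora, full // lora)  # 1048576 16384 64
```

Because the base weights are untouched, a single adapter cannot overwrite what the base model already knows. Stacking or merging many adapters is a separate matter, and that is where the problems this comment alludes to arise.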

@Interrobang If continual learning (CL) increases the speed of research and no breakthroughs are made in either sample efficiency or out-of-distribution generalization, then the answer is a solid no. Strong sample efficiency would lower training data requirements (and compute requirements, and possibly model-size requirements), translating into a weak yes. Any strong results on out-of-distribution generalization, combined with even a better-than-nothing CL environment, translate to a strong 'maybe' at best.

@DavidAttenborough Hm, you're right. I was mentally conflating "solving catastrophic forgetting" with "converting in-context learned knowledge into weights-level knowledge" when the latter would require better sample efficiency.

@Interrobang That's fair. The rub is that converting in-context learned representations into a format amenable to direct representation in network weights leads to destructive interference. One update from the context might improve performance on some task A but modify weights critical to another task B, such that performance on task B is degraded or destroyed. It's the current problem encountered with too many LoRAs, or with the improvement on LoRA, DoRA. There are a lot of experiments and papers trying to solve this with varying degrees of success, but none of them are there yet.
And while it is true that sample efficiency improves learning at the network level (in the weights), it doesn't directly provide a mechanism for non-destructive weight updates. Incidentally, for the experiments exploring converting context to learned weights, sample efficiency is an underexplored metric under that precise regime, n-shot metrics in, say, language models notwithstanding. Better sample efficiency at a certain critical threshold at the network level at least implies better out-of-distribution generalization in the level above, in the autoregressive inference.
Solving upstream solves downstream. It's why better tokenizers lead to better scores even when the rest of a model hasn't changed: it's about conditioning the data with useful priors that (1) smooth lots of ridges and local minima that correlate to gradient clashes, and (2) have a spectral characterization approaching blue noise. The general principle is the same: the earlier in the pipeline optimization is applied, the more general the improvement in later components of a model, because the model has to do less work in weight-space on conditioning the data and extracting structures and priors. Learning at the weight level from extrapolations performed at the inference level is starting as late as possible in the pipeline, which works against this principle.
It's plausible it could work anyway, which would be, as you wrote, the 'better than nothing' regime. The question is whether better-than-nothing in-context learning generalizes sufficiently under covariate shifts. A proof of that is sufficient, without going all the way, to say whether a weaker model (or ensemble of such models) is enough, by itself, to lead to AGI, which is at least assumed to lead to ASI by default, or whether it is insufficient, which would be weak but positive evidence that optimization earlier in the pipeline is the direction research has to take to cross the finish line. Man, I'm loving the market you posted more and more.

edit: To be clear, the blue noise comment is an analogy that is still waiting on research to verify it, but it's sound in theory. The ICL-to-AGI pipeline is speculative, but the entire premise my argument hinges on is: "If either direction, early- or late-stage optimization in general, is most responsible for AGI (where I define AGI here as OOD generalization), which will be the core contributing research direction? Late or early optimization?" My argument says it will (mostly) be early optimization, while your argument is that the biggest contributing factor will be late-stage optimization. It'll probably be a mix of both realistically (because most pipelines at the cutting edge optimize all stages to varying degrees), but the question remains: before the 'big break' into AGI, and then (completely assumed) ASI, what will be the defining change, a major shift in early-stage optimization or a major improvement in late-stage?
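The destructive interference discussed in this thread (an update that helps one task while degrading another that depends on the same weights) can be reproduced in a deliberately tiny toy. This is a sketch under strong assumptions: a single shared scalar weight, two tasks with conflicting targets, and plain gradient descent. None of it is taken from a real continual-learning system, and the task definitions and hyperparameters are made up for illustration.

```python
def mse(w: float, data: list[tuple[float, float]]) -> float:
    """Mean squared error of the one-weight model y = w * x on `data`."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train(w: float, data: list[tuple[float, float]],
          lr: float = 0.1, steps: int = 200) -> float:
    """Full-batch gradient descent on the MSE, starting from weight `w`."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


xs = (1.0, 2.0, 3.0)
task_a = [(x, 2.0 * x) for x in xs]   # task A wants w = 2
task_b = [(x, -1.0 * x) for x in xs]  # task B wants w = -1 (conflicts with A)

w = train(0.0, task_a)
loss_a_before = mse(w, task_a)        # close to 0: task A is learned

w = train(w, task_b)                  # sequential update on task B only
loss_a_after = mse(w, task_a)         # close to 42: task A is forgotten
loss_b_after = mse(w, task_b)         # close to 0: task B is learned

print(round(loss_a_before, 6), round(loss_a_after, 6), round(loss_b_after, 6))
```

In a real network the conflict is partial rather than total, since tasks share only some weights, but the mechanism is the same: the gradient for the new task moves parameters the old task relied on.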