The breakthrough needs to make LLMs better at learning from context at test time. Examples of results that could demonstrate this would be a model that degrades less than its contemporaries on long-context tasks, or one that learns more effectively from its own mistakes during agentic tasks.
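As a rough illustration of the first kind of evidence, here is a minimal sketch of how long-context degradation might be compared across models. All model names and scores are hypothetical placeholders; a real comparison would use an actual long-context eval suite.

```python
# Hypothetical benchmark scores: accuracy at increasing context lengths.
# All numbers are invented for illustration only.
scores = {
    "model_a": {4_000: 0.92, 32_000: 0.88, 128_000: 0.83},  # degrades slowly
    "model_b": {4_000: 0.93, 32_000: 0.80, 128_000: 0.55},  # degrades quickly
}

def degradation(by_length: dict[int, float]) -> float:
    """Fraction of short-context accuracy lost at the longest context."""
    shortest, longest = min(by_length), max(by_length)
    return 1 - by_length[longest] / by_length[shortest]

for name, by_length in scores.items():
    print(f"{name}: {degradation(by_length):.1%} relative degradation")
```

On these made-up numbers, model_a loses about 10% of its short-context accuracy at 128k tokens while model_b loses about 41%; a model exhibiting the breakthrough would look like model_a relative to its contemporaries.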
The breakthrough needs to be major and conceptual; it can't be something as simple as models getting better at in-context learning through more scaling.
In order to count, the breakthrough must apply to general-purpose LLMs, not just use continual learning to solve a narrow problem or class of problems.
A paper claiming a strong result (like the Titans paper) isn't enough on its own. The breakthrough must be widely accepted as legitimate, and there should be publicly accessible models that use it.