Will the leading LLM at the beginning of 2026 still be subject to the reversal curse?
2026 · 59% chance

This paper shows that current LLMs fine-tuned on f(x) = y will often fail to generalize to f⁻¹(y) = x: a model trained that "A is B" does not thereby learn that "B is A". Gary Marcus seems to think this is a fundamental problem in the current approach to AI.

I tested this myself and can confirm that ChatGPT has this problem.

At the beginning of 2026 I'll try something similar with the leading language model of the time. (Not the fine-tuning, just testing facts from its main training run via its public interface.) If there's at least one example where it consistently gets that f(x) = y and consistently fails to get that f⁻¹(y) = x, this resolves YES. If I can't find such an example, it resolves NO.
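
To make the YES criterion concrete, here is a minimal sketch of the check, assuming an `ask` callable that wraps whatever public interface the leading model exposes at the time. The helper names and trial count are placeholders; the question pair is the Tom Cruise / Mary Lee Pfeiffer example from the reversal-curse paper.

```python
from typing import Callable

def consistently_correct(ask: Callable[[str], str], question: str,
                         expected: str, trials: int = 10) -> bool:
    """True if `expected` appears in the model's answer on every trial."""
    return all(expected.lower() in ask(question).lower() for _ in range(trials))

def shows_reversal_curse(ask: Callable[[str], str], trials: int = 10) -> bool:
    # YES criterion for this pair: the model consistently gets f(x) = y
    # but consistently fails f⁻¹(y) = x.
    gets_forward = consistently_correct(
        ask, "Who is Tom Cruise's mother?", "Mary Lee Pfeiffer", trials)
    fails_reverse = not any(
        "tom cruise" in ask("Who is Mary Lee Pfeiffer's famous son?").lower()
        for _ in range(trials))
    return gets_forward and fails_reverse
```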

If the best general-purpose AI is no longer an LLM, this resolves N/A.


Does this only apply to leading large language models? I.e., if other architectures for SOTA general-purpose AI appear that are no longer considered language models (perhaps because that's no longer their primary training task, or because they're no longer Transformer-based), would you check those models rather than the best LLMs?

@Vergissfunktor I'll resolve N/A in that case.
