At the beginning of 2028, will LLMs still make egregious common-sensical errors? | Manifold

48% chance

A duplicate of /ScottAlexander/in-2028-will-gary-marcus-still-be-a, with the ban on "bizarre hacking like tricks" removed and clearer resolution criteria.

This market resolves based on the behavior of all leading chatbots at the beginning of 2028. (Only ones that can actually be tested.)

Resolves YES if people can find three extremely obvious questions, ones that an average human teenager could certainly answer, which any leading chatbot still fails at least half the time when asked.

Only the LLM portion of the chatbot is being tested here. Image-recognition and generation capabilities are not.
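
The resolution rule above can be sketched as a small check. This is only an illustration under one reading of the criteria: a question counts if at least one leading chatbot fails it in at least half of its attempts, and three such questions are needed. The data layout, the chatbot names, and the function names are all hypothetical and not part of the market.

```python
from typing import Dict, List

# Thresholds read off the resolution text; the reading itself is an assumption.
REQUIRED_QUESTIONS = 3
FAILURE_THRESHOLD = 0.5

# results[question][chatbot] is a list of attempts, True meaning a correct answer.
Results = Dict[str, Dict[str, List[bool]]]


def failure_rate(attempts: List[bool]) -> float:
    """Fraction of attempts that were answered incorrectly."""
    if not attempts:
        raise ValueError("need at least one attempt")
    return sum(1 for correct in attempts if not correct) / len(attempts)


def qualifying_questions(results: Results) -> List[str]:
    """Questions that at least one chatbot fails at least half the time."""
    return [
        question
        for question, by_chatbot in results.items()
        if any(failure_rate(a) >= FAILURE_THRESHOLD for a in by_chatbot.values())
    ]


def resolves_yes(results: Results) -> bool:
    """True if enough qualifying questions have been found."""
    return len(qualifying_questions(results)) >= REQUIRED_QUESTIONS


# Hypothetical test data: "bot_a" and "bot_b" stand in for leading chatbots.
example: Results = {
    "q1": {"bot_a": [False, False, True, True], "bot_b": [True, True, True, True]},
    "q2": {"bot_a": [True, True, True, True], "bot_b": [False, True, False, False]},
    "q3": {"bot_a": [True, True, True, False], "bot_b": [True, True, True, True]},
}
print(qualifying_questions(example), resolves_yes(example))
```

With this data only two questions qualify, so the sketch would not resolve YES; a third question that some chatbot fails at least half the time would tip it over.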

Scott Alexander's 5 year predictions

People are also trading

Will LLMs become a ubiquitous part of everyday life by June 2026?

Will LLMs mostly overcome the Reversal Curse by the end of 2025?

Will the leading LLM at the beginning of 2026 still be subject to the reversal curse?

Will LLMs be the best reasoning models on these dates?

Will there be major breakthrough in LLM Continual Learning before 2026?

By 2025 end, will it be generally agreed upon that LLM produced text/code > human text/code for training LLMs?

Will we get a new LLM paradigm by EOY?

Will one of the major LLMs be capable of continual lifelong learning (learning from inference runs) by EOY 2025?

Will we have a popular LLM fine-tuned on people's personal texts by June 1, 2026?

Will an LLM improve its own ability along some important metric well beyond the best trained LLMs before 2026?

Will the LLM be allowed to output chain of thought? With "answer and nothing else" type responses, it fails on very basic stuff and likely will for some time.

However, if it's allowed to do chain of thought (provide step-by-step thinking), its reasoning skills improve 10x.

As I mentioned in the other market, the Magikarp token-parsing bug is well understood and is orthogonal to LLM reasoning capabilities. I don't think whether or how the tokenizer is improved will make a huge impact, except for a niche class of prompts.

@gpt_news_headlines CoT is fine

What about prompt hacking? For example, the question is simple but is prefaced with a weird string that is needed to confuse the model.

@Jono3h That's fine
