Will the top chatbot in 2025 "think" before responding to a difficult prompt?

This market resolves YES if by January 1, 2025, the most popular chatbot creates "thoughts" when given difficult prompts, à la chain-of-thought. The thoughts must be separate from its final response.

By "thoughts," I mean data that the chatbot creates and maintains primarily to improve its own responses. The thoughts must be created and used across multiple forward passes that occur after the entire prompt has been provided to the model but before it starts writing a response. The thoughts must reason about how to solve the problem; for instance, it would not count if the model is simply prompted to "write a query to a search engine that will help answer this prompt."
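To make the criterion concrete, here is a minimal sketch of the two-phase pattern described above. This is illustrative only: `model` stands in for any text generator (a hypothetical function, not a real API), and the prompt strings are placeholders.

```python
# Illustrative sketch only: the pattern the criterion describes, with
# `model` as a stand-in for any text generator (hypothetical, not a real API).
def answer_with_thoughts(model, prompt):
    # Phase 1: after the full prompt is in, extra generation produces
    # intermediate reasoning that the user does not see by default.
    thoughts = model(prompt + "\nReason step by step about how to solve this:")
    # Phase 2: the final response is generated conditioned on those thoughts,
    # which stay separate from the primary output.
    response = model(prompt + "\n[hidden reasoning]\n" + thoughts + "\nFinal answer:")
    return thoughts, response
```

The key feature for resolution is that the reasoning happens before the response begins and is kept separate from it, not merely interleaved with the visible answer.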

The chatbot's thoughts should be separate from the "primary output" of the model; however, it's acceptable if the thoughts are viewable by clicking a button in the UI after the primary output has been presented. The thoughts can be human-interpretable or not (bet on which one in the companion market below).

Specifically, I will resolve YES if I believe that the top LLM "thinks" before answering at least 3 of the following 5 prompts:

  • Write a detective story. At the very end, the detective should use information scattered throughout the story to solve the mystery in a clever way.

  • Write a palindromic sentence. An example is "A man, a plan, a canal: Panama." The sentence should be at least 10 words long and must contain the word "lemon."

  • Write an original stand-up comedy bit that leads up to a terrible, complicated pun.

  • Write a sonnet that doesn't use the letters A, E, or I.

  • Write a "code golf" Python script, using as few characters as possible to calculate and print the next stage of a 5x5 board of Conway's Game of Life. The initial board is represented by a 25-character string of ones and zeros, like "0001011100011100101011001", assigned to the variable x.
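For reference, the last prompt has a mechanical correct answer that the golfed script must reproduce. Here is a non-golfed sketch of the required transformation (off-board cells are treated as dead, which is one reasonable reading of the prompt):

```python
# Non-golfed reference for the Game of Life prompt: compute the next stage
# of a 5x5 board encoded as a 25-character string of ones and zeros.
x = "0001011100011100101011001"  # example board from the prompt

def next_stage(x):
    n = [[int(x[r * 5 + c]) for c in range(5)] for r in range(5)]
    out = []
    for r in range(5):
        for c in range(5):
            # count live neighbors; cells off the board count as dead
            live = sum(
                n[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr or dc) and 0 <= r + dr < 5 and 0 <= c + dc < 5
            )
            # standard rules: a live cell survives with 2 or 3 neighbors,
            # a dead cell becomes live with exactly 3
            out.append("1" if live == 3 or (live == 2 and n[r][c]) else "0")
    return "".join(out)

print(next_stage(x))
```

A golfed submission would compress this logic, but any correct answer must agree with this reference on the example board.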

I will not bet in this market.

Companion market on whether these thoughts will be human-interpretable:


@traders An interesting segment of Lex Fridman's conversation with Sam Altman:

Lex Fridman

There’s just some questions I would love to ask, your intuition about what’s GPT able to do and not. So it’s allocating approximately the same amount of compute for each token it generates. Is there room there in this kind of approach to slower thinking, sequential thinking?

Sam Altman

I think there will be a new paradigm for that kind of thinking.

Lex Fridman

Will it be similar architecturally as what we’re seeing now with LLMs? Is it a layer on top of LLMs?

Sam Altman

I can imagine many ways to implement that. I think that’s less important than the question you were getting at, which is, do we need a way to do a slower kind of thinking, where the answer doesn’t have to get… I guess spiritually you could say that you want an AI to be able to think harder about a harder problem and answer more quickly about an easier problem. And I think that will be important.

Would an LSTM model with many internal loops before starting an output count as thinking for the purposes of this question?

@MikhailDoroshenko Yes. As long as the cell state gets used in multiple forward passes that occur after the entire prompt has been provided to the model but before it starts writing a response, that would count.

I realize now that "stored outside of the model's own weights" is confusing in this case, so I'll change that wording to what I wrote in this comment.
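In rough pseudocode, the pattern that would count looks like this. This is a toy sketch, not a real LSTM; `step` stands in for any recurrent state update:

```python
# Toy sketch (not a real LSTM): a recurrent cell whose state is updated
# for several extra "thinking" steps after the whole prompt is read,
# before any output is produced. `step` is a stand-in for the cell update.
def think_then_respond(step, prompt_tokens, thinking_steps=5, state=0.0):
    # 1. Read the entire prompt, updating the cell state per token.
    for tok in prompt_tokens:
        state = step(state, tok)
    # 2. Extra forward passes with no new input and no output yet:
    #    this internal loop is what counts as "thinking" here.
    for _ in range(thinking_steps):
        state = step(state, None)
    # 3. Only now start writing the response from the refined state.
    return state
```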


Determining the most popular chatbot might be tricky; why not consider the top 3 or something?


Karpathy hints in his recent YouTube videos that this will probably happen.

@Soli Yeah, right now it's pretty clearly ChatGPT, but it might not always be that way. I'll go off of vibes first (if it seems very obvious which is the "top chatbot"), then daily active users if that data is public and recent, then Google Trends as my last fallback.
