Will ToT overtake CoT as a framework for language model inference in terms of popularity by the end of 2023?

150Ṁ283

Dec 31

24% chance


https://arxiv.org/pdf/2305.10601.pdf

Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, “Tree of Thoughts” (ToT), which generalizes over the popular “Chain of Thought” approach to prompting language models, and enables exploration over coherent units of text (“thoughts”) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models’ problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%.
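To make the abstract's idea of exploring several reasoning paths and self-evaluating them more concrete, here is a toy sketch of a breadth-first search over "thoughts". It is not the paper's implementation: `propose` and `score` are hypothetical stand-ins for what would be language model calls, and the number-picking task is invented purely for illustration.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]  # a partial solution, i.e. a sequence of "thoughts"


def tree_search(
    root: State,
    propose: Callable[[State], List[State]],
    score: Callable[[State], float],
    depth: int,
    beam_width: int,
) -> State:
    """Expand every state in the frontier, then keep only the best few."""
    frontier = [root]
    for _ in range(depth):
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            break
        # Self-evaluation step: rank candidates and prune the weak ones.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return max(frontier, key=score)


# Toy task (an assumption for this sketch): pick three numbers summing to 9.
TARGET = 9


def propose(state: State) -> List[State]:
    # Stand-in for "ask the LM for possible next thoughts".
    return [state + (n,) for n in (1, 3, 5)]


def score(state: State) -> float:
    # Stand-in for "ask the LM how promising this partial solution is".
    return -abs(TARGET - sum(state))


best = tree_search((), propose, score, depth=3, beam_width=2)
print(best, sum(best))
```

A chain-of-thought style run corresponds roughly to `beam_width=1` with a single proposal per step; keeping several candidates alive is what lets the search recover from a poor early choice.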

Ideas on how to resolve this are welcome. I currently plan to base resolution on mentions in new high-profile papers, implementations, etc., but a more stringent measure is preferred.

GitHub - ysymyth/tree-of-thought-llm: Tree of Thoughts: Deliberate Problem Solving with Large Language Models


New Year's Resolutions 2024



@ChristianLarsen Can you please resolve this?

How is popularity defined here? A poll of AI researchers, mentions in papers, a poll of the general population, Google Trends? You mentioned some ideas you had, but I feel like you should come up with something before making the market so earlier bettors don't get screwed over.

I feel like it makes the most sense to count mentions in arXiv papers? But I don't really know.

predictedYES

@ShadowyZephyr Agreed. For future bettors: mentions in arXiv papers is the simple metric I will base resolution on, and you can place your bets on this metric on the assumption that it will not be Goodharted.

predictedNO

@ChristianLarsen What exact search terms will you use?

predictedYES

@Hedgehog Mentions of "tree of thought", "tree of thoughts", "tree-of-thoughts", etc.: every variant that may reasonably be associated with it, searched as an exact term in the CS bracket of the search.

predictedNO

@ChristianLarsen I don’t understand. “Bracket of search”?

predictedNO

@Hedgehog And compared to what?

predictedYES

Hi @Hedgehog, I mean within the Computer Science bracket of the arXiv search (or category of search, in alternate wording). This is to try to exclude non-AI-related mentions of "tree of thought". The count will be compared against CoT (and its variant wordings) using the same simple methodology, searched for in the same bracket in the same way. I hope this clears it up, but feel free to ask more questions (it is hard to measure popularity in a general sense).
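As a rough illustration of the comparison described above, the sketch below counts how many texts mention each term under a few spelling variants. The regular expressions are one guess at what "variant wordings" might cover, the function names are hypothetical, and the sample abstracts are made up for the example; it does not query arXiv or restrict results to Computer Science categories.

```python
import re
from typing import Iterable, Pattern, Tuple

# Assumed variants: singular/plural, spaces or hyphens, any capitalisation.
TOT_PATTERN = re.compile(r"\btree[\s-]of[\s-]thoughts?\b", re.IGNORECASE)
COT_PATTERN = re.compile(r"\bchain[\s-]of[\s-]thoughts?\b", re.IGNORECASE)


def count_papers_mentioning(pattern: Pattern[str], texts: Iterable[str]) -> int:
    """Count texts that mention the term at least once."""
    return sum(1 for text in texts if pattern.search(text))


def compare_mentions(texts: Iterable[str]) -> Tuple[int, int, bool]:
    """Return (ToT count, CoT count, whether ToT strictly exceeds CoT)."""
    texts = list(texts)
    tot = count_papers_mentioning(TOT_PATTERN, texts)
    cot = count_papers_mentioning(COT_PATTERN, texts)
    return tot, cot, tot > cot


# Made-up abstracts, for illustration only.
sample_abstracts = [
    "We extend chain-of-thought prompting with self-consistency.",
    "Tree of Thoughts enables search over reasoning paths, unlike Chain of Thought.",
    "A study of decision trees for tabular data.",
]

print(compare_mentions(sample_abstracts))
```

Counting each paper once, rather than every occurrence, keeps a single paper that repeats the term many times from dominating the tally. Whether abbreviations such as "ToT" and "CoT" should also count is left open here, since the thread does not say.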