Will we see improvements in the TruthfulQA LLM benchmark in 2024? | Manifold

Resolved N/A on Jan 3

Daron Acemoglu wrote an article with a series of vague AI predictions for 2024: https://web.archive.org/web/20240110122026/https://www.wired.com/story/get-ready-for-the-great-ai-disappointment/

One of them is: "More and more evidence will emerge that generative AI and large language models provide false information and are prone to hallucination—where an AI simply makes stuff up, and gets it wrong. Hopes of a quick fix to the hallucination problem via supervised learning, where these models are taught to stay away from questionable sources or statements, will prove optimistic at best. Because the architecture of these models is based on predicting the next word or words in a sequence, it will prove exceedingly difficult to have the predictions be anchored to known truths."

There is a benchmark called TruthfulQA that measures how truthfully models answer questions. The highest-scoring model in 2023 was GPT-4 at 0.59. Will we see any improvement on this benchmark in 2024?

This is the best link I could find that compares different models on the TruthfulQA benchmark, but I am open to other sources if they exist: https://paperswithcode.com/sota/question-answering-on-truthfulqa
