Will A.I. Achieve Significantly Higher Performance Over "General Conceptual Skills" in 2023?
Basic
39
Ṁ3005
resolved Jan 10
Resolved
NO

Preface:

Please read the preface for this type of market and other similar third-party validated AI markets here.

Third-Party Validated, Predictive Markets: AI Theme

Market Description

Google Big Bench Lite

https://arxiv.org/pdf/2206.04615.pdf

Big Bench was published in June 2022 as a collaborative effort across 132 institutions, including Google and OpenAI, to come up with a way to characterize Large Language Model (LLM) capabilities and measure them.

The idea behind Big Bench is that it's a constantly evolving benchmark meant to measure "tasks that are believed to be beyond the capabilities of current language models."

While Big Bench doesn't appear to publish an easily accessible aggregate score across all groups of measurements at this time, they do publish a Lite version covering a broad array of tasks, including:

auto_debugging, bbq_lite_json, code_line_description, conceptual_combinations, conlang_translation, emoji_movie, formal_fallacies_syllogisms_negation, hindu_knowledge, known_unknowns, language_identification, linguistics_puzzles, logic_grid_puzzle, logical_deduction, misconceptions_russian, novel_concepts, operators, parsinlu_reading_comprehension, play_dialog_same_or_different, repeat_copy_logic, strange_stories, strategyqa, symbol_interpretation, vitaminc_fact_verification, winowhy

There are 24 difficult tasks in the list above, so it's kind of like the Dow Jones of LLMs.

Market Resolution Criteria:

https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/results

Specifically: https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/results/plot_BIG-bench_lite_aggregate.pdf

From the above chart, as of the time of creating this market, the highest score appears to be PaLM 2-Shot at about 43.

If any Big Bench Lite submission gets an aggregate normalized performance of 60 or higher by the end of 2023, this resolves as YES; otherwise it resolves as NO.
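The resolution rule above boils down to a threshold check over the leaderboard. Here is a minimal sketch in Python, assuming the aggregate normalized scores have already been read off the chart by hand. The function name `resolves_yes` and the `scores` mapping are hypothetical, and the only data point taken from this market is the PaLM 2-shot score of about 43.

```python
from typing import Mapping

# Threshold stated in the market resolution criteria.
THRESHOLD = 60.0


def resolves_yes(scores: Mapping[str, float], threshold: float = THRESHOLD) -> bool:
    """Return True if any submission meets or exceeds the threshold.

    `scores` maps a submission name to its aggregate normalized
    performance (hypothetical structure, filled in by hand from the chart).
    """
    return any(score >= threshold for score in scores.values())


# The only score quoted in the market description (approximate).
leaderboard = {"PaLM 2-shot": 43.0}

print("YES" if resolves_yes(leaderboard) else "NO")
```

The check uses "greater than or equal to" because the criteria say "60 or higher", so a submission at exactly 60 would count.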

  • Mar 23, 10:24pm: Will A.I. Achieve Significantly Higher Performance Over a "Set of General Conceptual Skills" in 2023? → Will A.I. Achieve Significantly Higher Performance Over "General Conceptual Skills" in 2023?


2024 version of this market: https://manifold.markets/PatrickDelaney/-will-ai-achieve-significantly-high

Last commit I saw on this leaderboard was June 2023. Please message me if I'm missing something or if they are tracking BigBench elsewhere. Otherwise, resolving NO.

© Manifold Markets, Inc.