🐕 Will A.I. Achieve Significantly Higher Performance Over "General Conceptual Skills" in 2023?
Please read the preface for this type of market and other similar third-party validated AI markets here.

Third-Party Validated, Predictive Markets: AI Theme

Market Description

Google Big Bench Lite


Big Bench was published in June 2022 as a collaborative effort among Google, OpenAI, and other contributors across 132 institutions, with the goal of characterizing and measuring the capabilities of Large Language Models (LLMs).

The idea behind Big Bench is that it is a constantly evolving benchmark, meant to measure "tasks that are believed to be beyond the capabilities of current language models."

While Big Bench doesn't appear to publish an easily accessible aggregate score across all groups of measurements at this time, it does publish results for a lite version covering a broad array of tasks, including:

auto_debugging, bbq_lite_json, code_line_description, conceptual_combinations, conlang_translation, emoji_movie, formal_fallacies_syllogisms_negation, hindu_knowledge, known_unknowns, language_identification, linguistics_puzzles, logic_grid_puzzle, logical_deduction, misconceptions_russian, novel_concepts, operators, parsinlu_reading_comprehension, play_dialog_same_or_different, repeat_copy_logic, strange_stories, strategyqa, symbol_interpretation, vitaminc_fact_verification, winowhy

There are 24 difficult tasks in the list above, so it's kind of like the Dow Jones of LLMs.

Market Resolution Criteria:


Specifically, resolution is based on this chart: https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/results/plot_BIG-bench_lite_aggregate.pdf

From the above chart, as of the time of creating this market, the highest score appears to be PaLM 2-Shot at about 43.

If any Big Bench Lite submission achieves an aggregate normalized performance of 60 or higher by the end of 2023, this market resolves YES; otherwise it resolves NO.
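
To make the resolution rule concrete, here is a minimal sketch of the check in Python. The function name `resolve_market` and the dictionary layout are assumptions made for illustration; they are not part of Big Bench or of this market's tooling. The 43 is the approximate value read off the chart for PaLM 2-Shot, and any other entry you pass in would be your own reading of the published chart.

```python
from typing import Mapping

# Threshold taken from the resolution criteria above.
THRESHOLD = 60.0


def resolve_market(aggregate_scores: Mapping[str, float],
                   threshold: float = THRESHOLD) -> str:
    """Return "YES" if any submission's aggregate normalized
    performance is at or above the threshold, otherwise "NO".

    `aggregate_scores` maps a submission label to its aggregate
    normalized Big Bench Lite score (hypothetical layout).
    """
    if any(score >= threshold for score in aggregate_scores.values()):
        return "YES"
    return "NO"


# State at market creation: the best score is approximately 43 (PaLM 2-Shot).
scores_at_creation = {"PaLM 2-Shot": 43.0}
print(resolve_market(scores_at_creation))
```

Note that the comparison is inclusive: a score of exactly 60 counts as "60 or higher" and would resolve YES.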

  • Mar 23, 10:24pm: Will A.I. Achieve Significantly Higher Performance Over a "Set of General Conceptual Skills" in 2023? → Will A.I. Achieve Significantly Higher Performance Over "General Conceptual Skills" in 2023?

