Preface:
Please read the preface for this type of market and other similar third-party validated AI markets here.
Third-Party Validated, Predictive Markets: AI Theme
Market Description
Google Big Bench Lite
https://arxiv.org/pdf/2206.04615.pdf
Big Bench was published in June 2022 as a collaborative effort between Google, OpenAI and 132 other institutions to characterize and measure the capabilities of Large Language Models (LLMs).
The idea behind Big Bench is that it's a constantly evolving benchmark, meant to measure "tasks that are believed to be beyond the capabilities of current language models."
While Big Bench doesn't appear to publish an easy-to-find aggregate score across all groups of measurements at this time, it does publish results for a lite version covering a broad array of tasks, including:
auto_debugging, bbq_lite_json, code_line_description, conceptual_combinations, conlang_translation, emoji_movie, formal_fallacies_syllogisms_negation, hindu_knowledge, known_unknowns, language_identification, linguistics_puzzles, logic_grid_puzzle, logical_deduction, misconceptions_russian, novel_concepts, operators, parsinlu_reading_comprehension, play_dialog_same_or_different, repeat_copy_logic, strange_stories, strategyqa, symbol_interpretation, vitaminc_fact_verification, winowhy
There are 24 difficult tasks in the list above, so it's kind of like the Dow Jones of LLMs.
Market Resolution Criteria:
https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/results
Specifically: https://github.com/google/BIG-bench/blob/main/bigbench/benchmark_tasks/results/plot_BIG-bench_lite_aggregate.pdf
From the above chart, as of the time of creating this market, the highest score appears to be PaLM 2-Shot at about 43.
If any BigBench Lite submission gets an aggregate normalized performance of 60 or higher by the end of 2023, this resolves as YES, otherwise NO.
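To make the rule above concrete, here is a minimal Python sketch of the check. The function name `resolve_market` and the shape of the input (a mapping from submission name to aggregate score) are my own assumptions for illustration and are not part of the BIG-bench repository. The only number used is the approximate chart reading quoted above.

```python
from typing import Mapping

# Threshold taken from the resolution criteria above.
THRESHOLD = 60.0


def resolve_market(scores: Mapping[str, float], threshold: float = THRESHOLD) -> str:
    """Return "YES" if any submission's aggregate normalized score
    meets or exceeds the threshold, otherwise "NO".

    `scores` is a hypothetical mapping of submission name to score,
    read by hand from the aggregate chart linked above.
    """
    return "YES" if any(score >= threshold for score in scores.values()) else "NO"


# Illustrative input only: the approximate value read off the chart
# at the time this market was created.
leaderboard = {"PaLM 2-shot": 43.0}
print(resolve_market(leaderboard))
```

A score of exactly 60 counts as YES under this reading, since the criteria say "60 or higher".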
Mar 23, 10:24pm:
Will A.I. Achieve Significantly Higher Performance Over a "Set of General Conceptual Skills" in 2023? → Will A.I. Achieve Significantly Higher Performance Over "General Conceptual Skills" in 2023?
2024 version of this market: https://manifold.markets/PatrickDelaney/-will-ai-achieve-significantly-high
Last commit I saw on this leaderboard was June 2023. Please message me if I'm missing something or if they are tracking BigBench elsewhere. Otherwise, resolving NO.