If tested before 2024, what will GPT-4 score on the Measuring Massive Multitask Language Understanding benchmark?

Resolved N/A on Aug 10

This question will resolve N/A if GPT-4 doesn't come out before January 1st, 2024. Otherwise, I will resolve it based on GPT-4's score, in percentage points, on the Measuring Massive Multitask Language Understanding benchmark by Dan Hendrycks et al. See here: https://arxiv.org/abs/2009.03300 I will use the first reported test of GPT-4 on this benchmark, excluding later results that e.g. use better prompts. As of writing this question, the best score on this benchmark is 67.5%, achieved by DeepMind's Chinchilla. See here: https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu

1 Comment

6 Trades

the correct answer is 86.4 according to https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu

Papers with Code - MMLU Benchmark (Multi-task Language Understanding)

The current state-of-the-art on MMLU is GPT-4 (few-shot, k=5). See a full comparison of 78 papers with code.

Regardless, I am unilaterally admin-resolving this N/A because I want to deprecate this market type, and idk if the distributional markets' resolution code still works.

yes, this is kinda against our own guidelines of when we can intervene in markets, but it makes the code wayyy simpler if all the distributional markets are resolved. This is the last holdout!
I will give 50 manalink to any of the 6 traders who is salty about this if they read this message.