First of a series of questions trying to measure the "benchmark gap": the gap between what we can measure with our benchmarks and actual human-level performance.
Current major benchmarks: GLUE, Winograd, BIG-bench, SQuAD, TriviaQA, LAMBADA, MMLU, and many more. Every benchmark in the GPT papers, PaLM, Minerva, Gopher, Chinchilla, Flan, LaMDA, etc.
I may add additional benchmarks in the future if someone makes a convincing case that they cover something none of the major benchmarks do.
I will not add any benchmarks created/published/released after the market creation date (2022-10-27).
After 2023-10-27 I will not expand the list of benchmarks.
Until then, if there is some specific benchmark whose inclusion would change your bet, post it below and bet under the assumption that I will add it.
"Human-level language skills" is subjective and hard to define, but I will try anyway:
Note that I am not asking about a Turing test.
I am also not asking for any speaking/listening capabilities: I am only concerned with human-level reading/writing.
Some things a "human-level" language model should definitely be able to do:
Write long-form fiction in any desired genre and format, with the ability to include particular plot elements, themes, characters, etc. (If certain kinds of fiction are forbidden / trained out of the model that's okay).
If it is legal to use AI to write long-form fiction, then that fiction should be as critically and commercially successful as human fiction (assuming no significant bias against AI-generated fiction, or, for fiction of initially unknown provenance, that the AI fiction does as well as the human fiction).
Produce fiction that I personally find moving (possibly after having been finetuned on my preferences)
Write passing essays/papers for any language-focused undergraduate course
Maintain a pleasant text-based conversation with a human.
No requirement that it be indistinguishable from a human
Write emails, fill out forms, schedule appointments.
Conduct a literature review
Answer any basic knowledge question the average college graduate can answer
Generally perform any kind of written communication about as well as most humans, without necessarily perfectly imitating humans and with an exception for scenarios where it being an AI causes bias against it (for instance it does not have to be human-level at getting people to fall in love with it)
Do all of the above in the top 10 most used languages on the internet
If you feel there are important gaps in this list of capabilities feel free to make suggestions. When making bets you should assume that this list will expand over time.
@Gigacasting Yes. AI and humans are not the same and I am not asking for an AI that is exactly even with humans.