https://en.wikipedia.org/wiki/Winograd_schema_challenge
Resolves positively if a computer program exists that can solve Winograd schemas as well as an educated, fluent-in-English human can.
Press releases making such a claim do not count; the system must be subjected to adversarial testing and succeed.
(Failures on sentences that a human would also consider ambiguous will not prevent this market from resolving positively.)
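For concreteness, the kind of adversarial test described above could be scored roughly like this minimal sketch. The sentence pair is Levesque's classic Winograd example; the question format and the idea of passing in a system's answers as a dict are assumptions for illustration, not a prescribed protocol:

```python
# Hypothetical scoring sketch for a Winograd schema test.
# Each entry pairs a schema question with its gold referent.
# The sentences are the classic councilmen/demonstrators example.
SCHEMAS = [
    ("The city councilmen refused the demonstrators a permit because "
     "they feared violence. Who feared violence?", "the councilmen"),
    ("The city councilmen refused the demonstrators a permit because "
     "they advocated violence. Who advocated violence?", "the demonstrators"),
]

def accuracy(answers):
    """Fraction of schema questions answered with the gold referent.

    `answers` is a dict mapping question text to the system's answer;
    how those answers are obtained (API, human tester, etc.) is out of
    scope here.
    """
    correct = sum(1 for question, gold in SCHEMAS
                  if answers.get(question) == gold)
    return correct / len(SCHEMAS)
```

The point of pairing the two sentences is that only the verb changes ("feared" vs. "advocated") while the correct referent flips, so a system cannot pass by exploiting surface statistics of either sentence alone; resolution would compare a system's accuracy on such pairs against an educated human baseline.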
/IsaacKing/will-ai-pass-the-winograd-schema-ch
/IsaacKing/will-ai-pass-the-winograd-schema-ch-1d7f8b4ad30e
Apparently 90% accuracy was reached in 2019.
https://www.sciencedirect.com/science/article/abs/pii/S0004370223001170
A human should be able to do much better than 90%, though, so I'm inclined to still resolve this NO.
@vluzko I just mean that I want to be sure that it can actually pass. Also, if its training data includes the existing Winograd sentences, then I'd want to give it different ones.
@IsaacKing but what do you mean by making sure? E.g., are you sure GPT-4 passed the benchmarks that OpenAI said it did? And given the popularity of Winograd, could you really exclude the benchmark from training? Do you mean you want to have enough access to run your own version?
@JacyAnthis No, if OpenAI provides a description of an experiment with enough detail that it seems this should resolve YES, I'll believe them unless someone provides good evidence I shouldn't.