Will AI pass the Winograd schema challenge by the end of 2023?

resolved Apr 16

https://en.wikipedia.org/wiki/Winograd_schema_challenge

Resolves positively if a computer program exists that can solve Winograd schemas as well as an educated, fluent-in-English human can.

Press releases making such a claim do not count; the system must be subjected to adversarial testing and succeed.

(Failures on sentences that a human would also consider ambiguous will not prevent this market from resolving positively.)

/IsaacKing/will-ai-pass-the-winograd-schema-ch

/IsaacKing/will-ai-pass-the-winograd-schema-ch-1d7f8b4ad30e

/IsaacKing/will-ai-pass-the-winograd-schema-ch-35f9dca7fa7d

/IsaacKing/will-ai-pass-the-winograd-schema-ch-d574a4067e75

Apparently 90% accuracy was reached in 2019.

https://www.sciencedirect.com/science/article/abs/pii/S0004370223001170

A human should be able to do much better than 90% though, so I'm inclined to still resolve this NO.

I just tested GPT-4 on the original benchmark and it could not even get 90%, despite having been trained on at least some of them.

GPT-4 is closer! 87.5% now, up from 81.6% with GPT-3.5.

I believe SmartGPT plus prompt engineering can theoretically do it. Whether it gets proven equal to a fluent human in 2023 is a different matter.

Some interesting discussion here.

What do you mean by adversarial testing? The Winograd schema challenge is a defined benchmark; are you asking about something different?

@vluzko I just mean that I want to be sure that it can actually pass. Also, if its training data includes the existing Winograd sentences, then I'd want to give it different ones.

@IsaacKing But what do you mean by making sure? E.g., are you sure GPT-4 passed the benchmarks that OpenAI said it did? And given the popularity of Winograd, could you really exclude the benchmark from training? Do you mean you want to have enough access to run your own version?

@JacyAnthis No, if OpenAI provides a description of an experiment with enough detail that it seems this should resolve YES, I'll believe them unless someone provides good evidence I shouldn't.