https://en.wikipedia.org/wiki/Winograd_schema_challenge
Resolves positively if a computer program exists that can solve Winograd schemas as well as an educated, fluent-in-English human can.
Press releases making such a claim do not count; the system must be subjected to adversarial testing and succeed.
(Failures on sentences that a human would also consider ambiguous will not prevent this market from resolving positively.)
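For concreteness, the kind of adversarial test described above could be scored roughly like this minimal sketch. The sentence pair is Levesque's classic Winograd example; the question format and the idea of passing in a system's answers as a dict are assumptions for illustration, not a prescribed protocol:

```python
# Hypothetical scoring sketch for a Winograd schema test.
# Each entry pairs a schema question with its gold referent.
# The sentences are the classic councilmen/demonstrators example.
SCHEMAS = [
    ("The city councilmen refused the demonstrators a permit because "
     "they feared violence. Who feared violence?", "the councilmen"),
    ("The city councilmen refused the demonstrators a permit because "
     "they advocated violence. Who advocated violence?", "the demonstrators"),
]

def accuracy(answers):
    """Fraction of schema questions answered with the gold referent.

    `answers` is a dict mapping question text to the system's answer;
    how those answers are obtained (API, human tester, etc.) is out of
    scope here.
    """
    correct = sum(1 for question, gold in SCHEMAS
                  if answers.get(question) == gold)
    return correct / len(SCHEMAS)
```

The point of pairing the two sentences is that only the verb changes ("feared" vs. "advocated") while the correct referent flips, so a system cannot pass by exploiting surface statistics of either sentence alone; resolution would compare a system's accuracy on such pairs against an educated human baseline.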
/IsaacKing/will-ai-pass-the-winograd-schema-ch
/IsaacKing/will-ai-pass-the-winograd-schema-ch-1d7f8b4ad30e
Apparently 90% accuracy was reached in 2019.
https://www.sciencedirect.com/science/article/abs/pii/S0004370223001170
A human should be able to do much better than 90%, though, so I'm inclined to still resolve this NO.
@vluzko I just mean that I want to be sure that it can actually pass. Also, if its training data includes the existing Winograd sentences, then I'd want to give it different ones.
@IsaacKing but what do you mean by making sure? E.g., are you sure GPT-4 passed the benchmarks that OpenAI said it did? And given the popularity of Winograd, could you really exclude the benchmark from training? Do you mean you want to have enough access to run your own version?
@JacyAnthis No, if OpenAI provides a description of an experiment with enough detail that it seems this should resolve YES, I'll believe them unless someone provides good evidence I shouldn't.