Will Gary Marcus tweet at least 10 examples of GPT-4 failure which won't be disproven/fixed within 24 hours? (in 2023)

Ṁ230Ṁ399

resolved Jan 3

Resolved

ALL

Resolves YES if Gary Marcus [quote] tweets (retweets without a comment don't count) at least 10 separate times (single thread counts as one) before the end of 2023 an example of a prompt which causes GPT-4 (or another successor SoTA model) to provide a wrong response, which isn't countered with a screenshot / example (e.g. a link) of the model giving a right answer within 24 hours. Example of a counter-tweet below:

Only tweets posted after the market creation and before the market close count (excluding counter-tweets). Positive examples should be posted in comments in order to count. If the counter-tweet is proven to be false/fake it doesn't count, but evidence should also be posted in comments.

Context:

Market context

Gary Marcus GPT-4 predictions

New Year's Resolutions 2024

Get

1,000

to start trading!

🏅 Top traders

#	Trader	Total profit
1		Ṁ31
2		Ṁ28
3		Ṁ19
4		Ṁ7
5		Ṁ5

People are also trading

Will LLMs such as GPT-4 be seen as at most just a part of the solution to AGI? (Gary Marcus GPT-4 prediction #7)

86% chance

Will Gary Marcus be accurate on at least 50% of his predictions on AI in 2029?

57% chance

Gary Marcus 2029 AI predications

Will Gary Marcus' legs be turned into paperclips (or similarly affected by AI) before he predicts AGI within 2.5 years?

12% chance

Sort by:

If you meant to count the example in the description as "disproven" then I think you should check again. My guess is that Alyssa has custom instructions enabled.

predictedYES

@ErickBall Is it your screenshot? Thanks for posting, I'll check

predictedYES

@MrLuke255 Yeah, I consider playground more definitive for this sort of thing than ChatGPT because you can use temperature 0

@ErickBall Not custom instructions, I tested this myself at the time, and repeated it now, and GPT-4 can answer it correctly. Would also be very silly for Alyssa to cheat when anyone could check.

As for the discrepancy with the playground: ChatGPT with GPT-4 does have a system message, so it's not going to be identical to the playground. And maybe it's just bad luck that at temperature zero it gives the wrong response. Also you set the maximum length low, maybe that means it doesn't have space to think. The correct answer is a bit more wordy, and ChatGPT tends to do better when it can think out loud for a bit.

People are also trading

Will LLMs such as GPT-4 be seen as at most just a part of the solution to AGI? (Gary Marcus GPT-4 prediction #7)

86% chance

Will Gary Marcus be accurate on at least 50% of his predictions on AI in 2029?

57% chance

Gary Marcus 2029 AI predications

Will Gary Marcus' legs be turned into paperclips (or similarly affected by AI) before he predicts AGI within 2.5 years?

12% chance

🏅 Top traders

People are also trading

People are also trading

Related questions