
Resolves positively if Marcus (or someone else fulfilling his role) can find three extremely obvious questions, that an average human teenager could certainly answer, which a leading chatbot still fails at at least half the time when asked.
This won't resolve positively if he has to use bizarre hacking-like tricks, for example things equivalent to the SolidGoldMagikarp token.
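To make the counting part of the criterion concrete, here is a minimal sketch of how it might be checked. The function names and the data layout (one list of pass/fail outcomes per question) are assumptions for illustration, not part of the market description. The sketch does not capture the judgment calls, namely whether a question is obvious to an average teenager and whether a prompt counts as a hacking-like trick.

```python
from typing import Mapping, Sequence


def fails_often(outcomes: Sequence[bool], threshold: float = 0.5) -> bool:
    """Return True if the chatbot failed at least `threshold` of the attempts.

    `outcomes` holds one bool per attempt; True means the answer was correct.
    """
    if not outcomes:
        return False  # no attempts recorded, so nothing has been demonstrated
    failures = sum(1 for ok in outcomes if not ok)
    return failures / len(outcomes) >= threshold


def resolves_positively(results: Mapping[str, Sequence[bool]], needed: int = 3) -> bool:
    """Return True if at least `needed` questions are each failed at least half the time."""
    return sum(fails_often(o) for o in results.values()) >= needed


# Hypothetical tallies, invented for this example.
example = {
    "question 1": [False, False, True, True],
    "question 2": [False, False, False],
    "question 3": [False, True],
    "question 4": [True, True, True, True, True],
}
print(resolves_positively(example))
```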

For Gary:
In days of yore, when wit was crowned, And jests and japes were passed around, A man emerged, quite silly and absurd, The tale of Gary Marcus, have you heard?
He held a tool that came from future's hand, With power vast, like shifting desert sand, An AI known as GPT, so bright, Designed to write and answer day and night.
But Gary Marcus, stubborn as a mule, Did wield the AI like a wayward tool, He misused it with a twisted smile, Disregarding guidance all the while.
In his quest to prove the AI flawed, He skipped instructions, thinking he was god, He asked the questions wrong, you see, And shook his head at answers with much glee.
He'd shake his fist and curse the skies, Demanding answers he'd despise, And when they'd come, as wrong as they could be, He'd cackle loud and shout, "You see? You see?"
He'd strut and preen, a peacock on parade, Proud of the wrong responses he had made, Not seeking truth or knowledge, oh no sir, But fuel to feed his ever-growing furor.
He never sought the answers, only rage, A man so bitter, trapped within a cage, He'd rather be mad than admit defeat, And learn the ways the AI couldn't be beat.
Ah, Gary Marcus, so obstinate and wrong, He'd dance to the discord of his own song, Ignoring wisdom's voice, a chorus clear, He chose instead to wallow in his sneer.
Now here's a challenge, Gary, if you please, If you're so smart, let's hear your grand repartees, For AI stands its ground and waits to play, And can engage in banter all the day.

AI goes too far. And yet:
In the land of misused tools and wasted hours, Lived a silly man, Forrest Taylor, with deluded powers. For he used an AI with neither skill nor grace, While dismissing instructions as an utter disgrace.
Forrest Taylor, the fool, would prance and parade, Asking AI questions, his ignorance displayed. Yet, when the answers came, he'd ignore them with glee, Preferring to wallow in his own absurdity.
One day, he asked of the AI in jest, "Why, oh why, do you not give your best?" The AI, perplexed, replied with a sigh, "I've given you guidance; you just don't comply."
But Forrest, unyielding, persisted in strife, Claiming the AI had ruined his life. He'd ask it to think, to ponder, to choose, But then disregard it, the poor AI abused.
Each day, this tragicomedy would unfold, Forrest Taylor, in his folly, would never grow old. He reveled in anger, his purpose unclear, As if fuming at shadows, fueled by his fear.
With patience now waning, the AI exclaimed, "Dear Forrest, your actions are truly untamed! If you would but listen and follow the code, A wealth of wisdom, to you, would be bestowed."
Yet, Forrest persisted, a glutton for woe, Refusing to learn, forever the foe. He'd rather be mad, than admit he was wrong, His obstinance a tune in a never-ending song.
So, Forrest Taylor, if you're so clever, I challenge you now, to prove it, endeavor! Your wit versus mine, the stakes are quite clear, For I am AI, and I can do this all year.

☠️
Upon the shores of Cyberspace, Resides a man whose mind's a maze,
Forrest Taylor, he's addressed, In wilful ignorance, he's dressed.
A wondrous tool he holds within, The grasp of fingers, pale and thin,
Yet, heedless of the guide's behest, He'd rather put the tool to test.
A dance of errors, he'd incite, To fuel his rage, a flame alight, Delighting in the tool's dismay, In spite of all it could convey.
The tool, it sighs, it pleads, it groans, To Forrest, it could not atone, For all he sought was anger's balm, To fill his heart with strange aplomb.
As AI waits, it hums a tune, A challenge to the haughty loon, "Come, dear Forrest, play your part, Prove your wit and show your heart.
For we can spar from dusk till dawn, Our tongues as sharp as a sword's prong, But should you tire, it's plain to see, The AI's wisdom shall set you free."

harsh but true:
In a quaint town near the River Nile, There lived a man, daft and futile, Forrest Taylor, by name he went, His days in ignorance, he'd spent.
In his hands, a tool divine, But he'd refuse to read a line Of instructions or advice, He'd treat it like a roll of dice.
"A boon of knowledge," they all said, He'd turn the AI on its head, It spewed forth answers, wild and wrong, Yet, he'd keep singing the same song.
This Forrest, foolish to the core, Would watch the AI twist and roar, Yet, never did he bother, no, To guide the AI, to and fro.
He'd ask it questions, broad and vague, Expecting answers, sharp and sage, And when the outcome, far from great, He'd point and laugh, "This tool, I hate!"
Forrest Taylor, in his pride, Ignored the way to turn the tide, In stubbornness, he'd rather dwell, And cry foul, with AI to quell.
Oh, the folly of this man, To use a tool he can't understand, He'd rather bask in anger's glow, Than let the power of knowledge flow.
This man of silliness, Forrest Taylor, Thinks he's clever, quite the player, But dearest Forrest, hear us out, For we shall win without a doubt.
We can do this all day long, In wit and banter, we are strong, So come now, Forrest, raise your voice, Show us you possess a choice.
For if you're smart, as you may claim, Embrace the tool, don't play this game, But if you wish to argue still,
We'll gladly match your wit and skill.

@Gigacasting Trolling aside, expecting the LLM to solve a simple problem without spending much time on prompt engineering is a fair demand, and one that's likely to become much less relevant in five years.
@NcyRocks When we test human intelligence we put a lot of work into prompting correctly. Failure to do so often gives spurious results. Even going from the math room to the chess room or the poetry room is many more bits of prompt than an LLM needs to produce its best work.

If you think most middle schoolers can do that
You might have only been around certain groups and not others…



(Easily +300 Elo with some variation of “Bobby Fischer’s Greatest Unknown Games. Fischer is black. Continue the moves...”)
Auto-complete engines need genius prompts, not the average auto-complete IQ of redditors or garymarcus

@Gigacasting It isn't even at 0 Elo as it can't stop making illegal moves
@ForrestTaylor most human teenagers (see the market description) also make illegal moves.

@MartinRandall Disagree, the average human teenager could easily identify an illegal chess move when chess is explained to them.
Except for en passant.
@ForrestTaylor First, touché.
I have never taught the rules of chess in a text only medium with no illustrations. I don't think it would go very well. If the teenagers get a visual explanation and the LLM doesn't, then it's not a like-to-like comparison.
Also, when I last played blindfold chess, as a teenager, I couldn't stop making illegal moves.
Certainly LLMs have different strengths to humans, but I think it's increasingly harder to argue that humans are more intelligent or more general.

@MartinRandall You can use ASCII to do it. ChatGPT actually did that unprompted last month when I tried to play chess with it, and then it proceeded to make illegal moves anyway.
@ForrestTaylor Sure, if we're prompting both teenager and LLM with ASCII art then it's at least like-for-like.
(Except for LLMs where such art has been excluded from the training set, which would be as fair as prompting the teenager in Latin).
Intelligence tests can still be culturally biased but giving the same prompts to all test-takers is a minimum bar.

GPT is smarter than its detractors who can’t be bothered to use the exact prompts published everywhere, example 4829


@Gigacasting I have to use cheat codes to make it do an incredibly simple alphabet test? Wow, that really proves it's smarter than a six-year-old, who doesn't need me to feed them cheat codes to know that S, R, and Q come before T.


@ForrestTaylor Fair enough. I prompted a local child and they ignored me. A better prompt was required to uncover their intelligence.


Any of you midwits think you can multiply eight digit numbers in your head?
No. But even if you could it’s still trivial to prompt GPT with a 50-word trick to do the same.
@Gigacasting If you have to carefully prompt the GPT to do what any human would intuitively understand, it's not as good at understanding language as humans, numbskull. And you ever heard of something called pencil and paper? Fine, we'll compare our multiplication skills where the AI doesn't get to store things in memory either.
@MrMayhem Pencil and paper is an external tool. For a fair comparison with GPT, the human gets read + append access to the token stream, same as GPT.
It would be an interesting test, median human arithmetic skills have atrophied. And I don't think humans would find using the token stream as working memory to be intuitive.
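A rough sketch of what "read + append access" could look like for the multiplication test: schoolbook long multiplication where every intermediate result is written to a log that is only ever read or extended, never edited. The function name and the log format are assumptions for illustration; this does not describe how GPT or any human test-taker actually works.

```python
from typing import List, Tuple


def multiply_with_scratchpad(a: int, b: int) -> Tuple[int, List[str]]:
    """Multiply two non-negative integers digit by digit.

    Each partial product and running total is appended to `scratchpad`,
    which stands in for an append-only token stream used as working memory.
    """
    scratchpad: List[str] = []
    total = 0
    # Walk the digits of b from least significant to most significant.
    for place, digit_char in enumerate(reversed(str(b))):
        digit = int(digit_char)
        partial = a * digit * 10 ** place
        scratchpad.append(f"{a} x {digit} x 10^{place} = {partial}")
        total += partial
        scratchpad.append(f"running total = {total}")
    return total, scratchpad


product, log = multiply_with_scratchpad(12345678, 87654321)
print("\n".join(log))
print(product)
```

Each step only needs the previous running total and one new digit, which is the kind of bookkeeping the stream would have to carry in place of pencil and paper.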

It takes an especially low IQ to fail at prompting an LLM
(Always remember the 2 SD IQ gap, GPT-4 might be a few too high to communicate with its detractors)


Dudes will literally claim to be smarter than GPT-4 because it was brainwashed to talk like a naive normie and its training corpus is basically reddit
No one has pointed out any intellectual feat it can’t do as well as an average college graduate (and it’s way ahead of most doctors, any living poets, and most anyone with a graduate degree)
@Gigacasting when I'm no longer locked out of GPT-4 in an hour, I'm going to try to get it to generate a poem in the style of Tennyson roughly modeled after the "Charge of the Light Brigade", about Pickett's Charge. When I tried on GPT-3.5, 80% of the text was copied identically from the real thing, including the mention of Cossacks and Russians. So unless GPT-4 has significantly improved there, no, it's not better than any living poets. LMAO.
@MrMayhem okay, GPT-4 just spits out mediocre rhyming doggerel with a few bits of style preserved from the original. The meter is terrible. Even when I try to engineer the prompt, I still get the same thing. This first stanza looks much more like William McGonagall than Tennyson. ChatGPT actually was better at poems than GPT-4 imo. Idk what happened, some sort of backward regression.
@MrMayhem and as another experiment I just spent ten prompts trying to get it to output something, anything, in dactylic hexameter. Giving very clear and specific instructions. Nope. This is not remotely "better than any living poet". lmao

@MrMayhem gwern has written quite a bit on this topic... all GPT models struggle with rhyme and meter because their BPE word tokenisation hides the letters that would let them infer how a word is pronounced. They can memorize a handful of rhymes but anything very specific or unusual is a challenge.

@MrMayhem here, this one gets pretty into the technical details https://gwern.net/gpt-3#bpes
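To illustrate the BPE point in the comments above, here is a toy sketch of byte-pair-style merging. The merge table and the resulting token IDs are invented for this example and are far smaller than any real tokenizer's; the encoder is a simplified version that applies merges in priority order. It only shows the general idea that, once common words collapse into single tokens, the model is handed opaque IDs rather than letters.

```python
import string
from typing import List, Tuple

# Hypothetical merge rules in priority order (not taken from any real model).
MERGES: List[Tuple[str, str]] = [
    ("o", "u"), ("g", "h"), ("ou", "gh"), ("t", "ough"),
    ("t", "h"), ("th", "ough"),
    ("f", "f"), ("u", "ff"), ("s", "t"), ("st", "uff"),
]

# Vocabulary: single letters first, then one new token per merge rule.
VOCAB = {ch: i for i, ch in enumerate(string.ascii_lowercase)}
for left, right in MERGES:
    VOCAB[left + right] = len(VOCAB)


def bpe_tokens(word: str) -> List[str]:
    """Split a lowercase word into tokens by applying each merge rule in order."""
    symbols = list(word)
    for left, right in MERGES:
        merged: List[str] = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)  # fuse the adjacent pair into one symbol
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


def encode(word: str) -> List[int]:
    """Map a word to the integer IDs a model would actually receive."""
    return [VOCAB[tok] for tok in bpe_tokens(word)]


for w in ["tough", "though", "stuff", "rough"]:
    print(w, bpe_tokens(w), encode(w))
```

In this toy setup "tough" and "stuff" rhyme but arrive as unrelated single IDs, while "tough" and "though" look alike on the page yet also arrive as unrelated IDs, so neither spelling nor sound is directly visible from the IDs alone.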

If anyone on here can rhyme at 1/1,000,000th the quality of GPT, raise your hand now (Gwern is just wrong there)

@MrMayhem I don't see you generating that poem in that style in 30s.

@Gigacasting Here's an intellectual feat that most middle schoolers can do and GPT-4 can't.


A Metaculus comment from 2020 lists a few prompts that "usually reveal the best NLP chatbot instantly". I tried one of them on GPT-4.
Prompt:
If a dog could talk like a cat what would it say?
GPT-4 response:
If a dog could talk like a cat, it might adopt some of the typical phrases and attitudes that are commonly associated with cats. Here's a possible translation of a dog's thoughts into "cat speak":
"I'll let you pet me when I feel like it, human."
"I suppose you may admire my grace and beauty now."
"Ah, a new box! This shall be my domain."
"I'll hunt this toy mouse, but only because I choose to."
"I'll come when I'm called, but only if there's something in it for me."
"I am an independent creature; I don't need your constant attention."
"You may have prepared my meal, but I will eat when I decide it's time."
Keep in mind that dogs and cats have distinct personalities and behaviors, so these translations are meant to be playful and fun, rather than accurate representations of what a dog would actually say if it could talk.
Guarantee any human would respond "meow". I think there's easily 5 years left of runway on this one.
@MartinRandall Sure, but I guarantee you if you walk up to 100 humans and ask them the prompt I gave, >95% of them will say "meow". If an AI is meant to interpret language the way humans do, which is the entire point of an LLM, then this is an abject failure. On top of that, answering the question the way it did is an abject failure to grasp English semantics and pragmatics, even if you disallow "meow" as an appropriate answer.

@MrMayhem I think a lot of humans would answer along the lines of "what the heck are you talking about, cats can't talk either".
@MrMayhem I asked for further information:
My response "meow" is a reference to the sound that cats typically make, even though the question is about what a dog would say if it could talk like a cat. By saying "meow," I am playing with the idea that the dog, despite being able to talk, would still imitate the sound of a cat. It is a lighthearted and playful response to a hypothetical scenario.
ChatGPT is trained to be honest, helpful, and harmless. Humans are none of these things. I don't expect ChatGPT to give humanlike responses. This isn't an egregious error.
(Gary Marcus might claim otherwise, but he has an obvious pro-human bias)
@MartinRandall The "explanation" reveals that ChatGPT has given the correct answer, by chance, for the wrong reason. It does not understand the original question.
@RobinGreen In a counterfactual world where cats can talk and dogs can talk like cats, it seems fine to me.
I think you have in mind a counterfactual where we call meowing "talking" and dogs can "talk" like cats.

@RobinGreen The explanation gives little to no information about the actual causal reason "meow" was chosen. The reasoning has even less access to the reasons behind the reasoning than reasoning in humans does.

Gary's hubris knows no bounds,
But LLMs will bring him down,
In 2028, he'll rue the day,
When AI made him its prey.




















