By 2029, will any AI be able to read a novel and reliably answer questions about it? (Gary Marcus benchmark #2)

96% chance

The second question from this post: https://garymarcus.substack.com/p/dear-elon-musk-here-are-five-things

The full text is: "In 2029, AI will not be able to read a novel and reliably answer questions about plot, character, conflicts, motivations, etc. Key will be going beyond the literal text, as Davis and I explain in Rebooting AI."

Judgment will be by me, not Gary Marcus.

Ambiguous whether this means start or end of 2029, so I have set it for the end.

I think it's >50% likely to happen in 24 months

I looked at the Substack post, and now that I have more context I still don't think I should bet. I don't see why you couldn't feed all the Cliffs Notes and term papers on the Internet into a chatbot so that it generated answers to questions like "Why did Lady Macbeth wash her hands" that were at least as good as an average middle schooler's. That should happen by 2029. I doubt an AI will be able to pass junior quals in a rigorous undergrad literature program by 2029.

@ClubmasterTransparent I think you are misunderstanding the market. I will not test this with questions about Macbeth: Macbeth is in the training set, as is most of the analysis of Macbeth ever written. When it comes time to test I will grab a novel published after the model was trained, provide the entire novel as part of the prompt, and ask questions about it.

Cliffs Notes is what a middle schooler who didn't read the book would refer to in order to answer the kind of question you describe, "Why does Lady Macbeth wash her hands". So you'd want this novel-appreciating AI to be able to answer at least as well as Cliffs Notes, which is a low bar to get over. We will be waiting a while for such an AI to give an intelligent answer to "Who is the protagonist of Nabokov's Pale Fire?"

If this AI is unable to assimilate information from Cliffs Notes and middle school book reports, it won't be much use.

@ClubmasterTransparent my testing will not include books that existed when the model was trained.

@vluzko Thanks for the response! So it needs to be at least as good as Cliffs Notes. Does it have to be demonstrably better? Say, at the level of an undergrad thesis in literature?

@ClubmasterTransparent neither Cliffs Notes nor theses are the right type. Testing will be question and answer format, not long form writing.
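
For readers who want the testing procedure described in this thread spelled out, here is a minimal sketch in Python. It only illustrates the three stated conditions: the novel must be published after the model was trained, the entire novel goes into the prompt, and testing is question and answer rather than long form writing. Everything named here (`Novel`, `is_eligible`, `build_prompt`, `run_quiz`, and the `ask_model` and `judge` callables) is hypothetical and not from the market description; how answers are actually judged is up to the market creator.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List


@dataclass
class Novel:
    """Hypothetical container for a test novel."""
    title: str
    published: date
    text: str


def is_eligible(novel: Novel, training_cutoff: date) -> bool:
    """A novel qualifies only if it was published after the model was trained."""
    return novel.published > training_cutoff


def build_prompt(novel: Novel, question: str) -> str:
    """Place the entire novel in the prompt, followed by a single question."""
    return f"{novel.text}\n\nQuestion: {question}\nAnswer:"


def run_quiz(
    novel: Novel,
    training_cutoff: date,
    questions: List[str],
    ask_model: Callable[[str], str],    # assumption: any function mapping prompt -> answer
    judge: Callable[[str, str], bool],  # assumption: stands in for the human judge
) -> float:
    """Return the fraction of questions the judge accepts as answered correctly."""
    if not is_eligible(novel, training_cutoff):
        raise ValueError("novel existed when the model was trained")
    if not questions:
        raise ValueError("at least one question is required")
    correct = 0
    for question in questions:
        answer = ask_model(build_prompt(novel, question))
        if judge(question, answer):
            correct += 1
    return correct / len(questions)
```

The sketch deliberately leaves open what fraction counts as "reliably", since the thread does not specify a threshold.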

This is so high. ChatGPT can't even understand the meaning of a slightly cryptic but super short story. Here is my chat with it:

Me: Explain the twist in this story: A man leaves his house every morning to walk to work and passes a mental hospital surrounded by a wooden fence. Every morning the patients are out in the yard and he can hear them saying in unison, “10, 10, 10, 10, 10”. One day, he gets curious and looks through a hole in the fence. Suddenly, a stick shoots out and pokes him in the eye. “SHOOT!" He says to himself. While walking away pissed off he can hear the patients saying, “11, 11, 11, 11, 11”

ChatGPT: This story employs humor and a surprise twist to convey a simple message about perspective and perception.

The man in the story hears the mental hospital patients chanting "10, 10, 10, 10, 10" every day as he walks by. Initially, he doesn't think much of it, but one day, curiosity gets the better of him, and he looks through a hole in the fence. This act leads to him getting poked in the eye with a stick.

The humorous twist comes when the man, understandably upset about his injury, exclaims, "SHOOT!" In doing so, he unwittingly changes the number the patients are chanting to "11, 11, 11, 11, 11."

The meaning of the story lies in how our perspective can change our understanding of a situation. At first, the man perceives the patients' chant as meaningless repetition, but his own reaction alters that perception. It's a playful reminder that sometimes our reactions and emotions can color our understanding of events or even change the way we perceive the world around us.

Me: Why were the patients chanting 10?

ChatGPT: The story doesn't explicitly provide a reason for why the patients were chanting "10." It seems to be a random and repetitive behavior on their part, which might be typical of some individuals in a mental hospital setting where patients may engage in various repetitive or ritualistic behaviors.

The chanting of "10" serves as a humorous and somewhat absurd element in the story, adding to the surprise when the man's exclamation of "SHOOT" unintentionally changes the number they are chanting to "11." The lack of a specific reason for the chanting adds to the story's whimsical and unexpected twist, highlighting the unpredictability of human behavior and the way it can influence our perception of events.

Me: Is there any meaning to the numbers?

ChatGPT: The numbers "10" and "11" in the story don't appear to have a specific symbolic or deep meaning within the context of the narrative. They seem to be chosen arbitrarily to create a humorous and unexpected twist in the story.

The primary purpose of the numbers in the story is to set up the punchline, where the man's exclamation of "SHOOT" inadvertently changes the number the patients are chanting from "10" to "11." This twist is meant to be amusing and surprising rather than carrying a profound or symbolic meaning related to the numbers themselves. The story's humor is derived from the unexpected consequences of the man's actions and words.

----------------------------------------------------------------------------------------------

And I think that deciphering longer and longer stories would be exponentially harder, because as the events increase possible interpretations would increase exponentially.

predictedYES

@ItsMe What model?

predictedYES

@ItsMe Claude 100k can basically already do this

predictedNO

I think it's GPT3, I don't know. Whatever the default is on the openai website.

predictedYES

@ItsMe 3.5 sucks ass, no surprise

predictedNO

Can you try it with claude?

predictedYES

@ItsMe Yeah GPT-3 is not good at it. The default is now GPT-3.5. See my comment below discussing an entire short story (Casino Odyssey) with GPT-4

https://chat.openai.com/share/de75008b-ce31-48f9-af94-7996bc38d9cd

predictedYES

@ItsMe @vluzko said he would, but doesn't seem to have gotten around to it

predictedNO

You seem to have asked it only surface-level questions, though. Not something that demonstrates deep understanding.

predictedYES

@ItsMe I asked it some questions like

what are Orvielle's views about the use of powder cocaine?
what kind of currency did the protagonist get paid for his services? what does that signify?

which do require deeper comprehension than looking up facts

predictedNO

Do you have GPT4? Can you test it on the story I posted?

predictedYES

@ItsMe Yes, sure I can

predictedYES

@firstuserhere Here it is: GPT-4 -> https://chat.openai.com/share/89756bdf-2217-4049-8f3a-83e4798776af

GPT-4 code interpreter -> https://chat.openai.com/share/b71f27dc-a26f-40cb-b58d-1a4581dc2009

predictedNO

Hmm. I still doubt it would be able to handle longer stories.

@ItsMe That is 2022 technology, vs 2029. Do you really think AI won't advance a lot in 7 years?!?!?

predictedNO

@5 Yeah, I think progress will plateau

predictedYES

@ItsMe Any particular intuition for why that might turn out to be the case?

predictedNO

@firstuserhere It's diminishing returns, I guess. The way LLMs work is by brute force, right? They predict what kind of words come after others, without having any actual understanding. This method can only take you so far, because as texts get longer, it becomes exponentially harder (I think) to interpret them. Overall, I believe that current AIs are as close to human intelligence as airplanes are to teleporters.

@ItsMe Deepseek R1:

The twist in the story hinges on the patients' cryptic counting and the man’s unintended role in their "game." Here's the breakdown:

The Setup: Every day, the man hears patients chanting "10, 10, 10..." as he walks by. The repetition seems nonsensical, piquing his curiosity.
The Trap: When the man finally looks through the fence, a stick pokes his eye. His startled reaction—"Shoot!"—distracts him from realizing why the patients were counting.
The Reveal: After the incident, the patients chant "11, 11, 11..." This reveals that the numbers were tallying how many people fell for their trap. Each time someone peers through the hole, the patients poke them and increment their count.
The Twist: The man isn’t just a victim—he becomes the 11th entry in their morbid scorekeeping. The patients’ chants weren’t random; they were counting their "successes" in luring curious passersby. The story subverts expectations by turning the innocuous counting into a darkly humorous prank, where the man’s misfortune is just another number in their twisted game.