
The prompt will be "Solve this. Explain your answer" with a Raven's Progressive Matrices (RPM) puzzle attached as an image. For example:

The AI must be able to solve 8 out of 10 puzzles of my choosing. I will only choose puzzles that I can solve.
If there's a consensus that chatbots can or can't do this, I may not bother running the test myself.
As of market creation, the best commercially available LLMs fail embarrassingly:
ChatGPT 4o:
https://www.perplexity.ai/search/solve-this-explain-your-answer-ohIVE8CaQ3OIu9ODzaEXcg
Claude 3.5 Sonnet:
https://www.perplexity.ai/search/solve-this-explain-your-answer-Ay.Kpyc9Tfm6uKBT3KUylQ
Rules:
- Must be a general-purpose AI; it can't be something made specifically to solve certain kinds of problems.
- I will not bet.
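
For anyone who wants to rerun the test programmatically rather than through a chat UI, here is a minimal sketch using the OpenAI Python SDK. The model name and image URL are placeholders (the screenshots above were made through Perplexity, not this code), and any vision-capable model could be substituted:

```python
# Minimal sketch: send the market's prompt plus an RPM puzzle image.
# Assumes OPENAI_API_KEY is set; model name and image URL are placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model works here
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Solve this. Explain your answer"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/rpm-puzzle.png"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```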
Update 2026-01-06 (PST) (AI summary of creator comment): Creator has completed testing 9 rounds with three LLM models (GPT 5.2, Gemini 3 Pro, and Opus 4.5). All three models failed to achieve 8/10 correct answers. Unless there are objections, the market will resolve NO.
@Shai
Round 2:

gippity gets it (but based on the reasoning, it looks like it got lucky):

Opus flops again:

Gemini fails:

On a second try (which doesn't count), both GPT 5 and Gemini failed.
@Shai round 9. Getting real close!

Gemini thought hard and tried to search the web multiple times (which hasn't happened before) and failed.

That's all three models failing to answer 8/10 correctly. If no one objects, I will resolve NO.
@Shai give it a shot? Don’t necessarily resolve things…but I’d be interested in your thoughts after playing around with it.
@CrypticQccZ @matt It failed on a modified version (I moved the columns one position to the right in Paint). Possibly this question was in the training data.
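
(The creator did the shift by hand in Paint; the same perturbation can be scripted. A minimal Pillow sketch, assuming a 3x3 grid whose top-left corner and cell size you measure from your image — GRID_X, GRID_Y, and CELL are placeholders — and assuming the displaced last column wraps around to the front:)

```python
# Hedged sketch: shift the 3x3 grid's columns one cell to the right,
# wrapping the last column to the front, to probe for memorization.
from PIL import Image

GRID_X, GRID_Y = 0, 0  # top-left corner of the 3x3 grid (placeholder)
CELL = 100             # cell width/height in pixels (placeholder)

def shift_columns_right(src: str, dst: str) -> None:
    """Move each grid column one position right, wrapping the last to the front."""
    img = Image.open(src)
    cols = [
        img.crop((GRID_X + i * CELL, GRID_Y,
                  GRID_X + (i + 1) * CELL, GRID_Y + 3 * CELL))
        for i in range(3)
    ]
    out = img.copy()
    for i, col in enumerate(cols):
        out.paste(col, (GRID_X + ((i + 1) % 3) * CELL, GRID_Y))
    out.save(dst)

shift_columns_right("rpm.png", "rpm_shifted.png")
```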
Grok with the "think" feature enabled solved the example problem, which is impressive. It failed the other problems I tried.
@Shai I will note they have no problem "seeing" what's in the image. They can describe any shape when asked.