Has AI surpassed technical but non-mathematician humans at math?

I'm going to make this about me, and not bet in this market.

This is like my superhuman math market but with a much lower bar. Instead of needing to solve any math problem a team of Fields medalists can solve, the AI just needs to be able to solve any math problem I personally can solve.

And I'm further operationalizing that as follows. By January 10, will any commenter be able to pose a math problem that the frontier models fail to give the right answer to but that I can solve? If so, this resolves NO. If, as I currently suspect, no such problem can be found, it resolves YES.
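Roughly, in symbols (informal notation just to pin down the spirit; details like hints, tools, and one-shotting are covered in the FAQ below): let $H$ be the set of posed problems I can solve and $A$ the set of posed problems that at least one frontier model answers correctly on the first try. Then

$$\text{NO} \iff \exists\, p \text{ posed by January 10 with } p \in H \setminus A, \qquad \text{YES otherwise.}$$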

(In case it helps calibrate, I have an undergrad math/CS degree and a PhD in algorithmic game theory and I do math for fun but am emphatically not a mathematician and am pretty average at it compared to my hypernerd non-mathematician friends. I think I'm a decent benchmark to use for the spirit of the question we're asking here.)

FAQ

1. Which frontier models exactly?

Whatever's available on the mid-level paid plans from OpenAI, Anthropic, and Google DeepMind. Currently that's GPT-5.2-Thinking, Claude Opus 4.5, and Gemini 3 Pro.

2. What if only one frontier model gets it?

That suffices.

3. Is the AI allowed to search the web?

TBD. When posing the problems I plan to tell the AI not to search the web. I believe it's reliable about not secretly doing so, but we can talk about either (a) how to be more sure of that or (b) deciding that searching is fair game, in which case we just need to find ungoogleable problems.

4. What if the AI is super dumb but I happen to be even dumber?

I'm allowed to get hints from humans and even use AI myself. I'll use my judgment on whether my human brain meaningfully contributed to getting the right answer and whether I believe I would've gotten there on my own with about two full days of work. If so, and I get the right answer while the AIs don't, that counts as a human victory.

5. Does the AI have to one-shot it?

Yes. Even if all it takes is an "are you sure?" to nudge the AI into giving the right answer, that doesn't count. Unless...

6. What if the AI needs a nudge that I also need?

This is implied by FAQ 4, but to spell it out: if I'm certain I would've given the same wrong answer as the AI, then the AI needing the same nudge as me means I don't count as having bested it on that problem.

7. Does it count if I beat the AI for non-math reasons?

For example, maybe the problem involves a diagram in crayon that the AI fails to parse correctly. This would not count. The problem can include diagrams but they have to be given cleanly.

8. Can the AI use tools like writing and running code?

Yes, since we're not asking about LLMs specifically, it makes sense to count those tools as part of the AI.

(I'll add to the FAQ as more clarifying questions are asked.)

Related Markets

[ignore auto-generated clarifications below this line; nothing's official till I add it to the FAQ]

  • Update 2025-12-13 (PST) (AI summary of creator comment):

    • Problems can be presented informally

    • Creator will try not to search the internet but may already know some problems

    • Creator is allowed to use Mathematica when solving problems

    • Physics and process optimization problems: Creator will use judgment on whether they count as math problems (subject to debate in comments)

  • Update 2025-12-13 (PST) (AI summary of creator comment): Misleading or trick questions:

    • Problems phrased with racial slurs or in languages the creator doesn't understand (like Vietnamese) would not count as valid math problems

    • Problems need to be translatable to a canonical form for standard technical communication

    • Full-on trick questions are likely out of bounds (too much randomness for both AI and humans)

    • Simply being phrased misleadingly may be fair game, but will be evaluated on a case-by-case basis with examples


Can you share what experience you have yourself using recent AI for math? I'm somewhat confused, because in my experience, if you give yourself two days, I wouldn't even expect it to be close for someone with this type of background. But the fact that you're creating this market seems to imply that you're at least pretty uncertain.

@consnop I'd say I pose math problems to AI on a weekly basis at least, sometimes more often. I seeded this market with M$1k of liquidity and my initial guess for the probability was 67%. With higher liquidity I probably would've gone lower. Which is to say that some of the uncertainty is about how hard people will hunt for examples. My own impression is that, with the latest frontier models, at least 1 of the 3 of them always gets the right answer for anything I throw at them. This wasn't true a month ago, and my sample size is too low to be very sure it's true today.

It sounds like you're saying that, even with the latest models, it's not uncommon for you to see them fall on their face on math problems that aren't that hard. I'm especially eager to see those examples!

@dreev have to admit I haven't played too much with the latest batch, but I have the impression it would have to be a much bigger jump than I think likely to make this resolve YES. I'll give finding an example a shot if I have a bit of spare time!

If it's phrased in a misleading way for AI, does it count?

What if I phrase it in a way that makes the AI refuse to answer it? The AI would technically be failing to answer correctly.

bought Ṁ300 NO

Yeah, this is driving my NO bet. AI isn't at human level on either ARC or SimpleBench, which are, broadly speaking, "math problems".

@Usaar33 Oh, my feeling is that that's too broad a definition of math problem and would violate the spirit of the question to resolve NO for that reason. Like if you pose a problem with a bunch of racial slurs and the AI clams up, that's similar to how you could pose the question in Vietnamese and the AI wouldn't bat an eye but I'd be clueless.

Basically, translating the problem statement to a canonical form for standard technical communication should be allowed.

But just being phrased misleadingly? My gut reaction is that that's fair game but we should probably look at examples before making an official verdict there. I think full-on trick questions should probably be out. Too much randomness, both for the AI and for humans, in whether one spots the trick.

Didn't GPT-5 fail to get that super easy bagel-splitting question right? Does that count for this market?

@ItsMe Oh, crap, I forgot about that one! Alright, do you want to pose it again here? AI has gotten a lot better since then, so we'll see. (But I think the market probability should be falling right now, before having checked myself.)

Are calculators better than humans at math? It's probably the same answer as that.

@ItsMe Ha, true. But I think we're robust to that technicality. Namely, it doesn't matter that there are infinitely many problems that AI (or just calculators) can solve that I can't. We're asking whether there exists a single math problem that AI can't solve but that I can.

Problems can be presented informally, correct? Are you allowed to search the internet yourself? Are physics problems allowed? Process optimization questions?

@Usaar33 Yes to presenting problems informally. I will try not to search the internet but of course I may already know some problems. Ultimately I'm making the judgment about whether I could have solved a problem on my own (plus Mathematica, let's say).

As for physics and process optimization problems, I'll use my judgment on whether they also count as math problems. Or we can debate it here in the comments.

The spec doesn't match the title very well. If the AI can solve 1,000 things that you can't, and you can solve 1 thing the AI can't, then the AI is better at math than you, but this bet resolves NO.

@AlexRosence5a Hmm, yeah, got ideas for a better title? Something along the lines of "does ai pareto-dominate stem people at math?" maybe?

Or more direct: will we find a math problem i can solve that ai can't?

bought Ṁ50 NO

Is it allowed to use code interpreter/calculator tools?

@spiderduckpig Yes, I think it makes sense to treat that as allowed. This isn't asking about LLMs specifically. So those tools count as part of the AI.

Does vision count? E.g., a super crappy picture of a beginner sudoku, or a kid's drawing of a houses/utilities topology puzzle variant, might be too hard for their multimodality to deal with, but this seems a bit cheap. Maybe math problems must be submitted in text form?

@DZC Yes, I'd like to call that too cheap. Great to clarify though!

And I don't think the math problems should have to be pure text. Just written up cleanly.
