Next year, will I think that AI is better than me at math?

Within one year, will there be an AI that can solve any math problem I can (_including_ research math problems) for less money than it would cost to hire me or someone with a similar background as a consultant on the problem (let's say $250/hour)?

In theory I should test this by handing it my grad school work and seeing how it does, but that may be prohibitively expensive. Instead, resolution will be based on my inscrutable whims / general vibes, so consider yourselves warned.

(For my level of math: this is my real name and you can look up my resume, but tl;dr I dropped out of a PhD in ML where about half of my time was spent on PAC learning bounds for causal discovery algorithms. I made it semi-far into the proofs but didn't publish, which is part of why the comparison will be vibes-based. I also did okay on the Putnam, but it's pretty likely that AI is already better than me at competition math, so I don't think that's very relevant.)

  • Update 2024-12-21 (PST): The market will be resolved based on my assessment at market close time (one year from market creation). I will resolve Yes if I think AI is better than me at that time, and No otherwise. (AI summary of creator comment)




@VincentLuczkow how do you feel personally about this? Like on an emotional level?

bought Ṁ30 YES · 5mo

Yes, I think so, provided you have access to o4-class models.


I think it's likely someone will discover classes of problems that o3 at release seriously struggles with, like combinatorics or something

> will there be an AI that can solve any math problem I can

I'm assuming "any" here means "all" rather than "at least one", otherwise a pocket calculator wins lol

If so, this may be near impossible with machine learning, because it can only learn to do stuff based on there being a bunch of it in its training data, right? That may be impossible unless your level of research math becomes a common, publicly posted pastime.

bought Ṁ20 YES · 4mo

@TheAllMemeingEye It only needs to understand the concept, and it can reason through the problem after that.

There are a lot of things not in specific AI models' training data that they can still figure out with some effort.

@Haiku what would you say are some good examples of such things? My understanding is that absence from training data is why, for example, LLMs often struggle with ASCII art.


@TheAllMemeingEye To your question, it depends on what constitutes "not in the training data" (i.e. how close of an example counts), but I think some good examples include:
- Explaining novel jokes
- Playing a simple novel game explained at task time
- Solving novel code challenges
- Solving novel math problems

At some point when you've seen enough, there almost isn't such a thing as novelty, since there's always something that is in some way similar. But that's a property of information, not a property of language models. Humans also usually can't solve types of problems that are extremely novel to them.

I think the ASCII art thing has more to do with the fact that LLMs see the world through 1 dimension, so it's difficult to construct representative 2D images with no practice (i.e. no post-training/RL on ASCII art output). That's roughly the same reason why the ARC benchmark took so long to beat. A model that can beat that benchmark the way it's forced to do it is much more intelligent (in that aspect) than a human. If you trained an LLM much more heavily on ASCII art, it would probably overcome the handicap and be able to produce new and compelling ASCII images despite how difficult it is to do so, because much more of its neural network would be dedicated to memorizing additional layers of useful algorithms for doing so. I think doing this task in 1D would be extremely difficult for most humans.

Intelligence/reasoning is a huge patchwork bundle of various useful algorithms. There are obvious holes in LLM reasoning that haven't been patched yet, but I haven't heard any compelling arguments for why they'll never be patched in that architecture.

I don't really have sources on most of the above, but I really liked this deep dive on whether LLMs can reason:
https://www.youtube.com/watch?v=wXGiV6tVtN0

@Haiku thanks for explaining 👍

Just to be clear, you will resolve No if that is the case around New Year 2026?


@JussiVilleHeiskanen What does "that" refer to? I will resolve yes if I think AI is better than me (at market close time, one year from market creation), and no otherwise.
