
Resolves YES if an AI is able to score more than 100 IQ points on an official IQ test about problem solving. If the IQ test contains questions that are just about retained knowledge (like "who was the president in 2011"), it won't count. Resolves NO if it doesn't resolve YES.
This market will predict whether or not artificial intelligence will out-perform humans in the area of problem solving.

Would be nice to reframe this around a specific new Raven's Progressive Matrices test to keep it simple.
@StrayClimb I'm particularly interested in the ARC test. Created a market specifically for that: https://manifold.markets/MGM/ai-solves-the-abstraction-and-reaso



Very misleading title. Please change it to reflect the actual resolution criteria. Compare to the probability of /L/will-human-brains-be-weaker-than-ai, which is a synonymous question yet has criteria that are more in line with the title.

@MartinRandall Then this should resolve YES right now, shouldn't it? Computers already surpass humans in several specific ways.

@IsaacKing @MartinRandall He's right: you can train an LLM (which is effectively synonymous with AI in most people's minds right now) to pass virtually any specific test if you have the answers ahead of time. It's kind of like asking, "Can a computer pass a test that it has the answers to ahead of time?" Yeah, of course, that's been true for decades.

@IsaacKing Well, yes, and I bought YES accordingly. The specific resolution criteria are "more than 100 IQ points on an official IQ test about problem solving" and that's been done several times already.

@MartinRandall You mean you don't just scroll through the front page and give wild ass guess bets on things YES/NO based purely on the title? I do...way more fun...maybe I need to take the common sense benchmark test again.
@PatrickDelaney Well, I do both. In life, sometimes you're the shark, and sometimes you're the whale.

BIG-bench Lite is an attempt to index a larger set of human intellect skills; it's the Dow Jones Industrial Average of benchmarking AI against human intellect.


As someone who has only bet M$10 on this, I would like human intellect to be redefined to make this more interesting; otherwise it's already as good as resolved YES for the most part, as others have pointed out. A more interesting discussion would dig into what human intellect is and what the best metric or set of metrics for it would be, compare those against the various AI leaderboards, and find something that's closer to 50/50 at this point.
@PatrickDelaney I think that is a different market, which I encourage you to create.
Scoring > 100 points on an IQ test is a terrible definition of "surpass human intellect." I guess that's why you always read the description before betting!

No doubt Raven’s and older SATs will be done quite soon (150 LSAT is already basically ~100 IQ)

Please be more specific about the tests you'd accept, because current AIs are probably capable of scoring higher than 100 on plenty of them right now, forget 2030.
For example, these should all meet your requirements of not being "just about retained knowledge":
RPM
CFIT 3
KBIT Non-Verbal
CAS (Das-Naglieri)
Naglieri Non-Verbal
WRIT Visual
RIAS
MAB II Performance
This list includes most of the widely used IQ tests, by the way. You should be very clear if you think that any particular IQ test isn't eligible to qualify ahead of time.
It would be annoying, for example, if AI were to keep clearing IQ tests but you then retroactively came up with reasons to disqualify them. You could start by saying all the ones on the list would be fine; then people can work with that.

Does this include AIs that are specialized for a specific type of IQ test taking?
We worked on an IQ-test-solving AI and it was surprisingly hard to get it to work well, but that was literally only a day or so of testing, so I just cannot imagine that an AI won't be able to do it.

Many IQ tests contain sections that measure retained knowledge -- e.g., measuring short-term memory by repeating back numbers. Are you ignoring any test that has any such component? How "official" does a test have to be? Have you identified any tests that could resolve this market YES?




















