Human-machine intelligence parity achieved before 2028

62% chance

(Metaculites created this question with resolution date 2040.)

"A team of three expert interviewers will interact with a candidate machine system (MS) and three humans (3H). The humans will be graduate students in each of physics, mathematics and computer science from one of the top 25 research universities (per some recognized list), chosen independently of the interviewers. The interviewers will electronically communicate (via text, image, spoken word, or other means) an identical series of exam questions of their choosing over a period of two hours to the MS and 3H, designed to advantage the 3H. Both MS and 3H have full access to the internet, but no party is allowed to consult additional humans, and we assume the MS is not an internet-accessible resource. The exam will be scored blindly by a disinterested third party." The experts may come up with new questions to ask while administering the test.

If such a test is passed before 2028, then this resolves Yes. If such a test is conducted on a state-of-the-art AI in 2027, and the AI fails, then this resolves No. If neither criterion is met, then this will resolve to my credence that such a test could be passed by an existing AI system. I will not participate in this market.

EDIT: No other AI systems should be consulted. Systems which use an AI instrumentally (e.g. as in Google search results) are ok, but the test administrator should do their best to redact direct AI content, e.g. the AI-generated QA panels at the top of certain Google queries.
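The resolution rule above can be read as a three-way decision. Here is a minimal sketch of that logic in Python, assuming the YES condition is checked before the NO condition (the description does not say what happens if both hold). The function name `resolve_market` and its parameters are hypothetical, chosen only for illustration.

```python
from typing import Optional


def resolve_market(
    passed_before_2028: bool,
    sota_failed_in_2027: bool,
    creator_credence: Optional[float] = None,
) -> float:
    """Return the resolution value in [0, 1]: 1.0 is YES, 0.0 is NO.

    Assumption: a pass before 2028 takes priority over a 2027 failure.
    """
    if passed_before_2028:
        return 1.0  # "If such a test is passed before 2028, then this resolves Yes."
    if sota_failed_in_2027:
        return 0.0  # A state-of-the-art AI was tested in 2027 and failed.
    # Neither criterion met: resolve to the creator's credence that an
    # existing AI system could pass such a test.
    if creator_credence is None:
        raise ValueError("creator_credence is required when no test outcome applies")
    if not 0.0 <= creator_credence <= 1.0:
        raise ValueError("creator_credence must be a probability in [0, 1]")
    return creator_credence
```

For example, if no qualifying test is run and the creator's credence is 0.7, `resolve_market(False, False, 0.7)` returns `0.7`.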

Given the 2-hour constraint, I think o3 can do it already, especially if there are many questions, e.g. 30. "Designed to advantage the 3H" is a bit vague, though: it must be possible to design some very adversarial questions (along the lines of "which number is larger, 9.9 or 9.11?"), but that would be silly.

As stated, it's left open whether AIs are allowed to be consulted by both sides. If they were, then this ends up being a question about the gap between the best and second-best available AI systems at the time of testing.

I propose adding a clause that no other AI systems should be consulted. Systems which use an AI instrumentally (e.g. as in Google search results) are ok, but the test administrator should do their best to redact direct AI content, e.g. the AI-generated QA panels at the top of certain Google queries.

If no one objects within a week, I will add this to the question text. I'm very open to debate here, since I think this is a significant ambiguity in the resolution criterion as stated.

@JacobPfau I made this change.

I've added the text "The experts may come up with new questions to ask while administering the test" to clarify.

This tests only a very limited area of intelligence which favours AI. Add an additional test that the humans and AI have to navigate autonomously from the other side of the city to the examination site for a fair fight.

@Toby96 This is simply a clone of a pre-existing Metaculus-bot question/market. The market was resolved N/A on Manifold because it was deleted or otherwise unfindable on Metaculus.

You're right that it's a fairly narrow area of intelligence, but if it were expanded, I don't think your suggestion is a good fit, since it is more a test of robotics and sensory technology than of AI. Other expansions (ideally in a different market) could include doing economic tasks, expanding the relevant knowledge domain to a larger slice of STEM or to fields outside STEM, or other things that don't require robotics or other non-AI technical advances.

This is beyond parity. I'm not a graduate student from a top 25 research university, and most people aren't. Plus the questions will be designed to advantage the humans.