Will we reach "weak AGI" by the end of 2025?
39% chance

At the time of market creation, the community prediction on Metaculus for the question "Date Weakly General AI is Publicly Known" was Oct 18, 2025: https://www.metaculus.com/questions/3479/date-weakly-general-ai-is-publicly-known/

The market will resolve YES if, in retrospect by the end of 2025, this has indeed been achieved (and is agreed upon by industry experts); the market will resolve NO if not.

I bet all current LLMs include SAT Math questions somewhere in their training data, and nobody will bother to exclude those questions from a training run just to satisfy some Metaculus criteria.

@ahalekelly The author clarified below that they are going to rely on expert consensus, not Metaculus's strict criteria.

If you're 100% sure that we should have "weak AGI" by 2025, then you should be able to give a clear answer for how many watts that will take, and you should have no problem betting on this market. If you can't answer this question, then you are basically just guessing and have no certainty that it will occur by 2025, or at all.

@PatrickDelaney Has anyone claimed 100% confidence? The range seems to be 25% to 75%.

@MartinRandall Sorry, I'm unclear what you mean. "X% confidence" sounds like "X confidence level", a statistical term. Did you mean confidence level? "100% sure", by contrast, is vernacular that could mean "I as an individual am filling an order at 100% on YES at this time" or "I am very sure of this particular belief, and hold no numerical measurement of this belief in any dimension", i.e. I just believe it.

@PatrickDelaney It'd be very strange to quantify 100% on a forecasting site and not mean a quantity.

Don't we already have it? Just give ChatGPT access to a real-life robot's API and have it self-reflect.

@xxx As someone who works in NLP and lives with a roboticist: this doesn't work, for many reasons. Chief among them is that the network doesn't know what does and doesn't work in a robot, so it will have no understanding of why something failed, and therefore gains from reflection will be minimal. There's also a broader issue with using language models as controllers for continuous, high-dimensional tasks, where even very slight imprecision leads to wildly incorrect answers. This is in contrast to standard language tasks, where there are many potential correct answers with a lot of fuzziness around each one.

predicts NO

What does "in retrospect" mean?

Is this guaranteed to resolve at a particular date?

predicts NO

@NoaNabeshima yes, i will resolve this market when it closes on Jan 1, 2026. looking back, if i feel like the resolution criteria are met, i will resolve the market to YES, otherwise i will resolve the market to NO.

@VictorLi Suppose that no one actually tries the Loebner Silver Turing test, and there is some disagreement between industry experts about whether it would be passed if tried, but you think it would be passed. Could this resolve YES?

Suppose some industry experts think it would be passed and you think it would be passed, but others aren't sure. Could this resolve YES?

predicts NO

@VictorLi Could this resolve YES if industry experts think it would be passed but it hasn't been attempted?

@NoaNabeshima if i believe it qualifies as "weak AGI" and a majority of industry experts concur, then i will resolve YES, otherwise it will resolve NO.

granted, "the majority of industry experts" is a vague measure, but i think i will go by common sense on whether or not there is consensus. fwiw i basically expect it to be weak AGI, but i'm betting NO because i doubt the industry experts will recognise it as such.

predicts NO

@VictorLi So if it definitely doesn't solve Montezuma's Revenge but expert consensus is that it's "weak AGI", does this resolve NO or YES?

predicts NO

All tasks have been met except the Turing test and unification (the former is beyond trivial, the latter is somewhat subjective).

i basically expect GPT-5 to be "weak AGI", and there is an argument to be made that GPT-4 already qualifies. my main uncertainty is whether or not the experts will come to a consensus, since i expect a lot of them to keep moving the goalposts.

Stitch DreamerV3 onto gpt-4 and it’s done

This requires passing a pretty intense Turing test, and the more I play around with GPT-4, the more I think people will be able to poke holes in these things very easily for a long time to come. I give a 20-30% chance that AI systems in existence at the end of 2025 will be able to pass a Turing test that adversarial, and that's being very generous and allowing a very wide tail for unprecedentedly fast exponential improvement.