Will we reach "weak AGI" by the end of 2025?


Dec 31

1.7% chance


At the time of market creation, the community prediction on Metaculus for the "Date Weakly General AI is Publicly Known" is Oct 18, 2025: https://www.metaculus.com/questions/3479/date-weakly-general-ai-is-publicly-known/

The market will resolve YES if, in retrospect by the end of 2025, this has indeed been achieved (and is agreed upon by industry experts); the market will resolve NO if not.




Super weak hidden

community prediction on Metaculus for the "Date Weakly General AI is Publicly Known" is Oct 18, 2025

Looking back at yesterday... hasn't happened 😅

But seriously: At end of 2025, we look at the Metaculus question and, if it's resolved by then, this market resolves Yes, otherwise No, correct?

@Primer that’s one option, the other is to resolve according to the criteria of the Metaculus question

@DavidHiggs Down in the comments the creator said this will resolve according to his estimate of whether experts would say we got "weak AGI", basically irrespective of the Metaculus question's resolution.

To make things more complicated: The creator seems to be gone, so I guess I'm gonna ask @mods whether they're gonna switch to Metaculus criteria, thus going against earlier clarifications by the creator.

Closest to the creator's intent might be to resolve Yes if Metaculus resolves Yes by year's end, otherwise resolve N/A as we don't know how the creator would've ruled then. Which would turn this into "bet Yes for a risk-free chance to win".

Were it up to me, I'd N/A right away.
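The "risk-free" point above can be sketched numerically. This is a minimal illustration, not Manifold's actual payout code: the function name `yes_bet_profit` is hypothetical, the bet is assumed to fill at a single fixed price (the real market maker moves the price as you buy), fees are ignored, and an N/A resolution is assumed to refund the stake in full.

```python
def yes_bet_profit(stake: float, price: float, outcome: str) -> float:
    """Profit in mana from a YES bet under simplified, assumed rules.

    stake   -- mana spent on YES shares
    price   -- assumed fixed fill price per YES share, strictly between 0 and 1
    outcome -- "YES", "NO", or "N/A"
    """
    if not 0 < price < 1:
        raise ValueError("price must be strictly between 0 and 1")
    if outcome == "YES":
        shares = stake / price   # each share pays out 1 mana on YES
        return shares - stake
    if outcome == "N/A":
        return 0.0               # assumption: stake is refunded in full
    if outcome == "NO":
        return -stake            # YES shares expire worthless
    raise ValueError(f"unknown outcome: {outcome!r}")


# If the only possible resolutions are YES or N/A, a YES bet cannot lose.
profits = [yes_bet_profit(100, 0.25, o) for o in ("YES", "N/A")]
print(profits, min(profits) >= 0)
```

Under those assumptions the worst case for a YES bettor is breaking even, which is why a "YES or N/A" rule would distort the market price.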

@Primer I'll go by the description when we get there.

The market will resolve YES if, in retrospect by the end of 2025, this has indeed been achieved (and is agreed upon by industry experts); the market will resolve NO if not.

is enough to work with

@Primer experts are welcome

bought Ṁ500 NO

Betting for NO because of "Montezuma's revenge" and "unified system" requirements.

I'd be interested in betting on something like this, but I can't accept the resolution criterion "is agreed upon by industry experts". "Industry experts" are biased. You should instead define what you mean by AGI in a way that is scientifically testable and resolve it by whether any system can pass the test.

A weak test (but good example) would be the Turing test. A given AI will or will not pass the Turing test, without regards to what industry experts say.

@BrandonNorman "Turing test" is not very well defined and some versions were beaten by ChatGPT already... on multiple occasions, depending on what you consider a "Turing test".

https://humsci.stanford.edu/feature/study-finds-chatgpts-latest-bot-behaves-humans-only-better

https://www.nature.com/articles/d41586-023-02361-7

@ProjectVictory I don't disagree, but a specific version of the Turing test could be well-defined for the purpose of this resolution, or we could use some other test. I just don't like the idea of "industry experts". They will be biased to make claims beyond the capability of the product they are selling.

It's like asking the car salesmen how good the car is.

@BrandonNorman I agree, but defining and testing what's "intelligent" is proving to be much harder than expected. So far LLMs are doing surprisingly well on tests while struggling in actual deployment. Another possible problem with the Turing test is that a system can be very capable but not very convincing at roleplaying as a human.

@block blast While it’s true that many LLMs may contain SAT Math questions in their training datasets, the focus of their training is on understanding patterns and generating coherent responses rather than strictly adhering to any particular educational standard. Excluding certain types of questions for the sake of compliance with Metaculus criteria may not be a priority for developers, especially if those questions contribute to the model's overall performance and versatility in math-related tasks.

these are very vague resolution criteria

I bet all current LLMs include SAT Math questions somewhere in the training data? And nobody will bother to exclude those questions from their training run just to satisfy some Metaculus criteria

@ahalekelly The author clarified below that they are going to rely on expert consensus, not the strict Metaculus criteria

If you're 100% sure that we should have "weak AGI" by 2025, then you should be able to give a clear answer for how many watts that will take and should have no problem betting on this market. If you can't answer this question, then you are basically just guessing and have no certainty that it will occur by 2025, or at all.

@PatrickDelaney has anyone claimed 100% confidence? The range seems to be 25% to 75%.

@MartinRandall Sorry, I'm unclear what you mean. "X% confidence" sounds like "X confidence level", a statistical term. Did you mean confidence level? Whereas "100% sure" is vernacular which could mean "I as an individual am filling an order at 100% on YES at this time", or "I am very sure of this particular belief, and hold no numerical measurement of this belief in any dimension", e.g. I just believe it.

@PatrickDelaney It'd be very strange to quantify 100% on a forecasting site and not mean a quantity.

Don't we already have it? Just give ChatGPT access to a real-life robot's API and have it self-reflect.

@xxx As someone who works in NLP and lives with a roboticist, this doesn't work for many reasons. Chief among them is that the network doesn't know what does and doesn't work in a robot, so it will have no understanding of why something failed, and therefore gains from reflection will be minimal. There's also a broader issue of using language models as controllers for continuous high-dimensional tasks, where even very slight imprecision leads to wildly incorrect answers. This is in contrast to something like standard language tasks, where there are many potential correct answers with a lot of fuzziness around each one.


What does "in retrospect" mean?

Is this guaranteed to resolve at a particular date?


@NoaNabeshima yes, i will resolve this market when it closes on Jan 1, 2026. looking back, if i feel like the resolution criteria are met, i will resolve the market to YES, otherwise i will resolve the market to NO.

@VictorLi Suppose that no one actually tries the Loebner silver Turing test and there is some disagreement between industry experts about whether it would be passed if tried, but you think it would be passed. Could this resolve yes?

Suppose some industry experts think it would be passed and you think it would be passed, but others aren't sure. Could this resolve yes?


@VictorLi Could this resolve yes if industry experts think it would be passed but it's not been attempted?

@NoaNabeshima if i believe it qualifies as "weak AGI" and a majority of industry experts concur, then i will resolve YES, otherwise it will resolve NO.

granted, "the majority of industry experts" is a vague measure, but i think i will abide by common sense on whether or not there is consensus. fwiw i basically expect it to be weak AGI, but im betting NO because i doubt the industry experts will recognise it as such.
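The rule the creator describes above can be written down as a small decision function. This is only a sketch of the stated rule, not an official procedure: the name `resolve_market`, the boolean inputs, and the strict-majority threshold over a list of expert opinions are all assumptions made for illustration, since the creator explicitly says the consensus check is a common-sense judgment rather than a vote count.

```python
def resolve_market(creator_believes_weak_agi: bool, expert_opinions: list[bool]) -> str:
    """Return "YES" or "NO" under the creator's stated rule.

    creator_believes_weak_agi -- the creator's own judgment
    expert_opinions           -- hypothetical poll: True if an expert agrees
                                 that "weak AGI" has been achieved
    """
    if not expert_opinions:
        # No expert input at all: no consensus can be claimed.
        return "NO"
    # Assumption: "a majority concur" is read as a strict majority.
    majority_concurs = sum(expert_opinions) > len(expert_opinions) / 2
    return "YES" if creator_believes_weak_agi and majority_concurs else "NO"


# The creator's own scenario: he expects weak AGI, but experts don't recognise it.
print(resolve_market(True, [True, False, False]))
```

Note that both conditions must hold, so under this reading expert consensus alone does not produce a YES, and neither does the creator's belief alone.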


@VictorLi So if it definitely doesn't solve Montezuma's revenge but expert consensus is that it's "weak agi" does this resolve No or Yes?