AI keeps getting capabilities that I previously would've thought would require AGI. Solving FrontierMath problems, explaining jokes, generating art and poetry. And this isn't new. We once presumed that playing grandmaster-level chess would require AGI. I'd like to have a market for predicting when we'll reach truly human-level intelligence -- mentally, not physically -- that won't accidentally resolve YES on a technicality.
But I also want to allow for the perfectly real possibility that people like Eric Drexler are right and that we get AGI without the world necessarily turning upside-down.
I think the right balance, what best captures the spirit of the question, is to focus on Leopold Aschenbrenner's concept of drop-in remote workers.
However, it's possible that that definition is also flawed. So, DO NOT TRADE IN THIS MARKET YET. Ask more clarifying questions until this FAQ is fleshed out enough that you're satisfied that we're predicting the right thing.
FAQ
1. Will this resolve the same as other AGI markets on Manifold?
It probably will, but the point of this market is to avoid getting backed into a corner by a definition of AGI that violates the spirit of the question.
2. Does passing a long, informed, adversarial Turing Test count?
This is a necessary but not sufficient condition. Matthew Barnett on Metaculus has a lengthy definition of this kind of Turing test. I believe it's even stricter than the Longbets version. Roughly, the idea is to have human foils who are PhD-level experts in at least one STEM field, judges who are PhD-level experts in AI, each interview lasting at least 2 hours, and everyone trying their hardest to act human and to distinguish humans from AIs.
One way I'd reject the outcome of such a Turing test is if the judges were able to identify the AI because it was too smart or too fast, or by any other tell unrelated to actual intelligence/capability.
One could also imagine an AI that passes such a test but can't stay coherent for more than a couple hours and is thus not AGI according to the spirit of the question. Stay tuned for additional FAQ items about that.
3. Does the AGI have to be publicly available?
No, but we won't just believe Sam Altman or whoever; in fact, we'll err on the side of disbelieving. (This needs to be pinned down further.)
4. What if running the AGI is obscenely slow and expensive?
The focus is on human-level AI, so if it's more expensive than hiring a human at market rates and slower than a human, it doesn't count.
5. What if it's cheaper but slower than a human, or more expensive but faster?
I'm not sure of the best answer to this yet.
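One way this could eventually be operationalized (purely an illustrative sketch, not the resolution rule; the cost-per-task metric and the dollar figures are assumptions of mine) is to compare cost per completed task rather than cost per hour:

```python
# Purely illustrative -- NOT the resolution rule (this FAQ item is still open).
# One candidate metric is cost per completed task, which lets a
# cheaper-but-slower or pricier-but-faster AI still be compared against a human.

def cost_per_task(hourly_cost: float, hours_per_task: float) -> float:
    """Effective cost of getting one task done."""
    return hourly_cost * hours_per_task

human = cost_per_task(hourly_cost=50.0, hours_per_task=2.0)  # $100 per task
ai = cost_per_task(hourly_cost=5.0, hours_per_task=6.0)      # $30 per task, but 3x slower

print(ai <= human)  # True under this toy metric; whether the slowness alone
                    # should disqualify it is exactly the open question here.
```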
6. If we get AGI in, say, 2027, does 2028 also resolve YES?
No, we're predicting the exact year. This is a probability density function (pdf), not a cumulative distribution function (cdf).
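To make that concrete (an illustrative sketch; 2027 is a made-up arrival year, not a prediction):

```python
# Illustrative only: suppose, hypothetically, the criteria are first met in 2027.
agi_year = 2027

def resolves_yes(market_year: int) -> bool:
    # Each year's answer asks "does it happen IN this exact year?" (pdf-style),
    # not "by this year?" (cdf-style, which would instead be market_year >= agi_year).
    return market_year == agi_year

print(resolves_yes(2027))  # True
print(resolves_yes(2028))  # False: later years do NOT also resolve YES
```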
Ask clarifying questions before betting! I'll add them to the FAQ.
Note that I'm betting in this market myself and am committing to make the resolution fair. Meaning I'll be transparent about my reasoning and if there are good faith objections, I'll hear them out, we'll discuss, and I'll outsource the final decision if needed. But, again, note the evolving FAQ. The expectation is that bettors ask clarifying questions before betting, to minimize the chances of it coming down to a judgment call.
Related Markets
https://manifold.markets/ManifoldAI/agi-when-resolves-to-the-year-in-wh-d5c5ad8e4708
https://manifold.markets/dreev/will-ai-be-lifechanging-for-muggles
https://manifold.markets/dreev/will-ai-have-a-sudden-trillion-doll
https://manifold.markets/dreev/instant-deepfakes-of-anyone-within
https://manifold.markets/dreev/will-ai-pass-the-turing-test-by-202
https://manifold.markets/ScottAlexander/by-2028-will-there-be-a-visible-bre
https://manifold.markets/MetaculusBot/before-2030-will-an-ai-complete-the
Scratch area for auto-generated AI updates
Don't believe what magically appears down here; I'll add clarifications to the FAQ.
Update 2025-03-13 (PST) (AI summary of creator comment):
- Unpackaged AI Systems: An AI that in principle has AGI-level capabilities but isn't presented in a form that meets our finalized criteria will not be counted.
- Explicit Criteria Requirement: The market will only resolve YES once the AI clearly meets the explicit criteria we agree upon.
Has such a test been performed and "trained" for? I'm pretty sure that if someone fine-tuned existing SOTA models for this task it could be done today, under the right conditions, though potentially not meeting the cost or timing requirements.
I'd be curious to know if there have been any attempts at this. If no attempts made in a given year would qualify for the purposes of this market, that doesn't necessarily mean we couldn't have met the criteria. I think it's still a valid criterion to have, but perhaps it's less informative as an AGI indicator as a result.
@MarcusM Sounds like the question is, what if we get AI that could in principle count as AGI but isn't packaged into something that meets the criteria we come up with? In that case I'm inclined to say it doesn't count yet. We wait till it meets our explicit criteria (which we need to continue to pin down!).
@VitorBosshard Exactly. This is a probability density function (pdf), not a cumulative distribution function (cdf). Adding that to the FAQ now; thank you!
Note this bit from the market description (I edited it just now as well, to be clearer about outsourcing the final decision if there's no way to avoid it coming down to a judgment call):
Note that I'm betting in this market myself and am committing to make the resolution fair. Meaning I'll be transparent about my reasoning and if there are good faith objections, I'll hear them out, we'll discuss, and I'll outsource the final decision if needed. But, again, note the evolving FAQ. The expectation is that bettors ask clarifying questions before betting, to minimize the chances of it coming down to a judgment call.
You can also take a look at my other markets (or perhaps my AGI Friday newsletter) to decide how much you trust me on this. But I'll definitely be trading.
I thought Claude 3.7 identified a couple useful properties:
AI as "Drop-in Remote Workers": Resolution Criteria
Here's a concise set of capability-based resolution criteria for determining when an AI system qualifies as a "drop-in remote worker":
1. Independent Task Completion: The AI can take a job description with minimal additional context and complete tasks autonomously without human intervention beyond initial instruction and final review.
2. Communication Competence: The AI can participate in meetings, ask clarifying questions, provide progress updates, and collaborate with team members using standard communication channels.
3. Context Adaptation: The AI can understand and adapt to company-specific terminology, processes, and culture based on limited documentation (employee handbook, style guides, etc.).
4. Self-Improvement: The AI independently identifies knowledge gaps and takes initiative to acquire necessary information to complete tasks better over time.
5. Practical Tool Use: The AI can effectively use standard workplace tools (email, chat, productivity software, internal systems) with only standard authorization processes.
6. Reliability Threshold: The AI completes assigned tasks with at least 90% success rate without requiring more supervision than an average human worker in the same role.
Resolution occurs when at least one AI system can consistently satisfy all these criteria across at least three common knowledge-worker domains (e.g., content creation, customer support, data analysis).
I think "minimal additional context" beyond a job description is a bit strong, but maybe that should be interpreted as meaning the AI can ask for the context it needs, as per #4.
@Siebe Nice. Can we avoid getting that far into the weeds with a criterion along the lines of "do bosses hiring for remote positions generally prefer AI to humans?" or "does it generally only make business sense to hire humans when you need their physical skills, or for reasons other than actual job performance, such as legal or PR reasons?"?
PS, either way, we may want to pick a threshold for the fraction of the remote workforce this should apply to. Maybe it first becomes true for the cheapest, least-skilled human workers. And maybe we want to say we've hit the AGI threshold when it's true for a median first-world, college-educated native speaker?