Will pre-2026 AI out-forecast the Metaculus community?
Jan 2
54% chance

Will an AI system out-perform the Metaculus community prediction before 2026? Any amount of scaffolding is allowed.

If this does not happen, and no negative result comes out in the last quarter of 2025, then this question resolves to my subjective credence that this could be done with an existing AI system and scaffolding. Specifically, my credence on the proposition 'Using 4 months of individual-engineering time, a pre-2026 AI could be fine-tuned and scaffolded to out-perform, on mean Brier score, over all binary questions on Metaculus'.
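For readers unfamiliar with the metric, here is a minimal sketch of a mean Brier score comparison on binary questions. The forecasts and outcomes are made-up illustrative numbers, not real Metaculus data, and the function names are hypothetical. Lower is better.

```python
from statistics import mean


def brier(prob: float, outcome: int) -> float:
    """Squared error between a forecast probability and the 0/1 outcome."""
    return (prob - outcome) ** 2


def mean_brier(probs, outcomes) -> float:
    """Average Brier score over a set of resolved binary questions."""
    return mean(brier(p, o) for p, o in zip(probs, outcomes))


# Hypothetical resolved questions (1 = resolved YES, 0 = resolved NO).
outcomes = [1, 0, 1, 1, 0]
ai_forecasts = [0.9, 0.2, 0.7, 0.6, 0.1]          # illustrative only
community_forecasts = [0.8, 0.3, 0.6, 0.7, 0.2]   # illustrative only

ai_score = mean_brier(ai_forecasts, outcomes)
community_score = mean_brier(community_forecasts, outcomes)

# The AI "out-performs" on this metric if its mean Brier score is lower.
ai_outperforms = ai_score < community_score
```

Both forecasters are scored on the same set of questions here; comparing averages taken over different question sets would not be a like-for-like comparison.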

I will not participate in this market.


@Dulaman why would anyone care about this? It's pretty much a coinflip. It's not like they have information the market doesn't have.

@Bayesian I disagree, there are definitely differences in performance between models on these sorts of problems. That difference is not a coinflip.

i mean you can lose from transaction costs i guess and you can do weird things with your risk profile but do you think any of them is beating the market including transaction costs?

@Bayesian probably not. But some models are better than others and that's what's interesting here. This has some value as a benchmark.

bought Ṁ500 YES

@Dulaman I think whether the models are better or worse is probably reducible to chance or whether one of the models is just doing something actively stupid

@bens if this benchmark is done at a large enough scale (number of independent parallel samples) then that increases the statistical power of this approach.

to be clear, I agree that doing this over a single sample run where each one is given 10k dollars means that the statistical power is very low. But the approach remains sound.
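A rough sketch of the statistical-power point, under assumptions that are not from the thread: each run's return is independent and normally distributed with a known standard deviation `sd`, model A has a true edge `edge` over model B, and the comparison is a one-sided two-sample z-test. The function name and all parameter values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(n_runs: int, edge: float, sd: float, alpha: float = 0.05) -> float:
    """Power of a one-sided z-test that model A's mean return exceeds model B's.

    Assumes n_runs independent runs per model and a known per-run
    standard deviation sd (both simplifying assumptions).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Standard error of the difference between the two sample means.
    standard_error = sd * sqrt(2 / n_runs)
    return 1 - std_normal.cdf(z_crit - edge / standard_error)


# Illustrative numbers: a 2-point edge against 20 points of per-run noise.
single_run_power = power_two_sample(n_runs=1, edge=0.02, sd=0.2)
many_runs_power = power_two_sample(n_runs=1000, edge=0.02, sd=0.2)
```

With a single run per model the test barely beats its false-positive rate, while many independent runs make the same edge much easier to detect. The independence assumption is doing a lot of work here.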

@Dulaman
>if this benchmark is done at a large enough scale (number of independent parallel samples) then that increases the statistical power of this approach.

Not really, actually. Let's say every xAI bot's trading strategy is fundamentally "Buy DOGE", and DOGE just goes up consistently over a couple years, but then drops 99%. Well, for two years, it's gonna look like xAI is better, even if there are a large number of xAI bots and even if the strategy is dumb.
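One way to make this objection concrete is the standard effective-sample-size formula for equally correlated observations. This is a sketch under an assumption not stated in the thread: every pair of bots from the same model has the same return correlation `rho`. The function name is hypothetical.

```python
def effective_sample_size(n_bots: int, rho: float) -> float:
    """Number of independent runs that n_bots equicorrelated runs are worth.

    Follows from Var(mean) = sd^2 * (1 + (n - 1) * rho) / n for n
    observations with common pairwise correlation rho.
    """
    return n_bots / (1 + (n_bots - 1) * rho)


# Fully independent bots: every run counts.
independent = effective_sample_size(100, 0.0)

# Bots that all follow nearly the same strategy: 1000 runs are worth
# only slightly more than one independent run.
nearly_identical = effective_sample_size(1000, 0.9)
```

If all bots share one strategy, adding more of them adds almost no independent evidence, so only a longer time horizon or more varied conditions can separate skill from luck.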

@bens then in that specific setting the benchmark would need to be run for longer. Would you still make that claim if the benchmark is run for 15 years and there is a clear difference between winners and losers?

bought Ṁ100 YES

Getting pretty close! We'll see what happens on the next tournament. (Note that when you adjust for coverage -- and the bot is not selectively forecasting -- it's even closer between Mantic and the community forecast)

I propose adding "Specifically, my credence on the proposition 'Using 4 months of individual-engineering time, a pre-2026 AI could be fine-tuned and scaffolded to out-perform, on mean Brier score, over all binary questions on Metaculus'".

If no one objects within a week, I'll add this.

bought Ṁ100 YES

"If this does not happen, and no negative result comes out in the last quarter of 2025, then this question resolves to my subjective credence that this could be done with an existing AI system and scaffolding."

Does this include finetuning?

@NoaNabeshima Yes, my subjective credence includes limited fine-tuning; things like the Berkeley group's level of fine-tuning are fine.

bought Ṁ150 NO

I think it can be done in principle; it's just not clear it will be done in practice.

Unless anyone objects, I'll clarify the constraint that this out-performance should hold on average for at least 50% of the questions on Metaculus in a prospective study. Obviously if this ends up depending on my credence, I'll be taking into account other results e.g. the below.

@JacobPfau wdym by "hold on average for at least 50% of the questions"? if they outperform it'll be an average of theirs vs an average of metaculus, I would think?

@Bayesian You're right that the question phrasing implied all questions, though I didn't specify binary vs time series etc. I'll come back to this tomorrow.

Regretting the "I will not participate in this market" anyway https://arxiv.org/pdf/2402.18563.pdf

@JacobPfau FWIW I'm at 90% on this.

Excellent operationalization!

© Manifold Markets, Inc.