Will the ARC-AGI grand prize be claimed by end of 2025?
67% chance

https://arcprize.org/competition
>=85% performance on Chollet's abstraction and reasoning corpus, private set.
(If Chollet et al. change the requirements for the grand prize in 2025, this question will not change. The bar will remain >=85% performance)

2024 version https://manifold.markets/JacobPfau/will-the-arcagi-grand-prize-be-clai

  • Update 2024-12-20 (PST) (AI summary of creator comment): This market uses the grand prize rules from ARC-AGI, not the public prize rules.

    • The 87.5% score mentioned in comments was for the semi-private dataset, which does not satisfy the grand prize criteria requiring performance on the private dataset

  • Update 2024-12-20 (PST) (AI summary of creator comment): Market will be resolved based on the original 2024 ARC-AGI test set ("ARC-AGI-1"), not the updated ARC-AGI-2 dataset.

bought Ṁ50 NO

I'm confused by the difference between this market and this one:

https://manifold.markets/yaqubali/will-the-arc-agi-grand-prize-be-cla-khaivmwh6j

@mathwizurd Idk but i bought it up

sold Ṁ150 NO

@Bayesian Hahah I arbed both markets (albeit way lower volume than you)

@mathwizurd For the market you linked, it is clear from both title and description that the grand prize needs to be claimed for it to resolve YES. To claim the prize, >=85% on the updated private test set needs to be achieved.
The same should obviously be true for this market here, but the creator has now decided that they don't care about the prize, but rather about the old/original test set.

This should be at 90%+

bought Ṁ1 YES at 65%

The competition next year will run on ARC-AGI-2, an updated version of the dataset that keeps the same format as v1, but features fewer tasks that can be easily brute-forced. Early indications are that ARC-AGI-v2 will represent a complete reset of the state-of-the-art, and it will remain extremely difficult for o3. Meanwhile, a smart human or a small panel of average humans would still be able to score >95%.

https://x.com/fchollet/status/1870171031945785821

@JacobPfau Will you resolve this market based on results on the original test set, "ARC-AGI-1"? (That seems to follow from your description.) I guess they'll still report that too.

@na_pewno Yes we're sticking with the 2024 rules and data.

bought Ṁ900 NO

@JacobPfau This market is explicitly named 'Will the ARC-AGI grand prize be claimed by end of 2025?'. To win the prize one needs to reach 85% on the updated private set.

@CalibratedNeutral

(If Chollet et al. change the requirements for the grand prize in 2025, this question will not change. The bar will remain >=85% performance)

This indicates it only needs 85%+ on the previous dataset I'm pretty sure

@Bayesian The relevant achievement is getting >=85% on the private benchmark. If Chollet et al. had changed the requirements to require only 60% on the private benchmark, because the original target turned out to be too hard and they wanted to award the prize anyway, then this market should still have resolved YES only once 85% is hit, due to the sentence you quoted.

The private benchmark is supposed to be private. Information leaks from it every time the creators report the score a model achieved on it. It makes sense to continually update it as long as the newly added tasks are true to the spirit of ARC. The private benchmark stays the private benchmark.

@CalibratedNeutral Oh, my understanding was that the new questions added are meant to be different from the previous ones, to be questions that even o3 doesn't succeed at, but humans still succeed at. If that's wrong and they're along the same distribution but trying to deal with data leaks it's less of an important difference ig. but yeah we agree, if he dropped requirement to 60% that wouldn't matter for this market

opened a Ṁ20,000 YES at 53% order

To clarify: for this market to resolve YES, the 85% needs to go through the limited-compute Kaggle setup as well as being open source, right? @JacobPfau https://x.com/fchollet/status/1870170897283526723

bought Ṁ462 YES

87.5% therefore this question resolves YES

doesn't the model have to be opensource and tiny to win that prize or smth?

@MalachiteEagle Chollet wrote "It scores 75.7% on the semi-private eval in low-compute mode (for $20 per task in compute ) and 87.5% in high-compute mode (thousands of $ per task)" - what is a semi-private dataset and why does it satisfy resolution criteria (">=85% performance on Chollet's abstraction and reasoning corpus, private set")?

sold Ṁ321 YES

@Bayesian ahh right open source / open weights criteria. Might be a good idea to mention that in the description

@MalachiteEagle Good point, I've screenshotted and included.

@Metastable I'm guessing the semi-private dataset is called that because they're supposedly only sharing it with OpenAI through the API for o3 or something

bought Ṁ500 NO

@MalachiteEagle the competition is on the private set, not the semi private set. Semi-private set is used for models that work through API to prevent leakage of the private questions. In addition, the competition has compute requirements which are likely way exceeded by o3.

@na_pewno That’s the public prize not the grand prize btw. We’re using grand prize rules

@MalachiteEagle Wow, I hope they made it clear at least in fine print that they might switch to a harder evaluation set; otherwise this feels really unfair to the people who have put a lot of work into solutions.
