Will the ARC-AGI grand prize be claimed by an LLM?
2031
65% chance

See this page for information about the competition: https://lab42.global/arcathon/. See also this podcast for an interview with Francois Chollet about the challenge and his predictions: https://www.dwarkeshpatel.com/p/francois-chollet

The fundamental characteristics of an "LLM" for the purposes of this question:

  • Sequence-to-sequence type model. (State-space and transformer models would both count, for example.)

  • No substantial post-hoc computation (like tree search). Sampling as it is practiced now is allowed. Prompting as it is practiced now is allowed.

  • I will use my best judgement if it’s ambiguous. The main point is that the model should be in the class of models that LLM-naysayers (Chollet especially) refer to when they assert that LLMs cannot solve ARC narrowly and are off-pathway for AGI generally.


@Tossup Will this resolve YES if the LLM is not wrapped in an external tree search algorithm (e.g. Tree of Thoughts), but something like tree search was still used in its training/fine-tuning regime, as some people have speculated about Q*? That is, the result is still an LLM that runs inference in the regular way current LLMs do, but the training/fine-tuning might be somewhat or much more advanced.

Said another way: if training is advanced and inference is simple for a system that wins the prize, will this still resolve YES?

Yes, I'm okay with novel "advanced" training techniques. Only the inference needs to be "standard" for LLMs. I think it would be too hard to determine if a training technique is too "advanced" given how little is public about frontier LLM training.


How does this resolve if no one gets the grand prize?

This resolves when the grand prize is awarded or the competition is shut down. To be clear, the market does not necessarily resolve NO if the grand prize is unclaimed in the 2024 round of the competition.

By "single forward pass," do you just mean it can't do any chain-of-thought before beginning its answer? I would expect that there would be one forward pass per pixel produced in the LLM's response.

Good point. I wanted to express something like “it’s a feed-forward network, or can be unrolled into one (as with some SSMs)”, but I can’t think of a precise statement. I will remove this criterion.


So it can use chain-of-thought?

Yes