The creators of the ARC prize have already tested OpenAI's new o1-preview and o1-mini models on the benchmark. The non-preview version of o1 performed substantially better (see below) on OpenAI's math benchmarks and will seemingly be released before EOY. Assuming it is tested on the ARC prize, how well will the full version of o1 perform?
Note 1: I usually don't participate in my own markets, but in this case I am participating since the resolution criteria are especially clear.
Note 2: Ideally, the ARC prize team will test o1 under the same conditions. If they don't, I'll try to make a fair call on whether unofficial testing matches those conditions closely enough to count. If there's uncertainty, I'll err on the side of resolving N/A.
Update 2024-12-18 (PST): When evaluating ARC prize results, if there are multiple scores based on different thinking time settings, the creator will likely use the High thinking time score for resolution, but will consider community feedback before finalizing this decision. (AI summary of creator comment)
I didn't plan for the option of using more thinking time, so now I have a dilemma about which of these scores to use: https://x.com/arcprize/status/1869551373848908029. I'm leaning towards "High", but please speak up in the next day or two if you disagree.