Will Grok 3 be 'the most powerful AI in the world'?
19% chance

Elon Musk is talking big: https://x.com/tsarnick/status/1815493761486708993. Says that Grok 3 will come out in December and 'should be' the most powerful AI in the world.

Resolves to YES if Grok 3 is, at the time of its release, plausibly the most powerful AI in the world according to my best judgment. Has to be at least as strong as all models publicly available at the time.

Resolves to NO if it is not the most powerful.

(Resolves NO if no such model is released by 7/23/25, to ensure this doesn't go on forever.)

As of 7/23/2024 Claude Sonnet 3.5 is IMO the most powerful AI, but GPT-4o would also resolve to YES based on its position at #1 on Arena and other ways in which some people prefer it. Gemini 1.5 Pro or Advanced would not qualify, but would have counted prior to Sonnet 3.5 and GPT-4o.

(I will not take clarifying questions on my criteria here, it will be my subjective take on 'is this plausibly the best LLM I can access right now.')

bought Ṁ50 NO

I tend to distrust Elon Musk's predictions by default. His ever-changing timeline for self-driving cars betrays both a lack of rigor in his predictions and bad epistemology in failing to update.

Take a look at this LLM benchmark:

https://livebench.ai/

A way better/fairer ranking than lmsys imo

bought Ṁ50 YES

Seeing how good Grok 2 is makes me think it will at least be on par with 3.5 Opus and whichever models OpenAI and GDM release before the end of the year.

Llama 3.1 was trained on 15k GPUs, so Grok 3 should have unprecedented scale

I recommend replacing your best judgement with the Elo score on Chatbot Arena.

https://youtu.be/Kbk9BiPhm7o?t=2086
even Elon seems skeptical about grok 3 being the best in the world by EOY

bought Ṁ100 YES

I don't think it will beat GPT-5, but Grok 1.5 was reasonably close to GPT-4 level. So if OpenAI doesn't release GPT-5 before Grok 3, I could see it being about as powerful as whatever else gets released by then (Claude 3.5 Opus? Gemini 1.5 Ultra?)

Leading LLM Arena or leading in 8/10 commonly used benchmarks would be a less judgment-based criterion
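For what it's worth, a rule like "leading in 8/10 benchmarks" is easy to make mechanical. Below is a rough sketch only: the function name, the benchmark and model names, and the scores are all made up for illustration, and it assumes higher scores are better and that a tie for first counts as leading (in the spirit of "at least as strong as").

```python
def leads_enough_benchmarks(scores, candidate, required=8):
    """Return True if `candidate` has the top (or tied-top) score on at
    least `required` benchmarks.

    `scores` maps benchmark name -> {model name: score}; higher is better.
    A model missing from a benchmark is treated as not leading it.
    """
    wins = 0
    for results in scores.values():
        best = max(results.values())
        if results.get(candidate, float("-inf")) >= best:
            wins += 1
    return wins >= required


# Hypothetical data: ten placeholder benchmarks, two placeholder models.
example_scores = {
    f"bench_{i}": {"model_a": 80 if i < 8 else 60, "model_b": 70}
    for i in range(10)
}

print(leads_enough_benchmarks(example_scores, "model_a"))  # leads 8 of 10
print(leads_enough_benchmarks(example_scores, "model_b"))  # leads 2 of 10
```

Which ten benchmarks go in the list is still a judgment call, so this narrows the subjectivity rather than removing it.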

"according to my best judgment"
This is a conflict of interest

"IMO"
This is anecdotal

This is a prediction of what works best for you, based on how you feel.

Note that Zvi has a full-time (I think?) job covering AI, is trusted by many, and is subject to a lot of scrutiny.