Will Grok2 “Exceed Current AI Models on All Metrics” [Creator's best judgement]

Ṁ13k

resolved Nov 22

Resolved

N/A

https://x.com/elonmusk/status/1773655245769330757?s=46

Resolves to my best judgment (mostly based on evals), after release. I’ll generally stick to consensus. Please note that Elon claims “all metrics”.

This question is managed and resolved by Manifold.

14 Comments

51 Holders

146 Trades

Creator is inactive, market is subjective, resolving to N/A

bought Ṁ90 YES

@mods "creator's best judgment" but the creator has left Manifold. N/A?

sold Ṁ6 NO

This question is extremely vague. What does it resolve on: comparison with models at the time of the tweet, or at the time of grok-2’s release?

@mods I’d like to request this market gets unranked since it’s too vague and easily misleads at least 1/2 of the people trading in it.

I am not sure if this was added recently, but now the market description reads "after release". I do however agree, this market is extremely vague, and it's a lesson for me not to trade in markets that don't take the time to spell out these things in detail.

I don't think this meets the criteria for unranking. And though it is ambiguous, I think markets like this are generally fine, as it's as easy to set up criteria that don't fit your intent as it is to be biased when resolving

bought Ṁ50 YES

Based on this, it was an accurate claim at the time. Grok 2 is better than all models available at the time of the tweet. https://x.ai/blog/grok-2

bought Ṁ50 NO

Can you not read? Your link shows a table where it's behind 3.5 Sonnet on 6 reported benchmarks

You don't have to be rude, and yes, I can read. 3.5 Sonnet was not released at the time of that tweet: it was released in June, and the tweet is from March.

bought Ṁ400 YES

I thought this market was current, as in now, not when originally tweeted. In that case, it should resolve as yes

bought Ṁ50 NO

@0xSMW @Phill the resolution criteria say "after release"; the title says "will"; and even Elon Musk's tweet says "should exceed" (i.e., future tense). I think it's clear that both Grok and the comparison models are being referred to in the future. A claim that, "When our model comes out in several months in this extremely fast-paced race, it will be better than what is out there today," would be quite uninteresting.

Obviously I am biased here, but I think this market has always been trading on beating Claude 3 Opus, rather than any models released in the future.

Musk does 10 tweets per day, and made another claim (about imminent release) in the same tweet. Whether it's an interesting statement or not seems irrelevant.

I would imagine they didn't expect to wait so long to release these results.

Using "current" and nothing else as wording about which models it beats would be an extremely curious choice of words if he meant future models.

bought Ṁ10 NO

Same as this market by Zvi: https://manifold.markets/ZviMowshowitz/will-grok-2-exceed-current-march-23?r=c2FueWVybw

@sanyero alas, i thought i was quick
