https://x.com/elonmusk/status/1773655245769330757?s=46
Resolves to my best judgment (mostly based on evals), after release. I’ll generally stick to consensus. Please note that Elon claims “all metrics”.
@mods I’d like to request this market gets unranked since it’s too vague and easily misleads at least 1/2 of the people trading in it.
Based on this, it was an accurate claim at the time. Grok 2 is better than all models available at the time of the tweet. https://x.ai/blog/grok-2
@0xSMW @Phill the resolution criteria say "after release"; the title says "will"; and even Elon Musk's tweet says "should exceed" (i.e., future tense). I think it's clear that both Grok and the comparison models are being referred to in the future. A claim that, "When our model comes out in several months in this extremely fast-paced race, it will be better than what is out there today," would be quite uninteresting.
Obviously I am biased here, but I think this market has always been trading on beating Claude 3 Opus, rather than any models released in the future.
Musk does 10 tweets per day, and made another claim (about imminent release) in the same tweet. Whether it's an interesting statement or not seems irrelevant.
I would imagine they didn't expect to wait so long to release these results.
Using "current" and nothing else as wording about which models it beats would be an extremely curious choice of words if he meant future models.
Same as this market by Zvi: https://manifold.markets/ZviMowshowitz/will-grok-2-exceed-current-march-23?r=c2FueWVybw