Grok 3.5 ‘leaked’ benchmark scores end up real?
Resolved NO on Jul 18

This was shared across Twitter.

Will it be confirmed real or completely made up? If they announce benchmark results that are each better than the corresponding leaked result minus 1%, this market resolves YES.

It also resolves YES if the benchmark results were obtained with pass@1024 or something like that.
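The rule above can be sketched as a small check. This is only an illustration of how the criteria read, not an official resolver: it assumes "1%" means one percentage point, it treats a benchmark with no announced score as a miss, and the function name, benchmark names, and scores are all hypothetical.

```python
def resolves_yes(leaked, announced, margin=1.0):
    """Return True if every leaked benchmark has an announced score
    strictly better than (leaked score - margin).

    Assumption: margin is in percentage points. How the score was
    obtained (pass@1, pass@1024, ...) is deliberately ignored, since
    the market counts those results too.
    """
    for name, leaked_score in leaked.items():
        # Assumption: a benchmark that was never announced counts as a miss.
        if name not in announced:
            return False
        if announced[name] <= leaked_score - margin:
            return False
    return True


# Hypothetical scores, for illustration only.
leaked = {"bench_a": 95.0, "bench_b": 80.0}
announced_close = {"bench_a": 94.5, "bench_b": 81.0}
announced_miss = {"bench_a": 90.0, "bench_b": 81.0}

print(resolves_yes(leaked, announced_close))  # True
print(resolves_yes(leaked, announced_miss))   # False
```

Under this reading, a single benchmark that falls more than one point short is enough for NO, even if every other score matches or beats the leak.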

  • Update 2025-07-01 (PST) (AI summary of creator comment): This market now refers to Grok 4, which the creator considers a rename of the model previously referred to as Grok 3.5.


90.83% in AIME 2024, resolves NO I believe

but yes, surprisingly close

wait, it's actually kinda close, no?

Let me iiiin

It seems fair for the purpose of this market that this market now refers to Grok 4, since it's the same model just renamed? LMK if you disagree ig. After I hear a bit of feedback I may reopen

tbf it's like 99% sure that the benchmark results are made up in the tweet but it's plausible that it gets this good idk

Do the results count as real if they are the result of juicing it with extremely high levels of inference-time compute expenditure (like with o3 preview at >$1000 per query)?

@Damin Yeah!

The original resolution criteria were:

Will it be confirmed real or completely made up? If they announce benchmark results within 1% of each of these except one which can be within 2%, even if they announce the results were for pass@64 or any other not ‘apples to apples’ comparison like that, the market resolves yes.

But when I updated it, I accidentally removed the part that would have answered your question.
