xAI Grok will beat OpenAI's flagship model on HumanEval benchmarks by the end of 2024.
42% chance

This is inclusive of any new models OpenAI unveils in 2024, but the question resolves to "yes" if Grok beats OpenAI at any time in 2024 against their current state-of-the-art model.

Dan bought Ṁ150 YES

https://x.ai/blog/grok-1.5

They claim to have done it:

bought Ṁ10 YES from 69% to 71%

@DanMan314 67.0% is the HumanEval figure from the original GPT-4 report published more than a year ago. The current zero-shot GPT-4 performance, as reported by Papers With Code, is 76.5%, which is from Guo et al. (January 2024).

Note that the market creator is banned, so this will probably be resolved by moderators. Personally, I think the current version of GPT-4 is the more natural interpretation of "OpenAI's flagship model" than the original version of GPT-4.

Dan sold Ṁ212 YES

@Jacy Yeah, I just looked into it and I agree with your assessment.