Will Meta release an open source language model that outperforms GPT-4 by the end of 2024?
resolved Jan 3
Resolved
YES

Will resolve to YES if Meta releases an open source model that achieves a higher average score than GPT-4 on the following benchmarks by the end of 2024:

HellaSwag (few-shot): 0.953

MMLU (few-shot): 0.864

AI2 Reasoning Challenge (ARC): 0.963
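As a rough illustration of the bar these criteria set, the sketch below computes the GPT-4 average from the three scores listed above. It assumes "average" means an unweighted mean, which the description does not spell out; the names `gpt4_scores` and `average_score` are made up for this example.

```python
# GPT-4 reference scores as listed in the market description.
gpt4_scores = {
    "HellaSwag (few-shot)": 0.953,
    "MMLU (few-shot)": 0.864,
    "ARC": 0.963,
}


def average_score(scores):
    """Unweighted mean of benchmark scores (an assumption about 'average')."""
    return sum(scores.values()) / len(scores)


gpt4_average = average_score(gpt4_scores)
print(f"GPT-4 average to beat: {gpt4_average:.3f}")  # prints 0.927
```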

  • Update 2025-03-01 (PST) (AI summary of creator comment): Llama 3.1 405B achieves the following benchmark scores:

    • MMLU (zero-shot CoT): 0.886

    • ARC (zero-shot): 0.969

    • A HellaSwag score is not reported due to potential contamination and is not considered in the resolution criteria.

    • The model is deemed open-source, and based on its performance and higher Elo on LMSYS, the market is resolved to YES.


According to the Llama 3.1 release [1], Llama 3.1 405B achieves the following scores on two of the benchmarks in the original question:

MMLU (zero-shot CoT): 0.886

ARC (zero-shot): 0.969

However, they do not report a score on HellaSwag, and there do not seem to be reliable third-party reports of HellaSwag for Llama 3.1 405B Instruct elsewhere, likely due to contamination as noted in the Llama 3.1 paper [2]. Based on improved performance on the above benchmarks, as well as a higher Elo on LMSYS, I am inclined to say that Llama 3.1 405B does indeed "outperform" GPT-4.

Whether or not Llama 3.1 405B is truly "open-source" is a debated topic; however, I am considering it to be open source.

For this reason I am resolving the market to YES.



[1] https://ai.meta.com/blog/meta-llama-3-1/

[2] https://ai.meta.com/research/publications/the-llama-3-herd-of-models/
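For readers who want to check the arithmetic behind the resolution comment, here is a minimal sketch comparing the two benchmarks reported for both models. The GPT-4 figures come from the market description and the Llama figures from the comment above; the variable names are made up for this example, and the evaluation settings differ (few-shot versus zero-shot), so the margins are only indicative. The last step assumes an unweighted three-benchmark mean and asks what HellaSwag score would have been needed to clear the original bar.

```python
# Scores quoted in the market description (GPT-4) and the resolution comment (Llama).
gpt4 = {"HellaSwag": 0.953, "MMLU": 0.864, "ARC": 0.963}
llama = {"MMLU": 0.886, "ARC": 0.969}  # no HellaSwag score reported

# Per-benchmark margin on the benchmarks both models have a score for.
margins = {name: llama[name] - gpt4[name] for name in llama}
for name, margin in margins.items():
    print(f"{name}: Llama 3.1 405B ahead by {margin:.3f}")

# HellaSwag score Llama would need for its unweighted three-benchmark
# mean to match GPT-4's (assumes a plain mean; settings are not identical).
break_even_hellaswag = sum(gpt4.values()) - sum(llama.values())
print(f"Break-even HellaSwag score: {break_even_hellaswag:.3f}")
```

Under those assumptions, Llama 3.1 405B leads on both shared benchmarks, and any HellaSwag score above the break-even value would also put its three-benchmark average ahead.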

405B is not open source, just open weights, afaict?

It depends on who you ask! Authoritative sources disagree on whether it is.


Related news: Meta is currently training Llama 3 and plans to ramp up to almost 600K H100-equivalents of compute by the end of the year: https://www.instagram.com/reel/C2QARHJR1sZ/
