Will Meta release an open source language model that outperforms GPT-4 by the end of 2024
Resolved YES (Jan 3)

Will resolve to YES if Meta releases an open source model that achieves a higher average score than GPT-4 on the following benchmarks by the end of 2024 (GPT-4's scores shown):

HellaSwag (few-shot): 0.953

MMLU (few-shot): 0.864

AI2 Reasoning Challenge (ARC): 0.963
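The criterion reduces to a small piece of arithmetic. The sketch below assumes "average" means the unweighted arithmetic mean of the three scores above (the question does not specify a weighting) and that "higher" means strictly greater. The names `GPT4_SCORES` and `beats_gpt4` are hypothetical, chosen for illustration only.

```python
from statistics import mean

# GPT-4 scores listed in the resolution criteria above.
GPT4_SCORES = {
    "HellaSwag": 0.953,
    "MMLU": 0.864,
    "ARC": 0.963,
}


def beats_gpt4(candidate: dict[str, float]) -> bool:
    """Hypothetical check: True if the candidate's unweighted mean over the
    same three benchmarks is strictly greater than GPT-4's mean.

    Assumes all three benchmarks are reported; raises ValueError otherwise.
    """
    missing = GPT4_SCORES.keys() - candidate.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    candidate_avg = mean(candidate[b] for b in GPT4_SCORES)
    return candidate_avg > mean(GPT4_SCORES.values())


# The average a model would need to exceed: (0.953 + 0.864 + 0.963) / 3.
threshold = mean(GPT4_SCORES.values())
print(f"GPT-4 average to beat: {threshold:.4f}")
```

Under these assumptions the bar is roughly 0.9267, and a model matching GPT-4 exactly on all three benchmarks would not qualify.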

  • Update 2025-03-01 (PST) (AI summary of creator comment): Llama 3.1 405B achieves the following benchmark scores:

    • MMLU (zero-shot CoT): 0.886

    • ARC (zero-shot): 0.969

    • A HellaSwag score is not reported, likely due to contamination, so HellaSwag is not considered for resolution.

    • The model is deemed open-source, and based on its performance and higher Elo on LMSYS, the market is resolved to YES.




2mo

According to the Llama 3.1 release [1], Llama 3.1 405B achieves the following scores on two of the benchmarks from the original question:

MMLU (zero-shot CoT): 0.886

ARC (zero-shot): 0.969

However, they do not report a HellaSwag score, and there do not seem to be reliable third-party HellaSwag results for Llama 3.1 405B Instruct elsewhere, likely due to contamination as noted in the Llama 3.1 paper [2]. Based on the improved performance on the above benchmarks, as well as a higher Elo on LMSYS, I am inclined to say that Llama 3.1 405B does indeed "outperform" GPT-4.
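With HellaSwag missing, the three-benchmark average cannot be computed. One rough check is to compare the two models only on the benchmarks both report. This is not apples-to-apples: the evaluation settings differ (for example, zero-shot CoT MMLU for Llama 3.1 405B versus few-shot MMLU for GPT-4 in the question). The variable names below are illustrative only.

```python
from statistics import mean

# GPT-4 scores from the question; Llama 3.1 405B scores quoted in this comment.
# Evaluation settings differ, so this is only a rough comparison.
gpt4 = {"HellaSwag": 0.953, "MMLU": 0.864, "ARC": 0.963}
llama_405b = {"MMLU": 0.886, "ARC": 0.969}  # no HellaSwag score reported

# Benchmarks reported for both models, in a stable order.
shared = sorted(gpt4.keys() & llama_405b.keys())

for bench in shared:
    diff = llama_405b[bench] - gpt4[bench]
    print(f"{bench}: Llama {llama_405b[bench]:.3f} vs GPT-4 {gpt4[bench]:.3f} ({diff:+.3f})")

# Unweighted means over the shared subset only.
llama_avg = mean(llama_405b[b] for b in shared)
gpt4_avg = mean(gpt4[b] for b in shared)
print(f"Shared-benchmark average: Llama {llama_avg:.4f} vs GPT-4 {gpt4_avg:.4f}")
```

On the shared subset Llama 3.1 405B leads on both ARC and MMLU (a mean of 0.9275 versus 0.9135). That supports, but does not by itself satisfy, the original three-benchmark criterion, which is why the LMSYS Elo comparison also factored into the resolution.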

Whether Llama 3.1 405B is truly "open source" is debated; however, I am considering it to be open source.

For this reason I am resolving the market to YES.



[1] https://ai.meta.com/blog/meta-llama-3-1/

[2] https://ai.meta.com/research/publications/the-llama-3-herd-of-models/

8mo

405B is not open source, just open weights, afaict?

8mo

It depends on who you ask! Many authoritative sources say it is, and many say it isn't.

predicted YES · 1y

Related news: Meta is currently training Llama 3 and plans to ramp up to almost 600K H100-equivalents of compute by the end of the year: https://www.instagram.com/reel/C2QARHJR1sZ/
