Will resolve to YES if Meta releases an open-source model that achieves a higher average score than GPT-4 on the following benchmarks by the end of 2024:
HellaSwag (few-shot): 0.953
MMLU (few-shot): 0.864
AI2 Reasoning Challenge (ARC): 0.963
Update 2025-03-01 (PST) (AI summary of creator comment): - Llama 3.1 405B achieves the following benchmark scores:
MMLU (zero-shot CoT): 0.886
ARC (zero-shot): 0.969
HellaSwag score is not reported due to potential contamination and is not considered in the resolution criteria.
The model is deemed open-source, and based on its performance and higher Elo on LMSYS, the market is resolved to YES.
According to the Llama 3.1 release [1], Llama 3.1 405B achieves the following scores on two benchmarks in the original question:
MMLU (zero-shot CoT): 0.886
ARC (zero-shot): 0.969
However, they do not report a score on HellaSwag, and there do not appear to be reliable third-party HellaSwag results for Llama 3.1 405B Instruct elsewhere, likely due to contamination as noted in the Llama 3.1 paper [2]. Based on improved performance on the above benchmarks, as well as a higher Elo on LMSYS, I am inclined to say that Llama 3.1 405B does indeed "outperform" GPT-4.
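For transparency, the average-score comparison behind this resolution can be sketched as follows, using only the two benchmarks with reported scores (HellaSwag excluded, per the above):

```python
# Benchmark scores: GPT-4 figures from the original question,
# Llama 3.1 405B figures from the Llama 3.1 release [1].
# HellaSwag is excluded due to potential contamination.
gpt4 = {"MMLU": 0.864, "ARC": 0.963}
llama_31_405b = {"MMLU": 0.886, "ARC": 0.969}

def average(scores: dict) -> float:
    """Unweighted mean over the reported benchmarks."""
    return sum(scores.values()) / len(scores)

print(f"GPT-4 average:          {average(gpt4):.4f}")           # 0.9135
print(f"Llama 3.1 405B average: {average(llama_31_405b):.4f}")  # 0.9275

# Resolution check: Llama 3.1 405B has the higher average.
assert average(llama_31_405b) > average(gpt4)
```

On these two benchmarks, Llama 3.1 405B's average (0.9275) exceeds GPT-4's (0.9135) by 0.014.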
Whether Llama 3.1 405B is truly "open source" is a debated topic; however, I am considering it to be open source.
For this reason I am resolving the market to YES.
[1] https://ai.meta.com/blog/meta-llama-3-1/
[2] https://ai.meta.com/research/publications/the-llama-3-herd-of-models/
Related news: Meta is currently training Llama 3 and plans to ramp up to compute equivalent to almost 600K H100s by the end of the year: https://www.instagram.com/reel/C2QARHJR1sZ/.