Will Meta release any same-size LLaMa that performs better at MMLU before October 14th 2024?
Ṁ58k volume · closes Oct 16 · 4% chance

Will any model that performs better than the equivalent size (± 10% in parameter count) LLaMa 3.1 model be officially released by Meta, where "performs better" means "at least 0.5% more accurate at MMLU"? Base model only.

For example, LLaMa 3.1 70B's MMLU score is 83.6% (an improvement over LLaMa 3.0 70B's 79.5%). A LLaMa 3.2 70B would need to score at least 84.1% on MMLU to resolve this market YES. Note that any model in the family (8B, 70B, 405B) performing at least 0.5% better is enough to resolve this market.

Multimodal models are eligible, but only text MMLU performance will be evaluated. Models that were fine-tuned, DPO'd, RLHF'd, or CPT'd on synthetic data will not resolve this market.

For reference, LLaMa 3.0 70B's MMLU score was 79.5, GPT-4o's score is 88.7, and LLaMa 3.1 405B base's score is 85.2. (LLaMa 3.1 405B Instruct's score is 88.7.)
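The resolution criteria above can be sketched as a small check. This is my own illustration, not the market creator's code; the function name and exact comparison conventions (percentage points, inclusive thresholds) are assumptions based on the description.

```python
def resolves_yes(new_params_b, base_params_b, new_mmlu, base_mmlu):
    """Sketch of the market's YES condition for one size class.

    new_params_b / base_params_b: parameter counts in billions.
    new_mmlu / base_mmlu: MMLU accuracy in percentage points.
    """
    # "Equivalent size" means within ±10% of the base model's parameter count.
    same_size = abs(new_params_b - base_params_b) <= 0.10 * base_params_b
    # "Performs better" means at least 0.5 percentage points more accurate.
    better = new_mmlu >= base_mmlu + 0.5
    return same_size and better

# Worked example from the description: a 70B model scoring 84.1 vs. 83.6.
print(resolves_yes(70, 70, 84.1, 83.6))  # True
print(resolves_yes(70, 70, 84.0, 83.6))  # False: only +0.4 points
print(resolves_yes(80, 70, 85.0, 83.6))  # False: 80B is outside ±10% of 70B
```

The market would need this to hold for at least one of the 8B, 70B, or 405B base models.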



As far as I can tell, none of the 3.2 models that just got released are in the same size class as any of the 3.1 models, so none of them would count for this.

@Fay42 I count a 90B model with 20B of multimodal adapters and 70B of language model as a 70B language model, since it's possible to isolate and run only the language part. It's unclear to me whether the language part has been updated, though.

They seem to have identical MMLU performance regardless.

@Fay42 Yup, comparing the MMLU CoT scores for 3.2 on https://www.llama.com/ and on the 3.1 model card, they do look to be exactly identical.

@yetforever Yup, they aren't new LLMs, they're vision adapters for existing LLMs.


Do distilled models count?

A distilled model is counted at its actual parameter count, and belongs to the smallest weight class (8B, 70B, 405B) that its parameter count does not exceed.

"Actual parameter count" what does this mean? I'm specifically asking about what happens if they distill llama 70b to a 8b size model that's better than the existing 8b size model.

If they distill a 70B model into an 8B-size model, it counts as an 8B-size model.
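The bucketing rule discussed above can be sketched as follows. This is my own illustration of the creator's stated rule; the function name and the handling of models larger than 405B (assumed out of scope) are assumptions.

```python
def weight_class(params_b):
    """Map an actual parameter count (in billions) to its market weight class.

    A model belongs to the smallest named class (8B, 70B, 405B) that its
    parameter count does not exceed; larger models are assumed out of scope.
    """
    for cls in (8, 70, 405):
        if params_b <= cls:
            return cls
    return None  # assumption: >405B models don't fit any class

# A 70B model distilled down to 8B parameters is judged as an 8B-class model.
print(weight_class(8))    # 8
print(weight_class(70))   # 70
print(weight_class(90))   # 405 (between 70B and 405B)
```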


Please add liquidity to this market! This is an important question that I care about. I've already added Ṁ3,000 myself.

I bought Ṁ50 when the market was at 99% and I'm slowly exiting my position. Please trade accordingly.
