Will Meta release any same-size LLaMa that performs better at MMLU before October 14th 2024?
38 traders · Ṁ22k · Oct 16 · 70% chance

Will any model that performs better than the equivalent-size (±10% in parameter count) LLaMa 3.1 model be officially released by Meta, where "performs better" means "at least 0.5 percentage points more accurate on MMLU"? Base models only.

For example, LLaMa 3.1 70B's MMLU score is 83.6% (an improvement over LLaMa 3.0 70B's 79.5%). A LLaMa 3.2 70B would therefore need to score at least 84.1% on MMLU to resolve this market YES. Note that any model in the family (8B, 70B, 405B) improving by at least 0.5 points is enough to resolve this market.
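
For concreteness, here is a minimal Python sketch of the resolution check implied by these criteria. The function names are illustrative, and only the baselines quoted in this description are included; this is not an official resolution procedure:

```python
# Minimal sketch of this market's resolution check. Function names are
# hypothetical; baselines are the base-model scores quoted in the description.

# LLaMa 3.1 base-model MMLU scores (%), as given above. The 8B baseline
# is not quoted in the description, so it is omitted here.
BASELINES_MMLU = {"70B": 83.6, "405B": 85.2}

def equivalent_size(candidate_params: float, baseline_params: float) -> bool:
    """'Equivalent size' means within ±10% of the baseline's parameter count."""
    return abs(candidate_params - baseline_params) <= 0.10 * baseline_params

def resolves_yes(size_label: str, candidate_mmlu: float) -> bool:
    """YES requires beating the same-size LLaMa 3.1 base model's MMLU
    score by at least 0.5 percentage points."""
    return candidate_mmlu >= BASELINES_MMLU[size_label] + 0.5

# Worked example from the description: a LLaMa 3.2 70B would need
# at least 83.6 + 0.5 = 84.1% MMLU to resolve YES.
assert resolves_yes("70B", 84.1)
assert not resolves_yes("70B", 84.0)
assert equivalent_size(72e9, 70e9)  # 72B is within ±10% of 70B
```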

Multimodal models are eligible, but only text MMLU performance will be evaluated. Models that were fine-tuned, DPO'd, RLHF'd, or continually pre-trained (CPT'd) on synthetic data will not resolve this market.

For reference, LLaMa 3.0 70B's MMLU score was 79.5%, GPT-4o's is 88.7%, and LLaMa 3.1 405B base's is 85.2%. (LLaMa 3.1 405B Instruct's score is 88.7%.)

Do distilled models count?

A distilled model is counted at its actual parameter count, and belongs to the smallest weight class (8B, 70B, 405B) that it does not exceed.

"Actual parameter count" what does this mean? I'm specifically asking about what happens if they distill llama 70b to a 8b size model that's better than the existing 8b size model.

If they distill a 70B model into an 8B-size model, it counts as an 8B-size model.
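
In other words, the binning rule amounts to the following sketch (names are illustrative, not from the market's terms): a model is placed, at its actual parameter count, into the smallest weight class it does not exceed.

```python
# Hypothetical sketch of the weight-class rule for distilled models:
# bin a model, at its actual parameter count, into the smallest weight
# class (8B, 70B, 405B) it does not exceed. Names are illustrative.

WEIGHT_CLASSES = [("8B", 8e9), ("70B", 70e9), ("405B", 405e9)]  # ascending

def weight_class(actual_params: float) -> str:
    """Return the smallest weight class the model fits into."""
    for label, size in WEIGHT_CLASSES:
        if actual_params <= size:
            return label
    raise ValueError("model is larger than the largest class (405B)")

# A 70B model distilled down to 8B parameters competes in the 8B class.
assert weight_class(8e9) == "8B"
assert weight_class(60e9) == "70B"
```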

Please add liquidity to this market! This is an important question that I care about. I've already added Ṁ3,000 myself.

I bought Ṁ50 when the market was at 99% and I'm slowly exiting my position. Please trade accordingly.
