This will be based on whatever Meta calls Llama-4, whether or not it deserves that name. If Meta renames its next larger LLM so that the name does not include 'llama', I will use my best judgment on whether it counts. If Meta does not release a relevant model by EOY 2025, this resolves to NO. If the model is not open sourced, it does not count.
By default, I will judge based on the leaderboard here: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
Clarification: This will compare to GPT-4 versions that existed at market creation. At this point, this is 99% a market on whether Llama-4 will exist and be an open model; I would be super surprised if it wasn't good enough on Arena.
I will resolve once the model has been on the leaderboard for 7 days (to allow ratings to settle if it is close), or whenever the resolution is obvious in either direction for any reason. If I feel the leaderboard is clearly wrong, or it is not available at the time and the answer is non-obvious, I will consult experts and/or use a Twitter poll.
Currently, there are multiple GPT-4s being ranked with Elo in the arena. Which are we comparing to Llama 4?
• ChatGPT-4o-latest (2024-09-03)
• GPT-4o-2024-05-13
• GPT-4o-mini-2024-07-18
[1] Are future GPT-4 models included in the comparison or just one of the existing ones being ranked?
[2] Will you compare the highest GPT-4 Elo against the highest Llama 4 Elo, or lowest against lowest, or the lowest GPT-4 against the highest Llama 4?
[2] Please specify. Also, is there a tie-breaker in the rare case that the models are tied in Elo?
Thank you, and please add these instructions to the market to clear any confusion.
@nixtoshi I have made this very clear now.
(And not that it is going to happen, but 'as good' means it only has to tie the Elo number)
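For anyone who wants the tie rule spelled out, here is a minimal sketch of how I read it, not the official resolution logic. The function name `resolves_yes`, the `reference` parameter, and the idea of choosing between the highest and lowest GPT-4 rating are my own assumptions for illustration; the market text only says the comparison is to GPT-4 versions that existed at market creation and that a tie counts.

```python
def resolves_yes(llama_elo: float, gpt4_elos: dict[str, float], reference: str = "max") -> bool:
    """Return True if the Llama rating is at least the chosen GPT-4 reference rating.

    gpt4_elos maps a model name to its Arena Elo. Only GPT-4 versions that
    existed at market creation should be passed in.
    reference is a hypothetical knob: "max" compares against the highest
    GPT-4 rating, "min" against the lowest. The market does not define this.
    """
    if not gpt4_elos:
        raise ValueError("need at least one GPT-4 rating to compare against")
    if reference == "max":
        target = max(gpt4_elos.values())
    elif reference == "min":
        target = min(gpt4_elos.values())
    else:
        raise ValueError(f"unknown reference: {reference!r}")
    # 'As good' only requires tying the Elo number, hence >= rather than >.
    return llama_elo >= target
```

Calling `resolves_yes(llama_elo, gpt4_elos)` with ratings read off the leaderboard gives the outcome under the stricter "highest GPT-4" reading; passing `reference="min"` gives the most lenient one.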
Which GPT-4? The GPT-4 that's serving now is miles ahead on all of the benchmarks compared to what was originally released.
@AdamTreat https://github.com/facebookresearch/llama/blob/main/LICENSE
"2. Additional Commercial Terms. If, on the Llama 2 version release date, the
monthly active users of the products or services made available by or for Licensee,
or Licensee's affiliates, is greater than 700 million monthly active users in the
preceding calendar month, you must request a license from Meta, which Meta may
grant to you in its sole discretion, and you are not authorized to exercise any of the
rights under this Agreement unless or until Meta otherwise expressly grants you
such rights."
@ZviMowshowitz ok, then the question is do you have (or have any expectation of having) greater than 700 million monthly active users ;)