I will use my subjective judgement to resolve whether it is as good as GPT-4, but benchmark results will play a part in shaping that judgement. The rest will be a qualitative assessment.
Whether something is "open source" is defined liberally here and will also be determined by my subjective judgement, but generally I will deem something open source if (a) anyone can access it and (b) it wasn't the result of an unintentional leak/exfiltration, regardless of the specifics of the license.
I will not personally be trading on this market because it relies on my subjective judgement.
@firstuserhere does it need to be as good as the best GPT-4, or just as good as any of the GPT-4 models?
@Seeker Yes, I know LLaMA isn't truly open source, but it would've qualified for the purposes of this market.
I think an Elo ranking (https://arena.lmsys.org/) could be used to determine the winner objectively. Interestingly, Mistral-Medium is on par with GPT-3.5 in terms of Elo :O
'Mistral-Medium outperforms GPT-4 in Winogrande benchmark lmao'
@Dom95cc The Mistral/Mixtral models only seem good in very particular ways. Medium only scores 75 on the MMLU.
@TobiasH the benchmark results for GPT-4 from its report at the time of release, and the qualitative baseline of today's GPT-4