The model has an Elo greater than 1190 on ChatbotArena (or, if ChatbotArena is no longer available/updating, achieves GPT-4 (03.14) equivalent or greater performance on both MMLU and MT-Bench)
when running inference in a geographically distributed fashion (the computational hardware is not colocated, and is networked over typical consumer equipment)
on heterogeneous hardware (the computational hardware is varied in type, e.g. different GPU models)
without the act of distributed inference causing the model to require 2 orders of magnitude (OOM) more energy (e.g. if doing so is incredibly lossy and inefficient, it does not count; the burden of proof lies on anyone claiming this clause should be activated).
Note: if (or when) an edge case is presented, its applicability to this question will be evaluated according to my and Robert's understanding of the spirit of the question.
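For anyone who prefers to read the criteria as a checklist, here is a rough sketch of how the four clauses combine. It is only an illustration, not an official resolution tool: the `Candidate` fields, the function names, and the choice to treat an energy ratio of exactly 100x as failing are my assumptions, and the GPT-4 (03.14) baseline scores are passed in as parameters rather than hard-coded. Edge cases are still judged by the spirit of the question.

```python
from dataclasses import dataclass
from typing import Optional

ELO_THRESHOLD = 1190       # taken from the criteria above
MAX_ENERGY_RATIO = 10**2   # 2 OOM; assumption: a ratio at or above this does not count


@dataclass
class Candidate:
    """Hypothetical summary of one claimed distributed-inference run."""
    arena_elo: Optional[float]        # None if ChatbotArena is unavailable or stale
    mmlu: Optional[float]             # only used for the fallback clause
    mt_bench: Optional[float]         # only used for the fallback clause
    geographically_distributed: bool  # not colocated, consumer networking
    heterogeneous_hardware: bool      # e.g. different GPU models
    energy_ratio: float               # distributed energy / colocated energy


def meets_quality_bar(c: Candidate, gpt4_mmlu: float, gpt4_mt_bench: float) -> bool:
    # Primary clause: Elo strictly greater than 1190 on ChatbotArena.
    if c.arena_elo is not None:
        return c.arena_elo > ELO_THRESHOLD
    # Fallback clause: match or beat the GPT-4 (03.14) baseline on BOTH benchmarks.
    if c.mmlu is None or c.mt_bench is None:
        return False
    return c.mmlu >= gpt4_mmlu and c.mt_bench >= gpt4_mt_bench


def resolves_yes(c: Candidate, gpt4_mmlu: float, gpt4_mt_bench: float) -> bool:
    # All four clauses must hold at the same time.
    return (
        meets_quality_bar(c, gpt4_mmlu, gpt4_mt_bench)
        and c.geographically_distributed
        and c.heterogeneous_hardware
        and c.energy_ratio < MAX_ENERGY_RATIO
    )
```

Note that the energy clause defaults to passing only if someone supplies an `energy_ratio`; in practice the burden of proof is on whoever claims the ratio reaches 100x.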
I'd say a rough draft could be (with GPT-4 performance as a baseline):
The model has an Elo greater than 1190 on ChatbotArena (or, if ChatbotArena is no longer available/updating, achieves GPT-4 (03.14) equivalent or greater performance on both MMLU and MT-Bench)
when running inference in a geographically distributed fashion (the computational hardware is not colocated, and is networked over typical consumer equipment)
on heterogeneous hardware (the computational hardware is varied in type, e.g. different GPU models)
without the act of distributed inference causing the model to require 2 orders of magnitude (OOM) more energy (e.g. if doing so is incredibly lossy and inefficient, it does not count; the burden of proof lies on anyone claiming this clause should be activated).
Note: if (or when) an edge case is presented, its applicability to this question will be evaluated according to my understanding of the spirit of the question (is it possible to run inference on a SOTA LLM using a bunch of different people's computers?).
Note: @firstuserhere my knowledge of LLMs is definitely in the midwit realm. If any of the above clauses (benchmarks?) seem bad, feel free to improve them. I do suggest we have this question evaluate GPT-4 or equivalent models, though, since from my read-through of the paper, they already have Llama 70B running in a distributed fashion, which is pretty close to GPT-3.5.