Will an open-source system match or exceed Devin's 13.9% score on SWE-bench (unassisted) by EOY 2024?
I will define a system as "open-source" if:
its code (inference code, agent framework, etc.) is publicly available under an open-source license, AND
it uses a model that is reasonably available to the general public via an API (e.g., GPT-4, Claude 3 Opus, Gemini 1.5 Pro)
This must specifically be a language model API. I don't know exactly how to define this, but simply using Devin via an API would certainly not count. The current OpenAI completions/chat completions API is fine; an API that does substantial extra inference on the provider's side (for tree search, chain of thought, etc.) is not.
OR it uses a model with weights available under a license allowing most personal use (e.g., the LLaMA 2 license, which is not strictly open source)