Open-source(ish) Devin equivalent by EOY 2024?
Dec 31

Will an open-source system match or exceed Devin's 13.9% score on SWE-bench (unassisted) by EOY 2024?

I will define a system as "open-source" if:

  • its code (inference code, agent framework, etc.) is publicly available under an open-source license

  • it uses a model which is reasonably available to the general public via an API (e.g. GPT-4, Claude-3 Opus, Gemini 1.5 Pro) OR

    • This must specifically be a language model API. I don't know exactly how to define this, but just using Devin via an API would certainly not count. The current OpenAI completions/chat completions API is fine. Anything doing lots of extra inference on the API side (for tree search, chain of thought, etc.) is not.

  • it uses a model with weights available under a license allowing most personal use (e.g. the LLaMA 2 license, which is not strictly open source)
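One reading of the criteria above, as a minimal Python sketch: open code is always required, and the model condition can be met either through a public language-model API or through weights under a permissive-enough license. The `CandidateSystem` type and its field names are hypothetical, the 13.9 threshold is taken from the question, and this is an interpretation of the wording rather than official resolution logic.

```python
from dataclasses import dataclass

# Hypothetical description of a candidate system; field names are illustrative only.
@dataclass
class CandidateSystem:
    code_open_source: bool         # inference code / agent framework under an open-source license
    uses_public_llm_api: bool      # plain language-model API, no heavy extra server-side inference
    uses_open_weights_model: bool  # weights under a license allowing most personal use (e.g. LLaMA 2)
    swe_bench_unassisted: float    # percent of SWE-bench tasks resolved, unassisted

DEVIN_SCORE = 13.9  # threshold stated in the question

def counts_as_open_source(s: CandidateSystem) -> bool:
    # Open code is mandatory; either model route satisfies the model condition.
    return s.code_open_source and (s.uses_public_llm_api or s.uses_open_weights_model)

def resolves_yes(s: CandidateSystem) -> bool:
    # "Match or exceed" is read as greater than or equal to the threshold.
    return counts_as_open_source(s) and s.swe_bench_unassisted >= DEVIN_SCORE

# Hypothetical example: open agent code on a public API, scoring 12.29%.
example = CandidateSystem(True, True, False, 12.29)
print(resolves_yes(example))  # False: qualifies as open source but is below 13.9
```

Under this reading, a system like the one in the example would count as "open-source" but would still need to close the remaining gap to 13.9% before the market resolves YES.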

bought Ṁ100 YES

Beating Devin is such a low goal. Princeton has already reached 12.29% vs. 13.84% for Devin.

@Sss19971997 Wow. I thought that would be harder.

