Criteria:
code must be written entirely by AI, with no human intervention
must be generated according to a free-form, human-provided specification that permits an arbitrary problem domain (i.e. not confined to a pre-selected domain)
code must conform to the specification and meet the quality standards of an expert human senior developer proficient in the given language and problem domain
at least 3000 non-trivial lines of code
written in a programming language like Java, Kotlin, C++, C#, etc.; "dynamic" or "scripting" languages like Python, JavaScript, TypeScript, etc. are excluded
Why do we exclude languages like JS? There's a huge amount of publicly accessible JS code, so it is harder to assess originality.
What would the definition of "trivial" be? Some people would consider writing a binary tree a non-trivial task, while for others a non-trivial codebase is something like the Linux kernel. How would you decide if the generated codebase is trivial?
@AlexMizrahi What does the success rate need to be? In other words - how reliably does it need to have this ability?
@YoavTzfati Let's say 70%. Then we can also clarify that "arbitrary problem domain" should exclude ones which require a lot of specialist knowledge.
@NLeseul Added a clarification that it needs to be roughly as good as what a human senior developer would write. (Of course, that's a bit subjective.)