The AI which passes the below test must have an API available to the public, or at least broadly available to businesses.
Programming competency will be judged by the ability to produce functioning code for more than 50% of the projects described here. (I may substitute another metric if this metric proves problematic; e.g. if there are many solutions to the projects online which seem to confound train/test splitting).
Any prompting which does not depend on the individual projects is acceptable. Retrieval is also acceptable, as long as no full solution to the problem is accessed. The AI in question may run its generated code to debug, or experiment with functions etc.
GDP growth (annual rate, in percent terms) will be measured starting from the quarter following the appearance of such an AI.