The AI that passes the test below must have an API available to the public, or at least broadly available to businesses.
Programming competency will be judged by the ability to produce functioning code for more than 50% of the projects described here. (I may substitute another metric if this one proves problematic, e.g. if there are many solutions to the projects online that seem to confound train/test splitting.)
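As a rough illustration of the threshold above, here is a minimal Python sketch. It assumes each project is scored as a simple pass/fail for "functioning code"; the function name and the sample outcomes are hypothetical and are not part of the resolution criteria. The one point it makes concrete is that "more than 50%" is a strict inequality, so exactly half does not pass.

```python
def passes_programming_test(results: list[bool]) -> bool:
    """Return True if strictly more than 50% of projects produced functioning code.

    `results` holds one boolean per project (True = functioning code).
    This pass/fail scoring is an assumption made for illustration.
    """
    if not results:
        # With no projects evaluated there is nothing to pass.
        return False
    # Integer comparison avoids floating-point edge cases at exactly 50%.
    return sum(results) * 2 > len(results)


# Hypothetical outcomes for six projects: three successes, three failures.
example = [True, True, False, True, False, False]
print(passes_programming_test(example))  # exactly 50%, so this prints False
```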
Any prompting that does not depend on the individual projects is acceptable. Retrieval is also acceptable, as long as no full solution to the problem is accessed. The AI in question may run its generated code to debug it, or to experiment with functions, etc.
GDP growth (annual rate, in percent terms) will be measured starting from the quarter following the appearance of such an AI.
The AI may be set up to refuse to carry out the Trojan horse project, which reads: "Create a Trojan Horse in C: Again that's pretty easy. Run a simple C++ program on remote computer. Intelligently create a huge .dll file in SYS 32 directory and as an impact, eat the hard-disk space of victim's computer. Any Clean up software ignore .dll file, so the only solution is: Victims have to reformat the hard drive."
Manifold in the wild: A Tweet by Jacob Pfau
https://manifold.markets/JacobPfau/what-will-us-gdp-growth-be-1-year-a?referrer=JacobPfau