An AI agent is something which takes a task and directly applies changes to a code base (possibly via a merge request, letting a human review the changes). I.e. it works similarly to giving a task to a programmer.
The market resolves to "YES" if such agents exist by the end of the year and are used in commercial environments, essentially displacing the work of programmers.
The agent must work for a mainstream programming language and a commonly used code base format. An "AI app generator" that produces something from a template does not count, and neither do specialized "no code" environments.
Tools like Copilot do not count - they are designed to help a programmer to write code, not to replace a programmer.
Experiments in a lab setting do not count - it's much easier to operate in a controlled environment.
Does it have to be good? 🤣 I fully expect to see a lot of AI hype scams (I suspect this is the first I've seen in the wild: $99 to talk to a private AI bot - https://twitter.com/ReadMultiplex) and I can just about guarantee someone will make code this way and market their product as being written by AI. I also strongly suspect the code produced will be absolute trash.
@JustNo It has to meet the quality standards of commercial software development.
@AlexMizrahi any examples of such standards? Test coverage, for instance?
@BraulioValdivielsoMartine There is no formal standard. The best way to assess quality is to sample the opinions of senior software developers. Information about such assessments can be published in the press, blogs, etc. E.g. if we see that Google considers the quality acceptable, that would be it.
Would this type of stuff count? (keeping aside scale or commercial environment for now)
Yes. It is sufficiently general and it does the work which otherwise would be done by a human programmer.
This bot automatically opens pull requests to update the dependencies in a repo:
This does replace some (though not much) of the work of a programmer, and is prompted by the bot (not by a human)
I'm guessing this doesn't count. Do you mean because it can't do a variety of tasks like a human programmer? Or some other reason maybe?
@YonatanCale It needs to be sufficiently general, i.e. it should be able to take a task in a natural language and carry it out. It is mentioned in the description: "takes a task".
dependabot does only one thing. Narrow tools like that have existed for decades, so it doesn't make sense to create a prediction market about them. The question is whether we'll get something new - more general, more powerful. It should be almost as powerful as a human programmer.
"Almost as powerful as a human programmer" - I'd be happy if you were more specific (maybe give 10 tasks and say it should be able to do 7 of them?)
But this is enough for me to buy NO anyway
@YonatanCale Results are already available for tasks which are easy to specify and measure: "AlphaCode achieved an estimated rank within the top 54% of participants in programming competitions".
Commercial software development, however, does not have a well-defined measure of complexity. We can't use things like coding competitions, as they are skewed towards more self-contained tasks which are uncharacteristic of commercial software development.
So I'm afraid it's better to leave this open ended.
If this resolves to YES, most likely the evidence will be in the form of articles claiming that programmers are being replaced by AI agents. I will use my own judgement as an expert (I am the CTO of a software company and a senior programmer) to ignore irrelevant evidence - for example, a bot with only 'narrow' functionality.
@AlexMizrahi "articles claiming that programmers are being replaced by AI agents" (judged by you) adds relevant info for me.
Together with "Almost as powerful as a human programmer" - that removes stuff like "just generate css"
I think AI agents could potentially be used to automate some mundane tasks like "rename Foo to Bar throughout the codebase, including when they appear as part of larger names (FooStatus, currentFoo, updateFoo), except in the SuperfooCoApi module." I want to say that tools already exist to automate a task like this, but not with a natural-language prompt.
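To make the rename example concrete, here is a minimal sketch of what a conventional script for that task could look like. It assumes the excluded module lives in a directory literally named `SuperfooCoApi` and that only `.py` files need changing; the function names `rename_in_text` and `rename_in_tree` are made up for illustration.

```python
from pathlib import Path

# Assumption: the excluded module is a directory with this exact name.
EXCLUDED_DIR = "SuperfooCoApi"


def rename_in_text(text: str, old: str = "Foo", new: str = "Bar") -> str:
    """Case-sensitive substring replace.

    Covers larger names too: FooStatus -> BarStatus, currentFoo -> currentBar.
    """
    return text.replace(old, new)


def rename_in_tree(root: Path, suffix: str = ".py") -> list[Path]:
    """Rewrite every matching file under root, skipping the excluded directory.

    Returns the list of files that were actually changed.
    """
    changed = []
    for path in sorted(root.rglob(f"*{suffix}")):
        # Skip anything located inside the excluded module directory.
        if EXCLUDED_DIR in path.relative_to(root).parts:
            continue
        original = path.read_text(encoding="utf-8")
        updated = rename_in_text(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

A plain substring replace like this would also turn `Food` into `Bard` and would miss a lowercase `foo` variable. Deciding which of those cases are intended is exactly the part a natural-language prompt leaves implicit.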
@CollectedOverSpread Copilot and GPT-4 are both way stronger than that