Will AI agents be able to regularly code small features for us in a year?

I'm thinking of something like https://mentat.ai/, but that actually works.

I will provide a paragraph or so describing the change I want made. Then it should create a GitHub PR, which I will review and leave only a few comments before merging. The whole process should take less than 30 minutes. This should work fairly reliably.

I tried this yesterday and it failed haha:

See more discussion in my post:



Does "fairly reliably" roughly mean 75% success, 90%, 98%, ...?


Too subjective for me to bet much on. Expectations will shift as much or more than capabilities over the next year.

I think that in a year we'll see some outstanding successes when the feature is straightforward and uses a common pattern (e.g. adding some CRUD route handlers to a REST API for a popular server framework).

But for more complicated things, and for codebases which go off the beaten path a bit, we'll still see broken PRs and code which superficially looks right but has an unusual number of subtle bugs.

In a year, I don't know if this market will resolve based on asking it to do something easy or hard, where the difficulty for a human might not correlate with the difficulty for an AI bot in an easily predictable way.

My general bias is that, with experience, a programmer will learn to avoid pitfalls of any tool, making the tool more useful over time, even without the tool changing at all.

I have a clear idea of what I'm looking for. It needs to be able to make good changes to the codebase for a variety of small-ish requests, which often involve some refactoring along the way. (Leaving code better after the change than before would be a good sign!)

I think this qualifies as a harder objective in your characterization. I'm totally on board with the idea that even now AI coding agents could become more useful operating within a more limited framework.

You've explored this a bit already -- do you know if any AI coding agents integrate with CI/CD to build & test the code they write? It seems like that could go a long way towards fixing the "code only superficially looks correct" issue.

If a first agent could write a comprehensive set of unit tests and end-to-end tests (including performance goals for desired level of scale), then it seems like you could let a second agent take as many implementation attempts as it needs to reach those goals.
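To make the idea concrete, here's a rough sketch of that loop in Python. Everything in it is hypothetical: `run_tests` stands in for the suite the first agent would write, `fake_agent` stands in for the second agent (a real one would call a model and run an actual CI pipeline), and the "feature" is just adding two numbers. It only shows the shape of the loop, where test failures get fed back into the next attempt.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical "feature": a function that takes two ints and returns an int.
Implementation = Callable[[int, int], int]


def run_tests(impl: Implementation) -> List[str]:
    """Stand-in for the first agent's test suite; returns failure messages."""
    cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
    failures = []
    for args, expected in cases:
        try:
            got = impl(*args)
        except Exception as exc:  # a crash counts as a failure too
            failures.append(f"{args}: raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"{args}: expected {expected}, got {got}")
    return failures


def implement_until_green(
    propose: Callable[[List[str]], Implementation],
    max_attempts: int = 5,
) -> Tuple[Optional[Implementation], int]:
    """Ask for candidates until the tests pass or attempts run out."""
    feedback: List[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = propose(feedback)    # the agent sees the last failures
        feedback = run_tests(candidate)  # "CI" run for this attempt
        if not feedback:
            return candidate, attempt
    return None, max_attempts


def fake_agent(feedback: List[str]) -> Implementation:
    """Hypothetical second agent: buggy first try, fixed once it sees failures."""
    if not feedback:
        return lambda a, b: a - b
    return lambda a, b: a + b


impl, attempts = implement_until_green(fake_agent)
```

The cap on attempts matters: without it, an agent that never converges would loop forever. And passing the tests only shows the code meets the tests, so the quality of the first agent's suite is still the limiting factor.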

That doesn't help with the broader "is AI generated code clean enough to directly incorporate into my codebase?" issue though. I suspect that we'll go through a period of "AI writes custom libraries to do a specific task. Humans don't mess with them, they just use them." That's not very different from how we treat compilers. If we want to alter the library, we'll tweak our requirements and let the AI generate it again, possibly using the old library for reference.

It's a good idea! Especially with languages that have types as another layer of checking.

MentatBot seemed to make lots of errors that tests could have caught, though they do say that testing approaches is a key part of how it works: https://mentat.ai/blog/mentatbot-sota-coding-agent

bought Ṁ500 YES

Seriously this is priced so ridiculously wrong

bought Ṁ2,000 YES

@JamesGrugett Is this just a ploy to get us to buy more mana so we can bet this up to 99%?

"Yeah, we're a tech debt as a service startup"

this market will have controversial resolution!

bought Ṁ100 YES

this market will have a controversial yes resolution**
