Will AI agents be able to regularly code small features for us in a year?
Resolved YES on Jul 4

I'm thinking of something like https://mentat.ai/, but that actually works.

I will provide a paragraph or so describing the change I want made. Then it should create a GitHub PR, which I will review and leave only a few comments before merging. The whole process should take less than 30 minutes. This should work fairly reliably.

I tried this yesterday and it failed haha:
https://github.com/manifoldmarkets/manifold/pull/2694

See more discussion in my post:

https://jamesgrugett.com/p/software-automation-will-make-us

Check out my new market with some forecasts about coding agents! https://manifold.markets/JamesGrugett/ai-coding-agent-forecasts-from-my-b?r=SmFtZXNHcnVnZXR0

Alright!! The market has closed and it's time for resolution!

I fired up Claude Code (as the best non-Codebuff agent) and asked it to implement a small feature for me: building a new agent template in our multi-agent system.

✅ It did it!

I think this resolution hinges on the definition of "small feature". Coding agents today are up for this task and are consistently successful with an iteration or two of human feedback.

However, larger tasks, ones that require significant design, or "g"-loaded tasks which require a lot of working memory, are usually not completed satisfactorily.

Frequently, even small features will be implemented more verbosely than an expert human would write them: duplicating code, missing a commonly used pattern in your codebase, etc.

Coding agents are not perfect, but they've made incredible progress in the last year. We really are on a breakout trajectory for AI — the world is changing quickly!

I will need to think carefully so as not to underestimate a milestone for next year that I can turn into a prediction market.

Resolved: YES

@JamesGrugett I'm surprised at this resolution, as the described feature sounds more like a small code snippet. It's not a fully-fledged change to the functionality of the deployed code, is it? In the terminology of your own post, it sounds like level 2 automation and not level 3.

Of course, I might be confused as to what exactly an "agent template" is, in your context. No problem! Since the AI is regularly coding such small features for you, it should be easy to give 2-3 better examples.

@VitorBosshard I'm sympathetic to this take, and agree it isn't obviously YES.

But I do think you can ask for whole features, which make changes across e.g. the backend, front end, util files, and it will give you something that works.

@ian posted an example below where it modified 9 files across different parts of the codebase to make a feature for deleting spam comments, on the order of ~200 lines changed.

I think this reliably works for an app like Manifold, which is just a web app with a server and a postgres database.

For this question, I had to choose either YES or NO. I think you can make the case for either, but IMO YES is the better answer.

@JamesGrugett Ok, looking through the other comments, there are indeed multiple examples of fully fledged features.

Does it count if I find myself rewriting most of the code that gets produced?

Here's another good one-shot PR from Cursor's background agent, adding the ability for admins/mods to 'delete' spam comments so that they aren't rendered at all, unlike the 'hide' feature, which still renders the hidden comments: https://github.com/manifoldmarkets/manifold/pull/3600

This took a minute to prompt, 5m for cursor to come up with a solution, and 5-10m to test to make sure it worked.

This was a really good experience! I used Cursor's background agent to add a minimum bet filter to the trades tab, and it produced a good start in 5 minutes. Then I tested it and prompted it to get rid of pagination and use infinite scroll instead. Done in less than 20 minutes! https://github.com/manifoldmarkets/manifold/pull/3599

bought Ṁ5,000 YES

This looks good to me. Stephen gave it two prompts to create this, and I think it took less than 10 mins: https://github.com/manifoldmarkets/manifold/pull/3588

@ian Looks like we need another prompt to fix the type error, should come in well under 30 mins still, though

bought Ṁ50 NO

@ian Initial comment was more than 30 minutes ago, so this is a failure

@CalibratedNeutral oh we stopped paying attention

@CalibratedNeutral I don't know if stephen told it to fix the type error

@ian the key to vibe-coding is to stay just the right amount drunk and not to overdo it

bought Ṁ50 YES

Claude 4 with GitHub, I think, does what the mentat.ai thing you linked does

bought Ṁ250 NO

@ian do you have access to ChatGPT Plus or Pro, and would you be willing to see how codex-1 fares? It's currently only accessible on Pro and Teams iirc, but will probably be accessible to Plus before the market closes

bought Ṁ5,000 YES

GPT 4.1 is awesome for coding.

It's genuinely really good. (mini is ok, nano is dogwater). I have been using it off Azure with Cursor, both as an assistant and as a tedious-implementation speedrunner. It has one-shotted so many instructions that 4o would have a bad time with, and that Claude would overthink.

Not tab complete, mostly just asking stuff. Really has come a long way with code

Crazy how AI agents are regularly building small features for me almost daily and this market is still at 80%

@DarklyMade is this code peer reviewed?

@Kire_ of course! The peer review AI looks at it!

I'd like to conduct some tests using codebuff/cursor. What are acceptable small features in your mind? I have a couple ideas:
- Add a button to the comment's bottom row that allows users to tip the commenter. Denormalize the tip amount onto the comment and display the total tipped amount on the button.
- Add a delete button for admins/mods that marks a comment as deleted (don't actually delete the comment, just set both the deleted and hidden flags) and hides the comment completely from the market.
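
For concreteness, here is a rough sketch of the data-level logic these two ideas might boil down to. Everything in it is an assumption made for illustration: the `MarketComment` shape, the `Role` type, and the function names are hypothetical and are not taken from Manifold's actual schema or API.

```typescript
// Hypothetical comment shape; field names are assumptions, not Manifold's real schema.
type MarketComment = {
  id: string
  userId: string
  text: string
  hidden: boolean
  deleted: boolean
  tipTotal: number // denormalized sum of all tips on this comment
}

// Hypothetical role type for the permission check.
type Role = 'admin' | 'mod' | 'user'

// Soft delete: set both flags instead of removing the record.
function softDeleteComment(comment: MarketComment, role: Role): MarketComment {
  if (role !== 'admin' && role !== 'mod') {
    throw new Error('Only admins and mods can delete comments')
  }
  return { ...comment, deleted: true, hidden: true }
}

// Record a tip by updating the denormalized total stored on the comment.
function addTip(comment: MarketComment, amount: number): MarketComment {
  if (!Number.isFinite(amount) || amount <= 0) {
    throw new Error('Tip amount must be a positive number')
  }
  return { ...comment, tipTotal: comment.tipTotal + amount }
}

// Deleted comments are dropped from the list entirely; comments that are only
// hidden stay in the list so the UI can still render them in hidden form.
function visibleComments(comments: MarketComment[]): MarketComment[] {
  return comments.filter((c) => !c.deleted)
}
```

A real version would also need the button UI, an API endpoint, and a database write, which is what makes these span several files.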

@JamesGrugett said the delete comment button for spam fit the bill, I'll try using codebuff to do this soon

@ian a "view results" button on polls?

@cthor Also seems reasonable!

@ian I am aware that you work on Manifold, but since you are also the largest YES holder, can we maybe agree to let @JamesGrugett do these kinds of evaluations once the time comes?

@CalibratedNeutral That sounds reasonable, although he doesn't work at Manifold anymore, so I'm not sure if he'll want to put 30 mins in to do this. I was going to film my attempt from scratch

@CalibratedNeutral I was not aware of that. Then maybe a third party (another developer working on Manifold)? The stakes are reasonably high for me, so I really would strongly prefer to have everything as unbiased as possible.

@CalibratedNeutral We might be able to get @SG or @SirSalty to do it

@CalibratedNeutral Alternatively, @JamesGrugett could test this question on his new startup, Codebuff. He uses Codebuff to help develop Codebuff

@ian Either option sounds good to me as long as the resolution criteria are followed according to @JamesGrugett's judgement

@ian how tf did you get the dead head badge?
