In my blog post on what I learned in a year of building a coding agent, I list several forecasts, which I've added to this market.
Read my post:
https://jamesgrugett.com/p/what-i-learned-building-an-ai-coding
I considered assigning my own percentage forecasts to each of them in the article, but it seemed a little cluttered. I'll add them here:
80% - The multi-agent paradigm will win
60% - “Live learning” will be standard
70% - Coding agents will flip the initiative
80% - Coding agents will close the loop
50% - Recursively improving coding agents will succeed in the market
50% - xAI will gain a sizable lead in model quality
60% - The specific model will not matter as much as today; the network of agents will be important
See also:
https://manifold.markets/JamesGrugett/will-ai-agents-be-able-to-code-a-sm
Update 2025-07-05 (PST) (AI summary of creator comment): For the answer 'xAI will gain a sizable lead in model quality', the creator has specified that model quality will be judged based on performance on benchmarks.
Update 2025-09-25 (PST) (AI summary of creator comment): - For "Recursively improving coding agents will succeed in the market": the agent must be able to spend lots of time autonomously improving itself beyond direct human instructions; 100% self-modification is not required (human involvement is allowed); merely following human-directed tasks does not qualify.
The agent must autonomously find and tackle issues to improve itself.
Human involvement is allowed, but a human orchestrating each change with the agent as a tool does not qualify.
Autonomous self-improvement must be an important mode of improvement, beyond direct human instructions.
Update 2025-09-25 (PST) (AI summary of creator comment): - For “Live learning” will be standard: "Live learning" means agents learn across runs without users explicitly telling them what to learn, akin to continual learning.
Example: an agent gets better in a codebase by learning from previous failures, not just by following new user instructions.
Must be more than simple memory/config edits (e.g., just updating agents.md or a memory file is not sufficient).
The agent’s autonomous learning across runs should be an important contributor to its good results.
The best model will not matter as much as today. Instead, it will be the network of agents that distinguishes the best product.
How would this resolve if it's pretty much a mix of both, to a similar extent as today? The model being one of the top 1-3 frontier ones is critical to the coding agent being great, but so is the scaffold being good, and ideally optimized or trained alongside the model.
@JamesGrugett According to you, has any model since the base GPT-4 had a sizable lead in model quality in the sense you intend for this market?
@Bayesian Ah, I don't think so, but gpt-5 is significantly better than the last version of gpt-4o, grok-4-fast is significantly better than Gemini Flash, etc.
But it would also need to be more than just one month that grok-5 was significantly better than other models, for example. So "sizable lead" has a time component too IMO.
@JamesGrugett Is this basically continual learning? For example, the AI should be able to play around with a new programming language whose syntax it has never seen, learn through interactions with it in one situation, and later be as fluent in it as any language it was pretrained on? Or is something else meant here?
@Bayesian Yes, I think so. It means across runs it learns without users specifically telling it what to learn. E.g. it gets better in a codebase by learning from previous failures.
@JamesGrugett But would that need to be an actual, true improvement, or can it be a surface-level improvement like editing agents.md or some memory files to remember not to do X or Y in the future? As in, knowledge learning doesn't count; it needs to be skill learning?
@Bayesian It needs to be more than that and it needs to be an important part of how the agent produces good results.
@JamesGrugett Does it count as recursively improving if the team that builds it uses some coding agents to build it, i.e. the coding agents are not fully autonomous? I think it's already the case, and has been for a while, that the best coding agents are coded by devs who use that same coding agent, but the coding agents doing everything on their own sounds really unlikely.
@Bayesian This one wasn't defined super well, but I was thinking of some greater level of autonomous improvement than the current paradigm, where the coding agent mostly does what the human says, but not necessarily that 100% of the improvement must be the coding agent modifying itself.
Instead, the coding agent must be able to spend lots of time autonomously improving itself, especially in ways that are beyond what humans directly instructed it.
@JamesGrugett Couldn't it resolve Yes now, then? Your description seems like something that is already true.
@Bayesian I don't think so -- not based on what I was thinking.
AFAIK most coding agent companies use their own tool, but they have a human mastermind each change and prompt the coding agent each time to do a piece of work (even if it can now run longer on a task).
To qualify, for this one, the agent would need to find and tackle issues on its own, and have that be an important way that it improves.
@JamesGrugett I’d guess this is a mildly contrarian take relative to consensus. I’m curious why you think xAI will gain the lead?
@Ziddletwix Most compute, most hardcore team, best trajectory (although Google's trajectory is pretty good too).
I actually met some guy from xAI and was impressed by just how much they are grinding. I simply don't see how any other org can catch them.
@KJW_01294 lol. I think this is the most objective question of the bunch -- the best model will score highest on the benchmarks. And it would need a "sizable" lead. If you're not ok with any ambiguity, then don't bet, but I think the resolution will be obvious.
Also I've only spoken to that guy once like 3 months ago.
@JamesGrugett
> Most compute, most hardcore team, best trajectory (although Google's trajectory is pretty good too).
> I actually met some guy from xAI and was impressed by just how much they are grinding.
lmfao - i wish there was a way to short codebuff