Nat Friedman, former GitHub CEO and AI hype enthusiast, made this tweet claiming "This is going to be an insane year for AIs writing code", which falls into the kind of vague, gesture-y and unfalsifiable AI hype proclamation that I'm quite skeptical about and find rather annoying.
I want to challenge my biases though, so I'm making this market to register my deep skepticism, and to see whether I will have been right or wrong by the end of the year. I am a software engineer, so I'm familiar with and use most AI code tools, including GPT-4 and Copilot, and find them of some marginal utility, say 3/10 on a subjective, vibes-based scale.
I will resolve this market YES if, by the end of the year, it turns out Nat was bullshitting and there has been no step change in AI tools that I would consider "insane" or a significant improvement on, for example, current Copilot (say 6/10 on my subjective scale). Considering I might be biased, I will also allow my judgement to be influenced by the consensus of opinions of other programmers or commenters.
I will resolve this market NO if there is a clear step up in new AI tools that showcase clearly superior abilities to current tools, or that are "insane" in some consensus, observable way. If said AI tools are not publicly available but there's clear evidence of their existence in some other domain, I will also lean toward resolving NO. I can't think of any reasons to resolve this N/A besides force majeure, since I'm bound to have an opinion one way or another.
I'm open to better wordings of this market or more concrete ways to quantify my opinion.
@traders Some updates: After the initial launch of bolt.new and me playing around with it, I have not used it since. Seems like I got caught up in the launch hype a bit, and it just doesn't fit my natural use cases for writing code, i.e. I'm very rarely spinning up greenfield apps. I did, however, finally get the hang of Cursor, and it's basically better than advertised, both in terms of code completion and as a pretty good pairing assistant. So I'm still likely to resolve NO, but it's only due to Claude and Cursor, with all other products being rather middling to actively useless.
Another update @traders: I haven't yet made time to learn/use Cursor because I can't find a straightforward tutorial, but I just tried bolt.new from Stackblitz and it blew my mind: https://x.com/stackblitz/status/1841873251313844631. The sweet spot is still commoditized UI-heavy or frontend tasks, but this is clearly a step change from what was possible 6 months ago. There is a small chance I can be convinced otherwise, but I will almost certainly resolve this NO.
@diadematus I write code for a living and this has not been my experience. ChatGPT and Google search AI suggestions frequently hallucinate; more often than not they are wrong. Copilot PR summaries are usually wrong. Copilot autocomplete can complete boilerplate tests line by line 60% of the time, but needs to be checked. Copilot chat is useless.
Update for @traders... I have been a solid YES all year, but I started using Claude 3.5 about 2 months ago, both at work and in pet projects, and it's mind-blowingly great. I'd say it has roughly halved the time I spend working on tickets, and allowed me to ship a Chrome extension and a (non-trivial) mobile app in one month, with relative ease and only nominal input from me. Maybe these are relatively commoditized UI tasks of the sort LLMs might be good at, but this was not clearly the case 5 months ago. I plan to test drive Cursor over the course of this month and see what the hype is about, but I'm no longer so sure I would resolve YES if this were to resolve today.
What do you think about the gains on SWE-bench? I haven't tried any coding agents and I'm not sure which ones are even publicly available. But I wouldn't be surprised if climbing this benchmark ends up tracking something real.
@JoshYou I really don't care about these sorts of benchmarks, just my subjective experience using these tools in my daily work (as described in the market)
Apparently even the "Devin" hype video was a lie - https://www.youtube.com/watch?v=tNmgmwEtoWE
https://www.cognition-labs.com/blog - pretty good but not 'insane' imo... I'm a little less confident in my skepticism now though.