Will AI be able to generate an interactive web front-end by the end of 2024?
Resolved Jan 6 as 25%

Given a description of a backend (code and/or specification) and a general description of a front-end (e.g. list of essential features, desiderata, style, etc.), an AI code generator should be able to generate front-end code in a widely used programming language such as JS or TS, based on a widely used framework such as React or Vue.

Criteria:

  • code must be written completely by AI with no human interventions besides providing relevant information about the backend and a general description of a front-end

  • should work with any kind of a web front-end task a senior front-end / full-stack software engineer is expected to be able to implement

  • code must conform to the specification and meet quality standards of an expert human senior developer proficient in a given language and problem domain

  • front-end must work according to commonly acceptable UI/UX standards

  • at least 2000 non-trivial lines of code

  • a task is considered to be performed successfully when all important parts of functionality are implemented; minor cosmetic defects or missing niceties are acceptable

  • a code generator should have at least 80% success rate
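
The measurable parts of these criteria can be sketched as a small check. This is only an illustration: the `TaskResult` record and its field names are hypothetical, and the judgment-based criteria (code quality, UI/UX standards, which features count as "important") are assumed to have been assessed by a human reviewer beforehand.

```python
from dataclasses import dataclass

MIN_NONTRIVIAL_LOC = 2000      # "at least 2000 non-trivial lines of code"
REQUIRED_SUCCESS_RATE = 0.80   # "at least 80% success rate"


@dataclass
class TaskResult:
    """Hypothetical record of one generation attempt (field names are assumptions)."""
    nontrivial_loc: int              # non-trivial lines produced by the generator
    important_features_total: int    # important features listed in the task description
    important_features_working: int  # how many of those a reviewer found working
    human_code_edits: int            # manual edits to the generated code (must be 0)


def task_succeeded(result: TaskResult) -> bool:
    """A task passes only if it is large enough, untouched by humans,
    and every important feature works; cosmetic defects are ignored."""
    return (
        result.human_code_edits == 0
        and result.nontrivial_loc >= MIN_NONTRIVIAL_LOC
        and result.important_features_working == result.important_features_total
    )


def success_rate(results: list[TaskResult]) -> float:
    """Fraction of attempted tasks that passed."""
    if not results:
        raise ValueError("need at least one task result")
    return sum(task_succeeded(r) for r in results) / len(results)


def meets_bar(results: list[TaskResult]) -> bool:
    """True when the generator reaches the 80% threshold."""
    return success_rate(results) >= REQUIRED_SUCCESS_RATE
```

Under this reading, a generator that fully succeeds on 4 of 5 qualifying tasks would just reach the threshold, while 3 of 5 would not.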


Hi. I resolved it at 25% due to a high degree of uncertainty and lack of clear benchmarks.

What we know:

  • many LLMs are able to make React components with a good rate of success (e.g. v0, Claude)

  • multi-step AI workflows are still in 'prototype' phase

  • Devin is offered as a "junior" developer, so there's no expectation of reliability

  • recent SWE-bench results look good, but it's not clear if they match the capability of making web front-ends

  • o3's SWE-bench score is 71%, roughly within the range

I think this indicates we are on the edge of this capability and a clear Yes/No resolution would be wrong. The percentage is closer to 'No' because SWE-bench results are still significantly below 80%.

@AlexMizrahi the question was not "are we close?" but "are we there yet?" and it's definitely right to answer NO to the latter.

@AlexMizrahi In addition to the direct point made by @PierreThierry, the market trading incentives are very different for resolutions to a percentage. It would be right to bet a lot down from 25% to 0% in a binary market even if that would lose a lot of mana in a percentage market.
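
To make the incentive difference concrete, here is a rough sketch of the arithmetic. It assumes that a market resolved to a probability p pays 1 - p per NO share, and it simplifies to a single fixed purchase price instead of modelling how the price moves during a trade. The function names and the 10% example price are made up for illustration.

```python
def no_share_payout(resolution_prob: float) -> float:
    """Payout per NO share when the market resolves to `resolution_prob`
    (0.0 = plain NO, 1.0 = plain YES). Assumed payout rule, see note above."""
    return 1.0 - resolution_prob


def profit_per_no_share(yes_price_when_bought: float, resolution_prob: float) -> float:
    """Profit per NO share bought while the market showed `yes_price_when_bought`.
    Simplification: the NO share costs 1 minus the displayed YES probability."""
    cost = 1.0 - yes_price_when_bought
    return no_share_payout(resolution_prob) - cost


# A trader who is sure the answer is NO buys NO while the market shows 10%.
profit_if_resolved_no = profit_per_no_share(0.10, 0.0)   # gains about 0.10 per share
profit_if_resolved_25 = profit_per_no_share(0.10, 0.25)  # loses about 0.15 per share
```

Under these assumptions, any NO share bought below 25% loses money once the market resolves to 25%, even though the same trade is profitable if the market resolves NO.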

@Jacy I'm actually surprised that a market can be modified that way at the end. In a prediction market with actual money, I bet that would be illegal.

@PierreThierry Are you certain that e.g. Devin doesn't have this capability?

From a Bayesian perspective, getting close to 100% certainty costs resources. It's not free. You can pay somebody money to make such an eval, or we keep it uncertain.

Note that this market was created before SWE-bench existed. It would be a lot easier to reference a benchmark than to define custom criteria.

@AlexMizrahi I have asked people who claim it is possible, dozens of times, to show some evidence, and so far I've been met with the same kind of reactions you get from flat earthers and antivaxxers: "it's easy to try, do it yourself and you'll see" or "I won't do your research for you".

The only exception was one guy showing impressive resulting code, but when I pressed him for prompts and tried them, they produced nothing like what he was showing off.

And that's just for creating an application that does something on its own. A major part of creating a web application for a software engineer is interfacing it with a known API according to its specification and I've seen funny stuff on that front.

What does the resolution-to-percentage mean here? I see no mention in the market description of criteria under which this would happen; I'd been under the impression this was a binary market.

@Tulip Yes, it is a binary market, but we cannot definitively say whether there's an AI which meets the criteria because (1) nobody made a webdev-specific benchmark (let alone one matching the description); (2) o3 shows very good results in SWE-bench but we can't test it.

Thus I think partial resolution is fair here, as it reflects uncertainty about the event itself.

@AlexMizrahi based on the title this should actually resolve on Dec 31 or Jan 1

"should work with any kind of a web front-end task a senior front-end / full-stack software engineer is expected to be able to implement"

Senior front-end / full-stack software engineers are occasionally expected to be able to implement literally impossible tasks, so this arguably should already resolve NO.

bought Ṁ100 NO from 36% to 32%
bought Ṁ100 YES from 28% to 37%

https://v0.dev/ might be the most advanced I know of, but it's still a long way from what this question would need to resolve yes

bought Ṁ25 YES from 39% to 40%
bought Ṁ40 NO from 38% to 36%

Considering the strict requirements, I'd say that's a no.

bought Ṁ50 NO from 47% to 44%
predictedNO

Could you give an upper / lower bound for scale, apart from LOC?

Like, is a random note-saving app without editing, deletion, or sign-in the lower bound? Would the same with sign-in count? Or would a ToDo list app with edits, deletes, and sign-in be the lower bound? Or something like a ToDo list app with tags and reordering and recurring items?

@1a3orn The question is basically "Can AI replace front-end web devs?", so the lower bound should be similar to what people develop commercially; sign-in and so on are required.

Something similar to a TODO list app as you described might be good for a lower bound, except that an actual TODO list app won't qualify because it's a common tutorial topic so we won't be able to tell which parts are just copied from the training set.

Have you tested whether GPT-4 can already do this?

@vluzko Not particularly. I know that people have had some success with GPT-4, but the initially released model with 8k context is definitely not sufficient due to its limited size. (Although maybe somebody can prompt it to generate a front-end piece by piece, AFAIK nobody has succeeded in doing that yet.) Is 32k context enough? It's a bit hard to check, as detailed descriptions of the back-end of a non-trivial app are hard to find.
