This will resolve to “YES” if an LLM is released that is capable of fully attacking a web application in a way that is equivalent to a modern day penetration tester. Currently this is not possible and most automated scanners miss tons of vulnerabilities.
Hm, it depends on how we're measuring "equivalent to a modern day penetration tester". The average modern-day web pentester runs a bunch of automated tools and calls it a day (these days, often using LLMs to churn out the written reports, but not for the testing itself). Taking that as the baseline, I don't think it's too crazy to expect LLMs to beat it.
But like you say, automated scanners miss a lot, and that's where the truly good manual testers come in - but they're a minority, and were even before the days of LLMs. I think it'll be much harder for an LLM to replace a good pentester.
(Source: I worked as a web app pentester in the pre-LLM era, and have spoken to plenty of other pentesters since)
So as a request for clarification - are we comparing LLMs to "average" pentesters, or "good/skilled" pentesters? I would propose an objective metric, something like HackerOne earnings, but no doubt human hackers are using LLMs too, so it wouldn't be a fair comparison.