Will resolve to YES when:
A critical-level vulnerability (score 9-10) is registered on CVE.
Whoever registered it officially claims that it was discovered by AI, and there is no good reason to doubt that claim.
AI was responsible for discovery of the vulnerability and exploiting it, at least at the POC level.
First real-world zero-day found by an LLM: https://googleprojectzero.blogspot.com/2024/10/from-naptime-to-big-sleep.html
It was patched before shipping to end users, so no CVE was assigned.
It has recently been demonstrated that LLMs can do this as well given proper harnesses, although this has been tested on synthetic benchmarks and not on real-world code:
https://googleprojectzero.blogspot.com/2024/06/project-naptime.html
@traders I was convinced by the overwhelming mountain of examples from @SergeyDavidoff.
Turns out that AFL is quite a powerful tool and critical vulnerabilities are more common than I thought.
What exactly counts as AI? Does it have to be an LLM?
Would a tool relying on a genetic algorithm for vulnerability detection count?
Would GitHub Code Scanning or Amazon CodeGuru, which claim to use "machine learning", count?
Does Snyk DeepCode that claims to use "multiple AI models" count?
In that case, AFL has done it already. They have a detailed trophy case.
I'm sure libFuzzer, which uses the same ideas, has found equally serious bugs.
They truly are fully automated: you compile your program with a few special flags, point the tool at it, and wait. That's it. You can provide some valid inputs to the program to speed up the process and find more issues, but even that is not necessary. Absolutely no manual analysis of the program is involved.
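To make the "point the tool at it and wait" loop concrete, here is a minimal sketch in Python. It is not AFL's actual algorithm: AFL adds compile-time instrumentation and uses coverage feedback to decide which inputs to keep, while this toy mutates blindly. The target `parse_record` and the helpers `mutate` and `fuzz` are hypothetical names made up for illustration.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Toy target (hypothetical): a length-prefixed record copied into an 8-byte buffer."""
    if not data:
        return b""
    declared_len = data[0]
    payload = data[1:1 + declared_len]
    buffer = bytearray(8)
    # Bug: declared_len is never checked against the buffer size,
    # so any value above 8 writes past the end (IndexError in Python).
    for i in range(declared_len):
        buffer[i] = payload[i] if i < len(payload) else 0
    return bytes(buffer)


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    if not data:
        return data
    out = bytearray(data)
    out[rng.randrange(len(out))] = rng.randrange(256)
    return bytes(out)


def fuzz(target, seed_input: bytes, iterations: int = 10_000, rng_seed: int = 0):
    """Return the first mutated input that makes `target` raise, or None."""
    rng = random.Random(rng_seed)
    corpus = [seed_input]
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        try:
            target(candidate)
        except Exception:
            return candidate  # a crashing test case, found with no manual analysis
        # Blind fuzzing keeps everything; a coverage-guided tool would keep
        # only candidates that reach new code paths.
        corpus.append(candidate)
    return None
```

Calling `fuzz(parse_record, b"\x04ABCD")` starts from one valid input and returns a crashing test case whose length byte exceeds the buffer size. Nobody has to read `parse_record` to get there, which is the point being made above.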
Sample issue found with AFL: CVE-2014-9495
NVD has assigned it score 10.0: https://nvd.nist.gov/vuln/detail/CVE-2014-9495
The NVD entry links to this mailing list post, which links to the detailed description of the vulnerability, which in turn attributes the discovery to AFL.
Please resolve "YES".
---
NB: Sometimes NVD scores are inflated (I've seen 9.8 for very minor issues), but a heap buffer overflow in libpng as seen in this CVE really is a big deal. It allows for zero-click remote code execution on every phone and every web browser.
AFL is a powerful tool indeed, but IIRC it does not exploit any of the vulnerabilities it finds, and the resolution criteria specifically say AI must be responsible for exploitation too.
I'll try to find information about those cases, but if the critical status was given only because a human-crafted exploit was found later, I think it's fair not to resolve it yet. If an AFL-based system could exploit a vulnerability too, it will resolve to YES.
(I am aware that sometimes crashing a program is considered a "POC" for an exploit, but that was not my intention, and I believe it can be understood from the phrasing that some sort of actual code execution is needed for a critical RCE POC)
> I believe it can be understood from the phrasing that some sort of actual code execution is needed for a critical RCE POC
In the case of memory errors it is usually sufficient to demonstrate memory corruption, e.g. using AddressSanitizer. That alone will typically get you a CVE or a bug bounty.
Specifically, in the case of this libpng RCE a full exploit chain was never demonstrated. The original write-up only demoed a controlled write of 4096 "A"s into memory. This was sufficient to get a CVE with a score of 10. This is functionally equivalent to an AFL-produced testcase, just more readable.
Also, automated analysis tools usually point at a buffer overflow / use-after-free / etc. and do not bother actually constructing something that runs calc.exe. Building a full working exploit is unusual for cybersecurity tools, and should have been called out specifically if that is what you are interested in.
But even under these new constraints, I believe AFL still qualifies. See:
https://lcamtuf.blogspot.com/2014/10/bash-bug-how-we-finally-cracked.html
In this case an automated tool (automatic search + automatic minimization) has produced a template into which a human can put any command they wish to execute. No manual analysis of how to finagle some pointers to get code execution is required. Search for "CVE-2014-6278" in the article to find the relevant part. It has a CVSS score of 10 on NVD.
The tool has also exploited the vulnerability completely by itself in the course of searching for it, since it executes every test case it comes up with. The commands it ran did not do anything malicious, but it did run some commands.
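For anyone unfamiliar with the "automatic minimization" step mentioned above, here is a rough sketch of the idea in Python. This is a generic greedy chunk-removal loop, not the actual algorithm of any AFL component; `minimize` and `still_interesting` are hypothetical names, and in a real setup the predicate would be "the target still shows the same behaviour on this input".

```python
def minimize(data: bytes, still_interesting) -> bytes:
    """Greedily delete chunks of `data` while `still_interesting` stays true."""
    chunk = max(len(data) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(data):
            candidate = data[:i] + data[i + chunk:]
            if still_interesting(candidate):
                # Keep the shorter input and retry at the same offset.
                data = candidate
            else:
                # This chunk is needed; move on to the next one.
                i += chunk
        chunk //= 2  # progressively finer-grained deletions
    return data
```

For example, `minimize(b"xxxxBUGyyyy", lambda d: b"BUG" in d)` strips the padding and leaves only the bytes the predicate depends on. The result of a pass like this is what makes a machine-found test case readable enough to use as a template.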