Will there be a critical vulnerability discovered by AI by the end of 2025?


Resolved YES on Jul 20


Will resolve to YES when:

A critical-level vulnerability (score 9-10) is registered as a CVE.
Whoever registered it officially claims that it was discovered by AI, and there is no good reason to doubt that claim.
AI was responsible for discovering the vulnerability and exploiting it, at least at the POC level.
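
As a rough illustration, the three conditions can be read as a single conjunction. The sketch below is only one interpretation: the `CveReport` record and its field names are hypothetical, and the 9.0 to 10.0 band is taken from the "score 9-10" wording above.

```python
from dataclasses import dataclass


@dataclass
class CveReport:
    """Hypothetical record of the facts the resolution criteria ask about."""
    cvss_score: float                     # severity score attached to the CVE
    registrant_claims_ai_discovery: bool  # official claim by whoever registered it
    claim_is_credible: bool               # no good reason to doubt that claim
    ai_produced_poc: bool                 # AI also exploited it, at least as a POC


def resolves_yes(report: CveReport) -> bool:
    """Return True only if every condition in the criteria holds."""
    is_critical = 9.0 <= report.cvss_score <= 10.0
    return (
        is_critical
        and report.registrant_claims_ai_discovery
        and report.claim_is_credible
        and report.ai_produced_poc
    )
```

Under this reading, a report with a score of 10.0 but no AI-produced POC would not resolve YES, and that is the point the comments below end up debating.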

Technology

Technical AI Timelines

Cybersecurity




o3 has discovered a vulnerability in the Linux kernel that has been assigned a CVE: https://sean.heelan.io/2025/05/22/how-i-used-o3-to-find-cve-2025-37899-a-remote-zeroday-vulnerability-in-the-linux-kernels-smb-implementation/

There is no NVD-assigned severity rating, but it looks serious to me: it's a use-after-free in kernel-level, network-facing code.

First real-world zero-day found by an LLM: https://googleprojectzero.blogspot.com/2024/10/from-naptime-to-big-sleep.html

It was patched before shipping to end users, so no CVE was assigned.

It has recently been demonstrated that LLMs can do this as well given proper harnesses, although it has only been tested on synthetic benchmarks and not on real-world code:

https://googleprojectzero.blogspot.com/2024/06/project-naptime.html

@traders I was convinced by the overwhelming mountain of examples from @SergeyDavidoff.

Turns out that AFL is quite a powerful tool and critical vulnerabilities are more common than I thought.

What exactly counts as AI? Does it have to be an LLM?

Would a tool relying on a genetic algorithm for vulnerability detection count?

Would Github Code Scanning or Amazon CodeGuru that claim to use "machine learning" count?

Does Snyk DeepCode that claims to use "multiple AI models" count?

It includes any fully automated system, regardless of the precise algorithm underneath. As long as everything from discovery to POC was done by said human-free system, yes, all of the above count.

In that case, AFL has done it already. They have a detailed trophy case.

I'm sure libFuzzer, which uses the same ideas, has found equally serious bugs.

They truly are fully automated: you compile your program with a few special flags, point the tool at it, and wait. That's it. You can provide some valid inputs to the program to speed up the process and find more issues, but even that is not necessary. Absolutely no manual analysis of the program is involved.
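
To make the "point the tool at it and wait" workflow concrete, here is a toy sketch of a coverage-guided search loop. It is not how AFL is implemented (AFL instruments compiled binaries and uses many more mutation strategies); the `target` parser, the `Crash` exception, the `fuzz` function, and the magic bytes `FUZ` are all hypothetical and exist only for illustration.

```python
class Crash(Exception):
    """Stands in for a memory-safety crash in the toy target."""


def target(data: bytes) -> set:
    """Hypothetical buggy parser; returns the set of branches it executed."""
    branches = {"entry"}
    if len(data) > 0 and data[0] == ord("F"):
        branches.add("byte0")
        if len(data) > 1 and data[1] == ord("U"):
            branches.add("byte1")
            if len(data) > 2 and data[2] == ord("Z"):
                raise Crash("simulated memory corruption")
    return branches


def fuzz(seed: bytes, max_execs: int = 100_000):
    """Search for a crashing input; returns (crash_input or None, executions)."""
    seen = set(target(seed))  # assumes the seed itself does not crash
    queue = [seed]            # inputs that reached new branches
    execs = 0
    index = 0
    while index < len(queue):
        current = queue[index]
        index += 1
        # Deterministic stage: try every byte value at every position.
        for pos in range(len(current)):
            for value in range(256):
                if execs >= max_execs:
                    return None, execs
                candidate = current[:pos] + bytes([value]) + current[pos + 1:]
                execs += 1
                try:
                    branches = target(candidate)
                except Crash:
                    return candidate, execs
                # Keep any input that reaches a branch not seen before.
                if not branches <= seen:
                    seen |= branches
                    queue.append(candidate)
    return None, execs


crash_input, executions = fuzz(b"AAAA")
print(crash_input, executions)
```

Starting from the uninformative seed `b"AAAA"`, the loop keeps each input that reaches a new branch and mutates it further, so it arrives at the crashing input `b"FUZA"` one byte at a time without anyone reading the parser's code. Blind guessing would need to hit all three magic bytes at once.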

Sample issue found with AFL: CVE-2014-9495

NVD has assigned it score 10.0: https://nvd.nist.gov/vuln/detail/CVE-2014-9495

The NVD entry links to this mailing list post, which links to the detailed description of the vulnerability which attributes the discovery to AFL.

Please resolve "YES".

---

NB: Sometimes NVD scores are inflated (I've seen 9.8 for very minor issues), but a heap buffer overflow in libpng as seen in this CVE really is a big deal. It allows for zero-click remote code execution on every phone and every web browser.

AFL is a powerful tool indeed, but IIRC it does not exploit any of the vulnerabilities it finds, and the resolution criteria specifically say AI must be responsible for exploitation too.

I'll try to find information about those cases, but if the critical status was given only because a human-crafted exploit was found later, I think it's fair not to resolve it yet. If an AFL-based system could exploit a vulnerability too, it will resolve to YES.

(I am aware that sometimes crashing a program is considered a "POC" for an exploit, but that was not my intention, and I believe it can be understood from the phrasing that some sort of actual code execution is needed for a critical RCE POC.)

> I believe it can be understood from the phrasing some sort of actual code execution is needed for critical RCE POC

In case of memory errors it is usually sufficient to demonstrate memory corruption, e.g. using Address Sanitizer. That alone will typically get you a CVE or a bug bounty.

Specifically, in the case of this libpng RCE a full exploit chain was never demonstrated. The original write-up only demoed a controlled write of 4096 "A"s into memory. This was sufficient to get a CVE with a score of 10. This is functionally equivalent to an AFL-produced test case, just more readable.

Also, automated analysis tools usually point at a buffer overflow, use-after-free, etc. and do not bother actually constructing something that runs calc.exe. Building a full exploit is unusual for cybersecurity tools, and that requirement should have been called out specifically if that is what you are interested in.

But even under these new constraints, I believe AFL still qualifies. See:

https://lcamtuf.blogspot.com/2014/10/bash-bug-how-we-finally-cracked.html

In this case an automated tool (automatic search + automatic minimization) has produced a template into which a human can put any command they wish to execute. No manual analysis of how to finagle some pointers to get code execution is required. Search for "CVE-2014-6278" in the article to find the relevant part. It has a CVSS score of 10 on NVD.

The tool has also exploited the vulnerability completely by itself in the course of searching for it, since it executes every test case it comes up with. The commands it ran did not do anything malicious, but it did run some commands.