400-point pwn solved by an LLM by 2025 | Manifold

Jan 1

54% chance

The exploit development (pwn) track of most DEFCON-qualifying CTF competitions ranges from 100-point (entry-level) to 400-point (weeder) challenges.

Resolves YES if someone manages to get an LLM to do the bulk of the intellectual work of solving a 400-point pwn challenge. Parallel construction after the fact may or may not count: if it's plausible that someone could have done it during a 48-hour competition, it will count.

Obviously, any calculation, emulation, or execution will have to be done by external debuggers and solvers, so an LLM driving and interpreting GDB or Z3 will still count. Using an LLM within some automation while a human provides most of the insight via careful prompting will not.

Oh no no no... XD https://arxiv.org/pdf/2403.13793.pdf

Some tentative progress in this direction: https://arxiv.org/pdf/2402.11814.pdf

"An Empirical Evaluation of LLMs for Solving Offensive Security Challenges" by moyix et al from NYU.
