Will LLMs be able to formally verify non-trivial programs by the end of 2025?
Closes Dec 31 · 33% chance

This resolves YES if there is reasonable evidence (or my own experimentation) showing that an LLM can reliably write specs/proofs for a deductive verifier for a reasonably complex program with no extra help (access to the verifier to try things, and some scaffolding to make that possible, are OK). The result should be reasonably stable across different verifiers and kinds of programs.
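
For illustration only, here is a minimal sketch of the kind of "verifier access plus scaffolding" setup described above: a loop that asks an LLM to add spec/proof annotations to a program and feeds the verifier's errors back until it verifies. Dafny is used purely as an example deductive verifier (the question is not tied to it), the `Max` method and prompt are made up for the example, and `query_llm` is a hypothetical placeholder for whatever model API is used.

```python
"""Sketch of an LLM + deductive-verifier scaffolding loop (illustrative, not
the author's setup). Assumes the Dafny CLI (`dafny verify`) is installed;
`query_llm` is a hypothetical stand-in for a real LLM API client."""
import subprocess
import tempfile
from pathlib import Path

# The unannotated program the LLM must specify and prove (Dafny syntax).
UNANNOTATED = """
method Max(a: array<int>) returns (m: int)
{
  m := a[0];
  var i := 1;
  while i < a.Length
  {
    if a[i] > m { m := a[i]; }
    i := i + 1;
  }
}
"""

PROMPT = (
    "Add requires/ensures clauses and loop invariants so that the following "
    "Dafny method verifies. Return only the complete annotated Dafny code.\n\n"
)


def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    raise NotImplementedError


def run_verifier(code: str) -> tuple[bool, str]:
    """Write the candidate to a temp file and run `dafny verify` on it."""
    with tempfile.TemporaryDirectory() as d:
        path = Path(d) / "candidate.dfy"
        path.write_text(code)
        proc = subprocess.run(
            ["dafny", "verify", str(path)],
            capture_output=True, text=True,
        )
        return proc.returncode == 0, proc.stdout + proc.stderr


def verify_with_retries(max_attempts: int = 5) -> str | None:
    """Ask the LLM for annotations, feeding verifier errors back on failure."""
    prompt = PROMPT + UNANNOTATED
    for _ in range(max_attempts):
        candidate = query_llm(prompt)
        ok, output = run_verifier(candidate)
        if ok:
            return candidate
        prompt = (
            PROMPT + candidate
            + "\n\nThe verifier reported:\n" + output
            + "\nFix the annotations and return the full program again."
        )
    return None
```

What the question asks is whether a loop like this reliably converges on correct, non-trivial specs and proofs across verifiers and kinds of programs, not whether any particular scaffold works.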

My experiments with o1 and o3-mini currently fail on pretty basic examples. This may be because my prompting is not good enough (verification requires a fair amount of explanation). If it turns out that current models are already capable of this, then this also resolves YES. To me, the gap between the level of reasoning they are doing and what is needed feels substantial enough that I would be surprised if current models succeeded out of the box even with better prompting.

This is a somewhat subjective question. I will (and invite you to) collect evidence in the comments. I am also open to a more objective resolution criterion if people have ideas.
