When LLMs or hybrid AI models can generate both the implementation of a Sudoku solver and its formal specification, along with a formal proof of correctness?
Prompts might look like: 'Write a Sudoku solver in Coq and formally prove its correctness.'
Model should be generally available
As a status check, I tried o1-preview and it basically avoided doing the actual work instead essentially placing, “solution goes here” in its answer.
when further prompted to fill in those sections, it did spew out a lot of text but it didn’t look complete to me and finished by reiterating the problem was hard and it didn’t have the full proof.
However, maybe with o1 itself or o2 (and maybe even enough prompting and tokens with o1 preview) this will be achievable. It seems along the right path to improvement vs 4o.
@LiamZ they said “very soon”
https://x.com/demishassabis/status/1816596568398545149?s=46
it may be end of this year or start of 2025