When will LLMs be able to generate a formal proof for a Sudoku solver?


2024

2025: 62%

2026: 29%

2027 or later

When will LLMs or hybrid AI models be able to generate both the implementation of a Sudoku solver and its formal specification, along with a formal proof of correctness?

Prompts might look like: 'Write a Sudoku solver in Coq and formally prove its correctness.'

The model should be generally available.
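To make the "implementation plus specification" split concrete, here is a minimal sketch in Python rather than Coq. It is an illustration, not a proof. `solve` is an ordinary backtracking solver (the implementation), and `is_valid_solution` is an executable specification of what a correct answer must satisfy. The function names, the `Grid` alias and the 0-for-empty-cell encoding are assumptions made for this example. Running the check on one puzzle only tests that run. The market asks for a machine-checked proof, for example in Coq, that every grid the solver returns satisfies the specification (and, depending on how the theorem is stated, that the solver finds a solution whenever one exists).

```python
from typing import List, Optional

# Assumed encoding (illustrative only): a 9x9 list of lists, 0 marks an empty cell.
Grid = List[List[int]]


def is_valid_solution(puzzle: Grid, solution: Grid) -> bool:
    """Executable specification: `solution` is a filled 9x9 grid in which every
    row, column and 3x3 box holds the digits 1-9 exactly once, and which keeps
    every clue given in `puzzle`."""
    if len(solution) != 9 or any(len(row) != 9 for row in solution):
        return False
    digits = set(range(1, 10))
    rows = [set(row) for row in solution]
    cols = [{solution[r][c] for r in range(9)} for c in range(9)]
    boxes = [
        {solution[br + i][bc + j] for i in range(3) for j in range(3)}
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]
    if any(group != digits for group in rows + cols + boxes):
        return False
    # Every clue in the original puzzle must be preserved.
    return all(
        puzzle[r][c] == 0 or puzzle[r][c] == solution[r][c]
        for r in range(9)
        for c in range(9)
    )


def candidates(grid: Grid, r: int, c: int) -> List[int]:
    """Digits not yet used in the row, column or 3x3 box of cell (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[br + i][bc + j] for i in range(3) for j in range(3)}
    return [d for d in range(1, 10) if d not in used]


def solve(puzzle: Grid) -> Optional[Grid]:
    """Plain backtracking solver (the implementation). It does not re-check
    the given clues, which is one reason to state a separate specification."""
    grid = [row[:] for row in puzzle]  # work on a copy; the input is not mutated

    def backtrack() -> bool:
        for r in range(9):
            for c in range(9):
                if grid[r][c] == 0:
                    for d in candidates(grid, r, c):
                        grid[r][c] = d
                        if backtrack():
                            return True
                    grid[r][c] = 0  # no digit fits here: undo and backtrack
                    return False
        return True  # no empty cells left

    return grid if backtrack() else None


PUZZLE: Grid = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]

if __name__ == "__main__":
    result = solve(PUZZLE)
    # Checking the spec on one run is testing, not proving: a proof covers all inputs.
    print(result is not None and is_valid_solution(PUZZLE, result))
```

Keeping the specification separate from the solver is the point of the exercise. The spec can stay short enough to be obviously right, while the solver is free to use cleverer search, and a proof assistant would connect the two once and for all.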



4 Comments


As a status check, I tried o1-preview, and it basically avoided doing the actual work, instead essentially placing “solution goes here” in its answer.

When further prompted to fill in those sections, it did spew out a lot of text, but it didn’t look complete to me, and it finished by reiterating that the problem was hard and that it didn’t have the full proof.

However, this may become achievable with o1 itself or o2 (and maybe even with o1-preview, given enough prompting and tokens). It seems to be on the right path to improvement compared with 4o.

@LiamZ I believe DeepMind’s models will solve such problems first.

Their models prove theorems iteratively, with feedback from a proof checker or symbolic engine, which increases the accuracy of the results and speeds up proof search.

bought Ṁ1 YES

@fornever I agree, and hybrid approaches generally are much more promising, but when do you think such a model will be “generally available,” or when will chatbots have a similar backend integrated?