Will an LLM be able to solve the following programming problem?

Ṁ460Ṁ3.6k

resolved Apr 8

Resolved

YES

ALL

Will a GPT (pre-trained transformer model) give the right answer to this problem below?

Clarifications:

1) the model needs to be able to solve it for any arbitrary sequence of length 7

2) the model needs to be a pretrained transformer (does not need to be developed by OpenAI).

3) The model does not need to be called “GPT” or have “GPT” in its name, as long as I consider that it is likely based on a transformer architecture.

4) Variations in the prompt are permissible as long as the problem relies on the same logic and has the same solution. I will only try the prompt provided, unless a comment on the market suggests a different prompt.

Inspired by this tweet:

https://x.com/victortaelin/status/1776096481704804789?s=46&t=A87brvrcPNLRU2RL6LBEZQ

Market context

Technology

Science

Artificial Intelligence

Get

1,000

to start trading!

🏅 Top traders

#	Trader	Total profit
1		Ṁ224
2		Ṁ102
3		Ṁ46
4		Ṁ34
5		Ṁ21

People are also trading

Will an LLM be able to solve the Self-Referential Aptitude Test before 2027?

80% chance

How Will the LLM Hallucination Problem Be Solved?

When will LLMs be able to generate formal proof for sudoku solver?

By 2028 will we be able to identify distinct submodules/algorithms within LLMs?

79% chance

Will there be any simple text-based task that most humans can solve, but top LLMs can't? By the end of 2026

63% chance

Will LLMs such as GPT4 be considered a solution to Moravec’s paradox by 2030?

20% chance

Will the best LLM in 2027 have <500 billion parameters?

12% chance

Will LLM based systems have debugging ability comparable to a human by 2030?

68% chance

Will any LLM be able to multiply together arbitrary decimal numbers by the end of 2027?

65% chance

Will the best LLM in 2027 have <250 billion parameters?

12% chance

Sort by:

This has been solved. If it’s good enough for the original creator to pay $10k, I trust it.

https://x.com/victortaelin/status/1777049193489572064?s=46&t=A87brvrcPNLRU2RL6LBEZQ

Taelin has conceded defeat for the more difficult 12-token case, with a prompt for Claude Opus achieving 94% accuracy on 50 attempts.

The prompt remains secret for now for some reason (unclear how keeping it secret helps the aim of finding a more efficient solution as discussed), but presumably will be public later. I would guess it will be able to do 7-token case flawlessly.

@Santiago would a fine-tuned GPT-3.5 count? Assuming it was trained in such a way as to preclude it simply memorising the examples it was to be tested on?

@chrisjbillington It doesn’t take that many examples to show the full universe of cases for the model to memorize (for a length of 11, it takes 4^11= ~4.2M examples, for a length of 7 it only takes ~16k), so the bar would be very high for the people who fine-tuned it to demonstrate it had not memorized (or cuasi-memorized) the examples.

If I was fully convinced that it hadn’t memorized it, then I think in the spirit of the original tweet I’d resolve Yes.

The other thing I’d have to be certain of is that the “finetuning” is not doing the computation outside of the transformer architecture (which is also fairly easy to do).

@Santiago well this person

https://github.com/reissbaker/clevergpt

fine-tuned GPT-3.5 using

2000 randomly generated nets of up to 21 tokens and 30 long-context nets (interaction nets with 40-100 tokens)

For which

2k random samples appears to be good enough to get very close to perfect performance at any size in the training range; originally we trained on 200 samples, which provided near-perfect performance on <10 token nets but could sometimes struggle with ones approaching 21 tokens. 2000 samples is still far below the 5 trillion valid 21-token inputs, so the model isn't memorizing the outputs, it's just getting better at applying the pattern it's learned.

@chrisjbillington Ha!

Have you found any evidence that it actually works?

It’s solving it in a way that I suspect is not in the spirit of the question, but I didn’t really anticipate it, so it technically meets the criteria.

I think if you find any evidence that it really works (other than the creator’s comment in his own repo), I’ll have to resolve Yes and create a separate market with the additional requirements I had in mind.

@Santiago No, code is provided so presumably would be easy to reproduce, though I don't know how much it costs to fine-tune GPT-3.5. I'm probably not going to bother, since probably we'll soon have a regular prompt that solves it using some existing model.

I scoffed at first, but after 30 minutes of talking with Claude 3 Opus, it is completely unable to understand what's going on, and even screws up this badly on 3 token problems:

A# B# #A
Step 1: #B A# #A // A# B# replaced with #B A#
Step 2: #B A# #A // No rules apply
Final state: #B A# #A

I don't see an excuse this time. It really just can't understand it.

@singer I had the same reaction.

Is this the answer? I couldn't tell if the symbols all get swapped at once, or if only one change happens per line:

A# B# B# #A B# #A #B
A# B# #A B# B# #A #B
A# #A B# B# B# #A #B
B# B# B# #A #B 
B# B# #A B# #B
B# #A B# B# #B
#A B# B# B# #B
#A B# B#

@singer Based on the steps described, my understanding is that they get swapped one at a time, left to right.

So yes, that looks like the correct sequence to me.

People are also trading

Will an LLM be able to solve the Self-Referential Aptitude Test before 2027?

80% chance

How Will the LLM Hallucination Problem Be Solved?

When will LLMs be able to generate formal proof for sudoku solver?

By 2028 will we be able to identify distinct submodules/algorithms within LLMs?

79% chance

Will there be any simple text-based task that most humans can solve, but top LLMs can't? By the end of 2026

63% chance

Will LLMs such as GPT4 be considered a solution to Moravec’s paradox by 2030?

20% chance

Will the best LLM in 2027 have <500 billion parameters?

12% chance

Will LLM based systems have debugging ability comparable to a human by 2030?

68% chance

Will any LLM be able to multiply together arbitrary decimal numbers by the end of 2027?

65% chance

Will the best LLM in 2027 have <250 billion parameters?

12% chance

🏅 Top traders

People are also trading

People are also trading

Related questions