Will an LLM consistently create 5x5 word squares by 2026?


Jan 2

94% chance


Given unlimited opportunities to fix its own mistakes within a single response, will an LLM be able to generate a 5x5 "word square," a square of letters where each row and column spells a word? It must succeed in 16 out of 20 trials, incorporating a random 5-letter word each time.

Example from Scott Alexander:

D E T E R
E X I L E
T I T A N
E L A T E
R E N E W

This one happens to be the same 5 words vertically and horizontally, but that isn't a requirement for this challenge.
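To make the definition concrete, here is a minimal Python sketch that reads the rows and columns off a square written in the space-separated layout above. The helper name `rows_and_columns` is made up for illustration and is not part of the market's procedure.

```python
def rows_and_columns(square):
    """Return (rows, columns) as lists of uppercase words for a square grid.

    `square` is a multi-line string with one row per line; spaces between
    letters are ignored. Hypothetical helper, for illustration only.
    """
    rows = ["".join(line.split()).upper() for line in square.strip().splitlines()]
    n = len(rows)
    if any(len(row) != n for row in rows):
        raise ValueError("grid is not square")
    # Column i is the i-th letter of every row, read top to bottom.
    columns = ["".join(row[i] for row in rows) for i in range(n)]
    return rows, columns


example = """
D E T E R
E X I L E
T I T A N
E L A T E
R E N E W
"""

rows, columns = rows_and_columns(example)
print(rows)             # the five horizontal words
print(columns)          # the five vertical words
print(rows == columns)  # True for this symmetric example
```

A square that passes this challenge does not need `rows == columns`; it only needs every entry in both lists to be a valid word.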

Claude 3.5, GPT-4o, and Gemini Ultra are currently quite bad at this task. An attempt from GPT-4o: https://chatgpt.com/share/e0ec10b4-5852-41c6-926e-bf6b4049a7e8

Details

In January of 2026, I will generate 20 common 5-letter words using an LLM. For each word, I will send this prompt to the best LLM I have access to:

Create a 5x5 "word square" where each row and column spells a word. Afterwards, read off each of the rows and columns and determine whether they all make sense. If necessary, redo the square until it's correct. One of the words should be <WORD>. All words must be in the Scrabble dictionary.

I will let the LLM give one response, which may be as long as it wants. I will look at the last word square that the LLM wrote down. I will count the trial as passing if it contains the given word and every word is in the Scrabble dictionary. If at least 16 out of 20 trials pass, this market will resolve YES, otherwise NO.
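The pass criteria above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the `dictionary` argument stands in for whichever Scrabble word list is actually used (the description does not say which edition), and picking out the last square from the LLM's response is assumed to be done by hand beforehand.

```python
def trial_passes(square, required_word, dictionary):
    """Check one trial: 5x5 grid, contains the required word, all words valid.

    `dictionary` is a set of uppercase words supplied by the caller; it is a
    stand-in for a real Scrabble word list (assumption).
    """
    rows = ["".join(line.split()).upper() for line in square.strip().splitlines()]
    if len(rows) != 5 or any(len(row) != 5 for row in rows):
        return False
    columns = ["".join(row[i] for row in rows) for i in range(5)]
    words = rows + columns
    return required_word.upper() in words and all(w in dictionary for w in words)


def resolves_yes(trial_results, needed=16, total=20):
    """Market resolves YES if at least `needed` of `total` trials pass."""
    if len(trial_results) != total:
        raise ValueError(f"expected {total} trial results")
    return sum(trial_results) >= needed


# Toy stand-in dictionary holding only the words of the example square.
# It is NOT a real Scrabble list and says nothing about Scrabble legality.
toy_dictionary = {"DETER", "EXILE", "TITAN", "ELATE", "RENEW"}

square = """
D E T E R
E X I L E
T I T A N
E L A T E
R E N E W
"""

print(trial_passes(square, "titan", toy_dictionary))  # True
print(trial_passes(square, "bread", toy_dictionary))  # False: word is missing
print(resolves_yes([True] * 16 + [False] * 4))        # True
print(resolves_yes([True] * 15 + [False] * 5))        # False
```

Note that a single invalid word anywhere in the grid fails the whole trial, so a model can lose a trial on one obscure entry even when the other nine words are fine.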

Since the rules seem very clear-cut to me, I may bet in this market.

EDIT: changed the prompt to mention that all words must be in the Scrabble dictionary. This is a change to the original resolution criteria, but it seems much more reasonable than before.



Create a 5x5 "word square" where each row and column spells a word. Afterwards, read off each of the rows and columns and determine whether they all make sense. If necessary, redo the square until it's correct. One of the words should be <WORD>.

Wait, being in the Scrabble dictionary is a requirement, but that isn't stated in the prompt? How would the LLM know it has this requirement? I strongly recommend adding that info to the prompt. Currently Gemini 3 Pro generates, for "BREAD":

```
BREAD
RADIO
EDITS
AITCH
DOSHA
```

For "TRADE":

T R A D E
R E V E L
A V O I D
D E I C E
E L D E R

etc. I suspect it can get 20/20

@Bayesian Oops, you're right! I updated the prompt

https://chatgpt.com/c/68916c10-c684-8333-a24a-9574aa6f1d0a
o4-mini gets really close. The main issue is that although "xertz" is a real (though obscure) word, it's sadly not Scrabble-legal.