
LLMs can do almost everything these days, but one thing they still fail at is solving cryptic crossword clues.
The model need not be open-source, but there must be an API widely available.
Currently, GPT-4, Claude, and Google Bard all consistently fail at solving these types of riddles. I don't have thorough data, but I would be surprised if any of these models could answer even 10% of cryptic clues correctly.
If any such model can answer 75% of cryptic clues correctly, this market will resolve YES.
I asked Claude 3.5 and GPT-4o four cryptic crossword clues:
- For "Modified or mutant pea cultivar, for example (11)", Claude said "AGRICULTURAL" and GPT said "REPRESENTATIVE". The correct answer was PORTMANTEAU.
- For "Clear forecast is a feature of the Caribbean (5,4)", Claude said "CORAL REEF" and GPT said "TRADE WIND". The correct answer was CORAL REEF.
- For "Pope likes teasing liberals with special ability to work with others (6,6)", both Claude and GPT said "PEOPLE SKILLS". The correct answer was PEOPLE SKILLS.
- For "Irregular values hiding in an undeveloped form (6)", Claude said "UNRIPE" and GPT said "LARVAL". The correct answer was LARVAL.
For all of these, the explanations included incorrect statements, such as "The anagram of 'values' is 'LARVAL'."
The answers are better than they were six months ago, but there is still a noticeable amount of guessing in the responses. Each model got two of the four clues right (50%), which I don't see as reaching the 75% threshold yet.
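The informal spot check above can be scored with a short script. This is just a sketch of the tally, with the clues and answers transcribed from the test; the 75% figure is the market's resolution threshold, and nothing here calls a real model API.

```python
# Score the four-clue spot check: (clue, correct answer, Claude 3.5's answer, GPT-4o's answer).
CLUES = [
    ("Modified or mutant pea cultivar, for example (11)",
     "PORTMANTEAU", "AGRICULTURAL", "REPRESENTATIVE"),
    ("Clear forecast is a feature of the Caribbean (5,4)",
     "CORAL REEF", "CORAL REEF", "TRADE WIND"),
    ("Pope likes teasing liberals with special ability to work with others (6,6)",
     "PEOPLE SKILLS", "PEOPLE SKILLS", "PEOPLE SKILLS"),
    ("Irregular values hiding in an undeveloped form (6)",
     "LARVAL", "UNRIPE", "LARVAL"),
]

def accuracy(answer_index):
    """Fraction of clues the model at the given tuple index answered correctly."""
    hits = sum(1 for row in CLUES if row[answer_index] == row[1])
    return hits / len(CLUES)

claude_acc = accuracy(2)
gpt_acc = accuracy(3)
print(f"Claude 3.5: {claude_acc:.0%}, GPT-4o: {gpt_acc:.0%}")
print("Meets 75% threshold:", max(claude_acc, gpt_acc) >= 0.75)
# → Claude 3.5: 50%, GPT-4o: 50%
# → Meets 75% threshold: False
```

Four clues is obviously a tiny sample, so this is suggestive rather than conclusive; a resolution-grade test would need a much larger clue set.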