Will "large reasoning models" significantly outperform traditional LLMs in creative writing by 2026?


Jan 2

40% chance


While o1-preview outperformed previous models in math and programming, it wasn't preferred over GPT-4o in the "personal writing" domain of OpenAI's human preference evaluation.

However, it seems like models that think before they respond should be able to improve their writing significantly, in principle. By analogy, if I had to write a poem off the top of my head, it would be much worse than a poem that I took an hour to write. It seems that o1 and o3 are primarily trained to solve problems with a definitive correct answer, but I don't think this will be the case forever.

For the purposes of this question, "large reasoning models" (LRMs) are LLMs that are trained to generate intermediate outputs that are not part of the final answer. This could include chain-of-thought training or "continuous thought" techniques like Meta's COCONUT.

Creative writing could include poetry, short stories, novel outlines, comedy routines, and similar things.

Things I might look for to determine a resolution:

The "Creative Writing" category in https://lmarena.ai/?leaderboard. If it's clear that LRMs outperform other models, the market will likely resolve YES. However, if it's simply the case that all "serious" models are LRMs and the "base LLM" version of those models isn't even available, then this wouldn't count unless I think the reasoning is really the cause of the performance gap.
If an LRM has the option to modulate the length of its reasoning, I'd look for metrics showing that increased test-time compute corresponds to a marked increase in response quality in the creative writing domain.
In the absence of good metrics, I'd rely on general vibes based on the experience of myself and others.

Ultimately, I'm thinking about a scenario where LRMs are clearly better for creative writing than traditional LLMs, in the same way that o1 is clearly better than GPT-4o at (well-specified) coding problems. If that bar hasn't been reached, this market should resolve NO.

I won't trade in this market.





https://x.com/chatgpt21/status/1933050795240415449

But

https://x.com/JuliusSimonelli/status/1932944540517945732

sold Ṁ29 NO

Anecdotally, DeepSeek R1 is in my opinion a vastly better writer than V3. I'm reversing my prediction based on this.

@MingCat Yeah, I also think this is probably true. DeepSeek R1 is really good at certain writing tasks in my experience. This also checks out when you look at the creative writing category of the LMSYS leaderboard.

I'm not in a rush to resolve this market, but I think it's quite likely to resolve YES. I'll wait until there's a broader consensus about this.