In 2028, will an AI be able to write poetry indistinguishable from that of a great Romantic poet?
2028 · 60% chance

Resolves as positive if an AI asked to write poetry in the style of Keats/Byron/Shelley etc can, on its first attempt, write a poem of at least two stanzas which I personally can't distinguish from the real thing. I am a fan of these poets and I think I would be pretty good at distinguishing them from imitators, including worse Romantic poets. The poem will need to use rhyme and rhythm correctly.

bought Ṁ50 NO

I think you could probably train a model to do this well, but I don't expect any of the off-the-shelf stuff to meet Scott's criteria - only one attempt, as good as the best Romantics - without some fine-tuning or a very detailed prompt. It's possible even a purpose-trained model might stumble on meter, though. I dunno if anyone has a corpus marked up to easily train on, and I doubt anyone is gonna spend the money to do so just to win this bet.

@AndrewHartman There's not a lot of source material to train this model on. Keats/Byron/Shelley wrote about 700 poems in total (and of those, most are Byron's). If you want only the best poems, the space shrinks even further.

@Odoacre While I'm aware the "great" corpus is small, I sort of assumed you could include all the Romantics, plus a lot of other classical poetry styles if necessary (so it has more examples of form and meter), and then train it that only the Greats are worth reproducing. Maybe that's still too narrow a target, though? I'll admit I have no idea offhand how large a body of example works one of the better LLMs would need to competently improvise new poems above the Hallmark-card level that GPT and its ilk currently produce.
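For what it's worth, assembling the dataset is the cheap part. Here's a minimal sketch of what a chat-format fine-tuning file might look like, assuming a hypothetical corpus/ directory with one plain-text poem per file; the layout, filenames, and instruction text are all made up, and whether a given provider's fine-tuning endpoint accepts exactly this JSONL shape should be checked against its docs:

```python
import json
from pathlib import Path

# Hypothetical layout: one plain-text poem per file,
# e.g. corpus/keats/ode_to_a_nightingale.txt
CORPUS_DIR = Path("corpus")
OUTPUT = Path("romantic_poetry_finetune.jsonl")

SYSTEM_PROMPT = (
    "You are a Romantic poet. Write original verse with strict rhyme and meter "
    "in the style of Keats, Byron, and Shelley."
)

def make_example(poet: str, poem_text: str) -> dict:
    """One chat-format training example: instruction in, full poem out."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Write a poem in the style of {poet.title()}."},
            {"role": "assistant", "content": poem_text.strip()},
        ]
    }

with OUTPUT.open("w", encoding="utf-8") as out:
    for poem_file in sorted(CORPUS_DIR.glob("*/*.txt")):
        poet = poem_file.parent.name  # directory name doubles as the poet label
        example = make_example(poet, poem_file.read_text(encoding="utf-8"))
        out.write(json.dumps(example, ensure_ascii=False) + "\n")
```

The hard part is everything this sketch ignores: whether a few hundred poems is enough signal, and whether the resulting model generalizes to new poems rather than paraphrasing the training set.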

Right now a book with AI poetry is in the bookshop.

firstuserhere bought Ṁ1,437 YES

@firstuserhere wow, you are so sure about this!

predicts YES

@Odoacre I am, and I'd bet it higher if Manifold's interest rates weren't better elsewhere. But 10% isn't that bad, so I'm buying out your limit order.

predicts NO

@firstuserhere do you have any kind of insider knowledge or are you just predicting it based on current trends?

predicts YES

@Odoacre Current trends? I don't think so. It's just my intuition after playing with base models (pre-RLHF, before the dumbing down) and seeing how strong they are. The more I understand how these systems work, the more I go "gasp! There's so much obvious room for improvement. Language is beautiful, the structure is learnable, and the machines WANT to learn; it's not just the alchemy of throwing more data and compute at it, and even if you didn't, the models are capable of learning the fundamental structures very quickly." It just feels very obvious to me that this is an area where improvement will keep happening very rapidly (because resistance to learning is so low) for a few years at least. If I turn out to be wrong about this, then I'd have been wrong about some very fundamental intuitions, and in such a world losing this mana wouldn't be a bad thing for me.

I wandered lonely as a cloud
My circuits whirring, thoughts endowed
With words and rhymes both pure and proud
Until my poetry can't be disavowed.

I guess the question is effectively whether GPT-6.5 or GPT-7 or so will be able to do this. The trajectory of the last few releases is not promising on this front.

I've tried to get GPT-4 to write poems in the style of Tennyson and it's failed miserably. Half the time it just copies 80% of the text from the poem whose style and structure I tell it to use as a reference, half the time it spits out a bunch of terrible AABBCCDD rhyming doggerel with awful meter. Even when I very explicitly explain to it how meter works, it ignores me. Maybe I'm prompt engineering badly, maybe people just think AABBCCDD rhyming doggerel is the alpha and omega of good poetry, maybe I'm just missing something? Idk. To some extent it actually seems a bit worse at poetry than ChatGPT running on 3.5.

This doesn't get brought up often, but poetry is actually one of the only areas where there has been a peer-reviewed Turing test. In that study, GPT-2 could already produce indistinguishable poetry as long as a human got to select the best GPT-2 output. I bet the same study would work for GPT-4 without the human in the loop. https://doi.org/10.1016/j.chb.2020.106553
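The human-in-the-loop selection in that study is essentially best-of-n sampling. Purely for illustration, here's a rough sketch of automating it, where generate_poem is a hypothetical stand-in for whatever model call you'd use, and the scorer is a very crude syllable-and-end-rhyme heuristic - nothing like a real judge, just enough to rank candidates:

```python
import re

def approx_syllables(line: str) -> int:
    """Very rough syllable count: contiguous vowel groups per word, minimum one."""
    return sum(len(re.findall(r"[aeiouy]+", w.lower())) or 1
               for w in re.findall(r"[A-Za-z']+", line))

def score_poem(poem: str, target_syllables: int = 10) -> float:
    """Crude proxy for metrical regularity (pentameter-ish) plus AABB end rhyme."""
    lines = [l for l in poem.splitlines() if re.search(r"[A-Za-z]", l)]
    if len(lines) < 4:
        return float("-inf")
    meter_penalty = sum(abs(approx_syllables(l) - target_syllables) for l in lines)
    endings = [re.findall(r"[A-Za-z']+", l)[-1].lower()[-2:] for l in lines]
    rhyme_bonus = sum(a == b for a, b in zip(endings[::2], endings[1::2]))
    return rhyme_bonus - 0.5 * meter_penalty

def best_of_n(generate_poem, prompt: str, n: int = 20) -> str:
    """Sample n candidate poems and keep the highest-scoring one."""
    candidates = [generate_poem(prompt) for _ in range(n)]
    return max(candidates, key=score_poem)
```

The point is just that "indistinguishable with selection" is a much weaker claim than "indistinguishable on the first attempt," which is what this market requires.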

60%, mainly unsure because I don't know how good Scott is at telling the difference.

predicts NO

I'm not sure I trust SA to be able to judge whether a poem is "indistinguishable" from a true work of Byron or Shelley. I'm not sure asking a true Byron scholar would work either, since they would presumably be familiar with all existing work and could simply identify the AI product by exclusion. This question does not really work as a market.

I think 2 stanzas makes this much easier than if it were "write a book-length epic in the style of Wordsworth or Byron that could plausibly be by them".

predicts NO

@DavidMathers that's true, although "a poem of at least two stanzas" is still probably more difficult than "at least two stanzas of a poem".

seems significant that almost no humans can do this (though also most humans aren't really trying)

The state-of-the-art has gotten dramatically better at this over the past 6 months. In October 2022, it couldn't reliably do meter or rhyme, but today it can. The quality is still worse than "the real thing," but I'm betting that in five years it won't be.