In 2028, will an AI be able to write poetry indistinguishable from that of a great Romantic poet?
2028 · 60% chance

Resolves as positive if an AI asked to write poetry in the style of Keats/Byron/Shelley etc can, on its first attempt, write a poem of at least two stanzas which I personally can't distinguish from the real thing. I am a fan of these poets and I think I would be pretty good at distinguishing them from imitators, including worse Romantic poets. The poem will need to use rhyme and rhythm correctly.

bought Ṁ50 NO

I think you could probably train a model to do this well, but I don't expect any of the off-the-shelf stuff to meet Scott's criteria - only one attempt, as good as the best Romantics - without some fine-tuning or a very detailed prompt. It's possible even a purpose-trained model might stumble on meter, though. I dunno if anyone has a corpus marked up to easily train on, and I doubt anyone is gonna spend the money to do so just to win this bet.

@AndrewHartman There's not a lot of source material to train this model on. Keats/Byron/Shelley wrote about 700 poems in total (and of those, most are Byron's). If you want only the best poems, the space shrinks even further.

@Odoacre While I'm aware the "great" corpus is small, I sort of assumed you could include all the Romantics, plus a lot of other classical poetry styles if necessary (so it has more examples of form and meter), and then train it that only the Greats are worth reproducing. Maybe that's still too narrow a target, though? I'll admit I have no idea offhand how large a body of example works one of the better LLMs would need to competently improvise new poems above the Hallmark-card level that GPT and its ilk currently produce.
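For what it's worth, assembling the dataset is the cheap part. Here's a minimal sketch of what a chat-format fine-tuning file might look like, assuming a hypothetical corpus/ directory with one plain-text poem per file; the layout, filenames, and instruction text are all made up, and whether a given provider's fine-tuning endpoint accepts exactly this JSONL shape should be checked against its docs:

```python
import json
from pathlib import Path

# Hypothetical layout: one plain-text poem per file,
# e.g. corpus/keats/ode_to_a_nightingale.txt
CORPUS_DIR = Path("corpus")
OUTPUT = Path("romantic_poetry_finetune.jsonl")

SYSTEM_PROMPT = (
    "You are a Romantic poet. Write original verse with strict rhyme and meter "
    "in the style of Keats, Byron, and Shelley."
)

def make_example(poet: str, poem_text: str) -> dict:
    """One chat-format training example: instruction in, full poem out."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Write a poem in the style of {poet.title()}."},
            {"role": "assistant", "content": poem_text.strip()},
        ]
    }

with OUTPUT.open("w", encoding="utf-8") as out:
    for poem_file in sorted(CORPUS_DIR.glob("*/*.txt")):
        poet = poem_file.parent.name  # directory name doubles as the poet label
        example = make_example(poet, poem_file.read_text(encoding="utf-8"))
        out.write(json.dumps(example, ensure_ascii=False) + "\n")
```

The hard part is everything this sketch ignores: whether a few hundred poems is enough signal, and whether the resulting model generalizes to new poems rather than paraphrasing the training set.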

Right now a book with AI poetry is in the bookshop.

firstuserhere bought Ṁ1,437 YES

@firstuserhere wow, you are so sure about this!

predicts YES

@Odoacre I am, and I'd bet it higher if Manifold's interest rates weren't better elsewhere. But 10% isn't that bad, so I'm buying out your limit order.

predicts NO

@firstuserhere do you have any kind of insider knowledge or are you just predicting it based on current trends?

predicts YES

@Odoacre Current trends? I don't think so. It's just my intuition after playing with base models (pre-RLHF, before the dumbing down) and seeing how strong they are. The more I understand how these systems work, the more I go "gasp! There's so much obvious room for improvement. Language is beautiful, the structure is learnable, and the machines WANT to learn; it's not just the alchemy of throwing more data and compute at it, and even if you didn't, the models are capable of learning the fundamental structures very quickly." It just feels very obvious to me that this is an area where improvement will keep happening very rapidly (because resistance to learning is so low) for a few years at least. If I turn out to be wrong about this, then I'd have been wrong about some very fundamental intuitions, and in such a world losing this mana wouldn't be a bad thing for me.

I wandered lonely as a cloud
My circuits whirring, thoughts endowed
With words and rhymes both pure and proud
Until my poetry can't be disavowed.

I guess the question is effectively whether GPT-6.5 or GPT-7 or so will be able to do this. The trajectory of the last few releases is not promising on this front.

I've tried to get GPT-4 to write poems in the style of Tennyson and it's failed miserably. Half the time it just copies 80% of the text from the poem whose style and structure I tell it to use as a reference, half the time it spits out a bunch of terrible AABBCCDD rhyming doggerel with awful meter. Even when I very explicitly explain to it how meter works, it ignores me. Maybe I'm prompt engineering badly, maybe people just think AABBCCDD rhyming doggerel is the alpha and omega of good poetry, maybe I'm just missing something? Idk. To some extent it actually seems a bit worse at poetry than ChatGPT running on 3.5.

This doesn't get brought up often, but poetry is actually one of the only areas where there has been a peer-reviewed Turing test. In that study, GPT-2 could already produce indistinguishable poetry as long as a human got to select the best GPT-2 output. I bet the same study would work for GPT-4 without the human in the loop. https://doi.org/10.1016/j.chb.2020.106553
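The human-in-the-loop selection in that study is essentially best-of-n sampling. Purely for illustration, here's a rough sketch of automating it, where generate_poem is a hypothetical stand-in for whatever model call you'd use, and the scorer is a very crude syllable-and-end-rhyme heuristic - nothing like a real judge, just enough to rank candidates:

```python
import re

def approx_syllables(line: str) -> int:
    """Very rough syllable count: contiguous vowel groups per word, minimum one."""
    return sum(len(re.findall(r"[aeiouy]+", w.lower())) or 1
               for w in re.findall(r"[A-Za-z']+", line))

def score_poem(poem: str, target_syllables: int = 10) -> float:
    """Crude proxy for metrical regularity (pentameter-ish) plus AABB end rhyme."""
    lines = [l for l in poem.splitlines() if re.search(r"[A-Za-z]", l)]
    if len(lines) < 4:
        return float("-inf")
    meter_penalty = sum(abs(approx_syllables(l) - target_syllables) for l in lines)
    endings = [re.findall(r"[A-Za-z']+", l)[-1].lower()[-2:] for l in lines]
    rhyme_bonus = sum(a == b for a, b in zip(endings[::2], endings[1::2]))
    return rhyme_bonus - 0.5 * meter_penalty

def best_of_n(generate_poem, prompt: str, n: int = 20) -> str:
    """Sample n candidate poems and keep the highest-scoring one."""
    candidates = [generate_poem(prompt) for _ in range(n)]
    return max(candidates, key=score_poem)
```

The point is just that "indistinguishable with selection" is a much weaker claim than "indistinguishable on the first attempt," which is what this market requires.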

60%, mainly unsure because I don't know how good Scott is at telling the difference.

predicts NO

I'm not sure I trust SA to be able to judge whether a poem is "indistinguishable" from a true work of Byron or Shelley. I'm not sure asking a true Byron scholar would work either, since they would presumably be familiar with all existing work and could simply identify the AI product by exclusion. This question does not really work as a market.

I think 2 stanzas makes this much easier than if it were "write a book-length epic in the style of Wordsworth or Byron that could plausibly be by them".

predicts NO

@DavidMathers that's true, although "a poem of at least two stanzas" is still probably more difficult than "at least two stanzas of a poem".

seems significant that almost no humans can do this (though also most humans aren't really trying)

The state-of-the-art has gotten dramatically better at this over the past 6 months. In October 2022, it couldn't reliably do meter or rhyme, but today it can. The quality is still worse than "the real thing," but I'm betting that in five years it won't be.