Resolves as positive if an AI asked to write poetry in the style of Keats/Byron/Shelley etc. can, on its first attempt, write a poem of at least two stanzas which I personally can't distinguish from the real thing. I am a fan of these poets and I think I would be pretty good at distinguishing them from imitators, including worse Romantic poets. The poem will need to use rhyme and rhythm correctly.
I think you could probably train a model to do this well, but I don't expect any of the off-the-shelf stuff to meet Scott's criteria - only one attempt, as good as the best Romantics - without some fine-tuning or a very detailed prompt. It's possible even a purpose-built model might stumble on meter though? I dunno if anyone has a corpus marked up so a model could easily be trained on it, and I doubt anyone is gonna spend the money to do so just to win this bet.
@AndrewHartman There's not a lot of source material to train this model on. Keats/Byron/Shelley wrote about 700 poems in total (and of those, most are Byron's). If you want only the best poems then the pool shrinks even further.
@Odoacre While I'm aware the "great" corpus is small, I sort of assumed you could include all the Romantics, plus a lot of other classical poetry styles if necessary (so it has more examples of form and meter), and then teach it that only the Greats are worth reproducing. Maybe that's still too narrow a target, though? I'll admit I have no idea off-hand exactly how large a body of example works one of the better LLMs would need to competently improvise new poems at better than the Hallmark card level that GPT and its ilk currently produce.
@Odoacre I am, and I'd bet it higher if Manifold's interest rates weren't better elsewhere. But 10% isn't that bad, so I'm buying out your limit order.
@firstuserhere do you have any kind of insider knowledge or are you just predicting it based on current trends?
@Odoacre current trends? Idts. It's just my intuitions after playing with base models (pre-RLHF dumbing down) and seeing how strong they are. The more I understand how these systems work, the more I go "gasp! There's so much obvious room for improvement. And language is beautiful and the structure is learnable and the machines WANT to learn, and it's not just alchemy of throwing more data and compute at it; even if you didn't, the models are capable of learning the fundamental structures very quickly". It just feels very obvious to me that this is an area where enough improvement will keep happening very rapidly (because resistance to learning is so low) for a few years at least. If I turn out to be wrong about this, then I'd have turned out to be wrong about some very fundamental intuitions, and in such a world, losing this mana wouldn't be a bad thing for me.
I've tried to get GPT-4 to write poems in the style of Tennyson and it's failed miserably. Half the time it just copies 80% of the text from the poem whose style and structure I tell it to use as a reference, half the time it spits out a bunch of terrible AABBCCDD rhyming doggerel with awful meter. Even when I very explicitly explain to it how meter works, it ignores me. Maybe I'm prompt engineering badly, maybe people just think AABBCCDD rhyming doggerel is the alpha and omega of good poetry, maybe I'm just missing something? Idk. To some extent it actually seems a bit worse at poetry than ChatGPT running on 3.5.
This doesn't get brought up often, but poetry is actually one of the only areas where there has been a peer-reviewed Turing test. In that study, GPT-2 could already produce indistinguishable poetry as long as a human got to select the best GPT-2 output. I bet the same study would work for GPT-4 without the human in the loop. https://doi.org/10.1016/j.chb.2020.106553
I'm not sure I trust SA to be able to judge if a poem is "indistinguishable" from a true work of Byron or Shelley. I'm not sure about asking a true Byron scholar either: they would presumably be familiar with all the existing work and could simply identify the AI product by exclusion. This question does not really work as a market.
@DavidMathers that's true, although "a poem of at least two stanzas" is still probably more difficult than "at least two stanzas of a poem".