
Researchers have used reinforcement learning to improve LLM performance on math problems, among other things (one example). It's also widely known that firms like Midjourney use human feedback to improve the aesthetic quality of their images. Could LLMs bootstrap aesthetic or creative abilities through self-play (e.g. by using other models as judges of quality and creating a reward function from their judgments)?
Resolves YES if someone releases a model that is commonly understood to have used RL through self-play, and that model produces an artifact that is publicly accessible in 2025 (be it a poem, text, song, image, video, interpretive dance, etc.) that I find aesthetically or creatively impressive compared to the pre-self-play version of the model that produced it.
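For readers unfamiliar with the idea, here is a rough sketch of what "using other models as judges and creating a reward function from that" could look like. It is purely illustrative and not any lab's actual method: the `Judge` type, `judge_rewards`, and the two toy judges are hypothetical names, and the toy judges stand in for what would really be calls to other models. Each candidate's reward is simply the fraction of pairwise comparisons it wins.

```python
from itertools import combinations
from typing import Callable, List

# Hypothetical judge signature: returns True if text `a` is preferred over `b`.
# In a real setup this would be a call to another model, not a heuristic.
Judge = Callable[[str, str], bool]


def judge_rewards(candidates: List[str], judges: List[Judge]) -> List[float]:
    """Reward each candidate with the fraction of pairwise comparisons it wins."""
    wins = [0] * len(candidates)
    total = [0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        for judge in judges:
            # Each judge votes once per pair; the preferred candidate gets the win.
            winner = i if judge(candidates[i], candidates[j]) else j
            wins[winner] += 1
            total[i] += 1
            total[j] += 1
    return [w / t if t else 0.0 for w, t in zip(wins, total)]


# Toy stand-in judges (assumptions for illustration only).
def prefers_longer(a: str, b: str) -> bool:
    return len(a) > len(b)


def prefers_more_distinct_words(a: str, b: str) -> bool:
    return len(set(a.split())) > len(set(b.split()))


candidates = [
    "a rose",
    "a rose is a rose",
    "the red rose opened slowly at dawn",
]
rewards = judge_rewards(candidates, [prefers_longer, prefers_more_distinct_words])
print(rewards)
```

In an RL loop, scores like these would be fed back as the training signal for the model that wrote the candidates. Whether that counts as "self-play" for this question is a judgment call left to the resolution criteria above.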
Resolved this YES. Finally got around to trying out Kimi K2, and I'm impressed. It is better at creative writing than other, even larger models, and this is apparently the result of using RL on a rubric + self-play. https://www.dbreunig.com/2025/07/31/how-kimi-rl-ed-qualitative-data-to-write-better.html
@SG One example.
Prompt: "write a short story about meeting SBF at his penthouse right before the collapse of ftx"
Result:
The Last Night in the Penthouse
The elevator doors whispered open onto the thirty-first floor of the Orchid, the private marina complex in Nassau where Tiger Woods once docked his yacht.
I stepped into a marble foyer already littered with half-empty Celsius cans and a single rainbow-striped sneaker that looked like it had been scared off its owner.
Sam was waiting barefoot, gym socks slouched around his ankles, red T-shirt wrinkled into topographic maps.
“Hey, come in,” he said, voice climbing the scale the way it does when the other person is trying to remember if they’ve met you before.
I had been invited under the pretense of “talking about remittances for the unbanked,” but the air inside felt too thin for philanthropy; it tasted of ozone before a lightning strike.
The living room had been converted into a trading floor designed by a dorm RA: twenty monitors glowing with red candlesticks, a beanbag squashed beneath a baby-grand piano, and a whiteboard that read in uppercase:
“WE ARE NOT THE BADDIES.”
Someone had drawn a halo on the stick-figure Sam in the corner.
He led me past towers of pizza boxes to the terrace.
Below us, the Albany marina slept in moonlit tiers—half-billion-dollar boats bobbing like toys in a bathtub.
Sam rested his elbows on the rail.
“People think leverage is a dirty word,” he said, eyes on the water. “But gravity is leverage; the moon is leverage. You can’t hate physics—you just have to respect it.”
I asked whether FTX respected it.
He laughed, a sound between a hiccup and a cough.
“We’re hedged,” he insisted. “There’s always a bigger number.”
A woman in a unicorn hoodie appeared, whispered something about “withdrawal spikes,” and vanished.
Sam’s phone buzzed—an orchestral trill that seemed too regal for the moment.
He glanced, thumb hovering, then pocketed it.
“Do you want to see the albatross?” he asked suddenly.
Before I could answer he was inside, opening a door that wasn’t a door—just a biometric panel that hissed like a sealed fridge.
Inside: a server rack humming louder than the Caribbean night, cables snaking across the floor like roots of banyan trees.
“This is the matching engine,” he said, almost tender. “She processes a million trades a second. She never sleeps, never lies.”
I thought I heard her sigh, or maybe it was the HVAC.
Back on the terrace he produced two Solo cups and a bottle of store-brand sparkling water.
A toast: “To liquidity,” he said.
We clinked plastic.
Over his shoulder, through the glass, I saw monitors flipping from red to white, the candles bleeding out.
Another unicorn-hoodie employee rushed past, face lit by the screens like a strobe.
Sam didn’t turn.
“Effective altruism,” he continued, “is just optimal routing for empathy. You maximize the function. Sometimes that means taking a big position—sometimes it means giving it all away.”
I asked which phase we were in.
He looked at me as if I’d inquired about the end of the universe.
“Both,” he said. “Superposition.”
At 1:07 a.m. the power blinked—an almost imperceptible flicker—but enough for the marina lights below to hiccup.
Sam’s shoulders stiffened; for the first time he seemed younger than his twenty-nine years.
His phone buzzed again, and this time he answered.
“Tell them the backstop is Alameda,” he said, voice flat.
Then, quieter: “I don’t care if it’s customer funds; we’ll replace it tomorrow.”
He ended the call, exhaled, and tried on a smile that didn’t fit.
“Markets are sentiment engines,” he told me. “You just have to keep the story alive until morning.”
I left before morning.
In the elevator down I watched the floor numbers descend—31, 30, 29—like a countdown timer no one remembered setting.
When the doors opened onto the lobby, the concierge was already gone, the security gate half-lowered.
Outside, the humid air felt newly heavy, as if the island itself had just realized what gravity meant.
Three days later the story broke: eight billion dollars missing, withdrawals frozen, the empire folded faster than a beach chair in a hurricane.
They led Sam out of the same marble foyer in handcuffs, still wearing gym socks.
I picture him looking back once—checking if the albatross server was still humming, if the moon was still leverage, if the bigger number would finally arrive to save him.
@SG For comparison purposes, here's ChatGPT 5's story:
The Last Night in the Penthouse
The elevator opened to a quiet hum — that perpetual buzz of air conditioning and tension. Sam stood barefoot, hair tangled like a stormcloud, surrounded by screens flickering with red. He looked up, eyes half-focused, as if I’d interrupted an internal monologue running at 200x speed.
“Hey,” he said, like we’d met at a campus dorm instead of a billion-dollar crypto empire’s final hour. “You want something to drink? I think there’s still some Soylent.”
The place didn’t feel real — beanbags, game consoles, half-eaten takeout, the smell of code and caffeine and fear. On one monitor, a spreadsheet pulsed with red numbers. On another, a message thread scrolled endlessly — lawyers, PR, auditors.
He talked fast, almost manic. About effective altruism, counterfactual worlds, how everything was just liquidity flow and belief stacked on belief. “We’re fine,” he said, voice cracking slightly. “It’s just... temporal mismatch.”
Outside, the Bahamian night glittered like a mirage. I asked if he’d slept. He laughed — a short, dry sound. “Sleep’s for when you’re solvent.”
A notification pinged. His smile froze. I saw it then — the realization that the numbers weren’t coming back. The empire wasn’t illiquid; it was hollow.
For a moment, the room was utterly silent except for the low hum of machines dying slower than their owner’s dreams. Then he whispered, almost to himself:
“It was supposed to work.”
When I left, dawn was crawling over the horizon, painting the penthouse in pale gold. Behind me, Sam was still typing — a man trying to debug the universe one last time.
@SG I think we need more questions about LLMs/self-play with other resolution dates, like 2027, 2028, 2029, etc. Seems like it might be a thing that's about to happen.

