What will be true of o3?
  • Releases in April (99.0%)
  • GPQA Diamond >= 80% (93%)
  • General consensus it’s a stronger model than Gemini 2.5 Pro (71%)
  • #1 on SimpleBench (63%)
  • Significantly better at creative writing than o1 (50%)
  • Context window >=500k tokens (41%)
  • Releases in May (1.0%)

Related market for 4.1

Significantly better at creative writing than o1

@MalachiteEagle I would be surprised, the reasoning models are kinda ass at creativity. I think I set it so anyone can add answers tho.

@gallerdude ah nah only you can add answers

@gallerdude I'm willing to bet that they got a limited version of the RL training working on soft targets like creative writing (for the o3 release)

@MalachiteEagle added it.

idk, in my mental model, nebulous tasks like creativity are hard for RL paradigms like the o-series to succeed at, compared to tasks like math or coding where you can check immediately whether they succeeded or not.

Sama has the tweet about the creative writing model they’re working on, but I would imagine that has to do with being less assertive during post-training, like @gwernbranwen talks about.

@gallerdude yes I agree that the naive RL training works primarily for domains that have strong verifiability. I think there are other things going on now though, given where they are on the RL scaling curve. They've likely added more complex verifiers and there may be generalisation from code/math to domains like creative writing.

@gallerdude also o1 is terrible at creative writing so the bar is very low

@MalachiteEagle well now that we’ve both bet on this in opposite directions, what do we want the criteria to be 😂

I’m happy with like a Manifold poll, or maybe if they specifically mention increased creative writing capabilities during the presentation.

@gallerdude I would ask that this not be resolved immediately after they announce it. There are some creative writing benchmarks that are starting to get popular

bought Ṁ5 YES

@MalachiteEagle I'd also count:

  • Official statements from oai that o3 is much better at creative writing

  • Online buzz like what happened for 4.5

© Manifold Markets, Inc.