Resolves as YES if there is strong evidence that training GPT-5 on traces produced by o1/o3-type models (i.e., models using CoT + RL / inference-time scaling) improves its performance across a broad set of tasks.
For this question to resolve YES, this performance bump must be observable outside of engineering, mathematics, and science problems (which o1/o3 appear to be specifically trained to solve). The evidence must indicate that a significant gain on tasks outside those domains was obtained by training on o1/o3 traces, beyond what the model would have been capable of without them.
If GPT-5 is released and OpenAI states that the initial release version of GPT-5 was not trained on traces from o1/o3-type models, then this question remains open, since OpenAI may still perform such fine-tuning at a later date.
If no strong evidence pointing towards such an effect emerges before this question's end date, then this question resolves as NO. If GPT-5 is never released publicly, then this question resolves as NO.
If there is no evidence that training on these traces generalises to improved performance on non-science/math/code tasks, then this question resolves as NO. If there is only weak evidence for this effect, then this question resolves as N/A. If GPT-5 is released but there is no evidence that it was ever trained or fine-tuned on o1/o3 traces before this question's end date, then this question resolves as NO.
The intuition here is that training LLMs on hard science data, and especially on code, improves performance across the board for GPT-3/GPT-4-class models. It is possible that the same holds for GPT-5-class models; however, human-generated data is mostly saturated. Using o1/o3 to generate code/science/math reasoning traces may or may not produce a similar effect for GPT-5-class models.