Was synthetic video data generated for training Sora, e.g. large-scale clips from Unreal Engine, as suggested by this tweet? Synthetic data that merely happens to be in the training data does not count. This will resolve YES if the consensus is that large-scale synthetic data generation was used for Sora.
If there is no clear consensus, I will resolve it based on my judgement using any material released by OpenAI.
https://twitter.com/Carnage4Life/status/1768267008192164313
No mention of internally generated synthetic data 🤔
The description specifically says:
Was synthetic data generated for training Sora, ie large scale clips from Unreal Engine as suggested by this tweet
We already knew from the OpenAI Sora post that auto-captioning was used, before the market was created:
Training text-to-video generation systems requires a large amount of videos with corresponding text captions. We apply the re-captioning technique introduced in DALL·E 3 [30] to videos. We first train a highly descriptive captioner model and then use it to produce text captions for all videos in our training set. We find that training on highly descriptive video captions improves text fidelity as well as the overall quality of videos.
I think it's obvious that the intent of this market didn't include auto-captioning, and was intended to capture if that tweet was accurate.
(If we want to go by the literal English meaning, "i.e." means "that is" or "specifically", so it specifies what the previous clause means, whereas if it said "e.g." the video captions would potentially be one example of many.)
@Shump Good question! I extended the close time until the end of the year. I will resolve this question early if new information is available before then and resolve as N/A otherwise.
limit order up for 10k no!