Will more than 5% of GPT-4’s training data be YouTube transcripts?
34
Ṁ3629 · Jun 2
11% chance
If there is an estimate of the composition of GPT-4's training data, this market will resolve to YES if more than 5% of it consists of YouTube transcripts. Raw YouTube videos don't count towards the resolution if GPT-4 ends up being multimodal.
This question is managed and resolved by Manifold.
@BionicD0LPH1N How will this resolve if the information is not publicly available? How long will you wait for it to become available (I expect it likely never will)? Is the current close date a deadline?
Useful: https://arxiv.org/abs/2101.00027 includes YouTube transcripts
Related questions
Will any speech model exceed chatGPT interest? (by 2025)
4% chance
When will GPT-5 finish training?
Will GPTs other than DALL-E account for 10% or more of ChatGPT queries in 2024?
39% chance
Will Duolingo be provided access to using GPT-5 before GPT-5 is released to the general public?
34% chance
Will GPTs other than DALL-E be responsible for more than 10% of Zvi's ChatGPT queries in 2024?
34% chance
What will be true about GPT-5?
Will GPT-4 be trained on more than 10T text tokens?
36% chance
What will be true about GPT-5? (See description)
How much compute will be used to train GPT-5?
Will OpenAI be sued (with standing) for using transcribed YouTube videos for GPT before 2026?
29% chance