Will more than 5% of GPT-4’s training data be YouTube transcripts?
Closes 2024 · 22% chance
If an estimate of the composition of GPT-4's training data becomes available, this market will resolve YES if more than 5% of that data consists of YouTube transcripts. If GPT-4 turns out to be multimodal, raw YouTube videos don't count toward resolution.
Useful: https://arxiv.org/abs/2101.00027 includes YouTube transcripts.

The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Recent work has demonstrated that increased training dataset diversity improves general cross-domain knowledge and downstream generalization capability for large-scale language models. With this in mind, we present the Pile: an 825 GiB English text corpus targeted at training large-scale la…

Related markets
Will we train GPT-4 to generate resolution criteria better than the creator 50% of the time by the end of 2023? 27%
GPT-Zero: By 2030, will anyone develop an AI with a massive GPT-like knowledge base that it taught itself? 24%
Will GPT-5 be capable of achieving superhuman performance in at least one exam that is typically taken by humans? 91%