Will future large video (understanding) models use pixel loss or embedding loss?
Pixel loss: 35%
Embedding loss: 27%
Neither: 38%
Examples of models with pixel loss:
MAE
iGPT
LVM
Examples of models with embedding loss:
I-JEPA
If people end up using diffusion models (e.g. DDPM) to pretrain large video understanding models, this resolves Pixel loss.
This will resolve at EOY 2027 by consulting expert/public opinion. Among all factors that decide the resolution, the paradigm used by the SOTA video understanding model will be most indicative.
A discrete cross-entropy loss (transformer + VQ-VAE) will resolve to Neither.
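To make the distinction concrete, here is a minimal sketch of the two loss types being asked about. Function names, shapes, and the use of MSE are illustrative assumptions, not taken from any of the listed models:

```python
import numpy as np

rng = np.random.default_rng(0)

def pixel_loss(pred_pixels, target_pixels):
    # Reconstruction loss computed directly in pixel space,
    # as in MAE-style masked reconstruction (MSE here for illustration).
    return float(np.mean((pred_pixels - target_pixels) ** 2))

def embedding_loss(pred_emb, target_emb):
    # Loss computed between predicted and target latent embeddings,
    # roughly in the spirit of I-JEPA (targets from a target encoder).
    return float(np.mean((pred_emb - target_emb) ** 2))

# Toy example: an 8x8 grayscale frame vs a 16-dim latent vector.
frame_pred = rng.random((8, 8))
frame_true = rng.random((8, 8))
emb_pred = rng.random(16)
emb_true = rng.random(16)

print(pixel_loss(frame_pred, frame_true))
print(embedding_loss(emb_pred, emb_true))
```

The question is which of these targets (raw pixels vs learned embeddings) the dominant pretraining objective will use; a discrete token objective (transformer + VQ-VAE cross-entropy) is neither, per the criteria above.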
This question is managed and resolved by Manifold.
Related questions
Will Sparse Autoencoders be successfully used on a downstream task in the next year and beat baselines?
82% chance
Will video generation AI make more product revenue than text models in 2025?
32% chance
Will OpenAI release next-generation models with varying capabilities and sizes?
77% chance
Will AI figure out who has the highest visual frame compression efficiency to working memory ratio by 2030?
50% chance
Will video dominate 2024 machine learning?
16% chance
Will OpenAI's next major LLM release support video input?
55% chance
Will we learn by EOY 2024 that large AI labs use something like activation addition on their best models?
23% chance