Will future large video understanding models use pixel loss or embedding loss?

235Ṁ92

2028


Pixel loss: 35%

Embedding Loss: 27%

Neither: 38%

Examples of models with pixel loss:

MAE
iGPT
LVM

Examples of models with embedding loss:

I-JEPA

If people end up using diffusion models (DDPM) to pretrain large video understanding models, this resolves to Pixel loss.

Will resolve at EOY 2027 by consulting expert and public opinion. Among all the factors that decide the resolution, the paradigm used by the SOTA video understanding model will be the most indicative.

Discrete cross-entropy loss (transformer + VQ-VAE) will resolve to Neither.
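To make the three categories in the resolution criteria concrete, here is a toy sketch in plain Python. It is only an illustration: the function names and the `RESOLUTION` table are hypothetical, real models compute these losses over batches of masked patches or frames, and the mapping to market options simply restates the description above.

```python
import math

def _mse(pred, target):
    # Mean squared error between two equal-length vectors.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def pixel_loss(pred_pixels, target_pixels):
    # The target is the raw pixel values of the region being predicted.
    return _mse(pred_pixels, target_pixels)

def embedding_loss(pred_embedding, target_embedding):
    # The target is a latent vector produced by a target encoder, not pixels.
    return _mse(pred_embedding, target_embedding)

def discrete_cross_entropy(logits, target_token):
    # The target is the index of a discrete codebook entry (e.g. a VQ-VAE token).
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_token]

# Assumed mapping from loss family to market option, taken from the description.
RESOLUTION = {
    "pixel_loss": "Pixel loss",
    "ddpm": "Pixel loss",
    "embedding_loss": "Embedding Loss",
    "discrete_cross_entropy": "Neither",
}
```

In this sketch the pixel and embedding losses share the same arithmetic; what separates them is the space the target lives in, which is the distinction the market asks about.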


1 Comment


I thought pixel loss would increase with DDPM.

Here is the ambiguous part: what if someone uses V-JEPA as the encoder, diffusion as the backbone, and something else as the decoder?
