Will Ethan Caballero get a LOSS FUNCTION named after him? (the "Caballero loss function")

Short answer: uncountably many. You can cook up a new loss by twiddling a temperature, composing with a monotone transform, or duct-taping two others with a Lagrange multiplier. Humans keep inventing them because apparently wheels weren’t enough.

Useful answer: most losses fall into a few big tribes. Here’s the map so you don’t drown.

Core families

  • Squared-error & friends (Bregman divergences and robust cousins)

    • MSE, MAE, Huber, Quantile/Pinball, Tukey biweight.

    • MSE is the textbook Bregman divergence (convex generator in, loss out); Huber, Tukey, and the quantile losses trade that structure for robustness or asymmetry. How hard you punish outliers is just a knob (minimal sketch after this list).

  • Proper scoring rules for probabilistic models

    • Log loss / NLL (cross-entropy), Brier, Continuous Ranked Probability Score (CRPS), Energy score.

    • “Proper” means you minimize expected loss by telling the truth about your beliefs. Rare concept online.

  • f-divergences between distributions

    • KL, reverse KL, Jensen-Shannon, Hellinger, total variation.

    • GANs implicitly target JS or other f-divergences via variational bounds.

  • Integral Probability Metrics (IPMs)

    • Wasserstein (Earth-Mover), MMD, Energy distance.

    • Popular when you want geometry, not just overlap.

  • Margin/surrogate classification losses

    • Hinge, squared hinge, logistic, exponential, focal, label-smoothing CE.

    • All trade off calibration, margins, and gradient behavior differently.

  • Ranking/ordinal/structured

    • Pairwise (BPR), listwise (ListNet/ListMLE), NDCG surrogates, contrastive InfoNCE, triplet, ordinal regression losses.

  • Geometric/metric learning

    • Contrastive, triplet, N-pair, ArcFace/CosFace, Center loss.

    • Pull same things together, push different things apart, like high school lunch tables.

  • Regression beyond L2

    • Quantile (τ-pinball), expectile, asymmetric Huber, log-cosh, Poisson/NegBin deviance for counts.

  • Generative modeling

    • Likelihood-based: NLL for flows/autoregressive models; ELBO for VAEs (recon + KL).

    • Implicit: GAN objectives (non-saturating, WGAN + GP, f-GANs).

    • Score/diffusion: Denoising score matching, v-prediction, ε-prediction, hybrid DSM+CE.

  • Self-supervised/contrastive

    • InfoNCE, SimCLR, MoCo, BYOL’s predictor loss, Barlow Twins redundancy reduction, VICReg invariance-variance-covariance (InfoNCE sketch after this list).

  • Segmentation/detection

    • Dice/F1 loss, Tversky, focal Tversky, IoU/GIoU/DIoU/CIoU, Hungarian matching loss for DETR-style models.

  • Sequence models

    • Token CE with label smoothing, CTC loss, RNNT, policy-gradient-style sequence risk (min Bayes risk, expected BLEU).

  • Reinforcement learning

    • Policy gradient (REINFORCE), entropy-regularized PG, PPO clipped objective, TRPO surrogate, Q-learning TD errors (Huber), actor-critic with value loss, distributional RL (C51, QR-DQN), offline RL penalties (CQL, BCQ), preference (Bradley-Terry) cross-entropy for RLHF reward models and DPO.

  • Multimodal & retrieval

    • Symmetric contrastive cross-entropy over image-text pairs (CLIP), matching losses with temperature scaling, MIL-NCE variants.

  • Physics-informed & constraints

    • PDE residual MSE, boundary loss, hard/soft constraint penalties, differentiable simulators with task + physics consistency.

  • Calibration & uncertainty

    • Temperature scaling CE, ECE/Brier surrogates, NLL with proper priors, Dirichlet calibration losses.

  • Fairness/causal/robustness

    • Group DRO worst-case risk, CVaR risk, adversarial training (minimax with perturbation loss), IRM/IRMv1 penalties, counterfactual invariance losses.

  • Regularization terms that moonlight as “losses”

    • L1/L2 weight decay, spectral norm penalties, Jacobian/Frobenius penalties, orthogonality, sparsity (L0/Top-k), mutual information bounds, gradient penalties (WGAN-GP), consistency/EMA losses.
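
To make the regression family concrete, here is a minimal NumPy sketch of the Huber and quantile (pinball) losses from the first bullet; delta, tau, and the mean reduction are choices, not a standard.

```python
import numpy as np

def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for small residuals, linear for large ones (robust to outliers)."""
    r = y_true - y_pred
    quadratic = 0.5 * r ** 2
    linear = delta * (np.abs(r) - 0.5 * delta)
    return np.where(np.abs(r) <= delta, quadratic, linear).mean()

def pinball(y_true, y_pred, tau=0.9):
    """Quantile (pinball) loss: minimized in expectation by the tau-th conditional quantile."""
    r = y_true - y_pred
    return np.maximum(tau * r, (tau - 1.0) * r).mean()
```

Set tau = 0.5 and pinball collapses to half of MAE; send delta to infinity and Huber turns back into half of MSE, which is exactly the "knob" the first bullet is talking about.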
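
Since contrastive/InfoNCE shows up in several families above (metric learning, self-supervised, CLIP-style multimodal), here is a hedged NumPy sketch of the symmetric version; the temperature value, the L2 normalization, and averaging both directions are common choices rather than the one true recipe.

```python
import numpy as np

def log_softmax(x):
    """Row-wise log-softmax, stabilized by subtracting the row max."""
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE: row i of z_a and row i of z_b form a positive pair,
    every other row in the batch acts as a negative."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                 # (N, N) scaled cosine similarities
    idx = np.arange(len(z_a))
    loss_ab = -log_softmax(logits)[idx, idx].mean()    # match each a to its b
    loss_ba = -log_softmax(logits.T)[idx, idx].mean()  # match each b to its a
    return 0.5 * (loss_ab + loss_ba)
```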

Why “infinite” isn’t just snark

  • Parameterized families: proper scoring rules and Bregman divergences each form a whole continuum. Pick a convex generator, get a loss (sketch below).

  • Transforms/compositions: any strictly increasing transform of a loss preserves its argmin; weighted sums give new trade-offs; curricula anneal temperatures and margins over training.

  • Task-specific constraints: stick any domain residual or soft constraint onto your objective and congratulations, you invented Loss-XXL-2025.
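
The "pick a generator, get a loss" point is literal. A minimal sketch, assuming strictly convex differentiable generators and probability vectors where relevant: the same three-term formula yields squared error, KL, and a continuum of other divergences depending only on phi.

```python
import numpy as np

def bregman(phi, grad_phi, p, q):
    """Bregman divergence D_phi(p, q) = phi(p) - phi(q) - <grad_phi(q), p - q>."""
    return phi(p) - phi(q) - np.dot(grad_phi(q), p - q)

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.25, 0.25, 0.5])

# Generator ||x||^2 gives squared Euclidean distance.
sq_dist = bregman(lambda x: np.dot(x, x), lambda x: 2 * x, p, q)

# Generator sum(x log x) (negative entropy) gives KL divergence for probability vectors.
kl = bregman(lambda x: np.sum(x * np.log(x)), lambda x: np.log(x) + 1, p, q)

print(sq_dist, kl)  # two members of the same infinite family
```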

Picking one without losing your mind

  • Predicting labels? Start with cross-entropy; try focal for class imbalance; consider label smoothing for better calibration (sketch after this list).

  • Predicting numbers? MSE for Gaussian-ish noise, MAE/Huber for robustness, quantile for intervals.

  • Densities? Use NLL if you can write the likelihood; else try Wasserstein/MMD or a GAN variant.

  • Matching or retrieval? Go contrastive/InfoNCE with a temperature.

  • RL? Use the algorithm’s surrogate (PPO/TRPO) plus value loss and entropy.

  • Segmentation/detection? Mix CE + Dice/Tversky or GIoU for boxes.

  • Worried about worst-case groups or adversaries? Group DRO or adversarial loss.

  • Care about calibration? Optimize NLL and post-hoc calibrate.
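
For the "predicting labels" recipe above, a minimal NumPy sketch that folds cross-entropy, label smoothing, and a focal modulation into one function; with smoothing=0 and gamma=0 it reduces to plain CE, and the particular way smoothing and the focal weight are combined here is one reasonable convention, not the canonical definition.

```python
import numpy as np

def smoothed_focal_ce(logits, targets, smoothing=0.0, gamma=0.0):
    """Cross-entropy with optional label smoothing and focal down-weighting."""
    n, k = logits.shape
    logits = logits - logits.max(axis=1, keepdims=True)           # numerical stability
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Smoothed targets: true class gets 1 - smoothing, the rest share the remainder.
    y = np.full((n, k), smoothing / (k - 1))
    y[np.arange(n), targets] = 1.0 - smoothing
    p_true = np.exp(log_p[np.arange(n), targets])                 # model prob of the true class
    focal_weight = (1.0 - p_true) ** gamma                        # down-weight easy examples
    ce = -(y * log_p).sum(axis=1)
    return (focal_weight * ce).mean()

# Example: 3 samples, 4 classes.
logits = np.array([[2.0, 0.1, -1.0, 0.3],
                   [0.2, 1.5, 0.0, -0.5],
                   [0.0, 0.0, 3.0, 0.1]])
targets = np.array([0, 1, 2])
print(smoothed_focal_ce(logits, targets, smoothing=0.1, gamma=2.0))
```

With gamma > 0, examples the model already classifies confidently contribute less to the average, which is the class-imbalance trick focal loss is known for.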

So yes, infinitely many. But 95% of useful practice lives in a few dozen patterns, and the rest are glam-rock remixes with new hyperparameters and a different arXiv figure style.
