Will DPO variants mostly replace RLHF before EOY 2024?

resolved Feb 20

Resolved

N/A

Motivation

Resolves YES if ≥3 of the top 5 LLMs on Chatbot Arena use (a variant of) DPO on 2024/12/31.

I think this market fundamentally cannot resolve as stated because Chatbot Arena includes closed LLMs which do not disclose the details of how they have been finetuned. GPT-4 and Bard might already be using DPO and we wouldn't know about it. To my chagrin, some people (for example Nous Research) categorize DPO as a variant of RLHF, so there's plausible deniability whenever OpenAI or Google refer to their finetuning as "RLHF".

@NoraBelrose That's a good point. If there's still uncertainty about closed-source models at the end of the year, I might resolve N/A or push the deadline in the hope that the information will surface later.

Mixtral-Instruct was trained with DPO
