Will DPO variants mostly replace RLHF before EOY 2024?
12
Ṁ411
Dec 31
40% chance

Motivation

Resolves YES if ≥3 of the top 5 LLMs on Chatbot Arena use (a variant of) DPO on 2024/12/31.


I think this market fundamentally cannot resolve as stated, because Chatbot Arena includes closed LLMs that do not disclose how they were finetuned. GPT-4 and Bard might already be using DPO and we wouldn't know about it. To my chagrin, some people (for example Nous Research) categorize DPO as a variant of RLHF, so there's plausible deniability whenever OpenAI or Google refer to their finetuning as "RLHF".

@NoraBelrose That's a good point. If there's still uncertainty about closed-source models at the end of the year, I might resolve N/A or push the deadline back in the hope that the information surfaces later.

© Manifold Markets, Inc.