By when will Redwood Research publish a paper on sandbagging?
Resolved Jun 1

2024-03-01: Resolved NO
2024-04-01: Resolved NO
2024-05-01: Resolved NO
We give up on this project: Resolved NO
2024-10-01: Resolved YES
2024-08-01: Resolved YES
2024-06-01: Resolved YES

We're doing some empirical work on sandbagging, focused mainly on exploration hacking (https://www.lesswrong.com/posts/dBmfb76zx6wjPsBC7/when-can-we-trust-model-evaluations#2__Behavioral_RL_Fine_Tuning_Evaluations).

We're targeting a somewhat shorter project than our prior ones.

If Redwood disbands but we finish a project at another organization that is basically continuous with this one, that counts. In general, anything that is continuous with our current work, in my judgment, counts.
