By when will Redwood Research publish a paper on sandbagging?
Ṁ4180, resolved Jun 1
2024-03-01: resolved NO
2024-04-01: resolved NO
2024-05-01: resolved NO
We give up on this project: resolved NO
2024-10-01: resolved YES
2024-08-01: resolved YES
2024-06-01: resolved YES
We're doing some empirical work on sandbagging, focused mainly on exploration hacking (https://www.lesswrong.com/posts/dBmfb76zx6wjPsBC7/when-can-we-trust-model-evaluations#2__Behavioral_RL_Fine_Tuning_Evaluations).
We're targeting a somewhat shorter project than prior projects.
If Redwood disbands but we finish a project at another organization that is basically continuous with this one, that counts. Anything that is, in my judgement, continuous with our current work counts.
This question is managed and resolved by Manifold.
Related questions
Will I feel "very happy" about the ex-post results of the Redwood Research sandbagging project on 2025-04-01?
85% chance
Will Redwood Research still exist in 3 years?
36% chance
Will Redwood Research still exist in 2 years?
79% chance
Will Redwood Research still exist in 5 years?
32% chance