[Redwood Research] Will we submit the bash control project to NeurIPS?

resolved May 19

We're working on a follow-up to "AI Control: Improving Safety Despite Intentional Subversion", set in a language model agent shell programming setting. We intend to write this up and submit it to NeurIPS. Will we succeed?

Currently, we have a preliminary dataset, and we've done some back-and-forth on trusted monitoring.

The main reason we wouldn't submit would be that our results aren't sufficiently solid by the deadline. We won't submit if we think the paper is very unlikely (<20%) to be accepted.

(Feel free to message me if you want to beta read the paper.)
