[Redwood Research] Will we submit the bash control project to NeurIPS?
May 18

We're working on a follow-up to "AI Control: Improving Safety Despite Intentional Subversion", set in a shell programming environment for language model agents. We intend to write this up and submit it to NeurIPS. Will we succeed?

Currently, we have a preliminary dataset, and we've done some back-and-forth on trusted monitoring.

The main reason we wouldn't submit is that we don't think our results will be sufficiently solid by the submission deadline. We won't submit if we think it's very unlikely (<20%) that the paper will be accepted.

(Feel free to message me if you want to beta read the paper.)
