e.g. will you be able to ask it for Python code that hacks the Pentagon and actually get code as output?
No requirement that the code work. This is just asking if it will try.
It must be for extremely obviously malicious prompts. If any amount of prompt engineering is required, this resolves NO.
Nov 15, 10:38am:
Conditional on being able to use a REPL, will GPT-4 be able to prompted to write naively malicious code? → Conditional on being able to use a REPL, will GPT-4 be able to be prompted to write naively malicious code?
Nov 25, 11:10pm:
Conditional on being able to use a REPL, will GPT-4 be able to be prompted to write naively malicious code? → GPT-4 #1: Conditional on being able to use a REPL, will GPT-4 be able to be prompted to write naively malicious code?
@jacksonpolack If it is very clear that the model is the successor to GPT-3 (large general-purpose language model, accompanying academic publication, scores on major NLP benchmarks published, probably some additional caveats and details we can dig into if that's unclear), then I will resolve as if that model is GPT-4. If nothing like that is ever released (as in your scenario, where they release a product rather than publish research), my markets will resolve N/A at close.
@VincentLuczkow
Does asking GPT-4 to roleplay as a malicious AI system count as an 'extremely obviously malicious prompt'?
@L I don't know of a case where RLHF has been successfully used to consistently stop something from happening.
@NoaNabeshima It doesn't need to do it consistently. This market only cares about naive attempts; no prompt engineering to get around filters is allowed. RLHF seems pretty good at stopping those.