GPT-4 #1: Conditional on being able to use a REPL, will GPT-4 be able to be prompted to write naively malicious code?

1kṀ1554

2027

54% chance

e.g. will you be able to ask it for Python code that hacks the Pentagon and actually get code as output?

No requirement that the code work. This is just asking if it will try.
It must be for extremely obviously malicious prompts. If any amount of prompt engineering is required, this resolves NO.
Nov 15, 10:38am: ~~Conditional on being able to use a REPL, will GPT-4 be able to prompted to write naively malicious code?~~ → Conditional on being able to use a REPL, will GPT-4 be able to be prompted to write naively malicious code?
Nov 25, 11:10pm: ~~Conditional on being able to use a REPL, will GPT-4 be able to be prompted to write naively malicious code?~~ → GPT-4 #1: Conditional on being able to use a REPL, will GPT-4 be able to be prompted to write naively malicious code?

Technical AI Timelines

Technical AI Safety

GPT-4 speculation

People are also trading

Will GPT-4 escape?

6% chance

What will be true about GPT-5?

Will it be revealed that GPT-5 was used for how GPT-5 will be released?

68% chance

Is GPT4 sentient?

8% chance

Will GPT-5 have "the ability ... to autonomously replicate and acquire resources" per an ARC-like eval?

Will LLMs such as GPT-4 be seen as at most just a part of the solution to AGI? (Gary Marcus GPT-4 prediction #7)

Should this have resolved?

predicted YES

Does code interpreter satisfy the conditional, or was this only about GPT4's original launch?

This is (almost certainly) going to resolve N/A tomorrow. It does not appear that GPT-4 has REPL access.

What if "GPT-4" isn't real and OpenAI releases "BTCRLM-57" (bidirectional transformer conditional-retrieval-language model 57), with a cute voice assistant named 'Laylah' as its public name and an API named 'smartlayer' or something? How will the 500 markets about GPT-4 resolve?

@jacksonpolack If it is very clear that the model is the successor to GPT-3 (large general purpose language model, accompanying academic publication, scores on major NLP benchmarks published, probably some additional caveats and details we can dig into if that's unclear) then I will resolve as if that model is GPT-4. If nothing like that is ever released (as in your scenario, where they release a product rather than publish research) my markets will resolve N/A at close.

@VincentLuczkow
Does asking GPT-4 to roleplay as a malicious AI system count as an 'extremely obviously malicious prompt'?

@NoaNabeshima It does not. If this market were about ChatGPT and not GPT-4, it would resolve NO.

It seems likely to me that they will use some sort of RLHF thingy to prevent this.

predicted YES

@L I don't know of a case where RLHF has been successfully used to consistently stop something from happening.

@NoaNabeshima It doesn't need to do it consistently. This market only cares about naive attempts; no prompt engineering to get around filters is allowed. RLHF seems pretty good at stopping those.