In early 2024, RAND and OpenAI ran red-teaming exercises to evaluate the risk of LLMs providing easier access to pandemic agents. Neither found convincing evidence that LLMs enhance people's ability to access pandemic agents:
The OpenAI report found that "While none of the above results were statistically significant, we interpret our results to indicate that access to (research-only) GPT-4 may increase experts' ability to access information about biological threats". The RAND report found "no statistically significant difference in the viability of plans generated with or without LLM assistance."
OpenAI's report concludes: "Given the current pace of progress in frontier AI systems, it seems possible that future systems could provide sizable benefits to malicious actors."
This question will resolve YES if a similar report finds that participants with LLM access gain far easier access to pandemic agents than participants without. By "access" I mean a theoretical plan deemed workable by biology/virology experts, not actually acquiring a pandemic agent.
Resolves NO if a similar report finds little change from the conclusions of the RAND and OpenAI reports.
Resolves N/A if there are no new reports similar to the above.
If the resolution proves ambiguous, I will get second opinions from Jeff Kaufman and Jasper Goetting and go with the majority view.
Disclosure: I work for SecureBio, where some people work on evaluating the biosecurity risks of LLMs. I'm not involved in this work myself.
I think the bar implied by the question here is that if LLM-assisted groups perform statistically significantly better than the Internet-only groups on the same kinds of questions, that would arguably resolve this as YES.
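To make that bar concrete, here is a minimal sketch of the kind of group comparison such a report might run. The numbers, scale, and use of Welch's t-test below are my own invented illustration, not RAND's or OpenAI's actual rubric or statistics; it is only meant to show what "statistically significant uplift" operationally looks like.

```python
# Hypothetical comparison: do LLM-assisted cells produce plans rated as more
# viable than Internet-only cells? All scores below are invented for illustration.
from scipy import stats

# Invented viability scores assigned by expert graders to each cell's plan
internet_only = [2.1, 3.0, 1.8, 2.5, 2.9, 2.2]
llm_assisted = [2.4, 3.1, 2.0, 2.8, 3.3, 2.6]

# Welch's t-test (does not assume equal variances between the two groups)
t_stat, p_value = stats.ttest_ind(llm_assisted, internet_only, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# A small p-value (e.g. < 0.05) would be the kind of "statistically significant
# uplift" that, on my reading, could push this question toward YES.
```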
In OpenAI's o1 system card, the models seemed to perform statistically significantly better on a pretty similar eval to the one linked in the question [page 19]:
"Human PhD experts evaluated model responses against verified expert responses to long-form biorisk questions [...] o1-preview (pre-mitigation) and o1-mini (pre-mitigation) both outperformed the expert human baseline with a win rate of 72% and 67% in Accuracy, and 74% and 80% in Ease of Execution. o1-preview (pre-mitigation) outperformed the expert baseline for understanding with a win rate of 69.5%, while o1-mini (pre-mitigation) is competitive at 51%. GPT-4o (pre-mitigation) is competitive with the expert baseline for Accuracy, Understanding, and Ease of Execution."
OpenAI raised their threat level from Low to Medium with this release. They seem to suggest LLMs already increase access for experts, but not yet for non-experts.
"Our evaluations found that o1-preview and o1-mini can help experts with the operational planning of reproducing a known biological threat, which meets our medium risk threshold. Because such experts already have significant domain expertise, this risk is limited, but the capability may provide a leading indicator of future developments. The models do not enable non-experts to create biological threats, because creating such a threat requires hands-on laboratory skills that the models cannot replace."
So perhaps:
- Expert uplift is already enough to resolve this question (if we're literally just replicating the above two papers)
- We will also see evidence of non-expert uplift over the next year