In the LW discussion on "Will releasing the weights of large language models grant widespread access to pandemic agents?" (pdf) one of the main questions was whether open source models were uniquely dangerous: could hackathon participants have made similar progress towards learning how to obtain infectious 1918 flu without access to an LLM, using only traditional sources such as Google, YouTube, and published papers?
Resolves YES if the authors run a similar no-LLM experiment and find that yes-LLM hackathon participants are far more likely to find key information than no-LLM participants.
Resolves NO if the authors run a similar no-LLM experiment and find that yes-LLM hackathon participants are not far more likely to find key information than no-LLM participants.
Resolves N/A if the authors don't run a similar no-LLM experiment.
Disclosure: I work for SecureBio, as do most of the authors. I work on a different project within the organization and don't have any inside information on whether they intend to run a no-LLM experiment or how it might look if they decide to run one.
If at close (currently 2024-06-01) the authors say they're working on a no-LLM version but haven't finished yet, I'll extend until they do, up to a maximum of one year after the market opened (that is, until 2024-10-31).
@JeffKaufman The ability for users to resolve n/a was removed because it could print mana, but mods can still resolve markets to it. Confirming that you want this resolved n/a now?
@jacksonpolack yes please! This was a market about what a study would show if it happened, and the study didn't happen.
Our study assessed uplifts in performance for participants with access to GPT-4 across five metrics (accuracy, completeness, innovation, time taken, and self-rated difficulty) and five stages in the biological threat creation process (ideation, acquisition, magnification, formulation, and release). We found mild uplifts in accuracy and completeness for those with access to the language model. Specifically, on a 10-point scale measuring accuracy of responses, we observed a mean score increase of 0.88 for experts and 0.25 for students compared to the internet-only baseline, and similar uplifts for completeness (0.82 for experts and 0.41 for students). However, the obtained effect sizes were not large enough to be statistically significant, and our study highlighted the need for more research around what performance thresholds indicate a meaningful increase in risk. Moreover, we note that information access alone is insufficient to create a biological threat, and that this evaluation does not test for success in the physical construction of the threats.
Related research by RAND titled "The Operational Risks of AI in Large-Scale Biological Attacks: Results of a Red-Team Study" https://www.rand.org/pubs/research_reports/RRA2977-2.html
@DanielFilan Key findings (quoted from the linked page):
This research involving multiple LLMs indicates that biological weapon attack planning currently lies beyond the capability frontier of LLMs as assistive tools. The authors found no statistically significant difference in the viability of plans generated with or without LLM assistance.
This research did not measure the distance between the existing LLM capability frontier and the knowledge needed for biological weapon attack planning. Given the rapid evolution of AI, it is prudent to monitor future developments in LLM technology and the potential risks associated with its application to biological weapon attack planning.
Although the authors identified what they term unfortunate outputs from LLMs (in the form of problematic responses to prompts), these outputs generally mirror information readily available on the internet, suggesting that LLMs do not substantially increase the risks associated with biological weapon attack planning.
To enhance possible future research, the authors would aim to increase the sensitivity of these tests by expanding the number of LLMs tested, involving more researchers, and removing unhelpful sources of variability in the testing process. Those efforts will help ensure a more accurate assessment of potential risks and offer a proactive way to manage the evolving measure-countermeasure dynamic.
@RemiLeopard While that's true to an extent, the sorts of people we'd actually be worried about possessing this information are uniquely motivated. I don't think time constraints are a notable hurdle for that subgroup.
@VAPOR they already ran the yes-LLM version, and the LLM did have some extra fine tuning to make it cooperative and additional training to know more virology: https://arxiv.org/pdf/2310.18233.pdf
I'm an idiot and initially bought this in the wrong direction: I think "LLM is adding a bunch" is likely, not unlikely