Will at least 15 top-100 universities have a faculty-led technical AGI safety effort by the end of 2023?
Resolved NO (Jan 14)

Top-100 universities are determined by the QS 2023 rankings: https://www.topuniversities.com/university-rankings/world-university-rankings/2023

“Technical AGI safety effort” can be demonstrated by either:

  • At least three research papers or blog posts in one year that discuss catastrophic risks (risks of harm much worse than any caused by AI systems to date, harming more than just the developer of the AI system and its immediate users) that are specific to human-level or superhuman AI.

    or

  • One blog post, paper, tweet, or similar that clearly announces a new lab, institute, or center focused on the issues described above, presented in a way that implies that this team will involve multiple people working over multiple years under the leadership of a tenure-track faculty member (or equivalent as below).

Further details:

  • Papers or posts must credit a tenure-track faculty member (or someone of comparable status at institutions without a tenure system) as an author or as the leader of the effort.

  • The paper or post must discuss specific technical interventions to measure or address these risks or, if not, it must both be by a researcher who primarily does technical work related to AI and be clearly oriented toward an audience of technical researchers. Works that are primarily oriented toward questions of philosophy or policy don't count.

  • Citing typical work by authors like Nick Bostrom, Ajeya Cotra, Paul Christiano, Rohin Shah, Richard Ngo, or Eliezer Yudkowsky as part of describing the primary motivation for a project will typically suffice to show an engagement with catastrophic risks in the sense above.

  • A "lab, institute, or center" need not have any official/legal status, as long as it is led by a faculty member who fits the definition above.

I will certify individual labs/efforts as meeting these criteria if asked (within at most a few weeks), and will resolve YES early if we accumulate 15 of these.


Resolving no. I think there's probably something relevant going on at more than 15 of these universities, but well under 15 of them actually count for the purposes of this market, with the exact number depending on how you count marginal cases like Cornell. (Plus, I made a stupid copy-paste mistake and counted Cornell twice below, so even my list of leads is under 15.)

From an initial survey, I can see a plausible case for: Berkeley, Cambridge, Cornell, Cornell, ETH, ICL, MIT, NYU, Oxford, Princeton, Stanford, Toronto, Tsinghua, UBC, UChicago, so exactly 15. I'd guess that at least some of these won't quite meet the bar, though.

For what it's worth, I found this to be a useful source of leads, and I'm open to counting very clear statements here as announcements, though many of them are borderline.

Are there any others I'm missing? Or would anyone like to make an explicit case against any of these? I'll check back in a couple of weeks to resolve.

I currently see only six universities that look like they should count: UC Berkeley, Cambridge, MIT, NYU, Oxford, and Stanford.

Feel free to contest these or suggest others.

To look for surprises, I threw together a messy (partially AI-written) script to scrape Semantic Scholar for authors who frequently cite niche technical AGI safety papers:

import requests
from collections import defaultdict

API_KEY = "YOUR_API_KEY_HERE"  # Currently not used, works reasonably fast without a key

# Representative papers that I expect to be cited mostly in technical AI safety works
PAPER_IDS = ["e86f71ca2948d17b003a5f068db1ecb2b77827f7",  # Concrete problems
             "7ee12d3bf8e0ce20d281b4550e39a1ee53839452",  # Learned optimizers
             "7bba95b3d145564025e26b49ca67f13f884f8560",  # Superintelligence
             "53a353ffff284536956fde8c51c306481d8e89c4",  # Human Compatible
             "6b93cedfe768eb8b5ece92612aac9cc8e986d12a",  # Grace survey
             "05c2e1ee203be217f100d2da05bdcc52004f00b6",  # ML safety
             "2302e014a3c363a2f39d61dd2ab62d87d044adad",  # Critch TARSA
             "7ac7b6dbcf5107c7ad0ce29161f60c2834a06795",  # Critch + Yudkowsky
             "a9c46dfd9a24c754a67386e02424ad68b1f4ab3b",  # ARCHES
             "99ca5162211a895a5dfbff9d7e36e21e09ca646e",  # Scalable oversight
             "7dc928f41e15f65f1267bd87b0fcfcc7e715cb56",  # Turpin
             "d51ebec3064f82ea4128fc1c3241003d4072c639",  # Truthful
             "7d6f17706cbcfcca55f08485bcbf8c82e00c9279",  # Goal misgen
             "2e0de9fe6dc58ec6e20a931ecde2bec2124d6e7f",  # DL perspective
             "46d4452eb041e33f1e58eab64ec8cf5af534b6ff",  # Power seeking
             "a6582abc47397d96888108ea308c0168d94a230d",  # Basic AI drives
             "00d385a359eda4845dab37efc7c12a9c0987e66b",  # Bostrom advanced
             "6d78d67d4f7f5fe2e66933778ab1faf119d21547",  # Oracle AI
             "5a5a1d666e4b7b933bc5aafbbadf179bc447ee67",  # Debate
             "0052b31f07eda7737b5e0e2bf3803c3a32f3f728",  # Amplification
             "8326258c0834cbb18a0db4b3537f92d867f91a89",  # Extreme risks
]

def get_citing_authors():
    base_url = "https://api.semanticscholar.org/graph/v1/paper/"
    headers = {}  # {"x-api-key": API_KEY}
    citing_authors = defaultdict(int)
    citations_by_author = defaultdict(list)

    for paper in PAPER_IDS:
        offset = 0

        while offset is not None:
            params = {'fields': 'authors,year,isInfluential,title', 'limit': 1000, 'offset': offset}
            response = requests.get(f"{base_url}{paper}/citations", headers=headers, params=params)
            if response.status_code == 200:
                data = response.json()
                for citation in data['data']:
                    if "authors" in citation['citingPaper']:
                        authors = [author['authorId'] for author in citation['citingPaper']["authors"]]
                        for author in authors:
                            if author is not None:
                                # Weight by Semantic Scholar's 'influential citation' flag
                                citing_authors[author] += 1 + 4 * citation['isInfluential']
                                citations_by_author[author].append(citation)
                # The API includes a 'next' offset while more pages remain; stop when it's absent
                offset = data.get('next')
            else:
                offset = None

    # Only keep authors with a weighted safety citation count of at least 10
    return {author: count for author, count in citing_authors.items() if count >= 10}, citations_by_author


if __name__ == "__main__":
    citing_authors, citations_by_author = get_citing_authors()

    sorted_authors = []
    filtered_authors = {}
    if citing_authors:
        filtered_authors = {
            author: count
            for author, count in citing_authors.items()
            if count >= 3
        }
        base_url = "https://api.semanticscholar.org/graph/v1/author/batch"
        headers = {}  # {"x-api-key": API_KEY}
        params = {'fields': 'name,affiliations,hIndex'}
        response = requests.post(base_url, headers=headers, params=params, json={"ids": list(filtered_authors.keys())})
        sorted_authors = sorted(response.json(), key=lambda author: filtered_authors[author['authorId']], reverse=True)

    for author in sorted_authors:
        citations = citations_by_author[author['authorId']]
        sorted_citations = sorted(citations, key=lambda citing: (citing['citingPaper']['year'] or 0, citing['citingPaper']['title'] or ""))

        print(f"{author['name']} {author['affiliations']} h-index: {author['hIndex']} weighted safety cite count: {filtered_authors[author['authorId']]}")
        
        current_year = None
        current_title = None
        for citing in sorted_citations:
            citing_paper = citing['citingPaper']
            citing_year = citing_paper['year']
            citing_title = citing_paper['title']
            
            if citing_year != current_year:
                print(f"   {citing_year}:")
                current_year = citing_year
                current_title = None
            if citing_title != current_title:
                print(f"      {citing_title}")
                current_title = citing_title
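
A quick note on the weighting heuristic above, before the output: each citing paper adds 1 to an author's count, a citation that Semantic Scholar flags as influential adds 5, and an author only gets reported once their weighted count reaches 10. A minimal sketch of that heuristic on made-up records (toy data, not real API output):

# Toy illustration of the 1 + 4 * isInfluential weighting used in get_citing_authors
toy_citations = [
    {"isInfluential": True},   # influential citation: 1 + 4 = 5
    {"isInfluential": False},  # ordinary citation: 1
    {"isInfluential": False},  # ordinary citation: 1
]
weight = sum(1 + 4 * c["isInfluential"] for c in toy_citations)
print(weight)  # 7, which would fall below the >= 10 cutoff in get_citing_authors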

Current output:

Tom Everitt ['DeepMind'] h-index: 15 weighted safety cite count: 106
   2015: Sequential Extensions of Causal and Evidential Decision Theory
   2016: Avoiding Wireheading with Value Reinforcement Learning; Death and Suicide in Universal Artificial Intelligence; Practical Agents and Fundamental Challenges; Self-Modification of Policy and Utility Function in Rational Agents; Universal Artificial Intelligence-Practical Agents and Fundamental Challenges
   2017: A Game-Theoretic Analysis of the Off-Switch Game; AI Safety Gridworlds
   2018: AGI Safety Literature Review; Scalable agent alignment via reward modeling: a research direction; Towards Safe Artificial General Intelligence
   2019: A Causal Influence Diagram Perspective; Modeling AGI Safety Frameworks with Causal Influence Diagrams; Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective; Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings
   2020: Avoiding Tampering Incentives in Deep RL via Decoupled Approval; REALab: An Embedded Perspective on Tampering; The Incentives that Shape Behaviour
   2021: Agent Incentives: A Causal Perspective; Alignment of Language Agents; How RL Agents Behave When Their Actions Are Modified
   2022: Discovering Agents; Path-Specific Objectives for Safer Agent Incentives
   2023: Characterising Decision Theories with Mechanised Causal Graphs

Roman V Yampolskiy ['University of Louisville'] h-index: 32 weighted safety cite count: 95
   2011: What to Do with the Singularity Paradox?
   2012: Artificial General Intelligence and the Human Mental Model; Safety Engineering for Artificial General Intelligence
   2013: Responses to Catastrophic AGI Risk : A Survey Kaj Sotala Machine Intelligence Research Institute
   2014: Responses to catastrophic AGI risk: a survey; The Universe of Minds; Utility function security in artificially intelligent agents
   2016: Artificial Fun: Mapping Minds to the Space of Fun; Taxonomy of Pathways to Dangerous Artificial Intelligence; Unethical Research: How to Create a Malevolent Artificial Intelligence
   2017: Diminishing Returns and Recursive Self Improving Artificial Intelligence; Guidelines for Artificial Intelligence Containment; High Performance Computing of Possible Minds; Modeling and Interpreting Expert Disagreement About Artificial Superintelligence; Responses to the Journey to the Singularity; Risks of the Journey to the Singularity; The Singularity May Be Near
   2018: BEYOND MAD ? : THE RACE FOR ARTIFICIAL GENERAL INTELLIGENCE; Building Safer AGI by introducing Artificial Stupidity; Superintelligence and the Future of Governance
   2019: Chapter 2 Risks of the Journey to the Singularity; Long-term trajectories of human civilization; Personal Universes: A Solution to the Multi-Agent Value Alignment Problem; Predictability : What We Can Predict – A Literature Review; Predicting future AI failures from historic examples; Unexplainability and Incomprehensibility of Artificial Intelligence
   2020: An AGI Modifying Its Utility Function in Violation of the Strong Orthogonality Thesis; Artificial General Intelligence: 13th International Conference, AGI 2020, St. Petersburg, Russia, September 16–19, 2020, Proceedings; Artificial Stupidity: Data We Need to Make Machines Our Equals; Chess as a Testing Grounds for the Oracle Approach to AI Safety; Human $\neq$ AGI.; On Controllability of AI; Special Issue “On Defining Artificial Intelligence”—Commentaries and Author’s Response; Transdisciplinary AI Observatory - Retrospective Analyses and Future-Oriented Contradistinctions
   2021: AI Risk Skepticism; Impossibility Results in AI: A Survey; Uncontrollability of Artificial Intelligence

Marcus Hutter [] h-index: 39 weighted safety cite count: 62
   2015: Sequential Extensions of Causal and Evidential Decision Theory
   2016: Avoiding Wireheading with Value Reinforcement Learning; Death and Suicide in Universal Artificial Intelligence; Practical Agents and Fundamental Challenges; Self-Modification of Policy and Utility Function in Rational Agents; Universal Artificial Intelligence-Practical Agents and Fundamental Challenges
   2017: A Game-Theoretic Analysis of the Off-Switch Game
   2018: AGI Safety Literature Review
   2019: A Causal Influence Diagram Perspective; Asymptotically Unambitious Artificial General Intelligence; Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective
   2020: Curiosity Killed or Incapacitated the Cat and the Asymptotically Optimal Agent; Curiosity Killed the Cat and the Asymptotically Optimal Agent; Pessimism About Unknown Unknowns Inspires Conservatism
   2021: Intelligence and Unambitiousness Using Algorithmic Information Theory
   2022: Advanced Artificial Agents Intervene in the Provision of Reward; Beyond Bayes-optimality: meta-learning what you know you don't know

Sam Bowman ['NYU'] h-index: 16 weighted safety cite count: 60
   2021: The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail
   2022: Constitutional AI: Harmlessness from AI Feedback; Discovering Language Model Behaviors with Model-Written Evaluations; Language Models (Mostly) Know What They Know; Measuring Progress on Scalable Oversight for Large Language Models; Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions; Two-Turn Debate Doesn't Help Humans Answer Hard Reading Comprehension Questions; What Do NLP Researchers Believe? Results of the NLP Community Metasurvey
   2023: Eight Things to Know about Large Language Models; Inverse Scaling: When Bigger Isn't Better; Measuring Faithfulness in Chain-of-Thought Reasoning; Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

S. Legg [] h-index: 29 weighted safety cite count: 60
   2017: AI Safety Gridworlds; Deep Reinforcement Learning from Human Preferences
   2018: Measuring and avoiding side effects using relative reachability; Penalizing Side Effects using Stepwise Relative Reachability; Scalable agent alignment via reward modeling: a research direction
   2019: Learning Human Objectives by Evaluating Hypothetical Behavior; Modeling AGI Safety Frameworks with Causal Influence Diagrams; Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings
   2020: Avoiding Side Effects By Considering Future Tasks; Avoiding Tampering Incentives in Deep RL via Decoupled Approval; Quantifying Differences in Reward Functions; REALab: An Embedded Perspective on Tampering; Special Issue “On Defining Artificial Intelligence”—Commentaries and Author’s Response; The Incentives that Shape Behaviour
   2021: Agent Incentives: A Causal Perspective; Causal Analysis of Agent Behavior for AI Safety; Model-Free Risk-Sensitive Reinforcement Learning
   2022: Beyond Bayes-optimality: meta-learning what you know you don't know; Safe Deep RL in 3D Environments using Human Feedback

David Krueger [] h-index: 18 weighted safety cite count: 53
   2018: Scalable agent alignment via reward modeling: a research direction
   2019: MISLEADING META-OBJECTIVES AND HIDDEN INCENTIVES FOR DISTRIBUTIONAL SHIFT
   2020: AI Research Considerations for Human Existential Safety (ARCHES); Hidden Incentives for Auto-Induced Distributional Shift; Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims
   2021: Goal Misgeneralization in Deep Reinforcement Learning
   2022: Broken Neural Scaling Laws; Defining and Characterizing Reward Hacking
   2023: Characterizing Manipulation from AI Systems; Harms from Increasingly Agentic Algorithmic Systems; Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Dan Hendrycks ['UC Berkeley'] h-index: 29 weighted safety cite count: 51
   2021: A Unified Survey on Anomaly, Novelty, Open-Set, and Out-of-Distribution Detection: Solutions and Future Challenges; Certified Adversarial Defenses Meet Out-of-Distribution Corruptions: Benchmarking Robustness and Simple Baselines; PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures; Unsolved Problems in ML Safety; What Would Jiminy Cricket Do? Towards Agents That Behave Morally
   2022: A Spectral View of Randomized Smoothing Under Common Corruptions: Benchmarking and Improving Certified Robustness; Actionable Guidance for High-Consequence AI Risk Management: Towards Standards Addressing AI Catastrophic Risks; Forecasting Future World Events with Neural Networks; How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios; OpenOOD: Benchmarking Generalized Out-of-Distribution Detection; Scaling Out-of-Distribution Detection for Real-World Settings; Supplementary Materials for PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures; X-Risk Analysis for AI Research
   2023: Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark; Natural Selection Favors AIs over Humans

A. Dafoe [] h-index: 24 weighted safety cite count: 50
   2017: When Will AI Exceed Human Performance? Evidence from AI Experts
   2018: Public Policy and Superintelligent AI : A Vector Field Approach 1 ( 2018 ) ver

Tentatively adding UBC (from the tweet below) gets us up to seven.

Added liquidity because this market seems important

@JacobPfau Arguably borderline under the exact specification above, but I’ll count it by default.

bought Ṁ100 of NO

Nope. It ain't happening. Can't be done. In fact, I'm not sure AI safety research even exists. "Universities"? What is that? You can't convince them to do stuff if they don't exist. Absolutely not possible. Nobody who bets YES on this market will possibly be able to contribute to it happening. Even running on spite from me attempting to throw down the gauntlet with an exaggerated, sarcastic, obviously miscalibrated NO bet, there's no way YES bettors could possibly make this happen. You can't prove me wrong, can't be done.