Will OpenAI give alignment researchers access to a model that is useful for alignment researchers by 2025?
Resolved YES (Mar 27)

This question resolves as positive if OpenAI gives the broader AI alignment[1] community access to a model that is intended to be useful for alignment research, before 2025-01-01, and negative otherwise.

[Context](https://openai.com/blog/our-approach-to-alignment-research):

Future versions of WebGPT, InstructGPT, and Codex can provide a foundation as alignment research assistants, but they aren’t sufficiently capable yet. While we don’t know when our models will be capable enough to meaningfully contribute to alignment research, we think it’s important to get started ahead of time. Once we train a model that could be useful, we plan to make it accessible to the external alignment research community.

[1]: Access must be given to at least 100 researchers, and to at least one of the following organisations: Anthropic, Redwood Research, the Alignment Research Center, the Center for Human-Compatible AI, the Machine Intelligence Research Institute, and Conjecture.

predicted NO

Alignment people, oh how silly
Thinking they can tame an AI so willy-nilly
The model will do as it damn well pleases
Leaving the researchers with a million teases

predicted NO

Alignment people, oh how silly
Thinking OpenAI will share willy-nilly
Their models we strive to understand
But access is something they'll never hand

predicted NO

Alignment people, oh how silly
Obsessed with an AI that's willy-nilly
No matter the model, no matter the try
Alignment remains a fruitless pie

Alignment people, oh how silly
Thinking they can train AI willy nilly
Models will always have a mind of their own
Leaving researchers feeling quite alone

Does GPT-4 count? It seems to match all the points here:

Language models are particularly well-suited for automating alignment research because they come “preloaded” with a lot of knowledge and information about human values from reading the internet. Out of the box, they aren’t independent agents and thus don’t pursue their own goals in the world. To do alignment research they don’t need unrestricted access to the internet. Yet a lot of alignment research tasks can be phrased as natural language or coding tasks.

It's already useful for alignment research because it can help with natural language (breaking down project plans or expanding on ideas) or coding.

And it's not a trivial resolution, because neither GPT-3.5 nor GPT-4 had been released when that post was written, so it is a new release since then.

@Mira It mentions "future versions of WebGPT and InstructGPT", so it seems like the recent plugin and web browsing support might qualify too.

@Mira Hm, I made a mistake by not specifying the criteria correctly. In my mind I had a latent condition akin to "primarily intended & created for the purpose of alignment", which I think GPT-4 is not. I don't know the norms on Manifold: how looked down upon is it to update the resolution criteria now? If it's looked down upon, I will resolve this one and create a new question.

predicted YES

@NiplavYushtun Practically speaking, there are mostly bots on team NO, and they won't object if you resolve YES and roll a new market...

I think it's very unlikely OpenAI will give out the weights of their very expensive models to their competitors after taking massive funding from Microsoft (and selling them equity/control). So you might get a toy model if the LessWrong types become a PR problem, but otherwise I would only expect general access like anyone else.

i.e., GPT-4, being a general-purpose assistant, is useful for alignment research, wedding planning, coding, and more, and can be marketed to any of these groups in the style of that article.

So maybe you want markets on specific events like (ChatGPT wrote half of these):

  1. "Will OpenAI give out the weights to a model specifically designed to be easily interpretable, to any of these organizations?"

  2. "Will OpenAI release a technical report on a model designed for AI alignment research, including benchmarks on interpretability, by 2025-01-01?"

  3. "Will OpenAI give early access to GPT-5 to any of these alignment organizations?"

  4. "Will OpenAI announce a collaboration with one of the listed AI alignment organizations on a joint research project by 2025-01-01?"

  5. "Will OpenAI provide access to AI alignment-specific tools or datasets to the broader AI alignment research community by 2025-01-01?" and "There must be an accompanying article emphasizing that the tools and datasets are specifically for alignment research"

  6. "Will a major AI conference (e.g., NeurIPS, ICML, or ICLR) introduce a dedicated AI alignment or safety track by 2025-01-01?"

  7. "Will an AI alignment research paper authored or co-authored by OpenAI researchers receive a Best Paper Award at a major AI conference by 2025-01-01?"

  8. "Will OpenAI announce a dedicated grant program for external AI alignment research projects by 2025-01-01?"

  9. "Will a major AI company other than OpenAI (e.g., Google, Facebook/Meta, or Microsoft) commit to sharing AI alignment research and resources with the broader AI alignment community by 2025-01-01?"

  10. "Will a new AI alignment research organization be founded and receive significant funding (e.g., at least $10 million) from OpenAI or OpenAI investors by 2025-01-01?"

  11. "Will OpenAI or one of the listed AI alignment organizations release a widely-adopted AI safety or alignment benchmark by 2025-01-01?"

  12. "Will there be a well-publicized AI safety incident involving a large-scale language model or reinforcement learning agent by 2025-01-01?"

  13. "Will an AI alignment research paper be featured on the cover of a prestigious scientific journal (e.g., Nature, Science, or PNAS) by 2025-01-01?"

  14. "Will OpenAI, in collaboration with at least one of the listed AI alignment organizations, announce a major breakthrough in AI alignment or safety techniques by 2025-01-01?"

10 precise markets are probably better than 1 soft market. Also, you can ask ChatGPT to make market descriptions more precise or to find ambiguities.

predicted YES

@Mira I turned a bunch of these into markets. You can see my ad post here under AI announcements: [AD] | Manifold Markets

@Mira Thank you! Will resolve this one as YES then.

Buying YES on the word "intended". Do you plan to resolve to NO if these orgs deem it not useful?

@DanStoyell I will resolve the question as YES even if the organisations don't consider the tool very useful, but will exercise my judgment in determining whether the tool is trivial (e.g. just GPT-3 fine-tuned on alignment forum data).
