Are Mixture of Experts (MoE) transformer models generally more human-interpretable than dense transformers?
45% chance
This question is managed and resolved by Manifold.
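For context on the distinction the question turns on: in a dense transformer, every token passes through the same feed-forward block, while an MoE transformer routes each token to a small subset of expert feed-forward blocks chosen by a learned gate. The sketch below is purely illustrative; the top-1 router, layer sizes, and all names are assumptions for exposition, not the implementation of any particular model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoEFeedForward(nn.Module):
    """Illustrative top-1 Mixture-of-Experts feed-forward layer.

    Each token is sent to exactly one expert MLP, chosen by a softmax
    gate over expert logits. A dense transformer instead applies one
    shared MLP to every token.
    """

    def __init__(self, d_model: int = 64, d_hidden: int = 256, n_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)  # router over experts
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for routing
        tokens = x.reshape(-1, x.shape[-1])
        gate_probs = F.softmax(self.gate(tokens), dim=-1)  # (n_tokens, n_experts)
        top_prob, top_idx = gate_probs.max(dim=-1)         # top-1 routing decision

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                            # tokens routed to expert e
            if mask.any():
                # scale by the gate probability so routing stays differentiable
                out[mask] = top_prob[mask].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape(x.shape)

# Quick check: route a small batch of random "tokens".
if __name__ == "__main__":
    layer = Top1MoEFeedForward()
    y = layer(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```

The interpretability question is whether such a gate produces experts that specialize in human-legible ways, making the sparse model easier to analyze than a dense feed-forward block of comparable capacity.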
Related questions
Is gpt-3.5-turbo a Mixture of Experts (MoE)?
84% chance
Will the most capable, public multimodal model at the end of 2027 in my judgement use a transformer-like architecture?
63% chance
Will mechanistic/transformer interpretability [eg Neel Nanda] end up affecting p(doom) more than 5%?
10% chance
By EOY 2025, will the model with the lowest perplexity on Common Crawl not be based on transformers?
10% chance