My interest here is in how hard it is to have a "broad domain" MCTS-trained algorithm. "Broad domain" means a single set of weights that can answer questions, write poetry, do coding, etc -- "narrow domain" means each set of weights can only do LeetCode-style problems, can only prove theorems, etc.
It seems to me like it should be hard -- because MCTS without ground-truth success/failure signals seems like it would be really tough. But hey, I'm not a DeepMind researcher.
If DeepMind's Gemini (when revealed) does not use MCTS, this will resolve N/A. This will be true even if it uses an MCTS-inspired algorithm like Muesli -- it needs to actually involve searching over a tree.
If it uses MCTS, and the result is that each set of weights can do LeetCode-style problems only, or can do theorem-proving only, this resolves true.
If we have a general system like GPT-4, which can do a whole bunch of things from writing poetry to programming a computer, while still using MCTS, this resolves false.
This could involve some subjectivity, so I will not bet.
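As a rough sketch of the resolution rules above (the function and argument names are made up for illustration, and whether a system counts as "broad domain" remains a subjective judgment supplied as input):

```python
def resolve(uses_tree_search_mcts: bool, is_broad_domain: bool) -> str:
    """Return the market resolution under the rules described above.

    Both arguments are hypothetical inputs: the first is whether Gemini
    actually searches over a tree (MCTS-inspired methods like Muesli do
    not count), the second is the resolver's judgment of broad vs. narrow.
    """
    if not uses_tree_search_mcts:
        # No real tree search: the question does not apply.
        return "N/A"
    if is_broad_domain:
        # One set of weights does poetry, coding, etc. while using MCTS.
        return "NO"
    # MCTS is used, but each set of weights covers only one domain.
    return "YES"


print(resolve(uses_tree_search_mcts=False, is_broad_domain=True))
print(resolve(uses_tree_search_mcts=True, is_broad_domain=False))
print(resolve(uses_tree_search_mcts=True, is_broad_domain=True))
```

Here "YES" corresponds to "resolves true" and "NO" to "resolves false".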
“If we have a general system like GPT-4, which can do a whole bunch of things from writing poetry to programming a computer, while still using MCTS, this resolves false.”
It sounds like you’re saying GPT-4 is not a “narrow domain” model. But GPT-4 is a MoE, so each expert technically operates in a somewhat narrower domain than a non-MoE transformer. If Gemini’s architecture is known to have a similar feature along with MCTS, but the final system can do all of the general things GPT-4 can, does this resolve NO?
@AdamK Ah, good question.
I'd treat GPT-4 as broad domain, because there's no human-enumerated list of domains made at any point, and no human selection of hyperparameters per domain (as people often do for RL, for instance).
Also, my understanding is that for the kind of transformer we think GPT-4 is -- honestly I haven't read the rumors that closely -- the split into experts is only per-layer, so it still forms a somewhat unified whole? If the experts were split up into almost entirely separate networks (https://arxiv.org/pdf/2303.14177.pdf) I'd be more inclined to say it's actually narrow domain, but even in that case I'd still probably say it's broad domain, because the sorting into particular domains is done automatically.