My interest here is in how hard it is to have a "broad domain" MCTS-trained algorithm. "Broad domain" means a single set of weights that can answer questions, write poetry, do coding, etc.; "narrow domain" means each set of weights can only do LeetCode-style problems, only prove theorems, etc.
It seems to me like this should be hard, because running MCTS without ground-truth success/failure signals seems really tough. But hey, I'm not a DeepMind researcher.
If DeepMind's Gemini (when revealed) does not use MCTS, this resolves N/A. That holds even if it uses an MCTS-inspired algorithm like Muesli: it needs to actually involve searching over a tree.
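To be concrete about what I mean by "actually searching over a tree", here is a minimal MCTS sketch (purely illustrative; `legal_moves`, `apply_move`, and `evaluate` are placeholder names, and this is not a claim about what Gemini does). The `evaluate` step is where a ground-truth success/failure signal has to come from, which is easy for unit-tested code or checked proofs and much less clear for poetry or open-ended questions.

```python
import math
import random

# Minimal MCTS sketch (illustrative; not a claim about any DeepMind system).
# evaluate() is where a ground-truth success/failure signal is needed.

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def ucb(child, parent_visits, c=1.4):
    # Unvisited children are explored first.
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts(root_state, legal_moves, apply_move, evaluate, iterations=200):
    root = Node(root_state)
    for _ in range(iterations):
        # 1. Selection: walk down the tree by UCB until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=lambda ch: ucb(ch, node.visits))
        # 2. Expansion: add a child for each legal move from the leaf.
        for move in legal_moves(node.state):
            node.children.append(Node(apply_move(node.state, move), parent=node))
        leaf = random.choice(node.children) if node.children else node
        # 3. Evaluation: the reward signal (tests pass/fail, proof checks, win/loss).
        reward = evaluate(leaf.state)
        # 4. Backup: propagate visit counts and value back up to the root.
        while leaf is not None:
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    # Return the most-visited child's state as the chosen move.
    best = max(root.children, key=lambda ch: ch.visits) if root.children else root
    return best.state
```

The loop over an explicit tree (selection, expansion, backup) is the part that, as I understand it, Muesli-style methods drop; they use MCTS-inspired policy-improvement targets without building a search tree.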
If it uses MCTS, and the result is that each set of weights can do only LeetCode-style problems, or only theorem proving, this resolves true.
If instead we get a general system like GPT-4, which can do a whole bunch of things from writing poetry to programming a computer, while still using MCTS, this resolves false.
This could involve some subjectivity, so I will not bet.