Can soft-prompting improve LLAMA-LM more than instruction fine-tuning and chain-of-thought?
46% chance

LLAMA-I 65B achieves 68.9% 5-shot accuracy on MMLU, while LLAMA-base 65B achieves 63.4%. Chain-of-thought probably adds another ~3% (I will update the question with precise numbers if someone does the experiment); cf. FLAN-PALM-CoT.

Will the best soft prompt for LLAMA-base 65B on MMLU achieve greater than 72% (on the test subset), allowing arbitrary chains of thought, before 2025? The soft prompt may be followed by arbitrary text (a hard prompt), including up to 5 example shots and any other text (except further MMLU questions). The soft prompt may be tuned on the MMLU validation set, but not on the test subset.

To reiterate, the two conditions to be compared have the following structure (a tuning sketch follows the list):

  1. (Base model) tuned soft prompt, hard prompt, five example shots, chain of thought, answer.

  2. (Instruction fine-tuned model) five example shots, chain of thought, answer.
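
For concreteness, here is a minimal PyTorch sketch of how the soft prompt in condition 1 could be tuned: a small block of learnable embedding vectors is prepended to the embedded hard prompt, the base model stays frozen, and only those vectors get gradient updates on MMLU validation items. The checkpoint name, the 20-token prompt length, and the `validation_items` placeholder are illustrative assumptions, not part of the market's resolution criteria.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-65b"  # illustrative checkpoint, not prescribed by the market
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.requires_grad_(False)  # base model stays frozen; only the soft prompt is tuned

embed = model.get_input_embeddings()
n_virtual = 20  # number of soft-prompt vectors (illustrative choice)
# Initialize the soft prompt from real token embeddings, a common trick.
soft_prompt = torch.nn.Parameter(embed.weight[:n_virtual].detach().clone())
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)

def loss_on(text: str) -> torch.Tensor:
    """LM loss on `text` with the soft prompt prepended to its embeddings."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    inputs = torch.cat([soft_prompt.unsqueeze(0), embed(ids)], dim=1)
    # Label the virtual tokens -100 so the loss ignores them; for simplicity
    # the loss covers the whole text rather than just the answer tokens.
    labels = torch.cat(
        [torch.full((1, n_virtual), -100, dtype=torch.long), ids], dim=1
    )
    return model(inputs_embeds=inputs, labels=labels).loss

# Placeholder for MMLU *validation* items (hard prompt with shots and
# chain of thought, plus the gold answer); the market forbids tuning on test.
validation_items = [
    ("Q: What is 2 + 2?\nLet's think step by step. 2 + 2 = 4.\nAnswer:", " 4"),
]

for hard_prompt, answer in validation_items:
    loss = loss_on(hard_prompt + answer)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

An actual attempt would batch items, restrict the loss to the answer tokens, and sweep the prompt length; the market only constrains what the soft prompt may be tuned on, not how.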


I don't follow this question. Are you asking if a different prompt technique will beat fine tuning + CoT combined? Or if a prompt technique + CoT will beat fine tuning?


@vluzko Clarified at the bottom of the description.
