Meaning an algorithm that can take an LLM as input and output a new language model with <=75% of the original model's parameters (and comparable performance, of course).
Must actually be demonstrated to work on a non-specialized language model with >=10B parameters (e.g. by downloading an open-source foundation model and compressing it).
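The resolution criteria above can be sketched as a simple size check. This is a hypothetical illustration, not an official resolution script; the function names and thresholds are assumptions drawn from the description (>=10B original parameters, <=75% retained), and the "comparable performance" requirement is not checked here.

```python
# Hypothetical sketch of the market's size criteria (performance is NOT checked).

def parameter_ratio(original_params: int, compressed_params: int) -> float:
    """Fraction of parameters retained after compression."""
    return compressed_params / original_params

def meets_size_criteria(original_params: int, compressed_params: int) -> bool:
    # The description requires the original model to have >=10B parameters
    # and the compressed model to keep at most 75% of them (a 4:3 ratio).
    return (original_params >= 10_000_000_000
            and parameter_ratio(original_params, compressed_params) <= 0.75)

# Example: a 13B-parameter model compressed to 9.75B parameters (exactly 75%)
# would satisfy the size criteria.
print(meets_size_criteria(13_000_000_000, 9_750_000_000))  # True
print(meets_size_criteria(13_000_000_000, 11_000_000_000))  # True? No: ~85% retained -> False
```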
If there already is such an algorithm and I just don't know about it, the market will still resolve YES.
I take it LLM.int8() doesn't count? You're looking for parameter reduction, presumably in addition to int8?
Short-Term AI #6: By the end of June 2023, will there be compression algorithm for large language models that achieves a 4:3 compression ratio?