
Any proof that this is possible counts.
Mistral Small seems similar in performance to GPT-3.5: https://mistral.ai/news/la-plateforme/
Should be a matter of days until someone runs it on their iPhone
It runs on a MacBook, and getting a ~3x gain in tflops plus a 3-10x gain in model performance over this window seems fairly trivial
Might happen much sooner than I expected
https://twitter.com/harmlessai/status/1626769581858758661?s=46&t=-aOs5vi8y_5tlqgbjNoRKA
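A rough back-of-envelope for the comment above, just compounding the two figures it assumes (the ~3x and 3-10x numbers come from the comment, not from any measurement):

```python
# Back-of-envelope: compound an assumed ~3x hardware (tflops) gain with an
# assumed 3-10x model-efficiency gain over the question's window.
hardware_gain = 3.0                           # assumed tflops improvement
model_gain_low, model_gain_high = 3.0, 10.0   # assumed model-level efficiency gains

print(f"combined effective gain: ~{hardware_gain * model_gain_low:.0f}x "
      f"to ~{hardware_gain * model_gain_high:.0f}x over a MacBook today")
```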
As evidence in favour:
- this type of advance in making existing transformers smaller at the same performance: https://twitter.com/arankomatsuzaki/status/1624947959644278786?s=20&t=sWoi47Zz-RRprcyDC-jZog
- compute-optimal training data vs parameter count (a la Chinchilla; a rough sketch of the rule of thumb follows this list)
- the hefty "neural engines" that Apple keeps improving in its Apple silicon
- the chance that we find new architectures that work better for language
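On the Chinchilla point, here is a minimal sketch of the commonly cited rule of thumb (training compute ~ 6·N·D FLOPs and optimal tokens D ~ 20·N); the constants and budgets below are illustrative assumptions, not figures from the links above:

```python
import math

def chinchilla_optimal(compute_flops: float) -> tuple[float, float]:
    """Roughly split a FLOP budget into (params N, tokens D) using C = 6*N*D and D = 20*N."""
    n = math.sqrt(compute_flops / (6 * 20))  # N = sqrt(C / 120)
    return n, 20 * n

if __name__ == "__main__":
    # 6e23 FLOPs lands near the published Chinchilla point (~70B params, ~1.4T tokens)
    for c in (1e21, 6e23, 1e25):
        n, d = chinchilla_optimal(c)
        print(f"C={c:.0e} FLOPs -> ~{n / 1e9:.1f}B params, ~{d / 1e9:.0f}B tokens")
```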