Or whatever mobile device has replaced phones by then.
Needs to return responses within a few seconds.
Inspired by https://twitter.com/Grayyammu/status/1635574200621465601
@cloudprism I think it was already true when I created the market, I just didn't know.
@IsaacKing llama 7B on pixel 7 support: https://github.com/rupeshs/alpaca.cpp/tree/linux-android-build-support
@firstuserhere
Should be faster than 1 word per second. (Judging by the fact that modern PCs run it at about 5 words per second and a Raspberry Pi 4B runs it at 1 word per second, a phone should land somewhere near the 2.5 words per second mark.) @IsaacKing
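(A quick sanity check on that estimate: if we treat the phone's CPU as sitting between a PC and a Pi 4B on a multiplicative performance scale, the geometric mean of the two reference speeds gives a crude midpoint. The speeds below are the figures quoted above, not fresh measurements, and the geometric-mean choice is just an illustrative assumption.)

```python
import math

# Reference throughputs (words/sec) quoted in the thread
pc_speed = 5.0   # modern PC
pi_speed = 1.0   # Raspberry Pi 4B

# Geometric mean: a rough midpoint on a multiplicative scale,
# since hardware speeds tend to differ by factors, not offsets
phone_estimate = math.sqrt(pc_speed * pi_speed)
print(f"rough phone estimate: {phone_estimate:.1f} words/sec")
```

That prints roughly 2.2 words/sec, in the same ballpark as the ~2.5 figure above.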
@IsaacKing Yes, it's qualitatively similar to GPT-3.5. In fact, the 65B model outperforms GPT-3 on many tasks despite being more than 10x smaller, and it was trained on only publicly available data.
In fact, from the abstract of the LLaMA paper:
"In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks"
@CarsonGale You have to actually be running the model itself, not a webpage that submits API calls to the model over the internet.
@IsaacKing The thread mentions 5 minutes, but that's very primitive, since it isn't making use of the Pixel's NN chip.
@IsaacKing Actually, I just reopened the thread for the first time since the day I posted it, and someone apparently did some sort of port. Let's see.
@firstuserhere @IsaacKing Oh wow, under 30 seconds with the .cpp rewrite, this is insane (see the demo in the embedded tweet).
@firstuserhere And it's not even on a Pixel 6; it's a 5, which doesn't have the 6's Tensor SoC that would presumably speed it up quite a bit.