Copywriting, translating, conversing, semantic search, data labelling, text summarisation, and code generation are all established uses of LLMs that are being commercialised. I am interested in whether large language models (LLMs) — which are also referred to as foundation models — will have their capabilities extended beyond the natural language domain, in a reliable and commercially sustainable way. Will I believe this has happened by the end of 2023?
This resolves positively if the nature of its use is extended by being given an interface with other software, but there has to be evidence of it being used (sustainably) for real commercial applications. A simple demonstration will not suffice, nor does it count if its usage is for experimental/R&D purposes (as opposed to direct commercial ones).
If there is software that can design architectural models, or perform accounting tasks, and is fundamentally based on an LLM, that counts. The LLM cannot just be an added feature that does some natural language tasks on the side; it must actually drive the software's core functionality.
A lot rests on what I consider to be "notably different" in nature, but if you give me examples of new capabilities, I'll let you know what I think. Since this is a subjective market, I will not bet in it.
Jan 7, 1:54pm: Will I believe that the nature of commercial LLM usage is notably different by the end of 2023? → Will non-experimental, LLM-based commercial software have new, reliable, non-language capabilities by the end of 2023?
Apr 27, 11:39am: Will non-experimental, LLM-based commercial software have new, reliable, non-language capabilities by the end of 2023? → Will LLMs' non-language capabilities be used commercially the end of 2023?
Apr 27, 11:47am: Will LLMs' non-language capabilities be used commercially the end of 2023? → Will LLMs' non-language capabilities be used commercially by the end of 2023?
@firstuserhere These all look like they're using LLMs' 'language' capabilities (code generation was explicitly stated to be a 'language' capability in the market description). An LLM running its own code interpreter to do 'non-language' tasks would count as a change in its nature, but code generation itself isn't of a different nature. (Plus, it has to be used in a commercially sustainable way.)
I feel like there ought to be some demonstration of ChatGPT being used commercially as a replacement for software that isn't at all a 'language' task. e.g. as part of a video transcription workflow, or being the default tool for generating the data viz for paywalled articles (but consistently, not as a one-off). Will have a look once I get the chance.
@finn Ok sure, what about GPT-4 vision?
It is commercial, is LLM-based, and does non-language tasks.
@firstuserhere I'm looking for it to be used commercially (as opposed to the LLM itself being commercial), so not going to resolve based off of the existence of GPT-4 vision itself (but there's a reasonable chance there's evidence of it being used for commercial non-language tasks).
@finn I am referring to ChatGPT being the commercial use of the LLM.
It is an application that is powered by this LLM, and this LLM's non language abilities are being used. People pay $20 a month to use this application.
@firstuserhere from an earlier comment I made:
I'm not going to resolve it yet because I want to see "evidence of it being used (sustainably) for real commercial applications". I realise ChatGPT is a commercial product on its own, but the intention is to resolve this only if the non-language capabilities of LLM-based software are commercially viable.
In my head it was clearer that ChatGPT wouldn't itself count, but in retrospect I didn't clearly say that. And anyway, I do think the bar has been cleared, since plenty of people will have bought a ChatGPT Plus subscription to be able to use its data analysis, and if some other company made that as standalone software based on an LLM it would be hard to dispute. Will leave a top-level comment in case a NO bettor wants to argue their case before I resolve it.
@firstuserhere have you got any new examples that very clearly fit the criteria? There's probably something out there but I haven't gone hunting for anything yet (I was going to bank on a clear-as-day example showing up so that there's very little ambiguity when I do resolve this market).
I'm erring on the side of waiting, because it usually takes time for these tools to be integrated in a way that sustainably generates revenue.
@finnhambly If a company implemented GPT-based explanations of alerts in their commercial software (using pre-engineered prompts to select reliable sources and get useful output), would that qualify as "non-language"?
https://risky.biz/RB703/ - Minute 42 for the interview
(note - I have no position in this market)
@JustNo interesting, and cool to hear about how this stuff is getting integrated. It's being used to explain a cyber security problem to a user, which is very much a language task IMO, but thank you for sharing!
@jacksonpolack thanks, I've realised the title of this question is slightly misleading given the criteria - I've changed the title now (it's easier to phrase now that GPT4 is multimodal and has plugins)
Be My Eyes is indeed commercial, but I'm wanting to know if the tool they're selling is actually used by other companies to generate revenue (sorry that this wasn't clearer). If there's evidence that it's used for commercial tour guides (or something like that) I could see Be My Eyes resolving this positively.
@jacksonpolack sorry, I realise I didn't really reply to your question properly! You're right about the business section, but that's currently all focused on Be My Eyes' current approach of using human volunteers to help people.
I think my last comment was just a bit dumb by moving the goalposts in an unclear way; businesses regularly paying to use Be My Eyes' virtual assistant should definitely count for resolving this market! There just needs to be some evidence of it — I don't know if any of their business partners use the virtual assistant in place of human volunteers yet.
@finnhambly The GPT-4V(ision) system card says that the product was "Be My AI", and that it was indeed included in Be My Eyes and used commercially. There's an option for calling a human operator to ensure that whatever the model is saying is correct.
@firstuserhere thanks — it talks about the beta testing group, but I can't see anything to say it's been successfully deployed in its commercial products yet, so wouldn't be ready to resolve this positively until then.
Let me know if there's anything else you've spotted. It's the step from non-language capability (e.g. the advanced data analysis mode, or vision) to consistently generating revenue that's proving hardest to find evidence for.
@finnhambly I was replying to your comment above about Be My Eyes using GPT-4 Vision, and not presenting evidence for a resolution (of which I believe there are 100s, but I'd rather not do free labor unless I have to, and I'm happy to take a 7% return till then). Manifold doesn't have a clear way of showing which particular comment in a chain is being replied to.
@firstuserhere thanks, keep these coming. Most of these are examples of code generation, but the Genmo chat one and the browser automation one would count (I think) if used commercially.
@Dreamingpast if you think there's any evidence of commercial usage that would resolve this positively then please do link to it here!
I've not seen any concrete evidence yet, but I've not been searching for it — my comments below discuss what it'll take for ChatGPT to resolve this positively :)
@Dreamingpast it was all about when the release will happen! Now it has, and an LLM can steer 400+ models! and learn to do task planning! and use audio models and math models and speech-to-speech models and image-to-speech and video, so coool
@Dreamingpast I just need evidence of a user using such capabilities as part of revenue generating activities (that aren't experimental/temporary).
I think I could have done better in specifying the focus of this market, but it's meant to be on the nature of its actual usage, rather than its demonstrated capabilities.