Will any usable open-source chatbot be released this year?
91% chance

For a YES resolution, I don't require that it be competitive with the state of the art at the time. It must be comparable to ChatGPT (not strictly as good as, but not far inferior), and I require that it actually run on a single consumer GPU.

Added March 19: this market does not use the strict OSI definition of open-source (which would forbid terms like "don't be evil"). It is sufficient if it is legal for one to use the bot for most purposes, including commercial purposes.

Same idea as this market:

Scott Lawrence

https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models/

I think this counts as "open source". It isn't a chatbot yet, but the fact that this exists means that derivatives can also be open source, so I expect to see an open-source chatbot in the next month. Then I'll try and figure out if it's comparable to ChatGPT (the original, not any GPT-4 deluxe version).

Scott Lawrence

Thanks everyone for your input on the "open-source" issue. I've updated the description accordingly. In particular: Alpaca, under the current terms of use, is not eligible.

Gigacasting is predicting YES at 91%

note 99% of production ML violates the ImageNet license or Yolov5+ License or text data license, or all three!

Scott Lawrence

@Gigacasting If companies are using a technically "non-commercial license" chatbot for business purposes, doing so in a publicly visible manner, and doing so without getting sued, then that's good evidence that it's de facto legal.

Gigacasting is predicting YES at 91%

Meta winked by not deleting the pull request—and Stanford winked right back

Gigacasting is predicting YES at 92%

This is YES

“Consumer GPU” “usable” imply personal use

Scott Lawrence

"Open source" implies open source.

Scott Lawrence

I was not expecting the most difficult part of resolution to be the phrase "open source". Yikes! Kinda embarrassed that I missed that one, actually...

I think it's unambiguous that LLaMa does not count---leaks aren't the same as open-source. That also excludes Alpaca, at least for now. (Of course, if LLaMa is to be "officially released" later, then that could make it count.)

At the same time, I'm leaning away from actually invoking OSI's full definition (https://opensource.org/osd/). That's very strict, and would exclude any chatbot developed with a term in the license that says "you may not use this to destroy the world". I don't think that's what anybody has in mind when they read this question.

I think the requirements for this to resolve YES are:

  • I can download the weights and run it myself.

  • It's competitive with ChatGPT.

  • It's legal for me to use it, for the most part, however I want. Alpaca's terms of use prevent me from using it for "entertainment"---that's a hard NO. I think the easiest sharp line to draw is to say that it has to be possible for me to use it legally in a variety of ways that turn a profit. Therefore rules forbidding use for "business" would be disqualifying.

I'm not going to require that it be collaboratively developed, or anything like that. That's in some people's definition of open source, but not mine.

Thoughts?

Martin Randall

@ScottLawrence Tricky. I normally think of model weights as data, not source.

Scott Lawrence

@MartinRandall Now that I've been away from this question for a few days, I don't think it's tricky at all.

For example: LLaMa. In the absence of model weights, it's not a chatbot. It's not even a language model. It's code for training one of the above, but if that code was applied to a different data set, it would yield a different thing. The reason we call these things "chatbots" and so on is because of how they're trained. (Related: scaling discussions necessarily include estimates of how much training data is available.)

For all we know, with the right training data and enough compute, Karpathy's MinGPT might achieve ChatGPT-comparable performance. (In fact this is overwhelmingly likely to be true, right? "Enough compute" and "the right data" are doing heavy lifting here.) But MinGPT was available long before the creation of this market, and of course nobody thought that should yield a YES resolution. Being able to say "yes, this code can in principle be used to construct a chatbot, and this code is open source" is not sufficient.

Martin Randall

@ScottLawrence Comparison: MediaWiki software isn't an encyclopedia, it's code for hosting and collaboratively writing text. MediaWiki is open source. Wikipedia is produced using open source, and its data is covered by a Creative Commons license, so people informally say that it's an "open source encyclopedia".

If you interpret the model weights as the compiled program, then releasing the model weights would not be enough to be traditional open source, because they are even less scrutable than assembly or byte code.

I roughly think that an "open source" chatbot would be one where all of the code needed to create the model from scratch is open source, including the code that gets training data, filters out bad training data, tunes for chat instead of completion, etc., such that I can "compile" it as easily as compiling Linux, albeit at a much higher cost.

(please nobody make one of those)

But that's not what you're asking about, I think, and I don't know how I'd phrase what you're asking about.

Scott Lawrence

@MartinRandall if the chatbot weights are released under CC, that'll count as open source for this market.

Many websites are hosted by nginx/apache. Those websites are not open source.

This market already requires that it be runnable on a single consumer GPU. If the "compilation" involves training from scratch, that will not be the case.

Gigacasting is predicting YES at 95%

Twelve meta’rs and twelve hackers >>> OpenAI

Gigacasting is predicting YES at 95%

LLMs will not be living in the pod or eating the bugs it seems

Gigacasting bought Ṁ95 of YES

The llama is loose

Reynolds (@StrayClimb)

Interesting question. I think we need a clearer definition. Are you going to use the official error rates, statistically? And then require it to be better than chatGPT? By that I assume you're referring to gpt3.5? What about the distinction between raw da vinci, and da vinci+the safety/helpful improvements they've added on top of it?

Scott Lawrence

@StrayClimb As I say in the description, it is not required to be better than ChatGPT, but merely comparable. I think I'm requiring a conversational model, so I do mean ChatGPT rather than GPT3.5.

I'm not at all confident that I can accurately compare official error rates between ChatGPT and another bot, which may choose to use different metrics.

I can't see how to try DaVinci for myself; is there a place where I can?

How do you feel about a cutoff of "conversational, and better than anything appearing prior to ChatGPT"? Is that both clear enough and a reasonable interpretation of "comparable to ChatGPT"?

Reynolds (@StrayClimb)

@ScottLawrence I was under the impression that chatGPT is some variant of GPT3.5

https://platform.openai.com/playground

select text-davinci-003

the link is to GPT playground, something similar to chatgpt but with fewer content filters and less post-generation interference for any reason.

"Better" is very hard to evaluate here. I'd also be very interested in whether there is an open source strong LLM.

Scott Lawrence

@StrayClimb I promptly get "you've reached your usage limit", despite never having been on the playground before. Ah well!

ChatGPT is indeed a conversational variant of GPT3.5. I'll only be resolving this market YES if the open-source bot is conversational, so I suppose there's no need to compare against things like davinci.