Will I (Peter Wildeford) think that there is an open source LLM as good as GPT3.5 by EOY 2023?
resolved Dec 29

On Twitter, @StephenLCasper predicted:

"Like what happened with DALLE2 and Stable Diffusion, I predict that within a few months, a ChatGPT copycat model will be open sourced. And then all of OpenAI's work to make their model safe will be negated by the copycats they directly enabled."

I'm personally skeptical of this prediction.

In this question, I will use my subjective judgement to decide if there has been an open source LLM as good as GPT3.5 by the end of this year.

Grading the quality of an LLM is difficult. I plan to base my evaluation largely on whether I can access the LLM and find it subjectively about as good as GPT3.5, but I will also appeal to moderately standard benchmarks like MMLU.

Whether something is "open source" is defined liberally here and will also be determined by my subjective judgement. Generally, I will deem something open source if (a) anyone can access it and (b) it wasn't the result of an unintentional leak or exfiltration, regardless of the precise terms of its license.

I will rely on my subjective judgement to evaluate the credibility of candidate cases. If this question is going to resolve YES, I will allow 48 hours of discussion before resolving.

I will not personally be trading on this market because it relies on my subjective judgement.

predicted NO

Which LLM do you think is as good as GPT3.5? How did you evaluate it?

predicted NO

So which is the open-source LLM in question? Or was this just resolved without serious thought?

@PeterWildeford Are you going to resolve this market? If not, do you want to propose an alternative resolution system?

Delegate this to trustworthy users?

predicted NO

@bohaska two comments down he says he’s still going to resolve all his open markets

predicted YES

bought Ṁ9,900 YES

Peter's most recent comment says that he's quitting Manifold. Most likely mods will just N/A and return the funds, but it's something to keep in mind if you want to bet.

@tfae I still commit to resolve all my open questions

bought Ṁ100 of NO

You gotta be careful with the benchmark though. I think the most commonly used version of GPT-3.5 (gpt-3.5-turbo) today is likely worse than text-davinci-003 in terms of peak performance, but usually preferred because it's a lot smaller/faster/cheaper.

predicted NO

@PeterWildeford Have you tried any of the open-source models recently? LM Studio is getting pretty popular, and with it you can try pretty much any of the open-source models.

predicted YES

Will you count LLaMA2 as open source? It technically isn't, but for most people it's as good as open source (unless you're a business with more than 700 million users as of last month).

@firstuserhere If I can personally access the weights and it's not due to some unusual characteristic of me, then yes

predicted YES

@PeterWildeford Yes, you can personally access the weights, but they ask you to fill out a form first. As far as I recall, the form asks only for your name. It took me 15 seconds to fill out, and I had the weights within a minute after that.

@firstuserhere Sounds open source enough to me

predicted NO

https://chat.lmsys.org/?leaderboard seems potentially relevant

Do you mean an RLHF or SFT model like ChatGPT or an inframodel?

@ampdot any of them

predicted YES

@PeterWildeford Guanaco-65b & Guanaco-33b have already beat ChatGPT on benchmarks, you might want to try those

@ShadowyZephyr Do you have a link?

@PeterWildeford Just to be clear, if either an open source inframodel (base model) beats GPT-3.5 base or an open source tuned model beats ChatGPT, you will resolve this to YES?

@ampdot Correct

Though we should be careful to explain what "beats" means, and this will be subjective and come from a moderately skeptical approach.

bought Ṁ25 of NO

By “anyone can access it,” do you mean anyone can access the weights?

predicted YES

@Alana Presumably so, since that would make it open source.