Will a ChatGPT-comparable chatbot be available within China at the end of this year?
resolved Jan 16
Resolved
YES

ChatYuan was briefly online and shut down for being, shall we say, "poorly aligned". See here, for instance: https://www.taiwannews.com.tw/en/news/4807319

This market resolves YES if, at the end of this year, a Chinese-speaking chatbot comparable to ChatGPT in performance is broadly available (without needing VPNs or other workarounds) in China.

ETA April 13: "ChatGPT comparable" refers to what is being called "GPT 3.5". Improvements like GPT-4, and whatever comes out later this year, don't need to be matched.



Okay, the rule is that there needs to be a single model which is broadly comparable to ChatGPT, and available (without VPN or legally sketchy actions) in mainland China.

I think the first link by Mahoney below qualifies:

https://www.alibabacloud.com/blog/alibaba-cloud-launches-tongyi-qianwen-2-0-and-industry-specific-models-to-support-customers-reap-benefits-of-generative-ai_600526

The press release demonstrates that it outperforms ChatGPT on a wide set of benchmarks, and GPT-4 on a few. Even assuming some overfitting, I think this should meet the bar for "comparable".

It launched at the end of October and as far as I can tell has been continuously available since.

I'll wait a couple days for NO holders to contest, then resolve to YES at the middle/end of this week.

AFAICT this should resolve YES on the basis of Mahoney's comment below.

predicted YES

@ScottLawrence Presumably you are doing some sort of check?

@Jacy I'll verify that those benchmarks are publicly reported. I'm not running benchmarks myself.

predicted YES

@ScottLawrence If you want to stick with benchmarks (which are increasingly poor measures, especially over time as contamination increases and it becomes easier for newer models to overfit), it'd be checking the publicly reported benchmarks, checking that the model that hits those benchmarks is itself broadly available (e.g., Qwen-72B, not Qwen-14B), and ensuring that at least one of those hits benchmarks at levels that you think are close enough to ChatGPT-3.5.

bought Ṁ11 of YES

Some ChatGPT-comparable chatbots available within China right now:

  1. Qwen-72B-Chat (Tongyi Qianwen 2.0/通义千问2.0) by Alibaba:

Chatbot available in Mainland China at: https://tongyi.aliyun.com/qianwen/

Press release in English: https://www.alibabacloud.com/blog/alibaba-cloud-launches-tongyi-qianwen-2-0-and-industry-specific-models-to-support-customers-reap-benefits-of-generative-ai_600526

Benchmarks: MMLU (5-shot): 77.4%, C-Eval: 83.3%, SuperCLUE: 76.54% https://github.com/QwenLM/Qwen

  2. YAYI2-30B (中科闻歌雅意2.0) by Wenge of the Chinese Academy of Sciences

Chatbot available in Mainland China at: https://yayi.wenge.com/chat/

Benchmarks: MMLU (5-shot) 80.5%, C-Eval: 80.9% https://github.com/wenge-research/YAYI2

  3. Yi-34B by Kai-Fu Lee's company (01.AI)

Model weights available for Mainland China download at: https://www.modelscope.cn/models/01ai/Yi-34B/summary

Benchmarks: MMLU (5-shot) 76.3%, C-Eval: 81.4%, SuperCLUE: 68.46% https://huggingface.co/01-ai/Yi-34B

  4. InternLM-123B (书生) by SenseTime

Chatbot available in Mainland China at: https://chat.sensetime.com

Benchmarks: MMLU (5-shot) 72.9%, C-Eval: 67.5% https://www.sensetime.com/en/news-detail/51167237?categoryId=1072

  5. ERNIE-BOT 4.0 (文心一言 4.0) by Baidu

Paid chatbot available in Mainland China at: https://yiyan.baidu.com/welcome

Benchmarks: SuperCLUE: 79.02%

Most Chinese chatbots require a Mainland Chinese phone number to log in.

C-Eval Benchmark leaderboard: https://cevalbenchmark.com/static/leaderboard.html

SuperCLUE Benchmark leaderboard: https://github.com/CLUEbenchmark/SuperCLUE

ChatGPT Benchmarks: MMLU: 70.0%, C-Eval: 54.4%, SuperCLUE: 61.44%

For reference, GPT-4 Benchmarks: MMLU: 86.4%, C-Eval: 68.7%, SuperCLUE: 90.63%
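To make the comparison concrete, here is a minimal Python sketch that tabulates the figures quoted above against the ChatGPT baseline. The scores are the self-reported numbers from this comment, not independently verified results. The helper names `margins` and `is_comparable`, and the `tolerance` parameter, are hypothetical; this is one possible reading of "comparable", not the market's actual resolution rule.

```python
# Benchmark scores (percent) exactly as quoted in this comment.
# A benchmark is omitted for a model when no figure was quoted for it.
CHATGPT = {"MMLU": 70.0, "C-Eval": 54.4, "SuperCLUE": 61.44}

MODELS = {
    "Qwen-72B-Chat": {"MMLU": 77.4, "C-Eval": 83.3, "SuperCLUE": 76.54},
    "YAYI2-30B": {"MMLU": 80.5, "C-Eval": 80.9},
    "Yi-34B": {"MMLU": 76.3, "C-Eval": 81.4, "SuperCLUE": 68.46},
    "InternLM-123B": {"MMLU": 72.9, "C-Eval": 67.5},
    "ERNIE-BOT 4.0": {"SuperCLUE": 79.02},
}


def margins(scores, baseline=CHATGPT):
    """Return (model score - baseline score) for each benchmark the model reports."""
    return {
        name: round(scores[name] - baseline[name], 2)
        for name in scores
        if name in baseline
    }


def is_comparable(scores, baseline=CHATGPT, tolerance=0.0):
    """Hypothetical rule: no reported score falls more than `tolerance`
    points below the baseline on any shared benchmark."""
    deltas = margins(scores, baseline)
    return bool(deltas) and all(delta >= -tolerance for delta in deltas.values())


if __name__ == "__main__":
    for model, scores in MODELS.items():
        print(model, margins(scores), is_comparable(scores))
```

Under this rule every model listed clears the ChatGPT baseline on each benchmark it reports, but that only reflects the quoted numbers; it says nothing about overfitting or about which model version is actually deployed in the public chatbot.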

bought Ṁ10 NO at 28%
predicted NO

@SteveMahoney It should be noted that I think SuperCLUE is Chinese language only, so ChatGPT's performance on it should be expected to not be great. I don't know how the other benchmarks deal with the language situation (if at all), but ideally you'd be comparing ChatGPT's performance in English to the Chinese models' performance in Chinese.

Also, correct me if I'm wrong, but I don't think Ernie Bot 4.0 is publicly available yet. I'd been under the impression it'd be the earlier versions that would need to be compared to ChatGPT.

bought Ṁ40 of YES

Some YES limit orders up if anyone's interested @CyclicISLscelesTrapezoid @Jacy

sold Ṁ15 of YES

predicted NO

@Jacy I don't think this is accessible to average Chinese consumers

I think it will, but the definition is too vague for me to bet on this.

bought Ṁ40 of NO

Are they technologically capable of it? Absolutely, especially after LLaMA/Alpaca. But after Tiananmen, the Chinese government is extremely paranoid about any kind of dissent. The current Xi regime is the most paranoid yet. If you are not allowed to discuss anything on social media, there's no way that they will allow an AI bot to provide dissent.

I don't want to bet in this market, but... I'm somewhat surprised by the high probability here. This is saying that either the gov't will back down on requiring quality censorship of outputs, or China is going to solve the alignment problem this year. I don't think either probability is >30%. Perhaps someone would like to inform me what I've missed?

(This market was originally inspired by Zvi's AI post #2.)

bought Ṁ10 of YES

@ScottLawrence Aggressive semantic filtering of chatbot output (as just one example way they might move forward) wouldn't require solving the alignment problem; the solution doesn't need to be perfect.

predicted YES

@ScottLawrence Why do they need to solve the alignment problem? They just need automated detection of censorable text. This seems like a thing that China is very, very, very good at, and has plenty of resources to bring to bear on.

@Boklam and @ML Took me a long time to come up with a vaguely coherent answer to this. And I apologize, because this still isn't very good. I have a strong intuition that censorship-heavy regimes will be very uncomfortable about public use of LLMs, and I've had a lot of trouble making the "why" concrete.

I'm not quite sure why, but people apparently want AIs to be superhumanly politically correct. In the U.S., we're okay with tech companies not censoring people who say "inappropriate" things in private chat (or even on their blogs), but it's somehow absolutely essential that the GPTs not behave that way.

I have very little knowledge of how censorship works in China, but my sense is that there are many ways of discussing censored topics while not being censored (some of which require knowledge of Chinese to understand, others of which might be described as "being Straussian"---in the West you can often get away with making many claims just by couching them in scholarly language, and I imagine the same is true in China). Censorship of humans doesn't need to be perfect, and so doesn't catch that stuff. Censorship of an AI... again, I don't understand why this is the case, but it seems the bar is much higher.

As for whether this task is alignment-complete: well, it depends on how much perfection is required. In the extreme case of "AI not allowed to break censorship rules even by being Straussian", yeah, I think this is alignment-complete. The reduction is simple: you can just add "don't say things that might cause the destruction of humanity" to the censorship rules. If you're okay with a Straussian AI then that's different, but I expect China's censors to not be happy with that.

Somewhat related: although I'm surprised by the high probability on this market, I'm also very happy about it. I expect what counts as "appropriate" in China will differ dramatically from what counts in the U.S., and I look forward to having access (even if through a translator) to a chatbot with very different training data.

bought Ṁ50 of NO

@ScottLawrence Yeah, I agree this probability is way too high. It's a pretty subjective resolution, which is why I'm not betting more, but just look at where China is and how little impetus there seems to be to port Western models without extreme control over their output (and the fact that even Western levels of control have been quite challenging to achieve).

The most likely YES resolution to me is that a Western group puts out a Chinese-language model and the PRC just doesn't get around to properly blocking access. That doesn't seem too crazy, as lots of things fly under the Great Firewall's radar.

From the linked article: Apparently ChatYuan defines a "war of aggression" as a war between two parties whose military capabilities "are not on the same level". (CY says the Russia-Ukraine war is a war of aggression because "the two countries' military and political strength are not on the same level".) This ... does not seem like the definition I would use for a war of aggression?