[1000M subsidy] Is there a significant discrepancy in the early cases data in the final WHO report on Covid-19 origins?

Background:

In the discussion here, I claimed that there is a significant discrepancy between the early cases data in the final WHO report and the figures in early peer-reviewed studies from China and the SCMP leak.

Source: https://archive.is/yNxMm

@PeterMillerc030 responded by saying that these numbers are made up by Gilles Demaneuf (DRASTIC member who performed this analysis) and that there is no discrepancy.

I have responded by pointing out that Peter's comment is factually incorrect primarily because he doesn't make the distinction between confirmed and diagnosed cases. You can look at the discussion and validate the arguments by referring to the primary sources.

Resolution:

Resolves based on my subjective assessment of the weighted consensus of Manifold users (More weight given to users based on a mix of my/community's assessment of their trustworthiness, calibration scores, and/or quality of arguments).
I do not intend to rush to a resolution, so if there's a significant objection to resolve the question by any such user with high weight, then I will wait for their concerns to be reasonably addressed before resolving.

I will not trade on this, but I will evaluate the evidence presented and present my own arguments.

If my resolution is deemed controversial, I will very likely delegate the resolution to other user(s).

Update 2: Clarified the market description and extended the end date.


@Akzzz123 Are you familiar with this page that Flo Debarre and Babar set up to compare early case counts across various sources?

https://flodebarre.github.io/covid_firstCases/visualization.html

It doesn't seem to output cumulative cases but I added up December cases for a few sources by hand:

Local CDC 2019: 20
Huang The Lancet: 40
Li NEJM: 47
China CDC 2020: 164
WHO 2020: 202
Li Lancet ID: 191
Pan 2020: 197
WHO 2021: 174

It looks like some of those also divide it up into lab-confirmed, I added up a few of those, as well:
China CDC 2020 102
WHO 2020 108
Pan 2020 115
WHO 2021 100
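
For anyone who wants to reproduce the hand tally, here is a minimal Python sketch that stores the December figures listed above and compares each source with the WHO 2021 total. The variable names are hypothetical, and the "not lab-confirmed" figure is simply total minus lab-confirmed; reading that remainder as clinically diagnosed cases is an assumption, not something the visualization tool states.

```python
# December 2019 case totals per source, as added up by hand in the comment above.
december_total = {
    "Local CDC 2019": 20,
    "Huang The Lancet": 40,
    "Li NEJM": 47,
    "China CDC 2020": 164,
    "WHO 2020": 202,
    "Li Lancet ID": 191,
    "Pan 2020": 197,
    "WHO 2021": 174,
}

# Lab-confirmed December cases, for the sources that split them out.
december_lab_confirmed = {
    "China CDC 2020": 102,
    "WHO 2020": 108,
    "Pan 2020": 115,
    "WHO 2021": 100,
}

# Difference of each source's total from the WHO 2021 total.
baseline = december_total["WHO 2021"]
diff_vs_who_2021 = {src: n - baseline for src, n in december_total.items()}

# Remainder after subtracting lab-confirmed cases (assumed, not stated, to be
# the clinically diagnosed cases).
not_lab_confirmed = {
    src: december_total[src] - confirmed
    for src, confirmed in december_lab_confirmed.items()
}

for src, total in december_total.items():
    print(f"{src:18s} total={total:3d}  vs WHO 2021: {diff_vs_who_2021[src]:+d}")
for src, rest in not_lab_confirmed.items():
    print(f"{src:18s} not lab-confirmed={rest}")
```

The point of laying it out this way is only that the spread between sources is visible at a glance; it says nothing about which source is right.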

Flo did not include Wei et al or Shi et al in this tool, I think because neither had a daily case curve. Wei did graph cases over time in larger bins, and we know that the data from Wei matches the WHO report (assuming Wei is talking about total cases, not specifically lab confirmed cases). For Shi the December case total is higher but IIRC the paper doesn't have a daily graph for December cases.

So if the simple question is, "do various data sources disagree with each other?", then it's an obvious yes. But I think you're going too far in your interpretation of that.

Like, you said below that "there was no inclusion of any serious retrospective search in the final WHO report". I would disagree with that claim -- it's obvious there was a retrospective search that happened after the early published papers (like Huang and Li NEJM) that increased the December count from the 40's to 160+. The WHO report must contain > 120 cases from a retrospective search.

I think Gilles claims something further, that there was a further retrospective search and then that search found earlier cases but it got "rolled back" somehow. That claim is much harder to evaluate, because I don't know the criteria by which the WHO 2021 report case count got reduced from the WHO 2020 report or from the numbers in Pan 2020.

You can at least graph the case count over time and see on what dates the case numbers got reduced. It's reductions on certain days in late December, it's not like there were 30 cases in November and early December that got removed:

Some of the days also change in the opposite direction -- note that on December 25th and December 28th the WHO report has added cases, relative to Pan 2020, and on December 27th, the WHO report removed cases. That sounds more consistent with some kind of messy merging of records from hospitals and doctors, rather than some plot to remove cases to obscure the origins of covid. But it's hard to say for sure what happened, without access to the original medical records.

The original question that created this market is the Washington Post article based on Gilles Demaneuf's work. I will stand by my claim that Gilles is incorrect. The only possible way to get the 257 December case total that he did is to take Wei et al and assume they are talking about "lab confirmed cases" when that paper is actually talking about total cases, then add the "clinically diagnosed" cases from Shi et al and then also add the 19 cases from the WHO report.

Aside from that being a misinterpretation of the data in Wei et al, the 19 number would almost certainly be incorrect under Gilles' interpretation, because that would be "lab-confirmed cases from the WHO report", which would be too low of a number. If Wei et al is actually talking about lab-confirmed cases, and it's consistently higher than the WHO report numbers, then the 19 should be higher in Wei's numbers, as well:

Gilles thinks it was 146 lab-confirmed cases on Dec 29th but the WHO report only has 81 lab-confirmed at that date. So even if Gilles was correct, then the lab-confirmed cases for December 30 and 31 would also be proportionately higher, perhaps 34 instead of 19. You might end up with 146+92+34 = 272 December cases, which would not be a confirmation of the SCMP article.
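
The arithmetic in this argument can be checked with a short sketch. It assumes that the 92 is Shi's clinically diagnosed count (which matches the 227 total minus 135 lab-confirmed quoted elsewhere in this thread) and that the 19 is the WHO report figure for December 30 and 31; the variable names are made up for illustration.

```python
# Figures quoted in the thread.
wei_by_dec_29 = 146                 # Wei et al through Dec 29 (read by Gilles as lab-confirmed)
who_lab_confirmed_by_dec_29 = 81    # WHO report, lab-confirmed through Dec 29
who_dec_30_31 = 19                  # WHO report cases for Dec 30 and 31
shi_total = 227                     # Shi et al, all December cases
shi_lab_confirmed = 135             # Shi et al, lab-confirmed December cases

# Assumed: the "clinically diagnosed" Shi cases are total minus lab-confirmed.
shi_clinically_diagnosed = shi_total - shi_lab_confirmed  # 92

# The only combination that reproduces the 257 total.
gilles_total = wei_by_dec_29 + shi_clinically_diagnosed + who_dec_30_31

# If Wei really were lab-confirmed, scale the Dec 30-31 figure by the same ratio.
scale = wei_by_dec_29 / who_lab_confirmed_by_dec_29
scaled_dec_30_31 = round(who_dec_30_31 * scale)
adjusted_total = wei_by_dec_29 + shi_clinically_diagnosed + scaled_dec_30_31

print(gilles_total)      # 257
print(scaled_dec_30_31)  # 34
print(adjusted_total)    # 272
```

The proportional scaling is only one way to adjust the 19; the comment itself says "perhaps 34", so treat the 272 as illustrative rather than exact.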

Also, if you insist that Gilles' interpretation of the data (based on Shi and Wei) is the only correct interpretation of the data, then you aren't just saying that the 2021 WHO report is lying and undercounting, you're also saying that all these other papers (like Pan 2020) are lying and undercounting.

You could add numbers across any 2 or 3 of these papers and claim that those are the only correct sources. But it gets pretty weird if you decide that, of all these data sources, all of them are incorrect but only Wei and Shi can be trusted, and even then you can only trust those numbers after adding them across Wei and Shi and also including some numbers from the WHO report.

I'd love to bet on this, but I don't trust you to resolve this market correctly.

Like, if you were going to stick with the original question that started this market, based on the Washington Post article, I think you would have to resolve the market as "No". Gilles is wrong, that's based on a misinterpretation of the Wei et al case counts and it wouldn't work out to 257 December cases even if Gilles was correct about that (because the 19 would be wrong).

But I suspect you'll just move the goalposts and say that the market is instead about whether or not the 2021 WHO report has fewer cases than some other papers. That's demonstrably true, but the relevance is unclear, as the case counting process for each paper is not documented well enough anywhere. You definitely can't use the published data to prove that there are secret November cases that the 2021 WHO report is hiding or that the SCMP article was accurate.

@Akzzz123
In the other thread, we discussed the reason I think that Wei uses total cases, not lab confirmed cases. That's both because the December cases match the WHO report and because the 2020 case counts match the other papers under that interpretation.

Recall that Wei et al says: "As of 27 February 2020, a total of 48,313 COVID-19 cases were confirmed in Wuhan"

Shi et al says "a total of 29,886 confirmed cases and 21,960 clinically diagnosed cases with COVID-19 in Wuhan" (that's a total of 51,846, up to 24 February 2020)

Pan 2020 says "In this cohort study that included 32 583 patients with laboratory-confirmed COVID-19 in Wuhan from December 8, 2019, through March 8, 2020"

https://jamanetwork.com/journals/jama/fullarticle/2764658

And Chinese CDC 2020 lists 31,974 confirmed cases in Wuhan, see Table 1.

https://weekly.chinacdc.cn/en/article/doi/10.46234/ccdcw2020.032


So that's 4 sources confirming that Wei is not listing the lab-confirmed cases but rather the total number of cases. And I cited multiple papers above confirming that the total number of December cases is less than 257.
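
A rough way to see why Wei's figure reads as a total rather than a lab-confirmed count is to put the cumulative Wuhan numbers quoted above side by side. This is only a sketch with hypothetical variable names, and the cut-off dates of the sources differ, so the comparison is approximate.

```python
# Cumulative Wuhan figures as quoted in the comment above.
wei_cases = 48_313                 # Wei et al, as of 27 February 2020
shi_lab_confirmed = 29_886         # Shi et al, up to 24 February 2020
shi_clinically_diagnosed = 21_960  # Shi et al, up to 24 February 2020
pan_lab_confirmed = 32_583         # Pan 2020, through March 8, 2020
ccdc_confirmed = 31_974            # Chinese CDC 2020, Table 1

shi_total = shi_lab_confirmed + shi_clinically_diagnosed

# Which of the other figures is Wei's number closest to?
candidates = {
    "Shi lab-confirmed": shi_lab_confirmed,
    "Shi total": shi_total,
    "Pan lab-confirmed": pan_lab_confirmed,
    "China CDC confirmed": ccdc_confirmed,
}
gaps = {name: abs(value - wei_cases) for name, value in candidates.items()}
closest = min(gaps, key=gaps.get)

print(shi_total)  # 51846
print(closest)    # Shi total
for name, gap in gaps.items():
    print(f"{name:20s} differs from Wei by {gap}")
```

Wei's 48,313 sits within a few thousand of Shi's combined total and well above every lab-confirmed figure, which is the pattern the comment is pointing at.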

Is that enough to prove that Gilles is wrong here?

You can, of course, still say that you think the SCMP article is accurate and China found November covid cases that they're now hiding. That's possible. It just can't be confirmed by any of the published sources. And I also don't think it's likely, based on case growth models.

@PeterMillerc030

I've accepted that Wei numbers look like total cases. So I don't see the need to discuss Gilles' use of Wei numbers any further. You are free to keep bringing up Gilles but I was hoping we are past that. I am not relying on Gilles' interpretation of Wei et al. I have limited time to respond, so I would appreciate if you could keep your arguments to the point and avoid repeating what's already been discussed.

Thanks for sharing the link to the cases visualization. The different sources of data add even more weight to my claim that the WHO report's 174 look incomplete.

But it gets pretty weird if you decide that, of all these data sources, all of them are incorrect but only Wei and Shi can be trusted, and even then you can only trust those numbers after adding them across Wei and Shi and also including some numbers from the WHO report.

I am making no such claim. You, on the other hand, are trusting an "analysis" on a specific version of the data that somehow gives you 99%+ confidence.

based on the Washington Post article, I think you would have to resolve the market as "No". Gilles is wrong

Did I word the question "Is Gilles wrong?"?

I'd love to bet on this, but I don't trust you to resolve this market correctly.

Please feel free to make a market "Is Gilles wrong?" and resolve it any way you want. That's not what I intended while creating this market and my comments show that. I have already clarified this point multiple times in the market. I won't repeat it again.

That's demonstrably true, but the relevance is unclear, as the case counting process for each paper is not documented well enough anywhere.

Check next section for relevance.

"there was no inclusion of any serious retrospective search in the final WHO report". I would disagree with that claim -- it's obvious there was a retrospective search that happened after the early published papers (like Huang and Li NEJM) that increased the December count from the 40's to 160+.

Finally, coming to the main argument.

China by Jan 17 was reporting only 44 cases total. There was a retrospective search but not a serious one.

The Dec cases then increased as follows (C = confirmed, D = diagnosed):

  • 102 C, 62 D, total 164 [CCDC, Feb 11]
  • 108 C, 83 D, total 191 [First WHO report, Feb 20]
  • 115 C, 82 D, total 197 [Pan et al, Mar 8]
  • 124 C [WHO ToR, Aug]

There's a clear increasing trend because the retrospective search was ongoing in Feb.

Yu Chuanhua (Feb 27 article): "As of February 25, our entire database has about 47,000 cases."

The studies you cite from after Feb 27 report 48-51K total cases, which suggests that even on Feb 25, the retrospective search was not complete.

The final WHO report's 174 cases are what would have been in the system between Feb 11 and Feb 20. But we know that even by Feb 25 the retrospective search was not complete.

So my claim that the retrospective search was not serious is correct.
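
The snapshot figures in this comment can be sanity-checked with a small sketch. It only verifies the arithmetic (confirmed plus diagnosed equals total, confirmed counts rise over time, and where 174 falls among the totals); it does not show whether the retrospective search was serious. The tuple layout and names are hypothetical.

```python
# (label, confirmed, diagnosed, total) for each December snapshot quoted above.
# None marks a figure the comment does not give.
snapshots = [
    ("CCDC, Feb 11", 102, 62, 164),
    ("First WHO report, Feb 20", 108, 83, 191),
    ("Pan et al, Mar 8", 115, 82, 197),
    ("WHO ToR, Aug", 124, None, None),
]
final_who_report_total = 174

# Confirmed + diagnosed should equal the stated total where both are given.
totals_consistent = all(
    c + d == t for _, c, d, t in snapshots if d is not None and t is not None
)

# Confirmed counts should rise from one snapshot to the next.
confirmed = [c for _, c, _, _ in snapshots]
confirmed_increasing = all(a < b for a, b in zip(confirmed, confirmed[1:]))

# Find the pair of consecutive snapshots whose totals bracket 174.
with_totals = [(label, t) for label, _, _, t in snapshots if t is not None]
bracket = [
    (lo_label, hi_label)
    for (lo_label, lo), (hi_label, hi) in zip(with_totals, with_totals[1:])
    if lo <= final_who_report_total <= hi
]

print(totals_consistent)     # True
print(confirmed_increasing)  # True
print(bracket)               # [('CCDC, Feb 11', 'First WHO report, Feb 20')]
```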

Additionally, here's a chart that you will really like.

This is from the first WHO report from Feb 2020.

First lab-confirmed Hubei onset (outside Wuhan) - Dec 10

First lab-confirmed China onset (outside Hubei) - Dec 18

This is also confirmed by the Feb 21 CCDC weekly report

Still have 99%+ confidence in market origin?

@Akzzz123 Thanks. I didn't realize that you'd conceded about Wei's numbers.

Your original question focuses on the Washington Post article, though, and that's based on Gilles' argument. So if he is wrong, that seems like you've conceded the original argument.

For instance, you wrote: "@PeterMillerc030 responded by saying that these numbers are made up by Gilles Demaneuf (DRASTIC member who performed this analysis) and that there is no discrepancy. I have responded by pointing out that Peter's comment is factually incorrect primarily because he doesn't make the distinction between confirmed and diagnosed cases."

And you've already conceded both that Gilles is wrong and that you were wrong about the distinction between confirmed and diagnosed cases. So, it seems pretty clear that you were wrong about the question that you created this market for.

If it's not about that question, then what question is it about?

Are you asking if the WHO report numbers match the SCMP numbers? Of course they don't.

Are you asking if the WHO numbers match Wei's numbers? We've agreed they do.


Are you asking if the WHO numbers match Shi's numbers? We've agreed they don't, but Shi does not list enough cases to validate the SCMP numbers.

Are you asking if there are any real covid cases that are not in the WHO report? We've agreed there's at least one: it's missing Wei Guixian, the actual first known case at the market.

Maybe it's a waste of time to argue further about this, since I have no idea what the resolution criteria are.

I'd be happy to chat about discrepancies in early data, though, it is always good to see if I'm missing anything.

I'm short on time right now but I'll look through your other points and get back to you.

One quick note: you mentioned that the first case outside of Hubei is December 18th. I'm aware of one Beijing case with a December 17th onset. That was actually a pharmacist at the Huanan market who went to Beijing, and he's lab confirmed and sequenced. I can dig up the genome, it's lineage B. I would assume that's the same guy.

@Akzzz123 Here's a thread on the earliest known Beijing case:
https://twitter.com/Engineer2The/status/1539388054690930693

And his genome is here, looks like he's lineage B + one mutation:
https://www.ncbi.nlm.nih.gov/nuccore/MT034054

Ok not going to argue anymore in view of the revision. I view the fact that you revised the question so significantly as effectively a concession on the original argument about the case counts, even though I know you don’t agree.

@NateWatson

Here's my comment on the other thread from a few days ago where I clearly mention that my intent is to check whether the data is incomplete or not.

Call to fact-check:

The narrative that Huanan Seafood Market (HSM) is the point where SARS-CoV-2 first entered humans is primarily built on the data provided by the Chinese authorities to the WHO during WHO's investigation in early 2021. There's plenty of evidence that this data is incomplete.

The missing early cases have been called out even by the WaPo editorial board [1]. And according to the SCMP leak of early case data, the earliest detected cases went as far back as Nov 17 [2].

Also, once you asked whether other sources of data are allowed or not, I said yes without a doubt. Now if you want to fixate on one part of the argument feel free to do so but I have made it clear, even before you explicitly asked, that it's not my intention.

Since you have not made a meaningful reply to my comment below, I will assume that you are no longer arguing for a NO.

It's not just because of the Dec 1 case, there are two Dec 10 cases too.

And there's another paper that shows this discrepancy in the very early cases.

https://www.nejm.org/doi/full/10.1056/nejmoa2001

There's another paper by Pan et al that shows around 200 cases in December.

There's also the SCMP leak which shows around 260 cases in 2019.

The WHO terms of reference (July 2020) mentions 124 confirmed cases and specifically refers to a retrospective search for cases.

Shi et al mentions 135 lab confirmed cases and 227 total.

15 deaths in the Wuhan CDC report from Dec onset cases compared to 33 in Shi et al. We know that deaths were underreported.

That the WHO numbers resemble early Feb Wuhan CDC numbers imo strongly suggests that there was no inclusion of any serious retrospective search in the final WHO report.

So as far as I can see, there are plenty of good reasons to resolve this YES.

@Akzzz123 Not engaging with the new question at all, you can resolve however you want. I am feeling pretty vindicated in my meta-bet of not betting on the market.

I don't think it's reasonable to cite a post on another market as more authoritative as to the spirit of the question than the actual question you posted. Even the paragraph you posted says "the cases have been called out even by the WaPo editorial board" so it is pretty clear what you're talking about if you look at that.


Originally the question was a clear sub-debate of the original debate. The question now nearly resolves to "is the WHO data correct" (and for you this means not the data they had access to or investigated, but the data as presented in the report -- there's almost no scenario where there's a real case before December 15 or so that has ever been identified as a possible case and which is not included in the WHO report where you couldn't argue the answer is YES). That (at least for me) is somewhat close to "Did COVID have a natural origin" because the putative pattern of cases consistent with the wet-market is currently the main argument for natural origin.

If you thought the "discrepancy between the number of cases (confirmed and diagnosed) in the final WHO report when compared to early peer-reviewed studies from China, SCMP leak" argument was strong, you'd be willing to stand on it.

@NateWatson Also even ignoring the change to the question if you say "I think X because A, B, and C" and I take issue with A, you should defend A or concede the point, or at least stop citing it as part of your case, not reply "Well I'm right because of A, B, C plus also D and E, you haven't responded to all of those" which is your reply above. (To be clear I'm still not going to respond if you do defend A due to the change in question though.)

@NateWatson

If you thought the "discrepancy between the number of cases (confirmed and diagnosed) in the final WHO report when compared to early peer-reviewed studies from China, SCMP leak" argument was strong, you'd be willing to stand on it.

I am still standing strong on it.

Originally the question was a clear sub-debate of the original debate.

Incorrect. As I have pointed out to you, I am open to using other papers as well. Why would I agree to this if I only intended to look at what's mentioned in the WaPo article?

It's generally not a good idea to infer what someone is thinking purely from what they write in the market text. It's normal for people to leave out some of the context when making a market. I have also shared evidence about what I was broadly thinking when I created the market.

I've also linked the comment in the description which has more context as well as the broader claim.

https://manifold.markets/chrisjbillington/will-bsp9000-win-the-rootclaim-chal#qfLql9zKh1zDG5P0ilXF

@Akzzz123

It's your market, you can do what you want. Literally no one has bet NO and you revised in favor of YES, so it's kind of a moot point. I will let readers judge whether you have indeed stood strong on the counts discrepancy issue (my issue isn't fairness to bettors, it's moving the goalposts). The question is no longer useful as a sub-debate, and it will not hinge on the counts thing now because that is the weakest of many possible arguments for YES, so there's no point in debating it in this context.

@NateWatson

Your assumption that the question is only about the subdebate is incorrect.

The question clearly mentions early peer reviewed studies, SCMP leak. So it was never about one particular subdebate. Further, when I was asked what constitutes early peer reviewed studies, I elaborated as below. Not really sure what you're on about but please feel free to do as you wish.

A more fundamental question: what, in your view, is the most probably accurate count of lab-confirmed cases of COVID on or before Dec 31, 2019? Wei? Shi? Or some other count?

@NateWatson

My sense is that the lab confirmed cases are undercounted in the WHO report according to at least 4 different sources (possibly more).

Further, that the WHO report resembles a Feb 2020 Wuhan CDC report imo strongly suggests there was no inclusion of any serious retrospective search in the final report.

Early peer reviewed studies generally? Or do you mean those two papers specifically (not sure I’d consider them early)? And does it matter whether or not those papers have more accurate numbers than the WHO report or just that they are different from the WHO report?

@NateWatson

By early, I mean the studies that don't rely on the cases data present in the WHO-China report but instead use the cases data provided by the relevant authorities before China standardized what it reported to the WHO.

It need not be limited to the two papers as long as the authors or the journal look notable to me (eg. some affiliation with the top 25 universities in China). If you are confused about a particular paper, please feel free to share it here so I can take a call.

Here's another paper that would definitely qualify.

https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(20)30183-5/fulltext

What matters is that they contradict WHO numbers. However if someone can show a good reason why the numbers are inaccurate then I will not consider those numbers.

@Akzzz123 yep certainly agree that is a very early one and high prestige/credibility. What's the issue with that one?

@NateWatson

Three different reports mention three different onset dates for that case. Without access to early cases, we have to rely on whatever the Chinese authorities reported to the WHO.

https://x.com/Engineer2The/status/1682089612045606912?s=20

@Akzzz123 that's a different issue. If there's a particularly significant case whose onset date has potentially been doctored, doubtless that could be important to the overall case (and it would be way easier to maybe get away with), but the question is about case counts... the Lancet article has 40 cases they apparently went around to hospitals to see, which doesn't seem relevant to case counts to me.

By early, I mean the studies that don't rely on the cases data present in the WHO-China report but instead use the cases data provided by the relevant authorities before China standardized what it reported to the WHO.


Shi was received in December 2020, less than a month before the WHO panel started. It cites this paper to explain all its data, and it is using data from the same database:

February 21, 2020, Chinese CDC, bajillion authors:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8392929/

Check out this graph:
104 confirmed cases

Vs WHO Report

100 confirmed cases (blue bars)

It is pretty obviously the same data with minor differences (could be explained by a few shifts of an onset date by one day, plus there is one case on the 13th that only WHO has). The biggest diff is five December 31 cases that are missing in the WHO report (my theory is they got moved to Jan 1). It's way too close not to be the same data.

@Akzzz123 I actually agree with you that there's no way to reconcile Shi's reported counts for "Before Dec 31" with this data, or with the CCDC data (or with Wei for that matter which it is just as far from.) It is especially perplexing because this is the paper they point to to explain where they got the data! But given the late date of the paper it seems implausible that it's because they, uniquely, have the original, uncorrupted data.

@NateWatson

The paper you linked is the weekly numbers released by China CDC. I think the citation to this by Shi et al is for the data collection methodology and not for the actual data itself.

There was also a retrospective search for cases around that time in which the corresponding author Chuanhua was involved.

There's some good discussion here.

https://www.researchgate.net/publication/364644929_Arrested_Development_The_number_of_Wuhan_cases_of_COVID-19_with_onset_in_2019

@Akzzz123
"All COVID-19 cases reported through February 11, 2020 were extracted from China’s Infectious Disease Information System."

vs

"All data from December 8, 2019 (date of the first onset) to 24 February 2020, were extracted from China’s Infectious Disease Information System."

It's the same database. And anyway, you agree the data is too similar to be a coincidence, right?

@NateWatson

but the question is about case counts

I'm inclined to count this because the timing of the index case can be very important as even a single case that predates the market invalidates the theory.

@Akzzz123 I agree it is an important issue, but it's not what the original dispute or the question you wrote is about.

@NateWatson Yes, the Wuhan CDC data does bear resemblance to the WHO report data.

If you compare the deaths between Shi et al and Wuhan CDC numbers, there's a big discrepancy there as well.

@NateWatson The spirit of the question is to check whether the WHO data is complete or not. If there's an early epidemiological dataset in which the index case is missing, I would not call that dataset complete even though it's just a single case missing.

@Akzzz123 The December 1 case (I think that's what you mean? can you be a little more explicit in your arguments?) is an entirely different issue, albeit a very important one (I agree, if the December 1 case had Covid, and had symptom onset on December 1, it makes the wet market case far less likely, and I agree if Covid is unconnected to the wet market, it probably originated from the lab.) The WHO was certainly aware of the December 1 case though, so it's not exactly a discrepancy in the underlying data; they definitely knew about it and chose to exclude it, rightly or wrongly.

I disagree that the spirit of the question is "is the WHO data complete". It's very clearly about case counts. Not "is anything significant wrong with the WHO data". If you're really saying you'll resolve YES, not because of your original argument, not because of a significant discrepancy in counts, but because of the December 1 case, I don't want to debate.

I am pretty unsure about the Covid origins question as a whole, and if I found out tomorrow lab origin is true, I would think maybe the December 1 case was real and there was some cover-up going on with it. But I would still think Shi et al. is a red herring and the discrepancy is not because they had access to some data others did not.

@NateWatson

I've updated the description to clarify the spirit of the market that I had in mind when I created it. I'm sorry for the miscommunication.

It's not just because of the Dec 1 case, there are two Dec 10 cases too.

And there's another paper that shows this discrepancy in the very early cases.

https://www.nejm.org/doi/full/10.1056/nejmoa2001316

There's another paper by Pan et al that shows around 200 cases in December.

There's also the SCMP leak which shows around 260 cases in 2019.

The WHO terms of reference (July 2020) mentions 124 confirmed cases and specifically refers to a retrospective search for cases.

Shi et al mentions 135 lab confirmed cases and 227 total.

15 deaths in the Wuhan CDC report from Dec onset cases compared to 33 in Shi et al. We know that deaths were underreported.

That the WHO numbers resemble early Feb Wuhan CDC numbers imo strongly suggests that there was no inclusion of any serious retrospective search in the final WHO report.

So as far as I can see, there are plenty of good reasons to resolve this YES.

@Akzzz123 I finally got around to reading this thread.

As long as we're nitpicking discrepancies in the early case counts, shouldn't we also ask why the SCMP article is inaccurate?

The article reads:

"According to the government data seen by the Post, a 55 year-old from Hubei province could have been the first person to have contracted Covid-19 on November 17.

From that date onwards, one to five new cases were reported each day. By December 15, the total number of infections stood at 27"

https://archive.is/2WSIi#selection-3571.0-3583.128

If there was at least one case per day from November 17th to December 14th, that's already 28 cases. And if some of those days have 5 cases, it's definitely going to be more than the 27 listed.

There would also have to be more than 9 cases in November, if there's at least one case every day.
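
The day counts behind this can be verified with a few lines of Python. The sketch assumes the literal reading of the article, with at least one new case on every calendar day from November 17 onward, and the variable names are made up for illustration.

```python
from datetime import date

first_case = date(2019, 11, 17)
last_day_before_dec_15 = date(2019, 12, 14)
last_day_of_november = date(2019, 11, 30)

# Inclusive day counts (subtracting dates gives a timedelta; add 1 to include both ends).
days_nov_17_to_dec_14 = (last_day_before_dec_15 - first_case).days + 1
days_in_november_window = (last_day_of_november - first_case).days + 1

# Figures reported in the SCMP article.
reported_total_by_dec_15 = 27
reported_november_cases = 9

# Minimum implied cases if every day had at least one new case (assumed literal reading).
min_cases_by_dec_14 = days_nov_17_to_dec_14 * 1
min_cases_in_november = days_in_november_window * 1

print(min_cases_by_dec_14)    # 28
print(min_cases_in_november)  # 14
print(min_cases_by_dec_14 > reported_total_by_dec_15)   # True
print(min_cases_in_november > reported_november_cases)  # True
```

Under a different reading, for example one to five cases only on days when cases were reported, the minimum no longer follows.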

Clearly the 266 number is inaccurate and there are actually more cases that the SCMP is hiding. Why won't they tell you about the other November cases, besides the 9 they will tell you about?

@PeterMillerc030

I see strawmans are back on the menu.

Probably they meant one to five cases on each day when cases were reported.

Also, their article doesn't say there were only 9 Nov cases, they say first nine cases in Nov, so it's possible there are more than 9 Nov cases which is very different from the WHO report. The cumulative numbers they report are also very different from the WHO report.

@Akzzz123 The plural of "strawman" is "strawmen".

With so little attention to detail, I don't see how you can possibly solve the origins of covid.