$100 bounty: Will someone combine these two datasets between obesity & water contamination?
Ṁ2141 · Resolved YES on May 28
Context: Slime Mold Time Mold has an interesting hypothesis: environmental contaminants cause obesity. Elizabeth has found two data sources which could be combined to test this hypothesis.
1: EWG's drinking water database, which has contaminants by zip code https://data.world/arnholdinst/drinking-water-contaminations
2: County Health Rankings, which has % obesity by county https://www.countyhealthrankings.org/explore-health-rankings/rankings-data-documentation
There's a $100 bounty for somebody to combine these two into a single spreadsheet, which would allow a statistician to run an analysis. https://simplemaps.com/data/us-zips could provide the mapping of zip codes to counties. More discussion here: https://www.lesswrong.com/posts/kjmpq33kHg7YpeRYW/briefly-radvac-and-smtm-two-things-we-should-be-doing?commentId=rwyGuSS5QnRtBRook
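A minimal sketch of the join described above, using only Python's standard library. Everything specific here is an assumption made for illustration: the column names (`zip`, `county_fips`, `pct_obese`), the tiny inline tables, and the values in them are made up and do not come from the EWG, County Health Rankings, or simplemaps files. It also assumes each zip code maps to exactly one county, which is not always true in practice.

```python
import csv
import io

# Hypothetical miniature inputs; the real files use different columns and values.
CONTAMINANTS_CSV = """zip,contaminant,level
00001,arsenic,1.2
00002,arsenic,0.8
00002,nitrate,3.5
00003,lead,0.1
"""
ZIP_TO_COUNTY_CSV = """zip,county_fips,county_name
00001,99001,Alpha
00002,99002,Beta
"""
OBESITY_CSV = """county_fips,pct_obese
99001,30.0
99002,25.0
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, keeping every field as a string
    so that leading zeros in zip codes and FIPS codes are preserved."""
    return list(csv.DictReader(io.StringIO(text)))


def combine(contaminant_rows, zip_rows, obesity_rows):
    """Attach county and obesity columns to each contaminant row.

    Returns (combined_rows, unmatched_zips). Rows whose zip has no county,
    or whose county has no obesity figure, are reported instead of dropped silently.
    """
    zip_to_county = {r["zip"]: r for r in zip_rows}
    obesity_by_county = {r["county_fips"]: r["pct_obese"] for r in obesity_rows}

    combined, unmatched = [], []
    for row in contaminant_rows:
        county = zip_to_county.get(row["zip"])
        if county is None or county["county_fips"] not in obesity_by_county:
            unmatched.append(row["zip"])
            continue
        combined.append({
            **row,
            "county_fips": county["county_fips"],
            "county_name": county["county_name"],
            # Obesity goes last, mirroring a "rightmost column" layout.
            "pct_obese": obesity_by_county[county["county_fips"]],
        })
    return combined, unmatched


combined, unmatched = combine(
    read_rows(CONTAMINANTS_CSV),
    read_rows(ZIP_TO_COUNTY_CSV),
    read_rows(OBESITY_CSV),
)
```

Reading zip codes as strings rather than integers matters because codes with leading zeros would otherwise be corrupted before the join. Returning the unmatched zips separately makes it easy to see how much of the contaminant data fails to reach a county.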
This market resolves to YES iff someone posts a link to such a combined dataset that is accepted by Elizabeth as sufficiently good. If you'd like to work on this, consider posting a comment beforehand indicating your intent!
Close date updated to 2022-12-31 11:59 pm
This question is managed and resolved by Manifold.
Looks like Elizabeth has gone ahead and published the results! https://www.lesswrong.com/posts/ardqtuGaXntyEN3M5/new-water-quality-x-obesity-dataset-available
Reopening so people can trade/cash out. This is still a backlog task for me to follow up with, someday...
I didn't specify a "criteria end date" in the original market description, and I'm not sure whether traders expected the original close date to serve as one. I can see arguments either way. I hadn't thought much about it at market creation time, but I would basically have gone with "indefinitely".
Sorry, the ball had been in my court. Elizabeth mentioned that the datasets need to be combined in a different way, and I needed to forward this to Josh/Oliver; I've done so now.
Thank you for the reminder, Adam - it was a very helpful ping! I've also already paid out half the bounty ($50) each to Josh and Oliver; resolving this market to YES would be contingent on Elizabeth accepting the final dataset.
Updated files linked here: https://www.dropbox.com/sh/3glaxsg9jx0hczm/AAALAVmLQ7uCSflNhTg4oQBoa?dl=0
There are two files: a CSV file with only the "payoff" sheet (contaminants and counties, with obesity rates in the rightmost columns), and an Excel document that contains all source sheets, including a cover page, both sources, a zip code database, and a county database. I've manually edited the zip code database, noted those edits, and done enough additional analysis that I would consider it a distinct source. The Excel document is worth looking at if you want to understand the methodology better.
Cheers, and let me know if there are any more issues!
Beautiful. So excited to have two separate sources confirming this - can't wait to see what the results of the analysis are!
Josh and Oliver, can you email me at akrolsmir@gmail.com with details for payment? I can do PayPal to an email or Venmo, or we can work something else out.
(And as before - if it's convenient, would love to see the source posted!)
https://docs.google.com/spreadsheets/d/1rIQeUQ0KfCCnJEa8Dw9UyqeAuatGYiz-epddKI1NE28/edit?usp=sharing
This was my attempt. I used RStudio and can share my code if you want.
Hey Josh -- this sounds great! Unfortunately I can't actually open the file in Google Sheets, Excel, or Numbers, possibly due to the sheer size... perhaps a csv would be better?
Since I can't check the file: does the updated contaminants sheet also have a column for the % obesity, in addition to the county name?
Finally, would you be willing to open-source the code? Completely optional, but having the code might allow the statistician to audit it/reconfigure it for studying something else.
Take a look here: https://docs.google.com/spreadsheets/d/1d-1pmlsQjTGyPWOIpDragzX_9fQyYDVh/edit?usp=sharing&ouid=112951040301709429437&rtpof=true&sd=true
Let me know if this is what you need, or if there is additional work that the statistician needs before they can begin. The contaminants sheet has been updated with county names that fully match the County Health data.
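One way a reader might check the "county names fully match" claim is an anti-join on normalized (state, county) pairs. This is only a sketch under stated assumptions: the normalization rules (case-insensitive, dropping a trailing " County" or " Parish") and the sample pairs are hypothetical, and it is not the method Josh used.

```python
def normalize(state, county):
    """Reduce a (state, county) pair to a comparable key.

    Assumed rules: uppercase the state, lowercase the county, and drop a
    trailing " county" or " parish" so "Alpha County" equals "alpha".
    """
    name = county.strip().lower()
    for suffix in (" county", " parish"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return (state.strip().upper(), name)


def missing_from_health(contaminant_counties, health_counties):
    """Return the sorted keys present in the contaminant data but absent
    from the county health data. An empty list means a full match."""
    health = {normalize(s, c) for s, c in health_counties}
    contaminant = {normalize(s, c) for s, c in contaminant_counties}
    return sorted(contaminant - health)


# Made-up sample pairs, not taken from either dataset.
contaminant_counties = [("XX", "Alpha County"), ("XX", "beta"), ("YY", "Gamma Parish")]
health_counties = [("xx", "Alpha"), ("XX", "Beta County")]

missing = missing_from_health(contaminant_counties, health_counties)
```

Matching on a county FIPS code, where both sources provide one, would avoid name normalization entirely. The name-based check is shown only because the comment above describes matching by county name.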
Honestly, I'm much more concerned that no one will do this than that too many people will try. On the off chance that multiple people accidentally take this on at the same time, I'm willing to guarantee full $100 bounties to the first 3 people who do a sufficiently good job. (I'm open to paying for further good submissions, as long as I don't think you're trying to exploit this.)