$100 bounty: Will someone combine these two datasets between obesity & water contamination?
Resolved YES (May 28)
Context: Slime Mold Time Mold has an interesting hypothesis: environmental contaminants cause obesity. Elizabeth has found two data sources that could be combined to test this hypothesis:
1: EWG's drinking water database, which lists contaminants by zip code: https://data.world/arnholdinst/drinking-water-contaminations
2: County Health Rankings, which lists % obesity by county: https://www.countyhealthrankings.org/explore-health-rankings/rankings-data-documentation

There's a $100 bounty for somebody to combine these two into a single spreadsheet, which would allow a statistician to run an analysis. https://simplemaps.com/data/us-zips could provide the mapping of zip codes to counties.

More discussion here: https://www.lesswrong.com/posts/kjmpq33kHg7YpeRYW/briefly-radvac-and-smtm-two-things-we-should-be-doing?commentId=rwyGuSS5QnRtBRook

This market resolves to YES if and only if someone posts a link to such a combined dataset that Elizabeth accepts as sufficiently good. If you'd like to work on this, consider posting a comment beforehand indicating your intent!

Close date updated to 2022-12-31 11:59 pm
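For anyone picking this up: the core of the task is a two-step join, zip code → county (via the simplemaps table), then county → % obesity. Below is a minimal sketch of that join using only Python's standard library. The column names (`zip`, `county_fips`, `contaminant`, `level`, `pct_obese`) and all codes and values are made-up placeholders, not the real headers or data of the EWG, simplemaps, or County Health Rankings files. It also assumes each zip maps to exactly one county, which is a simplification, since some zip codes span several counties.

```python
import csv
import io
import sys

# Toy inputs with made-up codes and values (NOT real data); the real files
# have different headers that would need to be mapped onto these names.
CONTAMINANTS_CSV = """zip,contaminant,level
11111,Nitrate,2.1
11111,Arsenic,0.4
22222,Nitrate,1.3
33333,Lead,0.9
"""
ZIP_TO_COUNTY_CSV = """zip,county_fips
11111,00001
22222,00002
"""
OBESITY_CSV = """county_fips,pct_obese
00001,30.0
00002,25.0
"""


def load(text):
    """Parse CSV text into a list of dicts; every value stays a string."""
    return list(csv.DictReader(io.StringIO(text)))


def combine(contaminants, zip_to_county, obesity):
    """Attach county FIPS and % obesity to each zip-level contaminant row.

    Rows with no county mapping, or whose county has no obesity figure,
    are returned separately instead of being silently dropped.
    """
    fips_by_zip = {r["zip"]: r["county_fips"] for r in zip_to_county}
    obese_by_fips = {r["county_fips"]: r["pct_obese"] for r in obesity}
    combined, unmatched = [], []
    for row in contaminants:
        fips = fips_by_zip.get(row["zip"])
        pct = obese_by_fips.get(fips)
        if fips is None or pct is None:
            unmatched.append(row)
        else:
            combined.append({**row, "county_fips": fips, "pct_obese": pct})
    return combined, unmatched


combined, unmatched = combine(
    load(CONTAMINANTS_CSV), load(ZIP_TO_COUNTY_CSV), load(OBESITY_CSV)
)

# Write the combined rows out as a single CSV (here to stdout).
writer = csv.DictWriter(sys.stdout, fieldnames=list(combined[0]))
writer.writeheader()
writer.writerows(combined)
print(len(combined), "matched rows;", len(unmatched), "unmatched")
```

Keeping zip and FIPS codes as strings rather than numbers preserves leading zeros, and returning unmatched rows separately makes it easy to see which zips failed to join.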
predicted NO
Looks like Elizabeth has gone ahead and published the results! https://www.lesswrong.com/posts/ardqtuGaXntyEN3M5/new-water-quality-x-obesity-dataset-available
predicted NO
(FWIW, if I had to resolve it today, I would resolve it to 50%, as I've paid out 50% bounties to two separate sources)
predicted NO
Reopening so people can trade/cash out. This is still a backlog task for me to follow up on, someday... I didn't specify a "criteria end date" in the original market description; I'm not sure if traders expected the original close date to be the criteria end date, though. I can see arguments either way; I hadn't thought much about it at market creation time, but basically would have gone with "indefinitely".
After nearly 2 months, it seems that the "No"s had it IRL.
bought Ṁ1 of YES
Sorry, the ball had been in my court. Elizabeth mentioned that the datasets need to be combined in a different way, and I needed to forward this to Josh/Oliver; I've done so now. Thank you for the reminder, Adam - it was a very helpful ping! I've also already paid out half the bounty ($50) each to Josh and Oliver; resolving this market to YES would be contingent on Elizabeth accepting the final dataset.
bought Ṁ1 of NO
Mostly confused as to why this hasn't resolved yet. I don't expect Austin to defect, but I could see a world where the analysis posted so far does not meet their definition of "combined", such that future attempts are discouraged.
bought Ṁ1 of YES
Updated files linked here: https://www.dropbox.com/sh/3glaxsg9jx0hczm/AAALAVmLQ7uCSflNhTg4oQBoa?dl=0 There are two files: a CSV file with only the "payoff" sheet (contaminants and counties, with obesity rates in the rightmost columns); and an Excel document that contains all source sheets, including a cover page, both sources, a zip code database (I've manually edited it, noted those edits, and done enough additional analysis that I would consider it a distinct source), and a county database. The Excel document is worth looking at if you want to understand the methodology in more detail. Cheers, and let me know if there are any more issues!
bought Ṁ1 of NO
Beautiful. So excited to have two separate sources confirming this; can't wait to see what the results of the analysis are! Josh and Oliver, can you email me at akrolsmir@gmail.com with details for payment? I can do PayPal to an email or Venmo, or we can work something else out. (And as before: if it's convenient, would love to see the source posted!)
bought Ṁ1 of YES
Austin, I'll update and turn it into a CSV - I had kept it as an xlsx just because it was multiple sheets, but will remove the excess sheets and just consolidate!
bought Ṁ1 of NO
Thanks, Oliver! It seems like you need to share the sheet with the public?
bought Ṁ1 of YES
https://docs.google.com/spreadsheets/d/1rIQeUQ0KfCCnJEa8Dw9UyqeAuatGYiz-epddKI1NE28/edit?usp=sharing This was my attempt. I used RStudio and can share my code if you want.
bought Ṁ1 of NO
Hey Josh, this sounds great! Unfortunately I can't actually open the file in Google Sheets, Excel, or Numbers, possibly due to its sheer size... perhaps a CSV would be better? Since I can't check the file: does the updated contaminants sheet also have a column for % obesity, in addition to the county name? Finally, would you be willing to open-source the code? Completely optional, but having the code might allow the statistician to audit it or reconfigure it for studying something else.
bought Ṁ5 of YES
Take a look here: https://docs.google.com/spreadsheets/d/1d-1pmlsQjTGyPWOIpDragzX_9fQyYDVh/edit?usp=sharing&ouid=112951040301709429437&rtpof=true&sd=true Let me know if this is what you need or if there is additional work the statistician needs before they can begin. The contaminants sheet has been updated with county names that fully match the County Health data.
bought Ṁ1 of YES
Oh wow, it is a real cash bounty! Then I’d be quite surprised if no one did it.
bought Ṁ10 of NO
Chipping in $10 because this is a cool idea! :) If it turns out neither Josh nor Oliver do it, I might have a go at it. :)
bought Ṁ1 of NO
Yup, serious about it! Thanks for your attempt~ (If anyone else is thinking about trying and hasn't started yet, I'd recommend holding off since your effort might be duplicated; though ofc I'll abide by my initial guarantee)
bought Ṁ1 of YES
I was giving it a shot, but then the FIPS and county names weren't lining up and I called it a night. If you're serious about paying out $100 to multiple people I would finish it up sometime tomorrow.
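On FIPS codes and county names not lining up: joining on 5-digit county FIPS codes is usually more robust than joining on names. Below is a hedged Python sketch of two hypothetical helpers (not taken from anyone's submission): one rebuilds a zero-padded 5-digit FIPS code, since spreadsheets often store 06037 as the number 6037; the other normalizes county names as a fallback join key. The suffix list is an assumption and not exhaustive.

```python
import re

# Hypothetical helpers; the real source files may already carry a 5-digit FIPS column.
# Longer suffixes come first so "city and borough" is stripped before "borough".
COUNTY_SUFFIXES = (
    " city and borough",
    " census area",
    " municipality",
    " borough",
    " parish",
    " county",
)


def make_fips(state_code, county_code):
    """Return a zero-padded 5-digit county FIPS string (2-digit state + 3-digit county)."""
    return f"{int(state_code):02d}{int(county_code):03d}"


def normalize_county(name):
    """Lowercase, collapse whitespace, drop periods, and strip one trailing county-type suffix."""
    n = re.sub(r"\s+", " ", name.strip().lower()).replace(".", "")
    for suffix in COUNTY_SUFFIXES:
        if n.endswith(suffix):
            return n[: -len(suffix)]
    return n


print(make_fips(6, 37))                      # 06037
print(normalize_county("St. Louis County"))  # st louis
print(normalize_county("Orleans Parish"))    # orleans
```

For example, `make_fips("06", "037")` and `make_fips(6, 37)` both give `"06037"` (Los Angeles County, CA). Name normalization is only a fallback; names like "St. Louis County" versus the independent "St. Louis city" show why FIPS is the safer key.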
bought Ṁ1 of NO
That's fantastic; thanks again!
bought Ṁ1 of YES
It's 1 AM here, so I'm heading to bed, but I should be able to get it done tomorrow; I just need to make sure all the county names line up properly.
bought Ṁ1 of NO
Only the first! Thanks for offering; any rough sense of when you'd expect to finish this by?
bought Ṁ20 of YES
Are you looking specifically for combining the contaminants by zip code with the % obesity by county? Or are you looking for the full data sets (e.g., all sheets in each dataset) to be merged? If the first, I can get this done.
bought Ṁ1 of NO
Honestly, I'm much more concerned that no one will do this than too many people try. In the off chance multiple people accidentally take this on at the same time, I'm willing to guarantee full $100 bounties to the first 3 people who do a sufficiently good job. (Open to paying for further good submissions, as long as I don't think you're trying to exploit this.)
bought Ṁ91 of YES
I think you should pick someone to do it from a list of willing people rather than going first past the post, to avoid duplicated effort. I am happy to be a backup person if no one else wants to do it in the next few days.
bought Ṁ1 of NO
To be clear: this would be a USD $100 bounty. Any M$ a trader would like to win from this market is on top of that.
bought Ṁ1 of YES
Manifold cash, I mean.