My Big Kink Survey has around 350k female responses. In it, I gave people a list of mental illnesses as checkboxes. One of these was anxiety.

I had a second question which asked "Of these that you checked, which one is the most severe for you?"

If people checked having anxiety but didn't check it as most severe, their answer counted as 1. If they also checked anxiety as the most severe, I counted their answer as 2.
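A minimal sketch of that scoring rule (assuming an unchecked anxiety box counts as 0, which the post implies but doesn't state; the function name is hypothetical):

```python
def anxiety_score(checked_anxiety: bool, most_severe_is_anxiety: bool) -> int:
    """Score the two checkbox answers: 0 = anxiety not checked,
    1 = checked, 2 = checked and also marked as most severe."""
    if not checked_anxiety:
        return 0
    return 2 if most_severe_is_anxiety else 1
```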

Will 'having anxiety' in females correlate more than r=0.12 with any of the fetishes (~500 or so) in my data?


# 🏅 Top traders

# | Name | Total profit
---|---|---
1 | | Ṁ687
2 | | Ṁ682
3 | | Ṁ546
4 | | Ṁ489
5 | | Ṁ419

@PatS Yes in the general case implies Yes in at least some specific cases. If you are no more likely to have any given fetish if you have anxiety than if you don't, then you are also no more likely to have a fetish in general, unless having anxiety makes you less likely to have multiple fetishes at the same time without affecting the probability of any specific fetish. The latter situation would be extremely bizarre, and I can't imagine any reasonable explanation for why it would be true, so my credence that the general case implies the disjunction of specific cases is extremely high.

I used to strongly expect that every psychological trait would correlate with some sexual fetish or other because you're doing a zillion comparisons, but then Aella found a bunch of negative results, so I updated downward. It is really weird that you can pick some trait at random, try to correlate it with 500 different fetishes, and not find even one out of 500 with r>0.12. I still kind of suspect a bug in her code.

@JonathanRay In a high-dimensional vector space, the exponential majority of pairs of vectors will be ~independent.

@JonathanRay There's such a thing as adjustment for multiple comparisons for p-values; @tailcalled's comment suggests r does this somewhat automatically instead?

@b575 The traditional adjustments for multiple comparisons are done because people only work with small samples and therefore have some random noise in their computed correlations. But Aella has a sample size of like half a million, so the random noise in her correlations will only be +/- 0.004 or something like that. So basically the adjustment is not needed.
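The ±0.004 figure can be sanity-checked with the usual large-sample approximation for the sampling error of a correlation under the null, SE(r) ≈ 1/√(n−3) (a rough sketch; the exact sample size is assumed):

```python
import math

n = 460_000  # roughly the sample size discussed in this thread

# Approximate standard error of a null correlation (Fisher-z approximation).
se_r = 1 / math.sqrt(n - 3)

# A generous ~2.7-sigma "noise band" for r, on the order of +/- 0.004.
noise_band = 2.7 * se_r
```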

@tailcalled Wait, what? At least for p-values, that's not why they are done; they are done because that is quite literally how the probability shifts for independent comparisons (if you use the Šidák formula of 1-(1-p)^n - or, reversed, 1-(1-p)^(1/n); if you use the Bonferroni correction, it's a little different, but they at least share the intuition). It's not true that you can omit the adjustment if you increase your sample, because the noise in your sample may well be persistent rather than truly random.
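For concreteness, the two corrections mentioned, applied to the ~500 comparisons this market involves (a sketch; a familywise α of 0.05 is assumed):

```python
def sidak_alpha(alpha: float, m: int) -> float:
    """Šidák per-test threshold for m independent comparisons:
    keeps the familywise error rate at alpha."""
    return 1 - (1 - alpha) ** (1 / m)

def bonferroni_alpha(alpha: float, m: int) -> float:
    """Bonferroni per-test threshold: a slightly more conservative bound."""
    return alpha / m

# With ~500 fetishes tested at familywise alpha = 0.05, each individual
# test needs p below roughly 0.0001 to count.
sidak = sidak_alpha(0.05, 500)
bonf = bonferroni_alpha(0.05, 500)
```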

@b575 When I was talking about the correlations, I was talking about r-values, not p-values. What I'm saying is that there is absolutely no reason to use p-values in the regime Aella is working in, and therefore no reason to use multiple comparison corrections.

I guess I should further clarify. When doing a study, there are various ways you can quantify your estimates, such as raw differences or slopes, d values or r values, percentage differences, etc. p-values do not go into this category, as they do not quantify an estimate of anything.

Instead, the purpose of p-values is this: quantities such as r have an uncertainty due to the sampling process, and this uncertainty means that a seemingly-interesting value could have happened by chance. p-values are a way of quantifying the plausibility of chance making them interesting-looking.

Aella has a sample size of <unreasonably big>, which means that her results are not going to be due to chance, so probably all of her p-values are 0. However, the prediction market is about the r-value rather than the p-value.

@tailcalled "Aella has a sample size of <unreasonably big>, which means that her results are not going to be due to chance" - I think we just disagree on how many supposedly interesting things can be due to chance.

@tailcalled OK, let me rephrase: I think you use an overly-strict definition of chance, where only truly-random-noise counts.

@tailcalled Sorry, missed your answer! If there's a slight but consistent bias due to a factor not controlled for explicitly, sample size ain't going to do anything about it, but I would say that this bias is chance in terms of the controlled-for-explicitly factors.

@b575 What I don't understand is what kinds of chance you might have in mind where my points about p-values don't apply but your case for Bonferroni correction applies. AFAIK Bonferroni correction inherently assumes that the notion of chance of interest is the kind I am talking about.

Could you give an example of the sort of factor you have in mind?

@tailcalled There is no case where Bonferroni/Šidák corrections don't apply to p-values of multiple comparisons. There are cases where you may consider p-values **themselves** too uninteresting to bother with, but that doesn't stop their applicability.

OK, everything is correlated with everything, just the r is different, right? *Right*. And a number of weak correlations can make up a spurious correlation - whose p-value will be informative for its spuriousness. (Something-something less pirates - more global warming)

@b575 "whose p-value will be informative for its spuriousness": It seems like by "spurious" you just mean "not causal." I don't think p-values are a very good tool for figuring out which correlations correspond to causal relationships. How do you think the p-value for pirates vs global warming looks?

@placebo_username No, not just "not causal". If A and B are both caused by C, it is not (directly) causal but is non-spurious.

(Also, I didn't say I particularly like p-values - just that to the extent you use them, you *must* apply the correction.)

@b575 1. I don't think the less pirates - more global warming correlation is spurious under this definition. I think they are both caused by more societal development:

less pirates <- more arrests of pirates <- societal development -> more use of fossil fuels -> more global warming

2. I still find your argument confusing, so let me try again.

AFAICT, there might be two notions of "due to chance" that you might be referring to:

a. Spurious correlations due to random sampling,

b. Uninteresting but systematic correlations due to confounding, collider bias, correlated measurement errors, etc.

My basic argument is this: if you are referring to problem a., then it is true that p-values and corrections for multiple comparisons are a somewhat valid and commonly used strategy. However, it is false that problem a. will be likely to drive the results, because problem a. only occurs with small sample sizes, and Aella has a sample size of <unreasonably big>.

On the other hand, if you are referring to problem b., then it is true that problem b. may persist even with a sample size of <unreasonably big>. However, p-values and multiple comparisons corrections are not designed for, or appropriate to use for, problem b.

There does not seem to be any problem which persists at large sample sizes and which is addressed by adjusting for multiple comparisons.

@tailcalled It is not really true that a and b are fully distinct problems. "Random", unless we're in deep quantum mechanics, is a fancy name for "deterministic things we don't account for". It is also not true that problem a only happens in small sample sizes - for instance, it doesn't matter how big your overall sample is if one specific bin is ~twenty-thirty people, which **has** happened in Aella's polls before - and, due to the above, it is not true that p-values only help with problem a. Sampling can be (and usually is) off in multiple uninteresting ways, and the systematicity of this being off is a scale. If it's slightly off, it can generate a p-value that's, say, 0.03 (or whatevs). And the chance is higher the more comparisons you draw, for the usual reasons.

As for the pirate example, it's a good question whether this is spurious or not. An empirical question, if you will - albeit we can't really run the needed experiments at the needed scale.

"Random", unless we're in deep quantum mechanics, is fancy name for "deterministic things we don't account for".

@b575 This doesn't make a difference. If the deterministic things we don't account for have independent causes, then they would not induce correlations beyond what is expected by chance.

> It is also not true that problem a only happens in small sample sizes - for instance, it doesn't matter how big your overall sample is if one specific bin is ~twenty-thirty people, which **has** happened in Aella's polls before

Holding the bin proportions constant, the number of people in each bin scales with the sample size. As such, if you increase sample size, you also increase the number of people in each bin.

> Sampling can be (and usually is) off in multiple uninteresting ways, and the systematicity of this being off is a scale. If it's slightly off, it can generate a p-value that's, say, 0.03 (or whatevs). And the chance is higher the more comparisons you draw, for the usual reasons.

Not sure what you are referring to here. Sampling being off isn't going to generate any specific p-value.

@tailcalled

> Holding the bin proportions constant, the number of people in each bin scales with the sample size. As such, if you increase sample size, you also increase the number of people in each bin.

Aha, I think this is the core problem: this is patently untrue of Aella's polls, where some bins are small just… just because, in strong disproportion to the overall sample size (e.g. age is rather obviously skewed).

> If the deterministic things we don't account for have independent causes, then they would not induce correlations beyond what is expected by chance.

And how do you measure your "what's expected by chance"? You make a prediction that all p-values for correlations with high r-values will be very small. If that's true, their being corrected won't make it worse - but if that's, as I argue, not quite true, corrections will help. So there is literally no reason not to apply the correction to p-values.

> Aha, I think this is the core problem: this is patently untrue of Aella's polls, where some bins are small just… just because, in strong disproportion to the overall sample size (e.g. age is rather obviously skewed).

@b575 No, this is because Aella's polls are unrepresentative, rather than because the sample size is too low. Two different problems.

> And how do you measure your "what's expected by chance"? You make a prediction that all p-values for correlations with high r-values will be very small. If that's true, their being corrected won't make it worse - but if that's, as I argue, not quite true, corrections will help. So there is literally no reason not to apply the correction to p-values.

The simplest way of measuring what's expected by chance is called a simulation study. In such a study, you would generate independent data points for two variables, and then compute their correlation. While the data points have been generated by independent means and so there are no systematic factors causing them to correlate, their correlation will not be exactly 0, due to random chance.

If you repeat the simulation study a bajillion times, you get a distribution of correlations due to random chance. From this distribution, you can then see how often the correlation is as big as your observed value in your real dataset. This gives you the p-value.

Usually in practice people don't do this with simulation studies, but instead with math that gives the same results as simulation studies. The key point still applies though: these methods only test against correlations that arise due to random chance in the sampling process, rather than correlations that arise due to systematic factors.
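The simulation-study recipe described above, in a minimal form (numpy assumed; a small n and repetition count keep it quick, and the data are synthetic standard normals rather than anything from the actual survey):

```python
import numpy as np

rng = np.random.default_rng(0)

def null_correlations(n: int, reps: int) -> np.ndarray:
    """Correlations between two independently generated variables:
    the distribution of r produced by random chance alone."""
    rs = np.empty(reps)
    for i in range(reps):
        x = rng.standard_normal(n)
        y = rng.standard_normal(n)
        rs[i] = np.corrcoef(x, y)[0, 1]
    return rs

rs = null_correlations(n=1_000, reps=2_000)

# Simulation-based one-sided p-value: how often chance alone produces
# a correlation at least as big as the observed one.
observed = 0.12
p = np.mean(rs >= observed)
```

Even at n = 1,000 (far below the sample size discussed here), chance essentially never produces r ≥ 0.12, so the simulated p-value comes out near zero.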

@tailcalled ...Yes, that is what p-values are. That doesn't mean they're not a safeguard against the things you say they're not a safeguard against. Again, low-level systematic factors are basically indistinguishable from noise.

> Again, low-level systematic factors are basically indistinguishable from noise.

@b575 False. Low-level systematic factors are often in the r=0.01 to r=0.15 range, but with a sample size of 460000, noise would be in the r=-0.005 to r=0.005 range. Generally in social science people don't even think of factors that induce correlations of less than 0.01 as being relevant, but that's what the noise would be.
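This can be checked directly: simulate 500 independent "fetish" variables against an independent "anxiety" variable at n = 460,000 and look at the largest correlation chance produces (a sketch with synthetic standard-normal data, not the actual survey responses):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 460_000, 500  # sample size and number of fetish comparisons

x = rng.standard_normal(n)  # stand-in for the anxiety score
max_abs_r = 0.0
for _ in range(m):
    y = rng.standard_normal(n)  # one "fetish", independent of x by construction
    r = np.corrcoef(x, y)[0, 1]
    max_abs_r = max(max_abs_r, abs(r))
```

Even taking the maximum over all 500 null comparisons, |r| stays well below 0.01 at this sample size, nowhere near the 0.12 threshold the market asks about.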

@tailcalled "but with a sample size of 460000, noise would be in the r=-0.005 to r=0.005 range" - for a non-representative, known-to-be-skewed sample?

@b575 Non-representative, known-to-be-skewed refers to noise notion b. p-values and multiple comparisons correction and so on assume noise of notion a.

@tailcalled Again, this is a false dichotomy because (nearly) everything is correlated to (nearly) everything.

@tailcalled Yes. Never on Aella's level of sample sizes though, to be fair. (Actually, that's false, I did analyze Zaliznyak's noun sets, but for fairly simple things.)

@b575 There's lots of cases in serious statistical analysis where the dichotomy is useful, such as:

- Power analysis: power analysis only applies to noise of kind a.

- Sensitivity analysis: errors of kind a can be easily bounded with math/simulations, making the sensitivity analysis simple, while errors of kind b can take many different sizes.

- Path tracing: if you have a specific model of the data-generating process, then that tells you what to expect in the case of kind b, but noise of kind a will still introduce deviations from the expectation.

Pretty much every decision you'd make about how to handle noise depends on whether it is of kind a or kind b. As such I find it very frustrating that you don't want to use the dichotomy.