In statistics, one of the first distributions that one learns about is usually the normal distribution: not only because it’s pretty, but also because it’s ubiquitous.
In addition, the normal distribution is often the reference used when discussing other distributions: a right-skewed distribution is skewed to the right compared to the normal distribution; when looking at kurtosis, a leptokurtic distribution is relatively spiky compared to the normal distribution; and unimodality is considered the norm, too.
There exist quantitative measures of skewness, kurtosis, and modality (the dip test), and each of these can be tested against a null hypothesis, which is (almost) always that the skewness, kurtosis, or dip test value of the distribution is equal to that of a normal distribution.
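In R, for example, skewness and kurtosis (and accompanying tests) are available in the moments package, and the dip test in the diptest package; a minimal sketch, assuming both packages (available from CRAN) are installed:

```r
library(moments)    # skewness(), kurtosis(), and accompanying tests
library(diptest)    # Hartigan's dip test for unimodality

set.seed(2015)
x <- rnorm(1000)

skewness(x)         # close to 0 for a normal distribution
kurtosis(x)         # close to 3 for a normal distribution
agostino.test(x)    # D'Agostino test of skewness
anscombe.test(x)    # Anscombe-Glynn test of kurtosis
dip.test(x)         # null hypothesis: the distribution is unimodal
```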
In addition, some statistical tests require that the sampling distribution of the relevant statistic is approximately normal (e.g. the t-test), and some require an even more elusive assumption called multivariate normality.
Perhaps all these bits of knowledge mesh together in people’s minds, or perhaps there’s another explanation; in any case, for some reason, many researchers and almost all students operate on the assumption that their data have to be normally distributed. If they are not, they often resort to, for example, converting their data into categorical variables or transforming the data.
However, although scores are often (but not always) normally distributed in the population, they don’t have to be. The distribution of scores observed in a sample is a function of the population distribution and sampling/measurement error, and will therefore often reflect the population distribution; but the sample scores usually don’t need to be normally distributed, either. The t-test, for example, requires a normal _sampling_ distribution, an assumption that is met in virtually all situations by virtue of the central limit theorem. The only situations where it isn’t are situations where you’re so underpowered that you shouldn’t be conducting a t-test in the first place.
In most situations, a lack of normality is first and foremost a good reason to doubt the validity of your design and/or operationalisations. If you expect the population distribution to be normal, a deviant distribution of sample scores is suspicious. Maybe your recruitment procedure selected for certain subpopulations (which decreases the validity of your design); or perhaps your measurement instruments (e.g. questionnaires, reaction time tasks, etc.) didn’t work properly (which decreases the validity of those operationalisations). If nothing seems out of the ordinary, then depending on the test you want to do, you often won’t have to do anything. As I explained above, for example, the t-test is robust against violations of normality, unless you’re grossly underpowered.
To illustrate this, take the distribution of age in the Netherlands in 2015:
This is quite a non-normal population distribution. Now, let us take samples of different sizes. Lots of them, so that we can see what the sampling distribution of the mean looks like as our sample size increases:
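(I don’t have the original population data at hand here, so the sketch below uses a crude, made-up stand-in for the age distribution; the logic is the same for any population vector.)

```r
set.seed(2015)
# A made-up non-normal population: roughly uniform up to 65, tapering off after
population <- c(sample(0:65, 900000, replace = TRUE),
                sample(66:100, 100000, replace = TRUE,
                       prob = (35:1) / sum(35:1)))

# Draw many samples of size n and collect the sample means
samplingDistribution <- function(n, samples = 10000) {
  replicate(samples, mean(sample(population, n)))
}

hist(samplingDistribution(5))    # still somewhat irregular
hist(samplingDistribution(10))   # already roughly normal
hist(samplingDistribution(100))  # clearly normal
```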
As we see here, already with a sample of 10 participants, the sampling distribution of the mean is approximately normal.
Now, let’s take an extremely skewed distribution: the distribution of the number of times somebody wins something in a lottery:
This is very skewed to the right. Most people never win anything; some win once, some twice, etc. If we take samples from this distribution, again, the sampling distribution quickly becomes normal:
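A variable like this can be mimicked with, for example, a Poisson distribution with a very low rate (an assumption for illustration, not the post’s actual data):

```r
set.seed(2015)
wins <- rpois(1000000, lambda = .1)   # most people never win anything
table(wins)                           # almost all 0, some 1, a few 2, ...

# The sampling distribution of the mean at two sample sizes
hist(replicate(10000, mean(sample(wins, 10))))    # still visibly skewed
hist(replicate(10000, mean(sample(wins, 100))))   # approximately normal
```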
We need fewer than 100 participants to get a normal sampling distribution. Note that with 128 participants, you have only 80% power to detect a moderate effect size (Cohen’s d = .5). Most effects in psychology and education science are considerably smaller, and 80% power leaves a probability of a Type 2 error (20%) that is much larger than what you accept for the Type 1 error (5%). So if you do a t-test with only 128 participants, you’re underpowered to the degree that it’s questionable whether it’s worth spending resources on the study.
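These power figures are easy to verify with base R’s power.t.test(); 128 participants means 64 per group in a two-sample t-test:

```r
# 64 participants per group, Cohen's d = .5, alpha = .05
power.t.test(n = 64, delta = .5, sd = 1, sig.level = .05)
# yields a power of roughly .80
```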
So, in the case of the t-test, normality is no problem. If you’re ever worried, you can use the function normalityAssessment in the userfriendlyscience R package to inspect your sampling distribution (see this help page).
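A minimal usage sketch (here, dat$score is a placeholder for your own variable; the help page documents the remaining arguments):

```r
library(userfriendlyscience)
normalityAssessment(dat$score)   # inspects the sampling distribution of the mean
```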
However, for correlations, the situation is less clear. After all, if one distribution is extremely skewed and the other is normal, the maximum attainable correlation can no longer be 1. For example, if almost everybody has the lowest score on one variable (so the distribution is right-skewed), but the distribution of the other variable is normal, then all those people with the lowest score will necessarily have different scores on the other variable, given its distribution shape. This introduces variation, but not covariance, and so the maximum possible correlation is lower than 1.
To explore this, I wrote some simulations, where I increased the skewness of one distribution and then looked at the maximum correlation that could be obtained. These are the results.
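The original simulation code isn’t reproduced here, but the gist is easy to reconstruct: the maximum attainable Pearson correlation between two given distributions is obtained by pairing the sorted values of both samples. A minimal sketch (the specific skewed distributions are my own choices for illustration):

```r
set.seed(2015)
n <- 100000

# The maximum attainable correlation between two marginal distributions
# is reached when both samples are paired in sorted order
maxCor <- function(x, y) cor(sort(x), sort(y))

x <- rnorm(n)                # first distribution: normal
maxCor(x, rnorm(n))          # both normal: effectively 1
maxCor(x, rbeta(n, 1, 4))    # slightly right-skewed: still very high
maxCor(x, rexp(n))           # clearly right-skewed: somewhat lower
maxCor(x, rexp(n)^3)         # extremely right-skewed: noticeably lower
```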
This is what we get when both distributions are normal:
If we then make the second distribution slightly skewed, this is what we get:
The maximum obtainable correlation is still quite high - much higher than anything you’d reasonably find in real life. Let’s continue the skewing:
OK, now we’ve dipped below .90. Still, nothing as disastrous as you might intuitively expect given the high skewness of the second distribution. Let’s push it a bit further:
Now, if the other distribution is also skewed, in the opposite direction, the effect becomes even stronger:
And if we exaggerate this, with two extremely skewed distributions (but oppositely skewed), the maximum obtainable correlation drops to .18:
However, these problems only occur if the skewness is opposite: if both distributions are skewed in the same way, the maximum obtainable correlation isn’t affected:
If the distributions are skewed in the same direction, they can even be extremely skewed without much danger to the maximum obtainable correlation:
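As a check, the sketch from before can compare both situations directly; reversing the sign of one sample flips its skew to the opposite direction:

```r
set.seed(2015)
n <- 100000
y1 <- rexp(n)^3             # extremely right-skewed
y2 <- rexp(n)^3             # a second, equally right-skewed sample
cor(sort(y1), sort(y2))     # skewed in the same direction: still near 1
cor(sort(y1), sort(-y2))    # skewed in opposite directions: very low
```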
The picture that emerges (literally, hehe) is that you need very severely skewed distributions before it becomes problematic.
And even then - if your sample distribution closely matches the population distribution, this still doesn’t have to be problematic. After all, in that case, the correlation may still be an accurate estimator of the correlation in the population: if there is almost no variation in the population because the distribution is very skewed, there cannot be a strong association with another variable, either. Unless, as we just saw, that other variable is similarly skewed. In all these cases, you can still accurately estimate the population correlation using your sample (assuming your sample size is large enough for accurate estimation in the first place; see this excellent paper with sample sizes required for estimation of correlation coefficients).
So, in most cases, deviations from normality aren’t problematic. It all boils down to which tests you want to do and what their assumptions are - and to what happens if the assumptions are violated.
So never panic if your scores aren’t normally distributed. Just find somebody who can help you figure out how bad it is exactly (e.g. a statistician). Often, the deviations from normality won’t be a problem.