Student’s t-test and knowing how much you don’t know

A friend — who is, incidentally, a lot better than me at the whole blogging thing — recently asked me about the t-distribution and what it’s used for. I ended up writing a fairly lengthy email in response and thought I might as well share it with the rest of the Internet.

Imagine the following setting: we have a sample X_1,\dots,X_n of independent normally distributed variables, all with mean \mu and variance \sigma^2. Suppose \sigma^2 is known and we want to test whether some fixed value \hat{\mu} is a plausible guess for \mu.

H0: \mu = \hat{\mu}

H1: \mu \neq \hat{\mu}

We would typically go about this by calculating the sample mean \bar{X} = \frac{1}{n}\sum X_i and noting that under H0 it follows a N(\hat{\mu}, \sigma^2/n) distribution. This means that under H0 the test statistic

Z=\frac{\bar{X} - \hat{\mu}}{\sqrt{\sigma^2/n}}

follows a standard normal distribution and we can proceed with calculating the p-value of this test. Easy!
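To make this concrete, here is a minimal sketch of the known-variance test in Python, using only the standard library. The function name `z_test_p_value` and the data are made up for illustration, and I'm assuming a two-sided test as in H1 above.

```python
import math

def z_test_p_value(sample, mu_hat, sigma2):
    """Two-sided z-test of H0: mu = mu_hat when the variance sigma2 is known."""
    n = len(sample)
    x_bar = sum(sample) / n
    z = (x_bar - mu_hat) / math.sqrt(sigma2 / n)
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Made-up data, purely for illustration
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]
z, p = z_test_p_value(sample, mu_hat=5.0, sigma2=0.25)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A small p-value says that the gap between \bar{X} and \hat{\mu} is hard to explain with the known amount of noise alone.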

What happens if we don’t know \sigma^2? It is tempting to use the (unbiased) sample variance

S^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2

to estimate \sigma^2 and substitute it into the Z statistic above:

T=\frac{\bar{X} - \hat{\mu}}{\sqrt{S^2/{n}}}

Assuming T \approx N(0,1) we can compute an approximate p-value that should be fine… right?

Let’s forget about the algebra of it for a second. In the first case, we had some uncertainty about the mean and no uncertainty whatsoever about the variance. The hypothesis test checks whether the uncertainty that we have is enough to explain the difference between \bar{X} and \hat{\mu}. Not knowing the variance gives us some extra uncertainty on top of what we had before. The more uncertain we are of what we know, the more tolerant we should be of reality not matching our expectations.

Intuitively at least, the approximate p-value above is going to be somewhat too small (anti-conservative), since we’re not accounting for the variability of the sample variance. However, we still expect T to behave more or less like a standard normal, especially if our sample size is large, in which case we’re very confident about our estimate for \sigma^2. We can make several guesses about the density of T based on this idea alone:

  • it looks more or less like a standard normal (i.e. like a bell curve)

  • it has wider tails

  • it’s not fixed, but depends on n: the more data we have, the better our normal approximation
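These guesses can be checked with a quick simulation. The sketch below (my own illustration, with a hypothetical function name and arbitrary choices of seed and trial count) draws many samples for which H0 is true, computes T, and counts how often |T| exceeds 1.96, the usual two-sided 5% cutoff for a standard normal. If T really were N(0,1), the rate would be close to 0.05 for every n.

```python
import math
import random

def normal_rejection_rate(n, trials=10000, z_crit=1.96, seed=0):
    """Fraction of H0-true samples of size n for which |T| > z_crit."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # H0 is true here: the data are N(0, 1) and we test mu_hat = 0
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x_bar = sum(sample) / n
        s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)  # unbiased sample variance
        t = x_bar / math.sqrt(s2 / n)
        if abs(t) > z_crit:
            rejections += 1
    return rejections / trials

for n in (3, 5, 10, 30, 100):
    print(n, normal_rejection_rate(n))
```

For small n the rejection rate should come out well above 0.05 and drift towards it as n grows, which is what wider tails and an n-dependent shape would predict.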

I do love informal approaches, but the academic community doesn’t necessarily share my enthusiasm. In any case we can all agree that knowing the exact p-value is preferable to making an approximation. This is where the t-distribution kicks in.

Definition. Let Z \sim N(0,1) and V \sim \chi^2_k be two independent random variables. The t-distribution with k degrees of freedom is defined to be the distribution of \frac{Z}{\sqrt{V/k}}.

While you can write the density function explicitly, it’s the form above that is the useful one. I won’t go through the algebra of it, but using it you can check that under H0 T \sim t_{n-1}: the key facts are that (n-1)S^2/\sigma^2 \sim \chi^2_{n-1} and that it is independent of \bar{X}. This means we can compute exact p-values for the hypothesis test (or rather we can let R compute them for us). This is what Student’s t-test amounts to!
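For the curious, here is a rough sketch of what such a computation could look like without any statistics library, in plain Python. It uses the explicit density of the t-distribution and integrates it numerically with Simpson's rule; the function names, the step count and the data are my own choices for illustration, and a real implementation would use a more careful method, especially for very large |t|.

```python
import math

def t_pdf(x, k):
    """Density of the t-distribution with k degrees of freedom."""
    log_c = math.lgamma((k + 1) / 2) - math.lgamma(k / 2) - 0.5 * math.log(k * math.pi)
    return math.exp(log_c - (k + 1) / 2 * math.log1p(x * x / k))

def t_two_sided_p(t, k, steps=2000):
    """P(|T_k| >= |t|), via Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    if a == 0.0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, k) + t_pdf(a, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    central_half = total * h / 3  # approximates P(0 <= T_k <= |t|)
    return max(0.0, 1.0 - 2.0 * central_half)

def t_test(sample, mu_hat):
    """Two-sided one-sample t-test of H0: mu = mu_hat."""
    n = len(sample)
    x_bar = sum(sample) / n
    s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)
    t = (x_bar - mu_hat) / math.sqrt(s2 / n)
    return t, t_two_sided_p(t, n - 1)

# Made-up data, purely for illustration
t, p = t_test([5.1, 4.9, 5.6, 5.8, 5.2, 5.4], mu_hat=5.0)
print(f"t = {t:.3f}, p = {p:.4f}")
```

Only the data and \hat{\mu} go in; the variance is estimated from the sample, and the degrees of freedom n-1 take care of the extra uncertainty.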

Now for the really cool part. We can plot the standard normal and several t-distributions with varying degrees of freedom. Here’s what happens:


Our intuition was pretty spot on!
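If you'd rather see numbers than a picture, the following sketch tabulates the standard normal density next to a few t densities. The degrees of freedom and the evaluation points are arbitrary choices of mine, and the helper names are hypothetical.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def t_pdf(x, k):
    """Density of the t-distribution with k degrees of freedom."""
    log_c = math.lgamma((k + 1) / 2) - math.lgamma(k / 2) - 0.5 * math.log(k * math.pi)
    return math.exp(log_c - (k + 1) / 2 * math.log1p(x * x / k))

dfs = (1, 3, 10, 30)
print("x     " + "".join(f"t_{k:<8}" for k in dfs) + "normal")
for x in (0.0, 1.0, 2.0, 3.0, 4.0):
    row = "".join(f"{t_pdf(x, k):<10.5f}" for k in dfs)
    print(f"{x:<6.1f}{row}{normal_pdf(x):.5f}")
```

Reading along a row far from zero, the t densities should sit above the normal one (wider tails) and approach it as the degrees of freedom increase.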