A large part of the statistics courses I took during my undergraduate degree involved parametric modelling. There we would often use “noise”, “white noise”, and “Gaussian noise” to mean the same thing — independent normally distributed random variables with mean 0, which (hopefully) account for the difference between the theoretical model and the observed data.

I’ve come across this particular piece of jargon again and again, and it almost always refers specifically to the normal distribution. I hadn’t thought much about that: “noise” and “error” both refer to disturbances in the “true” signal, and we tend to think of errors as normally distributed by default. None of my undergraduate work ever involved noise in the colloquial sense, so I had no idea whether any of this reflected real, audible noise… until a few days ago.

I’ve been working on a short research project at IBME for the past few weeks. My work involves analysing voice recordings made in uncontrolled environments (mostly just people’s homes), and the data I have also includes short recordings of background noise. I haven’t worked much with those yet, but I did plot a histogram of one I picked at random. Here’s what it looks like!

*A histogram of the amplitudes in a 5-second recording of background noise. The blue curve is a Gaussian fit.*

Pretty neat, huh? I expected it to look roughly symmetric, but I’m pretty sure this is the best fit of real data to a normal distribution I’ve ever seen.
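For anyone curious how a plot like that comes together, here’s a minimal sketch. I’m using synthetic Gaussian samples as a stand-in for the actual recording (the audio file itself isn’t included here), and fitting the Gaussian by maximum likelihood, which for a normal distribution is just the sample mean and standard deviation:

```python
import numpy as np

# Stand-in for the recording: synthetic Gaussian samples at roughly
# 16 kHz for 5 seconds. (Assumption for illustration -- the real data
# came from a wav file.)
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=0.1, size=80_000)

# Maximum-likelihood Gaussian fit: sample mean and standard deviation.
mu, sigma = samples.mean(), samples.std()

# Compare the empirical histogram (as a density) to the fitted curve.
counts, edges = np.histogram(samples, bins=60, density=True)
centres = (edges[:-1] + edges[1:]) / 2
fitted = np.exp(-((centres - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

max_abs_err = np.max(np.abs(counts - fitted))
print(f"mu={mu:.4f}, sigma={sigma:.4f}, max density error={max_abs_err:.4f}")
```

With real data you would load the amplitudes from the file instead and plot `counts` against `fitted` with your plotting library of choice.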


Out of curiosity, what is the sample size of the histogram? If it is really large, then expecting the distribution to be normal seems valid (Central Limit Theorem).


I’m not entirely sure that’s true – I’m looking at single recorded amplitudes, not their sums. If the amplitudes were exponentially distributed, for example, I would expect the histogram to show exponential decay instead of a bell curve, regardless of how many data points I had. What the CLT states is that appropriately normalised sums of many i.i.d. random variables are approximately normally distributed, not that the variables themselves are approximately normal.
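A quick sketch of the distinction (synthetic data, nothing to do with the recordings): individual exponential draws keep their skew no matter how many you collect, while means over many draws look far more normal:

```python
import numpy as np

rng = np.random.default_rng(1)

# 100,000 exponentially distributed "amplitudes": their histogram decays
# and stays heavily skewed, no matter the sample size.
samples = rng.exponential(scale=1.0, size=100_000)
skew_raw = ((samples - samples.mean()) ** 3).mean() / samples.std() ** 3

# Means of 50 such draws at a time: by the CLT these are approximately
# normal, so their skewness is much closer to zero.
means = rng.exponential(scale=1.0, size=(100_000, 50)).mean(axis=1)
skew_means = ((means - means.mean()) ** 3).mean() / means.std() ** 3

print(f"skewness of raw samples: {skew_raw:.2f}")
print(f"skewness of sample means: {skew_means:.2f}")
```

The exponential distribution has skewness 2, and averaging only shrinks that at rate 1/√n – the raw samples never become normal on their own.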

In fact, the measurements aren’t even i.i.d. to begin with (although the CLT can still be valid even in the presence of local dependence). Instead, they’re more like the realisations of a one-dimensional random walk. This is actually not a bad analogy, because I’m looking at the amplitude measured at regular time intervals in a single recording and the differences between neighbouring measurements look reasonably independent. The histogram is then effectively a non-parametric estimate of the stationary distribution of that random walk – and happens to be normal.

There might be a CLT effect somewhere, but I’m not convinced it’s that straightforward.
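The random-walk picture can be made concrete with a toy example. A pure random walk never settles into a stationary distribution, so the sketch below uses a mean-reverting variant (an AR(1) process – purely illustrative, not a model of the recordings) with deliberately non-Gaussian steps; its stationary histogram still comes out close to normal, which is the sort of indirect CLT effect I mean:

```python
import numpy as np

rng = np.random.default_rng(2)

# Mean-reverting walk x[t] = phi * x[t-1] + step[t]. Unlike a pure
# random walk (phi = 1), this has a stationary distribution.
phi = 0.95
steps = rng.uniform(-1.0, 1.0, size=200_000)  # deliberately non-Gaussian steps

x = np.empty_like(steps)
x[0] = 0.0
for t in range(1, len(steps)):
    x[t] = phi * x[t - 1] + steps[t]

# Discard burn-in, then check the stationary samples for Gaussian shape:
# excess kurtosis near 0 is what a normal distribution would give.
stationary = x[10_000:]
z = (stationary - stationary.mean()) / stationary.std()
excess_kurtosis = (z**4).mean() - 3.0
print(f"excess kurtosis: {excess_kurtosis:.2f}")  # uniform steps alone: -1.2
```

The stationary value is effectively a geometrically weighted sum of many past steps, so the Gaussian shape emerges even though each individual step is uniform.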
