The Gaussian distribution

Perhaps the most important distribution in all of statistics is the Gaussian distribution, also known as the normal distribution. It has a bell shape and is supported on the whole real line: for every real value $x$ we have $f_X(x)>0$.

To be precise, we don't have one single normal distribution; rather, we have an infinite family of normal distributions, each characterized (or more formally "parameterized") by $\mu$ and $\sigma^2$. For this reason we usually denote a normal distribution as $N(\mu,\sigma^2)$.


The probability density function of a normal distribution is

$$f(x)={1\over\sqrt{2\pi\sigma^2}}\,e^{-{(x-\mu)^2\over 2\sigma^2}}$$

It may look scary at first, but let's unpack it bit by bit:

  • the fraction ${1\over\sqrt{2\pi\sigma^2}}$ is just a normalization term: it is there only to make sure that the area under the curve is $1$. This is a convention throughout probability, and it doesn't fundamentally change the essence of the distribution
  • the second term is the exponential of a negative quantity, so it approaches $0$ very fast as the exponent grows in magnitude
  • in the exponent we have $(x-\mu)^2$, which is symmetric with respect to $\mu$ and thus makes the whole distribution symmetric with respect to $\mu$
  • finally, in the denominator of the exponent we have $\sigma^2$, so for larger values of $\sigma^2$ the exponent grows more slowly and the whole distribution approaches $0$ more slowly
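As a sanity check, here is a small sketch of the density above (plain Python standard library; the function name is ours), verifying numerically that the density is positive everywhere and that the normalization term really makes the area under the curve equal to $1$:

```python
import math

def normal_pdf(x, mu=0.0, sigma2=1.0):
    """Density of N(mu, sigma2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# The density is strictly positive at every real x, even far from the mean.
assert normal_pdf(10.0) > 0

# Approximate the area under the curve with a Riemann sum on a wide
# interval: thanks to the normalization term it comes out very close to 1.
step = 0.001
area = sum(normal_pdf(-10 + i * step) * step for i in range(int(20 / step)))
print(area)  # very close to 1
```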

The probability distribution is defined with $\mu$ and $\sigma^2$ as parameters, which may look confusing because $\mu$ and $\sigma^2$ are also used to refer to the mean and the variance of a random variable. However, it turns out that if we have a random variable $X$ with distribution $N(\mu,\sigma^2)$, which we can also compactly write as $X\sim N(\mu,\sigma^2)$, then $\mathbb{E}[X]=\mu$ and $Var(X)=\sigma^2$. We can easily prove the statement about the expected value by leveraging the symmetry of the distribution; the one about the variance is less straightforward.


Let's start by considering $\mu=0$: take a random variable $X\sim N(0,\sigma^2)$; then by definition its expected value is

$$\mathbb{E}[X]=\int_{-\infty}^{+\infty} x f(x)\; dx$$

As we noticed above, the function $f$ is symmetric with respect to $\mu$, so in this case we have $f(x)=f(-x)$. We can leverage this fact by splitting the integral into two parts and then performing a change of variable in the first one:

$$\begin{aligned} \mathbb{E}[X]&=\int_{-\infty}^{0} x f(x)\; dx \; + \; \int_{0}^{+\infty} x f(x)\; dx\\ &=\int_{0}^{+\infty} -x f(-x)\; dx \; + \; \int_{0}^{+\infty} x f(x)\; dx\\ &=\int_{0}^{+\infty} -x f(x)\; dx \; + \; \int_{0}^{+\infty} x f(x)\; dx\\ &=-\int_{0}^{+\infty} x f(x)\; dx \; + \; \int_{0}^{+\infty} x f(x)\; dx\\ &=0 \end{aligned}$$

To generalize the proof to the case of an arbitrary $\mu$, let's first notice that given any random variable $X$ and a constant $k$ we have

$$\begin{aligned} \mathbb{E}[X+k]&=\int (x + k) f(x)\; dx\\ &=\int x f(x)\; dx \; + \; \int k f(x)\; dx\\ &=\int x f(x)\; dx \; +\; k\int f(x)\; dx\\ &=\mathbb{E}[X] + k \cdot 1\\ &=\mathbb{E}[X] + k \end{aligned}$$

Then we can conclude by noticing that if $X\sim N(\mu, \sigma^2)$ then $X-\mu\sim N(0, \sigma^2)$, since we can easily verify that $f_{N(\mu,\sigma^2)}(x)=f_{N(0,\sigma^2)}(x-\mu)$.
Trivially $X=X-\mu+\mu$, so $\mathbb{E}[X]=\mathbb{E}[X-\mu+\mu]=\mathbb{E}[X-\mu]+\mu$; but we have proven that $\mathbb{E}[X-\mu]=0$, so $\mathbb{E}[X]=\mu$.
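The result $\mathbb{E}[X]=\mu$ can also be checked numerically. Below is a quick throwaway sketch (our own code, with an arbitrary choice of $\mu$ and $\sigma^2$) that approximates the integral of $x f(x)$ with a Riemann sum:

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

mu, sigma2 = 3.0, 4.0

# Riemann-sum approximation of the integral of x * f(x) dx
# over a wide interval centered at mu.
step = 0.001
mean = 0.0
x = mu - 20.0
while x < mu + 20.0:
    mean += x * normal_pdf(x, mu, sigma2) * step
    x += step
print(mean)  # close to mu = 3.0
```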

From the properties of mean and variance that we discovered in the previous chapter, it follows that if we have a random variable $X$ with distribution $N(\mu, \sigma^2)$, then:

  • $X+k$ has distribution $N(\mu+k, \sigma^2)$
  • $X/h$ has distribution $N\left({\mu\over h}, {\sigma^2\over h^2}\right)$
  • ${X+k\over h}$ has distribution $N\left({\mu+k\over h}, {\sigma^2\over h^2}\right)$
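These facts are easy to check by simulation; here is a small sketch for the last one (example values chosen arbitrarily, standard library only):

```python
import random, statistics

random.seed(0)
mu, sigma2 = 2.0, 9.0   # X ~ N(2, 9)
k, h = 5.0, 3.0

# Draw from N(mu, sigma2) and apply the transformation (X + k) / h.
# Note: random.gauss takes the standard deviation, hence the square root.
samples = [(random.gauss(mu, sigma2 ** 0.5) + k) / h for _ in range(200_000)]

print(statistics.mean(samples))      # close to (mu + k) / h = 7/3
print(statistics.variance(samples))  # close to sigma2 / h**2 = 1
```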

Stability of the Gaussian distribution

An interesting property of the normal distribution is that any linear combination of normally distributed random variables is still normally distributed: if we take independent random variables $X\sim N(\mu_x, \sigma_x^2)$ and $Y\sim N(\mu_y, \sigma_y^2)$, then $Z=aX+bY$ (where $a$ and $b$ are real numbers) has distribution $N(\mu_z,\sigma_z^2)$. This property is called stability of the distribution.
Furthermore, from the properties of the expected value and the variance seen in the previous chapter it follows that $\mu_z=a\mu_x+b\mu_y$ and $\sigma_z^2=a^2\sigma_x^2+b^2\sigma_y^2$.
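Both formulas can be verified with a quick simulation sketch (coefficients and parameters below are arbitrary example values):

```python
import random, statistics

random.seed(1)
a, b = 2.0, -1.0
mu_x, var_x = 1.0, 4.0   # X ~ N(1, 4)
mu_y, var_y = 3.0, 1.0   # Y ~ N(3, 1)

# Sample Z = aX + bY from independent draws of X and Y.
z = [a * random.gauss(mu_x, var_x ** 0.5) + b * random.gauss(mu_y, var_y ** 0.5)
     for _ in range(200_000)]

print(statistics.mean(z))      # close to a*mu_x + b*mu_y = -1
print(statistics.variance(z))  # close to a**2*var_x + b**2*var_y = 17
```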

Down below you can see what happens when you sum two normally distributed random variables:

(Interactive plot: the sum of two normal random variables, with adjustable means and variances.)

Notice that not all distributions have the stability property; for example, the uniform distribution is not stable:

(Interactive plot: the sum of two uniform random variables, with adjustable means and variances.)
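To see the failure of stability concretely, here is a toy sketch of our own summing two independent Uniform(0, 1) variables: if the uniform family were stable the sum would again be uniform, but the histogram counts show that the middle of the range is far more likely than the edges.

```python
import random

random.seed(2)
n = 200_000
sums = [random.random() + random.random() for _ in range(n)]

# Split [0, 2] into four equal bins. A Uniform(0, 2) variable would fill
# them evenly, but the sum piles up in the middle (triangular density).
bins = [0, 0, 0, 0]
for v in sums:
    bins[min(int(v * 2), 3)] += 1
print(bins)  # the middle bins hold roughly three times the outer ones
```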


Convergence in distribution

If shown the sequence $1,{1\over2},{1\over3},{1\over4},\ldots$, even without formal mathematical training, you'd easily recognize that the sequence is approaching $0$, or more formally that the limit of the sequence is $0$; we can define a similar concept of limit for random variables too.

In the last chapter we have seen that a random variable is identified by its cumulative distribution function (CDF), so it seems natural to define the convergence of a sequence of random variables by looking at the sequence of their CDFs: given a sequence of random variables $X_1,X_2,\ldots$ we say that the sequence has the random variable $X$ as its limit if

$$\forall x\in\mathbb{R}.\; \lim_{n\to+\infty} F_{X_n}(x)=F_X(x)$$

This means that the value of the CDF of $X_n$ at any point approaches the value of the CDF of $X$ at that same point (strictly speaking, the limit is only required to hold at the points where $F_X$ is continuous). Visually it looks like this
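The definition can also be checked numerically on a toy sequence of our own choosing: take $X_n\sim \text{Uniform}(0, 1+1/n)$, whose limit is $X\sim \text{Uniform}(0,1)$. At each fixed point the CDF values of $X_n$ approach the CDF value of $X$:

```python
def F_n(x, n):
    """CDF of Uniform(0, 1 + 1/n)."""
    return max(0.0, min(1.0, x / (1 + 1 / n)))

def F(x):
    """CDF of the limit Uniform(0, 1)."""
    return max(0.0, min(1.0, x))

# At each fixed x, F_n(x) gets closer to F(x) as n grows.
for x in (0.25, 0.5, 0.9):
    print(x, F_n(x, 10), F_n(x, 1000), F(x))
```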


The Central Limit Theorem

Many real-life random variables, for example height and weight, approximately follow a normal distribution, so studying its properties gives us more tools to study real-life phenomena. This alone would make the Gaussian distribution a very important distribution, but its applicability is far wider thanks to the Central Limit Theorem, whose statement is the following:

Given $n$ independent random variables $X_1,\ldots,X_n$, all having the same distribution with mean $\mu$ and variance $\sigma^2$, define $Z_n={\bar X_n -\mu\over \sqrt{\sigma^2/n}}$ where $\bar X_n$ is the average of the variables $X_i$; then the limit of the sequence $Z_n$ is a random variable with distribution $N(0,1)$.

Let's unpack all that jargon:

  • we take $n$ independent random variables with any distribution, not necessarily normal, but it has to be the same for all the variables
  • we average all the variables, just like we did in the law of large numbers. This time, however, we shift the result by subtracting $\mu$, which makes $\mathbb{E}[Z_n]=0$ (you can verify that with a bit of calculation if you feel like it), and then we scale the result by dividing it by $\sqrt{\sigma^2/n}$, which makes $Var(Z_n)=1$ (again, you can check it for yourself)
  • the incredible result is that as $n$ gets larger and larger the distribution of this "modified" average tends to a normal distribution, meaning that its cumulative distribution function (CDF) gets closer and closer to the CDF of the normal distribution $N(0,1)$

Unfortunately the proof of this theorem requires some advanced concepts, so we won't work through it, but we will use it extensively in the following chapters; it will be particularly useful to approximate the distribution of sums of independent random variables when working with the exact distribution is too hard.

For example, if we choose the random variables $X_i$ to represent $n$ coin flips, then the distribution of $Z_n$ compared to the $N(0,1)$ distribution looks as follows:
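The coin-flip comparison can be reproduced with a short simulation sketch (standard library only; the plot is replaced here by summary numbers). Each $X_i$ is a fair coin flip worth $0$ or $1$, so $\mu=1/2$ and $\sigma^2=1/4$:

```python
import random, statistics

random.seed(3)
mu, sigma2 = 0.5, 0.25   # mean and variance of a single fair coin flip
n = 100                  # coin flips averaged in each Z_n
reps = 20_000            # number of Z_n samples drawn

def z_sample():
    """One draw of Z_n = (X̄_n - mu) / sqrt(sigma2 / n)."""
    xbar = sum(random.randint(0, 1) for _ in range(n)) / n
    return (xbar - mu) / (sigma2 / n) ** 0.5

zs = [z_sample() for _ in range(reps)]

print(statistics.mean(zs))      # close to 0, like N(0, 1)
print(statistics.variance(zs))  # close to 1, like N(0, 1)
```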