Normal and t distributions

Introduction

Probability distributions are important when working with random variables. Examples of random variables are:

  • the number of ponds where we detect frogs,
     
  • the number of muntjac we see during a survey,
     
  • the weight of a randomly selected squirrel.

The result is not the same in every experiment, but if we repeat the experiment many times we can investigate the distribution of the result, and select a suitable model and parameter values to describe the population.

On this page we will look at two probability distributions used to describe continuous variables, such as the weights of squirrels.


Normal or Gaussian distribution

This is the ‘bell curve’ often used to model measurements such as the heights of people or the weights of squirrels. We use it to answer questions such as, “What’s the probability that a squirrel picked at random will weigh 987g?”, provided the variable is ‘normally’ distributed.

The binomial and Poisson distributions deal with discrete variables (integers or whole numbers), while the normal distribution deals with continuous variables, where fractional values are possible. The probability that a squirrel will weigh exactly 987g, that is
   987.00000000000000000000000000000000000000000000000000000000000...g
is effectively zero. But our precision is never that good; if we weigh to the nearest gram, “987g” means “between 986.5 and 987.5g”, and we can calculate the probability of that. For a population mean of 1000g and s.d. 150g, it is:

Pr(986.5 < x < 987.5) = 0.0026.

(We’ll see later how to calculate this.)

If we decided to weigh the squirrels to the nearest 10g, the 987g squirrel becomes a 990g squirrel, and the probability of a randomly-picked squirrel weighing between 985 and 995g is about 10 times higher:

Pr(985 < x < 995) = 0.0265

This means that we can’t describe the probability distribution of the squirrel weights without first deciding how precisely we are going to weigh them.

To get around this, we define a new term, probability density, which is the probability that the weight falls in a particular interval divided by the width of the interval:

probability density(x) = Pr(x – δ < weight < x + δ) / 2δ

(δ is the small Greek letter ‘delta’.) The value of the fraction varies slightly with the value of δ, so we use the value that the fraction approaches as δ shrinks towards zero.
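The way the fraction settles down as δ shrinks can be checked numerically. The following is a minimal sketch using Python's standard-library statistics.NormalDist, with the squirrel mean (1000g) and s.d. (150g) used on this page; the function name interval_density is an illustrative choice, not something from the original text.

```python
from statistics import NormalDist

# Squirrel weights as used on this page: mean 1000 g, s.d. 150 g.
weights = NormalDist(mu=1000, sigma=150)

def interval_density(x, delta):
    """Pr(x - delta < weight < x + delta) divided by the interval width 2*delta."""
    prob = weights.cdf(x + delta) - weights.cdf(x - delta)
    return prob / (2 * delta)

# Shrink the half-width delta and watch the ratio settle down.
for delta in (50, 5, 0.5, 0.05):
    print(f"delta = {delta:>5}: density ~ {interval_density(987, delta):.7f}")

# The value the ratio approaches is what the pdf returns directly.
print(f"pdf at 987g  : {weights.pdf(987):.7f}")
```

The printed ratios change less and less as δ gets smaller, which is why the density is defined by the limiting value rather than by any particular interval width.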

The probability density for the normal distribution is given by the probability density function (pdf):

P(x) = 1 / (σ √(2π)) × exp( –(x – μ)² / (2σ²) )

That looks horrible, but in practice we use the functions NORMDIST in Excel (with the last argument set to FALSE) or dnorm in R. The value of P(x) depends on two parameters of the distribution: the mean, μ, and the standard deviation, σ.
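For readers who want to see the arithmetic, the standard normal density formula, 1 / (σ √(2π)) × exp(–(x – μ)² / (2σ²)), can be written out by hand and compared with a library function. This is only a sketch in Python (the page itself uses Excel and R); normal_pdf is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal probability density at x, written out from the formula."""
    z = (x - mu) / sigma  # distance from the mean in s.d. units
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Density at 987 g for mean 1000 g and s.d. 150 g, computed two ways.
by_hand = normal_pdf(987, 1000, 150)
library = NormalDist(mu=1000, sigma=150).pdf(987)
print(f"by hand: {by_hand:.7f}  library: {library:.7f}")
```

Both values should agree to many decimal places; the library call plays the same role as dnorm in R.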

The graph of the normal distribution with μ = 1000 and  σ = 150 produced by R is shown below:

Cumulative probability distribution

The cumulative probability is often the most convenient way of working with normal distributions. For example, we can calculate the probability that a randomly-picked squirrel weighs less than 800g. In Excel we use NORMDIST(800, 1000, 150, TRUE) and in R, pnorm(800, 1000, 150). (Note: 'pnorm', not 'dnorm'.) Both give

Pr(x < 800) = 0.0912
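The same cumulative probability can be reproduced outside Excel or R. As a small sketch, Python's standard-library statistics.NormalDist has a cdf method that plays the role of pnorm here; the variable names are illustrative.

```python
from statistics import NormalDist

# Squirrel weights: mean 1000 g, s.d. 150 g.
weights = NormalDist(mu=1000, sigma=150)

# Cumulative probability: the chance that a squirrel weighs less than 800 g.
p_below_800 = weights.cdf(800)
print(f"Pr(x < 800) = {p_below_800:.4f}")

# The total probability is 1, so the chance of 800 g or more is the complement.
p_above_800 = 1 - p_below_800
print(f"Pr(x > 800) = {p_above_800:.4f}")
```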

The cumulative probability is displayed as the area under the pdf curve. The total area = 1, and the probability of x < 800g is the area under the curve to the left of x = 800:

Calculating the probability for an interval

To calculate the probability that the weight of a squirrel falls in a particular range, e.g. 985 to 995g, we use the relationship:

Pr(985 < x < 995) = Pr(x < 995) – Pr(x < 985)
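This subtraction is how the two interval probabilities quoted earlier on this page (0.0026 and 0.0265) can be obtained. A minimal sketch in Python, assuming the same mean of 1000g and s.d. of 150g; prob_between is a hypothetical helper name. In R the equivalent would be pnorm(995, 1000, 150) - pnorm(985, 1000, 150).

```python
from statistics import NormalDist

# Squirrel weights: mean 1000 g, s.d. 150 g.
weights = NormalDist(mu=1000, sigma=150)

def prob_between(lower, upper):
    """Pr(lower < x < upper) as the difference of two cumulative probabilities."""
    return weights.cdf(upper) - weights.cdf(lower)

# Weighing to the nearest gram: "987g" means 986.5 to 987.5 g.
print(f"Pr(986.5 < x < 987.5) = {prob_between(986.5, 987.5):.4f}")

# Weighing to the nearest 10 g: "990g" means 985 to 995 g.
print(f"Pr(985 < x < 995)     = {prob_between(985, 995):.4f}")
```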

Student's t-distribution

If we take very large samples from a normally-distributed population, the distribution of the sample means, x̄, will also be normally distributed. (For more on sample means and the 'sampling distribution of the mean', go here.) But for small samples from a normally distributed population, where the standard deviation has to be estimated from the sample itself, the distribution of the standardised sample mean has ‘fatter’ tails; the graph below shows the t-distribution for means of samples of n = 4, with the corresponding normal curve as a dotted line.

For large samples, where the normal distribution applies, the 95% CI is the sample mean ± 1.96 SE, where SE is the standard error of the mean. For small samples we need to use the t-distribution, and the multiplier will depend on the sample size. For n = 4, the 95% CI is the sample mean ± 3.18 SE. The 95% CI is shown by the vertical dashed lines in the graph above. 95% of the probability distribution is contained between the two lines, with 2.5% in each of the tails.

degrees of freedom (n - 1)    multiplier
1                             12.7
2                             4.3
3                             3.2
4                             2.8
5                             2.6
6 - 7                         2.4
8 - 9                         2.3
10 - 13                       2.2
14 - 27                       2.1
> 27                          2.0

The factor to use is shown in the table above. You can calculate this:

  • in Excel, with =TINV(0.05, 3). The first argument, 0.05, is the proportion you want to exclude.
     
  • in R, with qt(0.975,3). The first argument gives the cut-off point for the right-hand line in the graph, which has 0.975 (97.5%) of the distribution to the left.

The second argument in each case, 3, is the ‘degrees of freedom’, n ‑ 1.
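To show where these multipliers come from, the sketch below finds the value of t that leaves 95% of the t-distribution between –t and +t. Python's standard library has no t-distribution quantile function, so this version integrates the density numerically (Simpson's rule) and searches by bisection; it is an illustration under those assumptions, not how Excel or R do it internally. The function names and the search limit of 100 are hypothetical choices.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_central_area(t, df, steps=2000):
    """Area under the t density between -t and +t.

    Uses Simpson's rule on 0..t (steps must be even) and doubles the
    result, because the density is symmetric about zero.
    """
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3

def t_multiplier(df, coverage=0.95):
    """Find t with central area equal to `coverage`, by bisection on 0..100."""
    lo, hi = 0.0, 100.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_central_area(mid, df) < coverage:
            lo = mid  # too little area: the multiplier must be larger
        else:
            hi = mid
    return (lo + hi) / 2

# Compare with the first rows of the multiplier table on this page.
for df in (1, 2, 3, 4, 5):
    print(df, round(t_multiplier(df), 1))
```

In everyday work, qt(0.975, 3) in R or =TINV(0.05, 3) in Excel is quicker and more reliable than a hand-rolled calculation like this one.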

What is a ‘small sample’?

If the population distribution is not normal and sample sizes are small, the distribution of sample means may not correspond to the t-distribution, and may not even be symmetrical. In that case, you need other methods to calculate confidence intervals, such as the bootstrap method.
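As a rough illustration of the bootstrap idea mentioned above, the sketch below uses the simple percentile method: resample the data with replacement many times, calculate the mean of each resample, and take the middle 95% of those means as the interval. The data values are made up purely for illustration, the function name bootstrap_ci is hypothetical, and other bootstrap variants exist that may be more appropriate in practice.

```python
import random

def bootstrap_ci(sample, n_resamples=10000, coverage=0.95, seed=1):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(sample)
    # Mean of each resample (drawn with replacement, same size as the data).
    means = sorted(sum(rng.choices(sample, k=n)) / n for _ in range(n_resamples))
    tail = (1 - coverage) / 2
    lower = means[int(tail * n_resamples)]
    upper = means[int((1 - tail) * n_resamples) - 1]
    return lower, upper

# Hypothetical, made-up skewed measurements, for illustration only.
sample = [2, 3, 3, 4, 5, 5, 6, 8, 12, 30]
low, high = bootstrap_ci(sample)
print(f"sample mean = {sum(sample) / len(sample):.1f}")
print(f"95% bootstrap CI: {low:.1f} to {high:.1f}")
```

With skewed data like these, the interval is usually not symmetrical about the sample mean, which is the situation where the t-distribution multiplier can mislead.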


Points to recall

  • In biology we are often dealing with random variables. We deal with randomness by modelling the variable as a probability distribution and estimating the parameters.
     
  • With continuous variables, probability is only meaningful for a range of values (e.g. 985 < x < 995), so we use probability density functions such as the normal (Gaussian) and t-distribution functions.
     
  • We use the t-distribution to calculate confidence intervals (CIs) for estimates of population means, provided (a) the sample is large, or (b) the population itself is roughly normally distributed.
     
  • These models and inferences are only valid if our samples are genuinely random samples of the variable we are interested in!

Text by Mike Meredith, updated 10 May 2009