Normal and t distributions

Introduction

Probability distributions are important when working with random variables. Examples of random variables are:
The result is not the same in every experiment, but if we repeat the experiment many times we can investigate the distribution of the result, and select a suitable model and parameter values to describe the population.

On this page we will look at two probability distributions used to describe continuous variables, such as the weights of squirrels.

Normal or Gaussian distribution

This is the ‘bell curve’ often used to model measurements such as the heights of people or the weights of squirrels. We use it to answer questions such as, “What’s the probability that a squirrel picked at random will weigh 987g?”, provided the variable is 'normally' distributed.

The binomial and Poisson distributions deal with discrete variables – integers or whole numbers – while the normal distribution deals with continuous variables – fractional values are possible. The probability that a squirrel will weigh 987g when weighed to the nearest gram, that is Pr(986.5 < x < 987.5), is 0.0026. (We’ll see later how to calculate this.) If we decided to weigh the squirrels to the nearest 10g, the 987g squirrel becomes a 990g squirrel, and the probability of a randomly-picked squirrel weighing between 985 and 995g is about 10 times higher:

Pr(985 < x < 995) = 0.0265

This means that we can’t describe the probability distribution of the squirrel weights without first deciding how precisely we are going to weigh them. To get around this, we define a new term, probability density, which is the probability that the weight falls in a particular interval divided by the width of the interval:

probability density(x) = Pr(x – δ < weight < x + δ) / (2δ)

(δ is the small Greek letter ‘delta’.) The value of the fraction varies slightly with the value of δ, so we use the limiting value as δ approaches zero.

The probability density for the normal distribution is given by the probability density function (pdf):

$$P(X) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(X - \mu)^2}{2\sigma^2}}$$
That looks horrible, but in practice we use the functions NORMDIST in Excel (with its last argument set to FALSE) or dnorm in R. The value of P(X) depends on two parameters of the distribution: the mean, μ, and the standard deviation, σ. The graph of the normal distribution with μ = 1000 and σ = 150 produced by R is shown below:
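A curve of this kind can be drawn with a few lines of R. This is a minimal sketch rather than the script used for the original figure: the plotting range (400 to 1600g, four standard deviations either side of the mean) and the axis labels are assumptions. It also evaluates the density at 987g, both with dnorm and directly from the normal pdf formula, to show that the two agree.

```r
# Parameters of the squirrel-weight distribution used on this page
mu <- 1000     # mean, in grams
sigma <- 150   # standard deviation, in grams

# Probability density at a single weight; the Excel equivalent is
# NORMDIST(987, 1000, 150, FALSE)
dens_987 <- dnorm(987, mean = mu, sd = sigma)
print(round(dens_987, 4))   # 0.0026

# The same value evaluated directly from the normal pdf formula
by_hand <- exp(-(987 - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
print(all.equal(dens_987, by_hand))

# Draw the bell curve (plotting range and labels are assumptions)
curve(dnorm(x, mean = mu, sd = sigma), from = 400, to = 1600,
      xlab = "Weight (g)", ylab = "Probability density")
```

Note that the density at 987g, 0.0026 per gram, matches the probability quoted earlier for a 1g-wide interval around 987g, as the definition of probability density suggests it should.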
Cumulative probability distribution

The cumulative probability is often the most convenient way of working with normal distributions. For example, we can calculate the probability that a randomly-picked squirrel weighs less than 800g. In Excel we use NORMDIST(800, 1000, 150, TRUE) and in R, pnorm(800, 1000, 150) (note: 'pnorm', not 'dnorm'). Both give Pr(x < 800) = 0.0912.

The cumulative probability is displayed as the area under the pdf curve. The total area = 1, and the probability of x < 800g is the area under the curve to the left of x = 800:
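These values can be checked in R. The sketch below computes Pr(x < 800) with pnorm and confirms it numerically as an area under the pdf using integrate. The finite lower limit of μ – 10σ is an assumption made to keep the numerical integration well behaved; the tail below it is negligible. The last step subtracts two cumulative probabilities to reproduce the interval probabilities quoted earlier on this page.

```r
mu <- 1000     # mean, in grams
sigma <- 150   # standard deviation, in grams

# Cumulative probability Pr(x < 800); the Excel equivalent is
# NORMDIST(800, 1000, 150, TRUE)
p_below_800 <- pnorm(800, mean = mu, sd = sigma)
print(round(p_below_800, 4))   # 0.0912

# The same probability as the area under the pdf to the left of 800.
# The lower limit mu - 10 * sigma stands in for minus infinity.
area <- integrate(dnorm, lower = mu - 10 * sigma, upper = 800,
                  mean = mu, sd = sigma)
print(round(area$value, 4))

# Interval probabilities as differences of cumulative probabilities
p_1g  <- pnorm(987.5, mu, sigma) - pnorm(986.5, mu, sigma)  # nearest 1g
p_10g <- pnorm(995, mu, sigma) - pnorm(985, mu, sigma)      # nearest 10g
print(round(c(p_1g, p_10g), 4))   # 0.0026 0.0265
```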
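Calculating the probability for an interval

To calculate the probability that the weight of a squirrel falls in a particular range, e.g. 985 to 995g, we use the relationship:

Pr(985 < x < 995) = Pr(x < 995) – Pr(x < 985)

Student's t-distribution

If we take very large samples from a normally-distributed population, the distribution of the sample means, x̄, is also normal, and we can use it to calculate a confidence interval (CI) for the population mean. For large samples, where the normal distribution applies, the 95% CI is

x̄ ± 1.96 × SE

where SE is the standard error of the mean, s / √n. For small samples, where the standard deviation has to be estimated from the sample itself, Student's t-distribution applies instead, and the factor 1.96 is replaced by a larger value that depends on the sample size. The factor to use is shown in the table on the right. You can calculate this with TINV(0.05, 3) in Excel or qt(0.975, 3) in R; both give 3.18.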
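As a sketch of how such factors can be obtained in R, the block below uses qt for a 95% CI with a sample of 4, compares it with the large-sample value from qnorm, and tabulates the factor for a few sample sizes. The sample sizes in the table and the four squirrel weights are hypothetical values chosen for illustration, not data from this page.

```r
# Factor for a 95% CI from a sample of n = 4;
# the Excel equivalent is TINV(0.05, 3)
print(round(qt(0.975, 3), 3))    # 3.182

# Large-sample value from the normal distribution
print(round(qnorm(0.975), 3))    # 1.96

# Factors for a range of sample sizes (sizes chosen arbitrarily)
n <- c(2, 3, 4, 5, 10, 20, 30, 100)
factors <- data.frame(n = n, df = n - 1,
                      factor = round(qt(0.975, n - 1), 3))
print(factors)

# 95% CI for the mean of a hypothetical sample of four weights (g)
w <- c(950, 1100, 870, 1040)
se <- sd(w) / sqrt(length(w))                 # standard error of the mean
ci <- mean(w) + c(-1, 1) * qt(0.975, length(w) - 1) * se
print(ci)

# R's built-in t.test reports the same interval
print(t.test(w)$conf.int)
```

The factor shrinks towards 1.96 as the sample size grows, which is why the normal distribution is adequate for large samples.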
The second argument in each case, 3, is the ‘degrees of freedom’, n – 1.

What is a ‘small sample’?

If the population distribution is not normal and sample sizes are small, the distribution of sample means may not correspond to the t-distribution, and may not even be symmetrical. In that case, you need other methods to calculate confidence intervals, such as the bootstrap method.

Points to recall

Text by Mike Meredith, updated 10 May 2009