Frogs 1 : Binomial distribution

Data file (xls, 25kb)

Objectives

Binary data turn up frequently in wildlife biology, as we are often dealing with two possible states: present/absent, alive/dead, marked/unmarked, detected/not detected, etc.

The binomial distribution is used to describe binary data, and we will look at an example with simulated data in this exercise.

The calculations and graphs are in the Excel file "Frogs1_binomial.xls", which you can download here. It also has a spreadsheet that creates simulated data using random numbers instead of a die.


Simulating data

When frogs are breeding at a pond, they can often be heard calling in the evening, but not always: sometimes they are silent. Imagine a survey in which you visit ten ponds that have frogs and listen for calls. The probability of detection, p(detect), is then simply the probability that the frogs are calling.

Instead of going to the ponds, we simulated some data by rolling a die. We used a ten-sided die with spots on some of its sides. If the die landed with a spot on top, the frogs were calling; if there was no spot, they were silent.

I rolled the die 10 times and got the following results (1=detected, 0=not detected):

0 1 0 0 1 1 1 0 1 1

for a total of x = 6 detections out of n = 10. So my estimate of the probability of detection is $\hat{p} = x/n = 6/10 = 0.6$.
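The estimate is just the number of detections divided by the number of ponds visited. Here is a minimal Python sketch using the ten results above. The variable names are chosen for illustration only:

```python
# Results of the 10 rolls: 1 = frogs detected (spot on top), 0 = not detected
results = [0, 1, 0, 0, 1, 1, 1, 0, 1, 1]

x = sum(results)   # number of ponds where frogs were detected
n = len(results)   # number of ponds visited
p_hat = x / n      # estimate of p(detect)

print(f"x = {x}, n = {n}, estimate = {p_hat}")
```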

When we put the estimates for all the participants together, they looked like this:

People got different values of the estimate: the estimate of detection probability is a random variable.

Because this was a simulation, we know the true value of p(detect): the die had spots on 7 of its 10 sides, so the true probability of detection was 0.7.
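The whole group exercise can be mimicked with random numbers instead of a real die, much as the spreadsheet does. This Python sketch assumes 16 people, each rolling a ten-sided die with spots on 7 sides 10 times. The group size, seed and function name `survey` are illustrative assumptions. Different seeds give different sets of estimates, just as different people did:

```python
import random

N_SIDES = 10   # ten-sided die
N_SPOTS = 7    # sides with a spot, so the true p(detect) = 7/10 = 0.7
N_PONDS = 10   # rolls (pond visits) per person
N_PEOPLE = 16  # assumed group size


def survey(rng: random.Random) -> list[int]:
    """One person's data: 1 = spot on top (frogs calling), 0 = no spot (silent)."""
    # Number the sides 1..N_SIDES and treat the first N_SPOTS as the spotted ones.
    return [1 if rng.randint(1, N_SIDES) <= N_SPOTS else 0 for _ in range(N_PONDS)]


rng = random.Random(2009)  # fixed seed so a run can be repeated; change it for new data
counts = [sum(survey(rng)) for _ in range(N_PEOPLE)]  # detections x for each person
estimates = [x / N_PONDS for x in counts]             # each person's estimate x/n

print("estimates:", sorted(estimates))
print("equal to the true value 0.7:", counts.count(7), "of", N_PEOPLE)
```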

Fewer than half the estimates were correct, though most were close to the true value. There was one 'outlier' (0.3), corresponding to data that were rather improbable given the true value of 0.7. We can calculate just how improbable with the binomial distribution.

The binomial distribution

The binomial distribution gives the probability of obtaining X successes in n independent trials with the same probability of success (p) in each trial.

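The formula for the binomial distribution is:

$$P(X = x) = \frac{n!}{x!\,(n - x)!}\, p^{x} (1 - p)^{n - x}$$

where x is the number of successes (ponds where frogs were heard), n is the number of trials (ponds visited) and p is the probability of success in a single trial, here p(detect).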

Note that we're dealing with two probabilities here: the probability of detecting frogs on a single visit to a single pond, p(detect), and the probability of detecting frogs at, say, 3 ponds out of 10. We usually use a small p for the 'single visit' probability and 'P', 'Pr' or 'Prob' for the overall result. (‘n!’ is ‘n factorial’, the product of n and all the positive integers less than n, so 4! = 4 × 3 × 2 × 1 = 24; by convention, 0! = 1.)
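As a worked example, putting n = 10, x = 3 and p(detect) = 0.7 into the binomial formula gives:

$$P(X = 3) = \frac{10!}{3!\,7!}\,(0.7)^{3}(0.3)^{7} = 120 \times 0.343 \times 0.0002187 \approx 0.009$$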

You won’t have to calculate that by hand; in R you can use dbinom(x, n, p) and in MS Excel™ BINOMDIST(x, n, p, FALSE).
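The same calculation is also short in Python, using the standard library's `math.comb` (Python 3.8 or later). This is a minimal sketch. `binom_pmf` is a name chosen here for illustration, not a library function. SciPy users could call `scipy.stats.binom.pmf(x, n, p)` instead:

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial distribution with n trials and success probability p.

    Takes the same arguments, in the same order, as R's dbinom(x, n, p)
    and Excel's BINOMDIST(x, n, p, FALSE).
    """
    # comb(n, x) = n! / (x! (n - x)!): the number of ways to choose which x ponds
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Frogs heard at exactly 3 of 10 ponds when p(detect) = 0.7
print(round(binom_pmf(3, 10, 0.7), 3))  # 0.009
```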

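The probability of hearing frogs at just 3 ponds out of 10, when the true p(detect) = 0.7, is P(X = 3) = 0.009, or about 1%. That's pretty small, but with 16 people in the group, each rolling independently, the probability that at least one person gets X = 3 is $1 - (1 - 0.009)^{16} \approx 0.13$, or roughly 1 in 7. The quick approximation of adding up the individual probabilities, 16 × 1% = 16% (about 1 in 6), gives a similar answer.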
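The group calculation can be checked in a few lines of Python. This sketch assumes, as the text does, that the 16 people's results are independent. It compares the exact probability that at least one person gets X = 3 with the quick approximation of adding up the individual probabilities:

```python
from math import comb

p_detect = 0.7
n_ponds = 10
n_people = 16

# Probability that one person hears frogs at exactly 3 of the 10 ponds
p3 = comb(n_ponds, 3) * p_detect**3 * (1 - p_detect) ** (n_ponds - 3)

# At least one of the 16 gets X = 3: one minus the probability that nobody does
at_least_one = 1 - (1 - p3) ** n_people

# Quick approximation: add up the 16 individual probabilities
approx = n_people * p3

print(f"P(X = 3) = {p3:.4f}")
print(f"at least one of {n_people}: {at_least_one:.3f} (approximation {approx:.3f})")
```

The sum slightly overstates the exact answer because it counts twice the cases where two or more people get X = 3, but with such a small P(X = 3) the difference is minor.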

The probabilities of all the possible outcomes of visits to 10 ponds are shown in the table and bar graph below:

x : 0 1 2 3 4 5 6 7 8 9 10
Pr(X=x) : 0.000 0.000 0.001 0.009 0.037 0.103 0.200 0.267 0.233 0.121 0.028
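The whole table can be generated at once. This is a minimal Python sketch. `binomial_table` is a hypothetical helper name, not from any library. Changing n and p in the call is the Python equivalent of editing the spreadsheet:

```python
from math import comb


def binomial_table(n: int, p: float) -> list[float]:
    """Pr(X = x) for x = 0, 1, ..., n: detections at x of n ponds when p(detect) = p."""
    return [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


probs = binomial_table(10, 0.7)

# The table: one column per possible number of ponds where frogs were heard
print("x       :", " ".join(f"{x:5d}" for x in range(len(probs))))
print("Pr(X=x) :", " ".join(f"{pr:5.3f}" for pr in probs))

# A rough text version of the bar graph: one '#' per 0.01 of probability
for x, pr in enumerate(probs):
    print(f"{x:2d} | {'#' * round(pr * 100)}")

# Every outcome from 0 to n is covered, so the probabilities add up to 1
print("sum of Pr(X=x):", round(sum(probs), 10))
```

For example, `binomial_table(12, 0.7)` returns the 13 probabilities for visits to 12 ponds.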

The probability of detecting frogs at exactly 7 out of 10 ponds, and hence getting the right estimate of p(detect), is actually quite small: only 27%, even though 7 is the single most likely outcome.

Further exploration

Download the Excel file "Frogs1_binomial.xls" and try some of the following changes:

  1. Change p(detect) from 0.7 to other values and see the effect on Pr(X=x).
    If p(detect) = 0.65, what is the probability that the estimate, $\hat{p}$, will be correct?
  2. Suppose you visited 12 ponds instead of 10. Add 2 extra columns to the table of probabilities and find the probability of hearing frogs at 0, 1, 2, …, 12 ponds out of 12. Change the bar graph to include all 13 possible outcomes.
  3. Try using BINOMDIST(x, n, p, TRUE) instead of BINOMDIST(x, n, p, FALSE). The values are no longer Pr(X=x); what are they?

Main points

  • With binary data, we are interested in estimating the true proportion of some characteristic in the population, e.g. the proportion of evenings in the breeding season on which frogs call, which equals the probability that frogs will be detected on a single visit.
     
  • When we take a sample, the proportion in the sample will differ among samples (it's a random variable), but will generally be close to the proportion in the population. We use the proportion in the sample as the estimate of the proportion in the population.
     
  • The binomial distribution describes the probability of getting different data sets depending on the true proportion in the population.

What next?

The estimate $\hat{p} = x/n$ is an example of a maximum likelihood estimate, and we come back to this example in Frogs 2 : maximum likelihood estimators.

We used a similar experiment with simulated squirrel weights to investigate the effect of sampling for a continuous variable.

wcsmalaysia.org home Text by Mike Meredith, updated 10 May 2009