Frogs 2 : Likelihood

Data file (xls, 27kb)

Objectives

In "Frogs 1 : binomial distribution" we simulated some data for detections of frogs at ponds where we knew frogs were present. Here we will use those data to explore the concepts of likelihood and maximum likelihood estimation.

Likelihood was developed by Ronald A. Fisher in the 1920s and it became a cornerstone of information theory in the 1960s. Quick-and-dirty methods to calculate maximum likelihood estimators are available for simple problems, but complex models had to wait until modern computers became available.

The table of probabilities and likelihoods was done in an Excel spreadsheet which you can download here.


Estimating detection probability

When we simulated data for the detection of frog calls at 10 ponds, participants 'detected' frogs at varying numbers of ponds, mostly between 5 and 9, but with one person detecting frogs at only 3 ponds. As a result, the estimates of detection probability, p̂, ranged from 0.3 to 0.9.

I 'detected' frogs at x = 6 ponds out of n = 10, and my estimate of p was 6/10 = 0.6. How can I justify going for 0.6 and not 0.7 (which was the true value) or 0.5, ... or 0.65 or 0.59 for that matter?

Likelihood

To answer that question, we need to look at the probability of detecting frogs at 6 ponds out of 10 with various values of p(detect). The table below shows the probabilities of the different outcomes (x = 0, 1, 2, ... 9, 10) for different values for p(detect). These were calculated using BINOMDIST in the Excel spreadsheet "Frogs2_likelihood.xls".

            x = number of ponds where frogs were detected
  p      0     1     2     3     4     5     6     7     8     9    10
  0      1     0     0     0     0     0     0     0     0     0     0
  0.1  0.35  0.39  0.19  0.06  0.01  0.00  0.00  0.00  0.00  0.00  0.00
  0.2  0.11  0.27  0.30  0.20  0.09  0.03  0.01  0.00  0.00  0.00  0.00
  0.3  0.03  0.12  0.23  0.27  0.20  0.10  0.04  0.01  0.00  0.00  0.00
  0.4  0.01  0.04  0.12  0.21  0.25  0.20  0.11  0.04  0.01  0.00  0.00
  0.5  0.00  0.01  0.04  0.12  0.21  0.25  0.21  0.12  0.04  0.01  0.00
  0.6  0.00  0.00  0.01  0.04  0.11  0.20  0.25  0.21  0.12  0.04  0.01
  0.7  0.00  0.00  0.00  0.01  0.04  0.10  0.20  0.27  0.23  0.12  0.03
  0.8  0.00  0.00  0.00  0.00  0.01  0.03  0.09  0.20  0.30  0.27  0.11
  0.9  0.00  0.00  0.00  0.00  0.00  0.00  0.01  0.06  0.19  0.39  0.35
  1      0     0     0     0     0     0     0     0     0     0     1
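
For readers without Excel, the same table can be rebuilt with a few lines of Python. This is only a sketch: the helper name binom_pmf is made up for illustration, and it stands in for the spreadsheet's BINOMDIST call rather than reproducing the spreadsheet itself.

```python
from math import comb

def binom_pmf(x, n, p):
    """Prob(X = x | p) for n independent trials (hypothetical helper,
    playing the role BINOMDIST plays in the spreadsheet)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n = 10
p_values = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0

# One row per hypothetical value of p, one column per outcome x.
table = {p: [binom_pmf(x, n, p) for x in range(n + 1)] for p in p_values}

print("p    " + " ".join(f"{x:>4d}" for x in range(n + 1)))
for p, row in table.items():
    print(f"{p:<4.1f} " + " ".join(f"{prob:4.2f}" for prob in row))
```

Each printed row should match the corresponding row of the table to two decimal places.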

Each row of the table gives the probability Prob(X = x | p) given a specific hypothetical value of p from 0 to 1 (the vertical bar, |, means "given"). Each row adds up to 1. In the previous lab ("Frogs 1...") we displayed a table with just the row for p = 0.7, i.e. Prob(X = x | p = 0.7); that's the row highlighted in yellow in the table above.

My result was x = 6. Look down the column corresponding to x = 6, which is highlighted in blue.

Note that the values in the column do not add up to 1; this column adds up to 0.909. In the Excel spreadsheet, I put in extra rows for p = 0.55, 0.59, 0.61 and 0.65; with those extra rows the column total went up to 1.886. In fact, we can put in as many rows for hypothetical values of p as we like, and the column total will change each time.
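
The column totals quoted above can be checked with a short Python sketch. The function name likelihood and its default arguments (x = 6, n = 10) are assumptions made for this illustration; the extra values of p are the four extra rows mentioned in the text.

```python
from math import comb

def likelihood(p, x=6, n=10):
    """Binomial probability of x detections in n ponds, read as a function of p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

grid = [i / 10 for i in range(11)]   # the 11 rows of the table
extra = [0.55, 0.59, 0.61, 0.65]     # the extra rows added in the spreadsheet

# A row (fixed p, all x) is a probability distribution and sums to 1.
row_sum = sum(likelihood(0.7, x=x) for x in range(11))

# A column (fixed x, several p) has no such constraint.
col_sum = sum(likelihood(p) for p in grid)
col_sum_extra = sum(likelihood(p) for p in grid + extra)

print(f"row total for p = 0.7:       {row_sum:.3f}")
print(f"column total, 11 rows:       {col_sum:.3f}")
print(f"column total, 15 rows:       {col_sum_extra:.3f}")
```

The column total depends entirely on which values of p we chose to tabulate, which is why it carries no meaning as a probability.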

The values in the highlighted column are the likelihoods of the different values of p for the observed result of x = 6. We use a curly L for likelihood: ℒ(p | x = 6).

  • We use the term "probability" if we are talking about different values of x for a fixed value of p.
     
  • We use the term "likelihood" if we are talking about different values of p for a fixed value of x.

If both p and x are fixed, they are the same; for example (the grey square in the table):

ℒ(p = 0.7 | x = 6) = Prob(x = 6 | p = 0.7) = 0.20

An analogy with streets and avenues

Think of a city such as New York where the roads are laid out in a grid. Avenues run north-south and Streets go east-west. Suppose you are at the intersection of 5th Avenue and 34th Street, right outside the Empire State Building. If you head west, you go along a Street; if you head north, it's an Avenue. In the same way, if you head west from the grey cell in the table, it's Probability; if you head north, it's Likelihood. But at the intersection, it's ... well, both.

Plots of probability and likelihood

The graphs below show the probability distribution of x for p = 0.6 (left) and the likelihood curve for p when x = 6 (right). The red line in each graph corresponds to the value when p = 0.6 and x = 6, which is the maximum likelihood, 0.251.

Maximum Likelihood Estimate

Now look down the x = 6 column (blue) and see which value of p has the maximum likelihood. You'll see it's p = 0.6. In the spreadsheet, I put in rows for p = 0.59 and 0.61, just to make sure that p = 0.6 really was the maximum.

This value, p̂ (“p hat”), is the maximum likelihood estimate of p based on our result of x = 6 detections.

To put it the other way around, it's the value of p under which our observed result, x = 6, is more probable than under any other value of p.
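
Scanning a column by eye only works for a handful of rows. As a rough sketch of the same idea on a finer grid, the Python below evaluates the likelihood at 1001 candidate values of p and picks the largest. The grid spacing of 0.001 and the function name likelihood are arbitrary choices for this illustration.

```python
from math import comb

def likelihood(p, x=6, n=10):
    """Likelihood of p given x detections in n ponds (binomial, constant C = 1)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Candidate values 0.000, 0.001, ..., 1.000
candidates = [i / 1000 for i in range(1001)]

# The candidate with the largest likelihood is our grid-based estimate.
p_hat = max(candidates, key=likelihood)

print(p_hat, round(likelihood(p_hat), 3))   # prints: 0.6 0.251
```

A grid search like this is crude but transparent; it only finds the maximum to the resolution of the grid.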

We can calculate the estimate just with p̂ = x / n; this formula is termed the "maximum likelihood estimator" for p.
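
For readers who want to see where x / n comes from, here is the standard calculus argument, sketched in LaTeX. It assumes 0 < x < n so that the logarithms are defined; the log is taken only because it turns the product into a sum and has its maximum at the same value of p.

```latex
\begin{align*}
\mathcal{L}(p \mid x) &= \binom{n}{x}\, p^{x} (1-p)^{n-x} \\
\log \mathcal{L}(p \mid x) &= \log\binom{n}{x} + x \log p + (n-x)\log(1-p) \\
\frac{d}{dp}\log \mathcal{L}(p \mid x) &= \frac{x}{p} - \frac{n-x}{1-p} = 0 \\
x(1-p) &= (n-x)\,p \quad\Longrightarrow\quad \hat{p} = \frac{x}{n}
\end{align*}
```

With x = 6 and n = 10 this gives 6/10 = 0.6, the same value found by looking down the column.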

Strictly speaking...

Strictly speaking, the likelihood for given x and p is proportional to the probability for those values, i.e.

ℒ(p | x) = Prob(x | p) × C

where C is a constant that does not depend on p; we have simply taken C = 1. This works for discrete data such as we have in this example. In many cases, such as when using a probability density function, we can only calculate relative likelihoods, i.e. C ≠ 1. We can still use these to compare likelihoods and, in particular, to find the maximum likelihood.
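
To see that the choice of C does not affect comparisons, the sketch below drops the binomial coefficient altogether (equivalent to taking C = 1/210 for x = 6, n = 10). The function names full and kernel are made up for this illustration.

```python
from math import comb

x, n = 6, 10
candidates = [i / 100 for i in range(101)]   # 0.00, 0.01, ..., 1.00

def full(p):
    """C = 1: the binomial probability itself."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def kernel(p):
    """Binomial coefficient dropped, i.e. C = 1 / comb(n, x)."""
    return p**x * (1 - p)**(n - x)

# Both versions peak at the same value of p ...
best_full = max(candidates, key=full)
best_kernel = max(candidates, key=kernel)

# ... and give the same likelihood ratio between any two values of p.
ratio_full = full(0.6) / full(0.7)
ratio_kernel = kernel(0.6) / kernel(0.7)

print(best_full, best_kernel)
print(round(ratio_full, 4), round(ratio_kernel, 4))
```

Only the absolute height of the curve changes with C; its shape, and therefore the location of the maximum, does not.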


Main points

  • Given a fixed value of the parameter, p, we can use the binomial distribution to calculate the probability of observing each of the possible values of x, x = 0, 1, 2, ... 9, 10.
     
  • Given a fixed value of x, the binomial distribution gives us the relative likelihood of different values of the parameter p.
     
  • The value of p with the maximum likelihood is the maximum likelihood estimate, p̂ (“p hat”).
     
  • In this simple case, we don't need to calculate the likelihoods: we can use the formula p̂ = x / n, which is the maximum likelihood estimator for p.

What next?

For more on modeling occupancy, including using likelihood and AIC to select models, go here.

wcsmalaysia.org home Text by Mike Meredith, updated 11 May 2009