Mark-recapture with tigers

Lab guide (pdf, 17KB)
Data file (zip, 1KB)

Objectives

Here we will look at mark-recapture data for tigers in Kanha National Park and work through the analysis in MARK and CAPTURE to estimate the number of tigers in the Park.

You will need program MARK installed on your computer.

MARK uses maximum likelihood estimators to obtain the best estimates of the model parameters, then compares the models using AIC. You should be familiar with these concepts already: check the units "Frogs in ponds - maximum likelihood estimators" and "More frogs in ponds - AIC and likelihood" if necessary.

You will need to download the "Lab guide" and you may want to print this out and have it beside the computer when you are working in MARK.


Background

As part of a country-wide survey of tigers, Ullas Karanth and his team (Karanth et al 2004) used automatic cameras to “trap” tigers in part of Kanha National Park in Madhya Pradesh state in central India. (Kanha is a popular tourist destination and you can get more background information from a web search.)

The cameras were set on 10 occasions totalling 803 camera-trap days. They got photos of 26 individual tigers, which were identified using their distinctive patterns of stripes. Some tigers were “captured” on photos several times (up to 5), some only once, and the researchers were able to deduce the capture history for each tiger.

The survey was completed in three months; this is short enough for us to be able to assume that the same tigers stayed in the area for the whole time, ie. we have a ‘closed population’ of tigers.

The tiger data

Download "Mark-recap_tigers.zip" and extract the data file “Kanha_tiger.inp”. Open it with Notepad:

/* Kanha tiger data from the WCS India Program */
/* 1 */ 1001000110 1;
/* 2 */ 1000000101 1;
/* 3 */ 1101000011 1;
/* 4 */ 0110000111 1;
/* 5 */ 0101000001 1;
/* 6 */ 0100000000 1;
/* 7 */ 0010011000 1;
/* 8 */ 0011000000 1;
/* 9 */ 0001000011 1;
/* a */ 0001100110 1;
/* b */ 0001001001 1;
/* c */ 0000100000 1;
/* d */ 0001000000 1;
/* e */ 1001000000 1;
/* f */ 0001000000 1;
/* g */ 0001100000 1;
/* h */ 0001001000 1;
/* i */ 0000110000 1;
/* j */ 0000100000 1;
/* k */ 0000010000 1;
/* l */ 1000101000 1;
/* m */ 0100000000 1;
/* n */ 0001000001 1;
/* o */ 0000010000 1;
/* p */ 0000000001 1;
/* q */ 0000100000 1;

The first line is just a comment to confirm what data are in the file. MARK ignores anything enclosed with /* … */ characters, assuming it is of interest to humans but not to machines. The codes on the left are the IDs of the individual tigers identified from their coat pattern. MARK will ignore these too, but they help if we need to check or modify the data later.

Then comes the ‘capture history’ for each tiger: the traps were set up for 10 periods, and the string of ten 0’s and 1’s indicates whether that tiger was photographed (1) or not (0) during each of the 10 periods. The last number is the number of animals with that capture history; in this case, each line corresponds to a single tiger, but – for example – tigers “d” and “f” have the same capture history, so we could replace those two lines with a single line: “/*d and f*/ 0001000000 2;”. Each line of data finishes with a semi-colon (;).
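
If you want to check or tidy an .inp file outside MARK, the layout above is easy to read programmatically. Below is a minimal Python sketch, assuming a single group with exactly one frequency column per record (MARK's format has other options that this ignores); `parse_inp` and `collapse` are hypothetical helper names, not part of MARK. It strips the comments and merges identical capture histories, as in the "d and f" example.

```python
import re
from collections import Counter

# A few records copied from Kanha_tiger.inp; tiger IDs survive only as comments.
INP_TEXT = """\
/* Kanha tiger data from the WCS India Program */
/* c */ 0000100000 1;
/* d */ 0001000000 1;
/* f */ 0001000000 1;
/* j */ 0000100000 1;
/* p */ 0000000001 1;
"""


def parse_inp(text):
    """Return (capture_history, frequency) pairs from .inp-style text.

    Assumes one group: each record is a 0/1 history, one frequency, then ';'.
    """
    text = re.sub(r"/\*.*?\*/", " ", text, flags=re.DOTALL)  # drop comments
    records = []
    for chunk in text.split(";"):
        fields = chunk.split()
        if fields:  # skip the empty tail after the last semicolon
            records.append((fields[0], int(fields[1])))
    return records


def collapse(records):
    """Merge identical histories by summing their frequencies."""
    totals = Counter()
    for history, freq in records:
        totals[history] += freq
    return dict(totals)


merged = collapse(parse_inp(INP_TEXT))
for history, freq in merged.items():
    print(f"{history} {freq};")
```

In this excerpt tigers c and j also share a capture history, so they are merged as well.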

Setting up the project and entering data in MARK

See Block A of the lab guide.

MARK implements a wide range of variations on the mark-capture-recapture theme. Here we are interested in estimating the population size for a closed population, ie. when the same animals remain in the area throughout the study.

Running a first analysis in MARK

Warning! The default analysis in MARK is the most complicated model with the maximum number of parameters (unlike PRESENCE and DISTANCE). It is not possible to estimate all these parameters. In particular, you can't estimate the population size (N) separately, so MARK just puts N = the number of animals you actually captured. Don't take these values and put them in your report! That would be garbage!

See Block B of the lab guide.

In MARK, models are specified using parameter index matrices (PIMs). The default PIMs for Closed Captures use the maximum possible number of parameters, so there are:

  • 10 parameters for Capture Probability (p), numbered #1 to #10, one for each trapping occasion;
  • 9 parameters for Recapture Probability (c), #11 to #19 (only 9, because you can’t recapture animals on the first trapping occasion);
  • 1 parameter for Population Size (N), #20.

Model M0

The default model is the most complicated, and in fact won't run properly (more on that later). We'll start off with the simplest model, usually designated M0. This has only two parameters, one for all the capture and recapture probabilities and a second one for the population size. See the lab guide for details of how to set this up, run it, and display the detailed output in Notepad (or another text editor).

The first part summarizes the input, including the model name and setup details. There’s a warning that “At least a pair of the encounter histories are duplicates”; we put each tiger on a separate line, so that is okay. Then there’s some technical information and, near the bottom, the ‘real’ results:

                    Real Function Parameters of {M0}
                                              95% Confidence Interval 
Parameter       Estimate     Standard Error   Lower        Upper
--------------- ------------ ------------ ------------ ------------
    1:p         0.2038924    0.0285186    0.1536181    0.2654594 
    2:N         28.446382    2.1734708    26.548565    36.909891

The estimated capture (and recapture) probability is quite high at 0.2, and the estimated population for the part of the Reserve surveyed is 28.4 tigers. Since 26 tigers in the area were identified from the photos, that means that there are only 2 or 3 tigers that weren’t photographed.
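
To see where these numbers come from, here is a minimal Python sketch (not MARK's own code) that maximizes the standard full likelihood for M0, N!/(N-M)! × p^n × (1-p)^(tN-n), where M is the number of tigers photographed (26), n the total number of captures (58) and t the number of occasions (10). It assumes, as the non-integer estimate above suggests, that N is treated as continuous; p is profiled out as n/(tN), and the remaining equation in N is solved by bisection.

```python
import math

# Capture histories of tigers 1-9 and a-q from Kanha_tiger.inp, in file order.
HISTORIES = [
    "1001000110", "1000000101", "1101000011", "0110000111", "0101000001",
    "0100000000", "0010011000", "0011000000", "0001000011", "0001100110",
    "0001001001", "0000100000", "0001000000", "1001000000", "0001000000",
    "0001100000", "0001001000", "0000110000", "0000100000", "0000010000",
    "1000101000", "0100000000", "0001000001", "0000010000", "0000000001",
    "0000100000",
]


def m0_mle(histories):
    """Return (N_hat, p_hat) for model M0, assuming N is continuous."""
    t = len(histories[0])                       # occasions
    M = len(histories)                          # distinct animals caught
    n = sum(h.count("1") for h in histories)    # total captures

    def score(N):
        # Derivative of the profile log-likelihood with p = n / (t * N).
        # The sum equals digamma(N + 1) - digamma(N - M + 1).
        harmonic = sum(1.0 / (N - k) for k in range(M))
        return harmonic + t * math.log(1.0 - n / (t * N))

    # Assumes an interior MLE: score > 0 at N = M and < 0 far above it.
    lo, hi = float(M), 100.0 * M
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    N = 0.5 * (lo + hi)
    return N, n / (t * N)


N_hat, p_hat = m0_mle(HISTORIES)
print(f"N = {N_hat:.3f}, p = {p_hat:.4f}")
```

With the Kanha histories this should land very close to MARK's N = 28.446 and p = 0.2039; the estimate of p is just n/(tN) = 58/(10 × 28.446).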

Now let’s see what happens with more complex (and maybe more realistic) models.

Comparing models

See Block C of the lab guide.

Model Mb

With this model, animals behave differently after being trapped the first time, so recapture probability is different from capture probability. But neither changes with time. See the Lab Guide for details of how to do this.

The output is similar to that for the M0 model, but with rows for p and c as well as N; c and p are fairly similar and their confidence intervals overlap.

The estimate for population size is 27, with confidence limits of 26 to 39. The point estimate is slightly lower than for the M0 model (28.4), while the upper confidence limit is a little higher (39 instead of 37).
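
Model Mb can be sketched the same way. Once each occasion's photos are split into first captures and recaptures, the likelihood separates: recaptures inform only c, whose estimate is the number of recaptures divided by the number of chances for recapture, while N and p come from the first captures alone. This minimal Python sketch uses the standard Mb full likelihood under the same continuous-N assumption as for M0; it is not MARK's implementation, and `capture_summary` and `mb_mle` are hypothetical names.

```python
import math

# Capture histories of tigers 1-9 and a-q from Kanha_tiger.inp, in file order.
HISTORIES = [
    "1001000110", "1000000101", "1101000011", "0110000111", "0101000001",
    "0100000000", "0010011000", "0011000000", "0001000011", "0001100110",
    "0001001001", "0000100000", "0001000000", "1001000000", "0001000000",
    "0001100000", "0001001000", "0000110000", "0000100000", "0000010000",
    "1000101000", "0100000000", "0001000001", "0000010000", "0000000001",
    "0000100000",
]


def capture_summary(histories):
    """Per occasion: animals marked beforehand, first captures, recaptures."""
    marked_before, first, recaps = [], [], []
    seen = set()
    for j in range(len(histories[0])):
        caught = {i for i, h in enumerate(histories) if h[j] == "1"}
        marked_before.append(len(seen))
        first.append(len(caught - seen))
        recaps.append(len(caught & seen))
        seen |= caught
    return marked_before, first, recaps


def mb_mle(histories):
    """Return (N_hat, p_hat, c_hat) for model Mb, assuming N is continuous."""
    marked_before, first, recaps = capture_summary(histories)
    t, M = len(marked_before), sum(first)
    S = sum(marked_before)          # total chances for a recapture
    c_hat = sum(recaps) / S         # recaptures carry no information on N

    def score(N):
        # First captures: p = M / (t*N - S) maximizes p^M (1-p)^(t*N - S - M).
        p = M / (t * N - S)
        return sum(1.0 / (N - k) for k in range(M)) + t * math.log(1.0 - p)

    lo, hi = float(M), 100.0 * M    # assumes the root lies in this bracket
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if score(mid) > 0 else (lo, mid)
    N = 0.5 * (lo + hi)
    return N, M / (t * N - S), c_hat


N_hat, p_hat, c_hat = mb_mle(HISTORIES)
print(f"N = {N_hat:.2f}, p = {p_hat:.3f}, c = {c_hat:.3f}")
```

For these data there were 32 recaptures out of 167 chances, so c is about 0.19, and N comes out just under 27, in line with MARK's estimate.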

Model Mt

Capture and recapture probabilities are equal, but they vary with time.

The output contains 10 separate estimates for p, but several of them are the same (0.1425971 and 0.1782464 each occur three times), which is suspicious, though the standard errors seem okay. Check with the original capture histories, and you'll see this pattern:

Occasion                1     2     3     4     5     6     7     8     9    10
No. of captures         5     5     3    13     7     4     4     4     5     8
Capture probability  0.18  0.18  0.11  0.46  0.25  0.14  0.14  0.14  0.18  0.29

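Each Mt estimate is simply the number of tigers photographed on that occasion divided by the estimated population size (about 28), so occasions with the same number of captures get identical estimates. These estimates are an accurate reflection of the data.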

The population estimate (28) and its confidence limits (26 to 36) are similar to the other models and look reasonable.
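
The same approach works for Mt, whose likelihood is N!/(N-M)! × Π_j p_j^(n_j) × (1-p_j)^(N-n_j). For any N the best p_j is n_j/N, which is why occasions with equal counts share an estimate. A minimal Python sketch under the same continuous-N assumption as for M0 (again not MARK's code):

```python
import math

# Capture histories of tigers 1-9 and a-q from Kanha_tiger.inp, in file order.
HISTORIES = [
    "1001000110", "1000000101", "1101000011", "0110000111", "0101000001",
    "0100000000", "0010011000", "0011000000", "0001000011", "0001100110",
    "0001001001", "0000100000", "0001000000", "1001000000", "0001000000",
    "0001100000", "0001001000", "0000110000", "0000100000", "0000010000",
    "1000101000", "0100000000", "0001000001", "0000010000", "0000000001",
    "0000100000",
]


def mt_mle(histories):
    """Return (N_hat, [p_1..p_t]) for model Mt, assuming N is continuous."""
    t, M = len(histories[0]), len(histories)
    counts = [sum(h[j] == "1" for h in histories) for j in range(t)]

    def score(N):
        # With each p_j = n_j / N profiled out, only this derivative remains.
        harmonic = sum(1.0 / (N - k) for k in range(M))
        return harmonic + sum(math.log(1.0 - n / N) for n in counts)

    lo, hi = float(M), 100.0 * M    # assumes the root lies in this bracket
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if score(mid) > 0 else (lo, mid)
    N = 0.5 * (lo + hi)
    return N, [n / N for n in counts]


N_hat, p_hat = mt_mle(HISTORIES)
print(f"N = {N_hat:.2f}")
print(" ".join(f"{p:.4f}" for p in p_hat))
```

It should give N close to 28.05, and with that value 4/N and 5/N match the repeated values 0.1425971 and 0.1782464 in MARK's output.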

The Results Browser

Now that we've run three models, let's take a look at the Results Browser.

(The "-2Log(L)" column is not shown by default; you can choose what to display with File > Preferences.)

By default, the models are arranged in order of increasing AICc (right-click in the browser window to change this if you want). The simplest model, M0, has the lowest AICc and is the best supported of the three, but the Mb model is almost as good.
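
The Results Browser ranks models by AICc = -2 log(L) + 2K + 2K(K+1)/(n-K-1), where K is the number of estimated parameters and n the effective sample size; Akaike weights then express each model's relative support. A minimal Python sketch of both formulas follows. It does not try to reproduce how MARK chooses n for closed-capture data, so feed it the values MARK reports rather than recomputing them.

```python
import math


def aicc(neg2loglik, k, n):
    """Small-sample AIC: -2 log L + 2K + 2K(K+1)/(n - K - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > K + 1")
    return neg2loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)


def akaike_weights(scores):
    """Convert AICc (or AIC) values into weights that sum to 1."""
    best = min(scores)
    rel = [math.exp(-0.5 * (s - best)) for s in scores]  # relative likelihoods
    total = sum(rel)
    return [r / total for r in rel]


# Two models with identical AICc get equal support...
print(akaike_weights([100.0, 100.0]))
# ...and a model 2 AICc units worse keeps about 27% of the weight.
print(akaike_weights([100.0, 102.0]))
```

This is why a model that is "almost as good" in the Results Browser still carries a substantial share of the support.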

Model Mh

So far we’ve assumed all tigers have the same probability of being captured, which is unlikely to be true. Even if all the tigers were identical, some might have just one camera trap in their territories, while others had two or three or more, just by chance. We'd expect those with more traps in their territory to have a higher capture probability. So we ought to include heterogeneity among tigers in our model.

MARK can’t handle separate capture probabilities for each animal, but it can "mix" 2 different models, sometimes 3 if you have a big enough data set. A simple way to think of this is to imagine the tigers sorted into 2 groups with different capture probabilities. It's a bit more complicated though: we never know which group a given tiger belongs to, so the probability of each tiger's capture history is a weighted mixture of the two models, hence "mixing" rather than "sorting".

The simplest option is to fit a mixture of two M0 models, each with a single capture probability; this is often designated Model Mh2 to indicate a mixture of two models.

See Block D of the lab guide.

The Mh model uses three PIMs:

  • 1 parameter for Probability of Mixture (pi or π), ie. the probability that a tiger belongs to the first model in the mixture (1 - π for the second);
  • 2 parameters for capture probability (p), one for each model in the mixture; each is the same for all occasions and equal to the recapture probability;
  • 1 parameter for Population Size (N).

The output for this model looks like this:

                    Real Function Parameters of {Mh}
                                            95% Confidence Interval 
Parameter       Estimate     Standard Error   Lower        Upper
--------------- ------------ ------------ ------------ ------------
    1:pi        0.4920221    0.5318393    0.0147363    0.9843077
    2:p         0.2644006    0.1029851    0.1129405    0.5036527
    3:p         0.1061389    0.1459206    0.0057924    0.7076067 
    4:N         31.520517    8.2640357    26.659093    72.239465 

The M0 model had a single parameter for p; now we have 3 parameters, a p for each model and π telling us how they are mixed. Having 3 parameters instead of 1 increases flexibility and allows the capture probability to be 'smeared out' instead of being a single value for all tigers.

Now our estimate of population size is bigger than for the other models and the confidence interval is a lot wider, going up to 72. This is typical of models allowing for heterogeneity: if animals differ in 'catchability', we'll catch most of those with high catchability but miss many with low catchability, so the model infers that more animals went unphotographed.
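
The argument above can be checked against the Mh output. Assuming π is the proportion of tigers in the component with p = 0.2644 (parameter 2), the chance that a tiger is photographed at least once in 10 occasions is 1 - (1 - p)^10 for each component. A minimal Python sketch of that arithmetic:

```python
# Point estimates copied from the Mh output; assumes pi belongs to P1.
PI, P1, P2, N_HAT = 0.4920221, 0.2644006, 0.1061389, 31.520517
OCCASIONS = 10


def p_detected(p, occasions=OCCASIONS):
    """Probability of at least one photo when p is constant across occasions."""
    return 1.0 - (1.0 - p) ** occasions


high, low = p_detected(P1), p_detected(P2)
overall = PI * high + (1.0 - PI) * low
print(f"high-catchability tigers seen: {high:.3f}")
print(f"low-catchability tigers seen:  {low:.3f}")
print(f"expected number photographed:  {N_HAT * overall:.1f}")
```

Roughly 95% of high-catchability tigers but only about 67% of low-catchability ones are expected to appear in the photos, and the expected number photographed comes out close to the 26 actually seen. The shortfall among low-catchability tigers is what pushes the Mh estimate of N upwards.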

Model Mbt

To illustrate some of the problems when you have too many parameters in the model, we'll try running the Mbt model, which has different probabilities of capture and recapture, and different values for each occasion, so 20 parameters in all (this is the default model when you start a Closed Captures project in MARK).

See Block E of the lab guide.

The first thing to notice is in the Results Browser: the "No. Par.", which is 19 instead of 20. What has happened? Let's investigate further.

The "real" results are shown below:

                    Real Function Parameters of {Mbt}
                                    95% Confidence Interval 
Par. Estimate       Standard Error   Lower        Upper
---- ------------   ------------   ------------   ------------
 1:p 0.1923077      0.0772920      0.0823826      0.3870421 
 2:p 0.1904762      0.0856891      0.0733640      0.4115140 
 3:p 0.1176470      0.0781425      0.0295884      0.3683111 
 4:p 0.5333333      0.1288122      0.2929986      0.7591322 
 5:p 0.5714286      0.1870439      0.2298263      0.8562709 
 6:p 0.6666666      0.2721655      0.1535074      0.9566299 
 7:p 0.5241250E-014 0.1023841E-006 -.2006728E-006 0.2006728E-006
 8:p 0.5241250E-014 0.1023841E-006 -.2006728E-006 0.2006728E-006
 9:p 0.5241250E-014 0.1023841E-006 -.2006728E-006 0.2006728E-006
10:p 1.0000000      0.6559917E-004 0.9998714      1.0001286 
11:c 0.2000000      0.1788854      0.0271820      0.6910542 
12:c 0.1111111      0.1047565      0.0153929      0.4998630 
13:c 0.4545455      0.1501314      0.2027556      0.7319458 
14:c 0.1578947      0.0836547      0.0518029      0.3915415 
15:c 0.0869565      0.0587534      0.0218428      0.2888552 
16:c 0.1600000      0.0733212      0.0613702      0.3568734 
17:c 0.1600000      0.0733212      0.0613703      0.3568734 
18:c 0.2000000      0.0800000      0.0857793      0.3998009 
19:c 0.2800000      0.0897998      0.1397323      0.4821555 
20:N 26.000000      0.8539367E-005 26.000000      26.000005 

The values for p for occasions 7, 8 and 9 are estimated as practically zero and the standard error is also practically zero (0.5241250E-014 means 0.524 × 10^-14, which is effectively zero: the estimate has been pushed against the lower boundary of 0). What's going on?

If you look at the capture histories, you'll see that no tigers were captured for the first time on occasions 7, 8 and 9, so the data give no information about probability of first capture; p = 0 is certainly plausible, but not at all certain. These three values cannot be estimated because of deficiencies in the data.

The last p is exactly 1; one tiger was caught for the first time on the last occasion (so p can't be zero), but that doesn't mean p = 1, unless of course we've now caught all the tigers! And that is the implication of the estimate of N, exactly 26, which is the number we actually photographed. In fact, these 2 parameters can't be estimated separately; N could be high and p low, or vice versa. There's a full range of pairs of values which fit the data equally well. As a consequence, N cannot be estimated! MARK chooses 1 of these pairs, with the last p = 1 and N = number captured, but that's entirely arbitrary, not something you can put in your thesis or report as the estimated population size.

Note that this is not due to problems with the data. No matter how good or how big your data set, you can't extract estimates for all 20 parameters. So the value of 19 in the Results Browser is actually correct.
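
The confounding can be made concrete. For a fixed N, the Mbt estimates are simple ratios: p_j is the number of tigers first photographed on occasion j divided by the number not yet marked (N minus those already marked), and c_j is the number of recaptures divided by the number already marked. A minimal Python sketch, with N supplied by hand rather than estimated and hypothetical helper names:

```python
# Capture histories of tigers 1-9 and a-q from Kanha_tiger.inp, in file order.
HISTORIES = [
    "1001000110", "1000000101", "1101000011", "0110000111", "0101000001",
    "0100000000", "0010011000", "0011000000", "0001000011", "0001100110",
    "0001001001", "0000100000", "0001000000", "1001000000", "0001000000",
    "0001100000", "0001001000", "0000110000", "0000100000", "0000010000",
    "1000101000", "0100000000", "0001000001", "0000010000", "0000000001",
    "0000100000",
]


def capture_summary(histories):
    """Per occasion: animals marked beforehand, first captures, recaptures."""
    marked_before, first, recaps = [], [], []
    seen = set()
    for j in range(len(histories[0])):
        caught = {i for i, h in enumerate(histories) if h[j] == "1"}
        marked_before.append(len(seen))
        first.append(len(caught - seen))
        recaps.append(len(caught & seen))
        seen |= caught
    return marked_before, first, recaps


def mbt_given_n(histories, N):
    """Conditional Mbt estimates p_1..p_t and c_2..c_t for a supplied N."""
    marked_before, first, recaps = capture_summary(histories)
    p = [u / (N - m) for u, m in zip(first, marked_before)]
    # No recaptures are possible on occasion 1, so c starts at occasion 2.
    c = [r / m for r, m in zip(recaps[1:], marked_before[1:])]
    return p, c


for N in (26, 30, 40):
    p, c = mbt_given_n(HISTORIES, N)
    print(N, [round(x, 4) for x in p])
print("c:", [round(x, 4) for x in c])
```

At N = 26 the p's match MARK's table (from 5/26 = 0.1923 up to 1/1 = 1 for the last occasion); at N = 30 the last p drops to 1/5, and the c's do not change at all, because nothing in the recaptures depends on N.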

Below the real results, you'll see additional output, which wasn't shown for the other models:

Attempted ordering of parameters by estimatibility:
1 16 18 19 17 15 2 14 3 4 13 12 5 11 6 9 7 8 10 20
Beta number 20 is a singular value.

This is where MARK, rather subtly, tells us that N (ie. parameter #20) can't be estimated; in the language of statistics, it's a singular value. Parameter #20 comes last in the ordering, as the most difficult to estimate, and the other parameters we were wary of, #7 to #10, are also towards the end.

Always check the output from MARK carefully to see if the estimation has worked as you intended! Things to look for are:

  • Warnings; some of these may not be serious, but do take a look at each one;
  • A value in the "No. Pars." column which is smaller than you expected;
  • A standard error = 0;
  • A probability estimate close to 0 or close to 1 (but check the data to see if these could be correct);
  • A probability with a huge confidence interval, from near 0 to near 1;
  • An implausibly high estimate for N;
  • A note at the bottom of the output about a "singular value".

To get a proper estimate of N, we need to constrain the last p in some way. We've seen three ways to do this: making all the p's equal (as in model Mb), making each p equal to the corresponding c, which we can estimate (that's model Mt), or both (model M0). MARK allows other possibilities, such as a trend over time, a constant difference between p and c, or covariates, but these can't be implemented through the PIMs alone.

Comparison with CAPTURE

CAPTURE is an older program (from 1995) which can be run from within the MARK interface. It analyzes closed capture data, but unlike MARK it does not rely solely on maximum likelihood estimation and AIC. As a result, CAPTURE offers several models not available in MARK, in particular Mh models which allow each animal to have its own capture probability.

Three features of CAPTURE are important: goodness of fit testing, method of selection of models, and methods of dealing with heterogeneity.

Run CAPTURE with the default settings. See Block F of the lab guide.

CAPTURE output appears in a Notepad window. It starts with a summary of the input data.

Goodness of fit testing:

CAPTURE carries out nine tests (including 5a and 5b); look through them, remembering that a large probability value means that the data fit the model well and values < 0.05 indicate a really poor fit.

Tests 1 to 3 suggest that M0 compares well with the other models (probabilities > 0.1); Tests 4, 5 and 7 give little support for any of the other models (probabilities around 0.05); Test 6 could not be run with the present data set.

Model selection algorithm:

Instead of AIC, CAPTURE combines the results of the goodness-of-fit tests to choose the best model. An algorithm developed from simulated data sets is used to calculate the relative appropriateness of the different models.

The appropriateness of each model is estimated as a weighted linear combination of the test results and scaled so that the most appropriate has a score of 1. These are summarized near the end of the output:

Model selection criteria. Model selected has maximum value.

Model    M(o)   M(h)   M(b)   M(bh)  M(t)   M(th)  M(tb)  M(tbh)
Criteria 1.00   0.87   0.22   0.52   0.00   0.27   0.40   0.64

Appropriate model probably is M(o)
Suggested estimator is null.

The result using the recommended model is then given:

Estimated probability of capture, p-hat = 0.2039

Population estimate is      28  with standard error    2.1386

Approximate 95 percent confidence interval   27  to   36

So CAPTURE has also selected the M0 model as the most appropriate, and gives us almost the same results as the M0 model in MARK (upper confidence limit of 36 instead of 37).

Model Mh again:

According to CAPTURE’s analysis, the Mh model is second-best and not far behind M0. The Mh model is more ‘robust’ than the M0 model, because it requires fewer assumptions: the M0 model assumes that the capture probability is equal for all tigers, and if that isn’t true (as most biologists familiar with tigers will tell you), it will give biased estimates, typically too low. The Mh model does not require that assumption, but will work fine if in fact capture probabilities are equal. The trade-off is that Mh normally has wider confidence intervals.

CAPTURE provides several alternative Mh models, the most robust being the Jackknife Mh model. Run this now; see Block F of the lab guide.

The important bit in the Notepad file that opens is:

Average p-hat = 0.1758

Interpolated population estimate is     33 with standard error   4.6859

Approximate 95 percent confidence interval   29 to    49

This estimate is higher than the M0 estimate from MARK or CAPTURE, and is even higher than the Mh2 model in MARK. The confidence interval (29 to 49) is quite wide, but not as wide as the Mh2 interval (about 27 to 72), and it may be more realistic.
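
The jackknife estimator in CAPTURE (due to Burnham and Overton) works from capture frequencies: f_k is the number of tigers photographed exactly k times. The first- and second-order estimates are N1 = M + ((t-1)/t) f1 and N2 = M + ((2t-3)/t) f1 - ((t-2)^2/(t(t-1))) f2. A minimal Python sketch computing these from the Kanha histories follows; CAPTURE's rule for choosing the order and interpolating between orders is not reproduced here.

```python
from collections import Counter

# Capture histories of tigers 1-9 and a-q from Kanha_tiger.inp, in file order.
HISTORIES = [
    "1001000110", "1000000101", "1101000011", "0110000111", "0101000001",
    "0100000000", "0010011000", "0011000000", "0001000011", "0001100110",
    "0001001001", "0000100000", "0001000000", "1001000000", "0001000000",
    "0001100000", "0001001000", "0000110000", "0000100000", "0000010000",
    "1000101000", "0100000000", "0001000001", "0000010000", "0000000001",
    "0000100000",
]


def capture_frequencies(histories):
    """f[k] = number of animals photographed exactly k times."""
    return Counter(h.count("1") for h in histories)


def jackknife(histories):
    """First- and second-order jackknife estimates of N for model Mh."""
    t, M = len(histories[0]), len(histories)
    f = capture_frequencies(histories)
    n1 = M + (t - 1) * f[1] / t
    n2 = M + (2 * t - 3) * f[1] / t - (t - 2) ** 2 * f[2] / (t * (t - 1))
    return n1, n2


print(sorted(capture_frequencies(HISTORIES).items()))
print(jackknife(HISTORIES))
```

For the tigers f1 = 10 and f2 = 6, giving N1 = 35 and N2 ≈ 38.7; CAPTURE's interpolated estimate of 33 falls between M = 26 and N1.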

Conclusion

The assumption that capture probability is the same for all tigers, which underlies the M0, Mb and Mt models, is not biologically plausible, due to the different behaviour patterns of different tigers and the siting of camera traps relative to tiger territories. So we should limit consideration to Mh models.

Karanth et al (2004) used the jackknife estimator in Program CAPTURE and quoted a figure of 33 with a standard error of 4.69, the same results that we obtained running CAPTURE from within MARK.

How do we interpret this figure of 33 tigers, when only a portion (about 30%) of Kanha NP was surveyed? In fact, Karanth et al (2004) were interested in tiger density, which they calculated by dividing the number of tigers by the estimated area sampled. The area sampled was taken as the area of the polygon containing the cameras plus a buffer zone, the width of the buffer zone being half the mean maximum distance moved by individual tigers between captures. (For each tiger caught more than once, they looked at the trap locations and took the distance between the furthest traps; they averaged this over all those tigers; then divided by two.) This resulted in an estimate of 282 km², and a density of 11.70 tigers per 100 km² (33 / 282 × 100).
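
The density calculation is simple arithmetic once the buffer width is known. The sketch below computes half the mean maximum distance moved (MMDM) from per-tiger capture locations, then the density. The trap locations are not in the data file, so the coordinates in `toy` are invented purely to exercise the function; only the final line uses the published figures (33 tigers, 282 km²).

```python
import math
from itertools import combinations


def half_mmdm(locations_by_animal):
    """Half the mean maximum distance moved between capture locations.

    Only animals with two or more captures contribute, as in the text.
    """
    max_moves = [
        max(math.dist(a, b) for a, b in combinations(locs, 2))
        for locs in locations_by_animal.values()
        if len(locs) >= 2
    ]
    return sum(max_moves) / len(max_moves) / 2.0


def density_per_100km2(n_hat, area_km2):
    """Animals per 100 km^2 given an abundance estimate and sampled area."""
    return 100.0 * n_hat / area_km2


# Invented capture locations in km, purely to exercise the function.
toy = {
    "A": [(0.0, 0.0), (3.0, 4.0)],              # furthest pair 5 km apart
    "B": [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)],  # furthest pair sqrt(5) km apart
    "C": [(2.0, 2.0)],                          # caught once, so ignored
}
print(f"buffer width: {half_mmdm(toy):.3f} km")

# Published figures from Karanth et al (2004): 33 tigers, 282 km^2 sampled.
print(f"density: {density_per_100km2(33, 282):.2f} tigers per 100 km^2")
```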

MARK will not do this for you, and does not attempt any estimate of density. New software developed in the last few years takes a more systematic approach to density estimation, relating capture probability to the distance of the trap from the animal's centre of activity - Spatially Explicit Capture-Recapture (SECR). This is implemented in Murray Efford's DENSITY software (http://www.otago.ac.nz/density) and in the R packages 'secr' and 'SPACECAP' (http://cran.r-project.org/mirrors.html).
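
To make the SECR idea concrete: capture probability is modelled as a decreasing function of the distance d between a trap and an animal's activity centre. One widely used form is the half-normal, g(d) = g0 × exp(-d²/(2σ²)), where g0 is the detection probability at distance zero and σ sets how quickly it falls off. The Python sketch below only evaluates that function for illustrative parameter values (g0 = 0.2 and σ = 2 km are made up, not estimates for Kanha); fitting it to capture data is what the SECR software does.

```python
import math


def halfnormal(d, g0, sigma):
    """Per-occasion detection probability at distance d from an activity centre."""
    return g0 * math.exp(-d * d / (2.0 * sigma * sigma))


G0, SIGMA = 0.2, 2.0  # illustrative values only, not estimates for Kanha
for d in (0.0, 1.0, 2.0, 4.0, 6.0):
    print(f"d = {d:.0f} km: p = {halfnormal(d, G0, SIGMA):.3f}")
```

A tiger whose activity centre lies near several traps therefore has a higher overall chance of being photographed, which is the heterogeneity discussed under Model Mh, now tied explicitly to space.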

Take home points

  • Program MARK assists in the analysis of "closed capture" data to estimate the population size. It allows a number of models with different assumptions about the animals' behaviour to be compared. But the models to test should reflect knowledge of the biology of the species concerned.
     
  • The default model in MARK has the maximum number of parameters, and typically these cannot all be estimated. With the default closed-capture model (Mbt), the population size parameter, N, cannot be estimated.
     
  • Other models may also have problems with parameter estimation, depending on the data set. The signs of these problems are subtle, and it is important to examine the detailed output carefully before accepting the results.
     
  • MARK estimates a population size without reference to the area sampled, so does not produce a density estimate and the population cannot be extrapolated to a larger area. SECR is a recent development which you should consider as an alternative to MARK.


Page updated 18 August 2010 by Mike Meredith