Bootstrapping biodiversity indices
R Biodiversity scripts (zip, 22KB)

Bias and confidence limits for indices

Although the better diversity indices are less sensitive to rare species missed from the sample, calculations based on the sample tend to underestimate the true value for the community: estimates are biased low. (Simpson's index is an exception to this, as it has a built-in small-sample correction.) In addition, most indices lack a way to estimate the precision of the result.

The concept of bootstrapping

The idea of the bootstrap is shown in the diagram on the right. In the real world, the true diversity index for the population is unknown. We take a sample and estimate the index based on the sample. In the bootstrap world we set up a model population based on what we know of the real population. We take samples from this model population and calculate values for the index from the bootstrap samples. The bootstrap world gives us two major advantages:
We can then use the results from the bootstrap world to calculate an unbiased estimate and confidence interval for the real world population.

There are plenty of sources of information on bootstrapping in general: I would recommend Efron and Tibshirani's (1993) book An introduction to the bootstrap, which meshes with the 'bootstrap' package in R. Many other software packages also include a bootstrap option. So here are just a few points specific to bootstrapping diversity indices.

Bootstrapping diversity indices

The bootstrap population is modeled on the sample we have, in that the proportions of the various species are the same. (That is, this is a nonparametric bootstrap method, as trying to fit an equation to the population would be difficult.) But the bootstrap population is large, much larger than the sample: we draw samples with replacement, so the population never 'runs out'. It behaves as an infinite population.

As a result, when we come to calculate the true index for the bootstrap population we should not use a small-sample correction. If you bootstrap Simpson's index, you should not use the n*(n-1) form for the true value, but do use it for the estimates from the bootstrap samples.

The bootstrap must be set up so that we are sampling individuals, not species. If the real world sample is, say, 170 birds from 20 different species, we might summarize the data as 20 numbers, the abundances for each of the species. But picking 20 values at random from those 20 numbers (with replacement) is not what we want. Instead, we need to represent the population as a string of 170 numbers, each number representing one bird, with the value indicating which species it belongs to. Then we take 170 individuals from that population, with replacement, and see how many we have of each species.

If that all sounds a bit convoluted, you'll probably find it becomes clear if you look at an example.
The bundle of R scripts and data sets here includes examples of running bootstraps with both Hill's N2 and Simpson's index.
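
For readers who want to see the shape of the procedure without opening the bundle, the steps described above can be sketched as follows. This is a minimal illustration in Python, not the R code from the bundle, and several things in it are assumptions made for the example: the abundances are invented (20 species, 170 individuals), the function names are hypothetical, Simpson's index is taken in the form 1 - sum(p^2), the bias is estimated in the usual bootstrap way (mean of the bootstrap estimates minus the true value for the bootstrap population), and the interval is a simple percentile interval.

```python
import random
from collections import Counter


def simpson_plugin(counts):
    """1 - sum(p_i^2): no small-sample correction (infinite population)."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def simpson_corrected(counts):
    """Same index using the n*(n-1) small-sample form."""
    n = sum(counts)
    return 1.0 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


def bootstrap_simpson(abundances, n_boot=2000, seed=1):
    """Bootstrap Simpson's index by resampling individuals, not species."""
    rng = random.Random(seed)

    # One entry per individual; the value says which species it belongs to.
    individuals = [sp for sp, c in enumerate(abundances) for _ in range(c)]
    n = len(individuals)

    # True value for the bootstrap population: plug-in form, no correction.
    true_boot = simpson_plugin(abundances)
    # Estimate from the real-world sample: small-sample form.
    estimate = simpson_corrected(abundances)

    boot = []
    for _ in range(n_boot):
        resample = rng.choices(individuals, k=n)  # drawn with replacement
        counts = list(Counter(resample).values())
        boot.append(simpson_corrected(counts))
    boot.sort()

    # Assumed bias estimate: mean bootstrap estimate minus bootstrap truth.
    bias = sum(boot) / n_boot - true_boot
    # Assumed interval: approximate 95% percentile limits.
    lower = boot[int(0.025 * n_boot)]
    upper = boot[int(0.975 * n_boot) - 1]

    return {
        "estimate": estimate,
        "bias": bias,
        "corrected": estimate - bias,
        "ci": (lower, upper),
    }


# Invented abundances for illustration: 20 species, 170 birds in total.
abundances = [40, 30, 20, 15, 12, 10, 8, 7, 6, 5, 4, 3, 2, 2, 1, 1, 1, 1, 1, 1]
result = bootstrap_simpson(abundances)
print(f"estimate  = {result['estimate']:.4f}")
print(f"bias      = {result['bias']:.4f}")
print(f"corrected = {result['corrected']:.4f}")
print(f"95% CI    = ({result['ci'][0]:.4f}, {result['ci'][1]:.4f})")
```

Because the n*(n-1) form is already corrected for small samples, the bias estimate for Simpson's index should come out close to zero here. For an index without such a correction, you would use the same function for both the bootstrap population and the bootstrap samples, and the bias term would do more work.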
Text by Mike Meredith, updated 7 April 2010