Biologists have been extremely creative when
it comes to inventing indices of diversity, and a large number have
been proposed - if not widely used!
In principle, species diversity is a
combination of species richness and species evenness. In practice,
people have generally devised what they see as a sensible measure
of diversity, and then defined evenness in terms of (their measure
of) diversity and species richness.
Which index is 'the best'? There have been
attempts to rank the various indices on a range of criteria and see
which performs best. See Anne Magurran's book Measuring
Biological Diversity (Blackwell Publishing, 2004) for more
details.
I'd rather ask, "Which are biologically
meaningful?" and that comes down to a pretty short list: Hill's
diversity numbers, which include Simpson's index (and also a
relative of Shannon's index).
If you want to try out some of these indices in
R, you will find
scripts and supporting material
here.
The symbols used are as follows:
- $N$ = the total number of individuals in the sample (upright $\mathrm{N}$, not italic, is used for Hill's numbers).
- $S$ = the number of species in the sample.
- $n_i$ = the number of individuals of species $i$ in the sample, $\sum n_i = N$.
- $p_i$ = the proportion of individuals of species $i$ in the sample, $p_i = n_i/N$.
- $I_{name}$ = name's index of diversity.
- $D_{name}$ = name's index of dominance.
- $E_{name}$ = name's index of evenness.
- $\ln(x)$ = the natural logarithm (base $e$) of $x$.
Number of individuals is not the only possible measure of
abundance. In many cases, biomass or area covered or basal area or
energy production may be more appropriate. In that case, $p_i$ = the proportion of
biomass (or whatever) of species $i$ in the sample. Those
indices which use $p_i$ will still work, but those
involving $n_i$ cannot be used.
Berger-Parker index of dominance:
Berger-Parker's index of dominance is simply the proportion of
the most common species in the community or sample:
$D_{BP} = p_{max}$
The inverse of the index of dominance is used as an index of diversity
and is one of Hill's diversity numbers:
$I_{BP} = 1/D_{BP} = \mathrm{N}_{\infty}$
The index is very easy to calculate.
See Magurran (2004, p.117), and Berger & Parker (1970).
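As a rough illustration of the two formulas above, here is a minimal Python sketch (Python is my choice here, not the R scripts mentioned earlier). The function name `berger_parker` and the example abundances are made up for the illustration.

```python
def berger_parker(counts):
    """Return (D_BP, I_BP) for a list of species abundances.

    counts may be individuals, biomass or any other abundance measure,
    since only the proportions p_i are used.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain some abundance")
    dominance = max(counts) / total       # D_BP = p_max
    return dominance, 1.0 / dominance     # I_BP = 1 / D_BP = N_inf


# Hypothetical sample: three species with 50, 30 and 20 individuals.
d_bp, i_bp = berger_parker([50, 30, 20])
print(d_bp, i_bp)
```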
Brillouin index of diversity
This is a bit difficult to calculate, and even more difficult to
know what it might mean biologically. It is:
$I_{Brillouin} = \left( \ln(N!) - \sum \ln(n_i!) \right) / N$
where $N!$ is $N$ factorial, i.e. $N \times (N-1) \times (N-2) \times (N-3) \times \dots \times 3 \times 2 \times 1$.
Brillouin's index of evenness is based on the diversity index:
$E_{Brillouin} = I_{Brillouin} / I_{BrMax}$
where $I_{BrMax}$ is the maximum value of $I_{Brillouin}$,
when all species are equally abundant. In principle we can calculate
that by replacing the $n_i$'s with $n = N/S$,
and $\sum \ln(n_i!)$ then becomes $S \ln(n!)$. Unfortunately, factorial only works if
$n$ is an integer; if it isn't, it gets more complicated! To
begin with, we round $n$ down to a whole number, i.e. $n$ is
the integer part of $N/S$. The next problem is that the total
number of individuals is no longer $N$, since $S \times n < N$. Put
$r = N - S \times n$
The trick now is to use
$n + 1$ for $r$ species and $n$ for the rest (i.e. for
$S - r$ species), which adds up to the correct value of $N$.
The formula then becomes:
$I_{BrMax} = \left( \ln(N!) - r \ln((n+1)!) - (S - r) \ln(n!) \right) / N$
Brillouin's indices are not
appropriate for estimation of the diversity of a population on the
basis of a sample, which is what we are trying to do in most cases!
They are appropriate if you have data on the whole population, or if
your 'sample' is not a random sample and you cannot make inferences
about the population anyway.
The calculation can be problematic even
with a computer, as N! becomes huge for quite normal sample
sizes, and the computer then chokes on ln(N!).
In R, you should use the 'lfactorial' function, which handles the
factorial bit internally and returns the logarithm.
See Magurran (2004, p.113).
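The Brillouin calculation can be sketched in Python along the same lines as the R advice above. This is only a sketch: it assumes integer counts, and it uses `math.lgamma(k + 1)` as a stand-in for R's 'lfactorial', since both return $\ln(k!)$ without ever forming the factorial itself. The function names are my own.

```python
import math


def ln_factorial(k):
    # ln(k!) computed directly, so huge factorials never appear.
    return math.lgamma(k + 1)


def brillouin(counts):
    """I_Brillouin = (ln(N!) - sum(ln(n_i!))) / N for integer counts."""
    counts = [c for c in counts if c > 0]
    N = sum(counts)
    return (ln_factorial(N) - sum(ln_factorial(c) for c in counts)) / N


def brillouin_max(N, S):
    """Largest possible I_Brillouin for N individuals in S species."""
    n = N // S           # integer part of N / S
    r = N - S * n        # number of species that get n + 1 individuals
    return (ln_factorial(N)
            - r * ln_factorial(n + 1)
            - (S - r) * ln_factorial(n)) / N


def brillouin_evenness(counts):
    counts = [c for c in counts if c > 0]
    i_max = brillouin_max(sum(counts), len(counts))
    if i_max == 0:
        raise ValueError("evenness is undefined for a single species")
    return brillouin(counts) / i_max


# Hypothetical sample of 100 individuals in four species.
print(brillouin([40, 30, 20, 10]), brillouin_evenness([40, 30, 20, 10]))
```

With 5 individuals in 2 species, the most even split possible is 3 and 2, so `brillouin_evenness([3, 2])` comes out as 1 even though the two counts differ.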
Fisher's alpha or Log series alpha
Ronald Fisher proposed fitting a 'log series' curve to species
abundance. This model predicts that:
- $\alpha x$ species will have abundance $n_i = 1$,
- $\alpha x^2 / 2$ species will have abundance $n_i = 2$,
- $\alpha x^3 / 3$ species will have abundance $n_i = 3$,
- ... and so on.
The value of
α (alpha)
which gives predicted values closest to the observed values is
used as a measure of diversity. Several authors insist that it
is a valid measure even if the actual abundance curve is nothing
like a log series curve, which is often going to be the case
when S is small (less than several hundred).
Note that Fisher's alpha applies to samples, and it's not clear
what it might mean when applied to a large community, as it implies
that a high proportion of species have only one individual. It can't
be used for an infinite community consisting of a finite number of
species.
Computation involves finding the value of $\alpha$
which best fits the data,
usually by successive approximation. The 'vegan' package in R
includes the function 'fisher.alpha'.
See Magurran (2004, pp 28-32 and 102-103),
and Fisher et al (1943).
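To give a feel for what 'successive approximations' can look like, here is a minimal Python sketch. It relies on an assumption that is not spelled out on this page: the usual log-series relation between species and individuals, $S = \alpha \ln(1 + N/\alpha)$, solved for $\alpha$ by bisection. It is not a description of how 'fisher.alpha' in vegan works internally, and the function name is my own.

```python
import math


def fisher_alpha(S, N, tol=1e-10):
    """Solve S = alpha * ln(1 + N / alpha) for alpha by bisection.

    Assumes the standard log-series relation; a solution exists
    only when 0 < S < N.
    """
    if not 0 < S < N:
        raise ValueError("need 0 < S < N")

    def excess(alpha):
        # Predicted number of species minus observed; increases with alpha.
        return alpha * math.log1p(N / alpha) - S

    lo, hi = 1e-12, 1.0
    while excess(hi) < 0:        # widen the bracket until it holds the root
        hi *= 2.0
    while hi - lo > tol * hi:    # halve the bracket until it is tight
        mid = (lo + hi) / 2.0
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical sample: 30 species among 500 individuals.
alpha = fisher_alpha(30, 500)
print(alpha)
```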
Hill's diversity numbers
Hill produced a family of diversity numbers, corresponding to the
'effective species richness', in which rare species are given
progressively less weight than common species. The general formula
is:
$\mathrm{N}_a = \left( \sum p_i^a \right)^{1/(1-a)}$
The parameter $a$ can take any value, but we are usually only
interested in positive values. This formula doesn't work for
$a = 1$, though $\mathrm{N}_1$ can still be calculated (see
below). The main ones of interest are:
- $\mathrm{N}_0$ = species richness (all species, rare or common, count equally),
- $\mathrm{N}_1 = e^H$, where $H$ is Shannon's index, $H = -\sum p_i \ln(p_i)$,
- $\mathrm{N}_2$ = 1/Simpson's index (without the small sample correction),
- $\mathrm{N}_{\infty}$ = 1/Berger-Parker index.
The numbers go down steadily as $a$ increases:
$\mathrm{N}_0$ is the biggest, $\mathrm{N}_{\infty}$ the smallest.
Hill also suggested a family of evenness measures:
$E_{a,b} = \mathrm{N}_a / \mathrm{N}_b$ where $a > b$
He did not favour measures involving species richness, $\mathrm{N}_0$,
because it is so difficult to estimate: he considered it to be more
dependent on sampling than on the actual richness of the community
sampled. He preferred $E_{2,1}$.
Individual Hill numbers are quite easy to calculate on a computer. The 'vegan'
package in R includes the function 'renyi', which, with the
parameter 'hill = TRUE', will calculate a whole set of Hill numbers
in one go.
See Hill (1973).
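For readers who want to see the arithmetic, the Hill numbers listed above can be sketched in a few lines of Python. This is an illustrative sketch only, not a replacement for 'renyi' in vegan. It assumes the general form $\mathrm{N}_a = (\sum p_i^a)^{1/(1-a)}$ and treats $a = 1$ and $a = \infty$ as the special cases given in the list. The function names and example abundances are hypothetical.

```python
import math


def hill_number(counts, a):
    """Hill's diversity number of order a for a list of abundances."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if a == 1:
        # Special case: N_1 = exp(H), with H = Shannon's index.
        return math.exp(-sum(x * math.log(x) for x in p))
    if math.isinf(a):
        # Special case: N_inf = 1 / p_max (inverse Berger-Parker).
        return 1.0 / max(p)
    return sum(x ** a for x in p) ** (1.0 / (1.0 - a))


def hill_evenness(counts, a, b):
    """E_{a,b} = N_a / N_b, with a > b."""
    return hill_number(counts, a) / hill_number(counts, b)


# Hypothetical sample: three species with 50, 30 and 20 individuals.
sample = [50, 30, 20]
profile = [hill_number(sample, a) for a in (0, 1, 2, math.inf)]
print(profile)                      # decreases from N_0 to N_inf
print(hill_evenness(sample, 2, 1))  # Hill's preferred E_{2,1}
```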
Margalef's index
Strictly speaking, Margalef's diversity index is not a measure of
diversity, as it does not include any component of evenness. It is
an attempt to estimate species richness independently of the sample
size. The index is:
$I_{Margalef} = (S - 1) / \ln(N)$
The index will be
independent of the number of individuals in the sample only if the
relationship between $S$ (or $S - 1$) and $\ln(N)$ is linear.
Unfortunately, this is rarely true, as illustrated in the graph,
which is based on the results of ant surveys at La Selva in Costa
Rica (Longino et al, 2002).
Margalef's
index is sometimes touted as a way to compare species richness in
two communities where different numbers of individuals have been
collected. The proper way to do that is to use rarefaction. See
Magurran (2004, p 76-77).
Menhinick's index
Menhinick's index is similar to Margalef's index (see above). It
is:
$I_{Menhinick} = S / \sqrt{N}$
And it has the
same problems: the graph of $S$ versus $\sqrt{N}$ is typically not linear,
as indicated by the species accumulation curve for ants in La Selva
(Longino et al, 2002). See
Magurran (2004, p 76-77).
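Both Margalef's and Menhinick's indices are one-liners. The Python sketch below simply restates the two formulas; the function names and the example values of $S$ and $N$ are made up for illustration.

```python
import math


def margalef(S, N):
    """I_Margalef = (S - 1) / ln(N); needs N > 1."""
    return (S - 1) / math.log(N)


def menhinick(S, N):
    """I_Menhinick = S / sqrt(N)."""
    return S / math.sqrt(N)


# Hypothetical sample: 10 species among 100 individuals.
print(margalef(10, 100), menhinick(10, 100))
```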
The Q statistic
The Q statistic is a measure of the distribution of species
abundances in the inter-quartile interval, excluding the rarest
quarter of species and the most abundant quarter of species. This
is what we do:
1. Find $Q1 = S / 4$ and $Q3 = 3S / 4$, rounding them up to whole numbers if necessary.
2. Sort the $n_i$ values from smallest to largest, and note the abundance $n_i$ of the $Q1$th and $Q3$th species, $n_{Q1}$ and $n_{Q3}$.
3. Count the number of species with abundances between $n_{Q1}$ and $n_{Q3}$, i.e. $n_{Q1} < n_i < n_{Q3}$.
4. Count the number of species with abundances equal to $n_{Q1}$ or $n_{Q3}$ and divide by 2.
5. Add together the results of 3. and 4.
6. Calculate $\ln(n_{Q3}) - \ln(n_{Q1})$.
7. Divide the result from 5. by the result from 6. And that's it!
I'm not quite sure what all this means in terms of ecology,
but there we are.
See Magurran (2004, pp103-106).
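The recipe above translates fairly directly into code. The Python sketch below follows the steps as written on this page, which may differ in detail from other published versions of the Q statistic. The function name and the example abundances are hypothetical.

```python
import math


def q_statistic(counts):
    """Q statistic following the step-by-step recipe in the text."""
    n = sorted(c for c in counts if c > 0)    # abundances, smallest first
    S = len(n)
    q1 = math.ceil(S / 4)                     # positions of the quartiles,
    q3 = math.ceil(3 * S / 4)                 # rounded up to whole numbers
    n_q1, n_q3 = n[q1 - 1], n[q3 - 1]         # lists are 0-based
    if n_q1 == n_q3:
        raise ValueError("quartile abundances are equal; Q is undefined")
    between = sum(1 for c in n if n_q1 < c < n_q3)
    on_quartile = sum(1 for c in n if c == n_q1 or c == n_q3)
    return (between + on_quartile / 2) / (math.log(n_q3) - math.log(n_q1))


# Hypothetical sample: eight species whose abundances double each time.
q = q_statistic([1, 2, 4, 8, 16, 32, 64, 128])
print(q)
```

In this example $Q1 = 2$ and $Q3 = 6$, so $n_{Q1} = 2$ and $n_{Q3} = 32$. Three species lie strictly between them and two sit on the quartiles, which gives $4 / \ln(16)$.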
Rényi's entropy levels
The easiest way to approach Rényi's
entropy levels is to look first at Hill's diversity
numbers (although historically it's the other way round,
Rényi's work preceded Hill's). Like
Hill, Rényi devised a series of
measures depending on how much weight is given to rare species, and
the relationship is simply:
$H_a = \ln(\mathrm{N}_a)$
where $\mathrm{N}_a$ is Hill's diversity number of order $a$.
Rényi's $H_1$ is Shannon's diversity index.
Although Shannon and Rényi provide
simple ways of measuring the 'entropy' or 'information content' of
biological communities, the concepts have not been used very much in
ecology in the intervening four decades. I tend to agree with Hill
that these "apparent lapses into thermodynamics and entropy" are
perplexing rather than illuminating.
The 'vegan' package in R includes the function 'renyi', which will
calculate a whole set of Rényi's entropy
levels in one go.
See Hill (1973) and
Rényi (1961)
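As a small illustration of the relationship $H_a = \ln(\mathrm{N}_a)$, here is a Python sketch. It assumes Hill's general form $\mathrm{N}_a = (\sum p_i^a)^{1/(1-a)}$, so that $H_a = \ln(\sum p_i^a) / (1 - a)$, with $a = 1$ and $a = \infty$ handled as special cases. The function name is my own, and this is not vegan's 'renyi'.

```python
import math


def renyi_entropy(counts, a):
    """Renyi entropy of order a, i.e. ln of Hill's number N_a."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if a == 1:
        return -sum(x * math.log(x) for x in p)    # Shannon's H
    if math.isinf(a):
        return -math.log(max(p))                   # ln(1 / p_max)
    return math.log(sum(x ** a for x in p)) / (1.0 - a)


# Hypothetical sample: three species with 50, 30 and 20 individuals.
levels = [renyi_entropy([50, 30, 20], a) for a in (0, 1, 2, math.inf)]
print(levels)
```

For a perfectly even community of $S$ species every order gives the same value, $\ln(S)$.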
Shannon's indices of diversity and evenness
Shannon's diversity index (usually symbolized by H or H')
is one of the most well-known and widely-used diversity indices,
which is rather surprising considering that it turns out to be
poorer than other indices on most criteria. Maybe the people that
use it don't read the literature on diversity indices!
It comes out of the work on information theory done independently by
Shannon and by Wiener, and hence is sometimes referred to as
the Shannon-Wiener index. The formula is:
$I_{Shannon} = H = -\sum p_i \ln(p_i)$
Warning: Not everyone uses natural logarithms, $\ln$ or $\log_e$;
Shannon indices based on $\log_2$ or $\log_{10}$ are
also out there in the literature, so check before trying to
interpret the values.
Shannon's index is one of Rényi's family of entropy
measures (see above) and Hill's $\mathrm{N}_1 = e^H$.
Shannon's index of evenness is calculated from the diversity
index. The value of $H$ when all species are equally abundant
(i.e. perfect evenness) is simply $\ln(S)$, and Shannon's
evenness index is:
$E_{Shannon} = H / \ln(S)$
See Magurran (2004, pp106-109).
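A minimal Python sketch of Shannon's two indices follows. The `base` argument is my own addition, included only to show how much the choice of logarithm changes the number you get; the function names and example data are hypothetical.

```python
import math


def shannon(counts, base=math.e):
    """Shannon's H = -sum(p_i * log(p_i)); natural logs by default."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    return -sum(x * math.log(x) for x in p) / math.log(base)


def shannon_evenness(counts):
    """E_Shannon = H / ln(S), where S is the number of species present."""
    S = sum(1 for c in counts if c > 0)
    if S < 2:
        raise ValueError("evenness is undefined for fewer than two species")
    return shannon(counts) / math.log(S)


# Hypothetical sample: three species with 50, 30 and 20 individuals.
sample = [50, 30, 20]
print(shannon(sample), shannon(sample, base=2), shannon_evenness(sample))
```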
Simpson's indices of dominance, diversity and evenness
Simpson devised a measure of dominance which could be applied to
a large population, and which could be estimated without bias from a
sample.
He asked, "If I draw two individuals at
random from this community, what is the probability that they will
belong to the same species?" This is biologically meaningful, and
helps to explain (for example) why wind pollination works in
low-diversity forests but not in high-diversity forests (where the
probability that pollen released by one individual will land on the
flower of an individual of the same species is very low).
If the
community is very big, removing one individual will not make any
difference to the probability of drawing the same species a second
time. The probability of drawing species $i$ the first time is
$p_i$ and the
probability of drawing species $i$ twice is $p_i^2$.
Add up the values for all species and you get Simpson's index of
dominance for a large community (strictly, an infinite
community):
$D_{Simpson} = \sum p_i^2$
For a small sample, the probability of drawing species $i$ the
second time is not the same as the first time, as there are now
fewer individuals of species $i$ in the sample. The
probability for the first draw is still $p_i = n_i/N$, but for the second draw it
is $(n_i - 1) / (N - 1)$.
So for a finite sample we have:
$D_{Simpson} = \sum n_i (n_i - 1) / \left( N (N - 1) \right)$
Calculating $D_{Simpson}$
this way for a random sample drawn from a community gives you
an unbiased estimate of Simpson's index for the community.
(Other indices calculated from sample data systematically
underestimate the value for the community.)
There are two options for
Simpson's index of diversity; we can use either the complement
or the inverse of the index of dominance:
$I_{CompSimp} = 1 - D_{Simpson}$ or
$I_{InvSimp} = 1 / D_{Simpson}$
In practice there's little confusion, as the complement is always
< 1, and the reciprocal is never below 1. I prefer the
reciprocal, as it is analogous to Hill's $\mathrm{N}_2$
and has a clear biological meaning. It's worth noting that for
large communities $I_{InvSimp} = \mathrm{N}_2$.
$\mathrm{N}_2$
calculated for a sample tends to be lower than the true value
for the community, and $I_{InvSimp}$ is actually a
better estimate of $\mathrm{N}_2$ for the community, as it
incorporates a small sample correction.
Simpson's index of evenness is simply:
$E_{Simpson} = I_{InvSimp} / S$
A small, very even sample can result in a value for
$E_{Simpson} > 1$, i.e. community evenness seems to be
better than perfect! Remember that Simpson's indices aim to be
unbiased estimators of the true values for the community, so
overestimates and underestimates are equally frequent. If the true
evenness is 1, then a value > 1 is just as likely as a value < 1.
Like all evenness measures, $E_{Simpson}$ depends heavily on the value used for $S$, which is strongly
affected by sampling and bears little resemblance to the true value.
See Magurran (2004, pp114-116)
and Simpson (1949).
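To make the difference between the two forms of Simpson's index concrete, here is a short Python sketch. The function names, the `finite` switch and the example sample are my own choices, and the finite-sample form assumes integer counts.

```python
def simpson_dominance(counts, finite=True):
    """Simpson's index of dominance.

    finite=True  : sum(n_i * (n_i - 1)) / (N * (N - 1)), for a sample
    finite=False : sum(p_i ** 2), for a very large community
    """
    N = sum(counts)
    if finite:
        return sum(c * (c - 1) for c in counts) / (N * (N - 1))
    return sum((c / N) ** 2 for c in counts)


def simpson_indices(counts):
    """Dominance, both diversity indices and evenness for a sample."""
    d = simpson_dominance(counts)
    S = sum(1 for c in counts if c > 0)
    inverse = 1.0 / d    # fails if every species is a singleton (d == 0)
    return {
        "dominance": d,
        "complement": 1.0 - d,
        "inverse": inverse,
        "evenness": inverse / S,
    }


# A small, perfectly even hypothetical sample: two species, five of each.
result = simpson_indices([5, 5])
print(result)
```

For this sample the finite form gives a dominance of 40/90, so the inverse is 2.25 and the evenness is 1.125, an example of the 'better than perfect' values described above.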