A menagerie of diversity indices


Biologists have been extremely creative when it comes to inventing indices of diversity, and a large number have been proposed - if not widely used!

In principle, species diversity is a combination of species richness and species evenness. In practice, people have generally devised what they see as a sensible measure of diversity, and then defined evenness in terms of (their measure of) diversity and species richness.

Which index is 'the best'? There have been attempts to rank the various indices on a range of criteria and see which performs best. See Anne Magurran's book Measuring Biological Diversity (Blackwell Publishing, 2004) for more details.

I'd rather ask, "Which are biologically meaningful?" and that comes down to a pretty short list: Hill's diversity numbers, which include Simpson's index (and also a relative of Shannon's index).

If you want to try out some of these indices in R, you will find scripts and supporting material here.



The symbols used are as follows:

  • N = the total number of individuals in the sample (N - not italic - is used for Hill's numbers).
  • S = the number of species in the sample.
  • n_i = the number of individuals of species i in the sample, Σ n_i = N.
  • p_i = the proportion of individuals of species i in the sample, p_i = n_i/N.
  • I_name = name's index of diversity.
  • D_name = name's index of dominance.
  • E_name = name's index of evenness.
  • ln(x) = the natural logarithm (base e) of x

Number of individuals is not the only possible measure of abundance. In many cases, biomass, area covered, basal area or energy production may be more appropriate. In that case, p_i = the proportion of biomass (or whatever) of species i in the sample. Those indices which use only p_i will still work, but those involving n_i cannot be used.


Berger-Parker index of dominance:

Berger-Parker's index of dominance is simply the proportion of the most common species in the community or sample:

D_BP = p_max

The inverse of the index of dominance is used as an index of diversity and is one of Hill's diversity numbers:

I_BP = 1/D_BP = N_inf

The index is very easy to calculate.
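
As an illustration, here is a minimal Python sketch of the calculation (the page itself points to R; the function name `berger_parker` and the example counts are made up for this example):

```python
def berger_parker(counts):
    """Return (D_BP, I_BP) for a list of species abundances."""
    total = sum(counts)
    d_bp = max(counts) / total   # proportion of the most common species
    return d_bp, 1.0 / d_bp      # dominance, and its inverse (Hill's N_inf)

# Hypothetical sample: four species, 100 individuals in all.
dominance, diversity = berger_parker([50, 30, 15, 5])
print(dominance, diversity)      # 0.5 2.0
```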

See Magurran (2004, p.117), and Berger & Parker (1970).


Brillouin index of diversity

This is a bit difficult to calculate, and even more difficult to know what it might mean biologically. It is:

I_Brillouin = ( ln(N!) - Σ ln(n_i!) ) / N

where N! is N factorial, ie. N × (N-1) × (N-2) × (N-3) × ... × 3 × 2 × 1

Brillouin's index of evenness is based on the diversity index:

E_Brillouin = I_Brillouin / I_BrMax

where I_BrMax is the maximum value of I_Brillouin, reached when all species are equally abundant. In principle we can calculate that by replacing the n_i's with n = N/S, so that Σ ln(n_i!) becomes S × ln(n!). Unfortunately, factorial only works if n is an integer; if it isn't, it gets more complicated! To begin with, we round n down to a whole number, ie. n is the integer part of N/S. The next problem is that the total number of individuals is no longer N, since S × n < N. Put

r = N - S × n

The trick now is to use n + 1 for r species and n for the rest (ie. for S - r species), which adds up to the correct value of N. The formula then becomes:

I_BrMax = ( ln(N!) - r ln((n+1)!) - (S - r) ln(n!) ) / N

Brillouin's indices are not appropriate for estimation of the diversity of a population on the basis of a sample, which is what we are trying to do in most cases! They are appropriate if you have data on the whole population, or if your 'sample' is not a random sample and you cannot make inferences about the population anyway.

The calculation can be problematic even with a computer, as N! becomes huge for quite normal sample sizes, and the computer then chokes on ln(N!). In R, you should use the 'lfactorial' function, which handles the factorial bit internally and returns the logarithm.
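
The same trick works outside R. Here is a minimal Python sketch, assuming that `math.lgamma(k + 1)` (which returns ln(k!) without ever forming k!) plays the role of 'lfactorial'; the function names are made up for this example.

```python
from math import lgamma

def ln_factorial(k):
    """ln(k!) computed directly, so huge factorials never appear."""
    return lgamma(k + 1)

def brillouin(counts):
    """Brillouin's diversity: ( ln(N!) - sum of ln(n_i!) ) / N."""
    n_total = sum(counts)
    return (ln_factorial(n_total)
            - sum(ln_factorial(n) for n in counts)) / n_total

def brillouin_max(n_total, s):
    """Largest possible value for N individuals spread over S species."""
    n, r = divmod(n_total, s)   # n = integer part of N/S, r = N - S*n
    return (ln_factorial(n_total)
            - r * ln_factorial(n + 1)        # r species get n + 1 individuals
            - (s - r) * ln_factorial(n)) / n_total   # the rest get n

def brillouin_evenness(counts):
    present = [n for n in counts if n > 0]   # ignore species with no individuals
    return brillouin(present) / brillouin_max(sum(present), len(present))

# Hypothetical counts; N = 2000 would overflow a naive factorial.
print(brillouin_evenness([1200, 500, 200, 100]))
```

Note that the sketch drops zero counts before counting S, which is a choice made here rather than something the formula dictates.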

See Magurran (2004, p.113).


Fisher's alpha or Log series alpha

Ronald Fisher proposed fitting a 'log series' curve to species abundance. This model predicts that:

  • α x  species will have abundance n_i = 1,
  • α x^2 / 2  species will have abundance n_i = 2,
  • α x^3 / 3  species will have abundance n_i = 3,
  • ... and so on.

The value of α (alpha) which gives predicted values closest to the observed values is used as a measure of diversity. Several authors insist that it is a valid measure even if the actual abundance curve is nothing like a log series curve, which is often going to be the case when S is small (less than several hundred).

Note that Fisher's alpha applies to samples, and it's not clear what it might mean when applied to a large community, as it implies that a high proportion of species have only one individual. It can't be used for an infinite community consisting of a finite number of species.

Computation involves finding the value of α which best fits the data, usually with successive approximations. The 'vegan' package in R includes the function 'fisher.alpha'.
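
To show what 'successive approximations' can look like, here is a minimal Python sketch. It does not fit the whole abundance curve; it uses the usual shortcut of solving S = α ln(1 + N/α) for α, with x = N/(N + α). Both relations are standard log-series results rather than anything stated on this page, and the function names are made up for this example.

```python
from math import log

def fisher_alpha(n_individuals, n_species, tol=1e-10):
    """Solve S = alpha * ln(1 + N/alpha) for alpha by bisection."""
    if not 0 < n_species < n_individuals:
        raise ValueError("need 0 < S < N")

    def excess(alpha):
        # Predicted number of species minus the observed number.
        return alpha * log(1 + n_individuals / alpha) - n_species

    lo, hi = 1e-9, 1.0
    while excess(hi) < 0:        # widen the bracket until it contains the root
        hi *= 2
    while hi - lo > tol * hi:    # halve the bracket until it is narrow enough
        mid = (lo + hi) / 2
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def log_series_expected(alpha, n_individuals, max_abundance):
    """Expected number of species with 1, 2, ..., max_abundance individuals."""
    x = n_individuals / (n_individuals + alpha)
    return [alpha * x ** k / k for k in range(1, max_abundance + 1)]

# Hypothetical sample: 50 species among 1000 individuals.
alpha = fisher_alpha(1000, 50)
print(alpha, log_series_expected(alpha, 1000, 3))
```

Bisection is slow but hard to break; the predicted species count rises steadily with α, so the bracket always closes on a single root.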

See Magurran (2004, pp 28-32 and 102-103), and Fisher et al (1943)


Hill's diversity numbers

Hill produced a family of diversity numbers, corresponding to the 'effective species richness', in which rare species are given progressively less weight than common species. The general formula is:

N_a = ( Σ p_i^a )^(1/(1-a))

The parameter a can take any value, but we are usually only interested in positive values. This formula doesn't work for a = 1, where the exponent 1/(1-a) is undefined, though N_1 can still be calculated (see below).

 The main ones of interest are:

  • N_0 = species richness (all species, rare or common, count equally),
  • N_1 = e^H, where H is Shannon's index, H = - Σ p_i ln(p_i)
  • N_2 = 1/Simpson's index (without the small sample correction),
  • N_inf = 1/Berger-Parker index

The numbers go down steadily as a increases: N_0 is the biggest, N_inf the smallest.

Hill also suggested a family of evenness measures:

E_a,b = N_a / N_b  where  a > b

He did not favour measures involving species richness, N_0, because it is so difficult to estimate - he considered it to be more dependent on sampling than on the actual richness of the community sampled. He preferred E_2,1.
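
Here is a minimal Python sketch of Hill's numbers and the evenness ratio, assuming the general form N_a = ( Σ p_i^a )^(1/(1-a)), with a = 1 and a = infinity handled as the special cases listed above. The function names and example counts are made up for this example.

```python
from math import exp, inf, log

def hill_number(counts, a):
    """Hill's diversity number of order a for a list of abundances."""
    total = sum(counts)
    props = [n / total for n in counts if n > 0]
    if a == 1:                    # limiting case: exp(Shannon's H)
        return exp(-sum(p * log(p) for p in props))
    if a == inf:                  # limiting case: 1 / Berger-Parker dominance
        return 1 / max(props)
    return sum(p ** a for p in props) ** (1 / (1 - a))

def hill_evenness(counts, a, b):
    """Hill's evenness E_a,b = N_a / N_b (intended for a > b)."""
    return hill_number(counts, a) / hill_number(counts, b)

counts = [50, 30, 15, 5]          # hypothetical sample
profile = [hill_number(counts, a) for a in (0, 1, 2, inf)]
print(profile)                    # values fall as a increases
print(hill_evenness(counts, 2, 1))
```

For a perfectly even sample every Hill number equals S, so every evenness ratio comes out as 1.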

Individual Hill numbers are quite easy to calculate on a computer. The 'vegan' package in R includes the function 'renyi', which, with the parameter 'hill = TRUE', will calculate a whole set of Hill numbers in one go.

See Hill (1973)


Margalef's index

Strictly speaking, Margalef's diversity index is not a measure of diversity, as it does not include any component of evenness. It is an attempt to estimate species richness independently of the sample size. The index is:

I_Margalef = (S - 1) / ln(N)

The index will be independent of the number of individuals in the sample only if the relationship between S (or S - 1) and ln(N) is linear. Unfortunately, this is rarely true, as illustrated in the graph, which is based on the results of ant surveys at La Selva in Costa Rica (Longino et al, 2002).

Margalef's index is sometimes touted as a way to compare species richness in two communities where different numbers of individuals have been collected. The proper way to do that is to use rarefaction.

See Magurran (2004, pp 76-77).


Menhinick's index

Menhinick's index is similar to Margalef's index (see above). It is:

I_Menhinick = S / √N

And it has the same problems: the graph of S versus root N is typically not linear, as indicated by the species accumulation curve for ants in La Selva (Longino et al, 2002).
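
Both indices are one-liners. Here is a minimal Python sketch, taking Margalef's index as (S - 1) / ln(N) and Menhinick's as S divided by the square root of N; the function names and the numbers are made up for this example.

```python
from math import log, sqrt

def margalef(n_species, n_individuals):
    """Margalef's index: (S - 1) / ln(N)."""
    return (n_species - 1) / log(n_individuals)

def menhinick(n_species, n_individuals):
    """Menhinick's index: S / sqrt(N)."""
    return n_species / sqrt(n_individuals)

# Hypothetical survey: 20 species among 400 individuals.
print(margalef(20, 400), menhinick(20, 400))
```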

See Magurran (2004, pp 76-77).


The Q statistic

The Q statistic is a measure of the distribution of species abundances in the inter-quartile interval, excluding the rarest quarter of species and the most abundant quarter of species. This is what we do:

  1. Find Q1 = S / 4 and Q3 = 3S / 4, rounding them up to whole numbers if necessary.
  2. Sort the n_i values from smallest to largest, and note the abundance n_i of the Q1-th and Q3-th species, n_Q1 and n_Q3.
  3. Count the number of species with abundances between n_Q1 and n_Q3, ie. n_Q1 < n_i < n_Q3.
  4. Count the number of species with abundances equal to n_Q1 or n_Q3 and divide by 2.
  5. Add together the results of 3. and 4.
  6. Calculate ln(nQ3) - ln(nQ1)
  7. Divide the result from 5. by the result from 6. And that's it!
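
The steps above translate almost line for line into code. Here is a minimal Python sketch; the function name and the example counts are made up, and raising an error when the two quartile abundances coincide (which would mean dividing by zero) is a choice made for this example, not part of the recipe.

```python
from math import ceil, log

def q_statistic(counts):
    """Q statistic, following the seven steps listed above."""
    abundances = sorted(n for n in counts if n > 0)   # step 2: sort
    s = len(abundances)
    q1 = ceil(s / 4)                                  # step 1: round up
    q3 = ceil(3 * s / 4)
    n_q1 = abundances[q1 - 1]                         # abundance of the Q1-th species
    n_q3 = abundances[q3 - 1]                         # abundance of the Q3-th species
    if n_q1 == n_q3:
        raise ValueError("quartile abundances are equal; Q is undefined")
    between = sum(1 for n in abundances if n_q1 < n < n_q3)           # step 3
    on_quartiles = sum(1 for n in abundances if n in (n_q1, n_q3))    # step 4
    return (between + on_quartiles / 2) / (log(n_q3) - log(n_q1))     # steps 5-7

# Hypothetical sample with 8 species: Q1 = 2, Q3 = 6.
q = q_statistic([1, 1, 2, 3, 5, 8, 13, 40])
print(q)
```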

I'm not quite sure what all this means in terms of ecology, but there we are.

See Magurran (2004, pp103-106).


Rényi's entropy levels

The easiest way to approach Rényi's entropy levels is to look first at Hill's diversity numbers (although historically it's the other way round, Rényi's work preceded Hill's). Like Hill, Rényi devised a series of measures depending on how much weight is given to rare species, and the relationship is simply:

H_a = ln(N_a)

where N_a is Hill's diversity number of order a.

Rényi's H_1 is Shannon's diversity index.
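
Here is a minimal Python sketch of the entropy calculation. It assumes the closed form H_a = ln( Σ p_i^a ) / (1 - a), which is what ln(N_a) works out to for Hill's numbers, and falls back to Shannon's formula at a = 1. The function name is made up for this example.

```python
from math import log

def renyi_entropy(counts, a):
    """Rényi's entropy of order a (natural logarithms)."""
    total = sum(counts)
    props = [n / total for n in counts if n > 0]
    if a == 1:                    # limiting case: Shannon's index
        return -sum(p * log(p) for p in props)
    return log(sum(p ** a for p in props)) / (1 - a)

counts = [50, 30, 15, 5]          # hypothetical sample
print([renyi_entropy(counts, a) for a in (0, 1, 2)])
```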

Although Shannon and Rényi provide simple ways of measuring the 'entropy' or 'information content' of biological communities, the concepts have not been used very much in ecology in the intervening four decades. I tend to agree with Hill that these "apparent lapses into thermodynamics and entropy" are perplexing rather than illuminating.

The 'vegan' package in R includes the function 'renyi', which will calculate a whole set of Rényi's  entropy levels in one go.

See Hill (1973) and Rényi (1961)


Shannon's indices of diversity and evenness

Shannon's diversity index (usually symbolized by H or H') is one of the most well-known and widely-used diversity indices, which is rather surprising considering that it turns out to be poorer than other indices on most criteria. Maybe the people that use it don't read the literature on diversity indices!

It comes out of work on information theory by Shannon and, independently, Wiener, and was later applied to species abundances by ecologists; hence it is sometimes referred to as the Shannon-Wiener index. The formula is:

I_Shannon = H = - Σ p_i ln(p_i)

Warning: Not everyone uses natural logarithms, ln or log_e; Shannon indices based on log_2 or log_10 are also out there in the literature, so check before trying to interpret the values.

Shannon's index is one of Rényi's family of entropy measures (see above) and Hill's N_1 = e^H.

Shannon's index of evenness is calculated from the diversity index. The value of H when all species are equally abundant (ie. perfect evenness) is simply ln(S), and Shannon's evenness index is:

E_Shannon = H / ln(S)
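
Here is a minimal Python sketch of both Shannon indices. The optional `base` argument is an addition made for this example, to show how values reported in log2 or log10 relate to the natural-log version; the function names are likewise made up.

```python
from math import log

def shannon(counts, base=None):
    """Shannon's H; natural logarithms unless another base is given."""
    total = sum(counts)
    props = [n / total for n in counts if n > 0]
    h = -sum(p * log(p) for p in props)
    return h if base is None else h / log(base)   # change of logarithm base

def shannon_evenness(counts):
    """H / ln(S); undefined (division by zero) when only one species is present."""
    s = sum(1 for n in counts if n > 0)
    return shannon(counts) / log(s)

counts = [50, 30, 15, 5]          # hypothetical sample
print(shannon(counts), shannon(counts, base=2), shannon_evenness(counts))
```

The evenness value does not depend on the base of the logarithm, as long as the same base is used for H and for ln(S).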

See Magurran (2004, pp106-109).


Simpson's indices of dominance, diversity and evenness

Simpson devised a measure of dominance which could be applied to a large population, and which could be estimated without bias from a sample.

He asked, "If I draw two individuals at random from this community, what is the probability that they will belong to the same species?" This is biologically meaningful, and helps to explain (for example) why wind pollination works in low-diversity forests but not in high-diversity forests (where the probability that pollen released by one individual will land on the flower of an individual of the same species is very low).

If the community is very big, removing one individual will not make any difference to the probability of drawing the same species a second time. The probability of drawing species i the first time is p_i and the probability of drawing species i twice is p_i^2. Add up the values for all species and you get Simpson's index of dominance for a large community (strictly, an infinite community):

D_Simpson = Σ p_i^2

For a small sample, the probability of drawing species i the second time is not the same as the first time, as there are now fewer individuals of species i in the sample. The probability for the first draw is still p_i = n_i/N, but for the second draw it is (n_i - 1) / (N - 1). So for a finite sample we have:

D_Simpson = Σ n_i (n_i - 1) / ( N (N - 1) )

Calculating D_Simpson this way for a random sample drawn from a community gives you an unbiased estimate of Simpson's index for the community. (Other indices calculated from sample data systematically underestimate the value for the community.)
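
The two versions of the dominance index are easy to compare side by side. Here is a minimal Python sketch; the function names and counts are made up for this example.

```python
def simpson_dominance_infinite(counts):
    """Sum of p_i squared: two draws with replacement."""
    total = sum(counts)
    return sum((n / total) ** 2 for n in counts)

def simpson_dominance_sample(counts):
    """Sum of n_i(n_i - 1), divided by N(N - 1): two draws without replacement."""
    total = sum(counts)
    return sum(n * (n - 1) for n in counts) / (total * (total - 1))

counts = [50, 30, 15, 5]          # hypothetical sample of 100 individuals
print(simpson_dominance_infinite(counts))   # about 0.365
print(simpson_dominance_sample(counts))     # about 0.3586
```

The gap between the two shrinks as N grows, which is the 'very big community' argument in numbers.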

There are two options for Simpson's index of diversity; we can use either the complement or the inverse of the index of dominance:

I_CompSimp = 1 - D_Simpson   or   I_InvSimp = 1 / D_Simpson

In practice there's little confusion, as the complement is always < 1, and the reciprocal is always > 1. I prefer the reciprocal, as it is analogous to Hill's N2 and has a clear biological meaning.

It's worth noting that for large communities I_InvSimp = N_2. N_2 calculated for a sample tends to be lower than the true value for the community, and I_InvSimp is actually a better estimate of N_2 for the community, as it incorporates a small-sample correction.

Simpson's index of evenness is simply:

E_Simpson = I_InvSimp / S

A small, very even sample can result in a value for E_Simpson > 1, ie. community evenness seems to be better than perfect! Remember that Simpson's indices aim to be unbiased estimators of the true values for the community, so overestimates and underestimates are equally frequent. If the true evenness is 1, then a value >1 is just as likely as a value <1.

Like all evenness measures, E_Simpson depends heavily on the value used for S, which is strongly affected by sampling and bears little resemblance to the true value.
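
Here is a minimal Python sketch of the diversity and evenness indices, using the finite-sample form of the dominance index, with a tiny made-up sample that shows evenness coming out above 1. The function names are invented for this example.

```python
def simpson_dominance(counts):
    """Finite-sample dominance: sum of n_i(n_i - 1), divided by N(N - 1)."""
    total = sum(counts)
    return sum(n * (n - 1) for n in counts) / (total * (total - 1))

def simpson_indices(counts):
    """Complement, inverse and evenness; fails if D is 0 (all species singletons)."""
    d = simpson_dominance(counts)
    s = sum(1 for n in counts if n > 0)
    return {"complement": 1 - d, "inverse": 1 / d, "evenness": (1 / d) / s}

# Three species with two individuals each: small and perfectly even.
result = simpson_indices([2, 2, 2])
print(result)   # inverse is about 5, so evenness is about 5/3
```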

See Magurran (2004, pp114-116) and Simpson (1949).



Page updated 26 Oct 2007 by Mike Meredith