Bayes in Brief

Objectives

Bayesian methods are increasingly used in the analysis of wildlife data. In general, they provide results which are more easily applied to management decisions and avoid some of the paradoxical features of the frequentist approach which dominated statistics during the 20th century.

Here we provide an introduction to the Bayesian approach and link to other pages with examples of simple analyses.


Two questions

Look at the following two questions and see what answers might be plausible:

  1. What is the probability that there are clouds in the sky if you already know that it is raining?

prob(clouds | rain) = ?

  2. What is the probability that it is raining if there are clouds in the sky?

prob(rain | clouds) = ?

(The vertical bar, |, can be read as "given", e.g. "clouds given rain".)

The questions, and the answers, are very different: prob(clouds | rain) = 1, while prob(rain | clouds) is much less, maybe only 0.2.

A medical example

John Doe has just been tested for a disease, D, and the result was positive. The sensitivity of the test is 95%, i.e. 95% of people with D test positive; so the probability that a randomly selected person with the disease tests positive is:

 prob(+ | D) = 0.95.

So we can be 95% sure that John has D, right?

Well, we know that prob(+ | D) = 0.95. What we want is prob(D | +), the probability of having D given that the test result was positive, which is not the same thing!

We need two additional pieces of information:

  • The specificity of the test is 97%, i.e. 97% of people without D test negative (and 3% test positive, known as false positives), so prob(+ | not D) = 0.03.
  • The prevalence of the disease, the probability that a person taken at random from the population has the disease: suppose this is 1%, so prob(D) = 0.01

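In a hypothetical population of 10,000, we would expect:

  • 100 people with D, of whom 95 test positive and 5 test negative;
  • 9,900 people without D, of whom 297 test positive and 9,603 test negative.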

We would expect 297 people without D to test positive, compared with only 95 with D. So only 24% of people who test positive actually have D. John is probably okay!
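The expected counts can be checked with a few lines of Python. This is only a sketch using the figures given in the text (sensitivity 95%, specificity 97%, prevalence 1%, population 10,000); the variable names are our own.

```python
# Expected counts in a hypothetical population, using the figures from the text.
population = 10_000
prevalence = 0.01      # prob(D)
sensitivity = 0.95     # prob(+ | D)
specificity = 0.97     # prob(- | not D)

with_d = population * prevalence                  # people who have D
without_d = population - with_d                   # people who do not have D
true_positives = with_d * sensitivity             # have D and test positive
false_positives = without_d * (1 - specificity)   # do not have D but test positive

# Share of all positive results that come from people who really have D.
share_with_d = true_positives / (true_positives + false_positives)

print(round(true_positives), round(false_positives))
print(round(share_with_d, 4))
```

The two rounded counts should match the 95 and 297 quoted above, and the share should come out at about 24%.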

Bayes' Rule

To put this mathematically:

prob(D | +) ∝ prob(+ | D) x prob(D)

This is Bayes' Rule (sometimes called Bayes' Law or Bayes' Theorem), first written about by Thomas Bayes and published in 1763.

A bit of terminology:

  • prob(D) is called the prior probability, the probability that John has D before we get the test results (prior = before), this is 1% in our example;
  • prob(+ | D) is the likelihood of getting a positive result if John has D; this is 95%;
  • prob(D | +) is the posterior probability, the probability that John has D after we see the test results (posterior = after), this is 24%.

In words: posterior probability is proportional to prior probability multiplied by likelihood.

To actually do the calculation, we need to turn the relationship into an equation using a constant, C:

prob(D | +) = prob(+ | D) x prob(D) / C

To calculate C, we use the fact that John either has D or does not have D, so:

C = prob(+ | D) x prob(D) + prob(+ | not D) x prob(not D)

For the John Doe example, we have:

C = 0.95 x 0.01 + 0.03 x 0.99 = 0.0392, so

prob(D | +) = 0.95 x 0.01 / 0.0392 = 0.2423
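The same calculation can be wrapped in a small Python function. This is a sketch, not a general-purpose tool: the function name posterior and its argument names are our own, and it handles only the two-outcome case (D or not D) with a positive test result.

```python
def posterior(prior, sensitivity, specificity):
    """Return prob(D | +) given prob(D), prob(+ | D) and prob(- | not D)."""
    false_positive_rate = 1 - specificity   # prob(+ | not D)
    # C = prob(+ | D) x prob(D) + prob(+ | not D) x prob(not D)
    c = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / c


# John Doe example: prevalence 1%, sensitivity 95%, specificity 97%.
p = posterior(prior=0.01, sensitivity=0.95, specificity=0.97)
print(round(p, 4))
```

Calling the function with a different prior shows how strongly the answer depends on the prevalence of the disease.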

Note that this works just as well if there are more than two possibilities: we just fix C so that all the probabilities add up to 1.
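As an illustration of the general case, the sketch below multiplies each prior by its likelihood and then divides by the total, so that the posterior probabilities add up to 1. The function name normalise is our own, and the three-possibility numbers are invented purely for illustration; they do not come from any real data.

```python
def normalise(priors, likelihoods):
    """Multiply each prior by its likelihood, then rescale so the results sum to 1."""
    unnormalised = [p * like for p, like in zip(priors, likelihoods)]
    c = sum(unnormalised)   # the constant C
    return [u / c for u in unnormalised]


# Two possibilities (D, not D), using the John Doe figures.
two = normalise(priors=[0.01, 0.99], likelihoods=[0.95, 0.03])
print([round(x, 4) for x in two])

# Three possibilities with made-up priors and likelihoods (illustration only).
three = normalise(priors=[0.5, 0.3, 0.2], likelihoods=[0.1, 0.4, 0.7])
print([round(x, 4) for x in three], sum(three))
```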


Probability

In the last two sections we've used "probability" in a way which many statisticians would object to! Let's look at various meanings for the term.

Mechanical (or classical) interpretation: This applies to events such as the toss of a coin or the roll of a die, where all possible outcomes are known and all are equally probable. So for a (fair!) six-sided die, the probability of a four (say) is 1/6. If the die is biased, so that all outcomes are not equally probable, this won't work. Apart from games, this interpretation is not very useful.

Frequentist interpretation: Probability is the relative frequency of an outcome when a trial (or an experiment) is repeated many times. So to find the probability of a four when you roll a die, you have to roll the die a large number of times; this works for biased dice too. We can also deal with probability of capture, provided we can imagine the same trapping survey being carried out in the same way a huge number of times.
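The relative-frequency idea can be illustrated by simulating a biased die. This is a sketch under assumptions of our own: the bias (a four is three times as likely as any other face, so its true probability is 3/8) is invented, and the function name relative_frequency is hypothetical. It uses random.choices from the Python standard library with a fixed seed so that a run can be repeated.

```python
import random


def relative_frequency(weights, face, rolls, seed=1):
    """Roll a six-sided die `rolls` times; return the share of rolls showing `face`."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    faces = [1, 2, 3, 4, 5, 6]
    results = rng.choices(faces, weights=weights, k=rolls)
    return results.count(face) / rolls


# Hypothetical bias: a four is three times as likely as each other face.
biased = [1, 1, 1, 3, 1, 1]
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(biased, face=4, rolls=n))
```

As the number of rolls grows, the relative frequency should settle close to 3/8 = 0.375, which is the frequentist's probability of a four for this die.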

We can talk about "the probability that a person selected at random has D", but not "the probability that John Doe has D", since the idea of relative frequency doesn't apply to a single person. Nor can we talk about the probability that there are fewer than 500 orangutans in Sarawak, that tigers will be extinct in the wild by 2050, or that deploying a further 20 rangers will stop encroachment in Khao Yai National Park. (Note that these are the kind of questions asked by politicians and managers.)

Bayesian (or subjectivist) interpretation: Here 'probability' means 'degree of belief' in a particular situation or outcome. Now we can talk about the probability that John Doe has D, and the kind of management questions mentioned above.

Different people may assign different probabilities to the same situation, depending on their existing information. Bayes' Rule provides a rigorous method for updating existing beliefs in the light of new data, and people's probabilities will converge if they are presented with the same data. Nevertheless, some feel that these subjective probabilities are incompatible with the needs of science for objectivity.
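The convergence of beliefs can be sketched with the medical test from earlier. Two people start with different prior probabilities that John has D and both see the same run of positive test results. The starting priors (1% and 30%), the labels, and the assumption that repeated tests are independent given John's true disease status are ours, made only for illustration.

```python
def update(prior, sensitivity=0.95, specificity=0.97):
    """Apply Bayes' Rule once, after seeing one positive test result."""
    c = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / c


# Hypothetical starting beliefs about prob(D) for two different people.
beliefs = {"person A": 0.01, "person B": 0.30}
gaps = []

# Assumption: each repeat test is independent given John's true status.
for test in range(1, 5):
    beliefs = {name: update(p) for name, p in beliefs.items()}
    gap = abs(beliefs["person A"] - beliefs["person B"])
    gaps.append(gap)
    print(test, {name: round(p, 4) for name, p in beliefs.items()}, round(gap, 4))
```

With these values the two beliefs are further apart after the first result than before it, but further results pull them together, and after four positive results they differ by less than 0.001.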

The objectivity of frequentist methods may be exaggerated. The choice of significance level or the interpretation of a p-value is subjective, even if sanctioned by wide usage. Moreover, the result depends on data which could have been observed but were not, which depends in turn on the design of the study. Rigorous adherence to an arbitrary protocol may not be enough to guarantee objectivity!

Whatever the scientific standing of Bayesian probability statements, it is clear that they provide usable input to real-world management and policy decisions.


Further information

A good introduction to Bayesian statistics, which does not assume too much knowledge of maths or statistics ("Bayes 101"), is Bolstad 2004.

Text by Mike Meredith, updated 24 March 2010