Frogs 3 : AIC and model selection
Objectives

In "Frogs 1 : binomial distribution" we simulated data for detections of frogs at ponds where we knew frogs were present. Then, in "Frogs 2 : likelihood", we used those data to introduce the concept of likelihood and maximum likelihood estimation. Here we will simulate another set of detection data to represent the results of a second visit to the same ponds. If the results of the two visits are different, there are two possibilities:

- the probability of detection was the same on both visits, and the difference is due to chance;
- the probability of detection was different on the two visits.
We will formulate these as two different models and use likelihood and AIC (Akaike's Information Criterion) to judge which is the better model. We will also see how uncertainty about the right model can be allowed for through model averaging. The data simulation and the likelihood calculations are in the Excel file "Frogs3_models.xls".

Simulating the second set of data

The first set of data was generated by rolling a 10-sided die with spots on some of the sides. If the die landed with a spot on top, the frogs were detected. The first time, I recorded x1 = 6 detections out of 10. The second time, I used a different die, so the actual number of spots could be the same or different. I rolled the die 10 times and got the following results (1=detected, 0=not detected):
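for a total of x2 = 3 detections out of n = 10. The rest of the calculations on this page use these simulated data, x1 = 6 and x2 = 3. If you have different simulated observations, you will get different results all the way through!

Two different models

We now consider two possible models:

- Model 1: the probability of detection is the same on both occasions, p(detect).
- Model 2: the probability of detection is different on the two occasions, p1 and p2.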
We'll calculate the likelihood for each of these models.

Model 1

To find the likelihood of p(detect) with x1 = 6 and x2 = 3, we multiply together the individual likelihoods:

L(p | x1 = 6, x2 = 3) = L(p | x1 = 6) × L(p | x2 = 3)
We use the binomial distribution to get the values for the likelihoods, which are calculated in the "Likelihoods" tab of the spreadsheet "Frogs3_models.xls". For my data, I look along the p=0.45 row and note the values in the columns for x = 6 and x = 3:
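The two spreadsheet values and their product can also be reproduced with a short Python sketch. It is not the author's spreadsheet: the helper name `binom_pmf` is an assumption, and so is the use of the pooled estimate (9 detections in 20 trials), which corresponds to the p = 0.45 row used here.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability of x detections in n trials with p(detect) = p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n = 10           # ponds visited on each occasion
x1, x2 = 6, 3    # detections on the first and second visit

# Model 1: a single detection probability shared by both visits.
# Pooled estimate: (6 + 3) / 20 = 0.45, the row looked up in the text.
p_hat = (x1 + x2) / (2 * n)

lik_x1 = binom_pmf(x1, n, p_hat)   # likelihood of p_hat given x1
lik_x2 = binom_pmf(x2, n, p_hat)   # likelihood of p_hat given x2
lik_model1 = lik_x1 * lik_x2       # joint likelihood for Model 1

print(f"L(p={p_hat:.2f} | x=6) = {lik_x1:.3f}")
print(f"L(p={p_hat:.2f} | x=3) = {lik_x2:.3f}")
print(f"Model 1 likelihood   = {lik_model1:.3f}")   # 0.027, as quoted in the text
```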
Model 2

This model has two different values of p(detect), one for the first occasion (p1) and another for the second occasion (p2). My estimates were p1 = 6/10 = 0.6 and p2 = 3/10 = 0.3.
Referring to the likelihoods calculated in the spreadsheet, I get:
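The same lookup can be sketched in Python. The function `joint_likelihood` is a hypothetical helper, and the estimates p1 = 6/10 and p2 = 3/10 are assumed to be the maximum likelihood estimates x/n for each occasion.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability of x detections in n trials with p(detect) = p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def joint_likelihood(xs, ps, n):
    """Multiply the binomial likelihoods of independent occasions.

    xs: detections on each occasion; ps: p(detect) assumed for each occasion.
    """
    lik = 1.0
    for x, p in zip(xs, ps):
        lik *= binom_pmf(x, n, p)
    return lik

n = 10
x1, x2 = 6, 3

# Model 2: a separate detection probability for each occasion
# (assumed here to be the per-occasion estimates x/n).
p1_hat, p2_hat = x1 / n, x2 / n

lik_model2 = joint_likelihood([x1, x2], [p1_hat, p2_hat], n)
print(f"Model 2 likelihood = {lik_model2:.3f}")   # 0.067, as quoted in the text
```

Passing the same value twice, `joint_likelihood([6, 3], [0.45, 0.45], 10)`, gives the Model 1 likelihood, so one function covers both models.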
Using log-likelihood

You will see that the likelihood gets smaller and smaller each time we add more data. With lots of data, the likelihood is extremely small. For example, the likelihood for the psi(.) p(.) model for the golden cats analysis in PRESENCE is:

0.000...(70 zeros in here!)...00000000002185

So we prefer to work with the (natural) log of the likelihood, which in this case is: -201.86

Using logs makes no difference to the process of maximum likelihood estimation, as the maximum of the log likelihood always corresponds to the maximum likelihood. The log likelihoods for the two models using my data are:
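As a rough check on both points, the Python sketch below computes the log likelihoods for x1 = 6 and x2 = 3 and confirms, over a grid of candidate values of p, that the likelihood and the log likelihood peak at the same place. The grid spacing of 0.01 and all variable names are assumptions made for illustration.

```python
from math import comb, log

def binom_pmf(x, n, p):
    """Binomial probability of x detections in n trials with p(detect) = p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def log_lik(xs, ps, n):
    """Log likelihood: the log of a product is the sum of the logs."""
    return sum(log(binom_pmf(x, n, p)) for x, p in zip(xs, ps))

n = 10
xs = [6, 3]

loglik_model1 = log_lik(xs, [0.45, 0.45], n)   # one shared p
loglik_model2 = log_lik(xs, [0.6, 0.3], n)     # separate p1 and p2
print(f"Model 1 log likelihood = {loglik_model1:.3f}")
print(f"Model 2 log likelihood = {loglik_model2:.3f}")

# Grid search for Model 1: p = 0.01, 0.02, ..., 0.99
grid = [i / 100 for i in range(1, 100)]
lik = [binom_pmf(6, n, p) * binom_pmf(3, n, p) for p in grid]
loglik = [log_lik(xs, [p, p], n) for p in grid]

best_by_lik = grid[lik.index(max(lik))]
best_by_loglik = grid[loglik.index(max(loglik))]
print(best_by_lik, best_by_loglik)   # both maxima fall at the same p
```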
You need to be careful when comparing negative numbers: the log likelihood for Model 2 is bigger than the log likelihood for Model 1.

Comparing the models

Comparing the two models applied to my simulated data, we see that the likelihood for Model 2 is higher than for Model 1 (0.067 vs 0.027), which means that it is a closer fit to the data we collected. But Model 2 is also more complex, having two parameters (p1 and p2) instead of just one. The model which is the best fit to the data is not necessarily the best representation of reality.

To see the difference, let's look at a different situation. The graph shows the numbers for a hypothetical population of elephants for eight years. The crosses indicate the true population, which has an underlying upward trend. The black dots are the estimates from annual surveys of elephants. The lines represent two models calculated from the survey data. The straight line is a simple model with just 2 parameters; the wiggly line has 8 parameters and fits the data perfectly. In fact, the wiggliness of the wiggly line is due mainly to errors in estimating the population, and the simpler model is closer to the true situation. The wiggly model is described as "over-fitted" or "over-parameterized".
In reality, of course, we don't know the true population; all we have are estimates, which are never quite exact. We see the dots but not the crosses in the diagram. How do we decide which is the best model?

Akaike's Information Criterion (AIC) balances number of parameters and fit to the data (likelihood):

AIC = 2 × (No. of Parameters) - 2 × log(Likelihood)

and a smaller value for AIC indicates a better combination of simplicity and fit to data. The table below, which was calculated in the "Model comparison" tab of the spreadsheet "Frogs3_models.xls", shows the calculation for AIC for the two models for my data:
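The AIC calculation itself is only a few lines of Python. The sketch below is not the spreadsheet's method; the function name `aic` is an assumption, and the parameter counts (1 for Model 1, 2 for Model 2) follow the text. At full precision it gives a difference in AIC of about 0.152. Starting instead from the likelihoods rounded to 0.027 and 0.067 gives about 0.182, which appears to be how the deltaAIC of 0.182 quoted on this page arose. The ranking of the two models is the same either way.

```python
from math import comb, log

def binom_pmf(x, n, p):
    """Binomial probability of x detections in n trials with p(detect) = p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def aic(log_likelihood, n_params):
    """AIC = 2 * (number of parameters) - 2 * log(likelihood)."""
    return 2 * n_params - 2 * log_likelihood

n = 10

# Log likelihoods at full precision
loglik1 = log(binom_pmf(6, n, 0.45)) + log(binom_pmf(3, n, 0.45))
loglik2 = log(binom_pmf(6, n, 0.6)) + log(binom_pmf(3, n, 0.3))

aic1 = aic(loglik1, n_params=1)   # Model 1: one parameter, p
aic2 = aic(loglik2, n_params=2)   # Model 2: two parameters, p1 and p2
delta_full = aic2 - aic1

# Same calculation starting from the likelihoods rounded to 3 decimals
delta_rounded = aic(log(0.067), 2) - aic(log(0.027), 1)

print(f"AIC Model 1 = {aic1:.3f}, AIC Model 2 = {aic2:.3f}")
print(f"deltaAIC (full precision)      = {delta_full:.3f}")
print(f"deltaAIC (rounded likelihoods) = {delta_rounded:.3f}")
```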
Model 1 has a smaller AIC value, and is the better model by this criterion. Again, note that if you are using your own simulated data, you will get different results, and you may find that Model 2 is better than Model 1. Also note that you can't compare AIC values for different data sets: the data used to calculate AIC must be the same.

Going further with AIC

The AIC values enable us to answer more questions about the models. Model 1 is better than Model 2, but is it a lot better, or only a little better? The difference in AIC ("ΔAIC" or "deltaAIC") is only 0.182, which means that Model 1 is only a little better: Burnham and Anderson (2002, page 70) suggest that models with deltaAIC < 2 are all plausible, those with deltaAIC of 4-7 are considerably less so, and deltaAIC > 10 means that the model misses some important explanatory variables.

We can also ask, "How likely is it that Model 2 is in fact the correct model?" We can only calculate the relative likelihoods of the models, but if we take the best model as having Model Likelihood = 1, then for any other model

Model Likelihood = e^(-deltaAIC/2)

For Model 2, that works out to 0.913 (see table below), so Model 2 is almost as likely as Model 1 to be the right one.
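To get a feel for the scale, the Python sketch below applies the Model Likelihood formula to the deltaAIC of 0.182 quoted above and to the threshold values from the Burnham and Anderson guideline. The function name `model_likelihood` is an assumption for illustration.

```python
from math import exp

def model_likelihood(delta_aic):
    """Likelihood of a model relative to the best model (deltaAIC = 0)."""
    return exp(-delta_aic / 2)

# Model 2 in this example, using the deltaAIC quoted in the text
ml_model2 = model_likelihood(0.182)
print(f"Model 2: deltaAIC = 0.182 -> Model Likelihood = {ml_model2:.3f}")

# How quickly relative support falls away as deltaAIC grows
for delta in (0, 2, 4, 7, 10):
    print(f"deltaAIC = {delta:>2} -> Model Likelihood = {model_likelihood(delta):.4f}")
```

A deltaAIC of 2 corresponds to a Model Likelihood of about 0.37, and a deltaAIC of 10 to less than 0.01, which is consistent with treating such models as having very little support.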
If we were quite sure that Model 1 was right, we’d be
confident in our estimates of detection probabilities,
The solution is 'model averaging': we take a weighted average of the estimates from the different models, using weights proportional to the Model Likelihood. As shown in the last column of the table above, AIC weights are simply the Model Likelihood divided by the total of all the Model Likelihoods. For our example, this gives:
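The weighting can be sketched in Python as follows. The per-model estimates are assumptions taken from the earlier steps (0.45 on both occasions under Model 1; 6/10 and 3/10 under Model 2), and the deltaAIC values are those quoted in the text, so the output will differ slightly from a full-precision calculation.

```python
from math import exp

# deltaAIC for each model, as quoted in the text (best model has 0)
delta_aic = {"Model 1": 0.0, "Model 2": 0.182}

# Estimates of (p1, p2) under each model; assumed from the earlier steps
estimates = {"Model 1": (0.45, 0.45), "Model 2": (0.6, 0.3)}

# Model Likelihood = e^(-deltaAIC/2); AIC weight = share of the total
model_lik = {m: exp(-d / 2) for m, d in delta_aic.items()}
total = sum(model_lik.values())
weights = {m: ml / total for m, ml in model_lik.items()}

# Model-averaged estimate: weighted average across the models
p1_avg = sum(weights[m] * estimates[m][0] for m in weights)
p2_avg = sum(weights[m] * estimates[m][1] for m in weights)

for m in weights:
    print(f"{m}: AIC weight = {weights[m]:.3f}")
print(f"Model-averaged p1 = {p1_avg:.3f}")
print(f"Model-averaged p2 = {p2_avg:.3f}")
```

The averaged estimates fall between the single value from Model 1 and the separate values from Model 2, pulled towards whichever model carries more weight.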
AIC is a powerful way of working with models and is used by PRESENCE, DISTANCE and MARK. It can be used to compare models which are completely different in form. But you must use the same data set for all the comparisons. For more on modeling and model fitting with AIC, see Burnham and Anderson (2002).

Main points
Text by Mike Meredith, updated 11 May 2009