Rules of thumb for AIC in model selection


32

I generally use BIC because, as I understand it, it penalizes model complexity more strongly than AIC does. However, I have now decided to take a more comprehensive approach and would like to use AIC as well. I know that Raftery (1995) presents nice guidelines for BIC differences: 0-2 is weak, 2-4 is evidence that one model is better.

I looked in textbooks and they seem strange on AIC (it looks like a larger difference is weak and a smaller difference in AIC means one model is better). That runs counter to what I know I was taught. As I understand it, you want a low AIC.

Does anyone know whether Raftery's guidelines extend to AIC, or of some guidelines I could cite for the "strength of evidence" of one model over another?

And yes, cutoffs are not great (I kind of find them irritating) but they are helpful when comparing different kinds of evidence.


1
Is this (pdf) the Raftery paper you are referring to?
gung - Reinstate Monica

4
Readers here may be interested to read the following excellent CV thread: Is there any reason to prefer the AIC or BIC over the other?
gung - Reinstate Monica

1
Which textbooks are you referring to when you say "I looked in textbooks and they seem strange on AIC (it looks like a larger difference is weak and a smaller difference in AIC means one model is better)" --- and what do they actually say?
Glen_b -Reinstate Monica

1
Your second para is unclear. You probably mean this: While large differences suggest that the model with the smaller value is preferable, smaller differences are difficult to evaluate. Moreover, statisticians are yet to agree on what differences are 'small' or 'large' - Singer and Willett (2003, p. 122)
Hibernating

1
As to your third para, if you want to adopt the categories of evidential strength advanced by Jeffreys (1961, p. 432) I can give you the full reference.
Hibernating

Answers:


23

AIC and BIC hold the same interpretation in terms of model comparison. That is, a larger difference in either AIC or BIC indicates stronger evidence for one model over the other (the lower the value, the better). It's just that AIC doesn't penalize the number of parameters as strongly as BIC does. There is also a correction to the AIC (the AICc) that is used for smaller sample sizes. More information on the comparison of AIC/BIC can be found here.
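To make the difference in penalty concrete, here is a minimal sketch in Python; the log-likelihoods, parameter counts, and sample size are made up purely for illustration, not taken from any real analysis:

import numpy as np

def aic(loglik, k):
    # Akaike information criterion: a penalty of 2 per estimated parameter
    return 2 * k - 2 * loglik

def aicc(loglik, k, n):
    # small-sample correction to AIC
    return aic(loglik, k) + (2 * k * (k + 1)) / (n - k - 1)

def bic(loglik, k, n):
    # Schwarz/Bayesian criterion: a penalty of ln(n) per estimated parameter
    return k * np.log(n) - 2 * loglik

# hypothetical fits: model 2 adds one parameter and gains a little log-likelihood
n = 200
fits = {"model 1": (-310.4, 3), "model 2": (-309.9, 4)}
for name, (ll, k) in fits.items():
    print(name, round(aic(ll, k), 1), round(aicc(ll, k, n), 1), round(bic(ll, k, n), 1))

For n = 200, ln(n) is about 5.3 > 2, so BIC charges the extra parameter more than AIC does; under either criterion the model with the smaller value is the preferred one.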


5
+1. Just to add/clarify: AIC (and AICc) employs KL-divergence. Therefore, exactly because AIC reflects "additional" information, the smaller it is the better. In other words, as our sample size N grows, the model with the minimum AIC score will have the smallest Kullback-Leibler divergence and will therefore be the model closest to the "true" model.
usεr11852 says Reinstate Monica

28

You are talking about two different things and you are mixing them up. In the first case you have two models (1 and 2) and you obtained their AICs, AIC1 and AIC2. If you want to compare these two models based on their AICs, then the model with the lower AIC is the preferred one, i.e. if AIC1 < AIC2 then you pick model 1, and vice versa.
In the second case, you have a set of candidate models (1, 2, ..., n) and for each model you calculate the AIC difference Δi = AICi − AICmin, where AICi is the AIC of the i-th model and AICmin is the minimum AIC among all the models. Models with Δi > 10 have essentially no support and can be omitted from further consideration, as explained in Model Selection and Multi-Model Inference: A Practical Information-Theoretic Approach by Kenneth P. Burnham and David R. Anderson, page 71. So the larger the Δi, the weaker your model. Here the best model has Δi = Δmin = 0.
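A minimal sketch of the second case; the AIC values below are invented purely for illustration:

# hypothetical AIC values for four candidate models
aics = {"m1": 1002.3, "m2": 1000.1, "m3": 1005.6, "m4": 1013.8}

aic_min = min(aics.values())
for name, a in sorted(aics.items(), key=lambda item: item[1]):
    delta = a - aic_min   # delta_i = AICi - AICmin; the best model has delta_i = 0
    print(f"{name}: AIC = {a:.1f}, delta = {delta:.1f}")

# Burnham & Anderson's (2002) rough guide: delta in 0-2 means substantial support,
# delta in 4-7 considerably less support, delta > 10 essentially no support.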


1
Aha! This totally cleared up the "larger than" bit. Thanks!
Tom Carpenter

7

I generally never use AIC or BIC as an objective description of adequate fit for a model. I do use these ICs to compare the relative fit of two predictive models. As far as whether an AIC difference of "2" or "4" matters, it's completely contextual. If you want to get a sense of how well a "good" model fits, you can (and should) always use a simulation. Your understanding of the AIC is right: AIC receives a positive contribution from the number of parameters and a negative contribution from the likelihood. What you're trying to do is maximize the likelihood without loading up your model with a bunch of parameters. So, my bubble-bursting opinion is that cutoffs for AIC are no good out of context.
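To illustrate the simulation idea, here is a minimal sketch of how you could calibrate what size of AIC difference arises just by chance when the simpler model is the true one; the data-generating model and the competing quadratic specification are invented for illustration:

import numpy as np

rng = np.random.default_rng(0)

def gaussian_aic(y, X):
    # AIC of an OLS fit with Gaussian errors (k = coefficients + error variance)
    n, k = len(y), X.shape[1] + 1
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n   # maximum-likelihood estimate of the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * k - 2 * loglik

n, n_sims = 100, 1000
diffs = []
for _ in range(n_sims):
    x = rng.normal(size=n)
    y = 1.0 + 0.5 * x + rng.normal(size=n)            # the "true" model: intercept + x
    X_small = np.column_stack([np.ones(n), x])
    X_big = np.column_stack([np.ones(n), x, x ** 2])  # adds a superfluous quadratic term
    diffs.append(gaussian_aic(y, X_small) - gaussian_aic(y, X_big))

diffs = np.array(diffs)
print("share of runs where the bigger model wins:", (diffs > 0).mean())
print("AIC differences at the 50th and 95th percentile:", np.percentile(diffs, [50, 95]))

The spread of these chance differences, computed for your own sample size and model family, is what gives a raw cutoff like 2 or 4 its context.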


What if your models don't allow any simulation?
Stat

6
Tut-tut! How is that even possible? One can bootstrap the world.
AdamO

Good luck with that ... simulate the world lol
Stat

2
@Stat I'm very serious when I say that I can't conceive of a situation in which it would be impossible to simulate data from a model. At the very least, bootstrapping from the training dataset qualifies as a valid simulation approach.
AdamO

When bootstrapping is hard, cross-validation or even simple jackknifing should work. Also, model averaging provides a means of reconciling information from models with similar AICs (see the sketch after this comment).
N Brouwer
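A minimal sketch of the Akaike-weight averaging mentioned in the comment above; the AIC values and point predictions are invented for illustration:

import numpy as np

# hypothetical AICs and point predictions from three candidate models
aics = np.array([210.2, 211.0, 214.5])
preds = np.array([3.1, 3.4, 2.8])

delta = aics - aics.min()
weights = np.exp(-0.5 * delta)
weights /= weights.sum()          # Akaike weights, summing to 1

print("weights:", np.round(weights, 3))
print("model-averaged prediction:", round(float(weights @ preds), 3))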

2

Here is a related question: when-is-it-appropriate-to-select-models-by-minimising-the-aic? It gives you a general idea of what people who are far from unknown in the academic world consider appropriate to write, and which references are important enough to keep.

Generally, it is the differences between the likelihoods or AICs that matter, not their absolute values. You have missed the important word "difference" in your "BIC: 0-2 is weak" in the question - check Raftery's TABLE 6 - and it's strange that no one has corrected that.

I myself have been taught to look for MAICE (Minimum AIC Estimate - as Akaike called it). So what? Here is what one famous person wrote to an unknown lady:

Dear Miss -- 
I have read about sixteen pages of your manuscript ... I suffered exactly the same 
treatment at the hands of my teachers who disliked me for my independence and passed 
over me when they wanted assistants ... keep your manuscript for your sons and
daughters, in order that they may derive consolation from it and not give a damn for
what their teachers tell them or think of them. ... There is too much education
altogether.

My teachers never heard of papers with titles like "A test whether two AICs differ significantly", and I can't even remember them ever calling AIC a statistic, something that would have a sampling distribution and other properties. I was taught that AIC is a criterion to be minimized, if possible in some automatic fashion.

Yet another important issue, which I think was expressed here a few years ago by IrishStat (from memory, so apologies if I am wrong; I failed to find that answer), is that AIC, BIC and other criteria were derived for different purposes and under different conditions (assumptions), so you often can't use them interchangeably if your purpose is, say, forecasting. You can't simply prefer a criterion that is inappropriate for your purpose.

My sources show that I quoted Burnham and Anderson (2002, p. 70) to write that a delta (AIC difference) within 0-2 indicates substantial support; a delta within 4-7 considerably less support; and a delta greater than 10 essentially no support. I also wrote that "the authors also discussed conditions under which these guidelines may be useful". The book is cited in the answer by Stat, which I upvoted as most relevant.


0

With regard to information criteria, here is what SAS says:

"Note that information criteria such as Akaike's (AIC), Schwarz's (SC, BIC), and QIC can be used to compare competing nonnested models, but do not provide a test of the comparison. Consequently, they cannot indicate whether one model is significantly better than another. The GENMOD, LOGISTIC, GLIMMIX, MIXED, and other procedures provide information criteria measures."

There are two comparative model-testing procedures: (a) the Vuong test and (b) the non-parametric Clarke test. See this paper for details.


I find the mathematical notation employed in the cited "paper" (i.e. a presentation) incomprehensible without comments. In particular, what does the line of dashes symbolize? Implication?
Adam Ryczkowski