Extract AIC from FitARMA in R

I apply the FitARMA function from the FitARMA package to a certain time series and get the following result:
> model <- FitARMA(ts, c(1,0,1))
> model
ARIMA(1,0,1)
length of series = 1593 , number of parameters = 3
loglikelihood = 5113 , aic = -10220 , bic = -10203.9
I want to extract the aic value into a variable. However, there is no aic element in the model object (inspecting the model details), nor any information about it in the package documentation.
Is there any way to do something like model_aic <- model$aic? I want to run a for loop over different p, q orders of the ARMA model, so I would like to extract the AIC into a variable instead of copying it from the console manually.

One way is to create a function that computes the AIC of the FitARMA model
library(FitARMA)
model <- FitARMA(AirPassengers, c(1,0,1))
model
ARIMA(1,0,1)
length of series = 144 , number of parameters = 3
loglikelihood = -496.55 , aic = 999.1 , bic = 1008
AICFitARMA <- function(model){
  k <- nrow(coef.FitARMA(model))            # number of estimated parameters
  AIC <- 2 * k - 2 * model$loglikelihood    # AIC = 2k - 2*logLik
  return(AIC)
}
AICFitARMA(model)
[1] 999.0944
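To loop over different p and q orders, as asked in the question, a sketch along these lines should work (using AirPassengers again; substitute your own series and grid of orders):
# Compare a small grid of ARMA(p, q) orders by AIC using the helper above;
# tryCatch() skips any order that FitARMA fails to fit
orders <- expand.grid(p = 0:2, q = 0:2)
orders$aic <- sapply(seq_len(nrow(orders)), function(i) {
  tryCatch(AICFitARMA(FitARMA(AirPassengers, c(orders$p[i], 0, orders$q[i]))),
           error = function(e) NA)
})
orders[which.min(orders$aic), ]   # order with the smallest AIC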

How to extract AIC from polr summary output in R

I'd like to quickly compare AICs that are provided as output when running summary() on individual polr() models created using the MASS package in R. I have no problem compiling this info, but what I can't figure out is where exactly the AIC info is being stored in the polr model objects themselves.
I've tried using str() and attributes() on my model objects, and I've even tried using getAnywhere("polr") to look at the source code itself. Nothing is standing out to me.
Anyone know how to extract AIC output from summary(polr_mod)?
Example for reference:
library(MASS)
dat <- data.frame(v1 = factor(rep(0:2,each=3),ordered = T), v2 = rep(1:3,each=3))
mod_polr <- polr(v1 ~ v2, data = dat, Hess = T, method = "logistic" )
summary(mod_polr)
Call:
polr(formula = v1 ~ v2, data = dat, Hess = T, method = "logistic")
Coefficients:
v2
46.7233
Intercepts:
0|1 1|2
73.62014 117.05781
Residual Deviance: 1.560782e-08
AIC: 6.00
^ See, at the bottom of the output is AIC: 6.00. Where is this stored in the object? I'd like to call/extract it.
AIC is a generic function with no method for objects of class "polr" but with a default method.
The default method's code can be seen by running
getAnywhere("AIC.default")
and what it does is call logLik on its first argument and then compute the AIC with k = 2.
The computation is
lls <- logLik(mod_polr)
-2*as.numeric(lls) + k*attr(lls, "df")
where the number of model parameters is given by attr(lls, "df").
In turn, logLik is also generic, but it does have a method for objects of class "polr".
The code returned by
getAnywhere("logLik.polr")
is very simple, a one-liner. Note that the df attribute it sets comes from the edf element of the "polr" object. Indented, it is the following:
logLik.polr <- function(object, ...) {
  structure(
    -0.5 * object$deviance,
    df = object$edf,
    nobs = object[["nobs"]],
    class = "logLik"
  )
}
The explanations for deviance, edf and nobs are found in help("polr"), section Value.
deviance: the residual deviance.
edf: the (effective) number of degrees of freedom used by the model.
And as that section says, nobs is there for use by stepAIC (it is not needed for the AIC computation itself).
So this can all be pieced together in one function: write an AIC method for objects of class "polr" that simplifies the code above.
AIC.polr <- function(object, ..., k = 2){
  # arguments match the AIC() generic; for polr, -2*logLik equals the residual deviance
  dev <- object$deviance
  nparams <- object$edf
  dev + k * nparams
}
AIC(mod_polr)
#[1] 6
I know that this is an old question, but I just want to share in case others have the same question and want another option.
If you use modelsummary() to create your regression table, you also get AIC, BIC and RMSE; a quick sketch is below.
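For example (assuming the modelsummary package is installed), applied to the polr model from above:
library(modelsummary)
modelsummary(mod_polr)   # the goodness-of-fit rows of the table should include AIC and BIC
get_gof(mod_polr)        # or pull the fit statistics (AIC, BIC, ...) as a data frame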

How to get the same values for AIC and BIC in R as in Stata?

Say I have a very simple model
library(foreign)
smoke <- read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/smoke.dta")
smoking.reg <- lm(cigs ~ educ, data=smoke)
AIC(smoking.reg)
BIC(smoking.reg)
In R I get the following results:
> AIC(smoking.reg)
[1] 6520.26
> BIC(smoking.reg)
[1] 6534.34
However, running the same regression in Stata
use http://fmwww.bc.edu/ec-p/data/wooldridge/smoke.dta
reg cigs educ
and then running estat ic returns different values for AIC and BIC.
How can I get R to return exactly the same values as does Stata for AIC and BIC?
AIC is calculated as -2*log likelihood + 2* number of parameters
BIC is calculated as -2*log likelihood + log(n)* number of parameters, where n is the sample size.
Your linear regression has three parameters (two coefficients and the error variance), so you can calculate AIC and BIC as
ll = logLik(smoking.reg)
aic = -2*ll + 2* 3 # 6520.26
bic = -2*ll + log(nrow(smoke))* 3 # 6534.34
(As Ben Bolker mentioned in the comments, the logLik object has several attributes which you can use to get the number of parameters ("df") and the number of observations ("nobs"); see attr(ll, "df") and attr(ll, "nobs").)
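For reference, the same numbers computed from those attributes instead of hard-coding the counts:
k <- attr(ll, "df")    # number of estimated parameters, here 3
n <- attr(ll, "nobs")  # number of observations
-2*as.numeric(ll) + 2*k       # AIC, matches AIC(smoking.reg)
-2*as.numeric(ll) + log(n)*k  # BIC, matches BIC(smoking.reg)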
Stata does not count the variance parameter, only the coefficients. This is usually not a problem, as information criteria are typically used to compare models (AIC_of_model1 - AIC_of_model2), so if this parameter is omitted in both calculations it makes no difference. In Stata the calculation is
aic = -2*ll + 2* 2 # 6518.26
bic = -2*ll + log(nrow(smoke))* 2 # 6527.647
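If you want this as a reusable helper, a small function following Stata's convention (my own sketch, counting only the coefficients as described above) could look like:
stata_ic <- function(fit) {
  ll <- logLik(fit)
  k <- length(coef(fit))   # Stata counts only the estimated coefficients
  n <- nobs(fit)
  c(AIC = -2*as.numeric(ll) + 2*k,
    BIC = -2*as.numeric(ll) + log(n)*k)
}
stata_ic(smoking.reg)   # should reproduce the Stata values above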

Using a loop to calculate BIC with high dimensional data in R studio? (my code keeps giving me errors)

I am working with a large, high-dimensional data set (so P > N). I am attempting to use BIC for model selection. Here is what I am doing in RStudio:
X is my predictor matrix and Y is my outcome vector.
fit <- glmnet(X,Y,alpha=1) #finding LASSO, find 100 lambda's
models <- list()
for(i in 1:100) {
  models[[i]] = fit
}
BIC(models)
This results in an error which states "Error in UseMethod("logLik") : no applicable method for 'logLik' applied to an object of class "list""
I also attempt to compute BIC while in the loop as follows:
for (i in 1:100){
  BIC(models[i])
}
Which gives me the same error.
You can check out this answer on how to calculate BIC; we first create a function that calculates it based on the log-likelihood.
minustwologLik = function(fit){
  n = length(fit$residuals)
  deviance = sum(fit$residuals^2)        # residual sum of squares
  n*(log(deviance/n) + 1 + log(2*pi))    # -2 * Gaussian log-likelihood
}
BIC_manual = function(fit, n, k){
  log(n)*k + minustwologLik(fit)         # BIC = k*log(n) - 2*logLik
}
We can test it for normal glms:
set.seed(100)
X = matrix(rnorm(5000),ncol=50)
Y = rnorm(100)
lmf = glm(Y~X)
# k is number of predictors + 2 because
# we have an intercept and we estimate the error
BIC_manual(lmf,nrow(X),ncol(X)+2)
[1] 446.8827
BIC(lmf)
[1] 446.8827
Now we have a working BIC function. Suppose we want to test 10 values of lambda
library(glmnet)
lambda = runif(10,min=0,max=0.1)
models <- list()
for(i in 1:10) {
  fit = glmnet(X,Y,alpha=1,lambda=lambda[i])
  # we store the residuals
  fit$residuals = predict(fit,X)-Y
  models[[i]] = fit
}
To calculate BIC, we do:
results = sapply(1:length(models), function(i){
  BIC_manual(models[[i]], nrow(X), models[[i]]$df+4)
})
Here I think k is the number of nonzero coefficients + 4, because you have the intercept, alpha, beta and the error. This offset does not matter much if you compare between models, as it is constant across them; what matters is getting the number of nonzero coefficients.
For the MSE, it is the deviance we were calculating above. With glmnet you can do, for example:
deviance.glmnet(models[[1]])
We collect the MSE in a similar way:
MSE=sapply(models,deviance.glmnet)
So the results:
lambda BIC MSE
1 0.003344587 447.2294 46.75395
2 0.028688622 408.8085 55.32970
3 0.056700061 370.3395 65.44696
4 0.078313596 362.2118 72.54230
5 0.090647978 359.2993 77.25786
6 0.062240390 359.1432 67.18382
7 0.077180937 361.6381 72.12735
8 0.044374976 382.2904 61.34681
9 0.088453015 358.1662 76.38739
10 0.069375991 357.8248 69.42866
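To pick the penalty with the lowest BIC from these results (using the results, lambda and models objects created above):
best <- which.min(results)   # index of the model with the smallest BIC
lambda[best]                 # the corresponding lambda value
coef(models[[best]])         # its coefficients (sparse; most entries are zero)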

How to get AIC from lm_robust object

How do I get an AIC from an lm_robust object (package estimatr)? I'm using lm_robust because I want a robust estimator for calculating the SE. Unlike with lm, AIC is not provided when you run summary(), and running AIC() on an lm_robust object produces an error. Below is a toy example of the kind of model I'm trying to run.
library(estimatr)
fake_data <- data.frame(outcome = rnorm(100, 3.65, 1),
                        pred1 = rnorm(100, 15, 7),
                        pred2 = as.factor(sample(1:5, 100, replace = T)))
mod1<-lm_robust(outcome~pred1+pred2,data=fake_data)
AIC(mod1)
Here is what the error message looks like:
> AIC(mod1)
Error in UseMethod("logLik") :
no applicable method for 'logLik' applied to an object of class "lm_robust"
If you have to do it with lm_robust, you can calculate it yourself as below.
The formula for AIC is
AIC = 2*k + n*(log(2*pi*RSS/n) + 1)
# n   : number of observations
# k   : number of estimated parameters (all coefficients, including the intercept and
#       the factor dummies, plus the error variance)
# RSS : residual sum of squares
Applying it in R to your case:
# Note that I take k = 7 here: the intercept, pred1, 4 dummies for the 5-level factor pred2, and the error variance
AIC_calculated <- 2*7 + 100*(log(2*pi*(1-mod1$r.squared)*mod1$tss/100) + 1)
AIC_calculated
[1] 332.2865
which is the same as both the lm and glm outputs.
mod2<-lm(outcome~pred1+pred2,data=fake_data)
> AIC(mod2)
[1] 332.2865
And finally, you can of course put this calculation into a function that you call with an lm_robust model, without having to set n and k by hand for any given data:
myAIC <- function(data) {
  2*(data$k+1) + data$N * (log(2*pi* (1-data$r.squared)*data$tss/data$N ) + 1)
}
> myAIC(mod1)
[1] 332.2865
Note: your results may differ from those shown here because the example data frame is generated with rnorm() and sample() without a fixed seed.
Here's a workaround
mod1 = lm_robust(outcome ~ pred1 + pred2, data = fake_data)
#Create any fitted model using 'lm' as a placeholder
mod2 = with(list(x = rnorm(10), y = rnorm(10)), lm(y ~ x))
#Copy values in `mod2` from `mod1`
mod2[names(mod2)] = mod1[names(mod2)]
#Calculate residuals in `mod2`
mod2$residuals = mod2$fitted.values - fake_data$outcome
AIC(mod2)
#[1] 326.6092
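As a sanity check, an ordinary lm fit on the same data should give the same value, since lm_robust changes only the standard errors, not the fitted model:
AIC(lm(outcome ~ pred1 + pred2, data = fake_data))   # should match AIC(mod2) above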

Creating R Squared function for CPLM package

For my graduate research I'm using the CPLM package (specifically the cpglmm function) to account for zero-inflated data (Tweedie compound Poisson distribution) in a data set looking at the effects of logging on breeding bird densities. This isn't a widely used package like lme4, nlme, etc., so the model validation methods that work with those more commonly used packages cannot be applied to cpglmm models.
I'm currently at the stage of describing the fit of my models and am trying to calculate R-squared values, both marginal and conditional. Unfortunately I cannot use the r2glmm package or MuMIn to calculate R-squared values because they do not support cpglmm. Therefore, I've had to calculate those values manually from an example found here (Appendix 6, under cpglmm parasite models, pg. 33). Here's the script from that example:
# Fit null model without fixed effects (but including all random effects)
parmodCPr <- cpglmm(Parasite ~ 1 + (1 | Population) + (1 | Container), data = DataAll)
# Fit alternative model including fixed and all random effects
parmodCPf <- cpglmm(Parasite ~ Sex + Treatment + Habitat + (1 | Population) +
(1 | Container), data = DataAll)
# Calculation of the variance in fitted values
VarF <- var(as.vector(model.matrix(parmodCPf) %*% fixef(parmodCPf)))
# getting the observation-level variance Null model
phiN <- parmodCPr@phi # the dispersion parameter
pN <- parmodCPr@p # the index parameter
mu <- exp(fixef(parmodCPr) + 0.5 * (VarCorr(parmodCPr)$Population[1] + VarCorr(parmodCPr)$Container[1]))
VarOdN <- phiN * mu^(pN - 2) # the delta method
# Full model
phiF <- parmodCPf@phi # the dispersion parameter
pF <- parmodCPf@p # the index parameter
VarOdF <- phiF * mu^(pF - 2) # the delta method
# R2[GLMM(m)] - marginal R2[GLMM]; using the delta method observation-level variance
R2glmmM <- VarF/(VarF + sum(as.numeric(VarCorr(parmodCPf))) + VarOdF)
# R2[GLMM(c)] - conditional R2[GLMM] for full model
R2glmmC <- (VarF + sum(as.numeric(VarCorr(parmodCPf))))/(VarF + sum(as.numeric(VarCorr(parmodCPf))) +
VarOdF)
What I would like to do is write an R function based on this code that outputs both the marginal and conditional R-squared values (R2glmmM and R2glmmC), taking my models as input. I'd greatly appreciate any help with this problem. Hopefully I have supplied enough information.
Thanks.
Believe I figured it out. Here's an example I wrote up:
R2glmm <- function(model){
  # NOTE: this relies on a null model stored in the workspace as NULLmodel,
  # and it assumes a single random intercept named YEAR
  # Calculation of the variance in fitted values
  VarALT <- var(as.vector(model.matrix(model) %*% fixef(model)))
  # getting the observation-level variance from the null model
  phiNULL <- NULLmodel$phi  # the dispersion parameter
  pNULL <- NULLmodel$p      # the index parameter
  mu <- exp(fixef(NULLmodel) + 0.5 * (VarCorr(NULLmodel)$YEAR[1]))
  VarOdNULL <- phiNULL * mu^(pNULL - 2)  # the delta method
  # Alternate model
  phiALT <- model$phi  # the dispersion parameter
  pALT <- model$p      # the index parameter
  VarOdALT <- phiALT * mu^(pALT - 2)  # the delta method
  # R2[GLMM(m)] - marginal R2[GLMM]; using the delta-method observation-level variance
  R2glmmM <- VarALT/(VarALT + sum(as.numeric(VarCorr(model))) + VarOdALT)
  # R2[GLMM(c)] - conditional R2[GLMM] for the full model
  R2glmmC <- (VarALT + sum(as.numeric(VarCorr(model))))/(VarALT + sum(as.numeric(VarCorr(model))) + VarOdALT)
  return(c(R2glmmM, R2glmmC))
}
Variables containing ALT refer to the alternate model; "model" represents any cpglmm model you need to run through the function.
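A hedged usage sketch (the model names, response and data below are hypothetical, only to illustrate the call):
library(cplm)
# R2glmm() reads the null model from a global object that must be named NULLmodel,
# and it hard-codes a single random intercept called YEAR
NULLmodel <- cpglmm(DENSITY ~ 1 + (1 | YEAR), data = birds)          # hypothetical null model
ALTmodel  <- cpglmm(DENSITY ~ TREATMENT + (1 | YEAR), data = birds)  # hypothetical alternate model
R2glmm(ALTmodel)  # returns c(marginal R2, conditional R2)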
Hope this helps someone out. Been working on this problem and other related ones for ages now.
