Basically, we can extract the optimum AR order from the result of auto.arima. Running it with trace=TRUE gives:
> auto.arima(ret.fin.chn,trace=TRUE,allowdrift=TRUE)
ARIMA(2,0,2) with non-zero mean : -14242.19
ARIMA(0,0,0) with non-zero mean : -14239.24
ARIMA(1,0,0) with non-zero mean : -14241.3
ARIMA(0,0,1) with non-zero mean : -14238.16
ARIMA(1,0,2) with non-zero mean : -14237.65
ARIMA(3,0,2) with non-zero mean : -14242.72
ARIMA(3,0,1) with non-zero mean : -14239.52
ARIMA(3,0,3) with non-zero mean : -14242.5
ARIMA(2,0,1) with non-zero mean : -14237.15
ARIMA(4,0,3) with non-zero mean : -14238.06
ARIMA(3,0,2) with zero mean : -14244.39
ARIMA(2,0,2) with zero mean : -14243.98
ARIMA(4,0,2) with zero mean : -14241.45
ARIMA(3,0,1) with zero mean : -14241.23
ARIMA(3,0,3) with zero mean : -14244.04
ARIMA(2,0,1) with zero mean : -14238.78
ARIMA(4,0,3) with zero mean : -14239.73
Best model: ARIMA(3,0,2) with zero mean
Series: ret.fin.chn
ARIMA(3,0,2) with zero mean
Coefficients:
ar1 ar2 ar3 ma1 ma2
0.5497 -0.4887 0.0461 -0.5691 0.4923
s.e. 0.3525 0.1764 0.0232 0.3534 0.1878
sigma^2 estimated as 0.0003277: log likelihood=7127.67
AIC=-14243.35 AICc=-14243.32 BIC=-14207.83
Warning messages:
1: In if (is.constant(x)) { :
the condition has length > 1 and only the first element will be used
2: In if (is.constant(x)) return(d) :
the condition has length > 1 and only the first element will be used
3: In if (is.constant(dx)) { :
the condition has length > 1 and only the first element will be used
Now store the result in an object a:
> a<-auto.arima(ts(ret.fin.chn),trace=TRUE,allowdrift=TRUE)
The optimum AR order is then
> a$arma[1]
and the optimum MA order is
> a$arma[2]
Now look at the line "Best model: ARIMA(3,0,2) with zero mean". This is the ARIMA(p,d,q) order.
I know how to extract the AR(p) and MA(q) orders, but how do I extract the integration order d? Keep in mind that I have tried ndiffs, and it sometimes gives a different result from the best model. Perhaps it is somewhere in $arma[?]?
More generally, the differencing order (d) is the next-to-last element of $arma and the seasonal differencing order (D) is the last, so:
a$arma[length(a$arma)-1] is the order d
a$arma[length(a$arma)] is the seasonal order D
As pointed out by Rob Hyndman, one of the authors of the forecast package, in an answer to a similar question on Cross Validated, an easy way to extract the order vector (p,d,q) is to use the forecast::arimaorder function.
In your example, this would work as follows:
arimaorder(a)
The output is a named integer with the values of p, d and q:
p d q
3 0 2
You can see this in the help file of arima under Value (auto.arima has the same Value as arima):
arma
A compact form of the specification, as a vector giving the number of AR, MA, seasonal AR and seasonal MA coefficients, plus the period and the number of non-seasonal and seasonal differences.
So a$arma[6] contains the number of non-seasonal differences and a$arma[7] contains the number of seasonal differences.
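The layout quoted from the help file can be checked without the forecast package. The following is a minimal sketch that fits a small model with base R's stats::arima on the built-in lh series; the labels attached to the vector and the object names are my own, chosen only for illustration.

```r
# Fit an ARMA(1,1) with base R; the `arma` component has the same
# layout as in objects returned by auto.arima().
fit <- arima(lh, order = c(1, 0, 1))

# Label the seven positions following the order given in ?arima:
# AR, MA, seasonal AR, seasonal MA, period, d, D.
spec <- setNames(fit$arma, c("p", "q", "P", "Q", "period", "d", "D"))
print(spec)

# Rearrange into the usual (p, d, q) order.
pdq <- spec[c("p", "d", "q")]
print(pdq)
```

With the forecast package loaded, arimaorder(fit) should report the same three numbers.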
I'm really sorry, Metrics, but it seems that your solution isn't quite right:
> auto.arima(fin.gre,trace=TRUE,allowdrift=TRUE)$arma
ARIMA(2,2,2) : 26148.84
ARIMA(0,2,0) : 27846.32
ARIMA(1,2,0) : 27209.88
ARIMA(0,2,1) : 26161.36
ARIMA(1,2,2) : 26146.27
ARIMA(1,2,1) : 26144.37
ARIMA(1,2,1) : 26144.37
ARIMA(2,2,1) : 26146.69
Best model: ARIMA(1,2,1)
a<-auto.arima(fin.gre,trace=TRUE,allowdrift=TRUE)
a$arma
[1] 1 1 0 0 1 2 0
while str(a) yields
> str(a)
List of 16
$ coef : Named num [1:2] 0.0715 -0.9969
..- attr(*, "names")= chr [1:2] "ar1" "ma1"
$ sigma2 : num 795
$ var.coef : num [1:2, 1:2] 3.65e-04 -3.19e-06 -3.19e-06 3.39e-06
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:2] "ar1" "ma1"
.. ..$ : chr [1:2] "ar1" "ma1"
$ mask : logi [1:2] TRUE TRUE
$ loglik : num -13078
$ aic : num 26162
$ arma : int [1:7] 1 1 0 0 1 2 0
$ residuals: Time-Series [1:2750] from 1 to 2750: 0.39 -1.15 -3.64 -4.65 -11.57 ...
$ call : language auto.arima(x = structure(list(x = c(872.5, 880.78, 884.1, 884.1, 874.45, 855.3, 844.81, 837.14, 828.08, 830.74, 835.36, 839.25, 819.54, 802.27, 798.25, 793.01, 816.43, 831.87, ...
$ series : chr "fin.gre"
$ code : int 0
$ n.cond : int 0
$ model :List of 10
..$ phi : num 0.0715
..$ theta: num -0.997
..$ Delta: num [1:2] 2 -1
..$ Z : num [1:4] 1 0 2 -1
..$ a : num [1:4] 1.01 -1.72 62.87 62.78
..$ P : num [1:4, 1:4] -2.22e-16 2.21e-16 1.74e-16 4.62e-17 2.21e-16 ...
..$ T : num [1:4, 1:4] 0.0715 0 1 0 1 ...
..$ V : num [1:4, 1:4] 1 -0.997 0 0 -0.997 ...
..$ h : num 0
..$ Pn : num [1:4, 1:4] 1.00 -9.97e-01 9.51e-17 1.71e-16 -9.97e-01 ...
$ bic : num 26180
$ aicc : num 26162
$ x :An ‘xts’ object on 2003-01-01/2013-07-16 containing:
Data: num [1:2750, 1] 872 881 884 884 874 ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
NULL
- attr(*, "class")= chr "Arima"
As you can see, $model[3] (Delta) contains two numbers that seem to be related to $arma[5] and $arma[6]. It seems that $arma[5] represents the integration order d, but I'm not really sure about it.
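For what it's worth, the vector printed above (1 1 0 0 1 2 0) is consistent with the help-file layout: position 5 is the period (1 for a non-seasonal series) and position 6 is d (2 for the ARIMA(1,2,1) fit). A minimal check with base R's stats::arima, assuming the built-in WWWusage series and an arbitrary ARIMA(0,2,1) specification as stand-ins for the data in the question:

```r
# Fit a model with two non-seasonal differences on a built-in series.
fit2 <- arima(WWWusage, order = c(0, 2, 1))

# Position 5 is the period, position 6 is d, position 7 is D.
print(fit2$arma)

# Delta holds the coefficients of the differencing polynomial:
# (1 - B)^2 = 1 - 2B + B^2 is stored as c(2, -1), which is why a
# model with d = 2 shows two numbers there.
print(fit2$model$Delta)
```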
I have undertaken ARIMA modelling using the auto.arima function for 91 models. The outputs are sitting in a list of lists.
The structure of the outputs for one model looks like the following:
List of 19
$ coef : Named num [1:8] -3.17e-01 -3.78e-01 -8.02e-01 -5.39e+04 -1.33e+05 ...
..- attr(*, "names")= chr [1:8] "ar1" "ar2" "ma1" "Price.Diff" ...
$ sigma2 : num 6.37e+10
$ var.coef : num [1:8, 1:8] 1.84e-02 8.90e-03 -7.69e-03 -8.80e+02 2.83e+03 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:8] "ar1" "ar2" "ma1" "Price.Diff" ...
.. ..$ : chr [1:8] "ar1" "ar2" "ma1" "Price.Diff" ...
$ mask : logi [1:8] TRUE TRUE TRUE TRUE TRUE TRUE ...
$ loglik : num -1189
$ aic : num 2395
$ arma : int [1:7] 2 1 0 0 1 1 0
$ residuals: Time-Series [1:87] from 1 to 87: 1810 -59503 263294 240970 94842 ...
$ call : language auto.arima(y = x[, 2], stepwise = FALSE, approximation = FALSE, xreg = x[, 3:ncol(x)], x = list(x = c(1856264.57,| __truncated__ ...
$ series : chr "x[, 2]"
$ code : int 0
$ n.cond : int 0
$ nobs : int 86
$ model :List of 10
..$ phi : num [1:2] -0.317 -0.378
..$ theta: num -0.802
..$ Delta: num 1
..$ Z : num [1:3] 1 0 1
..$ a : num [1:3] -599787 284456 1887763
..$ P : num [1:3, 1:3] 0.00 0.00 -4.47e-23 0.00 3.33e-16 ...
..$ T : num [1:3, 1:3] -0.317 -0.378 1 1 0 ...
..$ V : num [1:3, 1:3] 1 -0.802 0 -0.802 0.643 ...
..$ h : num 0
..$ Pn : num [1:3, 1:3] 1.00 -8.02e-01 -1.83e-23 -8.02e-01 6.43e-01 ...
$ bic : num 2417
$ aicc : num 2398
$ xreg : Time-Series [1:87, 1:5] from 1 to 87: -0.866 -0.466 -1.383 -0.999 -0.383 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:5] "Price.Diff" "Easter" "Christmas" "High.Week" ...
$ x : Time-Series [1:87] from 1 to 87: 1856265 1393925 2200962 2209996 2161707 ...
$ fitted : Time-Series [1:87] from 1 to 87: 1854455 1453429 1937668 1969026 2066864 ...
- attr(*, "class")= chr [1:3] "ARIMA" "forecast_ARIMA" "Arima"
When printed the output looks as follows:
Series: x[, 2]
Regression with ARIMA(2,1,1) errors
Coefficients:
ar1 ar2 ma1 Price.Diff Easter Christmas High.Week Low.Week
-0.3170 -0.3777 -0.8017 -53931.11 -133187.55 -53541.62 -347146.59 216202.71
s.e. 0.1356 0.1319 0.1069 28195.33 68789.25 23396.62 -74115.78 66881.15
sigma^2 estimated as 6.374e+10: log likelihood=-1188.69
AIC=2395.38 AICc=2397.75 BIC=2417.47
I have written the following to export my models to text file format:
# export model outputs to newly created folder
for (i in seq_along(ts_outputs)) {
  sink(paste0(names(ts_outputs[i]), ".txt"))
  print(ts_outputs[i])
  sink()
}
This works for viewing the model outputs themselves, but I need to be able to import the model outputs back into R so that I can use them to forecast my time series forward.
I am assuming that I need to put them back into the original structure once re-imported.
Is there a certain package that has already been written to do this?
Are text files the way to go for the original exporting?
I believe the following is the source code from the forecast package which writes the outputs (https://rdrr.io/github/ttnsdcn/forecast-package/src/R/arima.R):
if (length(x$coef) > 0) {
cat("\nCoefficients:\n")
coef <- round(x$coef, digits=digits)
if (se && nrow(x$var.coef)) {
ses <- rep(0, length(coef))
ses[x$mask] <- round(sqrt(diag(x$var.coef)), digits=digits)
coef <- matrix(coef, 1, dimnames=list(NULL, names(coef)))
coef <- rbind(coef, s.e.=ses)
}
print.default(coef, print.gap=2)
}
cm <- x$call$method
if (is.null(cm) || cm != "CSS")
{
cat("\nsigma^2 estimated as ", format(x$sigma2, digits=digits),
": log likelihood=", format(round(x$loglik, 2)),"\n",sep="")
npar <- length(x$coef) + 1
nstar <- length(x$residuals) - x$arma[6] - x$arma[7]*x$arma[5]
bic <- x$aic + npar*(log(nstar) - 2)
aicc <- x$aic + 2*npar*(nstar/(nstar-npar-1) - 1)
cat("AIC=", format(round(x$aic, 2)), sep="")
cat(" AICc=", format(round(aicc, 2)), sep="")
cat(" BIC=", format(round(bic, 2)), "\n",sep="")
}
else cat("\nsigma^2 estimated as ", format(x$sigma2, digits=digits),
": part log likelihood=", format(round(x$loglik, 2)),
"\n", sep="")
invisible(x)
}
Appreciate any direction/advice.
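One possible approach, offered as a sketch rather than a definitive answer: text written with sink() keeps only the printed summary, so the fitted object cannot be rebuilt from it. Base R's saveRDS() and readRDS() store and restore the whole object instead. The models, names and temporary directory below are hypothetical stand-ins (two stats::arima fits on the built-in lh series), not the 91 auto.arima fits from the question.

```r
# Hypothetical stand-in for the list of fitted models.
fits <- list(
  lh_ar1 = arima(lh, order = c(1, 0, 0)),
  lh_ar3 = arima(lh, order = c(3, 0, 0))
)

# Write each model to its own .rds file in a fresh directory.
out_dir <- tempfile("models_")
dir.create(out_dir)
for (nm in names(fits)) {
  saveRDS(fits[[nm]], file.path(out_dir, paste0(nm, ".rds")))
}

# Later (possibly in a new session): read everything back into a named list.
files <- list.files(out_dir, pattern = "\\.rds$", full.names = TRUE)
restored <- lapply(files, readRDS)
names(restored) <- sub("\\.rds$", "", basename(files))

# The restored objects can be used for forecasting directly.
fc <- predict(restored$lh_ar1, n.ahead = 5)
print(fc$pred)
```

For models fitted with external regressors, as in the question, forecasting from the restored object also needs future regressor values (the newxreg argument of predict for stats::arima fits).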
I would like to extract the p-values from the Anderson-Darling test (ad.test from package kSamples). The test result is a list of 12 containing a 2x3 matrix. The p value is part of the 2x3 matrix and is present in element 7.
When using the following code:
lapply(AD_result, "[[", 7)
I get the following subset of AD test results (first 2 of a total of 50 shown)
[[1]]
AD T.AD asympt. P-value
version 1: 1.72 0.94536 0.13169
version 2: 1.51 0.66740 0.17461
[[2]]
AD T.AD asympt. P-value
version 1: 12.299 14.624 6.9248e-07
version 2: 11.900 14.144 1.1146e-06
My question is how to extract only the p-value (e.g. from version 1) and put these 50 results into a vector.
The output from str(AD_result) is:
List of 55
$ :List of 12
..$ test.name : chr "Anderson-Darling"
..$ k : int 2
..$ ns : int [1:2] 103 2905
..$ N : int 3008
..$ n.ties : int 2873
..$ sig : num 0.762
..$ ad : num [1:2, 1:3] 1.72 1.51 0.945 0.667 0.132 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2] "version 1:" "version 2:"
.. .. ..$ : chr [1:3] "AD" "T.AD" " asympt. P-value"
..$ warning : logi FALSE
..$ null.dist1: NULL
..$ null.dist2: NULL
..$ method : chr "asymptotic"
..$ Nsim : num 1
..- attr(*, "class")= chr "kSamples"
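You could try the following, which returns the p-values for both versions:
unlist(lapply(AD_result, function(x) x$ad[,3]))
To keep only version 1, index the first row as well:
sapply(AD_result, function(x) x$ad[1, 3])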
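To see the indexing at work without kSamples installed, here is a small sketch on a mock list that mimics only the ad matrix from the str() output above. The helper make_result and the list AD_mock are hypothetical; the numbers are copied from the two results printed in the question.

```r
# Build a mock result holding only the pieces needed here.
make_result <- function(p1, p2) {
  ad <- matrix(
    c(1.72, 1.51, 0.945, 0.667, p1, p2),
    nrow = 2,
    dimnames = list(c("version 1:", "version 2:"),
                    c("AD", "T.AD", " asympt. P-value"))
  )
  list(test.name = "Anderson-Darling", ad = ad)
}

AD_mock <- list(
  make_result(0.13169, 0.17461),
  make_result(6.9248e-07, 1.1146e-06)
)

# Row 1 is "version 1:", column 3 is the asymptotic p-value.
# vapply() guarantees one numeric value per list element.
p_v1 <- vapply(AD_mock, function(x) x$ad[1, 3], numeric(1))
print(p_v1)
```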
I want to retrieve the fitted values from a model produced by the ar() function in R. When using Arima(), I get them with fitted(model.object), but I cannot find an equivalent for ar().
The ar object does not store a fitted vector, but it does have the residuals. Here is an example of using the residuals from the ar object to reconstruct the predictions from the original data:
data(WWWusage)
arf <- ar(WWWusage)
str(arf)
#====================
List of 14
$ order : int 3
$ ar : num [1:3] 1.175 -0.0788 -0.1544
$ var.pred : num 117
$ x.mean : num 137
$ aic : Named num [1:21] 258.822 5.787 0.413 0 0.545 ...
..- attr(*, "names")= chr [1:21] "0" "1" "2" "3" ...
$ n.used : int 100
$ order.max : num 20
$ partialacf : num [1:20, 1, 1] 0.9602 -0.2666 -0.1544 -0.1202 -0.0715 ...
$ resid : Time-Series [1:100] from 1 to 100: NA NA NA -2.65 -4.19 ...
$ method : chr "Yule-Walker"
$ series : chr "WWWusage"
$ frequency : num 1
$ call : language ar(x = WWWusage)
$ asy.var.coef: num [1:3, 1:3] 0.01017 -0.01237 0.00271 -0.01237 0.02449 ...
- attr(*, "class")= chr "ar"
#===================
str(WWWusage)
# Time-Series [1:100] from 1 to 100: 88 84 85 85 84 85 83 85 88 89 ...
png(); plot(WWWusage)
lines(seq(WWWusage),WWWusage - arf$resid, col="red"); dev.off()
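If the fitted values are needed as an object rather than only as a line on a plot, the same subtraction can be stored. A minimal sketch, where fitted_ar is a name chosen here for illustration:

```r
# Fit the same AR model as above (order chosen by AIC).
arf <- ar(WWWusage)

# ar() keeps residuals but no fitted values, so recover them as
# observed minus residual. The first `order` entries are NA because
# those observations do not have a full set of lags behind them.
fitted_ar <- WWWusage - arf$resid

print(head(fitted_ar, 6))
```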
The simplest way to get the fits from an AR(p) model would be to use auto.arima() from the forecast package, which does have a fitted() method. If you really want a pure AR model, you can constrain the differencing via the d parameter and the MA order via the max.q parameter.
> library(forecast)
> fitted(auto.arima(WWWusage,d=0,max.q=0))
Time Series:
Start = 1
End = 100
Frequency = 1
[1] 91.68778 86.20842 82.13922 87.60576 ...
I am trying to use the penalizedLDA package to run a penalized linear discriminant analysis in order to select the "most meaningful" variables. I have searched here and on other sites for help in accessing the output from the penalized model, to no avail.
My data comprise 400 variables and 44 groups. Here is the code I used and the results I have so far:
yy.m<-as.matrix(yy) #Factors/groups
xx.m<-as.matrix(xx) #Variables
cv.out<-PenalizedLDA.cv(xx.m,yy.m,type="standard")
## apply the penalty
out <- PenalizedLDA(xx.m,yy.m,lambda=cv.out$bestlambda,K=cv.out$bestK)
To get the structure of the output from the analysis:
> str(out)
List of 10
$ discrim: num [1:401, 1:4] -0.0234 -0.0219 -0.0189 -0.0143 -0.0102 ...
$ xproj : num [1:100, 1:4] -8.31 -14.68 -11.07 -13.46 -26.2 ...
$ K : int 4
$ crits :List of 4
..$ : num [1:4] 2827 2827 2827 2827
..$ : num [1:4] 914 914 914 914
..$ : num [1:4] 162 162 162 162
..$ : num [1:4] 48.6 48.6 48.6 48.6
$ type : chr "standard"
$ lambda : num 0
$ lambda2: NULL
$ wcsd.x : Named num [1:401] 0.0379 0.0335 0.0292 0.0261 0.0217 ...
..- attr(*, "names")= chr [1:401] "R400" "R405" "R410" "R415" ...
$ x : num [1:100, 1:401] 0.147 0.144 0.145 0.141 0.129 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:401] "R400" "R405" "R410" "R415" ...
$ y : num [1:100, 1] 2 2 2 2 2 1 1 1 1 1 ...
- attr(*, "class")= chr "penlda"
I am interested in obtaining a list or matrix of the top 20 variables for feature selection, most likely based on the coefficients of the linear discriminants.
I realize I would have to sort the coefficients in descending order and match the variable names to them. So the output I would expect is something like this imaginary example:
V1 V2
R400 0.34
R1535 0.22...
Can anyone provide any pointers (not necessarily the R code)? Thanks in advance.
Your out$K is 4, and that means you have 4 discriminant vectors. If you want the top 20 variables according to, say, the 2nd vector, try this:
# get the data frame of variable names and coefficients
var.coef = data.frame(colnames(xx.m), out$discrim[,2])
# sort the 2nd column (the coefficients) in decreasing order, and only keep the top 20
var.coef.top = var.coef[order(var.coef[,2], decreasing = TRUE)[1:20], ]
var.coef.top is what you want.
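One caveat: sorting the raw coefficients in decreasing order puts large positive values first and large negative values last. If "most meaningful" is taken to mean largest in magnitude, which is an assumption on my part, ranking by absolute value may be closer to the goal. A sketch on simulated data, where discrim and var_names are hypothetical stand-ins for out$discrim and colnames(xx.m):

```r
set.seed(1)

# Hypothetical stand-in for out$discrim: 401 variables, 4 vectors.
discrim <- matrix(rnorm(401 * 4), nrow = 401)
var_names <- paste0("R", seq(400, by = 5, length.out = 401))

# Take the 2nd discriminant vector and rank by absolute size.
coefs <- discrim[, 2]
top_idx <- order(abs(coefs), decreasing = TRUE)[1:20]

# Keep the signed coefficient next to the variable name.
var_coef_top <- data.frame(
  variable = var_names[top_idx],
  coefficient = coefs[top_idx]
)
print(head(var_coef_top))
```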
I'm trying to extract the lmg information from the results I'm getting from calc.relimp in the relaimpo package.
When I view my results I see:
Response variable: DS[, 2]
Total response variance: 107.5848
Analysis based on 21985 observations
3 Regressors:
DS[, 33] DS[, 18] DS[, 23]
Proportion of variance explained by model: 1.39%
Metrics are not normalized (rela=FALSE).
Relative importance metrics:
lmg
DS[, 33] 0.007041436
DS[, 18] 0.001038892
DS[, 23] 0.005823708
Average coefficients for different model sizes:
1X 2Xs 3Xs
DS[, 33] -1.9229313 -2.3138967 -2.4784731
DS[, 18] -0.9155606 -0.8011497 -0.6107294
DS[, 23] 1.3592192 2.0488534 2.3525688
I would ideally like to extract 33 0.00704, 18 0.00103, 23 0.00582 so that I can run more analysis on the lmg values.
Thank you for your help!
relaimpo also caters for users who are used to lists and the $ extractor, i.e. the following would also work:
library(relaimpo)
ll <- calc.relimp(swiss)
ll$lmg ## instead of ll@lmg
You can see the structure of the object returned by calc.relimp() using the function str().
The lmg values are stored in the slot of the same name and can be selected as object@lmg.
Here is an example using the swiss data:
library(relaimpo)
data(swiss)
ll<-calc.relimp(swiss)
str(ll)
Formal class 'relimplm' [package "relaimpo"] with 36 slots
..@ var.y : num 156
..@ R2 : num 0.707
..@ R2.decomp : num 0.707
..@ lmg : Named num [1:5] 0.0571 0.1712 0.2601 0.1056 0.1128
.. ..- attr(*, "names")= chr [1:5] "Agriculture" "Examination" "Education" "Catholic" ...
.....
..@ car.diff : num(0)
..@ namen : chr [1:6] "Fertility" "Agriculture" "Examination" "Education" ...
..@ nobs : int 47
..@ ave.coeffs : num [1:5, 1:5] 0.194 -1.011 -0.862 0.139 1.786 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:5] "Agriculture" "Examination" "Education" "Catholic" ...
.. .. ..$ : chr [1:5] "1X" "2Xs" "3Xs" "4Xs" ...
....
ll@lmg
Agriculture Examination Education Catholic Infant.Mortality
0.05709122 0.17117303 0.26013468 0.10557015 0.11276592
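To get from the named lmg vector to the "33 0.00704, 18 0.00103, 23 0.00582" form asked for in the question, the column numbers can be parsed out of the names. A sketch assuming the names look like "DS[, 33]" as in the printed output; the vector below is typed in by hand from that output rather than produced by calc.relimp.

```r
# Stand-in for the lmg vector extracted from the calc.relimp() result,
# with values copied from the question.
lmg <- c("DS[, 33]" = 0.007041436,
         "DS[, 18]" = 0.001038892,
         "DS[, 23]" = 0.005823708)

# Strip everything that is not a digit from each name to recover
# the column index of the regressor.
col_idx <- as.integer(gsub("[^0-9]", "", names(lmg)))

# Collect the indices and values in a data frame for further analysis.
lmg_table <- data.frame(column = col_idx, lmg = unname(lmg))
print(lmg_table)
```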