R Importing ARIMA model outputs to use in forecast

I have undertaken ARIMA modelling using the auto.arima function for 91 models. The outputs are sitting in a list of lists.
The structure of the outputs for one model looks like the following:
List of 19
$ coef : Named num [1:8] -3.17e-01 -3.78e-01 -8.02e-01 -5.39e+04 -1.33e+05 ...
..- attr(*, "names")= chr [1:8] "ar1" "ar2" "ma1" "Price.Diff" ...
$ sigma2 : num 6.37e+10
$ var.coef : num [1:8, 1:8] 1.84e-02 8.90e-03 -7.69e-03 -8.80e+02 2.83e+03 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:8] "ar1" "ar2" "ma1" "Price.Diff" ...
.. ..$ : chr [1:8] "ar1" "ar2" "ma1" "Price.Diff" ...
$ mask : logi [1:8] TRUE TRUE TRUE TRUE TRUE TRUE ...
$ loglik : num -1189
$ aic : num 2395
$ arma : int [1:7] 2 1 0 0 1 1 0
$ residuals: Time-Series [1:87] from 1 to 87: 1810 -59503 263294 240970 94842 ...
$ call : language auto.arima(y = x[, 2], stepwise = FALSE, approximation = FALSE, xreg = x[, 3:ncol(x)], x = list(x = c(1856264.57,| __truncated__ ...
$ series : chr "x[, 2]"
$ code : int 0
$ n.cond : int 0
$ nobs : int 86
$ model :List of 10
..$ phi : num [1:2] -0.317 -0.378
..$ theta: num -0.802
..$ Delta: num 1
..$ Z : num [1:3] 1 0 1
..$ a : num [1:3] -599787 284456 1887763
..$ P : num [1:3, 1:3] 0.00 0.00 -4.47e-23 0.00 3.33e-16 ...
..$ T : num [1:3, 1:3] -0.317 -0.378 1 1 0 ...
..$ V : num [1:3, 1:3] 1 -0.802 0 -0.802 0.643 ...
..$ h : num 0
..$ Pn : num [1:3, 1:3] 1.00 -8.02e-01 -1.83e-23 -8.02e-01 6.43e-01 ...
$ bic : num 2417
$ aicc : num 2398
$ xreg : Time-Series [1:87, 1:5] from 1 to 87: -0.866 -0.466 -1.383 -0.999 -0.383 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:5] "Price.Diff" "Easter" "Christmas" "High.Week" ...
$ x : Time-Series [1:87] from 1 to 87: 1856265 1393925 2200962 2209996 2161707 ...
$ fitted : Time-Series [1:87] from 1 to 87: 1854455 1453429 1937668 1969026 2066864 ...
- attr(*, "class")= chr [1:3] "ARIMA" "forecast_ARIMA" "Arima"
When printed the output looks as follows:
Series: x[, 2]
Regression with ARIMA(2,1,1) errors
Coefficients:
ar1 ar2 ma1 Price.Diff Easter Christmas High.Week Low.Week
-0.3170 -0.3777 -0.8017 -53931.11 -133187.55 -53541.62 -347146.59 216202.71
s.e. 0.1356 0.1319 0.1069 28195.33 68789.25 23396.62 -74115.78 66881.15
sigma^2 estimated as 6.374e+10: log likelihood=-1188.69
AIC=2395.38 AICc=2397.75 BIC=2417.47
I have written the following to export my models to text file format:
# export model outputs to newly created folder
for(i in 1:length(ts_outputs)){
sink(paste0(names(ts_outputs[i]), ".txt"))
print(ts_outputs[i])
sink()
}
This works for viewing the model outputs; however, I need to be able to import the model outputs back into R so that I can use them to forecast my time series forward.
I am assuming that I need to put them back into the original structure once re-imported.
Is there a certain package that has already been written to do this?
Are text files the way to go for the original exporting?
I believe the following is the source code from the forecast package which writes the outputs (https://rdrr.io/github/ttnsdcn/forecast-package/src/R/arima.R):
if (length(x$coef) > 0) {
cat("\nCoefficients:\n")
coef <- round(x$coef, digits=digits)
if (se && nrow(x$var.coef)) {
ses <- rep(0, length(coef))
ses[x$mask] <- round(sqrt(diag(x$var.coef)), digits=digits)
coef <- matrix(coef, 1, dimnames=list(NULL, names(coef)))
coef <- rbind(coef, s.e.=ses)
}
print.default(coef, print.gap=2)
}
cm <- x$call$method
if (is.null(cm) || cm != "CSS")
{
cat("\nsigma^2 estimated as ", format(x$sigma2, digits=digits),
": log likelihood=", format(round(x$loglik, 2)),"\n",sep="")
npar <- length(x$coef) + 1
nstar <- length(x$residuals) - x$arma[6] - x$arma[7]*x$arma[5]
bic <- x$aic + npar*(log(nstar) - 2)
aicc <- x$aic + 2*npar*(nstar/(nstar-npar-1) - 1)
cat("AIC=", format(round(x$aic, 2)), sep="")
cat(" AICc=", format(round(aicc, 2)), sep="")
cat(" BIC=", format(round(bic, 2)), "\n",sep="")
}
else cat("\nsigma^2 estimated as ", format(x$sigma2, digits=digits),
": part log likelihood=", format(round(x$loglik, 2)),
"\n", sep="")
invisible(x)
}
Appreciate any direction/advice.
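One common approach, sketched here rather than taken from the original post, is to skip text files for this purpose and serialise the fitted objects themselves with base R's saveRDS(), then restore them with readRDS(). The sink()/print() output is only a printed summary, so it does not carry the full list structure shown above. The sketch uses stats::arima on the built-in lh series as a stand-in for the auto.arima fits; ts_outputs here, the model names, and the temporary folder are all hypothetical.

```r
# Hypothetical stand-in for the real `ts_outputs`: a named list of fitted models.
# stats::arima is used so the example runs without the forecast package.
ts_outputs <- list(
  model_a = arima(lh, order = c(1, 0, 0)),
  model_b = arima(lh, order = c(3, 0, 0))
)

# Export: one .rds file per model, in a newly created (temporary) folder.
out_dir <- tempfile("arima_models")
dir.create(out_dir)
for (nm in names(ts_outputs)) {
  saveRDS(ts_outputs[[nm]], file.path(out_dir, paste0(nm, ".rds")))
}

# Import: read every .rds file back into a named list of model objects.
files <- list.files(out_dir, pattern = "\\.rds$", full.names = TRUE)
restored <- lapply(files, readRDS)
names(restored) <- sub("\\.rds$", "", basename(files))

# The restored objects keep their class, so they can be used for forecasting.
fc <- predict(restored$model_a, n.ahead = 5)
```

If the models were fitted with the forecast package, the restored objects should be usable with forecast() in the same way, provided the package is loaded. Models fitted with xreg also need future values of the regressors supplied at forecast time. The text export can still be kept alongside the .rds files for human reading.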

Related

Interpreting the PCA axis Dim1 and Dim2 from CLARA plot results directly

I have a large dataset that contains more than 300,000 rows/observations and 22 variables. I used the CLARA method for the clustering and plotted the results using fviz_cluster. Using the silhouette method, I got 10 as my number of clusters, and I then applied that to my CLARA algorithm.
clara.res <- clara(df, 10, samples = 50,trace = 1,sampsize = 1000, pamLike = TRUE)
str(clara.res)
List of 10
$ sample : chr [1:1000] "100046" "100303" "10052" "100727" ...
$ medoids : num [1:10, 1:22] 0.925 0.125 0.701 0 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:10] "193751" "137853" "229261" "257462" ...
.. ..$ : chr [1:22] "COD" "DMW" "HER" "SPR" ...
$ i.med : int [1:10] 104171 42062 143627 174961 300065 13836 192832 207079 185241 228575
$ clustering: Named int [1:302251] 1 1 1 2 3 4 5 3 3 3 ...
..- attr(*, "names")= chr [1:302251] "1" "10" "100" "1000" ...
$ objective : num 0.37
$ clusinfo : num [1:10, 1:4] 71811 40181 46271 10155 31309 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:4] "size" "max_diss" "av_diss" "isolation"
$ diss : 'dissimilarity' num [1:499500] 1.392 2.192 0.937 2.157 1.643 ...
..- attr(*, "Size")= int 1000
..- attr(*, "Metric")= chr "euclidean"
..- attr(*, "Labels")= chr [1:1000] "100046" "100303" "10052" "100727" ...
$ call : language clara(x = df, k = 10, samples = 50, sampsize = 1000, trace = 1, pamLike = TRUE)
$ silinfo :List of 3
..$ widths : num [1:1000, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:1000] "83395" "181310" "34452" "42991" ...
.. .. ..$ : chr [1:3] "cluster" "neighbor" "sil_width"
..$ clus.avg.widths: num [1:10] 0.645 0.408 0.487 0.513 0.839 ...
..$ avg.width : num 0.612
$ data : num [1:302251, 1:22] 1 1 1 0.366 0.35 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:302251] "1" "10" "100" "1000" ...
.. ..$ : chr [1:22] "COD" "DMW" "HER" "SPR" ...
- attr(*, "class")= chr [1:2] "clara" "partition"
For the plot:
fviz_cluster(clara.res,
palette = c(
"#004c6d",
"#00a1c1",
"#ffc334",
"#78ab63",
"#00ffff",
"#00cfe3",
"#6efa75",
"#cc0089",
"#ff9509",
"#ffb6de"
), # color palette
ellipse.type = "t",geom = "point",show.clust.cent = TRUE,repel = TRUE,pointsize = 0.5,
ggtheme = theme_classic()
)+ xlim(-7, 3) + ylim (-5, 4) + labs(title = "Plot of clusters")
The result:
I gather that this cluster plot is based on PCA, and I have been trying to figure out which variables in my original data were chosen as Dim1 and Dim2, or what the x- and y-axes represent. Can somebody help me find out what Dim1 and Dim2 are, and the eigenvalues/variance of all the dimensions, without running PCA separately?
I saw there are other functions/packages for PCA, such as get_eigenvalue in factoextra and FactoMineR, but it seems they would require me to run the PCA algorithm from the beginning. How can I integrate this directly with my CLARA results?
Also, Dim1 explains only 12.3% of the variance and Dim2 only 8.8%. Does this mean that these dimensions are not representative enough? Considering that I would have 22 dimensions in total (from my 22 variables), I think it's alright, no? I am not sure how these percentages for Dim1 and Dim2 affect my cluster results. I was thinking of making a scree plot from my CLARA results, but I can't figure that out either.
I'd appreciate any insights.
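For context, principal components are linear combinations of all the variables, so Dim1 and Dim2 are not two variables picked from the data. A minimal sketch of recovering the eigenvalues, the percentage of variance per dimension, and the loadings with base R's prcomp() follows. The matrix X is a synthetic stand-in for clara.res$data, and standardising with scale() first is an assumption about how fviz_cluster prepares the data by default, so the percentages should be checked against the plot's axis labels.

```r
# Synthetic stand-in for clara.res$data (the real object has 22 columns).
set.seed(42)
X <- matrix(rnorm(200 * 6), ncol = 6,
            dimnames = list(NULL, paste0("v", 1:6)))

# PCA on the standardised data (assumed to mirror the plot's preprocessing).
pca <- prcomp(scale(X), center = FALSE, scale. = FALSE)

eig <- pca$sdev^2                    # eigenvalues, one per dimension
var_pct <- 100 * eig / sum(eig)      # percentage of variance per dimension
loadings12 <- pca$rotation[, 1:2]    # weight of each variable in Dim1 and Dim2

# Simple scree plot of the variance percentages.
barplot(var_pct, names.arg = paste0("Dim", seq_along(var_pct)),
        ylab = "% of variance")
```

The rows of loadings12 show how strongly each original variable contributes to each axis. Low percentages on Dim1 and Dim2 mainly mean the 2-D plot hides much of the structure; the clustering itself was computed on all variables, not on the two plotted dimensions.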

R - How to extract slope and intercept from lm.fit?

I need a faster way of doing linear regression than the lm() method. I found that lm.fit() is quite a bit faster but I'm wondering how to use the results. For example using this code:
x = 1:5
y = 5:1
regr = lm.fit(as.matrix(x), y)
str(regr)
Outputs:
List of 8
$ coefficients : Named num 0.636
..- attr(*, "names")= chr "x1"
$ residuals : num [1:5] 4.364 2.727 1.091 -0.545 -2.182
$ effects : Named num [1:5] -4.719 1.69 -0.465 -2.619 -4.774
..- attr(*, "names")= chr [1:5] "x1" "" "" "" ...
$ rank : int 1
$ fitted.values: num [1:5] 0.636 1.273 1.909 2.545 3.182
$ assign : NULL
$ qr :List of 5
..$ qr : num [1:5, 1] -7.416 0.27 0.405 0.539 0.674
..$ qraux: num 1.13
..$ pivot: int 1
..$ tol : num 1e-07
..$ rank : int 1
..- attr(*, "class")= chr "qr"
$ df.residual : int 4
I'm expecting intercept = 6 and slope = -1, but the result above doesn't contain anything near that. Also, does lm.fit() output R squared?
lm.fit works at a lower level than lm, so, as @MrFlick commented, we must add the intercept manually by using cbind(1, x) as the design matrix. The R^2 is not returned, but it is easy to compute:
x <- 1:5
y <- 5:1 + rnorm(5)
regr <- lm.fit(cbind(1, x), y)
regr$coef
# x
# 5.2044349 -0.5535963
1 - var(regr$residuals) / var(y) # R^2
# [1] 0.3557227
1 - var(regr$residuals) / var(y) * (length(y) - 1) / regr$df.residual # Adj. R^2
# [1] 0.1409636
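As a quick check of the approach above, the following sketch compares lm.fit() with an explicit intercept column against lm(), first on the exact data from the question and then on noisy data. The column names Intercept and x and the noise level are choices made for illustration only.

```r
# Exact data from the question: with an intercept column the fit is 6 - 1 * x.
exact <- lm.fit(cbind(1, 1:5), 5:1)$coefficients

# Noisy data, so that R^2 is meaningful.
set.seed(123)
x <- 1:5
y <- 5:1 + rnorm(5, sd = 0.3)

X <- cbind(Intercept = 1, x = x)   # design matrix with a manual intercept
fit_fast <- lm.fit(X, y)
fit_ref <- lm(y ~ x)               # reference fit for comparison

# R^2 from the residual and total sums of squares.
r2_fast <- 1 - sum(fit_fast$residuals^2) / sum((y - mean(y))^2)
```

Here fit_fast$coefficients should agree with coef(fit_ref), and r2_fast with summary(fit_ref)$r.squared, up to floating-point error.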

Extract statistics from Anderson-Darling test (list)

I would like to extract the p-values from the Anderson-Darling test (ad.test from package kSamples). The test result is a list of 12 containing a 2x3 matrix. The p value is part of the 2x3 matrix and is present in element 7.
When using the following code:
lapply(AD_result, "[[", 7)
I get the following subset of AD test results (the first 2 of a total of 50 are shown):
[[1]]
AD T.AD asympt. P-value
version 1: 1.72 0.94536 0.13169
version 2: 1.51 0.66740 0.17461
[[2]]
AD T.AD asympt. P-value
version 1: 12.299 14.624 6.9248e-07
version 2: 11.900 14.144 1.1146e-06
My question is: how can I extract only the p-value (e.g. from version 1) and put these 50 results into a vector?
The output from str(AD_result) is:
List of 55
$ :List of 12
..$ test.name : chr "Anderson-Darling"
..$ k : int 2
..$ ns : int [1:2] 103 2905
..$ N : int 3008
..$ n.ties : int 2873
..$ sig : num 0.762
..$ ad : num [1:2, 1:3] 1.72 1.51 0.945 0.667 0.132 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2] "version 1:" "version 2:"
.. .. ..$ : chr [1:3] "AD" "T.AD" " asympt. P-value"
..$ warning : logi FALSE
..$ null.dist1: NULL
..$ null.dist2: NULL
..$ method : chr "asymptotic"
..$ Nsim : num 1
..- attr(*, "class")= chr "kSamples"
You could try the following, which returns the third column (the p-values) for both versions of each test:
unlist(lapply(AD_result, function(x) x$ad[,3]))
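If only the version 1 p-value is wanted, one per test, the row can be selected as well. The sketch below builds a small mock list with the same $ad layout as in the str() output; make_result is a hypothetical helper used only to fake the data, and the numbers are the rounded values printed in the question.

```r
# Hypothetical helper that mimics the $ad matrix of a kSamples result.
make_result <- function(p1, p2) {
  list(ad = matrix(
    c(1.72, 1.51, 0.945, 0.667, p1, p2), nrow = 2,
    dimnames = list(c("version 1:", "version 2:"),
                    c("AD", "T.AD", " asympt. P-value"))
  ))
}

AD_result <- list(
  make_result(0.13169, 0.17461),
  make_result(6.9248e-07, 1.1146e-06)
)

# One p-value per test: row "version 1:", third column.
p_v1 <- vapply(AD_result, function(x) x$ad["version 1:", 3], numeric(1))
```

vapply() is used instead of sapply() so that the result is guaranteed to be a plain numeric vector with one entry per list element.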

Using apply over two lists of different lengths

This question is related to my earlier question found here: https://stackoverflow.com/questions/33089532/r-accounting-for-a-factor-with-this-logistic-regression-function-replace-lappl
I realize that I didn't do a good job of asking the first question, so here is a simpler analogue with actual data:
My data looks something like this:
#data look like this, but with a variable number of "y" columns
wk<-rep(1:50,2)
X<-rnorm(100,1)
y1<-rnorm(100,1)
y2<-rnorm(100,1)
df1<-as.data.frame(cbind(wk,X,y1,y2))
df1$hyst<-ifelse(df1$wk>=5 & df1$wk<32, "R", "F")
Y<-df1[, -which(colnames(df1) %in% c("wk"))] #this step makes more sense with my actual data since I have a bunch of columns to remove
l1<-length(Y)-1
lst1<-lapply(2:l1,function(x){colnames(Y[x])})
dflst<-c("Y",'Y[Y$hyst=="R",]','Y[Y$hyst=="F",]')
I want to run a model over all Y columns for the full data set (all data) and for two subsets, when the factor hyst=="R" and when hyst=="F".
To do this, I have nested two lapply functions, which sort of works, but I think it essentially doubles my results and is causing me all sorts of list headaches.
Here is the nested lapply code:
lms <- lapply(dflst, function(z){
lapply(lst1, function(y) {
form <- paste0(y, " ~ X")
lm(form, data=eval(parse(text=z)))
})
})
How can I replace or modify the nested lapply function to obtain a model run for each Y column for each data set( all, "R", and "F")?
Construct your DF list like
DFlst <- c(list(full=Y), split(Y, Y$hyst))
str(DFlst)
List of 3
$ full:'data.frame': 100 obs. of 4 variables:
..$ X : num [1:100] 1.792 3.192 0.367 1.632 1.388 ...
..$ y1 : num [1:100] 3.354 1.189 1.99 0.639 0.1 ...
..$ y2 : num [1:100] 0.864 2.415 0.437 1.069 1.368 ...
..$ hyst: chr [1:100] "F" "F" "F" "F" ...
$ F :'data.frame': 46 obs. of 4 variables:
..$ X : num [1:46] 1.792 3.192 0.367 1.632 0.707 ...
..$ y1 : num [1:46] 3.354 1.189 1.99 0.639 0.894 ...
..$ y2 : num [1:46] 0.864 2.415 0.437 1.069 1.213 ...
..$ hyst: chr [1:46] "F" "F" "F" "F" ...
$ R :'data.frame': 54 obs. of 4 variables:
..$ X : num [1:54] 1.388 2.296 0.409 1.494 0.943 ...
..$ y1 : num [1:54] 0.1002 0.6425 -0.0918 1.199 0.8767 ...
..$ y2 : num [1:54] 1.368 1.122 0.402 -0.237 1.518 ...
..$ hyst: chr [1:54] "R" "R" "R" "R" ...
Do some regressions:
res <- lapply(DFlst, function(DF) {
cols = grep("^y[0-9]+$",names(DF),value=TRUE)
lapply(setNames(cols,cols),
function(y) lm(paste(y,"~X"), data=DF))
})
str(res, list.len=2, give.attr=FALSE)
List of 3
$ full:List of 2
..$ y1:List of 12
.. ..$ coefficients : Named num [1:2] 0.903 0.111
.. ..$ residuals : Named num [1:100] 2.2509 -0.0698 1.046 -0.4464 -0.9578 ...
.. .. [list output truncated]
..$ y2:List of 12
.. ..$ coefficients : Named num [1:2] 1.423 -0.166
.. ..$ residuals : Named num [1:100] -0.2623 1.5213 -0.9253 -0.0837 0.1751 ...
.. .. [list output truncated]
$ F :List of 2
..$ y1:List of 12
.. ..$ coefficients : Named num [1:2] 0.9289 0.0769
.. ..$ residuals : Named num [1:46] 2.2871 0.0146 1.0332 -0.4157 -0.0889 ...
.. .. [list output truncated]
..$ y2:List of 12
.. ..$ coefficients : Named num [1:2] 1.4177 -0.0789
.. ..$ residuals : Named num [1:46] -0.413 1.25 -0.952 -0.22 -0.149 ...
.. .. [list output truncated]
[list output truncated]
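Once the models sit in a nested list like this, a summary value can be pulled out of every model with a nested sapply(). The following is a minimal sketch that rebuilds comparable random data and collects the slope on X into a matrix; the seed, the reformulate() call, and the name slopes are illustrative choices, not part of the answer above.

```r
# Rebuild data of the same shape as in the question (values are random).
set.seed(1)
wk <- rep(1:50, 2)
Y <- data.frame(
  X = rnorm(100, 1), y1 = rnorm(100, 1), y2 = rnorm(100, 1),
  hyst = ifelse(wk >= 5 & wk < 32, "R", "F")
)

# Full data plus one data frame per level of hyst.
DFlst <- c(list(full = Y), split(Y, Y$hyst))

# One model per response column per data set.
res <- lapply(DFlst, function(DF) {
  cols <- grep("^y[0-9]+$", names(DF), value = TRUE)
  lapply(setNames(cols, cols),
         function(y) lm(reformulate("X", response = y), data = DF))
})

# Slopes as a matrix: rows are responses, columns are data sets.
slopes <- sapply(res, function(models) {
  sapply(models, function(m) coef(m)[["X"]])
})
```

Individual models remain accessible by name, for example res$R$y2.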

Simulating stochastic integrals

I'm using the Sim.DiffProc package in R to simulate a Stratonovich stochastic integral. Using the following code I can simulate 5 paths of the stochastic integral from t=0 to t=5:
fun=expression(w)
strat=st.int(fun, type="str", M=5, lower=0, upper=5)
How can I get the values of the stochastic integral at t=5, given that the st.int() function doesn't return the values at the various t as output?
I'm not sure what you mean by t=5. The $X component is a matrix of time series:
> str(strat)
List of 8
$ X : mts [1:1001, 1:5] 0.0187 0.0177 0.0506 0.0357 0.0357 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:5] "X1" "X2" "X3" "X4" ...
..- attr(*, "tsp")= num [1:3] 0 5 200
..- attr(*, "class")= chr [1:3] "mts" "ts" "matrix"
$ fun : symbol w
$ type : chr "str"
$ subdivisions: int 1000
$ M : num 5
$ Dt : num 0.005
$ t0 : num 0
$ T : num 5
- attr(*, "class")= chr "st.int"
If the fifth row of the values matrix is what you mean, it would be:
> (strat$X[5 , ])
X1 X2 X3 X4 X5
0.0031517578 0.0161278426 0.0003616453 0.0097594992 0.0012617410
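Going by the tsp attribute in the str() output (start 0, end 5, frequency 200), time t=5 corresponds to the last of the 1001 rows, whereas the fifth row corresponds to t=0.02. A minimal sketch with base R's ts machinery illustrates the indexing. The paths here are plain simulated random walks standing in for strat$X, not Stratonovich integrals, and the names paths and final_values are hypothetical.

```r
# Stand-in for strat$X: 5 random-walk paths on [0, 5] with 1000 subdivisions.
set.seed(7)
n_steps <- 1000
M <- 5
dt <- 5 / n_steps
increments <- matrix(rnorm(n_steps * M, sd = sqrt(dt)), nrow = n_steps)
paths <- rbind(0, apply(increments, 2, cumsum))   # 1001 rows, starting at 0
colnames(paths) <- paste0("X", 1:M)

X <- ts(paths, start = 0, deltat = dt)            # same time grid as strat$X

t_grid <- as.numeric(time(X))                     # time attached to each row
final_values <- X[nrow(X), ]                      # values of all paths at t = 5
row_at_5 <- which.min(abs(t_grid - 5))            # row index closest to t = 5
```

Under the same assumption about the time grid, strat$X[nrow(strat$X), ] would give the simulated values at t=5 for the object in the question.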
