How to get fitted values from an ar() model in R

I want to retrieve the fitted values from a model returned by ar() in R. With Arima() I can get them using fitted(model.object), but I cannot find an equivalent for ar().

The ar object does not store a fitted vector, but it does store the residuals. Here is an example that uses the residuals from the ar object to reconstruct the fitted values from the original data:
data(WWWusage)
arf <- ar(WWWusage)
str(arf)
#====================
List of 14
$ order : int 3
$ ar : num [1:3] 1.175 -0.0788 -0.1544
$ var.pred : num 117
$ x.mean : num 137
$ aic : Named num [1:21] 258.822 5.787 0.413 0 0.545 ...
..- attr(*, "names")= chr [1:21] "0" "1" "2" "3" ...
$ n.used : int 100
$ order.max : num 20
$ partialacf : num [1:20, 1, 1] 0.9602 -0.2666 -0.1544 -0.1202 -0.0715 ...
$ resid : Time-Series [1:100] from 1 to 100: NA NA NA -2.65 -4.19 ...
$ method : chr "Yule-Walker"
$ series : chr "WWWusage"
$ frequency : num 1
$ call : language ar(x = WWWusage)
$ asy.var.coef: num [1:3, 1:3] 0.01017 -0.01237 0.00271 -0.01237 0.02449 ...
- attr(*, "class")= chr "ar"
#===================
str(WWWusage)
# Time-Series [1:100] from 1 to 100: 88 84 85 85 84 85 83 85 88 89 ...
png(); plot(WWWusage)
lines(seq(WWWusage),WWWusage - arf$resid, col="red"); dev.off()
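
As a cross-check on the subtraction above, here is a minimal sketch that rebuilds the same fitted values directly from the coefficients. It assumes the default Yule-Walker fit with demeaning, so each fitted value is x.mean plus the AR coefficients applied to the demeaned lagged observations. The variable names (fitted_from_resid, manual) are only illustrative.

```r
fit <- ar(WWWusage)

# Fitted values as observed minus residual; the first `order` entries are NA
fitted_from_resid <- WWWusage - fit$resid

# The same values rebuilt from the coefficients (assumes the series was demeaned)
p  <- fit$order
x  <- as.numeric(WWWusage)
mu <- fit$x.mean
manual <- rep(NA_real_, length(x))
for (t in (p + 1):length(x)) {
  # x[t - 1], ..., x[t - p] line up with ar[1], ..., ar[p]
  manual[t] <- mu + sum(fit$ar * (x[t - seq_len(p)] - mu))
}

print(head(cbind(from_resid = as.numeric(fitted_from_resid), manual = manual), 6))
```

The two columns should agree up to floating-point error, which suggests that WWWusage - arf$resid is a reasonable stand-in for fitted().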

The simplest way to get the fits from an AR(p) model would be to use auto.arima() from the forecast package, which does have a fitted() method. If you really want a pure AR model, you can constrain the differencing via the d parameter and the MA order via the max.q parameter.
> library(forecast)
> fitted(auto.arima(WWWusage,d=0,max.q=0))
Time Series:
Start = 1
End = 100
Frequency = 1
[1] 91.68778 86.20842 82.13922 87.60576 ...
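
If the forecast package is not available, a rough base-R sketch of the same idea is to refit the AR(p) model with stats::arima() and subtract its residuals from the data. The order is taken from ar() here purely for illustration, and method = "ML" is an assumption chosen for this sketch; the estimates will generally differ somewhat from both ar() and auto.arima().

```r
# Pick the AR order with ar(), then refit with arima() so residuals cover every time point
p   <- ar(WWWusage)$order
fit <- arima(WWWusage, order = c(p, 0, 0), method = "ML")

# arima() residuals have no leading NAs, so the fitted series is complete
fitted_vals <- WWWusage - residuals(fit)
print(head(fitted_vals))
```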

Related

R Importing ARIMA model outputs to use in forecast

I have undertaken ARIMA modelling using the auto.arima function for 91 models. The outputs are sitting in a list of lists.
The structure of the outputs for one model looks like the following:
List of 19
$ coef : Named num [1:8] -3.17e-01 -3.78e-01 -8.02e-01 -5.39e+04 -1.33e+05 ...
..- attr(*, "names")= chr [1:8] "ar1" "ar2" "ma1" "Price.Diff" ...
$ sigma2 : num 6.37e+10
$ var.coef : num [1:8, 1:8] 1.84e-02 8.90e-03 -7.69e-03 -8.80e+02 2.83e+03 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:8] "ar1" "ar2" "ma1" "Price.Diff" ...
.. ..$ : chr [1:8] "ar1" "ar2" "ma1" "Price.Diff" ...
$ mask : logi [1:8] TRUE TRUE TRUE TRUE TRUE TRUE ...
$ loglik : num -1189
$ aic : num 2395
$ arma : int [1:7] 2 1 0 0 1 1 0
$ residuals: Time-Series [1:87] from 1 to 87: 1810 -59503 263294 240970 94842 ...
$ call : language auto.arima(y = x[, 2], stepwise = FALSE, approximation = FALSE, xreg = x[, 3:ncol(x)], x = list(x = c(1856264.57,| __truncated__ ...
$ series : chr "x[, 2]"
$ code : int 0
$ n.cond : int 0
$ nobs : int 86
$ model :List of 10
..$ phi : num [1:2] -0.317 -0.378
..$ theta: num -0.802
..$ Delta: num 1
..$ Z : num [1:3] 1 0 1
..$ a : num [1:3] -599787 284456 1887763
..$ P : num [1:3, 1:3] 0.00 0.00 -4.47e-23 0.00 3.33e-16 ...
..$ T : num [1:3, 1:3] -0.317 -0.378 1 1 0 ...
..$ V : num [1:3, 1:3] 1 -0.802 0 -0.802 0.643 ...
..$ h : num 0
..$ Pn : num [1:3, 1:3] 1.00 -8.02e-01 -1.83e-23 -8.02e-01 6.43e-01 ...
$ bic : num 2417
$ aicc : num 2398
$ xreg : Time-Series [1:87, 1:5] from 1 to 87: -0.866 -0.466 -1.383 -0.999 -0.383 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:5] "Price.Diff" "Easter" "Christmas" "High.Week" ...
$ x : Time-Series [1:87] from 1 to 87: 1856265 1393925 2200962 2209996 2161707 ...
$ fitted : Time-Series [1:87] from 1 to 87: 1854455 1453429 1937668 1969026 2066864 ...
- attr(*, "class")= chr [1:3] "ARIMA" "forecast_ARIMA" "Arima"
When printed the output looks as follows:
Series: x[, 2]
Regression with ARIMA(2,1,1) errors
Coefficients:
ar1 ar2 ma1 Price.Diff Easter Christmas High.Week Low.Week
-0.3170 -0.3777 -0.8017 -53931.11 -133187.55 -53541.62 -347146.59 216202.71
s.e. 0.1356 0.1319 0.1069 28195.33 68789.25 23396.62 -74115.78 66881.15
sigma^2 estimated as 6.374e+10: log likelihood=-1188.69
AIC=2395.38 AICc=2397.75 BIC=2417.47
I have written the following to export my models to text file format:
# export model outputs to newly created folder
for(i in 1:length(ts_outputs)){
sink(paste0(names(ts_outputs[i]), ".txt"))
print(ts_outputs[i])
sink()
}
This works for viewing the model outputs. However, I need to be able to import the model outputs back into R so that I can use them to forecast my time series forward.
I am assuming that I need to put them back into the original structure once re-imported.
Is there a package that has already been written to do this?
Are text files the right way to go for the original export?
I believe the following is the source code from the forecast package which writes the outputs (https://rdrr.io/github/ttnsdcn/forecast-package/src/R/arima.R):
if (length(x$coef) > 0) {
cat("\nCoefficients:\n")
coef <- round(x$coef, digits=digits)
if (se && nrow(x$var.coef)) {
ses <- rep(0, length(coef))
ses[x$mask] <- round(sqrt(diag(x$var.coef)), digits=digits)
coef <- matrix(coef, 1, dimnames=list(NULL, names(coef)))
coef <- rbind(coef, s.e.=ses)
}
print.default(coef, print.gap=2)
}
cm <- x$call$method
if (is.null(cm) || cm != "CSS")
{
cat("\nsigma^2 estimated as ", format(x$sigma2, digits=digits),
": log likelihood=", format(round(x$loglik, 2)),"\n",sep="")
npar <- length(x$coef) + 1
nstar <- length(x$residuals) - x$arma[6] - x$arma[7]*x$arma[5]
bic <- x$aic + npar*(log(nstar) - 2)
aicc <- x$aic + 2*npar*(nstar/(nstar-npar-1) - 1)
cat("AIC=", format(round(x$aic, 2)), sep="")
cat(" AICc=", format(round(aicc, 2)), sep="")
cat(" BIC=", format(round(bic, 2)), "\n",sep="")
}
else cat("\nsigma^2 estimated as ", format(x$sigma2, digits=digits),
": part log likelihood=", format(round(x$loglik, 2)),
"\n", sep="")
invisible(x)
}
Appreciate any direction/advice.
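
One possible direction, sketched here as an assumption rather than an answer from the thread: printed text loses the object structure, whereas base R's saveRDS() and readRDS() write and restore the fitted object itself. The example uses a base arima() fit on WWWusage as a stand-in; the list name models and the file names are hypothetical placeholders for ts_outputs.

```r
# Hypothetical stand-in for the list of fitted models (ts_outputs in the question)
fit    <- arima(WWWusage, order = c(1, 1, 1))
models <- list(model_a = fit)

# Write each model to its own .rds file
out_dir <- tempdir()
for (nm in names(models)) {
  saveRDS(models[[nm]], file.path(out_dir, paste0(nm, ".rds")))
}

# Read the models back into a named list with the original structure
restored <- sapply(names(models), function(nm) {
  readRDS(file.path(out_dir, paste0(nm, ".rds")))
}, simplify = FALSE)

# The restored object can be used for prediction like the original
fc <- predict(restored$model_a, n.ahead = 5)
print(fc$pred)
```

For models fitted with regressors, as in the question, future values of those regressors would still have to be supplied when forecasting from the restored object.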

how to extract integration order (d) from auto.arima

Basically, we can extract the optimal AR order from auto.arima as follows:
> auto.arima(ret.fin.chn,trace=TRUE,allowdrift=TRUE)
ARIMA(2,0,2) with non-zero mean : -14242.19
ARIMA(0,0,0) with non-zero mean : -14239.24
ARIMA(1,0,0) with non-zero mean : -14241.3
ARIMA(0,0,1) with non-zero mean : -14238.16
ARIMA(1,0,2) with non-zero mean : -14237.65
ARIMA(3,0,2) with non-zero mean : -14242.72
ARIMA(3,0,1) with non-zero mean : -14239.52
ARIMA(3,0,3) with non-zero mean : -14242.5
ARIMA(2,0,1) with non-zero mean : -14237.15
ARIMA(4,0,3) with non-zero mean : -14238.06
ARIMA(3,0,2) with zero mean : -14244.39
ARIMA(2,0,2) with zero mean : -14243.98
ARIMA(4,0,2) with zero mean : -14241.45
ARIMA(3,0,1) with zero mean : -14241.23
ARIMA(3,0,3) with zero mean : -14244.04
ARIMA(2,0,1) with zero mean : -14238.78
ARIMA(4,0,3) with zero mean : -14239.73
Best model: ARIMA(3,0,2) with zero mean
Series: ret.fin.chn
ARIMA(3,0,2) with zero mean
Coefficients:
ar1 ar2 ar3 ma1 ma2
0.5497 -0.4887 0.0461 -0.5691 0.4923
s.e. 0.3525 0.1764 0.0232 0.3534 0.1878
sigma^2 estimated as 0.0003277: log likelihood=7127.67
AIC=-14243.35 AICc=-14243.32 BIC=-14207.83
Warning messages:
1: In if (is.constant(x)) { :
the condition has length > 1 and only the first element will be used
2: In if (is.constant(x)) return(d) :
the condition has length > 1 and only the first element will be used
3: In if (is.constant(dx)) { :
the condition has length > 1 and only the first element will be used
Now store the result in an object a:
> a<-auto.arima(ts(ret.fin.chn),trace=TRUE,allowdrift=TRUE)
Then the AR order is given by
> a$arma[1]
and the MA order by
> a$arma[2]
Now look at the line "Best model: ARIMA(3,0,2) with zero mean".
This is the ARIMA(p,d,q) order.
I know how to extract the AR (p) and MA (q) orders, but how do I extract the integration order (d)? Note that I have tried ndiffs, and it sometimes gives a different result from the best model. Perhaps it is somewhere in $arma[?]?
More generally, the differencing order (d) is the next-to-last element of $arma; the seasonal differencing order (D) is the last. So:
a$arma[length(a$arma)-1] is the order d
a$arma[length(a$arma)] is the seasonal order D
As pointed out by Rob Hyndman, one of the authors of the forecast package, in an answer to a similar question on Cross Validated, an easy way to extract the order vector (p,d,q) is to use the forecast::arimaorder function.
In your example, this would work as follows:
arimaorder(a)
The output is a named integer with the values of p, d and q:
p d q
3 0 2
You can see from the help file of arima under Value (auto.arima has the same Value as arima)
arma
A compact form of the specification, as a vector giving the number of AR, MA, seasonal AR and seasonal MA coefficients, plus the period and the number of non-seasonal and seasonal differences.
So a$arma[6] contains the number of non-seasonal differences (d) and a$arma[7] contains the number of seasonal differences (D).
I'm really sorry, Metrics, but it seems that your solution isn't quite right:
> auto.arima(fin.gre,trace=TRUE,allowdrift=TRUE)$arma
ARIMA(2,2,2) : 26148.84
ARIMA(0,2,0) : 27846.32
ARIMA(1,2,0) : 27209.88
ARIMA(0,2,1) : 26161.36
ARIMA(1,2,2) : 26146.27
ARIMA(1,2,1) : 26144.37
ARIMA(1,2,1) : 26144.37
ARIMA(2,2,1) : 26146.69
Best model: ARIMA(1,2,1)
a<-auto.arima(fin.gre,trace=TRUE,allowdrift=TRUE)
a$arma
[1] 1 1 0 0 1 2 0
while doing str(a) yields
> str(a)
List of 16
$ coef : Named num [1:2] 0.0715 -0.9969
..- attr(*, "names")= chr [1:2] "ar1" "ma1"
$ sigma2 : num 795
$ var.coef : num [1:2, 1:2] 3.65e-04 -3.19e-06 -3.19e-06 3.39e-06
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:2] "ar1" "ma1"
.. ..$ : chr [1:2] "ar1" "ma1"
$ mask : logi [1:2] TRUE TRUE
$ loglik : num -13078
$ aic : num 26162
$ arma : int [1:7] 1 1 0 0 1 2 0
$ residuals: Time-Series [1:2750] from 1 to 2750: 0.39 -1.15 -3.64 -4.65 -11.57 ...
$ call : language auto.arima(x = structure(list(x = c(872.5, 880.78, 884.1, 884.1, 874.45, 855.3, 844.81, 837.14, 828.08, 830.74, 835.36, 839.25, 819.54, 802.27, 798.25, 793.01, 816.43, 831.87, ...
$ series : chr "fin.gre"
$ code : int 0
$ n.cond : int 0
$ model :List of 10
..$ phi : num 0.0715
..$ theta: num -0.997
..$ Delta: num [1:2] 2 -1
..$ Z : num [1:4] 1 0 2 -1
..$ a : num [1:4] 1.01 -1.72 62.87 62.78
..$ P : num [1:4, 1:4] -2.22e-16 2.21e-16 1.74e-16 4.62e-17 2.21e-16 ...
..$ T : num [1:4, 1:4] 0.0715 0 1 0 1 ...
..$ V : num [1:4, 1:4] 1 -0.997 0 0 -0.997 ...
..$ h : num 0
..$ Pn : num [1:4, 1:4] 1.00 -9.97e-01 9.51e-17 1.71e-16 -9.97e-01 ...
$ bic : num 26180
$ aicc : num 26162
$ x :An ‘xts’ object on 2003-01-01/2013-07-16 containing:
Data: num [1:2750, 1] 872 881 884 884 874 ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
NULL
- attr(*, "class")= chr "Arima"
As you can see, $model[3] contains two numbers, apparently related to $arma[5] and $arma[6]. It seems that $arma[5] represents the integration order d, but I'm not really sure about it.
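
For what it's worth, the output above has Best model ARIMA(1,2,1) and $arma equal to 1 1 0 0 1 2 0, so the sixth element (2) matches d, while the fifth element (1) is the period. The sketch below reproduces this with base R's arima() on the built-in WWWusage series (an illustrative stand-in, not the fin.gre data; method = "ML" is an assumption made for this example). The two numbers in $model$Delta are the coefficients of the differencing polynomial for d = 2, not a copy of $arma[5] and $arma[6].

```r
# Illustrative fit on a built-in series (not the fin.gre data from the question)
fit <- arima(WWWusage, order = c(1, 2, 1), method = "ML")

# $arma is laid out as c(p, q, P, Q, period, d, D)
orders <- setNames(fit$arma, c("p", "q", "P", "Q", "period", "d", "D"))
print(orders)

d <- fit$arma[6]   # number of non-seasonal differences
D <- fit$arma[7]   # number of seasonal differences

# Delta comes from expanding (1 - B)^2 = 1 - 2B + B^2, stored as c(2, -1)
print(fit$model$Delta)
```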

Feature selection using the penalizedLDA package

I am trying to use the penalizedLDA package to run a penalized linear discriminant analysis in order to select the "most meaningful" variables. I have searched here and on other sites for help in accessing the output from the penalized model, to no avail.
My data comprise 400 variables and 44 groups. Here is the code I used and the results I have so far:
yy.m<-as.matrix(yy) #Factors/groups
xx.m<-as.matrix(xx) #Variables
cv.out<-PenalizedLDA.cv(xx.m,yy.m,type="standard")
## apply the penalty
out <- PenalizedLDA(xx.m,yy.m,lambda=cv.out$bestlambda,K=cv.out$bestK)
To get the structure of the output from the analysis:
> str(out)
List of 10
$ discrim: num [1:401, 1:4] -0.0234 -0.0219 -0.0189 -0.0143 -0.0102 ...
$ xproj : num [1:100, 1:4] -8.31 -14.68 -11.07 -13.46 -26.2 ...
$ K : int 4
$ crits :List of 4
..$ : num [1:4] 2827 2827 2827 2827
..$ : num [1:4] 914 914 914 914
..$ : num [1:4] 162 162 162 162
..$ : num [1:4] 48.6 48.6 48.6 48.6
$ type : chr "standard"
$ lambda : num 0
$ lambda2: NULL
$ wcsd.x : Named num [1:401] 0.0379 0.0335 0.0292 0.0261 0.0217 ...
..- attr(*, "names")= chr [1:401] "R400" "R405" "R410" "R415" ...
$ x : num [1:100, 1:401] 0.147 0.144 0.145 0.141 0.129 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:401] "R400" "R405" "R410" "R415" ...
$ y : num [1:100, 1] 2 2 2 2 2 1 1 1 1 1 ...
- attr(*, "class")= chr "penlda"
I am interested in obtaining a list or matrix of the top 20 variables for feature selection, more than likely based on the coefficients of the Linear discrimination.
I realized I would have to sort the coefficients in descending order, and get the variable names matched to it. So the output I would expect is something like this imaginary example
V1 V2
R400 0.34
R1535 0.22...
Can anyone provide any pointers (not necessarily the R code)? Thanks in advance.
Your out$K is 4, and that means you have 4 discriminant vectors. If you want the top 20 variables according to, say, the 2nd vector, try this:
# get the data frame of variable names and coefficients
var.coef = data.frame(colnames(xx.m), out$discrim[,2])
# sort the 2nd column (the coefficients) in decreasing order, and only keep the top 20
var.coef.top = var.coef[order(var.coef[,2], decreasing = TRUE)[1:20], ]
var.coef.top is what you want.
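
Note that sorting the signed coefficients keeps only the most positive ones. If large negative coefficients should count as well, one variant is to rank by absolute value. The sketch below assumes that reading of "top" and uses a random matrix as a hypothetical stand-in for out$discrim, with made-up variable names, so it runs without the penalizedLDA package.

```r
set.seed(1)

# Hypothetical stand-in for out$discrim: 401 variables, 4 discriminant vectors
var_names <- paste0("R", seq(400, by = 5, length.out = 401))
discrim   <- matrix(rnorm(401 * 4), nrow = 401, dimnames = list(var_names, NULL))

k     <- 2                 # which discriminant vector to rank by
coefs <- discrim[, k]

# Rank by magnitude so strongly negative coefficients are kept too
top_idx <- order(abs(coefs), decreasing = TRUE)[1:20]
var_coef_top <- data.frame(variable    = var_names[top_idx],
                           coefficient = coefs[top_idx],
                           row.names   = NULL)
print(head(var_coef_top))
```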

R k-means clustering data

In R, I have computed a k-means clustering as follows:
km = kmeans(mat2, centers=3)
where mat2 is a matrix of column vectors obtained by combining elements of a set of time series. There are 31 rows.
Now that I have my k-means object, how can I look at the data associated with a particular point? For example, suppose I clicked on a dot that belongs to one of the partitions. How can I view this data? Of course, what I mean is how to obtain this data programmatically.
I assume that you call kmeans like this:
set.seed(42)
df <- data.frame( row.names = paste0( "obs", 1:100 ),
V1 = rnorm(100),
V2 = rnorm(100),
V3 = rnorm(100) )
km <- kmeans( df, centers = 3 )
If you are unfamiliar with a new function, it's always a good idea to inspect the resulting object using str():
> str(km)
List of 7
$ cluster : Named int [1:100] 1 2 3 3 1 1 1 1 1 1 ...
..- attr(*, "names")= chr [1:100] "obs1" "obs2" "obs3" "obs4" ...
$ centers : num [1:3, 1:3] 0.65604 -1.09689 0.56428 0.11162 0.00549 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:3] "1" "2" "3"
.. ..$ : chr [1:3] "V1" "V2" "V3"
$ totss : num 291
$ withinss : num [1:3] 43.7 65.7 51.3
$ tot.withinss: num 161
$ betweenss : num 130
$ size : int [1:3] 36 34 30
- attr(*, "class")= chr "kmeans"
As I understand your question, you are looking for km$cluster, which tells you which cluster each observation in your data has been assigned to. The cluster centers can be inspected via km$centers.
If you now want to know which observations have been assigned to the third cluster, whose center is km$centers[3,], you can subset your data.frame (or matrix) by
> rownames(df[ km$cluster == 3, ])
[1] "obs3" "obs4" "obs12" "obs15" "obs16" "obs21" "obs25" "obs27" "obs32" "obs42" "obs43" "obs46" "obs48" "obs54" "obs55" "obs58" "obs61" "obs62" "obs63" "obs66" "obs67" "obs73" "obs76"
[24] "obs77" "obs81" "obs84" "obs86" "obs87" "obs90" "obs94"
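
To get the data rather than just the row names, the same logical index can be used without rownames(). A short sketch building on the example above; by_cluster, obs_cluster and obs_data are illustrative names, and "obs12" is just an arbitrary observation.

```r
set.seed(42)
df <- data.frame(row.names = paste0("obs", 1:100),
                 V1 = rnorm(100),
                 V2 = rnorm(100),
                 V3 = rnorm(100))
km <- kmeans(df, centers = 3)

# All rows of the data, grouped into one data frame per cluster
by_cluster <- split(df, km$cluster)

# Cluster membership and raw data for a single observation
obs_cluster <- km$cluster[["obs12"]]
obs_data    <- df["obs12", ]

# Every row that shares a cluster with that observation
same_cluster <- df[km$cluster == obs_cluster, ]
print(head(same_cluster))
```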

am I using the wrong data type with predict.nnet() in R [closed]

My lack of understanding of R is causing me to grind to a halt in my work and seek your help. I'm looking to build a neural network from some time series data and then build a prediction using separate data and the model returned by the trained neural network.
I created an xts containing the dependent variable nxtCl (a one-day forward closing stock price) and the independent variables (a set of corresponding prices and technical indicators).
I split the xts in two: one set for training and the other for testing/prediction. These are miData.train and miData.test respectively. Subsequently I converted these two xts objects to scaled data frames.
miData.train <- scale(as.data.frame(miData.train))
miDate.test <- scale(as.data.frame(miData.test))
Using the package nnet I am able to build a neural network from the training data:
nn <- nnet(nxtCl ~ .,data=miData.train,linout=T,size=10,decay=0.001,maxit=10000)
The str() output for this returned formula object is:
> str(nn)
List of 18
$ n : num [1:3] 11 10 1
$ nunits : int 23
$ nconn : num [1:24] 0 0 0 0 0 0 0 0 0 0 ...
$ conn : num [1:131] 0 1 2 3 4 5 6 7 8 9 ...
$ nsunits : num 22
$ decay : num 0.001
$ entropy : logi FALSE
$ softmax : logi FALSE
$ censored : logi FALSE
$ value : num 4.64
$ wts : num [1:131] 2.73 -1.64 1.1 2.41 1.36 ...
$ convergence : int 0
$ fitted.values: num [1:901, 1] -0.465 -0.501 -0.46 -0.431 -0.485 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:901] "2005-07-15" "2005-07-18" "2005-07-19" "2005-07-20" ...
.. ..$ : NULL
$ residuals : num [1:901, 1] -0.0265 0.0487 0.0326 -0.0384 0.0632 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:901] "2005-07-15" "2005-07-18" "2005-07-19" "2005-07-20" ...
.. ..$ : NULL
$ call : language nnet.formula(formula = nxtCl ~ ., data = miData.train, inout = T, size = 10, decay = 0.001, maxit = 10000)
$ terms : language nxtCl ~ Op + Hi + Lo + Cl + vul + smaten + smafif + smath + vol + rsi + dvi
$ coefnames : chr [1:11] "Op" "Hi" "Lo" "Cl" ...
$ xlevels : Named list()
- attr(*, "class")= chr [1:2] "nnet.formula" "nnet"
I then try to run the prediction function using this model nn and the data I kept separate miData.test using the following function:
preds <- predict(object=nn, miData.test)
and I get the following error:
Error in terms.default(object, data = data) :
no terms component nor attribute
Running terms.default on miData.test I see that my data frame does not have any attributes:
terms.default(miData.test)
Error in terms.default(miData.test) : no terms component nor attribute
but is this why the prediction will not run?
miData.test has names that match the terms of nn:
> nn$terms
nxtCl ~ Op + Hi + Lo + Cl + vul + smaten + smafif + smath + vol +
rsi + dvi
> names(miData.test)
[1] "Op" "Hi" "Lo" "Cl" "vul" "smaten" "smafif" "smath" "vol" "rsi" "dvi" "nxtCl"
And, in terms of structure, the data is exactly the same as that which was used to build nn in the first place. I tried adding my own named attributes to miData.test, matching the terms of nn but that did not work. The str() of miData.test returns:
> str(miData.test)
'data.frame': 400 obs. of 12 variables:
$ Op : num 82.2 83.5 80.2 79.8 79.8 ...
$ Hi : num 83.8 84.2 83 79.9 80.2 ...
$ Lo : num 81 82.7 79.2 78.3 78 ...
$ Cl : num 83.7 82.8 79.2 79 78.2 ...
$ vul : num 4.69e+08 2.94e+08 4.79e+08 3.63e+08 3.17e+08 ...
$ smaten: num 84.1 84.1 83.8 83.3 82.8 ...
$ smafif: num 86.9 86.8 86.7 86.6 86.4 ...
$ smath : num 111 111 111 110 110 ...
$ vol : num 0.335 0.341 0.401 0.402 0.382 ...
$ rsi : num 45.7 43.6 36.6 36.3 34.7 ...
$ dvi : num 0.00968 0.00306 -0.01575 -0.01189 -0.00623 ...
$ nxtCl : num 82.8 79.2 79 78.2 77.4 ...
Any help or insight in getting predict() to work in this instance would be greatly appreciated. Thanks.
Here's some reproducible code. In putting this together, I have 'removed' the error. Unfortunately, although it now works, I am none the wiser as to what was causing the problem before:
require(quantstrat)
require(PerformanceAnalytics)
require(nnet)
initDate <- "2004-09-30"
endDate <- "2010-09-30"
symbols <- c("SPY")
getSymbols(symbols, from=initDate, to=endDate, index.class=c("POSIXt","POSIXct"))
rsi <- RSI(Cl(SPY))
smaTen <- SMA(Cl(SPY))
smaFif <- SMA(Cl(SPY),n=50)
nxtCl <- lag(Cl(SPY),-1)
tmp <- SPY[,-5]
tmp <- tmp[,-5]
miData <- merge(tmp,rsi,smaTen,smaFif,nxtCl)
names(miData) <- c("Op","Hi","Lo","Cl","rsi","smaTen","smaFif","nxtCl")
miData <- miData[50:1512]
scaled.miData <- scale(miData)
miData.train <- as.data.frame(scaled.miData[1:1000])
miData.test <- as.data.frame(scaled.miData[1001:1463])
nn <- nnet(nxtCl ~ .,data=miData.train,linout=T,size=10,decay=0.001,maxit=10000)
preds <- predict(object=nn, miData.test)
