Recursive / Expanding Window forecasts - r

I am having a small issue with my Rstudio code. I will try to replicate my code but unfortunately there is no easy data for me to show. This is about the package forecast. What I am looking for is somehwat simpler for what is in the manual. But unfortunately, I am not able to work round it.
so the issue is with an expanding window forecast. So I have a dependent variable Y and 3 regressors (X). I am trying to build a recursive one steap ahead forecast for each X.
Here is my code.
library(forecast)
library(zoo)
library(timeDate)
library(xts)
## Load data
data = Dataset[,2:ncol(Dataset)]
st <- as.Date("1990-1-1")
en <- as.Date("2020-12-1")
tt <- seq(st, en, by = "1 month")
data = xts(data, order.by=tt)
##########################################################################
RECFORECAST=function (Y,X,h,window){
st <- as.Date("1990-1-1")
en <- as.Date("2020-12-1")
tt <- seq(st, en, by = "1 month")
datas= cbind(Y,X)
newfcast= matrix(0,nrow(datas),h)
for (k in 1:nrow(datas)){
sample =datas[1:(window+k-1),]
# print(sample)
v= window+k
# print(v)
# fit = Arima(sample[,1], order=c(0,0,0),xreg=sample[,2])
fit = lm(sample[,1]~sample[,2], data = sample)
# fcast=forecast(fit,xreg=rep(sample[v,2],h))$mean
fcast = forecast.lm(fit,sample[v,2],h=1)$mean
print(fcast)
# print(fcast)
# newfcast[k+window+1,]=fcast
}
print(newfcast)
return(newfcast)
}
## Code to send the loop into forecasts
StoreMatrix = data$growth ## This is the first column data[,1]
for (i in 2:4)
{
try({
X=data[,i]
Y=data[,1]
RecModel=RECFORECAST(Y,X,h=1,window=60) ##Here the initial window is 60 obs
StoreMatrix=cbind(StoreMatrix,RecModel)
print(StoreMatrix)
}, silent=T)
}
The bits # were different ways I tried to crosscheck my data and they may not be useful. I have tried so many things but I don't seem to be able to get my head through it. At the end I want to have a matrix (StoreMatrix) with the first variable being the realization, and each of the columns with the corresponding 1 step ahead forecast.
The main lines where there seems to be an issue are these ones:
# fcast=forecast(fit,xreg=rep(sample[v,2],h))$mean
fcast = forecast.lm(fit,sample[v,2],h=1)$mean
Note sure how to solve this. Thank you very much.

Related

Please select a longer horizon when the forecasts are first computed in forecast package in r

When I run the following code, I do NOT get this error:
## https://www.dataiku.com/learn/guide/code/r/time_series.html
library(readxl)
library(forecast)
library(dplyr)
library(prophet)
library(rstan)
library(Hmisc)
library(caret)
data<-read_excel("Time Series/Items.xlsx", col_types = c("text", "numeric"))
Nper=0.75
stmodels=c("meanf","naive","snaive","rwf","croston","stlf","ses","holt","hw","splinef","thetaf","ets","auto.arima","tbats","prophet")
gkuniforecast = function(data, Np, Ncolumn, tsfreq, model) {
## Preparation
N = ceiling(Np*nrow(data))
## Models
if (model=="prophet"){
df=data
names(df)=c("ds","y")
df$ds=as.Date(paste(df$ds,"-01",sep=""), "%Y-%b-%d")
train.df = df[1:N,]
na.df=data.frame(ds=rep(NA, N),y=rep(NA, N))
test.df <- rbind(na.df, df[(N+1):nrow(data),])
m <- prophet(train.df)
future <- make_future_dataframe(m, periods = nrow(data)-N, freq = 'month')
pro_forecast <- predict(m, future)
plot(m, pro_forecast)
##prophet_plot_components(m, forecast)
acc=matrix(rep(NA, 16),nrow=2,ncol=8,dimnames=list(c("Training set", "Test set"),c("ME","RMSE","MAE","MPE","MAPE","MASE","ACF1","Theil's U")))
acc["Test set","RMSE"]=sqrt(mean((pro_forecast$yhat - test.df)^2, na.rm = TRUE))
}else{
x=pull(data,Ncolumn)
train.x = ts(x[1:N], frequency=tsfreq)
test.x <- ts(c(rep(NA, N), x[(N+1):NROW(x)]), frequency=tsfreq)
str1=paste0("m_",model," = ",model,"(train.x)")
if (Np==1) {str2=paste0("f_",model," = forecast(m_",model,", h=NROW(x)")
} else {str2=paste0("f_",model," = forecast(m_",model,", h=NROW(x)-N)")}
str3=paste0("plot(f_",model,")")
str4="lines(test.x)"
str5=paste0("acc=accuracy(f_",model,",test.x)")
str=paste0(str1,";",str2,";",str3,";",str4,";",str5)
eval(parse(text=str))
}
return(acc)
}
acc = lapply(stmodels, gkuniforecast, data=data, Np=Nper, Ncolumn=2,tsfreq=12)
But when I run this code, I do:
##Forecast data prep
tsfreq=5
x=pull(data,1)
train.x = ts(x[1:N], frequency=tsfreq)
test.x <- ts(c(rep(NA, N), x[(N+1):NROW(x)]), frequency=tsfreq)
stmodels=c("meanf","naive","snaive","rwf","croston","stlf","ses","holt","hw"##,"splinef"
,"thetaf","ets","auto.arima","tbats")
for (i in 1:length(stmodels)){
str1=paste0("m_",stmodels[i]," = ",stmodels[i],"(train.x)")
str2=paste0("f_",stmodels[i]," = forecast(m_",stmodels[i],", h=NROW(x)-N)")
str3=paste0("plot(f_",stmodels[i],")")
str4="lines(test.x)"
str5=paste0('acc[["',stmodels[i],'"]]=accuracy(f_',stmodels[i],',test.x)')
str=paste0(str1,";",str2,";",str3,";",str4,";",str5)
eval(parse(text=str))
}
There seems to be a problem with 'hw' (splinef is commented out, because it gives me another error), but I do not understand why in the first dataset, I get no errors and I do with the second dataset. What is also different is the frequency.
Again the error is:
Please select a longer horizon when the forecasts are first computed
You are mixing functions that create forecasts directly (like meanf()) with functions that generate models (like ets()). For functions that generate forecasts directly, you need to specify the forecast horizon when you call the function. See https://otexts.org/fpp2/the-forecast-package-in-r.html for a list of functions that produce forecasts directly.

How to visulize the convolution layer and feature layer in mxnet after cnn was finished trained?

I want to plot or visualize the result of each layers out from a trained CNN with mxnet in R. Like w´those abstract art from what a nn's each layer can see.
But I don't know how. Please somebody help me. One way I can think out is to put the weights and bias back to every step and plot the step out. But when I try to put model$arg.params$convolution0_weight back to mx.symbol.Convolution(), I get
Error in mx.varg.symbol.Convolution(list(...)) :
./base.h:291: Unsupported parameter type object type for argument weight, expect integer, logical, or string.
Can anyone help me?
I thought out one way, but encounter a difficulty at one step. Here is what I did.
I found all the trained cnn's parameters inmodel$arg.params , and to compute with parameters we can use mx.nd... founctions as bellow:
`#convolution 1_result
conv1_result<- mxnet::mx.nd.Convolution(data=mx.nd.array(train_array),weight=model$arg.params$convolution0_weight,bias=model$arg.params$convolution0_bias,kernel=c(8,8),num_filter = 50)
str(conv1_result)
tanh1_result<-mx.nd.Activation(data= conv1_result, act_type = "sigmoid")
pool1_result <- mx.nd.Pooling(data = tanh1_result, pool_type = "avg", kernel = c(4,4), stride = c(4,4))
conv2 result
conv2_result<- mxnet::mx.nd.Convolution(data=pool1_result,weight=model$arg.params$convolution1_weight,bias=model$arg.params$convolution1_bias,kernel=c(5,5),num_filter = 50)
tanh2_result<-mx.nd.Activation(data= conv1_result, act_type = "sigmoid")
pool2_result <- mx.nd.Pooling(data = tanh1_result, pool_type = "avg", kernel = c(4,4), stride = c(4,4))
1st fully connected layer result
flat_result <- mx.nd.flatten(data = pool2_result)
fcl_1_result <- mx.nd.FullyConnected(data = flat_result,weight = model$arg.params$fullyconnected0_weight,bias = model$arg.params$fullyconnected0_bias, num_hidden = 500)
tanh_3_result <- mx.nd.Activation(data = fcl_1_result, act_type = "tanh")
2nd fully connected layer result
fcl_2_result <- mx.nd.FullyConnected(data = tanh_3,weight = model$arg.params$fullyconnected1_weight,bias = model$arg.params$fullyconnected1_bias, num_hidden =100)`
but when I came to mx.nd.FullyConnected() step , I encountered not sufficient memory(i have 16 GB RAM) and R crashed.
So, does anyone know how to batch_size the input data in
mx.nd.FullyConnected(), or any method to make mx.nd.FullyConnected() run successfully as mx.model.FeedForward.create()
did?
Here is the code that can help you to achieve what you want. The code below displays activations of 2 convolution layers of LeNet. The code gets as an input MNIST dataset, which is 28x28 grayscale images (downloaded automatically), and produces images as activations.
You can grab outputs from executor. To see the list of available outputs use names(executor$ref.outputs)
The result of each output is available as a matrix with values in [-1; 1] range. The dimensions of the matrix depends on parameters of the layer. The code use these matrices to display as greyscaled images where -1 is white pixel, 1 - black pixel. (most of the code is taken from https://github.com/apache/incubator-mxnet/issues/1152 and massaged a little bit)
The code is a self sufficient to run, but I have noticed that if I build the model second time in the same R session, the names of ouputs get different indices, and later the code fails because the expected names of outputs are hard coded. So if you decide to create a model more than once, you will need to restart R session.
Hope it helps and you can adjust this example to your case.
library(mxnet)
download.file('https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/R/data/mnist_csv.zip', destfile = 'mnist_csv.zip')
unzip('mnist_csv.zip', exdir = '.')
train <- read.csv('train.csv', header=TRUE)
data.x <- train[,-1]
data.x <- data.x/255
data.y <- train[,1]
val_ind = 1:100
train.x <- data.x[-val_ind,]
train.x <- t(data.matrix(train.x))
train.y <- data.y[-val_ind]
val.x <- data.x[val_ind,]
val.x <- t(data.matrix(val.x))
val.y <- data.y[val_ind]
train.array <- train.x
dim(train.array) <- c(28, 28, 1, ncol(train.x))
val.array <- val.x
dim(val.array) <- c(28, 28, 1, ncol(val.x))
# input layer
data <- mx.symbol.Variable('data')
# first convolutional layer
convLayer1 <- mx.symbol.Convolution(data=data, kernel=c(5,5), num_filter=30)
convAct1 <- mx.symbol.Activation(data=convLayer1, act_type="tanh")
poolLayer1 <- mx.symbol.Pooling(data=convAct1, pool_type="max", kernel=c(2,2), stride=c(2,2))
# second convolutional layer
convLayer2 <- mx.symbol.Convolution(data=poolLayer1, kernel=c(5,5), num_filter=60)
convAct2 <- mx.symbol.Activation(data=convLayer2, act_type="tanh")
poolLayer2 <- mx.symbol.Pooling(data=convAct2, pool_type="max",
kernel=c(2,2), stride=c(2,2))
# big hidden layer
flattenData <- mx.symbol.Flatten(data=poolLayer2)
hiddenLayer <- mx.symbol.FullyConnected(flattenData, num_hidden=500)
hiddenAct <- mx.symbol.Activation(hiddenLayer, act_type="tanh")
# softmax output layer
outLayer <- mx.symbol.FullyConnected(hiddenAct, num_hidden=10)
LeNet1 <- mx.symbol.SoftmaxOutput(outLayer)
# Group some output layers for visual analysis
out <- mx.symbol.Group(c(convAct1, poolLayer1, convAct2, poolLayer2, LeNet1))
# Create an executor
executor <- mx.simple.bind(symbol=out, data=dim(val.array), ctx=mx.cpu())
# Prepare for training the model
mx.set.seed(0)
# Set a logger to keep track of callback data
logger <- mx.metric.logger$new()
# Using cpu by default, but set gpu if your machine has a supported one
devices=mx.cpu(0)
# Train model
model <- mx.model.FeedForward.create(LeNet1, X=train.array, y=train.y,
eval.data=list(data=val.array, label=val.y),
ctx=devices,
num.round=1,
array.batch.size=100,
learning.rate=0.05,
momentum=0.9,
wd=0.00001,
eval.metric=mx.metric.accuracy,
epoch.end.callback=mx.callback.log.train.metric(100, logger))
# Update parameters
mx.exec.update.arg.arrays(executor, model$arg.params, match.name=TRUE)
mx.exec.update.aux.arrays(executor, model$aux.params, match.name=TRUE)
# Select data to use
mx.exec.update.arg.arrays(executor, list(data=mx.nd.array(val.array)), match.name=TRUE)
# Do a forward pass with the current parameters and data
mx.exec.forward(executor, is.train=FALSE)
# List of outputs available.
names(executor$ref.outputs)
# Plot the filters of a sample from validation set
sample_index <- 99 # sample number in validation set. Change it to if you want to see other samples
activation0_filter_count <- 30 # number of filters of the "convLayer1" layer
par(mfrow=c(6,5), mar=c(0.1,0.1,0.1,0.1)) # number of rows x columns in output
dim(executor$ref.outputs$activation0_output)
for (i in 1:activation0_filter_count) {
outputData <- as.array(executor$ref.outputs$activation0_output)[,,i,sample_index]
image(outputData,
xaxt='n', yaxt='n',
col=gray(seq(1,0,-0.1)))
}
activation1_filter_count <- 60 # number of filters of the "convLayer2" layer
dim(executor$ref.outputs$activation1_output)
par(mfrow=c(6,10), mar=c(0.1,0.1,0.1,0.1)) # number of rows x columns in output
for (i in 1:activation1_filter_count) {
outputData <- as.array(executor$ref.outputs$activation1_output)[,,i,sample_index]
image(outputData,
xaxt='n', yaxt='n',
col=gray(seq(1,0,-0.1)))
}
As a result you should see the following images for a validation sample #2 (use RStudio left and right arrows to navigate between them).

addTA - Error in naCheck(x, n) : Series contains non-leading NAs

I recently tried to create my own technical indicator, a simple golden cross indicator. 50 - 200 day EMA to be added to my chartSeries chart. This worked fine with the code below at first, but after the updated package of quantmod was released it gives me this error message:
Code (stock data is downloaded through the getSymbols function in quantmod)
#20dayEMA - 50dayEMA Technical indicator, Price and Volume
newEMA <- function(x){(removeNA(EMA(p[,6],n=50)-(EMA(p[,6],n=200))))
}
emaTA <- newTA(newEMA)
emaTA(col='lightgoldenrod3', 'Price')
Then it gives me this error message:
Error in naCheck(x, n) : Series contains non-leading NAs
Does anyone know how to remove these non-leading NAs?
You can use na.omit and there is no need to convert to an xts-object as this is the default.
library(quantmod)
getSymbols("VELO.CO")
p <- na.omit(VELO.CO)
newEMA <- function(x) {
EMA(p[,6], n = 20) - (EMA(p[,6], n = 50))
}
emaTA <- newTA(newEMA)
barChart(VELO.CO)
emaTA(col = "lightgoldenrod3", "Price")
I'm not familiar with the quantmod package, but I played around with your code and I think I found a working solution:
library("quantmod")
getSymbols("VELO.CO")
p <- as.xts(c(VELO.CO))
# remove incomplete cases
vec <- which(!complete.cases(p)) # rows 2305 2398
p2 <- p[-vec, ]
newEMA <- function(x) {
EMA(p2[, 6], n = 20) - (EMA(p2[, 6], n = 50))
}
emaTA <- newTA(newEMA)
barChart(VELO.CO)
emaTA(col = "lightgoldenrod3", "Price")

How to estimate static yield curve with 'termstrc' package in R?

I am trying to estimate the static yield curve for Brazil using termstrc package in R. I am using the function estim_nss.couponbonds and putting 0% coupon-rates and $0 cash-flows, except for the last one which is $1000 (the face-value at maturity) -- as far as I know this is the function to do this, because the estim_nss.zeroyields only calculates the dynamic curve. The problem is that I receive the following error message:
"Error in (pos_cf[i] + 1):pos_cf[i + 1] : NA/NaN argument In addition: Warning message: In max(n_of_cf) : no non-missing arguments to max; returning -Inf "
I've tried to trace the problem using trace(estim_nss.couponbons, edit=T) but I cannot find where pos_cf[i]+1 is calculated. Based on the name I figured it could come from the postpro_bondfunction and used trace(postpro_bond, edit=T), but I couldn't find the calculation again. I believe "cf" comes from cashflow, so there could be some problem in the calculation of the cashflows somehow. I used create_cashflows_matrix to test this theory, but it works well, so I am not sure the problem is in the cashflows.
The code is:
#Creating the 'couponbond' class
ISIN <- as.character(c('ltn_2017','ltn_2018', 'ltn_2019', 'ltn_2021','ltn_2023')) #Bond's identification
MATURITYDATE <- as.Date(c(42736, 43101, 43466, 44197, 44927), origin='1899-12-30') #Dates are in system's format
ISSUEDATE <- as.Date(c(41288,41666,42395, 42073, 42395), origin='1899-12-30') #Dates are in system's format
COUPONRATE <- rep(0,5) #Coupon rates are 0 because these are zero-coupon bonds
PRICE <- c(969.32, 867.77, 782.48, 628.43, 501.95) #Prices seen 'TODAY'
ACCRUED <- rep(0.1,5) #There is no accrued interest in the brazilian bond's market
#Creating the cashflows sublist
CFISIN <- as.character(c('ltn_2017','ltn_2018', 'ltn_2019', 'ltn_2021', 'ltn_2023')) #Bond's identification
CF <- c(1000,1000,1000,1000,1000)# The face-values
DATE <- as.Date(c(42736, 43101, 43466, 44197, 44927), origin='1899-12-30') #Dates are in system's format
CASHFLOWS <- list(CFISIN,CF,DATE)
names(CASHFLOWS) <- c("ISIN","CF","DATE")
TODAY <- as.Date(42646, origin='1899-12-30')
brasil <- list(ISIN,MATURITYDATE,ISSUEDATE,
COUPONRATE,PRICE,ACCRUED,CASHFLOWS,TODAY)
names(brasil) <- c("ISIN","MATURITYDATE","ISSUEDATE","COUPONRATE",
"PRICE","ACCRUED","CASHFLOWS","TODAY")
mybonds <- list(brasil)
class(mybonds) <- "couponbonds"
#Estimating the zero-yield curve
ns_res <-estim_nss.couponbonds(mybonds, 'brasil' ,method = "ns")
#Testing the hypothesis that the error comes from the cashflow matrix
cf_p <- create_cashflows_matrix(mybonds[[1]], include_price = T)
m_p <- create_maturities_matrix(mybonds[[1]], include_price = T)
b <- bond_yields(cf_p,m_p)
Note that I am aware of this question which reports the same problem. However, it is for the dynamic curve. Besides that, there is no useful answer.
Your code has two problems. (1) doesn't name the 1st list (this is the direct reason of the error. But if modifiy it, another error happens). (2) In the cashflows sublist, at least one level of ISIN needs more than 1 data.
# ...
CFISIN <- as.character(c('ltn_2017','ltn_2018', 'ltn_2019',
'ltn_2021', 'ltn_2023', 'ltn_2023')) # added a 6th element
CF <- c(1000,1000,1000,1000,1000, 1000) # added a 6th
DATE <- as.Date(c(42736,43101,43466,44197,44927, 44928), origin='1899-12-30') # added a 6th
CASHFLOWS <- list(CFISIN,CF,DATE)
names(CASHFLOWS) <- c("ISIN","CF","DATE")
TODAY <- as.Date(42646, origin='1899-12-30')
brasil <- list(ISIN,MATURITYDATE,ISSUEDATE,
COUPONRATE,PRICE,ACCRUED,CASHFLOWS,TODAY)
names(brasil) <- c("ISIN","MATURITYDATE","ISSUEDATE","COUPONRATE",
"PRICE","ACCRUED","CASHFLOWS","TODAY")
mybonds <- list(brasil = brasil) # named the list
class(mybonds) <- "couponbonds"
ns_res <-estim_nss.couponbonds(mybonds, 'brasil', method = "ns")
Note: the error came from these lines
bonddata <- bonddata[group] # prepro_bond()'s 1st line (the direct reason).
# cf <- lapply(bonddata, create_cashflows_matrix) # the additional error
create_cashflows_matrix(mybonds[[1]], include_price = F) # don't run

Relative Performance loop in R

I'm not a programmer by any means and have been trying to learn R to code various trading strategies. I'm trying to calculate the relative performance of a list of stocks versus the S&P 500 and save it to a matrix. It appears that what I've written only goes through the first symbol and then stops. Below is the code that I've come up with. I appreciate any help, input and advice on how to proceed. Thank you.
library(quantmod)
library(PerformanceAnalytics)
Sys.setenv(TZ = "UTC")
symbols <- c('IBM', 'GE', '^GSPC')
getSymbols(symbols, src = "yahoo", from = "2010-12-31", to = Sys.Date())
symadj <- cbind(IBM[,6], GE[,6])
sp5adj <- GSPC[,6]
# Calculate Relative Performance vs S&P and save data
for (i in length(symadj)) {
rp <- matrix(symadj[,1]/sp5adj, nrow = 1070, ncol = 3)
print(tail(rp))
}
_You are not looping over an array but over a single number:
for (i in length(symadj))
Try (see the seq added, watch the parenthesis. Plus, be careful with length, the iteration is over ncol - i.e. the columns):
for (i in seq(1,ncol(rp),1))
_Also, you are going always through the same column:
rp <- matrix(symadj[,1]/sp5adj, nrow = 1070, ncol = 3)
_A thing I skipped: you should build your matrix before the loop:
rp <- matrix(0,nrow=1071,ncol=2)
And then assign without overwritting your previous matrix - you have already build it (plus, look at the i where the 1 was, now you are iterating)
rp[,i] <- symadj[,i]/sp5adj #This inside the loop
_Your for loop should end up looking something like this:
rp <- matrix(0,nrow=1071,ncol=2)
for (i in seq(1,ncol(rp),1)) {
rp[,i] <- symadj[,i]/sp5adj #This inside the loop
print(tail(rp))
}
\!/ Now there are 1071 days in that period, so the matrix should have one more row - that's why the 1071.

Resources