How to loop an ARIMA model over the different columns of a time series data set? - r

So I have a data set which has district-wise values of COVID-19 cases. Now I want to run an ARIMA model on each of these districts and create a similar data set of predicted values.
library("forecast")
df <- read.csv("D:/Hackathon/Time series/Maharashtra.csv")
z = ncol(df)
for(i in z){
  x = ts(c[,i], frequency = 365, start = c(2020,1,30))
  plot.ts(x)
  pi = auto.arima(x)
  summary(pi)
  q = forecast(pi, h = 30)
  plot.forecast(q)
  write.csv(q, "D:/Hackathon/pred.csv")
}
I know for a fact this is not correct. This is how the data is:
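For reference, here is a minimal sketch of the kind of loop being asked about, assuming the first column of df is the date and every remaining column holds one district's daily case counts; the file paths come from the question, while the combined output format is an assumption:

library(forecast)
df <- read.csv("D:/Hackathon/Time series/Maharashtra.csv")

# assume column 1 is the date, so loop over the district columns only
pred_list <- list()
for (i in 2:ncol(df)) {
  x <- ts(df[, i], frequency = 365, start = c(2020, 30))  # daily series starting 30 Jan 2020
  fit <- auto.arima(x)
  fc <- forecast(fit, h = 30)                             # 30-day-ahead forecasts
  pred_list[[colnames(df)[i]]] <- as.numeric(fc$mean)
}

# one column of predicted values per district, written out once
pred <- as.data.frame(pred_list)
write.csv(pred, "D:/Hackathon/pred.csv", row.names = FALSE)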

Related

Failing regression loop in R

I wish to regress certain factor exposures calculated for a portfolio on that portfolio's returns over 550 months (monthly observations): a Fama-MacBeth regression.
So essentially, I am regressing 550 return observations on "constant" factor exposures calculated previously. My loop thus far is as follows:
# Second Regression
library(sandwich)
library(broom)
library(tibble)
#Chose file containing factor exposures (note: data columns = exposures, rows = portfolios)
f <- file.choose()
betas <- read.csv(f)
BTM2R <- betas[1, 3]
BIPR <- betas[1, 4]
BInfR <- betas[1, 5]
BUnR <- betas[1, 6]
BOilR <- betas[1, 7]
#Chose file containing return data (columns = portfolios, rows = monthly return observations)
f <- file.choose()
retur <- read.csv(f)
for (i in 1:nrow(retur)){
  mod <- lm(data = retur, retur[[i,1]] ~ BTM2R + BIPR + BInfR + BUnR + BOilR)
  print(mod$coefficients)
}
(I also wish to develop this loop further so that after running this regression for one portfolio, it is run for the next portfolio, i.e. with the column number in retur stepped by some index j, but I will first address my current problem.)
My current problem is that when I run the regression, all coefficient values come back as NA, aside from the intercept, which returns a value. To confuse matters further, the intercept does not equal the return value at time t, which is what one would expect if all the other coefficients are NA (even though the regression results would obviously still be incorrect).
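The NAs most likely arise because BTM2R, BIPR, BInfR, BUnR and BOilR are single numbers taken from row 1 of betas, so each call to lm sees one observation with constant regressors and can only estimate the intercept. Below is a hypothetical sketch (not from the original post) of the usual cross-sectional second pass, under the assumption that the rows of betas are portfolios with exposures in columns 3 to 7 and the columns of retur are the same portfolios in the same order:

# one cross-sectional regression per month, then average the coefficients
X <- as.matrix(betas[, 3:7])
colnames(X) <- c("BTM2R", "BIPR", "BInfR", "BUnR", "BOilR")

gamma <- t(sapply(seq_len(nrow(retur)), function(t) {
  r_t <- as.numeric(retur[t, ])   # cross-section of portfolio returns in month t
  coef(lm(r_t ~ X))               # regress the cross-section on the exposures
}))

# Fama-MacBeth estimates: time-series means of the monthly coefficients
colMeans(gamma)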

Dynamic factor models and forecasting exercises in R (Nowcasting package)

I would like to do a pseudo-out-of-sample exercise with the dynamic factor model (DFM) from the nowcasting package in R.
Let me first provide you with a replicable example using the data from the nowcasting package.
library(nowcasting)
data(NYFED)
NYFED$legend$SeriesName
base <- NYFED$base
blocks <- NYFED$blocks$blocks
trans <- NYFED$legend$Transformation
frequency <- NYFED$legend$Frequency
delay <- NYFED$legend$delay
vintage <- PRTDB(mts = BRGDP$base, delay = BRGDP$delay, vintage = "2015-06-01")
base <- window(vintage, start = c(2005,06), frequency = 12)
x <- Bpanel(base = base, trans = BRGDP$trans)
GDP <- base[,which(colnames(base) == "PIB")]
GDP_qtr <- month2qtr(x = GDP, reference_month = 3)
y <- diff(diff(GDP_qtr,4))
y <- qtr2month(y)
data <- cbind(y,x)
frequency <- c(4,rep(12,ncol(x)))
nowca <- nowcast(formula = y~., data = data, r = 1, q = 1 , p = 1, method = "2s_agg",
frequency = frequency)
summary(nowca$reg)
nowca$yfcst
nowcast.plot(nowca, type = "fcst")
This code runs fine and creates forecasts and a plot with GDP, in-sample fit and three steps of out-of-sample forecasts.
However, I would like to do a full pseudo-out-of-sample forecasting exercise with this package. In other words, I would like to create multiple point forecasts using the forecasts generated by this nowcast function.
I have already written replicable code to do this. It uses the same data as before, but now the data is fed to the model gradually.
nowcasts_dfm <- rep(NA,nrow(data))
for (i in 12:nrow(data)){
  data <- ts(data[1:i,], start = c(2005,06), frequency = 12)
  nowca <- nowcast(formula = y~., data = data, r = 1, q = 1, p = 1, method = "2s_agg",
                   frequency = frequency)
  nowcasts_dfm[i] <- now$yfcst[,3][!is.na(now$yfcst[,3])][1]
}
So, this pseudo-out-of-sample exercise uses an expanding window starting with the first 12 observations. It then expands to cover the whole sample. However, I am getting an error message:
Error in eigen(cov(x)) : infinite or missing values in 'x'
Could someone help me with this, please? How do you code an expanding-window pseudo-out-of-sample forecasting exercise with this package?
Or is there a better way to code an expanding-window dynamic factor model (DFM) in R?
Thanks!
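For what it's worth, here is a hypothetical sketch (not from the original post) of one way the expanding window could be written. It rests on two assumptions: the full data object should not be overwritten inside the loop (otherwise later iterations subset an already-truncated series), and the forecasts should be read from the object fitted in the loop rather than from now. The minimum window is also widened, since the eigen(cov(x)) error can appear when a short window leaves some series with too few complete observations for the covariance to be computed:

nowcasts_dfm <- rep(NA, nrow(data))
start_window <- 36   # assumed minimum window; 12 monthly observations may be too few

for (i in start_window:nrow(data)) {
  # subset into a temporary series instead of overwriting `data`
  data_i <- ts(data[1:i, ], start = c(2005, 06), frequency = 12)
  nowca_i <- nowcast(formula = y ~ ., data = data_i, r = 1, q = 1, p = 1,
                     method = "2s_agg", frequency = frequency)
  # store the first available out-of-sample point forecast
  nowcasts_dfm[i] <- nowca_i$yfcst[, 3][!is.na(nowca_i$yfcst[, 3])][1]
}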

Plotting Forecast and Real values in one plot using a Rolling Window

I have code which takes the yield spread (dependent var.) and forward rates (independent var.) as input and runs auto.arima to get the orders. Afterwards, I forecast the next 25 dates (forc.horizon). My training data are the first 600 observations (training). Then I move the time window forward by 25 dates, meaning I use the data from 26 to 625, estimate the auto.arima and then forecast the data from 626 to 650, and so on. My data sets have 2298 rows (dates) and 30 columns (maturities).
I want to store all of the forecasts and then plot the forecasted and real values in the same plot.
This is the code I have, but it doesn't store the forecasts in a way that lets me plot them later.
forecast.func <- function(NS.spread, ind.v, maturity, training, forc.horizon){
  NS.spread <- NS.spread/100
  forc <- c()
  j <- 0
  for(i in 1:floor((nrow(NS.spread)-training)/forc.horizon)){
    # training window
    y <- NS.spread[(1+j):(training+j), maturity]
    f <- ind.v[(1+j):(training+j), maturity]
    # auto.arima
    c <- auto.arima(y, xreg = f, test = "adf")
    # forecast over the next forc.horizon dates
    e <- ind.v[(training+j+1):(training+j+forc.horizon), maturity]
    h <- forecast(c, xreg = lagmatrix(e, -1))
    forc <- c(forc, list(h))
    j <- j + forc.horizon
  }
  return(forc)
}
a <- forecast.func(spread.NS.JPM, Forward.rate.JPM, 10, 600, 25)
lapply(a, plot)
Here's a link to my two datasets:
https://drive.google.com/drive/folders/1goCxllYHQo3QJ0IdidKbdmfR-DZgrezN?usp=sharing
LOOK AT THE END for a full functional example on how to handle AUTO.ARIMA MODEL with DAILY DATA using XREG and FOURIER SERIES with ROLLING STARTING TIMES and cross validated training and test.
Without a reproducible example no one can help you, because they can't run your code. You need to provide data. :-(
Even though it's not really StackOverflow's place to discuss statistics matters, why don't you run auto.arima with xreg instead of lm + auto.arima on the residuals? Especially considering how you forecast at the end, that training method looks really wrong. Consider using:
fit <- auto.arima(y, xreg = lagmatrix(f, -1))
h <- forecast(fit, xreg = lagmatrix(e, -1))
auto.arima will automatically select the best orders and estimate the parameters by maximum likelihood.
On your coding question:
forc <- c() should be outside of the for loop, otherwise at every run you delete your previous results.
Same for j <- 0: at every run you're setting it back to 0. Put it outside if you need to change its value at every run.
The output of forecast is an object of class forecast, which is actually a type of list. Therefore, you can't use cbind effectively.
In my opinion, you should create forc in this way: forc <- list()
And create a list of your final results in this way:
forc <- c(forc, list(h)) # instead of forc <- cbind(forc, h)
This will create a list of objects of class forecast.
You can then plot them in a for loop, accessing each object in turn, or with lapply:
lapply(output_of_your_function, plot)
This is as far as I can go without a reproducible example.
FINAL EDIT
FULL FUNCTIONAL EXAMPLE
Here I try to sum up a conclusion out of the million comments we wrote.
With the data you provided, I built code that can handle everything you need.
From the training and test sets to the model, through to the forecasts and finally the plots, which have time on the X axis as requested in one of your comments.
I removed the for loop; lapply is much better for your case.
You can leave the Fourier series in if you want to. That's how Professor Hyndman suggests handling daily time series.
Functions and libraries needed:
# libraries ---------------------------
library(forecast)
library(lubridate)
# run model -------------------------------------
.daily_arima_forecast <- function(init, training, horizon, tt, ..., K = 10){
  # create training and test sets
  tt_trn <- window(tt, start = time(tt)[init], end = time(tt)[init + training - 1])
  tt_tst <- window(tt, start = time(tt)[init + training], end = time(tt)[init + training + horizon - 1])
  # add Fourier series [if you want to. Otherwise, cancel this part]
  fr  <- fourier(tt_trn[,1], K = K)
  frf <- fourier(tt_trn[,1], K = K, h = horizon)
  tsp(fr)  <- tsp(tt_trn)
  tsp(frf) <- tsp(tt_tst)
  tt_trn <- ts.intersect(tt_trn, fr)
  tt_tst <- ts.intersect(tt_tst, frf)
  colnames(tt_tst) <- colnames(tt_trn) <- c("y", "s", paste0("k", seq_len(ncol(fr))))
  # run model and forecast
  aa <- auto.arima(tt_trn[,1], xreg = tt_trn[,-1])
  fcst <- forecast(aa, xreg = tt_tst[,-1])
  # add actual values to plot them later!
  fcst$test.values <- tt_tst[,1]
  # NOTE: since I modified the structure of the forecast object I should create a new class,
  # but I didn't want to complicate your code
  fcst
}
daily_arima_forecast <- function(y, x, training, horizon, ...){
  # set up x and y together
  tt <- ts.intersect(y, x)
  # set up all starting points of the training set [give them names to recognize them later]
  inits <- setNames(nm = seq(1, length(y) - training, by = horizon))
  # remove the last one because you wouldn't have enough data in front of it
  inits <- inits[-length(inits)]
  # run the model and return a list of all your fitted models
  lapply(inits, .daily_arima_forecast, training = training, horizon = horizon, tt = tt, ...)
}
# plot ------------------------------------------
plot_daily_forecast <- function(x){
  autoplot(x) + autolayer(x$test.values)
}
Reproducible Example on how to use the previous functions
# create a sample data
tsp(EuStockMarkets) <- c(1991, 1991 + (1860-1)/365.25, 365.25)
# model
models <- daily_arima_forecast(y = EuStockMarkets[,1],
                               x = EuStockMarkets[,2],
                               training = 600,
                               horizon = 25,
                               K = 5)
# plot
plots <- lapply(models, plot_daily_forecast)
plots[[1]]
Example for the author of the post
# your data
load("BVIS0157_Forward.rda")
load("BVIS0157_NS.spread.rda")
spread.NS.JPM <- spread.NS.JPM / 100
# pre-work [out of function!!!]
set_up_ts <- function(m){
  start <- min(row.names(m))
  end <- max(row.names(m))
  # daily sequence
  inds <- seq(as.Date(start), as.Date(end), by = "day")
  ts(m, start = c(year(start), as.numeric(format(inds[1], "%j"))), frequency = 365.25)
}
mts_spread.NS.JPM <- set_up_ts(spread.NS.JPM)
mts_Forward.rate.JPM <- set_up_ts(Forward.rate.JPM)
# model
col <- 10
models <- daily_arima_forecast(y = mts_spread.NS.JPM[, col],
                               x = stats::lag(mts_Forward.rate.JPM[, col], -1),
                               training = 600,
                               horizon = 25,
                               K = 5)   # notice that K is passed through ... directly to the inner function
# plot
plots <- lapply(models, plot_daily_forecast)
plots[[5]]

Why is predict.lm returning NA as prediction in R?

I am trying to replicate a simulation from an article for a class project. I have created a simulated matrix x and a simulated vector y. I first want to reduce the model size from p = 1000 to p = n/log(n) using SIS, and then further reduce the variable space to p' using the primal-dual Dantzig selector.
There is no direct predict function in this package, so I am plugging the p' selected predictors and their coefficients into an lm object to get predictions. I am using the predict.lm function to get the predicted values. However, the returned vector is all NA.
The following code may illustrate the problem further; I am not getting any errors when running it. I have to follow this procedure of SIS followed by the Dantzig selector to match the simulation in the article.
library(SIS) #import the SIS library
library(fastclime)
errors_DS_small = c() #create empty array to store errors from each simulation
model_sizes_DS_small = c() #create empty array to store selected model size from each simulation
start_time = Sys.time()
set.seed(123*1) #re-create results at a later time but different seed for each data set
n = 200 #sample size
p = 1000 #variables
x = matrix(rnorm(n*p, mean=0, sd=1), n, p) #creating IID standard Gaussian random predictors
# gaussian response
set.seed(456*1) #re-create results at a later time but different seed for each data set
s = 8
u = rbinom(s, 1, 0.4)
z = rnorm(s, mean=0, sd=1)
a = 4*log(n)/sqrt(n)
b= ((-1)**(u))*(a + abs(z))
y=x[, 1:s]%*%b
#creating SIS-DS model and gaussian response. iter=FALSE means not doing ISIS
modelSIS_small = SIS(x, y, family='gaussian', iter = FALSE, nsis=(n/log(n)))
modelDS = dantzig(x[,modelSIS_small$sis.ix0], y)
selectDS = dantzig.selector(modelDS$lambdalist, modelDS$BETA0, lambda=min(modelDS$lambdalist))
linearMod = lm(y~x[,modelSIS_small$sis.ix0])
linearMod$coefficients = selectDS
newx = x[,modelSIS_small$sis.ix0]
predTest = predict(linearMod, data=newx) #create predictions using test data test
a = modelDS$BETA0[, 9] != 0
mse = mean((y - predTest)^2) #compare predictions to real values of test y
rmse = sqrt(mse) #Square root of MSE (Square-loss function)
errors_DS_small[1] = rmse #store the RMSE
model_sizes_DS_small[1] = table(a)["TRUE"] #Store model size including intercept
print(c(1, errors_DS_small[1], model_sizes_DS_small[1])) #print results
end_time = Sys.time()
DS_total_time_small = end_time - start_time
DS_error_small_median = median(errors_DS_small)
DS_model_Size_small_median = median(model_sizes_DS_small)
print(c("DS", DS_model_Size_small_median, DS_error_small_median, DS_total_time_small))

How to create a formulated table in R?

This is my reproducible example:
#http://gekkoquant.com/2012/05/26/neural-networks-with-r-simple-example/
library("neuralnet")
require(ggplot2)
traininginput <- as.data.frame(runif(50, min=0, max=100))
trainingoutput <- sqrt(traininginput)
trainingdata <- cbind(traininginput,trainingoutput)
colnames(trainingdata) <- c("Input","Output")
Hidden_Layer_1 <- 1 # value is randomly assigned
Hidden_Layer_2 <- 1 # value is randomly assigned
Threshold_Level <- 0.1 # value is randomly assigned
net.sqrt <- neuralnet(Output~Input,trainingdata, hidden=c(Hidden_Layer_1, Hidden_Layer_2), threshold = Threshold_Level)
#Test the neural network on some test data
testdata <- as.data.frame((1:13)^2) #Generate some squared numbers
net.results <- predict(net.sqrt, testdata) #Run them through the neural network
cleanoutput <- cbind(testdata,sqrt(testdata),
as.data.frame(net.results))
colnames(cleanoutput) <- c("Input","ExpectedOutput","NeuralNetOutput")
ggplot(data = cleanoutput, aes(x= ExpectedOutput, y= NeuralNetOutput)) + geom_point() +
geom_abline(intercept = 0, slope = 1
, color="brown", size=0.5)
rmse <- sqrt(sum((sqrt(testdata)- net.results)^2)/length(net.results))
print(rmse)
Here, when Hidden_Layer_1 is 1, Hidden_Layer_2 is 2, and the Threshold_Level is 0.1, the rmse generated is 0.6717354.
Let's say we try another example:
when Hidden_Layer_1 is 2, Hidden_Layer_2 is 3, and the Threshold_Level is 0.2, the rmse generated is 0.8355925.
How can I create a table that will automatically calculate the value of rmse when the user assigns values to Hidden_Layer_1, Hidden_Layer_2, and Threshold_Level? (I know how to do it in Excel but not in R, haha.)
The desired table should look like this:
I would like to have Trial(s), Hidden_Layer_1, Hidden_Layer_2, Threshold_Level, and rmse as my columns, and the number of rows should be able to grow indefinitely by pressing some actionButton (if possible), so that the user can keep trying until they get the rmse they desire.
How can I do that? Can anyone help me? I will definitely learn from this lesson as I am quite new to R.
Thank you very much to anyone willing to give me a helping hand.
Here is a way to create the table of values that can be displayed with the data frame viewer.
# initialize an object where we can store the parameters as a data frame
data <- NULL
# function to receive a row of parameters and add them to the
# df argument
addModelElements <- function(df, trial, layer1, layer2, threshold, rmse){
  newRow <- data.frame(trial = trial,
                       Hidden_Layer_1 = layer1,
                       Hidden_Layer_2 = layer2,
                       Threshold = threshold,
                       RMSE = rmse)
  rbind(df, newRow)
}
# once a model has been run, call addModelElements() with the
# model parameters
data <- addModelElements(data,1,1,2,0.1,0.671735)
data <- addModelElements(data,2,2,3,0.2,0.835593)
...and the output:
View(data)
Note that if you're going to create scores or hundreds of rows of parameters & RMSE results before displaying any of them to the end user, the code should be altered to improve the efficiency of rbind(). In this scenario, we build a list of sets of parameters, convert them into data frames, and use do.call() to execute rbind() only once.
# version that improves the efficiency of rbind()
addModelElements <- function(trial, layer1, layer2, threshold, rmse){
  # return row as data frame
  data.frame(trial = trial,
             Hidden_Layer_1 = layer1,
             Hidden_Layer_2 = layer2,
             Threshold = threshold,
             RMSE = rmse)
}
# generate list of data frames and rbind() once
inputParms <- list(c(1,1,2,0.1,0.671735),
                   c(1,1,2,0.3,0.681935),
                   c(2,2,3,0.2,0.835593))
parmList <- lapply(inputParms, function(x){
  addModelElements(x[1], x[2], x[3], x[4], x[5])
})
# bind to single data frame
data <- do.call(rbind, parmList)
View(data)
...and the output:
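Separately, if you'd rather not type the RMSE values in by hand, the model fit itself can be wrapped in a small helper that returns one row per trial. This is a sketch reusing trainingdata and testdata from the question; run_trial is a hypothetical name, not part of the answer above:

# fit the network for one set of parameters and return a one-row data frame
run_trial <- function(trial, layer1, layer2, threshold){
  net <- neuralnet(Output ~ Input, trainingdata,
                   hidden = c(layer1, layer2), threshold = threshold)
  net.results <- predict(net, testdata)
  rmse <- sqrt(sum((sqrt(testdata) - net.results)^2) / length(net.results))
  data.frame(Trial = trial, Hidden_Layer_1 = layer1, Hidden_Layer_2 = layer2,
             Threshold_Level = threshold, RMSE = rmse)
}

# bind a few trials into the same kind of table as above
data <- do.call(rbind, list(run_trial(1, 1, 2, 0.1),
                            run_trial(2, 2, 3, 0.2)))
View(data)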
