Out-of-sample rolling window forecasts in R

I am trying to produce rolling-window forecasts using linear regression. I don't know how to store the forecast output in a variable that I could then use for plotting and so on.
predict.1 <- function(P){
  results <- rep(0, P)
  for( i in 0:2){
    y1<-window(y,start=1937+i,end=1966+i)
    x1<-window(ll,start=1937+i,end=1966+i)
    dx2<-window(ll,start=1937+i,end=1966+i)
    in.sample<-data.frame(y1,x1,x2)
    names(in.sample)<-c("Outcome","Predictor1","Predictor2")
    x1.pred<-window(x1,start=1967+i,end=1967+i)
    x2.pred<-window(x2,start=1967+i,end=1967+i)
    out.sample<-data.frame(x1.pred,x2.pred)
    new.data<-out.sample
    names(new.data)<-c("Predictor1","Predictor2")
    results[i]<-predict(lm(Outcome~Predictor1+Predictor2,data=in.sample),new.data,se.fit=TRUE)
  }
  results
}
I receive this message:
Warning messages:
1: In results[i] <- predict(lm(Outcome ~ Predictor1 + Predictor2, data = in.sample), :
number of items to replace is not a multiple of replacement length
2: In results[i] <- predict(lm(Outcome ~ Predictor1 + Predictor2, data = in.sample), :
number of items to replace is not a multiple of replacement length.
I don't know how to overcome the problem.
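A minimal sketch of one way to store the forecasts. It assumes the warning arises because predict(..., se.fit=TRUE) returns a list (fit, se.fit, df, residual.scale) that does not fit into a single element of a numeric vector, and it also avoids indexing results with i = 0, which stores nothing. The series are synthetic stand-ins for the real data, and rolling.predict, x1.all and x2.all are hypothetical names.

```r
# Synthetic annual series standing in for the question's y and predictors
set.seed(1)
n <- length(1937:1969)
x1.raw <- rnorm(n)
x2.raw <- rnorm(n)
x1.all <- ts(x1.raw, start = 1937)
x2.all <- ts(x2.raw, start = 1937)
y      <- ts(1 + 2 * x1.raw - x2.raw + rnorm(n, sd = 0.1), start = 1937)

rolling.predict <- function(n.windows) {
  fit <- numeric(n.windows)  # point forecasts
  se  <- numeric(n.windows)  # their standard errors
  for (i in seq_len(n.windows)) {
    k <- i - 1  # window offset in years, so that indexing starts at 1
    in.sample <- data.frame(
      Outcome    = as.numeric(window(y,      start = 1937 + k, end = 1966 + k)),
      Predictor1 = as.numeric(window(x1.all, start = 1937 + k, end = 1966 + k)),
      Predictor2 = as.numeric(window(x2.all, start = 1937 + k, end = 1966 + k))
    )
    # Predictors for the forecast year come from the full series,
    # not from the already-truncated in-sample window
    new.data <- data.frame(
      Predictor1 = as.numeric(window(x1.all, start = 1967 + k, end = 1967 + k)),
      Predictor2 = as.numeric(window(x2.all, start = 1967 + k, end = 1967 + k))
    )
    p <- predict(lm(Outcome ~ Predictor1 + Predictor2, data = in.sample),
                 new.data, se.fit = TRUE)
    fit[i] <- p$fit     # keep only the pieces needed, one number each
    se[i]  <- p$se.fit
  }
  data.frame(year = 1967 + seq_len(n.windows) - 1, fit = fit, se = se)
}

res <- rolling.predict(3)
```

The returned data frame can be plotted directly, for example with plot(res$year, res$fit, type = "b"). If the full predict() output is needed, a list filled with results[[i]] <- p would be an alternative.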

Related

Why is DGLM in R not Recognizing my Objects?

I am trying to run the double generalized linear model (DGLM) in R on my traits of interest. I have written a function that extracts the components of interest from dglm. Its arguments are a column (cT) of my phenotype object (Phenos), the SNP (i) from my genotype object (Geno), and principal components (covar) to control for population structure.
my.pdglm <- function(cT=NULL, i=NULL, Phenos=NULL, Geno=NULL, covar=NULL)
The full my.pdglm function is as follows:
my.pdglm <- function(cT=NULL, i=NULL, Phenos=NULL, Geno=NULL, covar=NULL) {
y <- Phenos[,cT]
model <- dglm(y ~ Geno[, i] + covar[, 2] + covar[, 3] + covar[, 4] + covar[, 5] + covar[,6] + covar[,7] + covar[, 8], ~ Geno[, i], family = gaussian(link = "identity"))
P.mean <- summary(model)$coef[2, 4] # Extract p-value for the mean part
P.disp <- pchisq(q = anova(model)$Adj.Chisq[2], df = anova(model)$DF[2], lower.tail = FALSE)
s.model <- summary(model$dispersion.fit)
beta <- s.model$coef[2, 1] # Extract coefficients
se <- s.model$coef[2, 2] # Extract standard errors
out <- data.frame(Beta = beta, SE = se, P.mean = P.mean, P.disp = P.disp,
stringsAsFactors = FALSE) # Save all the extracted
return(out)
}
When I try and run this function, I keep getting the following error using this as an example:
my.pdglm(cT=3, i=9173, Phenos=SP_Zm_NULL, Geno=t(Geno), covar=Zm_covar_FULL)
[1] "--------- Fitting DGLM model for SNP 9173 out of 41611 ----------"
Error in eval(predvars, data, env) : object 'y' not found
Called from: eval(predvars, data, env)
When I print(y) as a quality control step, it usually prints, but dglm is not recognizing it. The only way I get my function to work is if I run my function with the exact arguments named as the arguments themselves. Can anyone help me with this? This has been holding me up for a while.

Regression with linear trend goes mad

I want to define a function panel_fit that fits a panel regression of a dependent variable (y) on independent variables (x). The panel regression should include a linear trend.
Here is my work so far, using the following data:
library(plm)
library(dplyr) # provides %>%, group_by_at(), mutate() and row_number()
data("EmplUK", package="plm")
dep_var <- EmplUK['capital']
# drop the dependent variable; not meaningful in itself, only done to set up the function's inputs
df1 <- EmplUK[-6]
panel_fit <- function(y, x, inputs = list(), model_type) {
x[, length(x) + 1] <- y
x <- x %>%
group_by_at(1) %>%
mutate(Trend = row_number())
varnames <- names(x)[3:(length(x))]
varnames <- varnames[!(varnames == names(y))]
form <- paste0(varnames, collapse = "+")
model <- plm(as.formula(paste0(names(y), "~", form)), data = x, model = model_type)
summary(model)
}
The warning I get is:
panel_fit(dep_var,df1,model_type='within')
Warning messages:
1: In Ops.pseries(y, bX) :
indexes of pseries have same length but not same content: result was assigned first operand's index
Do you know why I get this warning? What should I do to solve the problem?
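A minimal base R sketch of building the per-unit trend column while keeping a plain data.frame. It rests on the assumption, not confirmed here, that the warning is related to plm receiving the grouped tibble produced by group_by_at() and mutate(). The panel data frame and its values are made up for illustration.

```r
# Made-up unbalanced panel: two firms observed over different years
panel <- data.frame(
  firm = rep(c("A", "B"), times = c(3, 4)),
  year = c(1977:1979, 1976:1979),
  emp  = c(5.0, 5.6, 5.0, 71.3, 70.6, 70.9, 72.0)
)

# Within each firm, number the rows 1, 2, 3, ... to get a linear trend.
# ave() applies FUN to each group and returns values in the original row order.
panel$Trend <- ave(seq_along(panel$firm), panel$firm, FUN = seq_along)
```

With dplyr, calling ungroup() and then as.data.frame() on the result before passing it to plm would be the equivalent step under the same assumption.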

Unable to build a neural network for regression in RStudio due to error in match

I am a student trying to build a neural network for regression for the first time to predict my variable Math_G3. However I am unable to do it due to the following error:
Error in x - y : non-conformable arrays
In addition: Warning messages:
1: In cbind(1, act.temp) :
number of rows of result is not a multiple of vector length (arg 1)
2: In cbind(1, act.temp) :
number of rows of result is not a multiple of vector length (arg 1)
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
Here is my code for it:
index_train_math<-sample(1:nrow(dat_math),0.6*nrow(dat_math))
# from the 1st row to the last, pick a random 60 % of rows as training data
train_math <- dat_math[index_train_math,]
# training data of math
maxs <-apply(dat_math,2,max)
mins <- apply(dat_math,2,min)
scaled <- as.data.frame(scale(dat_math,center = mins,scale=maxs-mins))
#returns a matrix that needs to be coerced into a data frame
train_math <- scaled[index_train_math,]
test_math <- scaled[-index_train_math,]
n_math <- names(train_math)
f_math <- as.formula(paste("Math_G3 ~ traveltime + studytime + failure + famrel + goout + Dalc + Walc + health + absences + Math_G1 + Math_G2", paste(n_math[!n_math %in% "Math_G3"], collapse = " + ")))
nn <- neuralnet(f_math,data=train_math,hidden=c(5,3),linear.output=T)
Below is a preview of my dataset:
May I know what is wrong with my code and how I can fix it? Thank you!
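For reference, a minimal base R sketch of the two preparation steps in the code above: min-max scaling with scale() and building the model formula from the column names. Note that the original paste() call joins a hand-written formula string and the collapsed column names with a space, so the predictors end up listed twice; the sketch builds the right-hand side only once with reformulate(). The toy data frame is made up, and whether this resolves the reported neuralnet error is an assumption.

```r
# Made-up numeric data standing in for dat_math
dat <- data.frame(
  Math_G1 = c(5, 10, 15, 8, 12),
  Math_G2 = c(6, 11, 14, 9, 13),
  Math_G3 = c(7, 10, 16, 8, 12)
)

# Min-max scaling: subtract the column minimum, divide by the column range.
# scale() returns a matrix, so it is converted back to a data frame.
mins <- apply(dat, 2, min)
maxs <- apply(dat, 2, max)
scaled <- as.data.frame(scale(dat, center = mins, scale = maxs - mins))

# Build "Math_G3 ~ <all other columns>" from the names, listing each predictor once
predictors <- setdiff(names(scaled), "Math_G3")
f <- reformulate(predictors, response = "Math_G3")
```

Min-max scaling in this form assumes every column is numeric and not constant; a constant column would give a zero range and NaN values.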

Fitting and Predicting Arima models in R

The strategy is carried out on a "rolling" basis:
For each day n, the previous k days of the differenced logarithmic returns of a stock market index are used as a window for fitting an optimal ARIMA model.
#Install relevant packages
install.packages("quantmod")
install.packages("forecast")
#Import the necessary libraries
library(quantmod)
library(forecast)
#Get S&P 500
getSymbols("^GSPC", from = "2000-01-01")
#Compute the log closing prices (differenced below to get daily returns)
gspcRet<-(log(Cl(GSPC)))
#Use only the last two years of returns
gspc500<-tail(gspcRet,500)
spReturns<-diff(gspc500)
spReturns[as.character(head(index(Cl(GSPC)),1))] = 0
# Create the forecasts vector to store the predictions
windowLength<- 500
foreLength<-length(spReturns) - windowLength
forecasts <- vector(mode="list", length=foreLength)
fit1 <- vector(mode="list", length=foreLength)
for (d in 0:foreLength) {
# Obtain the S&P500 rolling window for this day
spReturnsOffset<- spReturns[(1+d):(windowLength+d)]
#Searching for the best models
order.matrix<-matrix(0,nrow = 3, ncol = 6 * 2 * 6)
aic.vec<- numeric(6 * 2 * 6)
k<-1
for(p in 0:5) for(d in 0:1) for(q in 0:5){
order.matrix[,k]<-c(p,d,q)
aic.vec[k]<- AIC(Arima( spReturnsOffset, order=c(p,d,q)))
k<-k+1
}
ind<- order(aic.vec,decreasing=F)
aic.vec<- aic.vec[ind]
order.matrix<- order.matrix[,ind]
order.matrix<- t(order.matrix)
result<- cbind(order.matrix,aic.vec)
#colnames(result)<- c("p","d","q","AIC")
p1<- result[1,1]
p2<- result[2,1]
p3<- result[3,1]
p4<- result[4,1]
d1<- result[1,2]
d2<- result[2,2]
d3<- result[3,2]
d4<- result[4,2]
q1<- result[1,3]
q2<- result[2,3]
q3<- result[3,3]
q4<- result[4,3]
#I THINK CODE IS CORRECT TILL HERE PROBLEM IS WITH THE FOLLOWING CODE I GUESS
fit1[d+1]<- Arima(spReturnsOffset, order=c(p1,d1,q1))
forecasts[d+1]<- forecast(fit1,h=1)
#forecasts[d+1]<- unlist(fcast$mean[1])
}
I get the following Error:
Error in x - fits : non-numeric argument to binary operator
In addition: Warning messages:
1: In fit1[d + 1] <- Arima(spReturnsOffset, order = c(p1, d1, q1)) :
number of items to replace is not a multiple of replacement length
2: In mean.default(x, na.rm = TRUE) :
argument is not numeric or logical: returning NA
Can anyone please suggest a fix?
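A minimal sketch of the rolling fit-and-forecast loop, under several stated assumptions: it uses stats::arima() and predict() from base R instead of forecast::Arima() and forecast(), simulated data instead of the S&P 500 series, and a small (p, q) grid to keep it short. It stores each fitted model with [[ ]] (a model object is a list, so [ ] assignment triggers the "not a multiple of replacement length" warning), forecasts from the single model just fitted and not from the whole list, and gives the outer loop its own variable name, since the code above reuses d for both the day offset and the differencing order.

```r
# Simulated returns standing in for the differenced log prices
set.seed(123)
returns <- as.numeric(arima.sim(list(ar = 0.3), n = 60))

window.length <- 50
n.fore <- length(returns) - window.length

fits      <- vector("list", n.fore)  # one fitted model per day
forecasts <- numeric(n.fore)         # one-step-ahead point forecasts

for (day in seq_len(n.fore)) {
  win <- returns[day:(day + window.length - 1)]

  # Search a small grid of orders and keep the model with the lowest AIC
  best.aic <- Inf
  best.fit <- NULL
  for (p in 0:1) for (q in 0:1) {
    cand <- tryCatch(arima(win, order = c(p, 0, q)), error = function(e) NULL)
    if (!is.null(cand) && cand$aic < best.aic) {
      best.aic <- cand$aic
      best.fit <- cand
    }
  }

  fits[[day]]    <- best.fit                               # [[ ]] stores the whole object
  forecasts[day] <- predict(best.fit, n.ahead = 1)$pred[1] # a single number
}
```

In the original code, spReturns has as many observations as windowLength, so foreLength is 0; the window would need to be shorter than the series for the loop to produce any forecasts.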

Intro to JAGS analysis

I am a student studying Bayesian statistics and have just begun to use JAGS with an intro script written by my lecturer, where we (the students) only have to enter the data and the number of iterations. The following is the script with my data added to it:
setwd("C:\\Users\\JohnSmith\\Downloads")
rawdata = read.table("bwt.txt",header=TRUE)
Birthweight = rawdata$Birthweight
Age = rawdata$Age
model = "model
{
beta0 ~ dnorm(0, 1/1000^2)
beta1 ~ dnorm(0, 1/1000^2)
log_sigma ~ dunif(-10, 10)
sigma <- exp(log_sigma)
for(i in 1:N)
{
mu[i] <- beta0 + beta1 * Age[i]
Birthweight[i] ~ dnorm(mu[i], 1/sigma^2)
}
}
"
data = list(x=Birthweight, y=Age, N=24)
# Variables to monitor
variable_names = c('beta0','beta1')
# How many burn-in steps?
burn_in = 1000
# How many proper steps?
steps = 100000
# Thinning?
thin = 10
# Random number seed
seed = 2693795
# NO NEED TO EDIT PAST HERE!!!
# Just run it all and use the results list.
library('rjags')
# Write model out to file
fileConn=file("model.temp")
writeLines(model, fileConn)
close(fileConn)
if(all(is.na(data)))
{
m = jags.model(file="model.temp", inits=list(.RNG.seed=seed, .RNG.name="base::Mersenne-Twister"))
} else
{
m = jags.model(file="model.temp", data=data, inits=list(.RNG.seed=seed, .RNG.name="base::Mersenne-Twister"))
}
update(m, burn_in)
draw = jags.samples(m, steps, thin=thin, variable.names = variable_names)
# Convert to a list
make_list <- function(draw)
{
results = list()
for(name in names(draw))
{
# Extract "chain 1"
results[[name]] = as.array(draw[[name]][,,1])
# Transpose 2D arrays
if(length(dim(results[[name]])) == 2)
results[[name]] = t(results[[name]])
}
return(results)
}
results = make_list(draw)
However, when I run the code above I get the following error message:
Error in jags.model(file = "model.temp", data = data, inits = list(.RNG.seed = seed, :
RUNTIME ERROR:
Compilation error on line 11.
Unknown parameter Age
In addition: Warning messages:
1: In jags.model(file = "model.temp", data = data, inits = list(.RNG.seed = seed, :
Unused variable "x" in data
2: In jags.model(file = "model.temp", data = data, inits = list(.RNG.seed = seed, :
Unused variable "y" in data
But as far as I can see, line 11 is blank, which leaves me stumped as to where the error is coming from. If anyone can give me some tips on how to solve this, it would be greatly appreciated.
The names of the elements of your list of data (data) should match the names of the variables in your model.
You have:
data = list(x=Birthweight, y=Age, N=24)
so JAGS is looking for variables called x and y in your model. However, in your model, you have:
mu[i] <- beta0 + beta1 * Age[i]
Birthweight[i] ~ dnorm(mu[i], 1/sigma^2)
That is, your variables are called Age and Birthweight.
So, either change your list to:
data <- list(Birthweight=Birthweight, Age=Age, N=24)
or change your model to:
mu[i] <- beta0 + beta1 * y[i]
x[i] ~ dnorm(mu[i], 1/sigma^2)
Had you run readLines('model.temp') (or opened model.temp in a text editor), you would have seen that line 11 of that file is the line that contains mu[i] <- beta0 + beta1 * Age[i]. That is the first error JAGS encountered: the model refers to Age, for which neither data nor a prior was provided.
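Both checks can be sketched in base R without running JAGS. This is a rough text comparison and not what JAGS itself does; unused_data_names is a hypothetical helper, and the data values are made up.

```r
model <- "model
{
    beta0 ~ dnorm(0, 1/1000^2)
    beta1 ~ dnorm(0, 1/1000^2)
    log_sigma ~ dunif(-10, 10)
    sigma <- exp(log_sigma)
    for(i in 1:N)
    {
        mu[i] <- beta0 + beta1 * Age[i]
        Birthweight[i] ~ dnorm(mu[i], 1/sigma^2)
    }
}
"

# Write the model out and read it back with line numbers,
# so a "Compilation error on line k" message can be matched to the text
model.file <- tempfile(fileext = ".temp")
writeLines(model, model.file)
model.lines <- readLines(model.file)
numbered <- sprintf("%2d: %s", seq_along(model.lines), model.lines)

# Hypothetical helper: which names in the data list never appear
# as a whole word in the model text?
unused_data_names <- function(data, model.lines) {
  used <- vapply(names(data), function(nm) {
    any(grepl(paste0("\\b", nm, "\\b"), model.lines, perl = TRUE))
  }, logical(1))
  names(data)[!used]
}

bad  <- list(x = c(3100, 3400, 2900), y = c(25, 31, 22), N = 3)
good <- list(Birthweight = c(3100, 3400, 2900), Age = c(25, 31, 22), N = 3)

unused_data_names(bad, model.lines)   # names the model never mentions
unused_data_names(good, model.lines)  # empty when every name is used
```

Printing the numbered lines with cat(numbered, sep = "\n") shows which statement a reported line number points at.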
