Forecasting time series with tslm in R

I'm still new to R and am facing a problem I can't seem to resolve.
I would like to forecast my time series data.
I have this year's daily numbers, y, and last year's daily numbers, x, which I want to use as a predictor.
The numbers show weekly cycles. I tried this code (fake numbers for clarity):
x = rnorm(60,0,1)
y = rnorm(60,0 ,1) + 2*cos(2*pi*1:60/7) + 10*x
new_x = rnorm(10,0,1)
y <- ts(y,frequency = 7)
fit <- tslm(y ~ trend + season + x)
fcast = forecast.lm(fit, h = 10, newdata = new_x)
I get this error message:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
variable lengths differ (found for 'x')
In addition: Warning message:
'newdata' had 10 rows but variables found have 60 rows
Any hints on what I did wrong?

From your fit object:
Call:
lm(formula = formula, data = "y", na.action = na.exclude)
Coefficients:
(Intercept) trend season2 season3 season4 season5 season6 season7 x
1.1644029 0.0009672 -1.5575562 -3.6723105 -3.1824001 -1.5658857 0.0789683 0.3053541 9.9233635
The last coefficient is named x, and the help for forecast.lm says newdata is an optional data.frame. You need to turn new_x into a data.frame with x as the column name, so that the predictor can be matched by name.
library(forecast)
x = rnorm(60,0,1)
y = rnorm(60,0 ,1) + 2*cos(2*pi*1:60/7) + 10*x
new_x = rnorm(10,0,1)
y <- ts(y,frequency = 7)
fit <- tslm(y ~ trend + season + x)
# You can directly use `forecast`, as `fit` is an lm object
# and you don't need `h`, as you provide new data.
fcast = forecast(fit, newdata = data.frame(x=new_x))
# Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
# 9.571429 -3.1541222 -4.5886075 -1.719637 -5.37216743 -0.9360771
# 9.714286 12.5962250 11.1367496 14.055700 10.33953926 14.8529108
# 9.857143 10.5924632 9.1480030 12.036924 8.35899443 12.8259321
#10.000000 15.9419378 14.4775444 17.406331 13.67764776 18.2062278
#10.142857 -7.1887433 -8.6444741 -5.733013 -9.43963897 -4.9378477
#10.285714 -9.4133170 -10.8470152 -7.979619 -11.63014523 -7.1964887
#10.428571 2.2702132 0.8331488 3.707278 0.04818005 4.4922464
#10.571429 0.3519401 -1.1037991 1.807679 -1.89896851 2.6028487
#10.714286 -11.8348209 -13.2930857 -10.376556 -14.08963475 -9.5800070
#10.857143 1.0058209 -0.4435763 2.455218 -1.23528154 3.2469233
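The same naming rule can be seen with base R alone. The sketch below is only an illustration under my own assumptions: it uses plain lm() and predict() instead of tslm() and forecast(), drops the trend and season terms, and sets a seed that is not in the original post. The point it shows is that newdata is a data.frame whose column name matches the predictor in the formula.

```r
# Minimal base-R sketch (assumption: lm/predict stand in for tslm/forecast)
set.seed(42)                      # hypothetical seed, for repeatability only
x <- rnorm(60)
y <- 10 * x + rnorm(60)
fit <- lm(y ~ x)

new_x <- rnorm(10)
# The column must be called "x", the name used in the formula
pred <- predict(fit, newdata = data.frame(x = new_x))
length(pred)                      # one prediction per new row
```

With a matching column name, the number of predictions follows the number of rows in newdata, which is why h is not needed in the forecast() call above.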

If you had converted new_x to a data.frame, your initial code would also have worked.
new_x is a plain numeric vector, but forecast.lm needs a data.frame as its newdata input.
Regards,
Ganesh Bhat

The error seems to be obvious:
new_x has 10 random values whereas y and x have 60. Can you update new_x to have 60 random values and verify that the error does not occur?
Regards,
Ganesh

Related

scoping/non-standard evaluation issue in glm's formula in a function in R

I have a function that computes a table and a model (and more...):
fun <- function(x, y, formula = y ~ x, data = NULL) {
out <- list()
out$tab <- table(x, y)
out$mod <- glm(formula = formula,
family = binomial,
data = data)
out
}
In the formula, I need to use x and y as provided in the function call (e.g. x = DF1$x and y = DF1$y) and variables from another data frame (e.g. a and b from DF2). It fails with my naive function:
fun(x = DF1$x,
y = DF1$y,
formula = y ~ x + a + b,
data = DF2)
# Error in eval(predvars, data, env) : object 'y' not found
How can I make glm search x and y from the function environment? I guess this issue is related to non-standard evaluation and/or scoping, but I have no idea how to fix it.
Data for the example:
smp <- function(x = c(TRUE, FALSE),
size = 1e2) {
sample(x = x,
size = size,
replace = TRUE)
}
DF1 <- data.frame(x = smp(),
y = smp())
DF2 <- data.frame(a = smp(x = LETTERS),
b = smp(x = LETTERS))
Why not just add x and y into data in the function?
fun <- function(x, y, formula = y ~ x, data = NULL) {
if(length(x) != length(y) |
length(x) != nrow(data) |
length(y) != nrow(data))stop("x, y and data need to be the same length.\n")
data$x <- x
data$y <- y
out <- list()
out$tab <- table(x, y)
out$mod <- glm(formula = formula,
family = binomial,
data = data)
out
}
fun(x = DF1$x,
y = DF1$y,
formula = y ~ x + a + b,
data = DF2)
# $tab
# y
# x FALSE TRUE
# FALSE 27 29
# TRUE 21 23
#
# $mod
# Call: glm(formula = formula, family = binomial, data = data)
#
# Coefficients:
# (Intercept) xTRUE aB aC aD aE aF aG aH aI aJ
# 3.2761 -1.8197 0.3409 -93.9103 -2.0697 20.6813 -41.5963 -1.1078 18.5921 -1.0857 -36.5442
# aK aL aM aN aO aP aQ aR aS aT aU
# -0.5730 -92.5513 -3.0672 22.8989 -53.6200 -0.9450 0.4626 -3.0672 0.3570 -22.8857 1.8867
# aV aW aX aY aZ bB bC bD bE bF bG
# 2.5307 19.5447 -90.5693 -134.0656 -2.5943 -1.2333 20.7726 110.6790 17.1022 -0.5279 -1.2537
# bH bI bJ bK bL bM bN bO bP bQ bR
# -21.7750 114.0199 20.3766 -42.5031 41.1757 -24.3553 -2.0310 -25.9223 -2.9145 51.2537 70.2707
# bS bT bU bV bW bX bY bZ
# -4.7728 -3.7300 -2.0333 -0.3906 -0.5717 -4.0728 0.8155 -4.4021
#
# Degrees of Freedom: 99 Total (i.e. Null); 48 Residual
# Null Deviance: 138.5
# Residual Deviance: 57.73 AIC: 161.7
#
# Warning message:
# glm.fit: fitted probabilities numerically 0 or 1 occurred
#
@DaveArmstrong's answer, which was already accepted, is correct. This answer explains why there was an error in the original version of the code.
@Thomas quoted the docs in a comment saying:
If not found in data, the variables are taken from environment(formula), typically the environment from which glm is called.
The word "typically" is key here. The exact rule is that the environment attached to the formula is the one where the formula expression is first evaluated, because ~ is actually a function. It attaches the evaluation environment to the formula object, and that's the one that stays with it as you pass the object around.
If you run glm(y ~ x), the formula is evaluated wherever you call that, so that's the "typical" case.
In your example, you created the formula object when you called
fun(x = DF1$x,
y = DF1$y,
formula = y ~ x + a + b,
data = DF2)
That means the global environment (where you made this call) is attached to the formula, and there's no y there, so you got the error.
If you had used the default formula = y ~ x by calling
fun(x = DF1$x,
y = DF1$y,
data = DF2)
with no formula argument, it would work, because default arguments are evaluated in the evaluation frame of the function that uses them. Since fun() has local variables x and y created by the arguments, that would be fine.
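A small sketch can make the two cases concrete. The function name where_is_formula is hypothetical and exists only for this illustration; it reports whether the environment attached to the formula is the function's own evaluation frame.

```r
# Hypothetical helper: does the formula carry this function's frame?
where_is_formula <- function(x, y, formula = y ~ x) {
  identical(environment(formula), environment())
}

# Default argument: the formula is created inside the function's frame
res_default <- where_is_formula(1, 2)

# Formula created by the caller: it carries the caller's environment
caller_formula <- y ~ x
res_passed <- where_is_formula(1, 2, formula = caller_formula)

res_default   # TRUE
res_passed    # FALSE
```

In the second call, any variable that is not found in data would be looked up starting from the caller's environment, not from the local x and y of the function.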
You also asked why data = NULL would work in @DaveArmstrong's function. He added x and y to it using
data$x <- x
data$y <- y
If you start with data = NULL, the first line changes it to a list containing x and the second line adds a y component, so you end up with a list containing x and y and that's fine for data in glm().
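This behaviour of $<- on NULL can be checked in isolation. The variable name dat and the values are made up for the illustration.

```r
# Starting from NULL, `$<-` builds up a plain list
dat <- NULL
dat$x <- c(TRUE, FALSE, TRUE)
dat$y <- c(FALSE, FALSE, TRUE)

is.list(dat)        # TRUE
is.data.frame(dat)  # FALSE: a list, not a data frame
names(dat)          # "x" "y"
```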

plotting interaction effects for LASSO models in R

I fitted a lasso logistic model with interaction terms. Then I wanted to visualize those interactions using an interaction plot.
I tried to find an R function that plots interactions for glmnet models and couldn't find any.
Is there an R package that plots interactions for LASSO?
Since I couldn't find one, I tried to do it manually by plotting the predicted values, but I am getting an error.
My code is as follows,
require(ISLR)
require(glmnet)
y <- Smarket$Direction
x <- model.matrix(Direction ~ Lag1 + Lag4* Volume, Smarket)[, -1]
lasso.mod <- cv.glmnet(x, y, alpha=1,family="binomial",nfolds = 5, type.measure="class",
lambda = seq(0.001,0.1,by = 0.001))
lasso.mod$lambda.min
pred = expand.grid(Lag1 = median(Smarket$Lag1),
Lag4 = c(-0.64,0.0385,0.596750),
Volume = seq(min(Smarket$Volume), max(Smarket$Volume), length=100))
lasso.mod1 <- glmnet(x, y, alpha=1,family="binomial",
lambda = lasso.mod$lambda.min)
pred$Direction = predict(lasso.mod1, newx=pred,
type="response", s= lasso.mod$lambda.min)
I am getting this error:
Error in cbind2(1, newx) %*% nbeta :
not-yet-implemented method for <data.frame> %*% <dgCMatrix>
Can anyone suggest a way to fix this issue?
Thank you
The documentation for predict.glmnet says newx must be a matrix, and you need to supply the interaction column yourself.
library(dplyr)
pred = expand.grid(Lag1 = median(Smarket$Lag1),
Lag4 = c(-0.64,0.0385,0.596750),
Volume = seq(min(Smarket$Volume), max(Smarket$Volume), length=100)) %>%
mutate(`Lag4:Volume` = Lag4 * Volume) # preparing interaction values
pred$Direction = predict(lasso.mod1, newx = as.matrix(pred), # convert to matrix
type = "link", s= lasso.mod$lambda.min)
[EDITED]
Oh, I overlooked a more general and better way: build newx with model.matrix, using the same right-hand side as for x.
pred = expand.grid(Lag1 = median(Smarket$Lag1),
Lag4 = c(-0.64,0.0385,0.596750),
Volume = seq(min(Smarket$Volume), max(Smarket$Volume), length=100))
pred$Direction = predict(lasso.mod1,
newx = model.matrix( ~ Lag1 + Lag4* Volume, pred)[, -1],
type="response", s= lasso.mod$lambda.min)
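To see what model.matrix produces for such a grid without fitting anything, here is a small base-R sketch. The grid values are made up for illustration and are not taken from Smarket.

```r
# Toy prediction grid (values are hypothetical)
grid <- expand.grid(Lag1 = 0,
                    Lag4 = c(-1, 1),
                    Volume = c(1, 2))

# Same right-hand side as the model; [, -1] drops the intercept column
newx <- model.matrix(~ Lag1 + Lag4 * Volume, grid)[, -1]

is.matrix(newx)   # TRUE, which is what newx needs to be
colnames(newx)    # "Lag1" "Lag4" "Volume" "Lag4:Volume"
```

The interaction column is generated for you, in the same column order as the training matrix x, so nothing has to be computed by hand.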

R nls Different Errors occur

I'm new to R programming and can't find a solution to an error that occurs when I use the nls function.
I am trying to fit data from an ecdf (the values are extracted and saved in y) to this model function with four parameters:
fitsim <- nls(y ~ exp(-(((a-Abfluss)/(c*(Abfluss-b)))^d)),
start = list( a=max(Abfluss), b=min(Abfluss),
c=3, d=1))
When I run nls, this error occurs:
Error in numericDeriv(form[[3L]], names(ind), env) :
Fehlender Wert oder etwas Unendliches durch das Modell erzeugt
which means: "Missing value or an infinity produced when evaluating the model".
My vectors Abfluss and y have the same length. The aim is to estimate the parameters.
Maybe the problem is that the model only works under these conditions:
c>0, d>0, b<=Abfluss<=a.
I already tried adding na.rm=TRUE. Then another error appears:
Error in model.frame.default(formula = ~y + Abfluss, na.rm = TRUE) :
Variablenlängen sind unterschiedlich (gefunden für '(na.rm)')
which means: "variable lengths differ (found for '(na.rm)')".
I would appreciate any kind of help and advice.
For better understanding, I attach my whole code with all the data:
time<-c(1851:2013)
Abfluss<- c(4853,4214,5803,3430,4645,4485,3100,4797,4030,3590,5396,9864,3683,4485,4064,3420,5396,
4895,3931,4238,3790,3520,4263,5474,3790,4700,5109,4525,4007,6340,4993,6903,8160,3600,3480,3540,
3540,4565,3333,7764,
4755,7940,3112,3169,4435,5365,9422,3150,10500,4512,3790,4618,6126,3769,3704,
5938,5669,4552,5458,5854,4867,6057,4783,5753,5736,4618,6091,5820,5007,7984, 4435,
4645,7465,5820,5988,6022,4300,6062,3302,4877,4586,5275,4410,3174,4966,4939,4638,
5541,5760,6495,5435,4952,4912,6092,5182,5820,5129,6436,6648,3063,5550,5160,4400,
9600,6400,6380,6300,6180,6899,4360,5550,4580,3894,5277,7520,6780,5100,5430,4550,
6620,4050,4560,5290,6610,8560,4943,6940,4744,6650,5700,7440,6200,4597,3697,7300,
4644,5456,6302,3741,5398,9500,6296,5279,5923,6412,6559,6559,5891,5737,5010,5790,
10300,4150,4870,6740,7560,8010,5120,8170,7430, 7330,5900, 11150)
#EV4-Distribution
dEV4 <- function(x, a, b, c, d) {
# use the argument x rather than the global Abfluss
m <- exp(-(((a - x) / (c * (x - b)))^d))
return(m)
}
#Simulation example
Sim<-dEV4(Abfluss,a=max(Abfluss),b=min(Abfluss), c=3, d=1)
dEV4cdf<-cbind(Abfluss,Sim)
#Empirical cdf
p = ecdf(Abfluss)
y<- p(Abfluss) #Extracting of cumulated probabilities
m<-cbind(Abfluss,y)
#plot EV4 and ecdf
plot(dEV4cdf, type="p",main="EV4")
plot(ecdf(Abfluss), add=T)
#Fitting EV4 nls
fitsim <- nls(y ~ exp(-(((a-Abfluss)/(c*(Abfluss-b)))^d)),
start = list( a=max(Abfluss), b=min(Abfluss),
c=3, d=1), na.rm=TRUE)
Do not use starting values that lie on the boundary of the feasible region, and try nlxb from the nlmrt package instead. It takes the same arguments as nls, except that data = is not optional:
library(nlmrt)
fitsim <- nlxb(y ~ exp(-(((a - Abfluss) / (c * (Abfluss - b))) ^ d)),
data = data.frame(y, Abfluss),
start = list(a = 2 * max(Abfluss), b = min(Abfluss) / 2, c = 3, d = 1))
plot(y ~ Abfluss, pch = 20)
o <- order(Abfluss)
fit <- y - fitsim$resid
lines(fit[o] ~ Abfluss[o], col = "red")
giving:
nlmrt class object: x
residual sumsquares = 0.02908 on 163 observations
after 5001 Jacobian and 6060 function evaluations
name coeff SE tstat pval gradient JSingval
a 20047.7 NA NA NA 1.119e-07 3251
b -1175384 NA NA NA 1.432e-09 0.1775
c 0.0129414 NA NA NA -0.1296 5.808e-06
d 12.146 NA NA NA -2.097e-06 6.798e-11
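A rough way to see why boundary starting values are risky is to evaluate the model at the start and then nudge b slightly, as a finite-difference derivative might. This is only a sketch: the helper name ev4, the three sample values of q and the size of the nudge are my assumptions, not something taken from nls internals.

```r
# Hypothetical helper with the same model expression as in the question
ev4 <- function(q, a, b, c, d) exp(-(((a - q) / (c * (q - b)))^d))

q <- c(3063, 5000, 11150)   # smallest, a middle and largest value of Abfluss

# b nudged just above min(q): q - b turns negative for the smallest value
nudged <- ev4(q, a = max(q), b = min(q) * (1 + 1e-7), c = 3, d = 1)

# Interior starting values, as in the nlxb call above
interior <- ev4(q, a = 2 * max(q), b = min(q) / 2, c = 3, d = 1)

is.finite(nudged)     # not all TRUE
is.finite(interior)   # all TRUE
```

Once b moves past the smallest observation, the ratio inside the exponent becomes a large negative number and the exponential overflows, which matches the kind of failure the error message describes.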

correlation in multivariate mixed model in r

I am fitting a multivariate mixed model in R using the nlme package. Suppose that x and y are response variables for longitudinal data, where the within-group errors are assumed to be correlated. The residual error matrix is presented as:
So my question is: how do I include the correlation in the lme function?
I tried corr = corComSymm(from =~ 1 | x) and corr = corAR1(from =~ 1 | x), but neither worked.
Here is an example:
# visiting time by months
time = rep(c(0,3,6,9),time = 4, 200)
# subjects
subject = rep(1:50, each = 4)
# first response variable "identity"
x = c(rep(0, 100), rep(1,100))
# second response variable "identity"
y = c(rep(1, 100), rep(0,100))
# values of both response variables (x_1, x_2)
value = c(rnorm(100,20,1),rnorm(100,48,1))
# indicator of which response variable (x_1, x_2) each value belongs to
variable = factor(c(rep(0,150),rep(1,50)), label=c("X","Y"))
df = data.frame(subject , time, x,y,value, variable)
library(nlme)
# fit the model that each response variable has intercept and slope (time) for each random and fixed effects
# as well as fixed effects slopes for sex and lesion, and each response has different variance
f= lme(value ~ -1 + x + y + x:time + y:time , random = ~ -1 + (x + y) + time:( x + y)|subject ,
weights = varIdent(form=~1| x),corr = corAR1(from = ~ 1|x), control=lmeControl(opt="optim"), data =df)
Error in corAR1(from = ~1 | x) : unused argument (from = ~1 | x)
Any suggestions?
I found the page below helpful; I am posting it here in case someone has this problem in the future.
https://rpubs.com/bbolker/3336
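Separately from the linked page, the error message itself points at the argument name: the correlation constructors in nlme take form, not from. The sketch below only shows that the constructor accepts form; grouping by subject is my assumption about the intended within-subject correlation and is not taken from the post.

```r
library(nlme)

# `form` is the argument name used by corAR1 (and corCompSymm)
ar1 <- corAR1(form = ~ 1 | subject)
class(ar1)

# The misspelled argument fails before any model is fitted
bad <- tryCatch(corAR1(from = ~ 1 | subject),
                error = function(e) conditionMessage(e))
bad
```

Whether an AR(1) structure grouped this way is appropriate for the two-response model is a separate modelling question that this sketch does not settle.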

R variable not found, but specifically defined

I have written a function to run phylogenetic generalized least squares, and everything looks like it should work fine, but for some reason, a specific variable which is defined in the script (W) keeps coming up as undefined. I have stared at this code for hours and cannot figure out where the problem is.
Any ideas?
myou <- function(alpha, datax, datay, tree){
data.frame(datax[tree$tip.label,],datay[tree$tip.label,],row.names=tree$tip.label)->dat
colnames(dat)<-c("Trait1","Trait2")
W<-diag(vcv.phylo(tree)) # Weights
fm <- gls(Trait1 ~ Trait2, data=dat, correlation = corMartins(alpha, tree, fixed = TRUE),weights = ~ W,method = "REML")
return(as.numeric(fm$logLik))
}
corMartins2<-function(datax, datay, tree){
data.frame(datax[tree$tip.label,],datay[tree$tip.label,],row.names=tree$tip.label)->dat
colnames(dat)<-c("Trait1","Trait2")
result <- optimize(f = myou, interval = c(0, 4), datax=datax,datay=datay, tree = tree, maximum = TRUE)
W<-diag(vcv.phylo(tree)) # Weights
fm <- gls(Trait1 ~ Trait2, data = dat, correlation = corMartins(result$maximum, tree, fixed =T),weights = ~ W,method = "REML")
list(fm, result$maximum)}
#test
require(nlme)
require(phytools)
simtree<-rcoal(50)
as.data.frame(fastBM(simtree))->dat1
as.data.frame(fastBM(simtree))->dat2
corMartins2(dat1,dat2,tree=simtree)
returns "Error in eval(expr, envir, enclos) : object 'W' not found"
even though W is specifically defined!
Thanks!
The error is occurring in the gls calls in myou and corMartins2: you have to pass W in as a column of dat, because gls looks for it there (when you give weights = ~ W as a formula, it looks for dat$W and can't find it).
Just change data=dat to data=cbind(dat,W=W) in both functions.
The example is not reproducible for me, as lowerB and upperB are not defined; however, perhaps the following will work for you, cbinding W to dat:
myou <- function(alpha, datax, datay, tree){
data.frame(datax[tree$tip.label,],datay[tree$tip.label,],row.names=tree$tip.label)->dat
colnames(dat)<-c("Trait1","Trait2")
W<-diag(vcv.phylo(tree)) # Weights
### cbind W to dat
dat <- cbind(dat, W = W)
fm <- gls(Trait1 ~ Trait2, data=dat, correlation = corMartins(alpha, tree, fixed = TRUE),weights = ~ W,method = "REML")
return(as.numeric(fm$logLik))
}
corMartins2<-function(datax, datay, tree){
data.frame(datax[tree$tip.label,],datay[tree$tip.label,],row.names=tree$tip.label)->dat
colnames(dat)<-c("Trait1","Trait2")
result <- optimize(f = myou, interval = c(lowerB, upperB), datax=datax,datay=datay, tree = tree, maximum = TRUE)
W<-diag(vcv.phylo(tree)) # Weights
### cbind W to dat
dat <- cbind(dat, W = W)
fm <- gls(Trait1 ~ Trait2, data = dat, correlation = corMartins(result$maximum, tree, fixed =T),weights = ~ W,method = "REML")
list(fm, result$maximum)}
#test
require(nlme)
require(phytools)
simtree<-rcoal(50)
as.data.frame(fastBM(simtree))->dat1
as.data.frame(fastBM(simtree))->dat2
corMartins2(dat1,dat2,tree=simtree)
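The cbind idea can also be tried without phytools or a tree. The following is a minimal sketch with made-up data: the function name fit_with_weights, the seed and the random weights are hypothetical, and the correlation structure is left out so that only the handling of W is shown.

```r
library(nlme)

set.seed(1)   # hypothetical seed, for repeatability only
dat <- data.frame(Trait1 = rnorm(20), Trait2 = rnorm(20))

fit_with_weights <- function(dat) {
  W <- runif(nrow(dat), 0.5, 2)   # stand-in for diag(vcv.phylo(tree))
  # W is made a column of the data, so weights = ~ W can be resolved there
  gls(Trait1 ~ Trait2, data = cbind(dat, W = W),
      weights = ~ W, method = "REML")
}

fm <- fit_with_weights(dat)
as.numeric(fm$logLik)
```

Keeping W inside the data frame means the fit does not depend on where the function happens to be called from.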
