Linear regression with multiple lag independent variables - r

I am trying to run a linear regression on multiple lagged independent variables. I would like to automate the lag specification, so that choosing a number of lags (e.g. 1, 3, 5) automatically updates the code below and returns results for the lags defined in a previous step.
My code without any automated lag handling is as follows. In this instance, I have specified 2 lags:
base::summary(stats::lm(ABX_2000$Returns ~ stats::lag(as.ts(ABX_2000$Returns),1) +
stats::lag(as.ts(ABX_2000$Returns),2)))
This code works!
I defined a function as follows:
# function to accept multiple lags
lm_lags_multiple <- function(ds,lags=2){
base::summary(stats::lm(ds ~ paste0("stats::lag(as.ts(ds,k=(", 1:lags, ")))", collapse = " + ")))
}
# run function
lm_lags_multiple(ds=ABX_2000$Returns,lags=2)
On running the above function, I receive an error message:
Variable lengths are different.
I don't know how to solve this error. Is there an equivalent in R of Python's lambda functions?

Let's try this code:
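lm_lags_multiple <- function(ds, lags = 2) {
  lst <- list()
  for (i in 1:lags) {
    # build each term from the ds argument rather than a hard-coded data set
    lst[i] <- paste0("stats::lag(as.ts(ds),", i, ")")
  }
  # join the terms with "+" and convert the resulting string to a formula
  base::summary(stats::lm(as.formula(paste0("ds ~", paste(Reduce(c, lst), collapse = "+")))))
}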
Pls don't forget to let us know if it worked :)
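One caveat worth checking before relying on the results: stats::lag() shifts only the time index of a ts object, not its values, and lm() ignores that index, so the lagged terms may enter the regression as unshifted copies of the response. A minimal sketch of an alternative, assuming a plain numeric series, builds the lagged columns explicitly with embed(). The function name lm_lags_embed and the simulated series are hypothetical, used only for illustration.

```r
# Hypothetical helper: regress a series on its own first `lags` lags.
lm_lags_embed <- function(ds, lags = 2) {
  x <- as.numeric(ds)
  # embed() returns a matrix whose column 1 is x[t] and column k + 1 is x[t - k]
  d <- as.data.frame(stats::embed(x, lags + 1))
  names(d) <- c("y", paste0("lag", seq_len(lags)))
  stats::lm(y ~ ., data = d)
}

# Simulated series standing in for ABX_2000$Returns (an assumption)
set.seed(1)
fit <- lm_lags_embed(cumsum(rnorm(100)), lags = 3)
print(names(coef(fit)))
```

The first `lags` observations are dropped because they have no complete set of lagged values.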

Related

R: How to fit gamlss in a for loop with a variable (character)

I have a tricky problem. I have a dataframe with more than 1000 variables and want to fit each variable to age using the fp smoothing function. I know how to use gamlss() for a specific variable (vari), but it's not practical to repeat this explicitly more than 1000 times. Moreover, I want to plot the fits for all 1000 variables in a single figure. What I did is:
variables <- colnames(data)[7:dim(data)[2]]
for(vari in variables) {
print("ROI is:")
print(vari)
model_fem <- gamlss(vari ~ fp(age), family=GG, data=females)
But I got errors:
Error in model.frame.default(formula = vari ~ fp(age), data = females) :
variable lengths differ (found for 'fp(age)')
I think the tricky part comes from fp(). I tried using as.formula, but it didn't work. Also, females$vari returns NULL, which is probably why we get this error.
Do you have any solution for this?
Thank you
Character values are very different from formulas. Formulas contain symbols, and you need to rebuild them properly to make them dynamic. There are lots of different ways to do that, but here's one that uses reformulate() to turn characters into formulas and update() to modify a base formula.
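variables <- colnames(data)[7:dim(data)[2]]
form_resp <- ~ fp(age)  # base formula: right-hand side only
for(vari in variables) {
  print("ROI is:")
  # reformulate() builds `vari ~ .`; update() replaces "." with the base right-hand side
  form_model <- update(form_resp, reformulate(".", response=vari))
  print(form_model)
  model_fem <- gamlss(form_model, family=GG, data=females)
}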
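To try the reformulate()/update() pattern without gamlss installed, here is a minimal sketch that uses lm() and the built-in mtcars data instead. The column choices (mpg, hp, wt) are assumptions picked only for illustration.

```r
responses <- c("mpg", "hp")   # response columns from the built-in mtcars data
base_form <- ~ wt             # shared right-hand side

# One formula per response: reformulate() gives `mpg ~ .`, update() fills in `wt`
forms <- lapply(responses, function(v) {
  update(base_form, reformulate(".", response = v))
})

# Fit one model per formula
fits <- lapply(forms, lm, data = mtcars)
names(fits) <- responses

print(forms[[1]])
print(coef(fits[["hp"]]))
```

Keeping the fits in a named list makes it straightforward to extract coefficients or draw all the fitted curves in one figure afterwards.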

Fitting Step functions

AIM: The aim here was to find a suitable fit, using step functions, which uses age to describe wage, in the Wage dataset in the library ISLR.
PLAN:
To find a suitable fit, I'll try multiple fits with different cut points. I'll use the glm() function to fit each model and the cv.glm() function (from the boot library) to cross-validate it.
PROBLEM:
In order to do so, I did the following:
all.cvs = rep(NA, 10)
for (i in 2:10) {
lm.fit = glm(wage~cut(Wage$age,i), data=Wage)
all.cvs[i] = cv.glm(Wage, lm.fit, K=10)$delta[2]
}
But this gives an error:
Error in model.frame.default(formula = wage ~ cut(Wage$age, i), data =
list( : variable lengths differ (found for 'cut(Wage$age, i)')
Whereas, when I run the code given below, it runs.(It can be found here)
all.cvs = rep(NA, 10)
for (i in 2:10) {
Wage$age.cut = cut(Wage$age, i)
lm.fit = glm(wage~age.cut, data=Wage)
all.cvs[i] = cv.glm(Wage, lm.fit, K=10)$delta[2]
}
Hypotheses and Results:
Well, it might be possible that cut() and glm() might not work together. But this works:
glm(wage~cut(age,4),data=Wage)
Question:
So, basically, we're using the cut() function, saving its results in a variable, and then using that variable in the glm() function. But we can't put the cut() call directly inside the glm() formula, and that only fails when the code is in a loop.
So, why is the first version of the code not working?
This is confusing. Any help appreciated.
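A likely explanation, offered as a hedged sketch rather than a confirmed diagnosis: cv.glm() refits the model on subsets of the rows, while Wage$age inside the formula always refers to the full column, so its length no longer matches the subset of wage. Writing cut(age, i) lets the column be looked up in whatever data is passed in. The toy data frame d below is an assumption used only to reproduce the same message with lm().

```r
set.seed(1)
d <- data.frame(y = rnorm(20), x = runif(20))
train <- d[1:15, ]   # stands in for one cross-validation training fold

# d$x bypasses `data` and has 20 values, while y comes from train and has 15
bad <- try(lm(y ~ cut(d$x, 3), data = train), silent = TRUE)
print(inherits(bad, "try-error"))

# A bare column name is resolved inside `data`, so the lengths agree
good <- lm(y ~ cut(x, 3), data = train)
print(nobs(good))
```

Precomputing the cut variable as a column, as in the working version, also keeps the intervals identical across folds.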

R Passing linear model to another function inside a function

I am trying to find the optimal "lambda" parameter for the Box-Cox transformation.
I am using the implementation from the MASS package, so I only need to create the model and extract the lambda.
Here is the code for the function:
library(MASS)
find_lambda <- function(x) {
# Function to find the best lambda for the Box-Cox transform
my_tmp <- data.frame(x = x) # Create a temporary data frame, to use it with the lm
str(my_tmp) # Gives the expected output
the_lm <- lm(x ~ 1, data = my_tmp) # Creates the linear model, no error here
print(summary(the_lm)) # Prints the summary, as expected
out <- boxcox(the_lm, plotit=FALSE) # Gives the error
best_lambda <- out$x[which.max(out$y)] # Extracting the best fitting lambda
return(best_lambda)
}
find_lambda(runif(100))
It gives the following error:
Error in is.data.frame(data) : object 'my_tmp' not found
The interesting thing is that the very same code is working outside the function. In other words, for some reason, the boxcox function from the MASS package is looking for the variable in the global environment.
I don't really understand, what exactly is going on... Do you have any ideas?
P.S. I do not provide a software/hardware specification, since this error was successfully replicated on a number of my friends' laptops.
P.P.S. I have found the way to solve the initial problem in the forecast package, but I still would like to know, why this code is not working.
User-contributed packages don't always do a great job of tracking the environment in which a call was made when they manipulate function calls. The quickest fix for you would be to change the line from
the_lm <- lm(x ~ 1, data = my_tmp)
to
the_lm <- lm(x ~ 1, data = my_tmp, y=TRUE, qr=TRUE)
If y and qr are not requested in the lm call, the boxcox function tries to re-run lm with those parameters via an update call, and things get mucked up inside a function scope.
Why not let boxcox do the fitting?
find_lambda <- function(x) {
# Function to find the best lambda for the Box-Cox transform
my_tmp <- data.frame(x = x) # Create a temporary data frame, to use it with the lm
out <- boxcox(x ~ 1, data = my_tmp, plotit=FALSE) # boxcox fits the model itself
best_lambda <- out$x[which.max(out$y)] # Extracting the best fitting lambda
return(best_lambda)
}
I think your scoping issue is with update.default which calls eval(call, parent.frame()) and my_tmp doesn't exist in the boxcox environment. Please correct me if I'm wrong on this.
boxcox cannot find your data. This may be because of a scoping issue.
You can feed the data into the boxcox function.
find_lambda <- function(x) {
# Function to find the best lambda for the Box-Cox transform
my_tmp <- data.frame(x = x) # Create a temporary data frame, to use it with the lm
str(my_tmp) # Gives the expected output
the_lm <- lm(x ~ 1, data = my_tmp) # Creates the linear model, no error here
print(summary(the_lm)) # Prints the summary, as expected
out <- boxcox(the_lm, plotit=FALSE, data = my_tmp) # feed data in here
best_lambda <- out$x[which.max(out$y)] # Extracting the best fitting lambda
return(best_lambda)
}
find_lambda(runif(100))

How to use a string as a formula in r

I'm trying to do an ANOVA of all of my data frame columns against time_of_day which is a factor. The rest of my columns are all doubles and of equal length.
x = 0
pdf("Time_of_Day.pdf")
for (i in names(data_in)){
if(x > 9){
test <- aov(paste(i, "~ time_of_day"), data = data_in)
}
x = x+1
}
dev.off()
Running this code gives me this error:
Error: $ operator is invalid for atomic vectors
Where is my code calling $? How can I fix this? Sorry, I'm new to R and am quite lost.
My research question is whether time of day has an effect on brain volume at different ROIs in the brain. Time of day is divided into three categories: morning, afternoon or night.
Edit: SOLVED
Treating the string as a formula allows this to run, although I have been advised not to have this many independent values, as it will inflate the statistical results of the model. I am not removing this in case someone has a similar problem with the aov() call.
x = 0
pdf("Time_of_Day.pdf")
for (i in names(data_in)){
if(x > 9){
test <- aov(as.formula(paste(i, "~ time_of_day")), data = data_in)
}
x = x+1
}
dev.off()
I guess your problem is that you don't have an ANOVA formula integrated into your aov() function. See the following working example:
data_in <- data.frame(c(1,2,3),c(4,5,6),c(7,8,9))
names(data_in) <- c("first","second","third")
for (i in seq_along(names(data_in))){
test <- aov(data_in$first ~ data_in$second, data = data_in)
print(summary(test))
}
However, it seems that you tried to calculate an ANOVA for each column, whereas you need at least two variables: a nominally scaled condition variable and an interval-scaled dependent variable (e.g. gender and weight). So I'm generally wondering whether an ANOVA is the correct method for your question. Anyway, in order to answer this question, sample data and a summary of your research question would be needed.
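For reference, here is a minimal sketch of running one ANOVA per response column against a factor, with each formula built from the column name. The data are made up: data_in, roi_a and roi_b are illustrative stand-ins, not the asker's data, and reformulate() is used as an alternative to as.formula(paste(...)).

```r
set.seed(42)
# Made-up data: one factor and two numeric response columns
data_in <- data.frame(
  time_of_day = factor(rep(c("morning", "afternoon", "night"), each = 5)),
  roi_a = rnorm(15),
  roi_b = rnorm(15)
)

# Every column except the factor is treated as a response
responses <- setdiff(names(data_in), "time_of_day")

# reformulate() builds e.g. roi_a ~ time_of_day from character strings
fits <- lapply(responses, function(v) {
  aov(reformulate("time_of_day", response = v), data = data_in)
})
names(fits) <- responses

print(summary(fits[["roi_a"]]))
```

Selecting the response columns by name avoids the counter variable x used to skip the first columns.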

For loops regression in R

I'm fitting a GARCH model to the residuals of an ARIMA model, and trying to apply ARCH(p) for p from 1 to 10 to compare the fits. Here is my code. The for loop returns errors, but I cannot figure out why. Could anyone give some tips?
For the single value p=1, the code is as below and there is no problem.
fitone<- garchFit(~garch(1,0),data=logprice)
coef(fitone)
summary(fitone)
And for the for loop, my code goes like this:
for (n in 1:10) {
fit [[n]]<- garchFit(~garch(n,0),data=logprice)
coef(fit[[n]])
summary(fit[[n]])
}
Error in .garchArgsParser(formula = formula, data = data, trace = FALSE) :
Formula and data units do not match.
I have never written a loop before. Can someone help me with the code?
The problem is that R generally tries to evaluate all the variables in a formula in the context of the data= parameter, but your n variable isn't coming from logprice; it's coming from the global environment. You will need to create the formula dynamically. Here's one way to run all the models, using lapply rather than a for loop:
library(fGarch)
#sample data
x.vec = as.vector(garchSim(garchSpec(rseed = 1985), n = 200)[,1])
fits <- lapply(1:10, function(n) {
garchFit(bquote(~garch(.(n),0)), data = x.vec, trace = FALSE)
})
and then we can get the coefs with
lapply(fits, coef)
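To see what bquote() produces without needing fGarch installed, here is a small base-R sketch. It only builds the formulas and fits nothing; garch is never called, it just appears as an unevaluated name inside each formula.

```r
# Build one-sided formulas ~garch(n, 0) for n = 1, 2, 3.
# .(n) is replaced by the current value of n; eval() turns the call into a formula,
# and the `~` operator leaves its right-hand side unevaluated.
forms <- lapply(1:3, function(n) eval(bquote(~ garch(.(n), 0))))

for (f in forms) print(f)
```

Each formula now carries a literal number in place of n, so nothing has to be looked up when the model is fitted.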
