For loops regression in R

I'm fitting a GARCH model to the residuals of an ARIMA, and trying to apply ARCH(p) for p from 1 to 10 to compare the fits. Here is my code. Errors are returned in the for loop part but I cannot figure out why. Could anyone give some tips?
For the single value p=1 the code is as below and it works without a problem.
fitone<- garchFit(~garch(1,0),data=logprice)
coef(fitone)
summary(fitone)
And for the for loop my code looks like this:
for (n in 1:10) {
fit [[n]]<- garchFit(~garch(n,0),data=logprice)
coef(fit[[n]])
summary(fit[[n]])
}
Error in .garchArgsParser(formula = formula, data = data, trace = FALSE) :
Formula and data units do not match.
I have never written a loop before. Can someone help me with the code?

The problem is that the variables in a formula are generally evaluated in the context of the data= parameter, but your n variable isn't coming from logprice, it's coming from the global environment. You will need to create the formula dynamically. Here's one way to run all the models with lapply rather than a for loop:
library(fGarch)
#sample data
x.vec = as.vector(garchSim(garchSpec(rseed = 1985), n = 200)[,1])
fits <- lapply(1:10, function(n) {
garchFit(bquote(~garch(.(n),0)), data = x.vec, trace = FALSE)
})
and then we can get the coefs with
lapply(fits, coef)
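To see what bquote() changes, here is a minimal base-R sketch. It only builds the formulas and never calls garch(), so fGarch is not needed; the names f_plain and f_built are made up for illustration.

```r
n <- 3

# Written literally, the formula keeps the symbol `n` inside it.
f_plain <- ~ garch(n, 0)

# bquote() substitutes the current value of n for .(n) before the formula is built.
f_built <- eval(bquote(~ garch(.(n), 0)))

all.vars(f_plain)   # "n": the formula still refers to a variable called n
all.vars(f_built)   # character(0): the order is now a plain number
deparse(f_built)
```

In the first formula the model order is still a name that has to be looked up somewhere; in the second it is already a constant, which is presumably what garchFit() needs to see.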

Related

for loop in train(caret) to select different predictors in a lm model

I'm just a beginner, so I hope you can help with a problem with the KNN model (via train in caret) in R.
I tried this:
models.list = as.list(vector(length = ncol(FIFA21_db)))
for(i in 1:ncol(mtcars)) {
models.list[[i]] <- train(x = mtcars[,i], y = mtcars[,1], method = "lm")
}
This causes the error "Please use column names for x". Do you know how I can use the column names instead of observations in a for loop? My goal is to use different variables for an lm regression.
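A possible reason for the message is that mtcars[, i] drops to a bare vector and loses its column name. The sketch below uses base R only (no caret), so it is an analogue rather than a tested fix for train(); the assumption is that passing mtcars[, i, drop = FALSE] as x would keep the name train() asks for. The names predictors and models are made up for illustration.

```r
x_vec <- mtcars[, 2]                # bare numeric vector: the name "cyl" is lost
x_df  <- mtcars[, 2, drop = FALSE]  # still a data frame with one named column

is.data.frame(x_vec)   # FALSE
colnames(x_df)         # "cyl"

# Base-R analogue of the loop: one lm() per predictor, built from column names.
predictors <- setdiff(names(mtcars), "mpg")
models <- lapply(predictors, function(p) {
  lm(reformulate(p, response = "mpg"), data = mtcars)
})
names(models) <- predictors
```

Looping over names rather than column positions also avoids fitting the response against itself, which the original loop does when i is 1.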

Predicting data from gamlss model in handler function using tryCatch in R

I am having a problem using the tryCatch() function in R in a function I created.
What I want to do is this:
simulate data based on model results
analyze simulated data using my gamlss model
use the predict function to extract model predictions over a new range of values
store these predictions in a data frame
do this many times
My main problem is that my model is somewhat unstable and once in a while predictions are kind of wild, which in turn generates an error when I try to analyze it with gamlss. My objective is to write a tryCatch statement within my simulation function and simply run the simulation/prediction code a second time in the event that an error occurs. (I know this is not optimal. I could also write it as a loop using repeat, for example, and run it until I don't get an error, but I get few enough errors that the probability of getting two in a row is quite low, and I'm having enough trouble with this task as it is.)
So I simplified my code as much as I could and created a dummy dataframe for which the modelling still works.
I wrote in the code where I believe the error is (with the predict function which does not find the mod_sim object). It is likely there since the cat just above this line prints while the one just below doesn't print.
I think there are some things about how tryCatch works that I don't understand well enough, and I'm having a hard time understanding which objects are kept in which parts of functions and when they can be called or not...
Here is the code I have so far. The error occurs at l.84 (identified in the script). The data and code can be found here.
library(tidyverse)
library(gamlss)
library(gamlss.dist)
#Load data
load('DHT.RData')
#Run original model
mod_pred<-gamlss(harvest_total ~ ct,
data = DHT,
family = DPO)
#Function to compute predictions based on model
compute_CI_trad_gamlss<-function(n.sims=200, mod){#,
#DF for simulations
df_sims<-as.data.frame(DHT)
#Dataframe with new data to predict over
new.data.ct<<-expand.grid(ct=seq(from=5, to=32, length.out=50))
#matrix to store predictions
preds.sim.trad.ct <<- matrix(NA, nrow=nrow(new.data.ct), ncol=n.sims)
#Number of obs to simulate
n<-nrow(df_sims)
#Simulation loop (simulate, analyze, predict, write result)
for(i in 1:n.sims){
#Put in tryCatch to deal with potential error on first run
tryCatch({
#Create matrix to store results of simulation
y<-matrix(NA,n,1)
#in DF for simulations, create empty row to be filled by simulated data
df_sims$sim_harvest<-NA
#Loop to simulate observations
for(t in 1:n){
#Simulate data based on model parameters
y[t]<-rDPO(n=1, mu=mod$mu.fv[t], sigma = mod$sigma.fv[t])
}#end of simulation loop
#Here I want the result of the simulation loop to be pasted in the df_sims dataset
df_sims$sim_harvest<-y
#Analysis of simulated data
mod_sim<-gamlss(sim_harvest ~ ct,
data = df_sims,
family = DPO)
#Refit the model if convergence not attained
if(mod_sim$converged==T){
#If converged do nothing
} else {
#If not converged refit model
mod_sim<-refit(mod_sim)
}
cat('we make it to here!\n')
#Store results in object
ct <<-as.vector(predict(mod_sim, newdata = new.data.ct, type='response'))
cat('but not to here :( \n')
#If we made it down here, register err as '0' to be used in the if statement in the 'finally' code
err<<-0
},
#If error register the error and write it!
error = function(e) {
#If error occurred, show it
cat('error at',i,'\n')
#Register err as 1 to be used in the if statement in the finally code below
err<<-1
},
finally = {
if(err==0){
#if no error, do nothing and keep going outside of tryCatch
}#End if err==0
else if (err==1){
#If error, re-simulate data and do the analysis again
y<-matrix(NA,n,1)
df_sims$sim_harvest<-NA
#Loop to simulate observations
for(t in 1:n){
#Simulate data based on model results
y[t]<-rDPO(n=1, mu=mod$mu.fv[t], sigma = mod$sigma.fv[t])
}#end of simulation loop
#Here I want the result of the simulation loop to be pasted in the df_sims dataset
df_sims$sim_harvest<-y
#Analysis of simulated data
mod_sim<-gamlss(sim_harvest ~ ct,
data = df_sims,
family = DPO)
cat('we also make it here \n')
#Store results in object
ct <<-as.vector(predict(mod_sim, newdata = new.data.ct, type='response'))
cat('but not here... \n')
}#End if err==1,
}#End finally
)#End tryCatch
#Write predictions for this iteration to the DF and start over
preds.sim.trad.ct[,i] <<-ct
#Show iteration number
cat(i,'\n')
}
#Do some more stuff here
#Return results
return(preds = list(ct= list(predictions=preds.sim.trad.ct)))
}
#Run simulation and store object
result<-compute_CI_trad_gamlss(n.sims=20, mod=mod_pred)
Anyway I hope someone can help!
Thanks a lot!
So after a bit of trial and error I managed to make it work. I believe the problem lies in the mod_sim object that is not saved to the global environment. predict (or predict.gamlss here) is probably not looking in the function environment for the mod_sim object although I don't understand why it wouldn't. Anyway using <<- (i.e. assigning the object in the global environment from the function) for every object created in the function seemed to do the trick. If anyone has an explanation on why this happens though I'd be glad to understand what I'm doing wrong!
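For the "run it again if it fails" part, here is a minimal base-R sketch of a retry that needs neither an err flag nor <<-, because tryCatch() returns a value. It is not the gamlss code: simulate_and_predict() and with_retry() are hypothetical stand-ins, and the stand-in is made to fail on its first attempt only so the behaviour is reproducible.

```r
# Hypothetical stand-in for "simulate, refit, predict": fails on the first attempt only.
simulate_and_predict <- function(attempt) {
  if (attempt == 1) stop("simulated fitting failure")
  rep(attempt, 3)
}

# Call fun(attempt) until it succeeds or max_tries is reached.
with_retry <- function(fun, max_tries = 2) {
  for (attempt in seq_len(max_tries)) {
    # tryCatch() returns the value of fun(), or the handler's value on error.
    result <- tryCatch(fun(attempt), error = function(e) {
      message("attempt ", attempt, " failed: ", conditionMessage(e))
      NULL
    })
    if (!is.null(result)) return(result)
  }
  stop("all ", max_tries, " attempts failed")
}

preds <- matrix(NA_real_, nrow = 3, ncol = 4)
for (i in seq_len(ncol(preds))) {
  preds[, i] <- with_retry(simulate_and_predict)
}
```

Because the result comes back as the return value, the matrix can be filled with an ordinary assignment inside the function. This does not explain why predict() could not find mod_sim; it only shows the retry logic in isolation.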

Scoping with formulae in coxph objects

I'm trying to write a set of functions where the first function fits a cox model (via coxph in the survival package in R), and the second function gets estimated survival for a new dataset, given the fitted model object from the first function. I'm running into some sort of scoping issue that I don't quite know how to solve without substantially re-factoring my code (the only way I could think to do it would be much less general and much harder to read).
I have a very similar set of functions that are based on the glm function that do not run into the same issue and give me the answers I would expect. I've included a short worked example below that demonstrates the issue. The glue.cox and glue.glm are functions that have the basic functionality I am trying to get. glue.glm works as expected (yielding the same values from a calculation in the global environment), but the glue.cox complains that it can't find the data that was used to fit the cox model and ends with an error. I don't understand how to do this with substitute but I suspect that is the way forward. I've hit a wall with experimenting.
library(survival)
data.global = data.frame(time=runif(20), x=runif(20))
newdata.global = data.frame(x=c(0,1))
f1 = Surv(time) ~ x # this is the part that messes it up!!!!! Surv gets eval
f2 = time ~ x # this is the part that messes it up!!!!! Surv gets eval
myfit.cox.global = coxph(f1, data=data.global)
myfit.glm.global = glm(f2, data=data.global)
myfit.glm.global2 = glm(time ~ x, data=data.global)
myfit.cox <- function(f, dat.local){
coxph(f, data=dat.local)
}
myfit.glm <- function(f, dat.local){
glm(f, data=dat.local)
}
mypredict.cox <- function(ft, dat.local){
newdata = data.frame(x=c(0,1))
tail(survfit(ft, newdata)$surv, 1)
}
mypredict.glm <- function(ft, dat.local){
newdata = data.frame(x=c(0,1))
predict(ft, newdata)
}
glue.cox <- function(f, dat.local){
fit = myfit.cox(f, dat.local)
mypredict.cox(fit, dat.local)
}
glue.glm <- function(f, dat.local){
fit = myfit.glm(f, dat.local)
mypredict.glm(fit, dat.local)
}
# these numbers are the goal for non-survival data
predict(myfit.glm.global, newdata = newdata.global)
0.5950440 0.4542248
glue.glm(f2, data.global)
0.5950440 0.4542248 # this works
# these numbers are the goal for survival data
tail(survfit(myfit.cox.global, newdata = newdata.global)$surv, 1)
[20,] 0.02300798 0.03106081
glue.cox(f1, data.global)
Error in eval(predvars, data, env) : object 'dat.local' not found
This appears to work, at least in the narrow sense of making glue.cox() work as desired:
myfit.cox <- function(f, dat.local){
environment(f) <- list2env(list(dat.local=dat.local))
coxph(f, data=dat.local)
}
The trick here is that most R modeling/model-processing functions look for data in the environment associated with the formula.
I don't know why glue.glm works without doing more digging, except for the general statement that [g]lm objects store more of the information needed for downstream processing internally (e.g. in the $qr element) than other model types.
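The effect of the environment(f) <- list2env(...) line can be checked without survival. This is a minimal base-R sketch; attach_data() is a made-up helper that does only the environment step from the fix above.

```r
# Hypothetical helper: attach an environment holding the data to the formula,
# under the same name (dat.local) that appears in the model call.
attach_data <- function(f, dat.local) {
  environment(f) <- list2env(list(dat.local = dat.local))
  f
}

f1 <- y ~ x
f2 <- attach_data(f1, data.frame(x = 1:3, y = c(2, 4, 6)))

# Is dat.local visible directly in each formula's environment?
found_before <- exists("dat.local", envir = environment(f1), inherits = FALSE)
found_after  <- exists("dat.local", envir = environment(f2), inherits = FALSE)

# Anything evaluated in the formula's environment can now resolve the name.
n_rows <- eval(quote(nrow(dat.local)), environment(f2))
```

Here found_before is FALSE, found_after is TRUE and n_rows is 3, so code that later evaluates dat.local in the formula's environment has something to find.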

Fitting Step functions

AIM: The aim here was to find a suitable fit, using step functions, which uses age to describe wage, in the Wage dataset in the library ISLR.
PLAN:
To find a suitable fit, I'll try multiple fits with different cut points. I'll use the glm() function to fit each model. To check which fit is best, I'll use the cv.glm() function (from the boot library) to perform cross-validation on the fitted model.
PROBLEM:
In order to do so, I did the following:
all.cvs = rep(NA, 10)
for (i in 2:10) {
lm.fit = glm(wage~cut(Wage$age,i), data=Wage)
all.cvs[i] = cv.glm(Wage, lm.fit, K=10)$delta[2]
}
But this gives an error:
Error in model.frame.default(formula = wage ~ cut(Wage$age, i), data =
list( : variable lengths differ (found for 'cut(Wage$age, i)')
Whereas, when I run the code given below, it runs. (It can be found here.)
all.cvs = rep(NA, 10)
for (i in 2:10) {
Wage$age.cut = cut(Wage$age, i)
lm.fit = glm(wage~age.cut, data=Wage)
all.cvs[i] = cv.glm(Wage, lm.fit, K=10)$delta[2]
}
Hypotheses and Results:
Well, it might be possible that cut() and glm() might not work together. But this works:
glm(wage~cut(age,4),data=Wage)
Question:
So, basically we're using the cut() function, saving its results in a variable, then using that variable in the glm() function. But we can't put the cut function inside the glm() function. And that too, only if the code is in a loop.
So, why is the first version of the code not working?
This is confusing. Any help appreciated.
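One likely explanation is that cv.glm() refits the model on subsets of the data, while Wage$age inside the formula always refers to the full column, so the lengths no longer match. The sketch below reproduces that mismatch with base R only, on a small made-up data frame d rather than the Wage data.

```r
d <- data.frame(y = c(1, 2, 3, 4, 5, 6), x = c(1, 2, 3, 4, 5, 6))
sub <- d[1:4, ]   # stands in for one cross-validation training fold

# Referring to the column by name follows whatever data= is supplied.
mf_ok <- model.frame(y ~ cut(x, 2), data = sub)
nrow(mf_ok)   # 4

# d$x bypasses data= and always has 6 values, so it clashes with the 4-row subset.
res <- tryCatch(
  model.frame(y ~ cut(d$x, 2), data = sub),
  error = function(e) conditionMessage(e)
)
res   # the error message, as text
```

Outside cross-validation both forms use the full data, which would explain why the fit itself runs and only the cv.glm() call fails.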

R - Saving Variable to dataframe from own function

again I'm stuck...
I want to write a function to get several statistics for checking the assumptions for a linear regression. The function I'm quoting is not yet done, but I think you'll get the point:
check.regression <- function(regmodel, dataframe, resplots = TRUE,
durbin = TRUE, savecheck = TRUE) {
print(dwt(regmodel)) # Durbin-Watson-Test
dataframe$stand.res <- rstandard(regmodel) # Saving Standardized Residuals
}
As you see, I want to save the standardized residuals of the model into the given dataframe.
regmodel refers to the model computed by the linear regression lm(y ~ x) and dataframe is the name of the dataframe from which the regression model is computed.
The problem is: nothing is saved within my function. If I do the command without the function, the residuals are properly saved into my dataframe.
I guess, there has to be something like
save(dataframe$stand.res <- rstandard(regmodel))
as I also have to specify plotting or writing things to the console within a function, but I don't know how that command might be.
Any ideas?
R uses pass-by-value, so what is sent to the function is a copy of your data.frame. (More or less; this glosses over some details.)
So when you call the function, you need to 1) return the modified data.frame and 2) assign it or you will lose the results.
check.regression <- function(regmodel, dataframe, resplots = TRUE,
durbin = TRUE, savecheck = TRUE) {
print(dwt(regmodel)) # Durbin-Watson-Test
dataframe$stand.res <- rstandard(regmodel) # Saving Standardized Residuals
return(dataframe)
}
dataframe <- check.regression(regmodel, dataframe)
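The same behaviour can be checked with base R alone. This is a minimal sketch using the built-in cars data; add_residuals() is a made-up, stripped-down version of the function above without the dwt() call.

```r
dat <- cars
fit <- lm(dist ~ speed, data = dat)

add_residuals <- function(model, df) {
  df$stand.res <- rstandard(model)  # changes the function's local copy only
  df                                # so the modified copy must be returned
}

# Calling without assigning: the caller's data frame is untouched.
invisible(add_residuals(fit, dat))
unchanged <- !("stand.res" %in% names(dat))   # TRUE

# Assigning the return value keeps the new column.
dat <- add_residuals(fit, dat)
"stand.res" %in% names(dat)                   # TRUE
```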
