R - Saving Variable to dataframe from own function - r

again I'm stuck...
I want to write a function to get several statistics for checking the assumptions for a linear regression. The function I'm quoting is not yet done, but I think you'll get the point:
check.regression <- function(regmodel, dataframe, resplots = TRUE,
durbin = TRUE, savecheck = TRUE) {
print(dwt(regmodel)) # Durbin-Watson-Test
dataframe$stand.res <- rstandard(regmodel) # Saving Standardized Residuals
}
As you see, I want to save the standardized residuals of the model into the given dataframe.
regmodel refers to the model computed by the linear regression lm( y~x) and dataframe is the name of the dataframe from which the regression model is computed.
The problem is: nothing is saved within my function. If I do the command without the function, the residuals are properly saved into my dataframe.
I guess, there has to be something like
save(dataframe$stand.res <- rstandard(regmodel))
as I also have to specify plotting or writing things to the console within a function, but I don't know how that command might be.
Any ideas?

R passes arguments by value (strictly, copy-on-modify), so the function works on its own copy of your data.frame and the original is never touched.
So when you call the function, you need to 1) return the modified data.frame and 2) assign the result, or you will lose it.
check.regression <- function(regmodel, dataframe, resplots = TRUE,
durbin = TRUE, savecheck = TRUE) {
print(dwt(regmodel)) # Durbin-Watson-Test
dataframe$stand.res <- rstandard(regmodel) # Saving Standardized Residuals
return(dataframe)
}
dataframe <- check.regression(regmodel, dataframe)
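To see the copy behaviour in isolation, here is a minimal sketch. It uses the built-in cars data set as a stand-in (the original post does not say which data were used) and leaves out dwt() so that it runs without extra packages. The helper name add_std_resid is hypothetical.

```r
# Fit a small model on the built-in 'cars' data (illustrative stand-in).
fit <- lm(dist ~ speed, data = cars)

# Hypothetical helper: adds standardized residuals and returns the copy.
add_std_resid <- function(regmodel, dataframe) {
  dataframe$stand.res <- rstandard(regmodel)  # changes the local copy only
  dataframe                                   # hand the modified copy back
}

# Result discarded: the caller's data frame is unchanged.
invisible(add_std_resid(fit, cars))
"stand.res" %in% names(cars)          # FALSE

# Result assigned: the new column is kept.
cars_checked <- add_std_resid(fit, cars)
"stand.res" %in% names(cars_checked)  # TRUE
```

The same pattern applies to check.regression: whatever the function changes only survives if it is returned and assigned by the caller.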

Related

Save customized function inside function in MLFlow log_model

I would like to do something with MLFlow but cannot find any solution on the Internet. I am working with MLFlow and R, and I want to save a regression model. The catch is that before predicting on the test data, I want to apply some transformations to that data. So I have:
data <- #some data with numeric regressors and dependent variable called 'y'
# Divide into train and test
ind <- sample(nrow(data), 0.8*nrow(data), replace = FALSE)
dataTrain <- data[ind,]
dataTest <- data[-ind,]
# Run model in the mlflow framework
with(mlflow_start_run(), {
model <- lm(y ~ ., data = dataTrain)
predict_fun <- function(model, data_to_predict){
data_to_predict[,3] <- data_to_predict[,3]/2
data_to_predict[,4] <- data_to_predict[,4] + 1
return(predict(model, data_to_predict))
}
predictor <- crate(~predict_fun(model,dataTest),model)
### Some code to use the predictor to get the predictions and measure the accuracy as a log_metric
##################
##################
##################
mlflow_log_model(predictor,'model')
})
As you can see, my prediction function does not only predict on the new data; it also transforms the third and fourth columns first. All the examples I have seen on the web pass R's default predict function to crate.
Once I save this model and load it in another notebook with some test data, I get the error: "predict_fun" doesn't exist. That is because the saved model does not include this function. Do you know how I can save a specific prediction function that I have written, instead of R's default functions?
This is not the real example I am working with, but it is an approximation of it. The fact is that I want to save extra functions apart from the model itself.
Thank you very much!

Bayesian Modelling in R

I am trying to implement a Bayesian model in R using the BAS package, with these settings for my model:
databas <- bas.lm(at_areabuilding ~ ., data = dataCOMMA, method = "MCMC", prior = "ZS-null", modelprior = uniform())
I am trying to predict the area for a given state from the areas recorded for that state under different zip codes. My model basically finds the zip codes present in the data for a given state (using a state index for this) and then gives the output.
Whenever I try to predict the area of a state, I give this input:
> UT <- data.frame(zip = 84321, loc_st_prov_cd = "UT" ,state_idx = 7)
> predict_1 <- predict(databas,UT, estimator="BMA", interval = "predict", se.fit=TRUE)
> data.frame('state' = 'UT','estimated area' = predict_1$Ybma)
Now I get the output for this state.
Suppose I have a list of states with given zip codes and want to run my model (databas) on that list to get the predictions. I cannot do it with the approach above, as it would take too long. Is there another way to do this?
With the help of one gentleman I got it working, and here is my code:
pred <- sapply(1:nrow(first), function(row) { predict(basdata,first[row, ],estimator="BMA", interval = "predict", se.fit=TRUE)$Ybma })
basdata: My Model
first: my new dataset for which I am predicting area.
The issue I am facing now is that the code takes a long time to predict the values. It iterates over every row and calculates the area. There are 150000 rows in my dataset, and I would appreciate any help optimizing the performance of this code.
Something like this will iterate over each row of your data frame of states, zips and indices (let's call it states_and_zips) and return a list of predictions. Each element of this list (which I've called pred) goes with the corresponding row of states_and_zips:
pred = lapply(1:nrow(states_and_zips), function(row) {
predict(databas, states_and_zips[row, ],
estimator="BMA", interval = "predict", se.fit=TRUE)$Ybma
})
If Ybma is a single value, then use sapply instead of lapply and it will return a vector of predictions, one for each row of states_and_zips, that you can just add as a new column to states_and_zips.
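Since the row-by-row loop is the slow part, it may be worth checking whether predict can take the whole data frame in a single call. The sketch below shows the idea with a plain lm on the built-in mtcars data, not with BAS. Whether predict on a bas.lm object handles a multi-row newdata in the same way is an assumption to verify against the BAS documentation.

```r
# Illustrative stand-in model; the original question uses bas.lm instead.
fit <- lm(mpg ~ wt + hp, data = mtcars)
new_rows <- mtcars[1:5, c("wt", "hp")]

# Row by row: one predict() call per row, as in the sapply approach above.
pred_loop <- sapply(seq_len(nrow(new_rows)), function(row) {
  predict(fit, new_rows[row, ])
})

# All rows at once: a single predict() call on the whole data frame.
pred_all <- predict(fit, new_rows)

# Both approaches give the same predictions.
all.equal(unname(pred_loop), unname(pred_all))
```

If the same holds for the BAS model, the per-row overhead of 150000 separate calls disappears.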

R Passing linear model to another function inside a function

I am trying to find the optimal "lambda" parameter for the Box-Cox transformation.
I am using the implementation from the MASS package, so I only need to create the model and extract the lambda.
Here is the code for the function:
library(MASS)
find_lambda <- function(x) {
# Function to find the best lambda for the Box-Cox transform
my_tmp <- data.frame(x = x) # Create a temporary data frame, to use it with the lm
str(my_tmp) # Gives the expected output
the_lm <- lm(x ~ 1, data = my_tmp) # Creates the linear model, no error here
print(summary(the_lm)) # Prints the summary, as expected
out <- boxcox(the_lm, plotit=FALSE) # Gives the error
best_lambda <- out$x[which.max(out$y)] # Extracting the best fitting lambda
return(best_lambda)
}
find_lambda(runif(100))
It gives the following error:
Error in is.data.frame(data) : object 'my_tmp' not found
The interesting thing is that the very same code is working outside the function. In other words, for some reason, the boxcox function from the MASS package is looking for the variable in the global environment.
I don't really understand, what exactly is going on... Do you have any ideas?
P.S. I do not provide a software/hardware specification, since this error was successfully replicated on a number of my friends' laptops.
P.P.S. I have found the way to solve the initial problem in the forecast package, but I still would like to know, why this code is not working.
User-contributed packages don't always do a great job of tracking the environment in which a call was made when they manipulate function calls. The quickest fix for you would be to change the line from
the_lm <- lm(x ~ 1, data = my_tmp)
to
the_lm <- lm(x ~ 1, data = my_tmp, y = TRUE, qr = TRUE)
because if y and qr are not requested in the lm call, the boxcox function tries to re-run lm with those parameters via an update call, and things get mucked up inside a function scope.
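Put together, the fix could look like the sketch below. It is the question's function with the suggested y = TRUE, qr = TRUE arguments and the debugging output removed. The set.seed call and the variable name best are additions for illustration only.

```r
library(MASS)

find_lambda <- function(x) {
  # Temporary data frame for lm
  my_tmp <- data.frame(x = x)
  # Keep the response and QR decomposition so boxcox() does not need
  # to re-fit the model through update()
  the_lm <- lm(x ~ 1, data = my_tmp, y = TRUE, qr = TRUE)
  out <- boxcox(the_lm, plotit = FALSE)
  # Lambda with the highest profile log-likelihood on the default grid
  out$x[which.max(out$y)]
}

set.seed(1)  # illustrative, for reproducibility only
best <- find_lambda(runif(100))
best
```

The returned value lies on the default grid of lambda values, which runs from -2 to 2 in steps of 0.1.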
Why not let boxcox do the fitting?
find_lambda <- function(x) {
# Function to find the best lambda for the Box-Cox transform
my_tmp <- data.frame(x = x) # Create a temporary data frame, to use it with the lm
out <- boxcox(x ~ 1, data = my_tmp, plotit=FALSE) # Let boxcox fit the model itself
best_lambda <- out$x[which.max(out$y)] # Extracting the best fitting lambda
return(best_lambda)
}
I think your scoping issue is with update.default which calls eval(call, parent.frame()) and my_tmp doesn't exist in the boxcox environment. Please correct me if I'm wrong on this.
boxcox cannot find your data. This may be because of a scoping issue.
You can feed the data into the boxcox function directly.
find_lambda <- function(x) {
# Function to find the best lambda for the Box-Cox transform
my_tmp <- data.frame(x = x) # Create a temporary data frame, to use it with the lm
str(my_tmp) # Gives the expected output
the_lm <- lm(x ~ 1, data = my_tmp) # Creates the linear model, no error here
print(summary(the_lm)) # Prints the summary, as expected
out <- boxcox(the_lm, plotit=FALSE, data = my_tmp) # feed data in here
best_lambda <- out$x[which.max(out$y)] # Extracting the best fitting lambda
return(best_lambda)
}
find_lambda(runif(100))

For loops regression in R

I'm fitting a GARCH model to the residuals of an ARIMA, and trying to apply ARCH(p) for p from 1 to 10 to compare the fit. Here is my code. Errors are returned in the for loop part but I cannot figure out why. Could anyone give some tips?
For the single value p=1 the code is as below, and it works without problems.
fitone<- garchFit(~garch(1,0),data=logprice)
coef(fitone)
summary(fitone)
And for the for loop my code goes like
for (n in 1:10) {
fit [[n]]<- garchFit(~garch(n,0),data=logprice)
coef(fit[[n]])
summary(fit[[n]])
}
Error in .garchArgsParser(formula = formula, data = data, trace = FALSE) :
Formula and data units do not match.
I have never written a loop before. Can someone help me with the code?
The problem is that the variables in a formula are generally evaluated in the context of the data= parameter, but your n variable isn't coming from logprice, it's coming from the global environment. You will need to create the formula dynamically. Here's one way to run all the models, with lapply rather than a for loop:
library(fGarch)
#sample data
x.vec = as.vector(garchSim(garchSpec(rseed = 1985), n = 200)[,1])
fits <- lapply(1:10, function(n) {
garchFit(bquote(~garch(.(n),0)), data = x.vec, trace = FALSE)
})
and then we can get the coefs with
lapply(fits, coef)
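To see what bquote does here without needing fGarch, the following sketch builds formulas with a varying polynomial degree for a plain lm on the built-in cars data. This is an illustrative stand-in: lm would find n on its own, so the example only shows the substitution mechanics, not the garchFit failure.

```r
fits_lm <- lapply(1:3, function(n) {
  # .(n) is replaced by the current value of n before the formula is built,
  # so the formula contains a literal number instead of the symbol n
  f <- eval(bquote(dist ~ poly(speed, .(n))))
  lm(f, data = cars)
})

# Each fit has an intercept plus n polynomial terms
n_coefs <- sapply(fits_lm, function(m) length(coef(m)))
n_coefs
```

Printing formula(fits_lm[[2]]) shows the degree as a literal value rather than as n, which is the form a formula needs when the fitting function cannot look n up itself.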

How to make a list of lmer model objects to use in a for loop in R?

I'm trying to write a for loop in R (my first!) in order to produce and save diagnostic plots of several mixed effects models fitted using the function lmer in the package lme4. This is what I've done so far exemplified with the sleepstudy data:
require(lme4)
mod1<-lmer(Reaction ~ Days + (1|Subject),sleepstudy)
mod2<-lmer(Reaction ~ 1 + (1|Subject),sleepstudy)
List<-c(mod1,mod2)
names<-c("mod1","mod2")
i=1
for (i in 1:length(List)) {
jpeg(file = paste("modelval_", names[i], ".jpg", sep=""))
par(mfrow=c(2,2))
plot(resid(List[i]) ~ fitted(List[i]),main="residual plot")
abline(h=0)
qqnorm(resid(List[i]), main="Q-Q plot of residuals")
qqnorm(ranef(List[i])$Subject$"(Intercept)", main="Q-Q plot of random effect" )
dev.off()
}
I get the following error message when typing this into the R console:
Error in function (formula, data = NULL, subset = NULL, na.action = na.fail, :
invalid type (NULL) for variable 'resid(list[i])'
I've got a feeling the problem is related to the list of models I've created and not the for loop itself and I think it might be related to the model objects being of class S4. Is it possible to make such a list?
I've also tried to make the list like below, with no improvements (still get the same error message)
List<-list(mod1,mod2)
First, using c risks losing the class structure of the objects you've created. To make a list containing your models, use list(mod1, mod2).
Second, List[i] is a list of length 1 containing the i'th element of List. Use List[[i]] to extract the element itself (your model).
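Both points can be checked with a small sketch. It uses lm fits on the built-in cars data instead of lmer so that it runs without lme4, and it writes PDF files to a temporary directory instead of JPEGs. The names models and out_files are hypothetical.

```r
mod1 <- lm(dist ~ speed, data = cars)
mod2 <- lm(dist ~ poly(speed, 2), data = cars)

# A named list keeps each model intact and carries the names along
models <- list(mod1 = mod1, mod2 = mod2)

class(models["mod1"])    # "list": a length-1 list wrapping the model
class(models[["mod1"]])  # "lm": the model itself

out_files <- character(0)
for (name in names(models)) {
  m <- models[[name]]  # double brackets extract the model object
  out_file <- file.path(tempdir(), paste0("modelval_", name, ".pdf"))
  pdf(out_file)
  par(mfrow = c(1, 2))
  plot(resid(m) ~ fitted(m), main = "residual plot")
  abline(h = 0)
  qqnorm(resid(m), main = "Q-Q plot of residuals")
  dev.off()
  out_files <- c(out_files, out_file)
}
```

Looping over names(models) also removes the need for a separate names vector that has to be kept in step with the list.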
