I'm trying to write a function that collects some calls I often use in scripts.
I use the sleepstudy data from the lme4 package in my examples.
Here's (a simplified version of) the function I started with:
trimModel1 <- function(frm, df) {
require(LMERConvenienceFunctions)
require(lme4)
lm<-lmer(frm,data=df)
lm.trimmed = romr.fnc(lm, df)
df = lm.trimmed$data
# update initial model on trimmed data
lm<-lmer(frm,data=df)
# lm@call$formula<-frm
mcp.fnc(lm)
lm
}
When I call this function like below:
(fm1<-trimModel1(Reaction ~ Days + (Days|Subject),sleepstudy))
The first three lines of the output look like this:
Linear mixed model fit by REML
Formula: frm
Data: df
If I run the commands of the trimModel1 function directly in the console, the first three lines of the model summary look like this:
Linear mixed model fit by REML
Formula: Reaction ~ Days + (Days | Subject)
Data: sleepstudy
The difference is a problem because several packages that use the lme4 package make use of the formula and data fields. For instance the effects package uses these fields and a command like below will not work when I use the trimModel1 function above:
library(effects)
plot(allEffects(fm1))
I looked around on Stack Overflow and R discussion groups for a solution and saw that you can change the formula field of the model. If you uncomment the lm@call$formula<-frm line in the trimModel1 function, the formula field in the summary is displayed correctly. Unfortunately, when I run a function from the effects package, I still get this error:
Error in terms.formula(formula, data = data) :
'data' argument is of the wrong type
This is because the data field is still incorrect.
Another possible solution I found is this function:
trimModel2 <- function(frm, df) {
require(LMERConvenienceFunctions)
require(lme4)
lm<-do.call("lmer",list(frm,data=df))
lm.trimmed = romr.fnc(lm, df)
df = lm.trimmed$data
# update initial model on trimmed data
lm<-do.call("lmer",list(frm,data=df))
mcp.fnc(lm)
lm
}
When I now type the following commands in the console I get no errors:
(fm2<-trimModel2(Reaction ~ Days + (Days|Subject),sleepstudy))
plot(allEffects(fm2))
The allEffects function works, but now the problem is that the summary of the fm2 model displays the raw sleepstudy data. That is not a big problem with the sleepstudy data, but with very large datasets RStudio has sometimes crashed when displaying a model.
Does anyone know how to make one (or both) of these functions work correctly?
I think I have to change the fm1@call$data field but I don't know how.
Do it like this:
trimModel1 <- function(frm, df) {
require(LMERConvenienceFunctions)
require(lme4)
dfname <- as.name(deparse(substitute(df)))
lm<-lmer(frm,data=df)
lm.trimmed = romr.fnc(lm, df)
df = lm.trimmed$data
# update initial model on trimmed data
lm<-lmer(frm,data=df)
lm@call$formula <- frm
lm@call$data <- dfname
mcp.fnc(lm)
lm
}
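This is the "deparse-substitute trick": substitute(df) captures the expression the caller passed as df, deparse() turns it into a string, and as.name() turns that string back into a symbol that can be stored in the call.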
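A minimal base-R sketch of the same trick, using lm() and the built-in mtcars data as stand-ins (both are assumptions for illustration, not part of the question, and fit_with_clean_call is a hypothetical name). lm() returns an S3 object, so the stored call is reached with $call; for the S4 objects returned by lmer() it is a slot and is reached with @call.

```r
# Hypothetical helper: fit a model inside a function, then patch the stored
# call so it shows the caller's formula and data name instead of `frm`/`df`.
fit_with_clean_call <- function(frm, df) {
  # substitute(df) is the unevaluated expression the caller passed for `df`.
  # Capture it first, before `df` is reassigned anywhere in the function.
  dfname <- as.name(deparse(substitute(df)))
  fit <- lm(frm, data = df)
  fit$call$formula <- frm   # replace the symbol `frm` with the real formula
  fit$call$data <- dfname   # replace the symbol `df` with the caller's name
  fit
}

fit <- fit_with_clean_call(mpg ~ wt, mtcars)
print(fit$call)
```

This only behaves well when the caller passes a plain object name. If the argument is an expression such as a subsetting call, deparse() returns the text of that expression and the resulting symbol will not refer to any existing object.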
I would like to do something with MLflow, but I cannot find a solution on the Internet. I am working with MLflow and R, and I want to save a regression model. The catch is that when I predict on the test data, I also want to apply some transformations to that data. This is what I have:
data <- #some data with numeric regressors and dependent variable called 'y'
# Divide into train and test
ind <- sample(nrow(data), 0.8*nrow(data), replace = FALSE)
dataTrain <- data[ind,]
dataTest <- data[-ind,]
# Run model in the mlflow framework
with(mlflow_start_run(), {
model <- lm(y ~ ., data = dataTrain)
predict_fun <- function(model, data_to_predict){
data_to_predict[,3] <- data_to_predict[,3]/2
data_to_predict[,4] <- data_to_predict[,4] + 1
return(predict(model, data_to_predict))
}
predictor <- crate(~predict_fun(model,dataTest),model)
### Some code to use the predictor to get the predictions and measure the accuracy as a log_metric
##################
##################
##################
mlflow_log_model(predictor,'model')
})
As you can see, my prediction function does not only predict on the new data: it also transforms the third and fourth columns first. All the examples I have seen on the web put R's default predict function in the crate.
Once I save this model and load it in another notebook with some test data, I get the error: "predict_fun" doesn't exist. That is because this specific function was not saved along with the model. Do you know how I can save a specific prediction function that I have written, instead of the default functions that come with R?
This is not the real example I am working with, but it is an approximation of it. The fact is that I want to save extra functions apart from the model itself.
Thank you very much!
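One possible approach, offered as a sketch under assumptions and not as a confirmed fix: a crate only carries what is explicitly placed inside it, so the transformation can be written in the body of the crated function, with the model passed in as a named argument. The example below uses carrier::crate() with mtcars standing in for the real data; the column indices are copied from the question, and the MLflow run and logging calls are left out.

```r
library(carrier)

# Stand-in data and model (assumption: the real data has numeric regressors).
model <- lm(mpg ~ ., data = mtcars)

# The transformation lives inside the crated function, so nothing outside
# the crate (such as a separate predict_fun) is needed at prediction time.
predictor <- crate(
  function(data_to_predict) {
    data_to_predict[, 3] <- data_to_predict[, 3] / 2
    data_to_predict[, 4] <- data_to_predict[, 4] + 1
    # Functions from packages are called with an explicit namespace prefix.
    stats::predict(model, data_to_predict)
  },
  model = model  # objects the function needs are passed by name
)

preds <- predictor(mtcars[1:5, ])
```

The resulting predictor takes the new data as its argument, so it could then be passed to mlflow_log_model() in place of the crate from the question.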
Problem
I've been using tidydrc, a tidy wrapper for the drc package, to build growth curves. It produces a tidy version of the normal output (best for ggplot). However, due to the inherent nesting of the models, I can't run simple drc functions, since the models are nested inside a dataframe. I've attached code below that mirrors the drc and tidydrc packages.
Goal
To compare information criteria from multiple model fits for the tidydrc output using the drc function mselect(), and ultimately to select the best-fitting model efficiently.
Ideal Result (works with drc)
library(tidydrc) # To load the Puromycin data
library(drc)
model_1 <- drm(rate ~ conc, state, data = Puromycin, fct = MM.3())
mselect(model_1, list(LL.3(), LL.5(), W1.3(), W1.4(), W2.4(), baro5()))
# DESIRED OUTPUT SIMILAR TO THIS
         logLik       IC Lack of fit  Res var
MM.3  -78.10685 170.2137   0.9779485 70.54874  # Best fitting model
LL.3  -78.52648 171.0530   0.9491058 73.17059
W1.3  -79.22592 172.4518   0.8763679 77.75903
W2.4  -77.87330 173.7466   0.9315559 78.34783
W1.4  -78.16193 174.3239   0.8862192 80.33907
LL.5  -77.53835 177.0767   0.7936113 87.80627
baro5 -78.00206 178.0041   0.6357592 91.41919
Not Working Example with tidydrc
library(tidyverse) # tidydrc utilizes tidyverse functions
model_2 <- tidydrc_model(data = Puromycin, conc, rate, state, model = MM.3())
summary(model_2)
Error: summary.vctrs_list_of() not implemented.
Now, I can manually tease apart the list of models in the dataframe model_2 but can't seem to figure out the correct apply statements (it's a mess) to get this working.
Progress Thus Far
These both produce the same error, so at least I've subsetted a level down but now I'm stuck and pretty sure this is not the ideal solution.
mselect(model_2$drmod, list(LL.3(), LL.5(), W1.3(), W1.4(), W2.4(), baro5()))
model_2_sub <- model_2$drmod # Manually subset the drmod column
apply(model_2_sub, 2, mselect(list(LL.3(), LL.5(), W1.3(), W1.4(), W2.4(), baro5())))
Error in UseMethod("logLik") :
no applicable method for 'logLik' applied to an object of class "list"
I've even tried the tidyverse function unnest_longer(), to no avail:
model_2_unnest <- model_2 %>% unnest_longer(drmod, indices_include = FALSE)
I have data for three different years and am running a regression for each separate year with lmList(). When I try to get the LaTeX code with stargazer, I get an error saying it doesn't recognize the object type. When running stargazer on a normal linear regression, it works just fine, even though the classes of the objects are the same.
This is my regression with lmList
fit <- lmList((lndeltaoms) ~ size + factor(gender)| year, data = tser)
stargazer(fit[["2008"]])
% Error: Unrecognized object type.
Compare this to a normal regression, where it works.
fit2 <- lm((lndeltaoms) ~ size + factor(gender), data=tser)
stargazer(fit2)
But when I compare the classes, they're the same.
class(fit[["2008"]])
[1] "lm"
class(fit2)
[1] "lm"
Since they're the same class, it feels like stargazer should recognize both of them in the same way, but there seems to be some issue when extracting a model from the lmList.
Is there any way I can work around this?
It should work fine with lmList() from the nlme package (not the one from lme4). Try:
fit1 <- nlme::lmList((lndeltaoms) ~ size + factor(gender)| year, data = tser)
stargazer(fit1[["2008"]]) # ok
fit2 <- lme4::lmList((lndeltaoms) ~ size + factor(gender)| year, data = tser)
stargazer(fit2[["2008"]]) # this does not work
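It looks like stargazer() works fine with models taken from an object of class lmList, but not with those taken from the lmList4 object returned by lme4::lmList().
Also, be careful when both packages are loaded: whichever one is attached last masks the other's lmList(), so it is safer to call it with an explicit namespace prefix as above.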
rsq <- function(formula, Data1, indices) {
d <- Data1[indices,] # allows boot to select sample
fit <- lm(formula, Data1=d)
return(summary(fit)$r.square)
}
results = boot(data = Data1, statistic = rsq, R = 500)
When I execute the code, I get the following error:
Error in Data1[indices,] : incorrect number of dimensions
Background info: I am creating a predictive model using linear regression. I would like to test my predictive model, and after some research I decided to use bootstrapping.
Credit goes to @Rui Barradas, check comments for original post.
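If you read the help page for boot::boot, you will see that the statistic function it calls takes the data as its first argument, then the indices, then any others. So change the order in your function definition to rsq <- function(Data1, indices, formula). Note also that lm() expects the data frame in its data argument, so the call inside the function should be lm(formula, data = d), and that the formula has to be passed to boot() as an extra argument.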
Another problem that I had was that I didn't define the Function.
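A minimal runnable sketch of the corrected pattern, with the built-in mtcars data and the formula mpg ~ wt + disp standing in for the original Data1 and model (both are assumptions for illustration):

```r
library(boot)

# Statistic function: data first, resampled row indices second,
# any extra arguments (here the model formula) after that.
rsq <- function(data, indices, formula) {
  d <- data[indices, ]          # the bootstrap resample chosen by boot()
  fit <- lm(formula, data = d)  # note: the argument is `data`, not `Data1`
  summary(fit)$r.squared
}

set.seed(42)  # make the resampling reproducible
results <- boot(data = mtcars, statistic = rsq, R = 500,
                formula = mpg ~ wt + disp)  # passed on to rsq() via `...`

# Percentile confidence interval for R-squared across the 500 resamples
boot.ci(results, type = "perc")
```

The "incorrect number of dimensions" error from the question is what to expect when the arguments are in the wrong order: boot() passes the data frame as the first argument and the index vector as the second, so with formula first, Data1 ends up holding the index vector, and indexing a plain vector with [indices, ] fails.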
I am trying to run a simple naive Bayes model (trying to redo what I have seen in the DataCamp course).
I am using the R naivebayes package.
The training dataset is where9am which looks like this:
My first problem is the following... when I have several predictions in a dataframe thursday9am...
... and I use the following code:
locmodel <- naive_bayes(location ~ daytype, data = where9am)
my_pred <- predict(locmodel, thursday9am)
I get a series of <NA>, whereas I get the correct prediction if the thursday9am dataframe contains only a single observation.
The second problem is the following: when I use the following code to get the associated probabilities...
locmodel <- naive_bayes(location ~ daytype, data = where9am, type = c("class", "prob"))
predict(locmodel, thursday9am , type = "prob")
... even if I have only one observation in thursday9am, I get a series of <NaN>.
I am not sure what I am doing wrong.