GLM Family using tidymodels

I am trying to use the tidymodels package for a GLM and want to use the Gamma or Poisson distribution.
With glm or glmnet I would write something like the following:
# using glm
mdl <- glm(data = data, y ~ x, family = Gamma(link = "inverse"))
mdl <- glm(data = data, y ~ x, family = poisson(link = "log"))
# using glmnet
library(glmnet)
mdl <- glmnet(data$x, data$y, family = Gamma(link = "inverse"))
mdl <- glmnet(data$x, data$y, family = poisson(link = "log"))
How can I achieve the same using tidymodels? Note that I am trying to do a regression and not a classification (logistic regression) for which I could use parsnip::logistic_reg().
I found one article on Generalized Linear Models on tidymodels, which belongs to the embed package but does not show how to specify the family.
I would expect something similar to this (which does not work, because linear_reg has no family or link parameters and set_engine does not support glm in linear regression mode):
mdl <- linear_reg(mode = "regression", family = "gamma", link = "inverse") %>% set_engine("glm") # or glmnet

That was easier than expected:
mdl <- linear_reg(mode = "regression") %>%
  set_engine("glmnet", family = "gamma")
# or
mdl <- linear_reg(mode = "regression") %>%
  set_engine("glmnet", family = Gamma(link = "inverse"))
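
For the unpenalized case, here is a minimal sketch of the same idea with the glm engine. It assumes a parsnip version in which linear_reg() supports set_engine("glm") (this was not the case when the question was asked), and the data frame sim is simulated purely for illustration. Arguments given to set_engine() are passed on to the fitting function, so a family object can go there. As far as I know, glmnet only accepts its built-in family names as strings, so passing a family object such as Gamma() is the safer form for that engine too.

```r
library(parsnip)

# Simulated positive response; `sim` and its columns are made up for illustration.
set.seed(42)
sim <- data.frame(x = runif(200, 1, 5))
# With the inverse link, 1 / mean is linear in x: mean = 1 / (0.5 + 0.3 * x).
sim$y <- rgamma(200, shape = 5, rate = 5 * (0.5 + 0.3 * sim$x))

# Engine arguments given to set_engine() are forwarded to stats::glm().
gamma_spec <- linear_reg(mode = "regression") %>%
  set_engine("glm", family = Gamma(link = "inverse"))

gamma_fit <- fit(gamma_spec, y ~ x, data = sim)

# The same model fitted directly with glm(), for comparison.
base_fit <- glm(y ~ x, data = sim, family = Gamma(link = "inverse"))

coef(gamma_fit$fit)
coef(base_fit)
```

The underlying glm object is stored in gamma_fit$fit, so the usual summary() and coef() methods apply. Swapping in family = poisson(link = "log") should work the same way for count data.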

Related

Error when predicting partial effects using new data for gamlss model

I'm here re-raising the issue of predicting CI's for gamlss models using the newdata argument. A further complication is that I'm interested in partial effects as well.
A closely related issue (without partial effects) was left unresolved in 2018: Error when predicting new fitted values from R gamlss object.
I'm wondering if there have been updates that also extend to partial effects. The example below reproduces the error (notice type = "terms", specifying that I'm interested in the effects of each model term).
library(gamlss)
library(tidyverse)
#example data
test_df <- tibble(x = rnorm(1e4),
x2 = rnorm(n = 1e4),
y = x2^2 + rnorm(1e4, sd = 0.5))
#fitting gamlss model
gam_test = gamlss(formula = y ~ pb(x2) + x,
sigma.fo= y ~ pb(x2) + x,
data = test_df)
#data I want predictions for
pred_df <- tibble(x = seq(-0.5, 0.5, length.out = 300),
x2 = seq(-0.5, 0.5, length.out = 300))
#returns error when se.fit = TRUE
pred <- predictAll(object = gam_test,
type = "terms",
se.fit = TRUE, #works if se.fit = FALSE
newdata = pred_df)
Many thanks in advance!

I talked to the main developer of the gamlss software (who is responsible for this function). He says that the option se.fit=TRUE with type="terms" has not yet been implemented, and unfortunately he is too busy at present.
One idea is to bootstrap the original data, predict terms for each bootstrap sample, and then use the results to obtain CIs.
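
The bootstrap idea can be sketched as follows. This is only an illustration of the resampling logic: lm() is used as a stand-in model so the example runs with base R alone, and predict_terms is a hypothetical helper. For the real problem, the gamlss() fit and a term-wise predict call on newdata would go inside that helper, which I have not tested here.

```r
set.seed(1)
n <- 500
train <- data.frame(x = rnorm(n), x2 = rnorm(n))
train$y <- train$x2^2 + rnorm(n, sd = 0.5)

# Data we want term-wise predictions for.
newdata <- data.frame(x  = seq(-0.5, 0.5, length.out = 50),
                      x2 = seq(-0.5, 0.5, length.out = 50))

# Hypothetical helper: fit on one data set, return the term matrix for `newdata`.
# lm() is a stand-in; a gamlss fit would replace it in the real problem.
predict_terms <- function(d) {
  fit <- lm(y ~ I(x2^2) + x, data = d)
  predict(fit, newdata = newdata, type = "terms")
}

# Resample rows with replacement, refit, and predict the terms each time.
n_boot <- 200
boot_terms <- replicate(n_boot, {
  idx <- sample(nrow(train), replace = TRUE)
  predict_terms(train[idx, ])
})
# boot_terms is an array: rows of newdata x model terms x bootstrap replicates.

# Percentile intervals for every term at every row of newdata.
ci_lower <- apply(boot_terms, c(1, 2), quantile, probs = 0.025)
ci_upper <- apply(boot_terms, c(1, 2), quantile, probs = 0.975)
```

Refitting a gamlss model with pb() smoothers a few hundred times can be slow, so n_boot is a trade-off between run time and the stability of the interval endpoints.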

R | How to get accuracy from cv.glmnet

I've been using the cv.glmnet function in R to fit a lasso logistic regression model.
Here's my code. I'm using the iris dataset.
df = iris %>%
mutate(Species = as.character(Species)) %>%
filter(!(Species =="setosa")) %>%
mutate(Species = as.factor(Species))
X = data.matrix(df %>% select(-Species))
y = df$Species
Model = cv.glmnet(X, y, alpha = 1, family = "binomial")
How do I get the model accuracy from the cv.glmnet object (Model)?
If I had been using caret on a normal logistic regression model, accuracy would already be in the output.
train_control = trainControl(method = "cv", number = 10)
M2 = train(Species ~., data = df, trControl = train_control,
method = "glm", family = "binomial")
M2$results
but a cv.glmnet object doesn't seem to contain this information.

You want to add type.measure='class' as in Model 2 below, otherwise the default for family='binomial' is 'deviance'.
df = iris %>%
mutate(Species = as.character(Species)) %>%
filter(!(Species =="setosa")) %>%
mutate(Species = as.factor(Species))
X = data.matrix(df %>% select(-Species))
y = df$Species
Model = cv.glmnet(X, y, alpha = 1, family = "binomial")
Model2 = cv.glmnet(X, y, alpha = 1, family = "binomial", type.measure = 'class')
Then cvm gives the misclassification rate.
Model2$lambda ## lambdas used in CV
Model2$cvm ## mean cross-validated error for each of those lambdas
If you want results for the best lambda, you can use lambda.min
Model2$lambda.min ## lambda with the lowest cvm
Model2$cvm[Model2$lambda==Model2$lambda.min] ## cvm for lambda.min
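
Since cvm is the misclassification rate when type.measure = "class", the cross-validated accuracy is one minus that value. A self-contained sketch (the names cv_fit, best and accuracy are mine; the folds are random, so the number changes with the seed):

```r
library(glmnet)

# Two-class subset of iris, as in the question.
df <- droplevels(subset(iris, Species != "setosa"))
X <- data.matrix(df[, names(df) != "Species"])
y <- df$Species

set.seed(1)  # cross-validation folds are assigned at random
cv_fit <- cv.glmnet(X, y, alpha = 1, family = "binomial",
                    type.measure = "class")

# Position of lambda.min in the lambda sequence.
best <- which(cv_fit$lambda == cv_fit$lambda.min)

# Accuracy = 1 - mean cross-validated misclassification rate.
accuracy <- 1 - cv_fit$cvm[best]
accuracy
```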

Plotting a GLM Model

I am having trouble plotting the predictions of a glm model. When I run the code below, R draws an empty plot.
# Logistic regression
model2 = glm(as.factor(loan_status) ~ . , data = train , family = binomial(link = 'logit'))
summary(model2)
# Prediction
pred1 <- predict(model2,test,type = 'response')
ggplot(data.frame(pred1), aes(pred1))
train dataset consists of categorical and numerical values
Call:
glm(formula = as.factor(loan_status) ~ ., family = binomial(link = "logit"),
data = train)
Appreciate any assistance
Thank you
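
One likely cause: ggplot() with only aes() sets up the axes but draws nothing until a geom layer is added. A minimal sketch, using mtcars as a stand-in because the loan data is not available (the model and column names here are illustrative only):

```r
library(ggplot2)

# Stand-in logistic regression; mtcars replaces the loan data.
model <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))
pred <- predict(model, newdata = mtcars, type = "response")

# ggplot() + aes() alone gives an empty canvas; the geom draws the data.
p <- ggplot(data.frame(pred = pred), aes(x = pred)) +
  geom_histogram(bins = 20) +
  labs(x = "Predicted probability", y = "Count")
print(p)
```

geom_density() would be a reasonable alternative to the histogram if a smooth distribution of the predicted probabilities is preferred.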

Using caret with recipes is leading to difficulties with resample

I've been using recipes to pipe into caret::train, which has been going well, but now I've tried some step_transforms, I'm getting the error:
Error in resamples.default(model_list) :
There are different numbers of resamples in each model
when I compare models with and without the transformations. The same code with step_center and step_scale works fine.
library(caret)
library(tidyverse)
library(tidymodels)
formula <- price ~ carat
model_recipe <- recipe(formula, data = diamonds)
quadratic_model_recipe <- recipe(formula, data = diamonds) %>%
step_poly(all_predictors())
model_list <- list(
linear_model = NULL,
quadratic = NULL
)
model_list$linear_model <-
model_recipe %>% train(
data = diamonds,
method = "lm",
trControl = trainControl(method = "cv"))
model_list$quadratic_model <-
quadratic_model_recipe %>% train(
data = diamonds,
method = "lm",
trControl = trainControl(method = "cv"))
resamp <- resamples(model_list)

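The list is initialized with quadratic = NULL, but the fitted model is later assigned to quadratic_model, so the initializer should have been quadratic_model = NULL.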
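
The effect of the mismatched name can be seen in base R without fitting anything. In this sketch the strings stand in for the fitted caret models:

```r
# Same initializer as in the question.
model_list <- list(
  linear_model = NULL,
  quadratic = NULL
)

# Placeholder strings stand in for the train() results.
model_list$linear_model <- "fit A"     # replaces the existing NULL entry
model_list$quadratic_model <- "fit B"  # adds a third entry under a new name

names(model_list)
# "linear_model" "quadratic" "quadratic_model"

# The leftover `quadratic` entry is still NULL.
is.null(model_list$quadratic)
# TRUE
```

The list passed to resamples() therefore has three entries, one of which is not a model at all, which is presumably what triggers the mismatch in resample counts.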

R: Bayesian package for nonlinear mixed effects model

I'm looking for a Bayesian parallel for nonlinear mixed effects models, specifically those using the nlme package in R.
I've come across blme but that seems to be only for linear mixed-effects models. Would brms be appropriate in this case? I've tried to write some code that's analogous to the nlme construction below with the function brm.
library(nlme)
model <- nlme(height ~ exp(beta1*age + 1),
data = Loblolly,
fixed = list(beta1 ~ 1),
random = list(Seed = pdDiag(list(beta1 ~ 1))),
start = list(fixed = c(beta1 = 3)))
library(brms)
bayesian_model <- brm(bf(height ~ exp(beta1*age + 1), beta1 ~ 1, nl = TRUE),
data = Loblolly,
prior = c(prior(normal(0, 1), nlpar = beta1)))
I was able to get to this point, but how exactly do I specify random effects for beta1? And how would I specify the diagonal variance structure like I have with random = list(Seed = pdDiag(list(beta1 ~ 1)))?
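
For what it's worth, brms handles group-level effects on a nonlinear parameter with lme4-style terms in that parameter's own formula. A sketch of the model specification only (nothing is fitted here, since brm() needs a working Stan toolchain; nl_formula and nl_priors are names I made up):

```r
library(brms)

# A random intercept for beta1, varying by Seed, goes in beta1's formula.
nl_formula <- bf(
  height ~ exp(beta1 * age + 1),
  beta1 ~ 1 + (1 | Seed),
  nl = TRUE
)

# Prior on the population-level part of beta1, as in the question.
nl_priors <- prior(normal(0, 1), nlpar = "beta1")

# Fitting is left commented out because it compiles and samples a Stan model:
# bayesian_model <- brm(nl_formula, data = Loblolly, prior = nl_priors)
```

With a single random effect the covariance structure is trivially diagonal. With several nonlinear parameters, separate (1 | Seed) terms are modeled as uncorrelated across parameters, which corresponds to pdDiag; writing (1 | ID | Seed) with a shared ID in each formula is how brms requests correlated effects instead.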
