Large standard error of prediction from parsnip vs base R

It seems like predict() is producing a standard error that is too large. I get 0.820 with a parsnip model but 0.194 with a base R model. 0.194 seems more reasonable for a standard error, since the ends of the confidence interval lie about 2 * 0.194 above and below my prediction. What is my problem or misunderstanding?
library(parsnip)
library(dplyr)
# example data
mod_dat <- mtcars %>%
as_tibble() %>%
mutate(cyl_8 = as.numeric(cyl == 8)) %>%
select(mpg, cyl_8)
parsnip_mod <- logistic_reg() %>%
set_engine("glm") %>%
fit(as.factor(cyl_8) ~ mpg, data = mod_dat)
base_mod <- glm(as.factor(cyl_8) ~ mpg, data = mod_dat, family = "binomial")
parsnip_pred <- tibble(mpg = 18) %>%
bind_cols(predict(parsnip_mod, new_data = ., type = 'prob'),
predict(parsnip_mod, new_data = ., type = 'conf_int', std_error = T)) %>%
select(!ends_with("_0"))
base_pred <- predict(base_mod, tibble(mpg = 18), se.fit = T, type = "response") %>%
unlist()
# these give the same prediction but different SE
parsnip_pred
#> # A tibble: 1 x 5
#> mpg .pred_1 .pred_lower_1 .pred_upper_1 .std_error
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 18 0.614 0.230 0.895 0.820
base_pred
#> fit.1 se.fit.1 residual.scale
#> 0.6140551 0.1942435 1.0000000
Created on 2020-06-04 by the reprex package (v0.3.0)
--EDIT--
As @thelatemail and @Limey said, using type="link" for the base model will give the standard error on the logit scale (0.820). However, I want the standard error on the probability scale.
Is there an option in the parsnip documentation that I'm missing? I would like to use parsnip.

@thelatemail is correct. From the online doc for predict.glm:
type
the type of prediction required. The default is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable. Thus for a default binomial model the default predictions are of log-odds (probabilities on logit scale) and type = "response" gives the predicted probabilities.
So in base R the default is to report on the logit (link) scale, and type = "response" requests results on the probability scale. It's not obvious from the parsnip predict() documentation I found how it chooses the scale for its results, but the numbers suggest that the interval limits are on the probability scale while .std_error is on the logit scale, since 0.820 matches what base R returns with type = "link".
So both methods are returning correct answers; they're just using different scales.
I don't want to steal an accepted solution from @thelatemail, so I invite them to post a similar answer to this.
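To make the link between the two scales concrete, here is a minimal base R sketch (no parsnip involved). It assumes the usual delta-method approximation, under which the standard error on the probability scale is roughly p * (1 - p) times the standard error on the logit scale. The object names (dat, new_obs, se_delta, ci) are made up for illustration, and the use of qnorm() for the interval is an assumption; a package may use a different quantile.

```r
# Same model as in the question, fitted with base R only.
dat <- transform(mtcars, cyl_8 = factor(as.numeric(cyl == 8)))
mod <- glm(cyl_8 ~ mpg, data = dat, family = "binomial")
new_obs <- data.frame(mpg = 18)

# Fit and standard error on the link (log-odds) scale.
link <- predict(mod, new_obs, type = "link", se.fit = TRUE)
# Fit and standard error on the response (probability) scale.
resp <- predict(mod, new_obs, type = "response", se.fit = TRUE)

# Delta method: SE(p) is approximately p * (1 - p) * SE(logit).
p <- plogis(link$fit)
se_delta <- p * (1 - p) * link$se.fit

# Approximate 95% interval built on the logit scale, then back-transformed.
# qnorm() is an assumption here; other quantiles are possible.
ci <- plogis(link$fit + c(-1, 1) * qnorm(0.975) * link$se.fit)

print(c(se_link     = unname(link$se.fit),
        se_response = unname(resp$se.fit),
        se_delta    = unname(se_delta)))
print(ci)
```

If this runs as expected, se_response and se_delta agree, which is why a 0.820 on the logit scale and a 0.194 on the probability scale can both describe the same prediction.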

As @thelatemail said, you can get the standard error on the probability scale with parsnip by using the arguments type = "raw" and opts = list(se.fit = TRUE, type = "response"). At that point you might as well use a base model, since the output is exactly the same. Still, this is useful if you are already using a parsnip model and want the standard error output of a base model.
library(parsnip)
library(dplyr)
mod_dat <- mtcars %>%
as_tibble() %>%
mutate(cyl_8 = as.numeric(cyl == 8)) %>%
select(mpg, cyl_8)
parsnip_mod <- logistic_reg() %>%
set_engine("glm") %>%
fit(as.factor(cyl_8) ~ mpg, data = mod_dat)
base_mod <- glm(as.factor(cyl_8) ~ mpg, data = mod_dat, family = "binomial")
predict(parsnip_mod, tibble(mpg = 18), type="raw",
opts=list(se.fit=TRUE, type="response")) %>%
as_tibble()
#> # A tibble: 1 x 3
#> fit se.fit residual.scale
#> <dbl> <dbl> <dbl>
#> 1 0.614 0.194 1
predict.glm(base_mod, tibble(mpg = 18), se.fit = T, type="response") %>%
as_tibble()
#> # A tibble: 1 x 3
#> fit se.fit residual.scale
#> <dbl> <dbl> <dbl>
#> 1 0.614 0.194 1
Created on 2020-06-11 by the reprex package (v0.3.0)

Related

RMSE value on the example of randomForest

I am looking at one of the solutions for the House Prices Kaggle competition. I would like to know how to get the RMSE value from this:
Subset the train rows and selected features
dt.train <- fulldt %>% filter(Set == "Train") %>% select("Id", "OverallQual", "TotalArea", "AreaAbvground", "GarageArea", "TotalBaths", "YearBuilt", "Neighborhood", "MSSubClass", "FireplaceQu", "ExterQual", "KitchenQual", "BsmtQual", "HouseStyle") %>% mutate(SalePrice = log(raw.train$SalePrice))
Same for the test features
dt.test <- fulldt %>% filter(Set == "Test") %>%
select("Id", "OverallQual", "TotalArea", "AreaAbvground", "GarageArea", "TotalBaths", "YearBuilt",
"Neighborhood", "MSSubClass", "FireplaceQu", "ExterQual", "KitchenQual", "BsmtQual", "HouseStyle")
Random Forest model
fit <- randomForest(SalePrice ~ ., data = dt.train, importance = T)
Use new model to predict SalePrice values from the test set
pred <- exp(predict(fit , newdata = dt.test))
How do you get the RMSE value from pred?

Let's calculate the RMSE of the training and test rows based on the minimal example iris data:
library(tibble)
library(randomForest)
#> randomForest 4.6-14
#> Type rfNews() to see new features/changes/bug fixes.
library(yardstick)
#> For binary classification, the first factor level is assumed to be the event.
#> Use the argument `event_level = "second"` to alter this as needed.
train_df <- head(iris, 100)
test_df <- tail(iris, 50)
model <- randomForest(Sepal.Length ~ ., data = train_df, importance = T)
# Test RMSE
tibble(
truth = test_df$Sepal.Length, # observed values
predicted = predict(model, newdata = test_df) # model predictions
) %>%
rmse(truth, predicted)
#> # A tibble: 1 x 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 rmse standard 0.836
# Train RMSE
tibble(
truth = train_df$Sepal.Length, # observed values
predicted = predict(model, newdata = train_df) # model predictions
) %>%
rmse(truth, predicted)
#> # A tibble: 1 x 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 rmse standard 0.265
Created on 2021-12-13 by the reprex package (v2.0.1)
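For reference, RMSE can also be computed by hand, which shows why known outcomes are needed: a competition test set without SalePrice has nothing to compare pred against, so the usual workaround is to hold out part of the training rows. The sketch below is only an illustration under stated assumptions: rmse_manual is a hypothetical helper, the 100/50 split is arbitrary, and lm() stands in for randomForest so that the example needs no extra packages.

```r
# Hypothetical helper: root mean squared error of predictions against known truth.
rmse_manual <- function(truth, predicted) {
  sqrt(mean((truth - predicted)^2))
}

# Hold out rows with known outcomes to play the role of a validation set.
set.seed(1)
idx <- sample(nrow(iris), size = 100)
train_df <- iris[idx, ]
valid_df <- iris[-idx, ]

# lm() is a stand-in here; a randomForest fit would be used the same way.
model <- lm(Sepal.Length ~ ., data = train_df)

valid_rmse <- rmse_manual(valid_df$Sepal.Length,
                          predict(model, newdata = valid_df))
print(valid_rmse)
```

In the question the model was trained on log(SalePrice), so the comparison would be made between logged observed prices and the un-exponentiated predictions if the metric is wanted on the log scale.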

How to use results from different regression models in a scatterplot built using group_by in R?

I would like to add 2 different regression curves, coming from different models, in a scatter plot.
Let's use the example below:
Weight=c(12.6,12.6,16.01,17.3,17.7,10.7,17,10.9,15,14,13.8,14.5,17.3,10.3,12.8,14.5,13.5,14.5,17,14.3,14.8,17.5,2.9,21.4,15.8,40.2,27.3,18.3,10.7,0.7,42.5,1.55,46.7,45.3,15.4,25.6,18.6,11.7,28,35,17,21,41,42,18,33,35,19,30,42,23,44,22)
Increment=c(0.55,0.53,16.53,55.47,80,0.08,41,0.1,6.7,2.2,1.73,3.53,64,0.05,0.71,3.88,1.37,3.8,40,3,26.3,29.7,10.7,35,27.5,60,43,31,21,7.85,63,9.01,67.8,65.8,27,40.1,31.2,22.3,35,21,74,75,12,19,4,20,65,46,9,68,74,57,57)
Id=c(rep("Aa",20),rep("Ga",18),rep("Za",15))
df=data.frame(Id,Weight,Increment)
The scatter plot looks like this:
plot_df <- ggplot(df, aes(x = Weight, y = Increment, color=Id)) + geom_point()
I tested a linear and an exponential regression model and could extract the results following loki's answer there:
linear_df <- df %>% group_by(Id) %>% do(model = glance(lm(Increment ~ Weight,data = .))) %>% unnest(model)
exp_df <- df %>% group_by(Id) %>% do(model = glance(lm(log(Increment) ~ Weight,data = .))) %>% unnest(model)
The linear model fits better for the Ga group, the exponential one for the Aa group, and nothing for the Za one:
> linear_df
# A tibble: 3 x 13
Id r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 Aa 0.656 0.637 15.1 34.4 1.50e- 5 1 -81.6 169. 172. 4106. 18 20
2 Ga 1.00 1.00 0.243 104113. 6.10e-32 1 1.01 3.98 6.65 0.942 16 18
3 Za 0.0471 -0.0262 26.7 0.642 4.37e- 1 1 -69.5 145. 147. 9283. 13 15
> exp_df
# A tibble: 3 x 13
Id r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 Aa 0.999 0.999 0.0624 24757. 1.05e-29 1 28.2 -50.3 -47.4 0.0700 18 20
2 Ga 0.892 0.885 0.219 132. 3.86e- 9 1 2.87 0.264 2.94 0.766 16 18
3 Za 0.00444 -0.0721 0.941 0.0580 8.14e- 1 1 -19.3 44.6 46.7 11.5 13 15
Now, how can I draw the linear regression line for the Aa group, the exponential regression curve for the Ga group, and no curve for the Za group? There is this, but it applies for different regressions built inside the same model type. How can I combine my different objects?

The formula shown below gives the same fitted values as three separate fits, one per Id, so create one lm object for each of the two models and then plot the points and the lines for each. The straight solid lines are the linear model and the curved dashed lines are the exponential model.
library(ggplot2)
library(dplyr) # provides %>%
fm.lin <- lm(Increment ~ Id/Weight + 0, df)
fm.exp <- lm(log(Increment) ~ Id/Weight + 0, df)
df %>%
ggplot(aes(Weight, Increment, color=Id)) +
geom_point() +
geom_line(aes(y = fitted(fm.lin))) +
geom_line(aes(y = exp(fitted(fm.exp))), lty = 2, lwd = 1)
To show only the Aa fitted line for the linear model and the Ga fitted line for the exponential model, set the unwanted portions to NA. In this case both fitted models are drawn with solid lines.
df %>%
ggplot(aes(Weight, Increment, color=Id)) +
geom_point() +
geom_line(aes(y = ifelse(Id == "Aa", fitted(fm.lin), NA))) +
geom_line(aes(y = ifelse(Id == "Ga", exp(fitted(fm.exp)), NA)))
Added
Regarding the questions in the comments, the formula used above nests Weight within Id and effectively uses a model matrix which, modulo column order, is a block diagonal matrix whose blocks are the model matrices of the 3 individual models. Look at this to understand it.
model.matrix(fm.lin)
Since this is a single model rather than three models the summary statistics will be pooled. To get separate summary statistics use lmList from the nlme package (which comes with R so it does not have to be installed -- just issue a library statement). The statements below will give objects of class lmList that can be used in place of the ones above as they have a fitted method that will return the same fitted values.
library(nlme)
fm.lin2 <- lmList(Increment ~ Weight | Id, df, pool = FALSE)
fm.exp2 <- lmList(log(Increment) ~ Weight | Id, df, pool = FALSE)
In addition, they can be used to get individual summary statistics. Internally the lmList objects consist of a list of 3 lm objects with attributes in this case so we can extract the summary statistics by extracting the summary statistics from each component.
library(broom)
sapply(fm.lin2, glance)
sapply(fm.exp2, glance)
One caveat is that common statistical tests between models using different dependent variables, Increment vs. log(Increment), are invalid.
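The claim that the nested formula reproduces the separate per-group fits can be checked directly. Here is a small sketch using the built-in iris data instead of the df from the question (an assumption made to keep it self-contained); fm_nested, fits and fitted_separate are illustrative names.

```r
# One pooled fit with a separate intercept and slope per Species.
fm_nested <- lm(Sepal.Length ~ Species/Sepal.Width + 0, data = iris)

# Three separate fits, one per Species.
fits <- lapply(split(iris, iris$Species),
               function(d) lm(Sepal.Length ~ Sepal.Width, data = d))

# Reassemble the per-group fitted values in the original row order.
fitted_separate <- unsplit(lapply(fits, fitted), iris$Species)

# The two sets of fitted values should agree up to numerical tolerance.
print(all.equal(unname(fitted(fm_nested)), unname(fitted_separate)))

# The nested fit has one intercept and one slope per group.
print(names(coef(fm_nested)))
```

Only the fitted values and coefficients coincide; the residual standard error of the nested fit is pooled across groups, which is the reason for the lmList suggestion above.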

possible solution
Weight=c(12.6,12.6,16.01,17.3,17.7,10.7,17,10.9,15,14,13.8,14.5,17.3,10.3,12.8,14.5,13.5,14.5,17,14.3,14.8,17.5,2.9,21.4,15.8,40.2,27.3,18.3,10.7,0.7,42.5,1.55,46.7,45.3,15.4,25.6,18.6,11.7,28,35,17,21,41,42,18,33,35,19,30,42,23,44,22)
Increment=c(0.55,0.53,16.53,55.47,80,0.08,41,0.1,6.7,2.2,1.73,3.53,64,0.05,0.71,3.88,1.37,3.8,40,3,26.3,29.7,10.7,35,27.5,60,43,31,21,7.85,63,9.01,67.8,65.8,27,40.1,31.2,22.3,35,21,74,75,12,19,4,20,65,46,9,68,74,57,57)
Id=c(rep("Aa",20),rep("Ga",18),rep("Za",15))
df=data.frame(Id,Weight,Increment)
library(tidyverse)
df_model <- df %>%
group_nest(Id) %>%
mutate(
formula = c(
"lm(log(Increment) ~ Weight, data = .x)",
"lm(Increment ~ Weight,data = .x)",
"lm(Increment ~ 0,data = .x)"
),
transform = c("exp(fitted(.x))",
"fitted(.x)",
"fitted(.x)")
) %>%
mutate(model = map2(data, formula, .f = ~ eval(parse(text = .y)))) %>%
mutate(fit = map2(model, transform, ~ eval(parse(text = .y)))) %>%
select(Id, data, fit) %>%
unnest(c(data, fit))
ggplot(df_model) +
geom_point(aes(Weight, Increment, color = Id)) +
geom_line(aes(Weight, fit, color = Id))
Created on 2021-10-06 by the reprex package (v2.0.1)

Get AUC on training data from a fitted workflow in Tidymodels?

I'm struggling with how to obtain the AUC from a logistic regression model using tidymodels.
Here's an example using the built-in mpg dataset.
library(tidymodels)
library(tidyverse)
# Use mpg dataset
df <- mpg
# Create an indicator variable for class="suv"
df$is_suv <- as.factor(df$class == "suv")
# Create the split object
df_split <- initial_split(df, prop=1/2)
# Create the training and testing sets
df_train <- training(df_split)
df_test <- testing(df_split)
# Create workflow
rec <-
recipe(is_suv ~ cty + hwy + cyl, data=df_train)
glm_spec <-
logistic_reg() %>%
set_engine(engine = "glm")
glm_wflow <-
workflow() %>%
add_recipe(rec) %>%
add_model(glm_spec)
# Fit the model
model1 <- fit(glm_wflow, df_train)
# Attach predictions to training dataset
training_results <- bind_cols(df_train, predict(model1, df_train))
# Calculate accuracy
accuracy(training_results, truth = is_suv, estimate = .pred_class)
# Calculate AUC??
roc_auc(training_results, truth = is_suv, estimate = .pred_class)
The last line returns this error:
> roc_auc(training_results, truth = is_suv, estimate = .pred_class)
Error in metric_summarizer(metric_nm = "roc_auc", metric_fn = roc_auc_vec, :
formal argument "estimate" matched by multiple actual arguments

Since you are doing binary classification, roc_auc() is expecting a vector of class probabilities corresponding to the "relevant" class, not the predicted class.
You can get this using predict(model1, df_train, type = "prob"). Alternatively, if you are using workflows version 0.2.2 or newer, you can use augment() to get class predictions and probabilities without using bind_cols().
library(tidymodels)
library(tidyverse)
# Use mpg dataset
df <- mpg
# Create an indicator variable for class="suv"
df$is_suv <- as.factor(df$class == "suv")
# Create the split object
df_split <- initial_split(df, prop=1/2)
# Create the training and testing sets
df_train <- training(df_split)
df_test <- testing(df_split)
# Create workflow
rec <-
recipe(is_suv ~ cty + hwy + cyl, data=df_train)
glm_spec <-
logistic_reg() %>%
set_engine(engine = "glm")
glm_wflow <-
workflow() %>%
add_recipe(rec) %>%
add_model(glm_spec)
# Fit the model
model1 <- fit(glm_wflow, df_train)
# Attach predictions to training dataset
training_results <- augment(model1, df_train)
# Calculate accuracy
accuracy(training_results, truth = is_suv, estimate = .pred_class)
#> # A tibble: 1 x 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 accuracy binary 0.795
# Calculate AUC
# roc_auc() takes the class-probability column through `...`, so it is passed unnamed
roc_auc(training_results, truth = is_suv, .pred_FALSE)
#> # A tibble: 1 x 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 roc_auc binary 0.879
Created on 2021-04-12 by the reprex package (v1.0.0)
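To see why the probability of the "relevant" class matters, here is a small base R sketch that computes AUC from ranks (the Mann-Whitney formulation) without yardstick. It is only an illustration: auc_manual is a hypothetical helper, and the mtcars model is a stand-in for the workflow above.

```r
# Hypothetical helper: AUC via the rank (Mann-Whitney) formulation.
# `score` is the predicted probability of the event; `is_event` marks true events.
auc_manual <- function(score, is_event) {
  n1 <- sum(is_event)
  n0 <- sum(!is_event)
  (sum(rank(score)[is_event]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Stand-in logistic regression on a built-in dataset.
dat <- transform(mtcars, cyl_8 = as.numeric(cyl == 8))
mod <- glm(cyl_8 ~ mpg, data = dat, family = "binomial")

p_event  <- predict(mod, type = "response")  # P(cyl_8 == 1)
is_event <- dat$cyl_8 == 1

auc_event <- auc_manual(p_event, is_event)      # probability of the event class
auc_other <- auc_manual(1 - p_event, is_event)  # probability of the other class

print(c(auc_event = auc_event, auc_other = auc_other))
```

Feeding in the probability of the other class gives one minus the AUC. As the yardstick startup message quoted earlier on this page says, the first factor level is assumed to be the event, which is why .pred_FALSE is the column used in the answer above.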

R loop for linear regression lm(y~x) and save model output as a dataset

I would like to make a regression loop lm(y~x) with a dataset with one y and several x, run the regression for each x, and then store the results (estimates, p-values) in a data.frame() so I don't have to copy them manually (especially as my real data set is much bigger).
I think this should not be too difficult, but I struggle a lot to make it work and appreciate your help:
Here is my sample data set:
sample_data <- data.frame(
fit = c(0.8971963, 1.4205607, 1.4953271, 0.8971963, 1.1588785, 0.1869159, 1.1588785, 1.142857143, 0.523809524),
Xbeta = c(2.8907744, -0.7680777, -0.7278847, -0.06293916, -0.04047017, 2.3755812, 1.3043990, -0.5698354, -0.5698354),
Xgamma = c( 0.1180758, -0.6275700, 0.3731964, -0.2353454,-0.5761923, -0.5186803, 0.43041835, 3.9111749, -0.5030638),
Xalpha = c(0.2643091, 1.6663923, 0.4041057, -0.2100472, -0.2100472, 7.4874195, -0.2385278, 0.3183102, -0.2385278),
Xdelta = c(0.1498646, -0.6325119, -0.5947564, -0.2530748, 3.8413339, 0.6839322, 0.7401834, 3.8966404, 1.2028175)
)
#yname <- ("fit")
#xnames <- c("Xbeta ","Xgamma", "Xalpha", "Xdelta")
The simple regression with the first independent variable Xbeta would look like this: lm(fit ~ Xbeta, data = sample_data). I would like to run the regression for each variable starting with an "X" and then store the result (estimate, p-value).
I have found some code that lets me select the variables that start with "X" and use them in the model, but it gives me an error from mutate() onwards (the part commented out with #).
library(tidyverse)
library(tsibble)
sample_data %>%
gather(stock, return, starts_with("X")) %>%
group_nest(stock)
# %>%
# mutate(model = map(data,
# ~lm(formula = "fit~ return",
# data = .x))
# ),
# resid = map(model, residuals)
# ) %>%
# unnest(c(data,resid)) %>%
# summarise(sd_residual = sd(resid))
For storing the regression results I have also found the following approach using the R package "broom": r for loop for regression lm(y~x)
sample_data%>%
group_by(y,x)%>% # get combinations of y and x to regress
do(tidy(lm(fRS_relative~xvalue, data=.)))
But I always get an error for group_by() and do()
I really appreciate your help!

One option is to use lapply() to run a regression with each of the independent variables, and tidy() from the broom package to store the results in a tidy format.
# names of all columns starting with "X" (xnames is commented out in the question)
xnames <- grep("^X", names(sample_data), value = TRUE)
test <- lapply(xnames,
function(x) broom::tidy(lm(as.formula(paste0('fit ~ ', x)), data = sample_data)))
and then combine all the results into a single dataframe:
do.call('rbind', test)
# term estimate std.error statistic p.value
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 (Intercept) 1.05 0.133 7.89 0.0000995
# 2 Xbeta -0.156 0.0958 -1.62 0.148
# 3 (Intercept) 0.968 0.147 6.57 0.000313
# 4 Xgamma 0.0712 0.107 0.662 0.529
# 5 (Intercept) 1.09 0.131 8.34 0.0000697
# 6 Xalpha -0.0999 0.0508 -1.96 0.0902
# 7 (Intercept) 0.998 0.175 5.72 0.000723
# 8 Xdelta -0.0114 0.0909 -0.125 0.904
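If you only need the slope estimate and its p-value per predictor, the same loop can be written in base R. This is a sketch under assumptions: it uses mtcars rather than sample_data so that it runs on its own, and the predictors vector and the column names of results are made up for illustration.

```r
# Predictors to regress the response on, one at a time.
predictors <- c("wt", "hp", "disp")

results <- do.call(rbind, lapply(predictors, function(x) {
  # reformulate() builds the formula mpg ~ <x> from character strings.
  fit <- lm(reformulate(x, response = "mpg"), data = mtcars)
  coefs <- summary(fit)$coefficients
  # Keep only the slope row, labelled with the predictor name.
  data.frame(predictor = x,
             estimate  = coefs[x, "Estimate"],
             p_value   = coefs[x, "Pr(>|t|)"])
}))

print(results)
```

Each row of results then holds one regression, so there is no need to pick the slope rows out of a stacked table afterwards.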

Step one
Your data is messy, let us tidy it up.
sample_data <- data.frame(
fit = c(0.8971963, 1.4205607, 1.4953271, 0.8971963, 1.1588785, 0.1869159, 1.1588785, 1.142857143, 0.523809524),
Xbeta = c(2.8907744, -0.7680777, -0.7278847, -0.06293916, -0.04047017, 2.3755812, 1.3043990, -0.5698354, -0.5698354),
Xgamma = c( 0.1180758, -0.6275700, 0.3731964, -0.2353454,-0.5761923, -0.5186803, 0.43041835, 3.9111749, -0.5030638),
Xalpha = c(0.2643091, 1.6663923, 0.4041057, -0.2100472, -0.2100472, 7.4874195, -0.2385278, 0.3183102, -0.2385278),
Xdelta = c(0.1498646, -0.6325119, -0.5947564, -0.2530748, 3.8413339, 0.6839322, 0.7401834, 3.8966404, 1.2028175)
)
tidyframe = data.frame(fit = sample_data$fit,
X = c(sample_data$Xbeta,sample_data$Xgamma,sample_data$Xalpha,sample_data$Xdelta),
type = c(rep("beta",9),rep("gamma",9),rep("alpha",9),rep("delta",9)))
Created on 2020-07-13 by the reprex package (v0.3.0)
Step two
Iterate over each type, and get the P-value, using this nifty function
# From https://stackoverflow.com/a/5587781/3212698
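lmp <- function (modelobject) {
# inherits() also handles objects with more than one class
if (!inherits(modelobject, "lm")) stop("Not an object of class 'lm' ")
f <- summary(modelobject)$fstatistic
# upper-tail probability of the overall F statistic
p <- pf(f[1], f[2], f[3], lower.tail = FALSE)
attributes(p) <- NULL
return(p)
}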
Then do some clever piping
tidyframe %>% group_by(type) %>%
summarise(type = type, p = lmp(lm(formula = fit ~ X))) %>%
unique()
#> `summarise()` regrouping output by 'type' (override with `.groups` argument)
#> # A tibble: 4 x 2
#> # Groups: type [4]
#> type p
#> <fct> <dbl>
#> 1 alpha 0.0902
#> 2 beta 0.148
#> 3 delta 0.904
#> 4 gamma 0.529
Created on 2020-07-13 by the reprex package (v0.3.0)

object '...' not found in R Functions with lm -->> (Error in eval(predvars, data, env) : object '...' not found)

I'm using the moderndive package to calculate a linear regression, but inside a function. I am trying to create a function where I can just pass in two selected columns (e.g. deaths and cases, the titles of the columns) from my data frame (Rona_2020). Below is the function...
score_model_Fxn <- function(y, x){
score_mod <- lm(y ~ x, data = Rona_2020)
Reg_Table <- get_regression_table(score_mod)
print(paste('The regression table is', Reg_Table))
}
when I run the function ...
score_model_Fxn(deaths, cases)
I get ...
Error in eval(predvars, data, env) : object 'deaths' not found
What should I do? I have looked at several similar issues, but to no avail.

What you want to do by passing deaths and cases is called non-standard evaluation. You need to combine this with computing on the language if you want to run a model with the correct formula and scoping. Computing on the language can be done with substitute and bquote.
library(moderndive)
score_model_Fxn <- function(y, x, data){
#get the symbols passed as arguments:
data <- substitute(data)
y <- substitute(y)
x <- substitute(x)
#substitute them into the lm call and evaluate the call:
score_mod <- eval(bquote(lm(.(y) ~ .(x), data = .(data))))
Reg_Table <- get_regression_table(score_mod)
message('The regression table is') #better than your paste solution
print(Reg_Table)
invisible(score_mod) #a function should always return something useful
}
mod <- score_model_Fxn(Sepal.Length, Sepal.Width, iris)
#The regression table is
## A tibble: 2 x 7
# term estimate std_error statistic p_value lower_ci upper_ci
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 intercept 6.53 0.479 13.6 0 5.58 7.47
#2 Sepal.Width -0.223 0.155 -1.44 0.152 -0.53 0.083
print(mod)
#
#Call:
#lm(formula = Sepal.Length ~ Sepal.Width, data = iris)
#
#Coefficients:
#(Intercept) Sepal.Width
# 6.5262 -0.2234
You could have the function return Reg_Table instead if you prefer.
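If non-standard evaluation is more than you need, a simpler variant is to pass the column names as character strings and build the formula with reformulate(). This is a sketch only: score_model_str is a hypothetical name, and it leaves out the moderndive table so that it runs with base R alone.

```r
# Hypothetical alternative: take the column names as character strings.
score_model_str <- function(y, x, data) {
  # reformulate() turns the strings into the formula y ~ x.
  lm(reformulate(x, response = y), data = data)
}

mod_str <- score_model_str("Sepal.Length", "Sepal.Width", iris)
print(coef(mod_str))
```

The trade-off is that the stored call shows reformulate(x, response = y) instead of the actual formula, which is what the substitute/bquote approach above avoids.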

One of the coolest ways of doing this is to use the new recipes package to generate the formula for us and then manipulate a tibble to produce our result.
library(tidyverse)
library(recipes)
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stringr':
#>
#> fixed
#> The following object is masked from 'package:stats':
#>
#> step
library(moderndive)
score_model_Fxn <- function(df,x, y){
formula_1 <- df %>%
recipe() %>%
update_role({{x}},new_role = "outcome") %>%
update_role({{y}},new_role = "predictor") %>%
formula()
Reg_Table <- df %>% # use the data passed in rather than hard-coding mtcars
summarise(score_mod = list(lm(formula_1,data = .))) %>%
rowwise() %>%
mutate(Reg_Table = list(get_regression_table(score_mod))) %>%
pull(Reg_Table)
print(paste('The regression table is', Reg_Table))
Reg_Table
}
k <- mtcars %>%
score_model_Fxn(x = cyl,y = gear)
#> [1] "The regression table is list(term = c(\"intercept\", \"gear\"), estimate = c(10.585, -1.193), std_error = c(1.445, 0.385), statistic = c(7.324, -3.101), p_value = c(0, 0.004), lower_ci = c(7.633, -1.978), upper_ci = c(13.537, -0.407))"
k
#> [[1]]
#> # A tibble: 2 x 7
#> term estimate std_error statistic p_value lower_ci upper_ci
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 intercept 10.6 1.44 7.32 0 7.63 13.5
#> 2 gear -1.19 0.385 -3.10 0.004 -1.98 -0.407
Created on 2020-06-09 by the reprex package (v0.3.0)

For those that might be interested...I modified Bruno's answer.
library(tidyverse); library(recipes); library(moderndive)
score_model_Fxn2 <- function(df,x, y){
formula_1 <- df %>%
recipe() %>%
update_role({{y}},new_role = "outcome") %>%
update_role({{x}},new_role = "predictor") %>%
formula()
Reg_Table <- df %>%
summarise(score_mod = list(lm(formula_1,data = .))) %>%
rowwise() %>%
mutate(Reg_Table = list(get_regression_table(score_mod))) %>%
pull(Reg_Table)
print(Reg_Table)
}
# example call (the function needs its arguments): regress cyl on gear
score_model_Fxn2(mtcars, x = gear, y = cyl)
