I am trying to use quosures to pass along variable names within a custom function for data processing and use in a formula, but my use of quosures in the formula is not correct. Is there a better way to unquote arguments within a formula?
library(dplyr)
library(broom)
library(purrr)
library(tidyr)
foo <- function(mydata, dv, iv, group_var) {
dv = enquo(dv)
iv = enquo(iv)
group_var = enquo(group_var)
mydata <- mydata %>%
group_by(!!group_var) %>%
nest()
mydata %>%
mutate(model = map(data,
~summary(lm(formula(substitute(dv ~ iv)), data = .))
)) %>%
unnest(model %>% map(tidy))
}
foo(mydata=mtcars, dv=mpg, iv=wt, group_var=cyl)
My code produces "Error in mutate_impl(.data, dots) : Evaluation error: object is not a matrix."
This is a working version of code I am trying to make into a function:
mtcars %>%
group_by(cyl) %>%
nest() %>%
mutate(model = map(data, ~summary(lm(mpg ~ wt, data = .)))) %>%
unnest(model %>% map(tidy))
You need to use base R nonstandard evaluation with functions like lm which are not "in the tidyverse" so to speak.
So you could change things to:
foo <- function(mydata, dv, iv, group_var) {
flma <- as.formula(paste(substitute(dv), "~", substitute(iv)))
group_var = enquo(group_var)
mydata <- mydata %>%
group_by(!!group_var) %>%
nest()
mydata %>%
mutate(model = map(data, ~summary(lm(flma, data = .)))) %>%
unnest(model %>% map(tidy))
}
foo(mtcars, mpg, wt, cyl)
That's fine if you know you are only doing simple regression. For more flexibility just pass the formula directly, as in:
foo2 <- function(mydata, flma, group_var) {
group_var = enquo(group_var)
mydata <- mydata %>%
group_by(!!group_var) %>%
nest()
mydata %>%
mutate(model = map(data, ~summary(lm(flma, data = .)))) %>%
unnest(model %>% map(tidy))
}
foo(mtcars, mpg ~ wt, cyl)
Related
I have a list of linear regression fit objects. Let's create it in this example by:
c('hp', 'wt', 'disp') %>%
paste('mpg ~', .) %>%
map(as.formula) %>%
map(lm, data = mtcars)
What I would like is to keep just the residuals and fitted.values from each of the regression fit objects, within this same pipeline. I was trying to use the keep function, but it doesn't work:
c('hp', 'wt', 'disp') %>%
paste('mpg ~', .) %>%
map(as.formula) %>%
map(lm, data = mtcars) %>%
map(keep, names(.) %in% c("residuals", "fitted.values"))
Error:
Error in probe(.x, .p, ...) : length(.p) == length(.x) is not TRUE
How can I perform this action?
If a data frame is wanted as output then use the code below or if a list is wanted omit the bind_rows line.
library(dplyr)
library(purrr)
nms <- c('hp', 'wt', 'disp')
out <- nms %>%
set_names(x = map(paste('mpg ~', .), as.formula)) %>%
map(lm, data = mtcars) %>%
map(~ data.frame(fit = fitted(.), resid = resid(.))) %>%
bind_rows(.id = "id")
We can simplify this slightly using sapply as it will add names and use reformulate to generate the formula.
out <- nms %>%
sapply(reformulate, response = "mpg") %>%
map(lm, data = mtcars) %>%
map(~ data.frame(fit = fitted(.), resid = resid(.))) %>%
bind_rows(.id = "id")
Say I have some data that looks like this:
N <- 200
X <- sample(letters[1:5],N, replace = T)
Y <- rnorm(N)
W <- abs(rnorm(N))
my_data <- tibble(X, Y, W)
I want to run an intercept-only regression on each subset of my data defined by X. To do so, I use nest(), mutate() and map() like so:
my_data %>%
group_by(X) %>%
nest() %>%
mutate(fit = map(data, ~lm(Y ~ 1, data = .x)))
While this code works, when I try to incorporate regression weights, like this:
my_data %>%
group_by(X) %>%
nest() %>%
mutate(fit = map(data, ~lm(Y ~ 1, data = .x, weights = W)))
I get the following error:
Error: Problem with `mutate()` input `fit`.
x missing or negative weights not allowed
ℹ Input `fit` is `map(data, ~lm(Y ~ 1, data = .x, weights = W))`.
ℹ The error occurred in group 1: X = "a".
Run `rlang::last_error()` to see where the error occurred.
Where am I going wrong?
(Disclaimer: I am new to the tidyverse so am likely doing something dumb)
Write it this way:
my_data %>%
group_by(X) %>%
nest() %>%
mutate(fit = map(data, ~lm(Y ~ 1, data = .x, weights = .x$W)))
The point is that at that point W is not an existing column anymore.
It exists only inside data. With map you are looping over data which at that point is a list of dataframes.
Therefore to call W, you need to call it through $
Also, define W as:
W <- abs(rnorm(N))
Because you can't have negative weights.
Alternatively, you can do it as follow:
my_data %>%
group_by(X) %>%
summarise(fit = list(lm(Y ~ 1, weights = W)))
It will give you the same result. [almost: because X will be sorted]
Check this out:
fit1 <- my_data %>%
arrange(X) %>%
group_by(X) %>%
nest() %>%
mutate(fit = map(data, ~lm(Y ~ 1, data = .x, weights = .x$W))) %>%
pull(fit)
fit2 <- my_data %>%
group_by(X) %>%
summarise(fit = list(lm(Y ~ 1, weights = W))) %>%
pull(fit)
identical(map(fit1, coef), map(fit2, coef))
#> TRUE
If you just need the coefficients, you can do it this way:
my_data %>%
group_by(X) %>%
summarise(fit = coef(lm(Y ~ 1, weights = W)))
I am performing simple linear regressions on multiple groups from my data set. However, I want extract the summaries from each of these regressions and put them into a master table by organized by group. I can run it like this (and it works):
fit_basic <- rs2_anova %>% #Run multiple simple linear regressions
group_by(quant_method) %>%
nest() %>%
mutate(model = map(data, ~lm(recoveries ~ treatment, data = .)))
fit_basic_A <- fit_basic[[1,"model"]] #Remove the model from fit_basic
fit_basic_B <- fit_basic[[1,"model"]] #Remove the model from fit_basic
fit_basic_table_A <- get_regression_table(fit_basic_A) %>%
select("term", "estimate") %>%
pivot_wider(names_from = "term", values_from = "estimate") %>%
mutate(quant_method = "A")
fit_basic_table_B <- get_regression_table(fit_basic_A) %>%
select("term", "estimate") %>%
pivot_wider(names_from = "term", values_from = "estimate") %>%
mutate(quant_method = "B")
fit_basic_table <- rbind(fit_basic_table_A, fit_basic_table_B)
To save myself some lines of code (because I have many more groups than presented here) I thought I could use the map function, but I keep getting stuck at mapping the summary table, which throws an error:
fit_basic <- rs2_anova %>%
group_by(quant_method) %>%
nest() %>%
mutate(model = map(data, ~lm(recoveries ~ treatment, data = .))) %>%
mutate(summaries = map(data, get_regression_table(.$model)))
Error in input_checks(model, digits, print) :
Only simple linear regression models are supported. Try again using only `lm()` models as appropriate.
I also tried something along this line:
fit_basic_table <- map(fit_basic$model,
function(x) {
p <- get_regression_table(x)
cbind(par=rownames(p), p)
})
But I get a list of dataframes that I can't breakdown into a single dataframe and I have lost my group designations. I have tried:
fit_basic_table <- map(fit_basic$model,
function(x) {
p <- get_regression_table(x)
cbind(par=rownames(p), p)
}) %>%
map_df(as_tibble, .id = "id")
and
fit_basic_table <- map(fit_basic$model,
function(x) {
p <- get_regression_table(x)
cbind(par=rownames(p), p)
}) %>%
unnest(cols = "id")
Any thoughts on how to automate this?
*Random test dataframe:
quant_method <- c("A", "A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B")
treatment <- c("x","x","x","x","x","y","y","y","y","y","x","x","x","x","x","y","y","y","y","y")
recoveries <-c("88","86","87","82","85","76","65","55","72","71","98","96","97","92","99","66",
"55","55","62","61")
rs2_anova <- data.frame(quant_method, treatment, recoveries)
Here is one solution using tidyverse and broom packages. It is slightly different from the purrr method you were attempting, but I think the result shows the terms that you were interested in extracting from the lm object (i.e., term and estimate).
library(tidyverse)
library(broom)
#Added the stringsAsFactors argument = F to avoid an error in the lm model
rs2_anova <- data.frame(quant_method,
treatment,
recoveries,
stringsAsFactors = F)
fit_basic <- rs2_anova %>%
#Group by quant_method column
group_by(quant_method) %>%
#do the linear models by grouping var
do(model = lm(recoveries ~ treatment, data = .)) %>%
#tidy lm object and order it as tibble
tidy(model)
I found the answer here: unnest a list column after modeling after group_by in r
and modified it to:
fit_cec <- rs2_anova %>%
group_by(quant_method) %>%
nest %>%
mutate(data = map(data, ~ .x %>%
summarise(model = list(broom::tidy(lm(recoveries ~ treatment)))))) %>%
unnest(data) %>%
unnest(model)
or to get all estimates, predictions and summaries (https://drsimonj.svbtle.com/running-a-model-on-separate-groups). This also works nicely:
fit_cec <- rs2_anova %>%
group_by(quant_method) %>%
nest %>%
mutate(fit = map(data, ~ lm(loss_abs_BC_2 ~ cec, data = .)),
parameters = map(fit, tidy), #provides estimate table for slope and y-intercept with std.error and estimate p-values
summary = map(fit, broom::glance), #provides R2, adj.R2, sigma, and model p-value
predictions = map(fit, augment)) %>% #provides fitted values with residuals and errors
unnest(parameters) %>%
pivot_wider(names_from = term, values_from = c(estimate, std.error, statistic, p.value)) %>%
unnest(summary) %>%
unnest(predictions)
I am trying to save a series of models using purrr's walk() functions, and am getting the following error:
"Error in map2(.x, .y, .f, ...) : object 'model' not found"
library(dplyr)
library(tidyr)
library(purrr)
mt_models <-
mtcars %>%
group_by(cyl) %>%
nest() %>%
mutate(
model = map(.x = data, .f = ~lm(mpg ~ wt, data = .x)),
file_name = paste("model", cyl, "cyl.rda", sep = "_")
)
mt_models %>% walk2(.x = model, .y = file_name, .f = ~save(.x, file = .y))
I can successfully save the models using this code below:
walk2(.x = mt_models$model, .y = mt_models$file_name, .f = ~save(.x, file = .y))
But I am trying to understand why model is not passing into walk2() in the first example.
You can use with to provide an environment in which to search for variables
mt_models %>%
with(walk2(.x = model, .y = file_name, .f = ~save(.x, file = .y)))
Outside the mutate/summarise and other tidyverse function, we need to do .$ to extract the column
library(dplyr)
library(purrr)
mt_models %>% {
walk2(.x = .$model, .y = .$file_name, .f = ~save(.x, file = .y))
}
I'm trying to take this code and turn it into a function:
mtcars %>% group_by(gear) %>% select(hp, disp) %>%
summarise_all(funs(n=sum(!is.na(.)), mean=mean(.,na.rm=T))) %>%
gather(variable, value, -gear) %>%
arrange(gear, sub('_.*', '', variable), sub('.*_', '', variable)) %>%
separate(variable, into = c('var', 'metric'), '_')
What it does is create a dataframe that I can easily import into Excel and create pivots with. I've tried the following, but it doesn't seem to work:
mean_func <- function(vars,groups) {
results <- test %>% group_by_at(vars(one_of(groups))) %>%
summarise_at(vars(starts_with(vars)), funs(n=sum(!is.na(.)), mean=mean(.,na.rm=T))) %>%
gather_(variable, value, -groups) %>%
arrange_(groups) %>%
separate_(variable, into = c('var', 'metric'), '_'); View(results)
}
Seems like the problem is somewhere along the gather statement, but I'm not sure what could be wrong here. Any thoughts?
Use a bit of tidy cleanup to change the grouping variable and use group_by_ for programming.
library(tidyverse)
mean_func <- function(vars, groups) {
groups = enquo(groups)
vars %>%
group_by_(groups) %>%
dplyr::select(hp, disp) %>%
summarise_all(funs(n=sum(!is.na(.)), mean=mean(.,na.rm=T))) %>%
gather(variable, value, -!!groups) %>%
arrange(!!groups, sub('_.*', '', variable), sub('.*_', '', variable)) %>%
separate(variable, into = c('var', 'metric'), '_')
}
mean_func(mtcars, gear)