Object '.' not found while piping with dplyr - r

I am trying to produce a survival curve using the survival package. The MWE code is as follows:
df %>%
  filter(fac <= "Limit") %>%
  survfit(Surv(tte, !is.na(event)) ~ fac, data = .) %>%
  ggsurvplot(fit = .)
I get the error:
Error in eval(fit$call$data) : object '.' not found
When I try to break this down further with:
survfit <- df %>%
  filter(fac <= "Limit") %>%
  survfit(Surv(tte, !is.na(event)) ~ fac, data = .)
ggsurvplot(fit = survfit)
I get an identical error. How can I pipe from my data frame all the way through to a survival curve? I would like to streamline the filtering of my data frame so that I can produce many different survival curves without creating lots of subsetted data frames.

Apparently, ggsurvplot expects an object of class "survfit" as its first argument but also needs the data set as an argument: it re-evaluates the data stored in the survfit call (fit$call$data), and inside a pipe that data is just ., which no longer exists when ggsurvplot tries to evaluate it.
The example below is based on the first example of survfit.formula in the survival package.
library(dplyr)
library(survival)
library(survminer)
aml %>%
  survfit(Surv(time, status) ~ x, data = .) %>%
  ggsurvplot(data = aml)
In the question's case this would become
df %>%
  filter(fac <= "Limit") %>%
  survfit(Surv(tte, !is.na(event)) ~ fac, data = .) %>%
  ggsurvplot(data = filter(df, fac <= "Limit"))
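To avoid repeating the filter, one option (a sketch based only on the column names in the question: fac, tte, event) is a small helper that filters once and reuses the result for both survfit() and ggsurvplot():
plot_km <- function(data, limit) {
  d <- filter(data, fac <= limit)    # filter once and keep the result
  fit <- survfit(Surv(tte, !is.na(event)) ~ fac, data = d)
  ggsurvplot(fit, data = d)          # hand the same data to ggsurvplot explicitly
}
plot_km(df, "Limit")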

Related

Add multiple model statistics with add_glance_table() horizontally (and not vertically) in tbl_regression

I used tbl_regression() and add_glance_table() from gtsummary to build a table with model statistics:
library(gtsummary)
library(survival)
coxph(Surv(time, event) ~ score, data = dat) %>%
  tbl_regression(exponentiate = TRUE) %>%
  add_glance_table(concordance)
1st question: How can I move the model statistic horizontally, to the right?
Because, in the end, I want to display multiple model statistics, with the C-index in the last column, like this:
tbl_uvregression(
  dat_score,
  method = survival::coxph,
  y = Surv(time, event),
  exponentiate = TRUE)
2nd question: How do I add add_glance_table() to tbl_uvregression()?
You can merge any additional columns/statistics into a gtsummary table using the modify_table_body() function (the table_body is an internal data frame that is styled and printed as the summary table).
It's possible to add the c-index in a tbl_uvregression() setting, but I think it requires a deeper understanding of the internals of a tbl_uvregression() object. In the example below, I estimate each univariable model separately, summarize the model with tbl_regression(), merge in the c-index, then stack all the tbls with tbl_stack().
Happy Programming!
library(gtsummary)
library(tidyverse)
library(survival)
packageVersion("gtsummary")
#> [1] '1.5.2'
covariates <- c("age", "marker")
# iterate over the covariates
tbl <-
  covariates %>%
  map(
    function(varname) {
      # build regression model
      mod <-
        str_glue("Surv(ttdeath, death) ~ {varname}") %>%
        as.formula() %>%
        coxph(data = trial)
      # calculate and format c-index. adding variable column to merge in the next step
      df_cindex <-
        broom::glance(mod) %>%
        select(concordance) %>%
        mutate(
          concordance = style_sigfig(concordance, digits = 3),
          variable = varname
        )
      # summarize model
      tbl_regression(mod, exponentiate = TRUE) %>%
        # merge in the c-index
        modify_table_body(~left_join(.x, df_cindex, by = "variable")) %>%
        modify_header(concordance = "**c-index**") # assigning a header label unhides the column
    }
  ) %>%
  # stack all tbls
  tbl_stack()
Created on 2022-04-09 by the reprex package (v2.0.1)
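For the 1st question on its own (a single model), the same idea applies without the loop; this is just a sketch derived from the code above, using the question's dat and score objects:
mod <- coxph(Surv(time, event) ~ score, data = dat)
df_cindex <-
  broom::glance(mod) %>%
  select(concordance) %>%
  mutate(
    concordance = style_sigfig(concordance, digits = 3),
    variable = "score"
  )
tbl_regression(mod, exponentiate = TRUE) %>%
  modify_table_body(~left_join(.x, df_cindex, by = "variable")) %>%
  modify_header(concordance = "**c-index**")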

Grouped linear regression prediction on a different dataset by group in R

I'm trying to build models based on specific groups in a dataset and use the generated models to predict on a different dataset, following the same group restrictions. In other words, using the example below, the model built using the subset cyl == 4 of the original data should be used only to predict the subset cyl == 4 of the new dataset (data1). Can anyone help with this?
I tried to use data1 %>% group_by(cyl) to specify the new data, but that didn't help.
Thank you
library(broom)
library(dplyr)
library(purrr)
data1 <- head(mtcars, 20)
x <- mtcars %>%
  group_by(cyl) %>%
  summarise(fit = list(lm(wt ~ mpg)),
            data = list(cur_data())) %>%
  mutate(col = map(fit, augment, newdata = data1 %>% group_by(cyl)))
Here is a quick way to do this
library(dplyr)
models = mtcars %>% group_by(cyl) %>% do(model = lm(wt ~ mpg, data = .))
Then access the individual models with
library(broom)
tidy(models$model[[1]])
Another way to do the same:
models <- mtcars %>%
  nest_by(cyl) %>%
  mutate(mod = list(lm(mpg ~ disp, data = data)))
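For the prediction part of the question (applying each group's model only to the matching cyl group of data1), one possible sketch, not taken from the answers above, is to nest both data sets by cyl, join the matching groups, and predict with broom::augment():
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
data1 <- head(mtcars, 20)
# fit one model per cyl group of the original data
models <- mtcars %>%
  nest(train = -cyl) %>%
  mutate(fit = map(train, ~lm(wt ~ mpg, data = .x)))
# predict each model only on the matching cyl group of data1
preds <- data1 %>%
  nest(new = -cyl) %>%
  inner_join(models, by = "cyl") %>%
  mutate(pred = map2(fit, new, ~augment(.x, newdata = .y))) %>%
  select(cyl, pred) %>%
  unnest(pred)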

What does se.fit represent exactly? How can we compute it manually?

Given the following code in R, why do we get an array of standard errors instead of a single standard error?
library(dplyr)
library(HistData)
data("GaltonFamilies")
set.seed(1983)
galton_heights <- GaltonFamilies %>%
  filter(gender == "male") %>%
  group_by(family) %>%
  sample_n(1) %>%
  ungroup() %>%
  select(father, childHeight) %>%
  rename(son = childHeight)
Now consider this:
fit <- galton_heights %>% lm(son ~ father, data = .)
Y_hat <- predict(fit, se.fit = TRUE)
The predicted values can be extracted with:
Y_hat$fit
which gives us the predicted heights of the sons for the fathers in galton_heights.
However, I do not understand why we get an array of standard errors when we run this code:
Y_hat$se.fit
What do these values refer to, and how could they be calculated manually?
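As an illustration only (a sketch, not part of the original post): se.fit holds one standard error per fitted value, i.e. the standard error of the estimated mean son's height at each father's height, and under the usual lm assumptions it can be reproduced from the design matrix and the coefficient covariance matrix:
X <- model.matrix(fit)                              # one row per father in galton_heights
se_manual <- sqrt(diag(X %*% vcov(fit) %*% t(X)))   # sqrt of Var(x_i' beta_hat) for each row
all.equal(unname(Y_hat$se.fit), unname(se_manual))  # should agree up to numerical precision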

Run multiple models and save model comparison results in a dataframe in R

I want to run lm models, save the model comparison results, and extract the p-values. I would like to save all the info in a dataframe.
Using diamonds dataset as an example:
diamonds %>%
  group_by(cut) %>%
  do(model1 = lm(price ~ carat, data = .),
     model2 = lm(price ~ carat + depth, data = .)) %>%
  mutate(anova = anova(model2, model1)) %>%
  mutate(pval = anova$'Pr(>F)'[2])
I got the error message below:
Error in mutate_impl(.data, dots) :
Column `anova` must be length 1 (the group size), not 6
My questions are:
1. Why did I get the error message, and how can I save the anova results in the dataframe?
2. How can I make the whole process work if lm or anova fails on some subsets? Something like try/catch.
My real data is more complicated than this; I'm just using diamonds and a linear model to illustrate the idea.
Thanks a lot.
This is a really good application of the tidyr::nest() function in conjunction with purrr and broom. What you do is:
- Group the data frame
- Apply a model with mutate(mod = map(data, model))
- Summarize the model using broom::tidy()
- Extract the relevant statistics.
For more on this here's a great talk by Hadley on the subject: https://www.youtube.com/watch?v=rz3_FDVt9eg
In your case I think you can do something like this:
library(tidyverse)
library(broom)
diamonds %>%
  group_by(cut) %>%
  nest() %>%
  mutate(
    model1 = map(data, ~lm(price ~ carat, data = .)),
    model2 = map(data, ~lm(price ~ carat + depth, data = .))
  ) %>%
  mutate(anova = map2(model1, model2, ~anova(.x, .y))) %>%
  mutate(tidy_anova = map(anova, broom::tidy)) %>%
  mutate(p_val = map_dbl(tidy_anova, ~.$p.value[2])) %>%
  select(p_val)
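For the second question (subsets where lm() or anova() might fail), one option, sketched here rather than part of the original answer, is to wrap the calls with purrr::possibly() so a failing fit returns NULL instead of stopping the whole pipeline:
safe_lm <- possibly(lm, otherwise = NULL)
safe_anova <- possibly(anova, otherwise = NULL)
diamonds %>%
  group_by(cut) %>%
  nest() %>%
  mutate(
    model1 = map(data, ~safe_lm(price ~ carat, data = .)),
    model2 = map(data, ~safe_lm(price ~ carat + depth, data = .)),
    anova = map2(model1, model2, ~if (!is.null(.x) && !is.null(.y)) safe_anova(.x, .y)),
    p_val = map_dbl(anova, ~if (is.null(.x)) NA_real_ else broom::tidy(.x)$p.value[2])
  )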

Averaging R objects

Let's say that I have several R objects, e.g. lm outputs:
m1 <- lm(x ~ y, data = data, subset = sample==1)
m2 <- lm(x ~ y, data = data, subset = sample==2)
m3 <- lm(x ~ y, data = data, subset = sample==3)
m4 <- lm(x ~ y, data = data, subset = sample==4)
and now I want to average those objects, i.e. I want to average all estimates produced by lm. I would be very happy if I could get summary statistics of all the parameters in the objects, i.e. the average intercept etc. What simplifies the problem is that all the objects are roughly the same, just calculated on different samples.
Is there any way to do this in a general fashion, that is, using a single general function rather than taking all the individual values and averaging them one at a time? Also, I would need this kind of function for different kinds of objects.
Probably lapply could be used in some way, but how do I deal with multiple (varying) layers of nesting?
This should work (example using the mtcars dataset):
library(dplyr)
meanpars <- mtcars %>%
  group_by(cyl) %>%
  do(mod = lm(mpg ~ wt, data = .)) %>%
  summarise(
    intercepts = coef(mod)[1],
    wtbeta = coef(mod)[2]) %>%
  summarise(
    meaninter = mean(intercepts),
    meanbeta = mean(wtbeta))
Here it is with your toy data plugged in:
library(dplyr)
meanpars <- data %>%
  group_by(sample) %>%
  do(mod = lm(x ~ y, data = .)) %>%
  summarise(
    intercepts = coef(mod)[1],
    ybeta = coef(mod)[2]) %>%
  summarise(
    meaninter = mean(intercepts),
    meanbeta = mean(ybeta))
Edit: If you don't want to average the coefficients in the end, just remove the last summarise function and you'll still get a data.frame with the results from your models.
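For the more general case in the question (several already fitted objects m1 … m4), a minimal base-R sketch, assuming the models share the same coefficient names, is to collect them in a list, extract the coefficients, and average column-wise:
mods <- list(m1, m2, m3, m4)
coef_mat <- do.call(rbind, lapply(mods, coef))   # one row of coefficients per model
colMeans(coef_mat)                               # average intercept and slope across samples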
