“non-numeric argument to binary operator” when using ggsurvplot() + geom_dl() - r

I would like to attach labels using geom_dl to a Kaplan–Meier (KM) plot produced by ggsurvplot. It works fine if I use the plot part of ggsurvplot.
library(survival)
library(survminer)
library(directlabels)
fit <- survfit(Surv(time, status) ~ disease, data = kidney)
surv_km <- ggsurvplot(fit, risk.table = TRUE)
surv_km$plot + geom_dl(aes(label = gsub('disease=','',strata)),
method = list(dl.trans(x = x + .2), "last.points"))
However, I get an error if I use the complete surv_km object:
surv_km + geom_dl(aes(label = gsub('disease=','',strata)),
method = list(dl.trans(x = x + .2), "last.points"))
>Error in surv_km + geom_dl(aes(label = gsub("disease=", "", strata)), :
non-numeric argument to binary operator
In addition: Warning message:
Incompatible methods ("+.ggsurv", "+.gg") for "+"

Use %++% instead of +. See ?add_ggsurvplot for more details.
fit <- survfit(Surv(time, status) ~ disease, data = kidney)
surv_km <- ggsurvplot(fit, risk.table = TRUE, legend = "none")
surv_km %++% geom_dl(aes(label = gsub('disease=','',strata)),
method = list(dl.trans(x = x + .2), "last.points"))
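The warning quoted in the question points at the cause: both operands carry an S3 class with its own "+" method, R cannot choose between them, and it falls back to the internal operator, which fails on list-like objects. A minimal base R sketch of that mechanism, using two made-up classes (toy_surv and toy_gg are hypothetical stand-ins, not the real survminer or ggplot2 classes):

```r
# Two toy S3 classes, each with its own "+" method (hypothetical names).
a <- structure(list(1), class = "toy_surv")
b <- structure(list(2), class = "toy_gg")
"+.toy_surv" <- function(e1, e2) "toy_surv method"
"+.toy_gg"   <- function(e1, e2) "toy_gg method"

warn_msg <- NULL
err_msg <- tryCatch(
  withCallingHandlers(
    a + b,                                  # two different methods apply
    warning = function(w) {
      warn_msg <<- conditionMessage(w)      # record the dispatch warning
      invokeRestart("muffleWarning")
    }
  ),
  error = function(e) conditionMessage(e)   # record the fallback error
)
warn_msg
err_msg
```

An operator such as %++% sidesteps this because it is an ordinary function with its own name, so no dispatch conflict can arise.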
To remove labels at the end of the table, we can assign surv_km[["table"]][["layers"]][[2]] to NULL, e.g.
surv_km <- surv_km %++%
geom_dl(aes(label = gsub('disease=','',strata)),
method = list(dl.trans(x = x + .2), "last.points"))
surv_km[["table"]][["layers"]][[2]]<-NULL
Finally, to remove disease= from the legend labels of the graph and/or the table, apply gsub to names(fit$strata), like so:
attr(fit$strata, "names") = gsub("disease=","",attr(fit$strata, "names"))
#then repeat above steps
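The renaming step can be checked on its own before re-plotting. This is a small sketch that only needs the survival package; it uses sub() with an anchored pattern instead of gsub(), which is an assumption about what is wanted (strip the prefix only at the start of each label):

```r
library(survival)

fit <- survfit(Surv(time, status) ~ disease, data = kidney)
names(fit$strata)   # labels carry a "disease=" prefix

# Strip the prefix only where it starts the label.
names(fit$strata) <- sub("^disease=", "", names(fit$strata))
names(fit$strata)   # bare level names, used in legends and tables
```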

Related

Resolving discrepancy between poisson glm fits and regular quadratic fit in R (ggplot2)?

I ran a Poisson model over some of my count data (only one example is shown here). I tried a quadratic curve fit via the model (graph 1, below) and a regular fit using a built-in function in ggplot2. I am not sure why they are so drastically different. I see this across several of my graphs (I tested to check whether the data were Poisson distributed). I am wondering if my predict function is doing something wonky?
library(investr)
library(ggplot2)
library(ggpmisc) # provides stat_poly_eq()
y.test <- c(3.09601,3.546579, 12.115740, 2.226694, 1.180938, 4.420249, 2.001162, 3.788012, 21.170732, 7.494421 , 5.602522 , 3.300510, 11.404264 ,23.115029,
19.371686, 25.444904, 17.094280 ,1.368615 ,19.343291 , 9.724363 , 8.086256 ,13.021972 ,10.740431 , 2.768960 ,14.494745 ,19.040086 , 7.072040, 8.748415,
10.012655, 14.759963 , 6.669221, 9.179184, 14.069743 ,12.132714, 8.517986, 18.095548, 9.076304, 9.197501, 7.972339 , 3.111373, 10.802117, 16.874861,
2.977454 ,15.195754, 5.433059 , 8.569472, 24.479745 , 3.756167 ,7.028482 , 7.412065 , 6.298529 , 3.585942 , 4.706638 , 9.002232, 5.276891)
x.test <- c(1:55)
df.test <- data.frame(x.test, y.test)
mod <- glm(y.test ~ x.test + I(x.test^2), data = df.test, family = poisson)
predicted.spp <- data.frame(predFit(mod, interval='confidence', level=.95))
df.test$predicted.mean <- predicted.spp$fit
df.test$predicted.lwr <- predicted.spp$lwr
df.test$predicted.upr <- predicted.spp$upr
my.formula <- y ~ x + I(x^2) # defined before its first use below
ggplot(df.test, aes(x = x.test, y = y.test)) + geom_point() +
geom_line(aes(y=predicted.mean), colour="blue") +
geom_ribbon(aes(ymin=predicted.lwr, ymax=predicted.upr), alpha=0.8) +
stat_poly_eq(formula = my.formula, aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")), parse = TRUE, size = 2.5, col = "black")
ggplot(df.test, aes(x = x.test, y = y.test)) + geom_point() +
geom_smooth(method="lm", formula = my.formula, color = "black" ) +
stat_poly_eq(formula = my.formula, aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")), parse = TRUE, size = 2.5, col = "black")
By default, predFit (like most GLM prediction machinery) returns results on the link scale, which in this case is the log scale. In your glm example you need predFit(mod, interval='confidence', type = "response", level=.95). (If you wanted, you could also exponentiate the results predFit gives you yourself.)
You could also use
geom_smooth(method="glm", formula = my.formula,
method.args = list(family = "poisson"), color = "black" )
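The link-versus-response distinction can be seen with base R's predict() alone. The sketch below uses simulated counts (the data and coefficients are made up for illustration, not taken from the question) and builds an approximate 95% interval on the log scale before back-transforming it:

```r
set.seed(1)
x <- 1:55
# Simulated Poisson counts with a quadratic trend on the log scale (assumed).
y <- rpois(length(x), lambda = exp(1 + 0.05 * x - 0.0008 * x^2))
mod <- glm(y ~ x + I(x^2), family = poisson)

link <- predict(mod, type = "link", se.fit = TRUE)  # log scale
resp <- predict(mod, type = "response")             # count scale

# Interval computed on the link scale, then exponentiated.
ci <- data.frame(
  fit = exp(link$fit),
  lwr = exp(link$fit - 1.96 * link$se.fit),
  upr = exp(link$fit + 1.96 * link$se.fit)
)
all.equal(unname(ci$fit), unname(resp))
```

Plotting link-scale values against raw counts gives a curve that sits far below the data, which matches the mismatch described in the question.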

Is there a neat approach to label a ggplot plot with the equation and other statistics from geom_quantile()?

I'd like to include the relevant statistics from a geom_quantile() fitted line in a similar way to how I would for a geom_smooth(method="lm") fitted linear regression (where I've previously used ggpmisc which is awesome). For example, this code:
# quantile regression example with ggpmisc equation
# basic quantile code from here:
# https://ggplot2.tidyverse.org/reference/geom_quantile.html
library(tidyverse)
library(ggpmisc)
# see ggpmisc vignette for stat_poly_eq() code below:
# https://cran.r-project.org/web/packages/ggpmisc/vignettes/user-guide.html#stat_poly_eq
my_formula <- y ~ x
#my_formula <- y ~ poly(x, 3, raw = TRUE)
# linear ols regression with equation labelled
m <- ggplot(mpg, aes(displ, 1 / hwy)) +
geom_point()
m +
geom_smooth(method = "lm", formula = my_formula) +
stat_poly_eq(aes(label = paste(stat(eq.label), "*\" with \"*",
stat(rr.label), "*\", \"*",
stat(f.value.label), "*\", and \"*",
stat(p.value.label), "*\".\"",
sep = "")),
formula = my_formula, parse = TRUE, size = 3)
generates this:
For a quantile regression, you can swap out geom_smooth() for geom_quantile() and get a lovely quantile regression line plotted (in this case the median):
# quantile regression - no equation labelling
m +
geom_quantile(quantiles = 0.5)
How would you get the summary statistics out to a label, or recreate them on the go (i.e. other than by fitting the regression before the call to ggplot and then passing the results in to annotate, similar to what was done here or here for a linear regression)?
Please consider this an appendix to Pedro's excellent answer, where he did most of the heavy lifting - this adds some presentation tweaks (colour and linetype) and code to simplify multiple quantiles, producing the plot below:
library(tidyverse)
library(ggpmisc) #ensure version 0.3.8 or greater
library(quantreg)
library(generics)
my_formula <- y ~ x
#my_formula <- y ~ poly(x, 3, raw = TRUE)
# base plot
m <- ggplot(mpg, aes(displ, 1 / hwy)) +
geom_point()
# function for labelling
# Doesn't neatly handle P values (e.g. return "P<0.001" where appropriate)
stat_rq_eqn <- function(formula = y ~ x, tau = 0.5, colour = "red", label.y = 0.9, ...) {
stat_fit_tidy(method = "rq",
method.args = list(formula = formula, tau = tau),
tidy.args = list(se.type = "nid"),
mapping = aes(label = sprintf('italic(tau)~"="~%.3f~";"~y~"="~%.3g~+~%.3g~x*", with "~italic(P)~"="~%.3g',
after_stat(x_tau),
after_stat(Intercept_estimate),
after_stat(x_estimate),
after_stat(x_p.value))),
parse = TRUE,
colour = colour,
label.y = label.y,
...)
}
# This works, though with double entry of plot specs
# custom colours and linetype
# https://stackoverflow.com/a/44383810/4927395
# https://stackoverflow.com/a/64518380/4927395
m +
geom_quantile(quantiles = c(0.1, 0.5, 0.9),
aes(colour = as.factor(..quantile..),
linetype = as.factor(..quantile..))
)+
scale_color_manual(values = c("red","purple","darkgreen"))+
scale_linetype_manual(values = c("dotted", "dashed", "solid"))+
stat_rq_eqn(tau = 0.1, colour = "red", label.y = 0.9)+
stat_rq_eqn(tau = 0.5, colour = "purple", label.y = 0.95)+
stat_rq_eqn(tau = 0.9, colour = "darkgreen", label.y = 1.0)+
theme(legend.position = "none") # suppress legend
# not a good habit to have double entry above
# modified with reference to tibble for plot specs,
# though still a stat_rq_eqn call for each quantile and manual vertical placement
# https://www.r-bloggers.com/2019/06/curly-curly-the-successor-of-bang-bang/
my_tau = c(0.1, 0.5, 0.9)
my_colours = c("red","purple","darkgreen")
my_linetype = c("dotted", "dashed", "solid")
quantile_plot_specs <- tibble(my_tau, my_colours, my_linetype)
m +
geom_quantile(quantiles = {{quantile_plot_specs$my_tau}},
aes(colour = as.factor(..quantile..),
linetype = as.factor(..quantile..))
)+
scale_color_manual(values = {{quantile_plot_specs$my_colours}})+
scale_linetype_manual(values = {{quantile_plot_specs$my_linetype}})+
stat_rq_eqn(tau = {{quantile_plot_specs$my_tau[1]}}, colour = {{quantile_plot_specs$my_colours[1]}}, label.y = 0.9)+
stat_rq_eqn(tau = {{quantile_plot_specs$my_tau[2]}}, colour = {{quantile_plot_specs$my_colours[2]}}, label.y = 0.95)+
stat_rq_eqn(tau = {{quantile_plot_specs$my_tau[3]}}, colour = {{quantile_plot_specs$my_colours[3]}}, label.y = 1.0)+
theme(legend.position = "none")
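The comment in stat_rq_eqn() notes that small P values are not handled neatly. One possible approach, sketched here with a hypothetical helper (format_p is not part of ggpmisc or any other package), is to build the plotmath fragment separately and switch to an inequality below a threshold:

```r
# Hypothetical helper: plotmath string for a P value,
# switching to "P < threshold" for very small values.
format_p <- function(p, threshold = 0.001) {
  ifelse(p < threshold,
         sprintf("italic(P)<%s", format(threshold, scientific = FALSE)),
         sprintf('italic(P)~"="~%.3f', p))
}

format_p(c(0.2, 0.00004))
```

The returned strings are meant to be pasted into the label that is later parsed, in place of the fixed %.3g field for the P value.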
@mark-neal stat_fit_glance() does work with quantreg::rq(). Using stat_fit_glance() is, however, more involved. This stat does not "know" what to expect from glance(), so one has to assemble the label manually.
One needs to know what is available for this. One can either fit the model outside the ggplot and use glance() to find out which columns it returns, or do this within the ggplot with the help of package 'gginnards'. I will show the second alternative, continuing from your code example above.
library(gginnards)
m +
geom_quantile(quantiles = 0.5) +
stat_fit_glance(method = "rq", method.args = list(formula = y ~ x), geom = "debug")
geom_debug() by default just prints its input to the R console; that input is what the statistic returns.
# A tibble: 1 x 11
npcx npcy tau logLik AIC BIC df.residual x y PANEL group
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <fct> <int>
1 NA NA 0.5 816. -1628. -1621. 232 1.87 0.0803 1 -1
We can then access each of these columns using after_stat() (earlier incarnations being stat() and enclosing the name in double dots, as in ..tau..). We need to do the formatting using the encoding notation of sprintf(). If, as in this case, we assemble a string that needs to be parsed into an expression, parse = TRUE is also needed.
m +
geom_quantile(quantiles = 0.5) +
stat_fit_glance(method = "rq", method.args = list(formula = y ~ x),
mapping = aes(label = sprintf('italic(tau)~"="~%.2f~~AIC~"="~%.3g~~BIC~"="~%.3g',
after_stat(tau), after_stat(AIC), after_stat(BIC))),
parse = TRUE)
This example results in the following plot.
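What parse = TRUE does with such a label can be reproduced in base R. In this sketch the numbers are typed in by hand (they echo the debug output above rather than being computed), and parse() stands in for the parsing step applied to the label:

```r
# Build the same kind of plotmath string that the aes() mapping produces.
lab <- sprintf('italic(tau)~"="~%.2f~~AIC~"="~%.3g~~BIC~"="~%.3g',
               0.5, -1628, -1621)
lab

# Parsing turns the string into an R expression that plotmath can render.
expr <- parse(text = lab)
class(expr)
```

If the string is not valid R syntax, parse() fails, which is usually the first thing to check when a parsed label does not show up.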
With stat_fit_tidy() the same approach should have worked. However, in 'ggpmisc' (<= 0.3.7) it worked with "lm" but not with "rq". This bug is fixed in 'ggpmisc' (>= 0.3.8), now on CRAN.
The example below works only with 'ggpmisc' (>= 0.3.8)
The remaining question is whether the tibble that glance() or tidy() returns contains the information one wants to add to the plot, which does not seem to be the case for tidy.rq(), at least by default. However, tidy.rq() has a parameter se.type that determines the values returned in the tibble. The revised stat_fit_tidy() accepts named arguments to be passed to tidy(), making the following possible.
m +
geom_quantile(quantiles = 0.5) +
stat_fit_tidy(method = "rq",
method.args = list(formula = y ~ x),
tidy.args = list(se.type = "nid"),
mapping = aes(label = sprintf('y~"="~%.3g~+~%.3g~x*", with "*italic(P)~"="~%.3f',
after_stat(Intercept_estimate),
after_stat(x_estimate),
after_stat(x_p.value))),
parse = TRUE)
This example results in the following plot.
Defining a new stat stat_rq_eqn() would make this even simpler:
stat_rq_eqn <- function(formula = y ~ x, tau = 0.5, ...) {
stat_fit_tidy(method = "rq",
method.args = list(formula = formula, tau = tau),
tidy.args = list(se.type = "nid"),
mapping = aes(label = sprintf('y~"="~%.3g~+~%.3g~x*", with "*italic(P)~"="~%.3f',
after_stat(Intercept_estimate),
after_stat(x_estimate),
after_stat(x_p.value))),
parse = TRUE,
...)
}
With the answer becoming:
m +
geom_quantile(quantiles = 0.5) +
stat_rq_eqn(tau = 0.5)
Package 'ggpmisc' (>= 0.4.5) allows a much simpler answer, which is closer to the solution hoped for by @MarkNeal in his question about median regression. This answer should be preferred to earlier ones when using a recent version of 'ggpmisc'. Not shown: passing se = FALSE to stat_quant_line() disables the confidence band.
library(ggplot2)
library(ggpmisc)
#> Loading required package: ggpp
#>
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#>
#> annotate
m <- ggplot(mpg, aes(displ, 1 / hwy)) +
geom_point()
m +
stat_quant_line(quantiles = 0.5) +
stat_quant_eq(aes(label = paste(after_stat(eq.label), "*\" with \"*",
after_stat(rho.label), "*\", \"*",
after_stat(n.label), "*\".\"",
sep = "")),
quantiles = 0.5,
size = 3)
#> Warning in rq.fit.br(x, y, tau = tau, ci = TRUE, ...): Solution may be nonunique
Created on 2022-06-03 by the reprex package (v2.0.1)
The default is to plot the median and quartiles.
m +
stat_quant_line() +
stat_quant_eq(aes(label = paste(after_stat(eq.label), "*\" with \"*",
after_stat(rho.label), "*\", \"*",
after_stat(n.label), "*\".\"",
sep = "")),
size = 3)
#> Warning in rq.fit.br(x, y, tau = tau, ci = TRUE, ...): Solution may be nonunique
We can also map the quantiles to color and linetype aesthetics easily.
m +
stat_quant_line(aes(linetype = after_stat(quantile.f),
color = after_stat(quantile.f))) +
stat_quant_eq(aes(label = paste(after_stat(eq.label), "*\" with \"*",
after_stat(rho.label), "*\", \"*",
after_stat(n.label), "*\".\"",
sep = ""),
color = after_stat(quantile.f)),
size = 3)
#> Warning in rq.fit.br(x, y, tau = tau, ci = TRUE, ...): Solution may be nonunique
We can also plot the quartiles as a band by using stat_quant_band() instead of stat_quant_line().
m +
stat_quant_band() +
stat_quant_eq(aes(label = paste(after_stat(eq.label), "*\" with \"*",
after_stat(rho.label), "*\", \"*",
after_stat(n.label), "*\".\"",
sep = "")),
size = 3)
#> Warning in rq.fit.br(x, y, tau = tau, ci = TRUE, ...): Solution may be nonunique

Fit and plot a Weibull model to a survival data

I want to achieve the exact same thing asked in this question:
How to plot the survival curve generated by survreg (package survival of R)?
Except for the fact that I don't want the data to be stratified by a variable (in the question above it was stratified by sex).
I just want the progression free survival for the whole group of treated patients.
So when I copy the code from the other question, here is where I get stuck:
library(survminer)
library(tidyr)
s <- with(lung,Surv(time,status))
fKM <- survfit(s ~ sex,data=lung)
sWei <- survreg(s ~ as.factor(sex),dist='weibull',data=lung) # in my case here I would replace as.factor(sex) by 1
pred.sex1 = predict(sWei, newdata=list(sex=1),type="quantile",p=seq(.01,.99,by=.01)) #Since I don't want to stratify, what do I do with these 2 lines of code?
pred.sex2 = predict(sWei, newdata=list(sex=2),type="quantile",p=seq(.01,.99,by=.01))
df = data.frame(y=seq(.99,.01,by=-.01), sex1=pred.sex1, sex2=pred.sex2)
df_long = gather(df, key= "sex", value="time", -y)
p = ggsurvplot(fKM, data = lung, risk.table = T)
p$plot = p$plot + geom_line(data=df_long, aes(x=time, y=y, group=sex))
I tried replacing as.factor(sex) by 1 and then the rest of the code just does not make sense, can someone help me with this?
Many thanks in advance!
If you just want to plot the overall empirical survival curve, you might do something like this:
library(survival)
library(survminer)
library(tidyr)
s <- with(lung, Surv(time, status))
fKM <- survfit(s ~ 1, data = survival::lung)
ggsurvplot(fKM, ggtheme = theme_bw())
However, if you want to fit a Weibull model with no predictors, then your formula is fine.
sWei <- survreg(s ~ 1, dist = 'weibull', data = lung)
probs <- seq(0.01, 1, by = 0.01)
time <- predict(sWei, type = "quantile", se = TRUE, p = probs)
The only problem is that time is now a named list of two matrices: fit and se.fit. Both have the same number of rows as lung, but all rows are identical, so we just take one from each and calculate the confidence interval in a data frame which we can then use to create a ggplot:
ggplot(data = data.frame(p = 1 - probs,
time = time$fit[1,],
upper = time$fit[1,] + 1.96 * time$se.fit[1,],
lower = time$fit[1,] - 1.96 * time$se.fit[1,])) +
geom_step(aes(p, time, colour = "All"), size = 1) +
geom_ribbon(aes(p, ymin = lower, ymax = upper, fill = "All"), alpha = 0.2) +
coord_flip(ylim = c(0, 1000)) +
scale_fill_discrete(name = "Strata") +
scale_color_discrete(name = "Strata") +
theme_bw() +
theme(legend.position = "top")
Which we can see looks like a pretty good fit.
If you want both in the same plot you can do something like:
df <- data.frame(p = 1 - probs,
time = time$fit[1,],
upper = time$fit[1,] + 1.96 * time$se.fit[1,],
lower = time$fit[1,] - 1.96 * time$se.fit[1,])
ggsurvplot(fKM, ggtheme = theme_bw())$plot +
geom_line(data = df, aes(time, p), linetype = 2, size = 1) +
geom_line(data = df, aes(upper, p), linetype = 2, size = 1) +
geom_line(data = df, aes(lower, p), linetype = 2, size = 1)
Created on 2020-08-18 by the reprex package (v0.3.0)
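For an intercept-only Weibull fit, the predicted quantiles correspond to an ordinary Weibull distribution, so the fitted survival curve can also be written in closed form. The sketch below relies on the parameter mapping described in the survreg documentation (shape = 1 / scale of the fit, scale = exp(intercept)); surv_weibull is a hypothetical helper name:

```r
library(survival)

sWei  <- survreg(Surv(time, status) ~ 1, dist = "weibull", data = lung)
shape <- 1 / sWei$scale            # Weibull shape
scale <- exp(unname(coef(sWei)))   # Weibull scale

probs  <- seq(0.01, 0.99, by = 0.01)
q_pred <- predict(sWei, type = "quantile", p = probs)[1, ]  # all rows identical
q_weib <- qweibull(probs, shape = shape, scale = scale)
all.equal(unname(q_pred), q_weib)

# Hypothetical helper: fitted survival probability at time t.
surv_weibull <- function(t) pweibull(t, shape, scale, lower.tail = FALSE)
surv_weibull(c(0, 365))
```

With this form the curve can be drawn directly, for example with stat_function(fun = surv_weibull), instead of inverting a grid of quantiles.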

ggsurvplot: unable to use survfit when called from a function

I am trying to plot a survival plot and have run into an issue when trying to move my survfit function into a main function, where I can call it for different data sets. When I run the code
fit<- survfit(Surv(time, status) ~ sex, data = lung)
allsurv <- function(fit){
ggsurvplot(
fit,
pval = TRUE,
pval.coord = c(200, 0.10),
conf.int = TRUE,
xlab = "Days",
ggtheme = theme_light(),
surv.median.line = "hv",
legend.labs = c("Female","Male"),
legend.title = "",
palette = c("#8C3F4D","#3E606F")) +
scale_y_continuous(expand = c(0.02, 0.02),breaks = seq(from = 0, to = 1, by = 0.1),labels=percent) +
scale_x_continuous(expand = c(0.006, 0.006),
limits = c(0,366*12), breaks = seq(0, 4392, 100))
}
allsurv(fit)
the plot is drawn normally.
However, when I call survfit from inside a function:
fit_all <- function(x){
survfit(Surv(time, status) ~ sex, data = x)
}
allsurv(fit_all(lung))
I receive an error: "Error in eval(fit$call$data) : object 'x' not found"
Any ideas as to what I am doing wrong?
Survminer includes a function surv_fit that acts as a wrapper around survfit. If you use surv_fit instead of survfit, the "call" of the returned object will include the whole data frame instead of just data = x. That works better when calling ggsurvplot inside a function:
https://www.rdocumentation.org/packages/survminer/versions/0.4.6/topics/surv_fit
allsurv <- function(fit){
ggsurvplot(
fit,
pval = TRUE,
pval.coord = c(200, 0.10),
conf.int = TRUE,
xlab = "Days",
ggtheme = theme_light(),
surv.median.line = "hv",
legend.labs = c("Female","Male"),
legend.title = "",
palette = c("#8C3F4D","#3E606F"))
}
fit_all <- function(x){
surv_fit(Surv(time, status) ~ sex, data = x)
}
allsurv(fit_all(lung))
I managed to get it to work. For anyone else with the same issue: the problem seems to come from the call stored in the fit, which differs depending on whether survfit is run directly or from inside another function.
fit_all(lung)
which returns Call: survfit(formula = Surv(time, status) ~ sex, data = x)
fit
which returns Call: survfit(formula = Surv(time, status) ~ sex, data = lung)
It seems that ggsurvplot takes the data named in the call as the data for the plot, and when survfit is called from another function that is 'data = x'. The way around this seems to be to pass the data explicitly to ggsurvplot, alongside the fit.
allsurv <- function(fit, x){
ggsurvplot(
data = x,
fit,
pval = TRUE,
pval.coord = c(200, 0.10),
conf.int = TRUE,
xlab = "Days",
ggtheme = theme_light(),
surv.median.line = "hv",
legend.labs = c("Female","Male"),
legend.title = "",
palette = c("#8C3F4D","#3E606F")) +
scale_y_continuous(expand = c(0.02, 0.02),breaks = seq(from = 0, to = 1, by = 0.1),labels=percent) +
scale_x_continuous(expand = c(0.006, 0.006),
limits = c(0,366*12), breaks = seq(0, 4392, 100))
}
allsurv(fit_all(lung), lung)
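The difference between the two stored calls can be inspected without plotting anything. This sketch needs only the survival package and reuses the fit_all() wrapper from the question:

```r
library(survival)

fit_all <- function(x) {
  survfit(Surv(time, status) ~ sex, data = x)
}
fit <- fit_all(lung)

# The call keeps the argument name used inside the wrapper, not the data.
fit$call$data

# Evaluating that name outside the wrapper fails, because no object
# called x exists there.
res <- tryCatch(eval(fit$call$data),
                error = function(e) conditionMessage(e))
res
```

Any code that later re-evaluates the data argument of the call, as the error message suggests ggsurvplot does, hits the same lookup failure unless the data are supplied explicitly.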

How to plot a linear and quadratic model on the same graph?

So I have 2 models for the data set that I am using:
> Bears1Fit1 <- lm(Weight ~ Neck.G)
>
> Bears2Fit2 <- lm(Weight ~ Neck.G + I(Neck.G^2))
I want to plot these two models on the same scatterplot. I have this so far:
> plot(Neck.G, Weight, pch = c(1), main = "Black Bears Data: Weight Vs Neck Girth", xlab = "Neck Girth (inches) ", ylab = "Weight (pounds)")
> abline(Bears1Fit1)
However, I am unsure of how I should put the quadratic model on the same graph as well. I want to be able to have both lines on the same graph.
Here is an example with cars data set:
data(cars)
make models:
model_lm <- lm(speed ~ dist, data = cars)
model_lm2 <- lm(speed ~ dist + I(dist^2), data = cars)
make new data:
new.data <- data.frame(dist = seq(from = min(cars$dist),
to = max(cars$dist), length.out = 200))
predict:
pred_lm <- predict(model_lm, newdata = new.data)
pred_lm2 <- predict(model_lm2, newdata = new.data)
plot:
plot(speed ~ dist, data = cars)
lines(pred_lm ~ new.data$dist, col = "red")
lines(pred_lm2 ~ new.data$dist, col = "blue")
legend("topleft", c("linear", "quadratic"), col = c("red", "blue"), lty = 1)
with ggplot2
library(ggplot2)
put all data in one data frame and convert to long format using melt from reshape2
preds <- data.frame(new.data,
linear = pred_lm,
quadratic = pred_lm2)
preds <- reshape2::melt(preds,
id.vars = 1)
plot
ggplot(data = preds)+
geom_line(aes(x = dist, y = value, color = variable ))+
geom_point(data = cars, aes(x = dist, y = speed))+
theme_bw()
EDIT: another way, using only ggplot2, is to add two geom_smooth layers: one with the default formula y ~ x (so it need not be specified) and one with the quadratic formula y ~ x + I(x^2). To get a legend, we can set color inside the aes call, naming each entry as we want it to appear in the legend.
ggplot(cars,
aes(x = dist, y = speed)) +
geom_point() +
geom_smooth(method = "lm",
aes(color = "linear"),
se = FALSE) +
geom_smooth(method = "lm",
formula = y ~ x + I(x^2),
aes(color = "quadratic"),
se = FALSE) +
theme_bw()
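One detail worth checking in any quadratic formula is where the exponent sits. Inside I() the square is plain arithmetic; outside it, ^ is a formula operator and no squared term is created. A short sketch with the cars data (the object names wrong and right are just labels for this illustration):

```r
# Exponent outside I(): "^" is treated as a formula operator.
wrong <- lm(speed ~ dist + I(dist)^2, data = cars)
# Exponent inside I(): dist is squared arithmetically.
right <- lm(speed ~ dist + I(dist^2), data = cars)

names(coef(wrong))   # no I(dist^2) term here
names(coef(right))   # includes I(dist^2)
```

Comparing the coefficient names is a quick way to confirm that a model really contains the quadratic term before plotting it.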

Resources