I am attempting to create an adjusted survival curve (from a Cox model) and would like to display this information as cumulative events.
I have attempted this:
library(survival)
data("ovarian")
library(survminer)
model<-coxph(Surv(futime, fustat) ~ age + strata(rx), data=ovarian)
gplot<-ggadjustedcurves(model) ## Expected plot of adjusted survival curve
Because the "fun=" argument has still not been implemented in ggadjustedcurves, I took the advice of a user on this page, extracted the plot data into plotdata, and created a new column as shown below.
plotdata<-gplot$data
library(dplyr)
plotdata <- plotdata %>%
mutate(new = 1 - surv) ## 1 - survival probability
I am new to R and ggplot2, so how can I plot the adjusted curve using the newly created column while keeping the theme of the original plot (contained in gplot)?
Thanks!
Edit:
My current solution is as follows.
library(rms)
model<-coxph(Surv(futime, fustat) ~ age+ strata(rx), data=ovarian)
survfit(model, conf.type = "plain", conf.int = 1)
plot(survfit(model), conf.int = T,col = c(1,2), fun='event')
This achieves the survival curve I wanted; however, I am not sure whether the confidence bars are really the standard errors (+/-1). I supplied 1 to the conf.int argument and believe this produces standard-error bars, since conf.type is specified as plain.
How can I further customize this plot? The base graph looks rather bland. How do I get a display as close as possible to the survminer curves?
You can use the adjustedCurves package instead, which allows both plotting confidence intervals and naturally includes an option to display cumulative incidence functions. First, install it using:
devtools::install_github("https://github.com/RobinDenz1/adjustedCurves")
Now you can use:
library(adjustedCurves)
library(survival)
library(riskRegression)
# needs to be a factor
ovarian$rx <- factor(ovarian$rx)
# needs to include x=TRUE
model <- coxph(Surv(futime, fustat) ~ age + strata(rx), data=ovarian, x=TRUE)
adj <- adjustedsurv(data=ovarian,
event="fustat",
ev_time="futime",
variable="rx",
method="direct",
outcome_model=model,
conf_int=TRUE)
plot(adj, cif=TRUE, conf_int=TRUE)
Which produces:
I would probably not use this method here, though. Simulation studies have shown that the Cox-regression-based method performs badly with small sample sizes. You might want to take a look at method="iptw" or method="aiptw" inside the adjustedCurves package instead.
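On the +/-1 standard error question from the edit: conf.int in survfit is a confidence level, not a multiple of the standard error, so conf.int = 1 requests a 100% interval rather than +/-1 SE. A minimal sketch, assuming a normal approximation, is to pass the level that corresponds to one standard error (about 0.683) and to hand the stored fit to plot(). The object names one_se_level and fit_1se are made up for illustration.

```r
library(survival)

# Same model as in the question (ovarian ships with the survival package)
ovarian <- survival::ovarian
model <- coxph(Surv(futime, fustat) ~ age + strata(rx), data = ovarian)

# Confidence level whose normal quantile equals 1, i.e. +/- 1 standard error
one_se_level <- 2 * pnorm(1) - 1   # roughly 0.683

# "plain" intervals are symmetric on the survival scale (truncated to [0, 1])
fit_1se <- survfit(model, conf.type = "plain", conf.int = one_se_level)

# Plot cumulative events (1 - survival) with the bands stored in the fit
plot(fit_1se, fun = "event", conf.int = TRUE, col = c(1, 2),
     xlab = "Time", ylab = "Cumulative events")
```

Note that the intervals have to be stored in the survfit object that is actually passed to plot(); calling survfit(model) again inside plot() falls back to the default 95% level.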
I have a multinomial logistic regression model built using multinom() function from nnet package in R. I have a 7 class target variable and I want to plot the coefficients that the variables included in the model have for each class of my dependent variable.
For a binary logistic regression I used coefplot() function from arm package, but I don't know how to do this for a multiclass problem.
I want my plots to look like this:
I couldn't easily find a sensible multinom() example: the one below gives ridiculous values, but the structure of the code should work anyway. The basic idea is to use broom::tidy() to extract coefficients and ggplot/ggstance to plot them. ggstance is specifically for plotting horizontal point-ranges and displacing them from each other an appropriate amount; this can also be done via coord_flip(), but coord_flip() induces a certain lack of flexibility (e.g. it can't easily be combined with faceting).
library(nnet)
library(broom)
library(ggplot2); theme_set(theme_bw())
library(ggstance)
Create example multinom() fit:
nvars <- c("mpg","disp","hp")
mtcars_sc <- mtcars
mtcars_sc[nvars] <- scale(mtcars_sc[nvars])  # scale the copy, not the original data
m <- multinom(cyl~mpg+hp+disp,mtcars_sc,
maxit=1e4)
Extract coefficients and drop intercept terms:
tt <- broom::tidy(m,conf.int=TRUE)
tt <- dplyr::filter(tt, term!="(Intercept)")
Plot:
ggplot(tt, aes(x=estimate,y=term,colour=y.level))+
geom_pointrangeh(aes(xmin=conf.low,
xmax=conf.high),
position=position_dodgev(height=0.75))
Given that you're able to get your data like this:
coeff <- factor(1:7,labels=c("inc", "lwg", "hcyes", "wcyes","age", "k618", "k5"))
values <- c(-0.1,0.6,0.15,0.8,-0.05,-0.05,-1.5)
upper <- c(-0.1,1,.6,1.3,-.05,.1,-1)
lower <- c(-0.1,.2,-.2,.3,-.05,-.2,-2)
df <- data.frame(coeff,values,upper,lower)
Then all you have to do is run:
library(ggplot2)
ggplot(df, aes(x=coeff, y=values, ymin=lower, ymax=upper)) +
geom_pointrange() +
geom_hline(yintercept=0, linetype=2)+
coord_flip()
The result should look like this:
You can experiment with the plot options to get it to look identical to your example.
We can use survminer to plot the survival function or cumulative hazard function, but I cannot see a way to use it to plot the hazard function.
For example,
library(survival)
library(tidyverse)
library(survminer)
data(lung)
# Run Kaplan-Meier on the data
mod.lung <- survfit(Surv(time, status) ~ 1, data = lung)
# Kaplan-Meier Survival Curve
ggsurvplot(mod.lung)
# Cumulative Hazard
ggsurvplot(mod.lung, fun = function(y) -log(y))
Since the cumulative hazard function is H(t) = -log(S(t)), I just need to add fun = function(y) -log(y) to get the cumulative hazard plot.
The hazard function is h(t) = -d/dt log(S(t)), and so I am unsure how to use this to get the hazard function in a survminer plot.
An alternative definition of the hazard function is h(t) = f(t)/S(t), however, I'm unsure how to use this to get the plot.
I have found ways to get a hazard plot using ggplot2, for example
survival.table1 <- broom::tidy(mod.lung) %>% filter(n.event > 0)
survival.table1 <- survival.table1 %>% mutate(hazard = n.event / (n.risk * (lead(time) - time)))
ggplot() +
geom_step(data = survival.table1, aes(x = time, y = hazard)) +
labs(x = "Time", y = "Hazard")
However, I mainly wish to find a way with the survminer package, partly to have some consistency.
Thanks
In the rms package, you can use the survplot function with the "what" parameter set to "hazard" to plot the hazard function versus time.
https://rdrr.io/cran/rms/man/survplot.html
A couple of years late, but here for others.
survplot can only plot the hazard if the fit was created by the psm function. The psm function in the rms package fits the accelerated failure time family of parametric survival models (defaulting to a Weibull distribution). Other available distributions are listed in the documentation for survreg in the survival package:
These include "weibull", "exponential", "gaussian", "logistic","lognormal" and "loglogistic". Otherwise, it is assumed to be a user defined list conforming to the format described in survreg.distributions
library(rms)
mod.lung <- psm(Surv(time, status) ~ 1, data = lung)
survplot(mod.lung, what="hazard")
For non parametric survival models, the muhaz library might be more useful. This example uses the default "epanechnikov" boundary kernel function. You may wish to explore different bandwidth options - see the muhaz package documentation.
library(muhaz)
mod.lung <- muhaz(lung$time, lung$status - 1) # status must be 1 for failure and 0 for censored
plot(mod.lung)
Alternatively, to smooth the hazard with B-splines instead of a kernel, have a look at the bshazard library.
library(bshazard)
mod.lung <- bshazard(Surv(time, status) ~ 1, data = lung)
plot(mod.lung)
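To make the kernel idea behind muhaz concrete, here is a minimal sketch that smooths the Nelson-Aalen increments with an Epanechnikov kernel using only the survival package. It is not the muhaz implementation: it uses a fixed bandwidth with no boundary correction, and the helper names (epanechnikov, smooth_hazard) and the 100-day bandwidth are assumptions chosen for illustration.

```r
library(survival)

# Kaplan-Meier fit on the lung data, as in the question
km <- survfit(Surv(time, status) ~ 1, data = survival::lung)

# Keep event times only; each jump of the Nelson-Aalen estimator is d_i / n_i
has_event   <- km$n.event > 0
event_times <- km$time[has_event]
increments  <- km$n.event[has_event] / km$n.risk[has_event]

# Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere
epanechnikov <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Kernel-smoothed hazard: h(t) = (1 / b) * sum_i K((t - t_i) / b) * d_i / n_i
smooth_hazard <- function(t, bandwidth) {
  sapply(t, function(t0) {
    sum(epanechnikov((t0 - event_times) / bandwidth) * increments) / bandwidth
  })
}

# Evaluate away from the boundaries, where the uncorrected estimate is less biased
grid   <- seq(100, 700, by = 10)
hazard <- smooth_hazard(grid, bandwidth = 100)  # bandwidth in days, chosen arbitrarily

plot(grid, hazard, type = "l", xlab = "Time", ylab = "Smoothed hazard")
```

The resulting data (grid, hazard) can be passed to ggplot2 with a survminer theme if a consistent look matters more than using ggsurvplot itself.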
I am running a GAM model through the mgcv package with family = cox.ph() and have my data grouped by strata (strata = id). The data corresponds to one use location for an individual animal and 20 random locations associated with that individual that were available for use.
require(mgcv)
require(survival)
require(smoothHR)
gam1 = gam(time1 ~ s(DWL) + strata(id), family = cox.ph(), method = "REML", data = dataset, weights = event1)
The model is running smoothly but I am unsure how to plot relationships to x-variable. DWL is a continuous variable. I have used the following to graph predictions:
x = seq(0,120) #extent of DWL values
plot(gam1,residuals=T,trans=function(x)exp(x)/(1+exp(x)),shade=T)
I am a bit confused about the use of the trans argument in the plot syntax. When using cox.ph() for the family argument, is the logit link the proper way to evaluate the predicted y-response to the x variable DWL?
Thank you,
P Farrell
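For reference, the linear predictor of a cox.ph model is on the log relative hazard scale, so the inverse-logit transform exp(x)/(1+exp(x)) belongs to binomial models rather than to this one; exp() turns the smooth into a relative hazard. A minimal sketch on simulated data follows. The data frame sim, its coefficients, and the omission of the strata term are all assumptions made to keep the example self-contained.

```r
library(mgcv)

set.seed(1)
n <- 200

# Simulated stand-in for the real data: the hazard increases with DWL
sim <- data.frame(DWL = runif(n, 0, 120))
event_time  <- rexp(n, rate = 0.01 * exp(0.01 * sim$DWL))
censor_time <- rexp(n, rate = 0.005)
sim$time1  <- pmin(event_time, censor_time)
sim$event1 <- as.numeric(event_time <= censor_time)  # 1 = event, 0 = censored

# Cox proportional hazards GAM; the event indicator is supplied via weights
fit <- gam(time1 ~ s(DWL), family = cox.ph(), weights = event1,
           data = sim, method = "REML")

# Linear predictor = log relative hazard; exponentiate to get a relative hazard.
# time1 is included in the grid only so every model variable is present.
grid <- data.frame(DWL = seq(0, 120, by = 10), time1 = 1)
log_rel_hazard <- as.numeric(predict(fit, newdata = grid, type = "link"))
rel_hazard <- exp(log_rel_hazard)

# The same transform applied to the plotted smooth
plot(fit, trans = exp, shade = TRUE)
```

Because the smooth is centred, the plotted curve is a hazard relative to the average effect, not an absolute hazard or a probability.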
I am plotting regression summaries for a quantile regression I did with quantreg.
Obviously the method plot.summary.rqs is in use here. The problem is that I use quite a few explanatory variables, each of which is displayed in the plot. Most of the coefficients do not differ significantly from OLS, so I just want to pick out and display a few of them.
How can I select the plots that I need to show? I am using knitr for my reports but do not want to show dozens of variables (and you get there quickly using dummies). Is there a way to cherry pick?
By default, plot.summary.rqs plots all coefficients:
library(quantreg)
data(stackloss)
myrq <- rq(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., tau = seq(0.35, 0.65, 0.1), data=stackloss)
plot(summary(myrq)) # Plots all 4 coefficients
To cherry pick coefficients, the parm argument can be used:
plot(summary(myrq), parm = 2) # Plot only second regressor (Air.Flow)
plot(summary(myrq), parm = "Water.Temp") # Plot only Water.Temp
plot(summary(myrq), parm = 3:4) # Plot third and fourth regressor
I am trying to show that there is a weird "bump" in some data I am analysing (it is to do with market share). My code is here:
qplot(Share, Rate, data = Dataset3, geom=c("point", "smooth"))
(I appreciate that this is not very useful code without the dataset).
Is there any way that I can get the numeric vector used to generate the smoothed line out of R? I just need that layer to try to fit a model to the smoothed data.
Any help gratefully received.
Yes, there is. For small datasets (fewer than 1,000 observations), ggplot uses loess as the default smoother in geom_smooth. This means you can call loess directly to reproduce the smoothed values.
Here is an example, adapted from ?loess :
qplot(speed, dist, data=cars, geom="smooth")
Use loess to estimate the smoothed data, and predict for the estimated values:
cars.lo <- loess(dist ~ speed, cars)
pc <- predict(cars.lo, data.frame(speed = seq(4, 25, 1)), se = TRUE)
The estimates are now in pc$fit and the standard errors in pc$se.fit. The following bit of code extracts the fitted values into a data.frame and then plots them using ggplot:
pc_df <- data.frame(
x=4:25,
fit=pc$fit)
ggplot(pc_df, aes(x=x, y=fit)) + geom_line()
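If the shaded band is needed as well, the standard errors returned by predict() can be turned into an approximate pointwise 95% interval. This is a minimal sketch using base R only; the data frame name band and the 95% level are assumptions for illustration.

```r
# Fit the same loess model as above (cars ships with base R)
cars.lo <- loess(dist ~ speed, cars)

# Predict on a regular grid and request standard errors
grid <- data.frame(speed = seq(4, 25, 1))
pc <- predict(cars.lo, grid, se = TRUE)

# Pointwise interval: fit +/- t quantile * standard error,
# using the residual degrees of freedom returned by predict()
crit <- qt(0.975, pc$df)
band <- data.frame(
  speed = grid$speed,
  fit   = pc$fit,
  lower = pc$fit - crit * pc$se.fit,
  upper = pc$fit + crit * pc$se.fit
)

head(band)
```

The columns of band can then be used as the response when fitting a model to the smoothed line, or drawn with geom_ribbon() and geom_line().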