How to create an aesthetically pleasant Hazard Ratio Chart in R

I'm new to survival analysis. In a YouTube video I found online, the host mentioned that the second graph was created using a combination of the broom and ggplot2 packages.
Any ideas how to reproduce it?
# Current Code:
sigMod = coxph(Surv(time, DEATH_EVENT) ~ age+anaemia+creatinine_phosphokinase+ejection_fraction+
serum_creatinine+hypertension, data=HF)
ggforest(sigMod, data = HF)
EDIT 1
Added code so far:
tidy(sigMod) %>% select(term, estimate) %>%
ggplot(aes(x=estimate, y=term)) + geom_boxplot()
EDIT 2
My Model's data after using Broom:
| Term | Estimate |
|---------------------|------------------|
| Age | 0.0436065795 |
| Anaemia1 | 0.3932590155 |
| creatinine_phosphokinase | 0.0001964616 |
| ejection_fraction | -0.0517850968 |
| serum_creatinine | 0.3483455436 |
| hypertensionPresent | 0.4667523759 |

Here's a fully reproducible example of how something like your target plot could be achieved, using the pbc dataset from the survival package. Just swap in your own coxph call at the start:
library(survival)
library(tidyverse)
library(broom)
coxph(Surv(time, status) ~ sex + ascites + spiders + hepato + edema,
data = pbc) %>%
tidy() %>%
mutate(upper = estimate + 1.96 * std.error,
lower = estimate - 1.96 * std.error) %>%
mutate(across(all_of(c("estimate", "lower", "upper")), exp)) %>%
ggplot(aes(estimate, term, color = estimate > 1)) +
geom_vline(xintercept = 1, color = "gray75") +
geom_linerange(aes(xmin = lower, xmax = upper), size = 1.5, alpha = 0.5) +
geom_point(size = 4) +
theme_minimal(base_size = 16) +
scale_color_manual(values = c("green4", "red3"), guide = "none") +
xlim(c(0, 5)) +
labs(title = "Hazard ratio for various clinical findings in PBC", y = NULL,
x = "Hazard ratio estimate (95% Confidence Intervals)") +
theme(axis.text.y = element_text(hjust = 0, size = 18))
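The arithmetic behind the plotted intervals can be checked in isolation. The following is a minimal sketch, not part of the original answer, and it assumes an illustrative model on the lung data from the survival package (the model terms are my choice, not the asker's). It builds the same estimate/lower/upper columns by hand: the Cox coefficients are log hazard ratios, so the Wald interval is computed on the log scale and exponentiated afterwards.

```r
library(survival)

# Illustrative model (an assumption; swap in your own coxph fit)
fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)

beta <- coef(fit)              # log hazard ratios
se   <- sqrt(diag(vcov(fit)))  # standard errors on the log scale

# Exponentiate the estimate and the Wald limits to get hazard ratios
hr <- data.frame(
  term      = names(beta),
  estimate  = exp(beta),
  lower     = exp(beta - 1.96 * se),
  upper     = exp(beta + 1.96 * se),
  row.names = NULL
)
print(hr)
```

The resulting data frame can be passed to the ggplot call above in place of the tidy() and mutate() steps. As far as I know, broom's tidy() for coxph objects also accepts exponentiate = TRUE and conf.int = TRUE, which should give an equivalent table in one call.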

Related

How to remove variables in plot_summs?

I use plot_summs to plot my regression coefficients. Below is a reproducible sample. I want to do two things. First, I want to remove Frost and Murder from the graph. Second, I want to change the color of Illiteracy and Population to green and of Life Exp and HS Grad to red. I'd appreciate any help I can get here.
fit1 <- lm(Income ~ Frost + Illiteracy + Murder +
Population + Area + `Life Exp` + `HS Grad`,
data = states, weights = runif(50, 0.1, 3))
plot_summs(fit1, scale = TRUE)
Without using the jtools package:
If I'm not mistaken, plot_summs with scale = TRUE scales the independent variables and plots the coefficient estimates, with each estimate (mean) as a point and ±2 SE as segments.
states2 <- states
states2[,-1] <- scale(states2[,-1]) # Considering first column is Income
fit2 <- lm(Income ~ ., data = states2, weights = runif(50, 0.1, 3))
df <- as.data.frame(summary(fit2)[["coefficients"]][-1,1:2])
df$variable <- rownames(df)
df <- df[!df$variable %in% c("Frost", "Murder"), ]
library(ggplot2)
ggplot(df) +
geom_point(aes(x = variable, y = Estimate,
color = variable), size = 6) +
geom_segment(aes(x = variable, xend = variable,
y = Estimate - (2 * `Std. Error`),
yend = Estimate + (2 * `Std. Error`),
color = variable), lwd = 2) +
scale_color_manual(values = c("Illiteracy" = "green","Population" = "green",
"Area" = "blue",
"`Life Exp`" = "red", "`HS Grad`" = "red")) +
coord_flip() +
theme_classic()

Fit and plot a Weibull model to a survival data

I want to achieve the exact same thing asked in this question:
How to plot the survival curve generated by survreg (package survival of R)?
Except for the fact that I don't want the data to be stratified by a variable (in the question above it was stratified by sex).
I just want the progression free survival for the whole group of treated patients.
So when I copy the code from the other question, here is where I get stuck:
library(survminer)
library(tidyr)
s <- with(lung,Surv(time,status))
fKM <- survfit(s ~ sex,data=lung)
sWei <- survreg(s ~ as.factor(sex),dist='weibull',data=lung) # in my case here I would replace as.factor(sex) by 1
pred.sex1 = predict(sWei, newdata=list(sex=1),type="quantile",p=seq(.01,.99,by=.01)) #Since I don't want to stratify, what do I do with these 2 lines of code?
pred.sex2 = predict(sWei, newdata=list(sex=2),type="quantile",p=seq(.01,.99,by=.01))
df = data.frame(y=seq(.99,.01,by=-.01), sex1=pred.sex1, sex2=pred.sex2)
df_long = gather(df, key= "sex", value="time", -y)
p = ggsurvplot(fKM, data = lung, risk.table = T)
p$plot = p$plot + geom_line(data=df_long, aes(x=time, y=y, group=sex))
I tried replacing as.factor(sex) by 1 and then the rest of the code just does not make sense, can someone help me with this?
Many thanks in advance!
If you just want to plot the overall empirical survival curve, you might do something like this:
library(survival)
library(survminer)
library(tidyr)
s <- with(lung, Surv(time, status))
fKM <- survfit(s ~ 1, data = survival::lung)
ggsurvplot(fKM, ggtheme = theme_bw())
However, if you want to fit a Weibull model with no predictors, then your formula is fine.
sWei <- survreg(s ~ 1, dist = 'weibull', data = lung)
probs <- seq(0.01, 1, by = 0.01)
time <- predict(sWei, type = "quantile", se = TRUE, p = probs)
The only problem is that time is now a named list of two matrices: fit and se.fit. Both have the same number of rows as lung, but all rows are identical, so we just take one from each and calculate the confidence interval in a data frame which we can then use to create a ggplot:
ggplot(data = data.frame(p = 1 - probs,
time = time$fit[1,],
upper = time$fit[1,] + 1.96 * time$se.fit[1,],
lower = time$fit[1,] - 1.96 * time$se.fit[1,])) +
geom_step(aes(p, time, colour = "All"), size = 1) +
geom_ribbon(aes(p, ymin = lower, ymax = upper, fill = "All"), alpha = 0.2) +
coord_flip(ylim = c(0, 1000)) +
scale_fill_discrete(name = "Strata") +
scale_color_discrete(name = "Strata") +
theme_bw() +
theme(legend.position = "top")
Which we can see looks like a pretty good fit.
If you want both in the same plot you can do something like:
df <- data.frame(p = 1 - probs,
time = time$fit[1,],
upper = time$fit[1,] + 1.96 * time$se.fit[1,],
lower = time$fit[1,] - 1.96 * time$se.fit[1,])
ggsurvplot(fKM, ggtheme = theme_bw())$plot +
geom_line(data = df, aes(time, p), linetype = 2, size = 1) +
geom_line(data = df, aes(upper, p), linetype = 2, size = 1) +
geom_line(data = df, aes(lower, p), linetype = 2, size = 1)
Created on 2020-08-18 by the reprex package (v0.3.0)
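As a cross-check on the quantile-based curve, the Weibull survival function can also be written in closed form from the fitted parameters. This is a sketch under the usual survreg parameterisation (intercept = log of the Weibull scale, sWei$scale = 1 / shape); the names shape, scale_w, weibull_surv and curve_df are my own and not from the answer above.

```r
library(survival)

# Intercept-only Weibull fit, as in the answer above
sWei <- survreg(Surv(time, status) ~ 1, dist = "weibull", data = lung)

# Convert survreg's parameters to the usual Weibull shape and scale
shape   <- 1 / sWei$scale
scale_w <- exp(unname(coef(sWei)))

# Weibull survival function S(t) = exp(-(t / scale)^shape)
weibull_surv <- function(t) exp(-(t / scale_w)^shape)

# Evaluate the fitted curve on a regular time grid
curve_df <- data.frame(time = seq(0, 1000, by = 10))
curve_df$surv <- weibull_surv(curve_df$time)
head(curve_df)
```

A data frame like curve_df could be added to the Kaplan-Meier plot with geom_line(data = curve_df, aes(time, surv)). This gives a smooth curve on a time grid of your choosing, but it does not carry the standard errors that predict(..., se = TRUE) returns.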

How to add weights parameter to Generalized Mixed Model

How do you add the weight of an observation to a Mixed Model?
I thought I could add the Freq column to wt argument, but apparently not.
using RDatasets, MixedModels
titanic = RDatasets.dataset("datasets", "Titanic")
titanic.surv_flg = titanic.Survived .== "Yes";
This runs:
MixedModels.fit(GeneralizedLinearMixedModel, @formula(surv_flg ~ 1 + Age + Sex + (1 | Class)), titanic, Bernoulli(), nAGQ = 2, fast = true)
But this doesn't:
MixedModels.fit(GeneralizedLinearMixedModel, @formula(surv_flg ~ 1 + Age * Sex + (1 | Class)), titanic, wt = Freq, Bernoulli(), nAGQ = 2, fast = true)
I found the answer on another forum.
The parameter should be wts, not wt.
So it should be:
MixedModels.fit(GeneralizedLinearMixedModel, @formula(surv_flg ~ 1 + Age * Sex + (1 | Class)), titanic, wts = Freq, Bernoulli(), nAGQ = 2)

Plotting Panel data Mixed Effect model with Random and Fixed models

I am working on panel data models and I am now using a mixed model from the lme4 package. I have also used models based on random effects, fixed effects, LSDV, first differences, etc.
I have a function that plots all model coefficients in ggplot. However, plotting the coefficients from lme4 is an issue that I can't make work.
Is there a way to make the code below work for all models, including the mixed model?
library(plm)
library(lme4)
library(ggplot2)
mixed <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
fixed = plm(Reaction ~ Days, data = sleepstudy, index = c("Subject", "Days"), model = "within")
random = plm(Reaction ~ Days, data = sleepstudy, index = c("Subject", "Days"), model = "random")
pool = plm(Reaction ~ Days, data = sleepstudy, index = c("Subject", "Days"), model = "pooling")
first_diff = plm(Reaction ~ Days, data = sleepstudy, index = c("Subject", "Days"), model = "fd")
# Function to extract point estimates
ce <- function(model.obj) {
extract <- summary(get(model.obj))$coefficients[2:nrow(summary(get(model.obj))$coefficients), 1:2]
return(data.frame(extract, vars = row.names(extract), model = model.obj))
}
# Run function on the three models and bind into single data frame
coefs <- do.call(rbind, sapply(paste0(list(
"fixed", "random", "pool", "first_diff"
)), ce, simplify = FALSE))
names(coefs)[2] <- "se"
gg_coef <- ggplot(coefs, aes(vars, Estimate)) +
geom_hline(yintercept = 0, lty = 1, lwd = 0.5, colour = "red") +
geom_errorbar(aes(ymin = Estimate - se, ymax = Estimate + se, colour = vars),
lwd = 1, width = 0
) +
geom_point(size = 3, aes(colour = vars)) +
facet_grid(model ~ ., scales="free") +
coord_flip() +
guides(colour = FALSE) +
labs(x = "Coefficient", y = "Value") +
ggtitle("Raw models coefficients")
gg_coef
The error in your current code arises because the coefficient table of the mixed model has only two rows:
data(sleepstudy)
mixed <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
coefficients(summary(mixed))
Estimate Std. Error t value
(Intercept) 251.40510 6.823773 36.842535
Days 10.46729 1.545958 6.770744
Days is numeric in the sleepstudy dataset and is used as a continuous predictor. Using your ce function, this returns an error because 2:nrow(..) selects a single row, so the result is simplified to a vector and the row names are dropped.
To get estimates similar to those of your other models, set Days to a factor and the random effect to (1 | Subject). I don't think (Days | Subject) makes sense.
sleepstudy$Days = factor(sleepstudy$Days)
mixed <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy)
We also alter your ce code slightly, using drop = FALSE to prevent the empty row.names:
ce <- function(model.obj) {
summ.model <- summary(get(model.obj))$coefficients
extract <- summ.model[2:nrow(summ.model), 1:2, drop = FALSE]
return(data.frame(extract, vars = row.names(extract), model = model.obj))
}
coefs <- do.call(rbind, sapply(paste0(list(
"fixed", "random", "pool", "first_diff","mixed"
)), ce, simplify = FALSE))
names(coefs)[2] <- "se"
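To see why drop = FALSE matters here, the following base-R sketch reproduces the situation on a small stand-in matrix. The values are rounded from the coefficient table printed above, and the names m, dropped and kept are purely illustrative.

```r
# A 2 x 3 coefficient-style matrix with row and column names
m <- matrix(c(251.4, 10.5, 6.8, 1.5, 36.8, 6.8), nrow = 2,
            dimnames = list(c("(Intercept)", "Days"),
                            c("Estimate", "Std. Error", "t value")))

# Selecting a single row simplifies the result to a named vector
dropped <- m[2:nrow(m), 1:2]

# With drop = FALSE the result stays a 1 x 2 matrix and keeps its row name
kept <- m[2:nrow(m), 1:2, drop = FALSE]

is.matrix(dropped)   # FALSE
row.names(dropped)   # NULL, so ce() has no term name to store in vars
row.names(kept)      # "Days"
```

Models with more than one non-intercept coefficient never hit this case, which is why the original ce function worked for the factor-based plm fits but not for the two-row mixed-model table.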
Then run the rest of what you have:
gg_coef <- ggplot(coefs, aes(vars, Estimate)) +
geom_hline(yintercept = 0, lty = 1, lwd = 0.5, colour = "red") +
geom_errorbar(aes(ymin = Estimate - se, ymax = Estimate + se, colour = vars),
lwd = 1, width = 0
) +
geom_point(size = 3, aes(colour = vars)) +
facet_grid(model ~ ., scales="free") +
coord_flip() +
guides(colour = FALSE) +
labs(x = "Coefficient", y = "Value") +
ggtitle("Raw models coefficients")
gg_coef

Line plot of mixed models / lsmeans results (with ggplot?)

I have longitudinal repeated measures on individuals over 4 timepoints. Following a mixed models analysis with time as fixed effect and random slopes I have used lsmeans to estimate the mean values at each time point as well as 95% confidence intervals. I would now like to plot a line graph with time points (x) and mean values of my outcome variable (y) with the CIs. Can I use e.g. ggplot to plot the results that I got from lsmeans? Or is there another smart way to plot this?
The results that I get from lsmeans, and that I would like to plot (lsmean, lower.CL, upper.CL over time), are:
$lsmeans
time lsmean SE df lower.CL upper.CL
0 21.967213 0.5374422 60 20.892169 23.04226
1 16.069586 0.8392904 60 14.390755 17.74842
2 13.486802 0.8335159 60 11.819522 15.15408
3 9.495137 0.9854642 60 7.523915 11.46636
Confidence level used: 0.95
Is this what you meant?
# To convert from lsmeans output (d <- lsmeans(parameters))
# Keep the time column as well, since the plot maps it to the x axis
d <- summary(d)$lsmeans[c("time", "lsmean", "lower.CL", "upper.CL")]
library(ggplot2)
ggplot(d, aes(time)) +
geom_line(aes(y = lsmean)) +
geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL),
width = 0.2) +
geom_point(aes(y = lsmean), size = 3,
shape = 21, fill = "white") +
labs(x = "Time", y = "ls mean",
title = "ls mean result over time") +
theme_bw()
To summarize, the whole code that will give you the estimates and plot of the mixed model is:
## random slope model
summary(model <- lme(outcome ~ time, random = ~1+time|ID, data = data,
na.action = na.exclude, method = "ML"))
## pairwise comparisons of timepoints
install.packages("lsmeans")
library(lsmeans)
lsmeans(model, pairwise~time, adjust="tukey")
### Draw the picture
d <- summary(lsmeans(model, ~time))
library(ggplot2)
ggplot(d, aes(time)) +
geom_line(aes(y = lsmean, group = 1)) +
geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), width = 0.2) +
geom_point(aes(y = lsmean), size = 3, shape = 21, fill = "white") +
labs(x = "Time", y = "ls mean", title = "ls mean result over time") +
theme_bw()
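Because the model and data are not available here, a self-contained variant can be sketched by typing the lsmeans table from the question into a data frame. The numbers are copied from the output above. Treating time as numeric and drawing the interval as a ribbon rather than error bars are my own choices, not something the answer prescribes.

```r
library(ggplot2)

# lsmeans table copied from the question
d <- data.frame(
  time     = 0:3,
  lsmean   = c(21.967213, 16.069586, 13.486802, 9.495137),
  lower.CL = c(20.892169, 14.390755, 11.819522, 7.523915),
  upper.CL = c(23.04226, 17.74842, 15.15408, 11.46636)
)

# Line for the means, shaded band for the 95% confidence limits
p <- ggplot(d, aes(time, lsmean)) +
  geom_ribbon(aes(ymin = lower.CL, ymax = upper.CL), alpha = 0.2) +
  geom_line() +
  geom_point(size = 3, shape = 21, fill = "white") +
  labs(x = "Time", y = "LS mean (95% CI)") +
  theme_bw()
p
```

With time stored as a number, geom_line needs no group = 1. If time is a factor, as it would be coming out of lsmeans for a categorical time effect, keep group = 1 as in the code above.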
