I have a data frame made up of individuals in different treatment groups, with a 1 if they survived and a 0 if they died, and a third column indicating which dish they were in. I fitted a glmer model using the lme4 package, with Dish_ID as the random effect. I have a piece of base-plot code that plots the mortality rate against treatment group using the fitted line from my glmer model. How can I produce the same observed vs. fitted plot in ggplot? I have tried looking online but can't seem to find an answer that explains the process.
I want to get the line from my binomial model (manually, not using geom_smooth) and then plot my observed points in red in ggplot2. Thanks for the help.
library(tidyverse)
library(ggplot2)
library(lme4)
mortality_data$Dish_ID <- as.factor(mortality_data$Dish_ID)
mortality_model <- glmer(Survived ~ Treatment + (1|Dish_ID), data = mortality_data, family = "binomial")
summary(mortality_model)
plot(mortality_data$Treatment, 1 - fitted(mortality_model), ylim = c(0,1))
plot(mortality_data$Treatment, 1 - fitted(mortality_model), ylim = c(0,1), type = "l", xlab = "Concentration of Cu2SO4", ylab = "Mortality rate")
tv <- unique(mortality_data$Treatment)
#observed in red
for (i in tv) {
points(i, y = 1 - mean(mortality_data$Survived[mortality_data$Treatment == i]), col = "red")
}
My data frame looks something like this, if it is of any use. There are 540 individuals, 90 for each treatment group.
Treatment Survived Dish_ID
0.05 1 Dish_1
0.04 0 Dish_3
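One possible way to build the same plot in ggplot2 is sketched below. It is a minimal sketch, not a definitive answer: the data are simulated stand-ins for mortality_data (the dish layout and effect sizes are assumptions), and the line is the population-level prediction from predict() with re.form = NA, which ignores the dish effects. That differs from fitted(), which includes them.

```r
library(lme4)
library(ggplot2)

set.seed(1)
# Simulated stand-in for mortality_data: 6 concentrations x 90 individuals,
# spread over 3 dishes per concentration (hypothetical layout)
sim <- expand.grid(rep = 1:30, dish = 1:3,
                   Treatment = c(0, 0.01, 0.02, 0.03, 0.04, 0.05))
sim$Dish_ID <- factor(paste("Dish", sim$Treatment, sim$dish, sep = "_"))
dish_effect <- rnorm(nlevels(sim$Dish_ID), sd = 0.5)
eta <- 2 - 60 * sim$Treatment + dish_effect[as.integer(sim$Dish_ID)]
sim$Survived <- rbinom(nrow(sim), 1, plogis(eta))

fit <- glmer(Survived ~ Treatment + (1 | Dish_ID), data = sim, family = binomial)

# Model line: predicted mortality on a fine grid, random effects set to zero
grid <- data.frame(Treatment = seq(min(sim$Treatment), max(sim$Treatment),
                                   length.out = 100))
grid$mortality <- 1 - predict(fit, newdata = grid, re.form = NA, type = "response")

# Observed mortality rate per treatment group
observed <- aggregate(Survived ~ Treatment, data = sim, FUN = mean)
observed$mortality <- 1 - observed$Survived

p <- ggplot() +
  geom_line(data = grid, aes(x = Treatment, y = mortality)) +
  geom_point(data = observed, aes(x = Treatment, y = mortality), colour = "red") +
  ylim(0, 1) +
  labs(x = "Concentration", y = "Mortality rate")
print(p)
```

Keeping the model line and the observed points in two separate data frames, each passed to its own layer, avoids having to merge them into one.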
I am using R. I am following this tutorial over here (https://rviews.rstudio.com/2017/09/25/survival-analysis-with-r/ ) and I am trying to adapt the code for a similar problem.
In this tutorial, a statistical model is fitted to a dataset and then used to predict 3 new observations. We then plot the results for these 3 observations:
#load libraries
library(survival)
library(dplyr)
library(ranger)
library(data.table)
library(ggplot2)
#use the built in "lung" data set
#remove missing values (dataset is called "a")
a = na.omit(lung)
#create id variable
a$ID <- seq_along(a[,1])
#create test set with only the first 3 rows
new = a[1:3,]
#create a training set by removing first three rows
a = a[-c(1:3),]
#fit survival model (random survival forest)
r_fit <- ranger(Surv(time,status) ~ age + sex + ph.ecog + ph.karno + pat.karno + meal.cal + wt.loss, data = a, mtry = 4, importance = "permutation", splitrule = "extratrees", verbose = TRUE)
#create new intermediate variables required for the survival curves
death_times <- r_fit$unique.death.times
surv_prob <-data.frame(r_fit$survival)
avg_prob <- sapply(surv_prob, mean)
#use survival model to produce estimated survival curves for the first three observations
pred <- predict(r_fit, new, type = 'response')$survival
pred <- data.table(pred)
colnames(pred) <- as.character(r_fit$unique.death.times)
#plot the results for these 3 patients
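#use the predictions for the 3 held-out patients (pred), not r_fit$survival, which holds the training rows
plot(r_fit$unique.death.times, unlist(pred[1,]), type = "l", col = "red", ylim = c(0, 1))
lines(r_fit$unique.death.times, unlist(pred[2,]), type = "l", col = "green")
lines(r_fit$unique.death.times, unlist(pred[3,]), type = "l", col = "blue")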
From here, I would like to try to add a confidence interval (confidence region) to each of these 3 curves, so that they look something like this:
I found a previous stackoverflow post (survfit() Shade 95% confidence interval survival plot ) that shows how to do something similar, but I am not sure how to extend the results from this post to each individual observation.
Does anyone know if there is a direct way to add these confidence intervals?
Thanks
If you create your plot using ggplot, you can use the geom_ribbon function to draw confidence intervals as follows:
ggplot(data=...)+
geom_line(aes(x=..., y=...),color=...)+
geom_ribbon(aes(x=.. ,ymin =.., ymax =..), fill=.. , alpha =.. )+
geom_line(aes(x=..., y=...),color=...)+
geom_ribbon(aes(x=.. ,ymin =.., ymax =..), fill=.. , alpha =.. )
You can put + after geom_line and repeat the same steps for each observation.
You can also check:
Having trouble plotting multiple data sets and their confidence intervals on the same GGplot. Data Frame included and
https://bookdown.org/ripberjt/labbook/appendix-guide-to-data-visualization.html
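As a concrete version of the template above, here is a minimal sketch with made-up curves. The survival values and the +/- 0.1 band are placeholders only (they are not confidence intervals from ranger or any other model); in practice lower and upper would come from whatever interval estimate you have. With the data in long format and the observation id mapped to colour and fill, one geom_line and one geom_ribbon cover all three curves, so the layers do not need to be repeated per observation.

```r
library(ggplot2)

# Made-up survival curves for 3 observations (illustration only)
times <- seq(0, 1000, by = 50)
curves <- do.call(rbind, lapply(1:3, function(id) {
  surv <- exp(-times * id / 1500)
  data.frame(id = factor(id), time = times, surv = surv,
             lower = pmax(surv - 0.1, 0),   # placeholder band, not a real CI
             upper = pmin(surv + 0.1, 1))
}))

p <- ggplot(curves, aes(x = time, y = surv, colour = id, fill = id)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, colour = NA) +
  geom_line() +
  labs(x = "Time", y = "Survival probability")
print(p)
```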
I am using the R programming language. I am trying to follow this tutorial: https://rdrr.io/cran/randomForestSRC/man/plot.competing.risk.rfsrc.html
This tutorial shows how to use the "survival random forest" algorithm, an algorithm used to analyze survival data. In this example, the "follic" data set is used, and the survival random forest algorithm is used to analyze the instantaneous hazard of an observation experiencing "status 1" vs "status 2" (this is called "competing risks").
In the code below, the survival random forest model is trained on the follic data set using all observations except the last two observations. Then, this model is used to predict the hazards of the last two observations:
#load library
library(randomForestSRC)
#load data
data(follic, package = "randomForestSRC")
#train model on all observations except the last 2 observations
follic.obj <- rfsrc(Surv(time, status) ~ ., follic[c(1:539),], nsplit = 3, ntree = 100)
#use model to predict the last two observations
f <- predict(follic.obj, follic[540:541, ])
#plot individual curves - does not work
plot.competing.risk(f)
However, this seems to produce the average hazards for the last two observations experiencing "status 1 vs status 2".
Is there a way to plot the individual hazards of the first observation and the second observation?
Thanks
EDIT1:
I know how to do this for other functions in this package, e.g. here you can plot these curves for 7 observations at once:
data(veteran, package = "randomForestSRC")
plot.survival(rfsrc(Surv(time, status)~ ., veteran), cens.model = "rfsrc")
## pbc data
data(pbc, package = "randomForestSRC")
pbc.obj <- rfsrc(Surv(days, status) ~ ., pbc)
## use subset to focus on specific individuals
plot.survival(pbc.obj, subset = c(3, 10))
This example seems to show the predicted survival curves for 7 observations (plus the confidence intervals - the red line is the average) at once. But I still do not know how to do this for the "plot.competing.risk" function.
EDIT2:
I think there might be an indirect way to solve this - you can predict each observation individually:
#use model to predict the last two observations individually
f1 <- predict(follic.obj, follic[540, ])
f2 <- predict(follic.obj, follic[541, ])
#plot individual curves
plot.competing.risk(f1)
plot.competing.risk(f2)
But I was hoping there was a more straightforward way to do this. Does anyone know how?
One possible way is to modify the function plot.competing.risk so that it uses each individual's values instead of the group mean, and to loop over the individuals so that their lines are overlaid on the same plot, as shown below.
#use model to predict the last three observations
f <- predict(follic.obj, follic[539:541, ])
x <- f
par(mfrow = c(2, 2))
for (k in 1:3) { #k for type of plot
  for (i in 1:dim(x$chf)[1]) { #i for all individuals in x
    #cschf <- apply(x$chf, c(2, 3), mean, na.rm = TRUE) #original group mean
    cschf = x$chf[i,,] #individual values
    #cif <- apply(x$cif, c(2, 3), mean, na.rm = TRUE) #original group mean
    cif = x$cif[i,,] #individual values
    cpc <- do.call(cbind, lapply(1:ncol(cif), function(j) {
      cif[, j]/(1 - rowSums(cif[, -j, drop = FALSE]))
    }))
    if (k==1) {
      matx = cschf
      range = range(x$chf)
    }
    if (k==2) {
      matx = cif
      range = range(x$cif)
    }
    if (k==3) {
      matx = cpc
      range = c(0,1) #manually assign, for now
    }
    ylab = c("Cause-Specific CHF","Probability (%)","Probability (%)")[k]
    matplot(x$time.interest, matx, type='l', lty=1, lwd=3, col=1:2,
            add=ifelse(i==1,F,T), ylim=range, xlab="Time", ylab=ylab) #ADD tag for overlapping individual lines
  }
  legend <- paste(c("CSCHF","CIF","CPC")[k], 1:2, " ")
  legend("bottomright", legend = legend, col = (1:2), lty = 1, lwd = 3)
}
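The CPC step inside the loop can be checked by hand on a toy matrix. This is a small sketch with made-up cumulative incidence values (not taken from the follic data): for each event j, the cumulative incidence of j is divided by one minus the summed cumulative incidence of the other events.

```r
# Toy cumulative incidence for one individual: rows are time points,
# columns are the two competing events (made-up numbers)
cif <- matrix(c(0.10, 0.20, 0.30,
                0.05, 0.10, 0.20), ncol = 2)

# Same expression as in the loop above
cpc <- do.call(cbind, lapply(1:ncol(cif), function(j) {
  cif[, j] / (1 - rowSums(cif[, -j, drop = FALSE]))
}))
print(round(cpc, 3))
```

For example, at the last time point the first event gives 0.30 / (1 - 0.20) = 0.375. The drop = FALSE keeps cif[, -j] a matrix so that rowSums still works when there are only two events.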
My data are masses of offspring in kg, and a column of 1's and 0's to represent whether a mother was in her terminal year or not.
Chick Mass Terminal Effect
3.4 0
3.1 1
2.4 1
3.6 0
etc..
So I have a model fitted to assess whether mass (in kg) has an effect on mortality (binomial)
m10 <-glm(Terminal_Effect~chick_mass, data = cranesData, family = binomial(link="logit"))
summary(m10)
plot(cranesData$Terminal_Effect~cranesData$chick_mass, xlab = "Chick Mass (kg)", ylab = "Probability of Mother Death", pch = 19)
When I add the fitted line to this plot, multiple lines appear. Is there a way to change this to a single line?
Any help would be appreciated :)
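Sort your predictor values before plotting. lines() connects the points in the order they are given, so unsorted x values make the line double back on itself.
Using the iris dataset with some variation: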
set.seed(111)
dat = iris
dat$Species = as.numeric(dat$Species=="setosa")
dat$Petal.Width = dat$Petal.Width + rnorm(nrow(dat))
Order the data by the predictor, in this case Petal.Width:
dat = dat[order(dat$Petal.Width),]
fit = glm(Species ~ Petal.Width,data=dat,family="binomial")
plot(dat$Species ~ dat$Petal.Width)
lines(dat$Petal.Width,fit$fitted.values,col="blue")
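An alternative that avoids reordering the data is to predict on an evenly spaced grid of predictor values, which is already sorted and also gives a smoother curve. A minimal sketch using the same modified iris data (the grid size of 200 is an arbitrary choice):

```r
set.seed(111)
dat <- iris
dat$Species <- as.numeric(dat$Species == "setosa")
dat$Petal.Width <- dat$Petal.Width + rnorm(nrow(dat))

fit <- glm(Species ~ Petal.Width, data = dat, family = binomial)

# Evenly spaced (hence sorted) predictor values spanning the observed range
grid <- data.frame(Petal.Width = seq(min(dat$Petal.Width),
                                     max(dat$Petal.Width),
                                     length.out = 200))
grid$prob <- predict(fit, newdata = grid, type = "response")

plot(Species ~ Petal.Width, data = dat, pch = 19)
lines(grid$Petal.Width, grid$prob, col = "blue")
```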
I am trying to compare the survival in my study cohort with the survival in the Dutch general population (matched for age and sex). I created a rate table of the Dutch population.
library(relsurv)
setwd("")
nldpop <- transrate.hmd("mltper_1x1.txt","fltper_1x1.txt")
Then, I wanted to create a plot of the survival of my cohort (observed) and the survival of the population (expected) with age on the X-axis. However, the 'survexp' function does not seem to support a (start,stop,event)-format. Only with the normal (futime, event)-format it works, see below, but then I have follow-up time on the X-axis. Does anyone know how to get the age on the X-axis instead of follow-up time?
# Observed and expected survival with time on X-axis
fit <- survfit(Surv(futime, event)~1)
efit <- survexp(futime ~ 1, rmap = list(year=(date_entry), age=(age_entry), sex=(sex)),
ratetable=nldpop)
plot(fit)
lines(efit)
You didn't provide example data, so I used the survival::mgus data for this. Your problem may be due to incorrectly specified variable names in the rmap option.
library(relsurv)
library(dplyr) # needed for %>% and mutate()
nldpop <- transrate.hmd("mltper_1x1.txt", "fltper_1x1.txt")
mgus2 <- mgus %>% mutate(date_year = dxyr + 1900)
fit <- survfit(Surv(futime, death) ~ 1, data = mgus2)
efit <- survexp(Surv(futime, death) ~ 1, data = mgus2,
ratetable = nldpop, rmap = list(age = age*365.25, year = date_year, sex = sex))
plot(fit)
lines(efit)
I built a logistic regression model (called 'mylogit') using the glm function in R as follows:
mylogit <- glm(answer ~ as.factor(gender) + age, data = mydata, family = "binomial")
where age is numeric and gender is categorical (male and female).
I then proceeded to make predictions with the model built.
pred <- predict(mylogit, type = "response")
I can easily make a time series plot of the predictions by doing:
plot.ts(ts(pred))
This gives a plot of the predictions (figure: "Plot of Time against Predictions").
My question is this:
Is it possible to put the x axis in segments according to gender (male or female) which was specified in the glm? In other words, can I have predictions on the y axis and have gender (divided into male and female) on the x axis?
A sample of the data I want to plot from is similar to this:
I did:
bind = cbind(mydata, pred)
'bind' looks like this:
pred age gender
0.9461198 32 male
0.9463577 45 female
0.9461198 45 female
0.9461198 37 female
0.9477645 40 male
0.8304513 32 female
Check out #4 on this blog post, "4. How To Create Two Different X- or Y-axes".
My suggestion to you is that you look at some of the dedicated R plotting tools, like ggplot2.
I don't think you need to use ts and plot.ts because the data you have is not a time series, right? Just sort pred before plotting.
# Get data
str <- "pred,age,gender
0.9461198,32,male
0.9463577,45,female
0.9461198,45,female
0.9461198,37,female
0.9477645,40,male
0.8304513,32,female"
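# stringsAsFactors = TRUE so that gender is a factor and can be used as a colour index below
bind <- read.csv(textConnection(str), stringsAsFactors = TRUE)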
# Plot
bind <- bind[order(bind$gender),]
plot(bind$pred, col = bind$gender)
library(ggplot2)
ggplot(bind, aes(x = gender, y = pred)) +
geom_point(position = position_jitter(width = .3))
Or without creating bind you could do plot(pred[order(mydata$gender)]).
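If you would rather stay in base R, a strip chart also puts gender on the x axis and the predictions on the y axis. This is a minimal sketch using the six sample rows from the question; the red crosses marking the group means are an optional addition, not something the question asked for.

```r
bind <- read.csv(text = "pred,age,gender
0.9461198,32,male
0.9463577,45,female
0.9461198,45,female
0.9461198,37,female
0.9477645,40,male
0.8304513,32,female", stringsAsFactors = TRUE)

# One column of jittered points per gender level
stripchart(pred ~ gender, data = bind, vertical = TRUE, method = "jitter",
           pch = 19, xlab = "Gender", ylab = "Predicted probability")

# Mark the mean prediction for each gender
means <- tapply(bind$pred, bind$gender, mean)
points(seq_along(means), means, pch = 4, col = "red", cex = 2)
```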