I'm trying to plot a Kaplan-Meier survival plot in R, but I'm having some trouble.
I'm quite new to R, so forgive my terrible code.
library(survival)
data_time = c(0.19,0.75,0.27,0.26,0.22,0.91,0.21,0.091,0.19,0.37,0.093,0.92,0.046,0.93,042)
data_event = c(1,1,1,1,0,0,1,1,0,0,0,1,1,1,0)
surv_object = Surv(time = data_time, event = data_event)
survfit(surv_object)
This of course gives me an error: "The survfit function requires a formula as its first argument".
I've split the data into two vectors: the first holds the lifetimes, and the second indicates whether each data point was censored, with 0 meaning not censored and 1 meaning censored.
I thought the Surv function was supposed to produce the formula required for the survfit function, with the default being the Kaplan-Meier.
The survfit function, as the name suggests, serves to fit a survival model, i.e. predicting survival based on some variables. The "formula" is the non-linear y = f(x) model that is fitted, expressed as Surv(...) ~ x1 + ... + xn.
However, it is definitely possible to do a Kaplan-Meier survival plot without any predictors. Just fitting the model on a constant (i.e. 1) should do the trick. Then, I like to use the ggsurvplot function from the survminer package.
install.packages("survminer")
library(survminer)
library(survival)
data_time = c(0.19,0.75,0.27,0.26,0.22,0.91,0.21,0.091,0.19,0.37,0.093,0.92,0.046,0.93,0.42)
data_event = c(1,1,1,1,0,0,1,1,0,0,0,1,1,1,0)
surv_object = Surv(time = data_time, event = data_event)
# Regress on a constant
fit <- survfit(surv_object ~ 1)
# Plot the fit
ggsurvplot(fit, data.frame(time=data_time, event=data_event), conf.int=FALSE)
Of course, the plot will be a lot more interesting if you're fitting some strata.
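As a rough illustration of fitting strata, here is a minimal sketch that uses the lung data set shipped with the survival package (not the data from the question) and base graphics rather than ggsurvplot, purely to keep dependencies small. The choice of sex as the grouping variable is an assumption made for the example.

```r
library(survival)

# One Kaplan-Meier curve per level of `sex` (lung ships with survival;
# its status column is coded 1 = censored, 2 = dead, which Surv() accepts)
fit_by_sex <- survfit(Surv(time, status) ~ sex, data = lung)

# Base-graphics plot with one line type per stratum
plot(fit_by_sex, lty = 1:2, xlab = "Days", ylab = "Survival probability")
legend("topright", legend = names(fit_by_sex$strata), lty = 1:2)
```

The same fit object can also be passed to ggsurvplot(fit_by_sex, data = lung) if you prefer the survminer look.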
Note: I assume you missed a decimal point in the last time value (042), and changed it to 0.42.
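One more thing that may be worth checking: Surv() treats event = 1 as "the event occurred" and event = 0 as "censored". If the vector really is coded the other way round, as the question describes (1 = censored), it would need to be flipped first. A minimal sketch under that assumption (the name `censored` is introduced here for illustration only):

```r
library(survival)

data_time <- c(0.19, 0.75, 0.27, 0.26, 0.22, 0.91, 0.21, 0.091,
               0.19, 0.37, 0.093, 0.92, 0.046, 0.93, 0.42)
# Assumption: 1 = censored, 0 = event observed, as stated in the question
censored <- c(1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0)

# Surv() expects 1 = event, 0 = censored, so flip the indicator
surv_object <- Surv(time = data_time, event = 1 - censored)
fit <- survfit(surv_object ~ 1)
summary(fit)
```

If the original coding already follows the Surv() convention, this step is unnecessary.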
I have fitted a lme model in R with a logit transformed response. I have not been able to find a direct command that does the logit transformation so I have done it manually.
logitr <- log(r/(1-r))
I then use this as response in my lme model with interaction between two factors and a numerical variable.
model <- lme(logitr ~ factor1*factor2*numeric, random = ~1|random)
Now, R obviously does not know that the response in this model is logit transformed. How can I specify this to R?
I have without luck tried:
update(model, tran="logit")
The reason why I want to specify that the model is logit transformed is because I want to plot the backtransformed results using the function emmip in the emmeans package, showing the trends of the interaction between my variables.
Normally (if I only had factors) I would just use:
update_refgrid_model<-update(ref_grid(model, tran="logit"))
But this approach does not work when I want to use emmip to plot the trends of the interaction between a numerical variable and factors. If I specify:
emmip(update_refgrid_model, factor1~numeric|factor2, cov.reduce = range, type = "response")
then I do not get any trends plotted, only the estimate for the average level on the numerical variable.
So, how can I specify the logit transformation and plot the backtransformed trends of a lme model with factors interacting with numerical variables?
You don't update the model object, you update the reference grid:
rg = update(ref_grid(model, cov.reduce = range), tran = "logit")
emmip(rg, factor1~numeric|factor2, type = "response")
It is possible to update a model with other things, just not the transformation; that is in the update method for emmGrid objects.
Update
Here's an example showing how it works
require(emmeans)
## Loading required package: emmeans
foo = transform(fiber, p = (strength - 25)/25)
foo.lm = lm(log(p/(1-p)) ~ machine*diameter, data = foo)
emm = emmeans(foo.lm, ~diameter|machine,
              tran = "logit", at = list(diameter = 15:32))
## Warning in ref_grid(object, ...): There are unevaluated constants in the response formula
## Auto-detection of the response transformation may be incorrect
emmip(emm, machine ~ diameter)
emmip(emm, machine ~ diameter, type = "r")
Created on 2020-06-02 by the reprex package (v0.3.0)
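As a side note on the manual transformation in the question: base R does provide the logit and its inverse directly, as qlogis() and plogis() in the stats package. The short sketch below (with made-up proportions) checks them against the hand-written formula. Note that the parentheses around 1 - r matter, since / binds more tightly than -.

```r
# Made-up proportions strictly between 0 and 1, for illustration only
r <- c(0.1, 0.25, 0.5, 0.9)

logit_manual  <- log(r / (1 - r))   # hand-written logit
logit_builtin <- qlogis(r)          # base R logit
back          <- plogis(logit_builtin)  # inverse logit, back to proportions

all.equal(logit_manual, logit_builtin)
all.equal(back, r)
```

This only covers the transformation itself; telling emmeans about it still goes through the reference grid as shown above.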
How can I plot predicted survival curves for a continuous covariate (let's say at the 20th and 80th percentiles of its values) using the corrected group prognosis method as implemented in R by Therneau?
For example,
library(survival)
library(survminer)
fit <- coxph( Surv(stop, event) ~ size + strata(rx), data = bladder )
ggadjustedcurves(fit, data=bladder, method = "conditional", strata=rx)
Now, this is useful because it gives me two survival curves stratified by rx (either 0 or 1), with the conditional method applied to the bladder data set. However, let's say I would like to use the marginal method without stratifying, and instead plot curves for my continuous covariate at its 20th and 80th percentiles while also re-balancing the subpopulation. I would appreciate any step in the right direction.
To restate: I have a Cox model with continuous predictors. I would like to build a Cox model that does not stratify on rx but keeps it in the model as a covariate. Then I want to pass the resulting Cox object to the ggadjustedcurves() function, which uses "subpopulation re-balancing" when given a reference data set. Finally, instead of showing two survival curves stratified on a categorical variable, I want to plot two representative survival curves at the 20th and 80th percentiles.
EDIT
My first attempt
fit2 <- coxph( Surv(stop, event) ~ size + rx, data = bladder ) #remove strata
fit2
# CGP
pred<- data.frame("rx" = 1, "size" = 3.2)
ggadjustedcurves(fit2, data = pred , method = "conditional", reference = bladder)
Is this what I think it is? Conditional re-balancing has been applied to the reference data set and then the predicted curves are generated for an individual with rx=1 and size of 3.2.
It is difficult to understand what you are truly looking for, but I think I have a rough idea. I think you want to plot the survival curve that would have been observed if every person in your sample had received a specific value for the continuous covariate. If there is no confounding, you can simply use a Cox model that includes only the continuous covariate and use the predict() function for a range of points in time and plot the results. If you need to adjust for confounding, you can include the confounders in the Cox model and use g-computation to obtain the desired probabilities. I describe this in a recent preprint: https://arxiv.org/pdf/2208.04644.pdf
This can be done in R using the contsurvplot package (also developed by me). First, install the package using:
devtools::install_github("RobinDenz1/contsurvplot")
Afterwards, fit your Cox model, but use x=TRUE in the coxph call:
library(survival)
library(contsurvplot)
library(riskRegression)
library(ggplot2)
fit2 <- coxph(Surv(stop, event) ~ size + rx, data=bladder, x=TRUE)
You can now call the plot_surv_lines function to obtain the causal survival curves for specific values of size, given the model. Using the horizon argument you can tell the function for which values you want to plot the survival curves. I choose the 20% and 80% quantile of size as you described:
plot_surv_lines(time="stop",
                status="event",
                variable="size",
                data=bladder,
                model=fit2,
                horizon=quantile(bladder$size, probs=c(0.2, 0.8)))
The package contains a lot more plotting routines to visualize the causal effect of a continuous variable on a time-to-event outcome that might be more suitable for what you actually want.
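For readers who want to see what the g-computation step described above roughly amounts to, here is a minimal hand-rolled sketch using only the survival package. It is not the contsurvplot implementation, just an illustration under the same model: set size to a fixed value for every row, predict each row's survival curve from the Cox model, and average the curves. The helper name adjusted_curve is made up for this example.

```r
library(survival)

fit2 <- coxph(Surv(stop, event) ~ size + rx, data = bladder)
size_values <- quantile(bladder$size, probs = c(0.2, 0.8))

# Hypothetical helper: average predicted survival if everyone had size = value
adjusted_curve <- function(value) {
  newdata <- bladder
  newdata$size <- value                     # everyone gets the same size
  sf <- survfit(fit2, newdata = newdata)    # one predicted curve per row
  data.frame(time = sf$time, surv = rowMeans(sf$surv), size = value)
}

curves <- do.call(rbind, lapply(size_values, adjusted_curve))

# Plot one averaged step curve per chosen value of size
sizes <- unique(curves$size)
plot(NULL, xlim = c(0, max(curves$time)), ylim = c(0, 1),
     xlab = "Time", ylab = "Adjusted survival probability")
for (i in seq_along(sizes)) {
  d <- curves[curves$size == sizes[i], ]
  lines(d$time, d$surv, type = "s", lty = i)
}
legend("bottomleft", legend = paste("size =", sizes), lty = seq_along(sizes))
```

This sketch gives point estimates only; it makes no attempt at confidence intervals.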
I've plotted the response curves for each of my predictors against all of the predicted values to determine how each predictor influences my counts. However, I also want to plot the binary part of my zero-inflated model to see how the predictors in that part help explain the probability of false zeroes. I am trying to get a plot similar to the one at the bottom of the page linked below; however, that example doesn't provide reproducible code.
https://fukamilab.github.io/BIO202/04-C-zero-data.html#sketch_fitted_and_predicted_values
I've included some code below where I have my zero-inflated model and the predictors used. I then use the predict function to predict the estimates for a much larger raster grid (new.data) and I want to see the response between those predicted values and the predictors I use across the entire raster grid.
library(pscl)     # zeroinfl()
library(ggplot2)
mod1 = zeroinfl(Response~x1+x2|x1, link="logit", data=data,
                dist="negbin")
modpred = predict(mod1, new.data, se.fit=TRUE, type = "response")
response1 <- ggplot(data, aes(x = x1, y = modpred)) + geom_point() +
  geom_smooth(data = data, aes(x = x1, y = modpred))
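One possible way to look at the binary part, sketched here on the bioChemists data set that ships with pscl rather than on the data from the question: predict() for zeroinfl models accepts type = "zero", which returns the predicted probability of a structural zero from the zero-inflation component. The choice of predictors and the grid below are assumptions made for the example.

```r
library(pscl)
library(ggplot2)

data("bioChemists", package = "pscl")

# Count part: fem + ment; zero-inflation part: ment only (illustrative choice)
zi_fit <- zeroinfl(art ~ fem + ment | ment, data = bioChemists,
                   dist = "negbin")

# Grid over the zero-part predictor; fem is held at its first level
grid <- data.frame(
  ment = seq(min(bioChemists$ment), max(bioChemists$ment), length.out = 100),
  fem  = factor(levels(bioChemists$fem)[1], levels = levels(bioChemists$fem))
)

# Probability of a structural ("false") zero from the binary part
grid$p_zero <- predict(zi_fit, newdata = grid, type = "zero")

ggplot(grid, aes(x = ment, y = p_zero)) +
  geom_line() +
  labs(x = "ment", y = "Predicted probability of a structural zero")
```

With the model from the question, the analogous call would use a grid over x1, assuming new.data is a data frame containing that column.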
I'm trying to plot the curve resulting from fitting a non-linear mixed model. It should look something like a normal distribution curve, but skewed to the right. I followed previous links here and here, but with my own data I cannot get it to work because of several difficulties (see below).
Here is the dataset
and code
s=read.csv("GRVMAX tadpoles.csv")
t=s[s$SPP== levels(s$SPP)[1],]
head(t)
vmax=t[t$PERFOR=="VMAX",]
colnames(vmax)[6]="vmax"
vmax$TEM=as.numeric(as.character(vmax$TEM));
require(lme4)
start =c(TEM=25)
is.numeric(start)
nm1 <- nlmer ( vmax ~ deriv(TEM)~TEM|INDIVIDUO,nlpars=start, nAGQ =0,data= vmax)# this gives an error suggesting nlpars is not numeric, despite start is numeric...:~/
After that, I want to plot the curve over the original data
with(vmax,plot(vmax ~ (TEM)))
x=vmax$TEM
lines(x, predict(nm1, newdata = data.frame(TEM = x, INDIVIDUO = "ACI5")))
Any hint?
Thanks in advance
I am using random-forest for a regression problem to predict the label values of Test-Y for a given set of Test-X (new values of features). The model has been trained over a given Train-X (features) and Train-Y (labels). "randomForest" of R serves me very well in predicting the numerical values of Test-Y. But this is not all I want.
Instead of only a number, I want to use random forest to produce a probability density function. I searched for a solution for several days, and here is what I have found so far:
"randomForest" doesn't produce probabilities for regression, only for classification (via "predict" with type="prob").
Using "quantregForest" provides a nice way to make and visualize prediction intervals. But still not the probability density function!
Any other thoughts on this?
Please see the predict.all parameter of the predict.randomForest function.
library("ggplot2")
library("randomForest")
data(mpg)
rf = randomForest(cty ~ displ + cyl + trans, data = mpg)
# Predict the first car in the dataset
pred = predict(rf, newdata = mpg[1, ], predict.all = TRUE)
hist(pred$individual)
The histogram of 500 "elementary" predictions looks like this:
You can also use quantregForest with a very fine grid of quantiles, convert them into a cumulative distribution function (cdf) with the R function ecdf, and turn this cdf into a density estimate with a kernel density estimator.
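A minimal sketch of that idea, assuming the quantregForest package is installed and reusing the mpg data from above with numeric predictors only. The grid spacing and the default bandwidth of density() are arbitrary choices for illustration.

```r
library(quantregForest)
library(ggplot2)  # only for the mpg data set

set.seed(1)
x <- as.data.frame(mpg[, c("displ", "cyl")])
y <- mpg$cty

qrf <- quantregForest(x = x, y = y)

# Very fine grid of quantiles for the first car in the data set
probs <- seq(0.005, 0.995, by = 0.005)
q <- as.numeric(predict(qrf, newdata = x[1, , drop = FALSE], what = probs))

cdf  <- ecdf(q)      # empirical cdf built from the predicted quantiles
dens <- density(q)   # kernel density estimate on the same values

plot(dens, main = "Estimated predictive density for the first car")
```

Because the quantiles are taken on an evenly spaced probability grid, treating them as a sample for ecdf() and density() approximates the predictive distribution; a finer grid gives a smoother result.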