Calculate the Survival prediction using Cox Proportional Hazard model in R

I'm trying to calculate the Survival prediction using Cox Proportional Hazard model in R.
library(survival)
data(lung)
model<-coxph(Surv(time,status ==2)~age + sex + ph.karno + wt.loss, data=lung)
predict(model, data=lung, type ="expected")
When I use the above code, I get the predicted cumulative hazard, corresponding to the formula
$$\hat{h}_i(t) = \hat{h}_0(t)\exp(x_i'\hat{\beta})$$
But what I actually want is the predicted survival, corresponding to the formula
$$\hat{S}_i(t) = \hat{S}_0(t)^{\exp(x_i'\hat{\beta})}$$
How do I predict the Survival in R?
Thanks in Advance.

You can use either predict or survfit. With predict you need to give the newdata argument a list with values for all the variables in the model:
predict(model,
newdata=list(time=100,status=1,age=60,sex=1, ph.karno=60,wt.loss=15),
type ="expected")
[1] 0.2007497
There's a plot method for survfit objects:
?survfit.coxph
png(); plot(survfit(model)); dev.off()
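Note that type = "expected" returns the estimated cumulative hazard at the supplied time, not the survival probability. A minimal sketch of converting it, assuming the usual relation S(t) = exp(-H(t)); the names new_patient, cum_haz, surv_from_predict and surv_from_survfit are made up for illustration:

```r
library(survival)

model <- coxph(Surv(time, status == 2) ~ age + sex + ph.karno + wt.loss,
               data = lung)

# Hypothetical patient; time and status are needed by type = "expected"
new_patient <- data.frame(time = 100, status = 1, age = 60, sex = 1,
                          ph.karno = 60, wt.loss = 15)

# Estimated cumulative hazard H(100 | x) for this patient
cum_haz <- predict(model, newdata = new_patient, type = "expected")

# Survival probability via S(t) = exp(-H(t))
surv_from_predict <- exp(-cum_haz)

# Cross-check: read the survival curve from survfit() at t = 100
sf <- survfit(model, newdata = new_patient)
surv_from_survfit <- summary(sf, times = 100)$surv

c(predict = unname(surv_from_predict), survfit = surv_from_survfit)
```

With the cumulative hazard of 0.2007497 shown above, this gives exp(-0.2007497), roughly 0.818. The two routes should agree closely; small differences can arise from how ties are handled.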

Related

How to obtain the QQ plot of a spline model R

I have a model that I've fitted using splines:
ssfit.3 <- smooth.spline(anage$lifespan ~ log(anage$Metabolic.by.mass),
df = 3)
I'm trying to obtain the model diagnostics such as the residual plot and the QQ plot for this model. I know for a linear model you can do
plot(lm)
which outputs all the different plots. How can I do this with spline models since plot(ssfit.3) does not output the same?
Extract the residuals and use qqnorm()/qqline().
example(smooth.spline) ## to get a model to work with
qqnorm(residuals(s2m))
qqline(residuals(s2m))
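A self-contained sketch of the same idea that also draws the residual plot the question asks about. It uses the built-in cars data rather than the asker's anage data, and df = 3 only mirrors the question; fit and res are illustrative names:

```r
# Fit a smoothing spline to a built-in dataset (stand-in for the anage data)
fit <- smooth.spline(cars$speed, cars$dist, df = 3)

# Residuals and fitted values at the original data points
res <- residuals(fit)
fv  <- fitted(fit)

op <- par(mfrow = c(1, 2))

# Residuals vs fitted values, as in the first panel of plot(lm_object)
plot(fv, res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# Normal QQ plot of the residuals
qqnorm(res)
qqline(res)

par(op)
```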

Parameter estimates and variance for stratified variables in Cox regression (strata / survival package)

I have run Cox regression using the survival package to calculate the mortality hazard ratio of an exposure A. I found that the age variable violated the proportional hazards assumption (with cox.zph) and used strata(age) to stratify on age in further models.
I need a parameter estimate for the age variable, as well as its variance and the covariance matrix (to calculate Rate Advancement Periods)... And I don't know where to find them!
Am I missing something or am I misunderstanding what strata is doing?
Here is a reproducible example, using the lung data from the survival package.
library(survival)
I create the survival object and do a first Cox regression with non-stratified age variable.
lung$SurvObj <- with(lung, Surv(time, status == 2))
coxreg1 <- coxph(SurvObj ~ age + sex, data = lung)
So, I get coefficients, variance, and covariance matrix for the parameter estimates.
> coxreg1$coefficients
        age         sex
 0.01704533 -0.51321852
> vcov(coxreg1)
             age          sex
age 8.506877e-05 8.510634e-05
sex 8.510634e-05 2.804217e-02
Now, if I do a second regression with the stratified age variable, I don't get any coefficient estimate, variance or covariance for age.
coxreg2 <- coxph(SurvObj ~ strata(age) + sex, data = lung)
> coxreg2$coefficients
     sex
-0.64471
> vcov(coxreg2)
          sex
sex 0.0449369
Thanks for the help!
When you use a variable for stratification you don't get any coefficient estimate for it. Instead separate baseline hazards are estimated for the different age groups.
The essence of a stratified Cox regression is to fit a model that has a different baseline hazard in each stratum.
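To see this concretely, here is a small sketch on the lung data. It stratifies on sex (two levels) rather than age purely to keep the output short; fit and bh are illustrative names:

```r
library(survival)

# age is an ordinary covariate; sex only defines the strata
fit <- coxph(Surv(time, status == 2) ~ age + strata(sex), data = lung)

# Only age has a coefficient and a variance; sex has neither
coef(fit)
vcov(fit)

# One cumulative baseline hazard curve per stratum
bh <- basehaz(fit, centered = FALSE)
head(bh)
table(bh$strata)
```

Since strata() treats every distinct value as its own group, strata(age) on a continuous age creates one stratum per observed age. Grouping age first (for example with cut()) is a common alternative, but it still yields no age coefficient.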

How can I get the probability density function from a regression random forest?

I am using random-forest for a regression problem to predict the label values of Test-Y for a given set of Test-X (new values of features). The model has been trained over a given Train-X (features) and Train-Y (labels). "randomForest" of R serves me very well in predicting the numerical values of Test-Y. But this is not all I want.
Instead of only a number, I want to use random-forest to produce a probability density function. I searched for a solution for several days and here is what I found so far:
"randomForest" doesn't produce probabilities for regression, but only in classification (via "predict" and setting type="prob").
Using "quantregForest" provides a nice way to make and visualize prediction intervals. But still not the probability density function!
Any other thought on this?
Please see the predict.all parameter of the predict.randomForest function.
library("ggplot2")
library("randomForest")
data(mpg)
rf = randomForest(cty ~ displ + cyl + trans, data = mpg)
# Predict the first car in the dataset
pred = predict(rf, newdata = mpg[1, ], predict.all = TRUE)
hist(pred$individual)
This draws a histogram of the 500 "elementary" (per-tree) predictions.
You can also use quantregForest with a very fine grid of quantiles, convert them into a "cumulative distribution function (cdf)" with R-function ecdf and convert this cdf into a density estimation with a kernel density estimator.
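If a smooth curve is wanted instead of a histogram, one option is a kernel density estimate over the per-tree predictions. This is only a sketch: it uses the built-in mtcars data instead of mpg, and tree_preds and dens are illustrative names. The resulting density describes the spread across trees, which is not necessarily the full conditional distribution of the response.

```r
library(randomForest)

set.seed(42)
rf <- randomForest(mpg ~ disp + cyl + hp, data = mtcars, ntree = 500)

# predict.all = TRUE keeps the prediction of every individual tree
pred <- predict(rf, newdata = mtcars[1, ], predict.all = TRUE)
tree_preds <- as.numeric(pred$individual)  # one value per tree

# Kernel density estimate over the per-tree predictions
dens <- density(tree_preds)
plot(dens, main = "Kernel density of per-tree predictions")
abline(v = pred$aggregate, lty = 2)        # the usual forest prediction
```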

Predict Weibull instant hazard

I am trying to model baseline hazard using different survival models.
For example I can model the cumulative baseline hazard using Cox as follows:
library(survival)
df <- aml
cox <- coxph(Surv(time, status) ~ 1, df)
cox.summary <- basehaz(cox, centered = TRUE)
plot(cox.summary)
However Cox is not ideal for this and I would like to model instant hazards rather than cumulative, so I am trying to use Weibull as follows:
weibull <- survreg(Surv(time, status) ~ 1, data=df, dist="weibull")
However I cannot work out how to predict the instant hazard from this. For example if I try the following code I just get a constant value (38.18681) at each time point.
predict(weibull, newdata=data.frame(time=df$time))
Where am I going wrong here? In case I wasn't clear, my aim is to visualize the instantaneous hazard in a plot of hazard vs time.
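By default, predict() on a survreg fit returns the prediction on the time scale (the exponentiated linear predictor), which is why an intercept-only model gives the same value for every row. One possible approach is to compute the hazard directly from the fitted parameters. The sketch below assumes the standard mapping between survreg's parametrisation and R's Weibull functions (shape = 1/fit$scale, scale = exp(intercept)); t_grid, shape, wscale and haz are illustrative names:

```r
library(survival)

fit <- survreg(Surv(time, status) ~ 1, data = aml, dist = "weibull")

# Map survreg's location-scale parameters to the usual Weibull ones
shape  <- 1 / fit$scale
wscale <- exp(unname(coef(fit))[1])

# Weibull hazard: h(t) = (shape / scale) * (t / scale)^(shape - 1)
t_grid <- seq(1, max(aml$time), length.out = 100)
haz <- (shape / wscale) * (t_grid / wscale)^(shape - 1)

plot(t_grid, haz, type = "l", xlab = "Time", ylab = "Hazard",
     main = "Fitted Weibull hazard")
```

The same values can be obtained as dweibull(t_grid, shape, wscale) / pweibull(t_grid, shape, wscale, lower.tail = FALSE), since the hazard is the density divided by the survival function.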

both x and y variables are censored in a regression

Is there an existing routine in R to fit a Tobit regression model with both censored x and y variables? I know the survreg function in the survival package can deal with censored response variables. What about a left-censored x predictor variable?
The survival package has a framework both for tobit regression and for "interval"-censored variables. This is Therneau's example using Tobin's original data:
tfit <- survreg(Surv(durable, durable>0, type='left') ~age + quant,
data=tobin, dist='gaussian')
predict(tfit,type="response")
And the Surv function will accept interval censoring.
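As a rough illustration of the interval-censoring syntax, the sketch below uses simulated data in which the response is only known to lie in a unit-width interval. All variable names are made up, and this only covers censoring in the response, not in a predictor:

```r
library(survival)

set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)       # latent response, never observed exactly

# We only observe the unit-width interval that contains y
lower <- floor(y)
upper <- lower + 1

# type = "interval2" takes the lower and upper bound of each interval
fit <- survreg(Surv(lower, upper, type = "interval2") ~ x,
               dist = "gaussian")
coef(fit)
```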
