Plotting Hazard Function with bshazard package + Hazard ratios in life table - r

I'm trying to plot the hazard function of a survival analysis I'm doing for my PhD, comparing the hazard rate of two different conditions.
I can't find a way to make the code work as intended (see the linked reference, Fig. 4, page 7), i.e. to obtain the confidence intervals of the smoothed hazard lines for both levels of the predictor variable.
I'm adding my code for reference:
fitt <- bshazard(Surv(time, event) ~ session.type, data = data, lambda = 10, nbin = 60)
plot(fitt, overall = FALSE, col = 1, conf.int = TRUE)
The argument overall=FALSE gives me two smoothed hazard curves, but neither includes the confidence intervals, which I need in order to extrapolate results from the plot. Here is an image of the plot I obtained from the code:
If anyone knows a way to obtain the hazard rates (with upper and lower confidence limits) in a table, so that those values are available for each time interval, it would help me a lot.
Thanks to anyone who could help!

One way is to run bshazard stratifying the data by the two levels of session.type. With session.type having two levels (for example 0 and 1), the code to obtain the hazard rates (with upper and lower confidence intervals) is:
- for session.type == 0:
fit0 <- bshazard(Surv(time, event) ~ 1, data = data[data$session.type == 0, ], lambda = 10, nbin = 60)
plot(fit0, overall = TRUE, col = 1, conf.int = TRUE)
- for session.type == 1:
fit1 <- bshazard(Surv(time, event) ~ 1, data = data[data$session.type == 1, ], lambda = 10, nbin = 60)
plot(fit1, overall = TRUE, col = 1, conf.int = TRUE)
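The smoothed estimates that plot() draws are also stored in the fitted object, so you can tabulate them per time bin. A minimal sketch, assuming the component names used by recent versions of bshazard (time, hazard, lower.ci, upper.ci; check str(fit0) if your version names them differently):
# hazard rate with its confidence limits for each time bin, one table per stratum
haz0 <- with(fit0, data.frame(time = time, hazard = hazard, lower = lower.ci, upper = upper.ci))
haz1 <- with(fit1, data.frame(time = time, hazard = hazard, lower = lower.ci, upper = upper.ci))
head(haz0)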

Related

95% confidence interval for smooth.spline in R [duplicate]

I have used smooth.spline to estimate a cubic smoothing spline for my data. But when I calculate the 90% point-wise confidence interval using the equation below, the results seem to be a little off. Can someone please tell me if I did it wrong? I am also wondering if there is a function that can automatically calculate a point-wise confidence band for a smooth.spline fit.
boneMaleSmooth = smooth.spline(bone[males, "age"], bone[males, "spnbmd"], cv = FALSE)
error90_male = qnorm(.95) * sd(boneMaleSmooth$x) / sqrt(length(boneMaleSmooth$x))
plot(boneMaleSmooth, ylim = c(-0.5, 0.5), col = "blue", lwd = 3, type = "l",
     xlab = "Age", ylab = "Relative Change in Spinal BMD")
points(bone[males, c(2, 4)], col = "blue", pch = 20)
lines(boneMaleSmooth$x, boneMaleSmooth$y + error90_male, col = "purple", lty = 3, lwd = 3)
lines(boneMaleSmooth$x, boneMaleSmooth$y - error90_male, col = "purple", lty = 3, lwd = 3)
Because I was not sure whether I did it correctly, I then used the gam() function from the mgcv package.
It instantly gave a confidence band, but I am not sure if it is a 90% or 95% CI or something else. It would be great if someone could explain.
fit_gam = gam(bone[males, c(2, 4)]$spnbmd ~ s(bone[males, c(2, 4)]$age), method = "GCV.Cp") # stored as fit_gam so the 'males' index vector is not overwritten
plot(fit_gam, xlab = "Age", ylab = "Relative Change in Spinal BMD")
I'm not sure smooth.spline fits come with "nice" confidence intervals like those from lowess do. But I found a code sample from a CMU Data Analysis course to make bootstrap confidence intervals.
Here are the functions used and an example. The main function is spline.cis, whose first parameter is a data frame where the first column holds the x values and the second column the y values. The other important parameter is B, which gives the number of bootstrap replications. (See the linked PDF above for the full details.)
# Helper functions
resampler <- function(data) {
  n <- nrow(data)
  resample.rows <- sample(1:n, size = n, replace = TRUE)
  return(data[resample.rows, ])
}

spline.estimator <- function(data, m = 300) {
  fit <- smooth.spline(x = data[, 1], y = data[, 2], cv = TRUE)
  eval.grid <- seq(from = min(data[, 1]), to = max(data[, 1]), length.out = m)
  return(predict(fit, x = eval.grid)$y) # We only want the predicted values
}

spline.cis <- function(data, B, alpha = 0.05, m = 300) {
  spline.main <- spline.estimator(data, m = m)
  spline.boots <- replicate(B, spline.estimator(resampler(data), m = m))
  cis.lower <- 2 * spline.main - apply(spline.boots, 1, quantile, probs = 1 - alpha/2)
  cis.upper <- 2 * spline.main - apply(spline.boots, 1, quantile, probs = alpha/2)
  return(list(main.curve = spline.main, lower.ci = cis.lower, upper.ci = cis.upper,
              x = seq(from = min(data[, 1]), to = max(data[, 1]), length.out = m)))
}
# sample data
data <- data.frame(x = rnorm(100), y = rnorm(100))
# run and plot
sp.cis <- spline.cis(data, B = 1000, alpha = 0.05)
plot(data[, 1], data[, 2])
lines(x = sp.cis$x, y = sp.cis$main.curve)
lines(x = sp.cis$x, y = sp.cis$lower.ci, lty = 2)
lines(x = sp.cis$x, y = sp.cis$upper.ci, lty = 2)
And that gives a plot of the data with the fitted spline and dashed bootstrap confidence bands.
Actually, it looks like there might be a more parametric way to calculate confidence intervals, using the jackknife residuals. This code comes from the S+ help page for smooth.spline:
fit <- smooth.spline(data$x, data$y) # smooth.spline fit
res <- (fit$yin - fit$y)/(1-fit$lev) # jackknife residuals
sigma <- sqrt(var(res)) # estimate sd
upper <- fit$y + 2.0*sigma*sqrt(fit$lev) # upper 95% conf. band
lower <- fit$y - 2.0*sigma*sqrt(fit$lev) # lower 95% conf. band
matplot(fit$x, cbind(upper, fit$y, lower), type="plp", pch=".")
And that results in a plot of the fit with upper and lower 95% bands.
And as far as the gam confidence intervals go, if you read the plot.gam help file, there is an se= parameter with default TRUE, and the docs say
when TRUE (default) upper and lower lines are added to the 1-d plots at 2 standard errors above and below the estimate of the smooth being plotted while for 2-d plots, surfaces at +1 and -1 standard errors are contoured and overlayed on the contour plot for the estimate. If a positive number is supplied then this number is multiplied by the standard errors when calculating standard error curves or surfaces. See also shade, below.
So you can adjust the confidence interval by adjusting this parameter. (This would be in the plot() call.)
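For example, since the default is 2 standard errors (roughly a 95% band), you can pass an explicit multiplier instead; a quick sketch, reusing the gam fit from the question:
plot(fit_gam, se = qnorm(0.975), xlab = "Age", ylab = "Relative Change in Spinal BMD") # ~95% pointwise band
plot(fit_gam, se = qnorm(0.95), xlab = "Age", ylab = "Relative Change in Spinal BMD") # ~90% pointwise band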
The R package mgcv calculates smoothing splines and Bayesian "confidence intervals." These are not confidence intervals in the usual (frequentist) sense, but numerical simulations have shown that there is almost no difference; see the linked paper by Marra and Wood in the help file of mgcv.
library(SemiPar)
library(mgcv)
data(lidar)
fit <- gam(range ~ s(logratio), data = lidar)
plot(fit)
with(lidar, points(logratio, range - mean(range)))
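If you need a band at an explicit level rather than the default +/- 2 standard errors, one option (a sketch on the same lidar fit, not the only way) is to build it from predict() with se.fit = TRUE:
pr <- predict(fit, se.fit = TRUE) # fitted values and their standard errors
ord <- order(lidar$logratio)
plot(lidar$logratio, lidar$range)
lines(lidar$logratio[ord], pr$fit[ord])
lines(lidar$logratio[ord], pr$fit[ord] + qnorm(0.975) * pr$se.fit[ord], lty = 2) # upper 95%
lines(lidar$logratio[ord], pr$fit[ord] - qnorm(0.975) * pr$se.fit[ord], lty = 2) # lower 95%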

How to create a ROC curve for a model which has log of odds as response?

I have a question on plotting a ROC curve for my model, which has log odds as the response. For example:
model <- lm(log(y/(1 - y)) ~ Temp + RH + DmaxT, data = fit) # 'y' is a proportion; log(y/(1-y)) is the logit
Predicted response was obtained for a new data set as:
Predicted_model<-predict(model, newdata, type = 'response')
Predicted values were back-transformed to get values in proportion.
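(For a logit-scale model this back-transformation would be the inverse logit; a one-line sketch, with predicted_model_backtrans being the name used below:)
predicted_model_backtrans <- plogis(Predicted_model) # inverse logit: 1 / (1 + exp(-x))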
I have new observations in proportion, and I used a 0.05 cutoff value to separate controls (<0.05) from cases (>0.05):
newdata$observed<-ifelse(newdata$observed > 0.05, "cases", "controls")
I plotted the ROC curve using the following call (roc() presumably from the pROC package):
roc(newdata$observed, predicted_model_backtrans, legacy.axes = TRUE, plot = TRUE, print.auc = TRUE)
With this call I got an AUC value of 1, and the plot looks different than expected. I couldn't figure out the best way to create a ROC curve for my model type. Any help would be appreciated.
I also tried to create the ROC curve after changing both the observed and the predicted proportions into binary classes (controls (<0.05) and cases (>0.05)), which gave me a straight-line curve rather than a smooth one.
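One note on that last attempt (a general property of ROC curves, not a diagnosis of this particular data set): roc() needs the binary observed classes plus a continuous score, so dichotomizing the predictions as well leaves only one operating point and collapses the curve to straight segments. A small simulated sketch (all names and numbers hypothetical):
library(pROC)
set.seed(1)
obs <- factor(rep(c("controls", "cases"), each = 50))
score <- c(rnorm(50, 0.03, 0.02), rnorm(50, 0.08, 0.03)) # continuous predicted proportions
plot(roc(obs, score), legacy.axes = TRUE, print.auc = TRUE) # proper curve from a continuous score
plot(roc(obs, as.numeric(score > 0.05)), legacy.axes = TRUE, print.auc = TRUE) # binary score: two straight segments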

Plot a quadratic relation for a predictor of a cox regression with R

I need to plot the relative risk for a quadratic effect in a Cox regression. My model looks like this:
cox_mod <- coxph(Surv(time, status) ~ ph.karno + pat.karno + meal.cal + meal.cal_q,
                 data = lung)
Where meal.cal_q is defined as:
lung$meal.cal_q <- lung$meal.cal^2
The plot should use the coefficients of meal.cal and meal.cal_q and show the relative risk on the y-axis against the meal.cal values on the x-axis, where the relative risk is the risk at a given meal.cal value compared to all of the predictors being at their means. Additionally, the plot should include the 95% confidence intervals. It should look something like this:
Expected plot
If possible, the plot should be a ggplot object so that I can customize it.
I have been reading for hours on the web but cannot figure out how to make the described plot, and I hope someone can help me. I tried it, for example, with the predict() function:
meal.cal_new <- seq(min(lung$meal.cal, na.rm = TRUE), max(lung$meal.cal, na.rm = TRUE), by = 1)
meal.cal_q_new <- meal.cal_new^2
n <- length(meal.cal_new)
lung_new <- data.frame(ph.karno = rep(mean(lung$ph.karno, na.rm = TRUE), n),
                       pat.karno = rep(mean(lung$pat.karno, na.rm = TRUE), n),
                       meal.cal = meal.cal_new,
                       meal.cal_q = meal.cal_q_new)
predicted_rel_risk <- predict(cox_mod, lung_new, interval = "confidence")
print(predicted_rel_risk)
Firstly, the predicted values do not include the 95% confidence intervals. In addition, there are negative values in predicted_rel_risk, which in my opinion should not be the case, since the minimal relative risk should be zero.
Therefore I cannot get the desired plot. All I can do is this:
lung_new$predicted_rel_risk <- predicted_rel_risk
ggplot(lung_new, aes(meal.cal, predicted_rel_risk)) +
geom_smooth(se= TRUE)
The resulting plot does not include the confidence intervals and shows negative relative risks. Here is what I get:
Thank you a lot in advance!
The prediction includes negative values because predict.coxph returns the linear predictor by default; you did not specify that you want the relative risk (as you stated). Try the following code (note that predict.coxph has no interval argument, so it is dropped here):
predicted_rel_risk <- predict(cox_mod, lung_new, type = "risk")
This gives you the following plot:
Plot without negative values
In order to get the confidence intervals as well, you can use bootstrapping. In short, a random sample is drawn from your data with replacement and the relative risk is calculated; this procedure is repeated, say, 10,000 times, giving 10,000 relative risk values for each value of your predictor. The main line of your plot is the mean relative risk at each predictor value. For the confidence interval, order the relative risks from smallest to greatest at each predictor value: the 250th value gives the lower limit and the 9,750th the upper limit (i.e., the 2.5% and 97.5% quantiles).
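A minimal sketch of that procedure (with B = 1000 rather than 10,000 to keep it quick; the 0.025 and 0.975 quantiles play the role of the 250th and 9,750th ordered values):
library(survival)
library(ggplot2)
set.seed(1)
vars <- c("time", "status", "ph.karno", "pat.karno", "meal.cal")
lung_cc <- lung[complete.cases(lung[, vars]), ] # drop rows with missing values
lung_cc$meal.cal_q <- lung_cc$meal.cal^2
meal.cal_new <- seq(min(lung_cc$meal.cal), max(lung_cc$meal.cal), length.out = 100)
newdat <- data.frame(ph.karno = mean(lung_cc$ph.karno),
                     pat.karno = mean(lung_cc$pat.karno),
                     meal.cal = meal.cal_new,
                     meal.cal_q = meal.cal_new^2)
B <- 1000
boot_risk <- replicate(B, {
  idx <- sample(nrow(lung_cc), replace = TRUE) # resample rows with replacement
  bmod <- coxph(Surv(time, status) ~ ph.karno + pat.karno + meal.cal + meal.cal_q,
                data = lung_cc[idx, ])
  predict(bmod, newdat, type = "risk")
})
newdat$fit <- rowMeans(boot_risk) # mean relative risk per meal.cal value
newdat$lower <- apply(boot_risk, 1, quantile, probs = 0.025)
newdat$upper <- apply(boot_risk, 1, quantile, probs = 0.975)
ggplot(newdat, aes(meal.cal, fit)) +
  geom_line() +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2)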
Hope this helps.

How can I get the probability density function from a regression random forest?

I am using random forests for a regression problem, predicting the label values of Test-Y for a given set of Test-X (new values of features). The model has been trained over given Train-X (features) and Train-Y (labels). The randomForest package in R serves me very well in predicting the numerical values of Test-Y. But this is not all I want.
Instead of only a number, I want to use random forests to produce a probability density function. I searched for a solution for several days, and here is what I found so far:
randomForest doesn't produce probabilities for regression, only in classification (via predict with type = "prob").
Using quantregForest provides a nice way to make and visualize prediction intervals. But still not the probability density function!
Any other thought on this?
Please see the predict.all parameter of the predict.randomForest function.
library("ggplot2")
library("randomForest")
data(mpg)
rf = randomForest(cty ~ displ + cyl + trans, data = mpg)
# Predict the first car in the dataset
pred = predict(rf, newdata = mpg[1, ], predict.all = TRUE)
hist(pred$individual)
The histogram of 500 "elementary" predictions looks like this:
You can also use quantregForest with a very fine grid of quantiles, convert them into a cumulative distribution function (CDF) with the R function ecdf, and convert this CDF into a density estimate with a kernel density estimator.
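As a rough shortcut, a heuristic sketch building on the predict.all example above: treat the per-tree predictions as an approximate sample from the predictive distribution and smooth them directly (this is an approximation, not a calibrated density):
d <- density(as.numeric(pred$individual)) # kernel density over the 500 per-tree predictions
plot(d, main = "Approximate predictive density for the first car", xlab = "Predicted cty")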

Any simple way to get regression prediction intervals in R?

I am working on a big data set with over 300K elements, running some regression analysis to estimate a parameter called Rate using the predictor variable Distance. I have the regression equation. Now I want to get the confidence and prediction intervals. I can easily get the confidence intervals for the coefficients with the command:
> confint(W1500.LR1, level = 0.95)
                  2.5 %      97.5 %
(Intercept) 666.2817393 668.0216072
Distance      0.3934499   0.3946572
which gives me the upper and lower bounds of the CI for the coefficients. Now I want the same upper and lower bounds for the prediction intervals. The only thing I have learned so far is that I can get prediction intervals for specific values of Distance (say 200, 500, etc.) using the code:
predict(W1500.LR1, newdata, interval="predict")
This is not useful for me because I have over 300K different distance values, which would require running this code for each of them. Is there any simple way to get the prediction intervals, like the confint command I showed above?
Had to make up my own data, but here you go:
x <- rnorm(300000)
y <- jitter(3 * x, 1000)
fit <- lm(y ~ x)
# Prediction intervals for every observation (no newdata needed)
pred.int <- predict(fit, interval = "prediction")
# Confidence intervals for the mean response
conf.int <- predict(fit, interval = "confidence")
fitted.values <- pred.int[, 1]
pred.lower <- pred.int[, 2]
pred.upper <- pred.int[, 3]
# plot a subset, ordering by x so lines() draws sensible curves
ord <- order(x[1:1000])
plot(x[1:1000], y[1:1000])
lines(x[1:1000][ord], fitted.values[1:1000][ord], col = "red", lwd = 2)
lines(x[1:1000][ord], pred.lower[1:1000][ord], lwd = 2, col = "blue")
lines(x[1:1000][ord], pred.upper[1:1000][ord], lwd = 2, col = "blue")
So as you can see, the prediction intervals are for predicting new data values, not for constructing intervals for the beta coefficients. The confidence intervals for the fitted mean would be obtained in the same fashion from conf.int.
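Note that calling predict() without newdata, as above, already returns intervals for every row of the original data, so there is no need to loop over the 300K distance values. A sketch of packaging them into one table (fit/lwr/upr are the column names predict.lm returns):
all_int <- data.frame(fit = pred.int[, "fit"],
                      pred.lwr = pred.int[, "lwr"], pred.upr = pred.int[, "upr"],
                      conf.lwr = conf.int[, "lwr"], conf.upr = conf.int[, "upr"])
head(all_int)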
