R-Weibull Survival Model - r

I'm trying estimate the transitional probabilities from healthy to diseased, however, I'm getting a negative shape and scale parameter, and hazard ratio is NaN.
I'm very new to survival analysis and also R, I think I'm making some mistakes while data processing unable to figure that out.
Here is my code:
# fit the Weibull survival model
fit <- survreg(Surv(time, event) ~ 1, dist="weibull")
# extract the shape and scale parameters from the fitted model
shape <- summary(fit)$coef[2]
scale <- summary(fit)$coef[1]/shape
# calculate the hazard rate
haz <- scale * (time/scale)^(shape-1)
# estimate the incidence rate and mortality rate
incidence_rate <- haz * exp(-haz * time)
mortality_rate <- haz
# calculate the transition probabilities between health states
trans_prob <- cbind(incidence_rate, mortality_rate)

Related

differences between sampleSurv, getFitEsts and getSCurves in icenReg

I would like to
obtain the predicted time of the event, given a set of covariates
obtain the time at which the risk is equal to my specified threshold,
given covariates obtain the risk, given time and covariates
All this using ic_par (parametric) or ic_npar (non-parametric) or ic_sp (semi-parametric) models (not bayesian models) from icenReg
There are 3 functions in icenReg (https://cran.r-project.org/web/packages/icenReg/icenReg.pdf) that I believe do at least two of those things:
sampleSurv
getFitEsts
getSCurves
Can someone explain what those three functions do? Especially the difference between sampleSurv and getFitEsts?
From what I understand, the time to the event is modelled as a probability curve.
So you do not obtain a defined predicted time to the event, but rather a probability of this event occuring through time.
Thus, you can obtain the probability of the event to occur after X days, or you can obtain the time at which the event has a probability of X % to have occured.
getFitEsts() will provide these 2 estimates from an object previously fitted by ic_sp(), ic_par() or ic_bayes()
Here is an example of how to obtain these estimates, with an example from icenReg package :
data("IR_diabetes")
flatPrior_fit <- ic_bayes(cbind(left, right) ~ gender, data = IR_diabetes, model = "po", dist = "gamma")
newdata <- data.frame(gender = c(unique(IR_diabetes$gender)))
rownames(newdata) <- c(as.character(unique(IR_diabetes$gender)))
# plot the survival probability curve
plot(flatPrior_fit)
# plot the same curve according to each factor
plot(flatPrior_fit,newdata)
maleCovs <- data.frame(gender = c("male"))
femaleCovs <- data.frame(gender = c("female"))
# median survival time as calculated by the model if males and females are considered together
# = 50 % probability of the event occurring
getFitEsts(flatPrior_fit , p = 0.5)
# median survival time for males
getFitEsts(flatPrior_fit, newdata = maleCovs, p = 0.5)
# median survival time for females
getFitEsts(flatPrior_fit, newdata = femaleCovs, p = 0.5)
# Probability that males died at day 15 ( = 1 - probability that they survived )
getFitEsts(flatPrior_fit, newdata = maleCovs, q = 15)
# Probability that males died at day 15 ( = 1 - probability that they survived )
getFitEsts(flatPrior_fit, newdata = femaleCovs, q = 15)
getScurves() work only for semi parametric models.
It allows to obtain the interval of the survival probability for each time step, as plotted on the curve :
data("IR_diabetes")
# fit a semi parametric model (proportional odds)
sp_fit <- ic_sp(cbind(left, right) ~ gender, data = IR_diabetes, model = "po")
# plot the survival curve
plot(sp_fit, newdata)
# obtain the intervals and associated survival probability of this survival curve for each time step
getSCurves(sp_fit,newdata)
Finally, sampleSurv() draw samples from the probability curve you fitted, contained between the intervals computed, according to the quantile you need. These results are variable because there is multiple possibilities between these intervals.
I hope it helped a bit to understand these functions

MASS:: fitdistr negative binomial with weights in R

We are carrying out an Operational Risk study, in particular we are fitting a severity frequency function with a negative binomial as follows:
# Negative Binomial Fitting
fit = MASS::fitdistr(datosf$Freq,"negative binomial")[[1]]
BN_s <- fit[1]
BN_mu <- fit[2]
# fitdistr parametrises the BN with size and mu, we calculate the parameter p as size/(size+mu)
BN_prob<-fit[1]/(fit[1]+fit[2])
# scale size to model annual frequency
BN_size= BN_s*f_escala
# goodness-of-fit test
chi_2_test = chisq.test(datosf$Freq,rnbinom(n=l,size=BN_s,prob=BN_prob))
# goodness-of-fit plot
nbinom = function(x)dnbinom(x, size = BN_s, mu = BN_mu)
hist(datosf$Freq, freq=FALSE, nclass=50)
curve(nbinom, from=0, to=max(datosf$Freq), n=max(datosf$Freq)+1, add=TRUE, col="blue")
In the data frame datosf$Freq we have the frequency (of the historical series) grouped monthly.
Currently, we have the objective of weighting these years according to the time horizon using the function:
w(t) = 1.05 - t/20 where t is the number of years and t=1,....,10
i.e. the objective is to maximise the following likelihood function:
L(x_i,\theta) = \prod_{i} w_i f(x_i,\theta)
Where x_i is the frequency and f(x_i) is the negative binomial density function.
How can we readapt the code to include the weights w_i?
Thank you very much!

How to calculate individual restricted mean survival time (RMST) in R?

I encountered some issues when calculating restricted mean survival time (RMST) in R and I made some attempts.
Here is the idea that I tried to calculate the RMST by myself.
i) I fitted a cox regression model to get estimated function of h(t), and I deploy individual covariables to calculate individual h(t);
ii) I derived individual survival curve S(t) by the above individual h(t);
iii) I then calculated individual RMST by the above individual S(t) with the following formula: RMST = integrate(S(t)) by 0 to tau. (I don't know how to put a formal formula here and I am sure you can understand what I am saying).
I have tried the above method to calculate individual RMST with the following R code:
# load R package
library(survRM2)
library(survival)
# generate example
D <- rmst2.sample.data()
time <- D$time
status <- D$status
x <- D[,c(4,6,7)]
# fit cox regression model with weibull baseline
fit<-survreg(Surv(time,status)~ x[[1]] + x[[2]]+ x[[3]],data = D,dist = "weibull")
# get cox regression coefficients of covariables
beta=fit$coefficients
# get paramaters within baseline hazard
gamma.weibull=fit$scale
# cutomize a function to calculate individual hazard
hazard <- function(u,x1,x2,x3) {
gamma.weibull*u^(gamma.weibull-1)*exp(beta[1]+beta[2]*x1+beta[3]*x2+beta[4]*x3)
}
# cutomize a function to calculate individual survival
surv <-function(t,x1,x2,x3) {
sapply(t,function(z){
exp(-integrate(hazard,lower=0,upper=z,x1=x1,x2=x2,x3=x3)$value)
}
)
}
rmst <- c() # genrate a empty vector
for(i in 1:312) { # 312 is the sample size
rmst[i]=integrate(surv,0,5,x1=x[[1]][i],x2=x[[2]][i],x3=x[[3]][i])$value
}
# Error in integrate(surv, 0, 5, x1 = x[[1]][i], x2 = x[[2]][i], x3 = x[[3]][i]) :
# the integral is probably divergent
I have three questions:
1) Is there anything wrong about my idea or computational process?
2) In the step iii), there are some cases that integrals are non-integrable (that is, integrals do not converge). Is there any solution, or should I use approximate evaluation?
3) One last shoot, is there any better method to calculate this individual RMST?

Parameters and AUC and IC50 of a dose response curve

I have these dose response data:
df <- data.frame(viability=c(14,81,58,78,71,83,64,16,32,100,100,81,86,83,100,90,15,100,38,100,91,84,92,100),
dose=c(10,0.62,2.5,0.16,0.039,0.0024,0.0098,0.00061,10,0.62,2.5,0.16,0.039,0.0024,0.0098,0.00061,10,0.62,2.5,0.16,0.039,0.0024,0.0098,0.00061),
stringsAsFactors=F)
I then use the drc package's drm function to fit a log-logistic curve to these data:
library(drc)
fit <- drm(viability~dose,data=df,fct=LL.4(names=c("slope","low","high","ED50")),type="continuous")
> summary(fit)
Model fitted: Log-logistic (ED50 as parameter) (4 parms)
Parameter estimates:
Estimate Std. Error t-value p-value
slope:(Intercept) 5.15328 18.07742 0.28507 0.7785
low:(Intercept) 20.19430 12.61122 1.60130 0.1250
high:(Intercept) 83.33181 4.96736 16.77586 0.0000
ED50:(Intercept) 2.98733 1.99685 1.49602 0.1503
Residual standard error:
21.0743 (20 degrees of freedom)
I then generate predictions so I'll be able to plot the curve:
pred.df <- expand.grid(dose=exp(seq(log(max(df$dose)),log(min(df$dose)),length=100)))
pred <- predict(fit,newdata=pred.df,interval="confidence")
pred.df$viability <- pmax(pred[,1],0)
pred.df$viability <- pmin(pred.df$viability,100)
pred.df$viability.low <- pmax(pred[,2],0)
pred.df$viability.low <- pmin(pred.df$viability.low,100)
pred.df$viability.high <- pmax(pred[,3],0)
pred.df$viability.high <- pmin(pred.df$viability.high,100)
I also use the PharmacoGx Bioconductor package to compute AUC and IC50 for both the curve and its high and low bounds:
library(PharmacoGx)
auc.mid <- computeAUC(rev(pred.df$dose),rev(pred.df$viability))/((max(pred.df$viability)-min(pred.df$viability))*(max(pred.df$dose)-min(pred.df$dose)))
auc.low <- computeAUC(rev(pred.df$dose),rev(pred.df$viability.low))/((max(pred.df$viability.low)-min(pred.df$viability.low))*(max(pred.df)-min(pred.df$dose)))
auc.high <- computeAUC(rev(pred.df$dose),rev(pred.df$viability.high))/((max(pred.df$viability.high)-min(pred.df$viability.high))*(max(pred.df$dose)-min(pred.df$dose)))
ic50.mid <- computeIC50(rev(pred.df$dose),rev(pred.df$viability))
ic50.low <- computeIC50(rev(pred.df$dose),rev(pred.df$viability.low))
ic50.high <- computeIC50(rev(pred.df$dose),rev(pred.df$viability.high))
Ceating a table with all the parameters so I can plot everything together:
ann.df <- data.frame(param=c("slope","low","high","ED50","auc.mid","auc.high","auc.low","ic50.mid","ic50.high","ic50.low"),value=signif(c(summary(fit)$coefficient[,1],auc.mid,auc.high,auc.low,ic50.mid,ic50.high,ic50.low),2),stringsAsFactors=F)
And finally plotting it all:
library(ggplot2)
library(grid)
library(gridExtra)
pl <- ggplot(df,aes(x=dose,y=viability))+geom_point()+geom_ribbon(data=pred.df,aes(x=dose,y=viability,ymin=viability.low,ymax=viability.high),alpha=0.2)+labs(y="viability")+
geom_line(data=pred.df,aes(x=dose,y=viability))+coord_trans(x="log")+theme_bw()+scale_x_continuous(name="dose",breaks=sort(unique(df$dose)),labels=format(signif(sort(unique(df$dose)),3),scientific=T))
ggdraw(pl)+draw_grob(tableGrob(ann.df,rows=NULL),x=0.1,y=0.175,width=0.3,height=0.4)
Which gives:
My questions are:
I thought that slope should be negative. How come it's 5.2?
the auc.mid, auc.high, and auc.lowcumputed as:
auc.mid <- computeAUC(rev(pred.df$dose),rev(pred.df$viability))
auc.low <- computeAUC(rev(pred.df$dose),rev(pred.df$viability.low))
auc.high <- computeAUC(rev(pred.df$dose),rev(pred.df$viability.high))
give 21.47818, 37.52389, and 2.678228, respectively.
Since these are not in the [0,1] range I thought that divinding them by the area under the highest corresponding viability will give what I'm looking for, i.e., relative AUC, but these values seem too low relative to what the figure shows. What are these AUCs then?
Also, how come auc.mid > auc.low > auc.high? I would think that it should be auc.high > auc.mid > auc.low
The IC50 values also seem a little low. Do they make sense?
Bonus question: how do I avoid the trailing zeros in slope, low, high, ED50, ic50.mid, and ic50.high in the figure?
The parameter you are pulling out is the hill slope parameter, or the coefficient in front of the concentration variable in the exponential, not the actual slope of the curve.
The AUC provided is in the [0-100] range, for the area above the curve. I ran the code and got the order as auc.low>auc.mid>auc.high. Traditionally the area under the response curve was reported, or 1-viability.
It is important to note that the PharmacoGx package uses a 3 parameter hill slope model, similar to LL.3 in drc. Therefore, the plot will not correspond to the function fit by PharmacoGx to calculate the IC50 or AUC.
Source: PharmacoGx dev.

Format in R for point prediction of survival analysis

I am befuddled by the format to perform a simple prediction using R's survival package
library(survival)
lung.surv <- survfit(Surv(time,status) ~ 1, data = lung)
So fitting a simple exponential regression (for example purposes only) is:
lung.reg <- survreg(Surv(time,status) ~ 1, data = lung, dist="exponential")
How would I predict the percent survival at time=400?
When I use the following:
myPredict400 <- predict(lung.reg, newdata=data.frame(time=400), type="response")
I get the following:
myPredict400
1
421.7758
I was expecting something like 37% so I am missing something pretty obvious
The point with this survival function is to find an empirical distribution that fits the survival times. Essentially you are associating a survival time with a probability. Once you have that distribution, you can pick out the survival rate for a given time.
Try this:
library(survival)
lung.reg <- survreg(Surv(time,status) ~ 1, data = lung) # because you want a distribution
pct <- 1:99/100 # this creates the empirical survival probabilities
myPredict400 <- predict(lung.reg, newdata=data.frame(time=400),type='quantile', p=pct)
indx = which(abs(myPredict400 - 400) == min(abs(myPredict400 - 400))) # find the closest survival time to 400
print(1 - pct[indx]) # 0.39
Straight from the help docs, here's a plot of it:
matplot(myPredict400, 1-pct, xlab="Months", ylab="Survival", type='l', lty=c(1,2,2), col=1)
Edited
You're basically fitting a regression to a distribution of probabilities (hence 1...99 out of 100). If you make it go to 100, then the last value of your prediction is inf because the survival rate in the 100th percentile is infinite. This is what the quantile and pct arguments do.
For example, setting pct = 1:999/1000 you get much more precise values for the prediction (myPredict400). Also, if you set pct to be some value that's not a proper probability (i.e. less than 0 or more than 1) you'll get an error. I suggest you play with these values and see how they impact your survival rates.

Resources