I have these dose response data:
df <- data.frame(viability=c(14,81,58,78,71,83,64,16,32,100,100,81,86,83,100,90,15,100,38,100,91,84,92,100),
dose=c(10,0.62,2.5,0.16,0.039,0.0024,0.0098,0.00061,10,0.62,2.5,0.16,0.039,0.0024,0.0098,0.00061,10,0.62,2.5,0.16,0.039,0.0024,0.0098,0.00061),
stringsAsFactors=F)
I then use the drc package's drm function to fit a log-logistic curve to these data:
library(drc)
fit <- drm(viability~dose,data=df,fct=LL.4(names=c("slope","low","high","ED50")),type="continuous")
> summary(fit)
Model fitted: Log-logistic (ED50 as parameter) (4 parms)
Parameter estimates:
Estimate Std. Error t-value p-value
slope:(Intercept) 5.15328 18.07742 0.28507 0.7785
low:(Intercept) 20.19430 12.61122 1.60130 0.1250
high:(Intercept) 83.33181 4.96736 16.77586 0.0000
ED50:(Intercept) 2.98733 1.99685 1.49602 0.1503
Residual standard error:
21.0743 (20 degrees of freedom)
I then generate predictions so I'll be able to plot the curve:
pred.df <- expand.grid(dose=exp(seq(log(max(df$dose)),log(min(df$dose)),length=100)))
pred <- predict(fit,newdata=pred.df,interval="confidence")
pred.df$viability <- pmax(pred[,1],0)
pred.df$viability <- pmin(pred.df$viability,100)
pred.df$viability.low <- pmax(pred[,2],0)
pred.df$viability.low <- pmin(pred.df$viability.low,100)
pred.df$viability.high <- pmax(pred[,3],0)
pred.df$viability.high <- pmin(pred.df$viability.high,100)
I also use the PharmacoGx Bioconductor package to compute AUC and IC50 for both the curve and its high and low bounds:
library(PharmacoGx)
auc.mid <- computeAUC(rev(pred.df$dose),rev(pred.df$viability))/((max(pred.df$viability)-min(pred.df$viability))*(max(pred.df$dose)-min(pred.df$dose)))
auc.low <- computeAUC(rev(pred.df$dose),rev(pred.df$viability.low))/((max(pred.df$viability.low)-min(pred.df$viability.low))*(max(pred.df$dose)-min(pred.df$dose)))
auc.high <- computeAUC(rev(pred.df$dose),rev(pred.df$viability.high))/((max(pred.df$viability.high)-min(pred.df$viability.high))*(max(pred.df$dose)-min(pred.df$dose)))
ic50.mid <- computeIC50(rev(pred.df$dose),rev(pred.df$viability))
ic50.low <- computeIC50(rev(pred.df$dose),rev(pred.df$viability.low))
ic50.high <- computeIC50(rev(pred.df$dose),rev(pred.df$viability.high))
Creating a table with all the parameters so I can plot everything together:
ann.df <- data.frame(param=c("slope","low","high","ED50","auc.mid","auc.high","auc.low","ic50.mid","ic50.high","ic50.low"),value=signif(c(summary(fit)$coefficient[,1],auc.mid,auc.high,auc.low,ic50.mid,ic50.high,ic50.low),2),stringsAsFactors=F)
And finally plotting it all:
library(ggplot2)
library(grid)
library(gridExtra)
library(cowplot) # provides ggdraw() and draw_grob() used below
pl <- ggplot(df,aes(x=dose,y=viability))+geom_point()+geom_ribbon(data=pred.df,aes(x=dose,y=viability,ymin=viability.low,ymax=viability.high),alpha=0.2)+labs(y="viability")+
geom_line(data=pred.df,aes(x=dose,y=viability))+coord_trans(x="log")+theme_bw()+scale_x_continuous(name="dose",breaks=sort(unique(df$dose)),labels=format(signif(sort(unique(df$dose)),3),scientific=T))
ggdraw(pl)+draw_grob(tableGrob(ann.df,rows=NULL),x=0.1,y=0.175,width=0.3,height=0.4)
Which gives a plot of the points, the fitted curve with its confidence ribbon, and the parameter table overlaid.
My questions are:
I thought that slope should be negative. How come it's 5.2?
The auc.mid, auc.high, and auc.low values, computed as:
auc.mid <- computeAUC(rev(pred.df$dose),rev(pred.df$viability))
auc.low <- computeAUC(rev(pred.df$dose),rev(pred.df$viability.low))
auc.high <- computeAUC(rev(pred.df$dose),rev(pred.df$viability.high))
give 21.47818, 37.52389, and 2.678228, respectively.
Since these are not in the [0,1] range, I thought that dividing them by the area under the highest corresponding viability would give what I'm looking for, i.e., a relative AUC, but these values seem too low relative to what the figure shows. What are these AUCs, then?
Also, how come auc.mid > auc.low > auc.high? I would think that it should be auc.high > auc.mid > auc.low.
The IC50 values also seem a little low. Do they make sense?
Bonus question: how do I avoid the trailing zeros in slope, low, high, ED50, ic50.mid, and ic50.high in the figure?
The parameter you are pulling out is the Hill slope parameter, i.e. the coefficient in front of the (log-)concentration inside the exponential, not the actual slope of the curve.
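For reference, drc parameterizes LL.4 (per the package documentation) as shown in this small sketch; the ll4 helper and the plugged-in numbers are purely illustrative:
# f(x) = low + (high - low) / (1 + exp(slope * (log(x) - log(ED50))))
# With this sign convention a positive "slope" parameter gives a curve that
# decreases with dose, which is why the fitted value of ~5.15 is positive.
ll4 <- function(x, slope, low, high, ed50) {
  low + (high - low) / (1 + exp(slope * (log(x) - log(ed50))))
}
ll4(c(0.1, 3, 10), slope = 5.15, low = 20.2, high = 83.3, ed50 = 2.99)
# roughly 83, 52, 20: viability falls as dose rises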
The AUC provided is in the [0, 100] range and is the area above the curve. I ran the code and got the order auc.low > auc.mid > auc.high. Traditionally the area under the response curve, where response = 1 - viability, is what was reported.
It is important to note that the PharmacoGx package uses a 3 parameter hill slope model, similar to LL.3 in drc. Therefore, the plot will not correspond to the function fit by PharmacoGx to calculate the IC50 or AUC.
Source: PharmacoGx dev.
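For comparison, a 3-parameter log-logistic fit in drc would look like the sketch below (purely illustrative; this is not PharmacoGx's internal code):
# lower asymptote fixed at 0, as in drc's LL.3
fit3 <- drm(viability ~ dose, data = df,
            fct = LL.3(names = c("slope", "high", "ED50")))
summary(fit3)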
I am trying to find the value of BMI at each consecutive odds ratio (1, 2, 3, etc.) for a restricted cubic spline I created using the rms package, but I am struggling to find a way to do this. I am aware that the summary function can be used to find the effect from the lower to the upper interquartile range (or any specified range); however, I would like to find the value on my x-axis at, for example, the intersection with odds of 2.0, as well as the 95% CI at that point. Does anyone have experience or insight on accomplishing this?
library(rms)
ddist <- datadist(df)
options(datadist='ddist')
k <- with(df, quantile(BMI_NUM, c(.05, 0.25, 0.50, .75, .95)))
k
ddist$limits["Adjust to","BMI_NUM"] <- 24.37
spline_model <- lrm(Y ~ rcs(BMI_NUM, k), data=df)
summary(spline_model)
dataplot <- Predict(spline_model, BMI_NUM, ref.zero=TRUE, fun=exp)
Below is the output for summary
Factor Low High Diff. Effect S.E. Lower 0.95 Upper 0.95
BMI_NUM 21.239 28.147 6.9084 0.19187 0.10947 -0.022679 0.40642
Odds Ratio 21.239 28.147 6.9084 1.21150 NA 0.977580 1.50140
I figured out what I was looking for.
All that I needed to do was run print(dataplot) and it produced a dataframe of my continuous variable, Odds Ratio, and upper/lower 95% Confidence interval.
I then just wrote it to a CSV with write.csv(dataplot, "Spline_values.csv") and that was all I needed. Thanks!
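In code, the steps described above amount to something like this (the yhat/lower/upper column names follow rms conventions and are assumed; the which.min lookup for a given odds ratio is an added illustration):
pd <- as.data.frame(dataplot)        # Predict() results behave like a data frame
print(dataplot)                      # BMI_NUM, odds ratio (yhat), lower and upper 95% CI
write.csv(pd, "Spline_values.csv")
pd[which.min(abs(pd$yhat - 2)), ]    # row where the fitted odds ratio is closest to 2.0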
When fitting a drm model to my dose response data, the obtained ED50 is off. Is it perhaps reported as log(ED50), log2(ED50), or log10(ED50)?
# here is some actual data
test_response <- c(0,1.130912987,-2.0159852,6.158574058,5.679161469,12.97,20.46711451,35.55009039,66.31837697)
test_dose <- c(0,0.006103516,0.024414063,0.09765625,0.39,1.5625,6.25,25,100)
test_df <- data.frame(dose=test_dose, response=test_response)
# fitting a Hill formula
m1 <- drm(response ~ dose, data=test_df, fct=LL.4())
# obtaining ED50
ED(m1, 50)
I get an estimated ED50 of 1649. When I fit the same data in a proprietary program I get a similar plot, but with a (correct) EC50 of 45.5.
I tried log10 as well as log2 of the calculated ED50, but it's still off.
Solution
So I figured it out: the models were correct, but when using ED() one can choose between type="absolute" and type="relative", where "relative" is the default. In my case, since the response values are absolute, changing this parameter gave me the right ED50s.
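In code, the fix is just passing that argument to ED(), reusing the m1 fit from above:
# absolute ED50: dose giving a response of 50 on the original response scale
ED(m1, 50, type = "absolute")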
I am befuddled by the format needed to perform a simple prediction using R's survival package:
library(survival)
lung.surv <- survfit(Surv(time,status) ~ 1, data = lung)
So fitting a simple exponential regression (for example purposes only) is:
lung.reg <- survreg(Surv(time,status) ~ 1, data = lung, dist="exponential")
How would I predict the percent survival at time=400?
When I use the following:
myPredict400 <- predict(lung.reg, newdata=data.frame(time=400), type="response")
I get the following:
myPredict400
1
421.7758
I was expecting something like 37%, so I am missing something pretty obvious.
The point of this survival function is to find an empirical distribution that fits the survival times. Essentially, you are associating each survival time with a probability. Once you have that distribution, you can pick out the survival rate for a given time.
Try this:
library(survival)
lung.reg <- survreg(Surv(time,status) ~ 1, data = lung) # because you want a distribution
pct <- 1:99/100 # this creates the empirical survival probabilities
myPredict400 <- predict(lung.reg, newdata=data.frame(time=400),type='quantile', p=pct)
indx = which(abs(myPredict400 - 400) == min(abs(myPredict400 - 400))) # find the closest survival time to 400
print(1 - pct[indx]) # 0.39
Straight from the help docs, here's a plot of it:
matplot(myPredict400, 1-pct, xlab="Months", ylab="Survival", type='l', lty=c(1,2,2), col=1)
You're basically fitting a regression to a distribution of probabilities (hence 1...99 out of 100). If you make it go to 100, then the last value of your prediction is Inf, because the survival time at the 100th percentile is infinite. This is what the type='quantile' and p (pct) arguments control.
For example, setting pct = 1:999/1000 gives much more precise values for the prediction (myPredict400). Also, if you set pct to a value that is not a proper probability (i.e. less than 0 or more than 1) you'll get an error. I suggest you play with these values and see how they impact your survival rates.
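For instance, with a finer grid (reusing lung.reg from above; everything else as in the code block earlier):
pct <- 1:999/1000                           # finer grid of survival probabilities
myPredict400 <- predict(lung.reg, newdata = data.frame(time = 400),
                        type = "quantile", p = pct)
indx <- which.min(abs(myPredict400 - 400))  # predicted time closest to 400
1 - pct[indx]                               # survival probability at t = 400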
I have a logistic regression model that I am using to predict size at maturity for king crab, but I am having trouble setting up the code for bootstrapping using the boot package. This is what I have:
#FEMALE GKC SAM#
LowerChatham<-read.table(file=file.choose(),header=TRUE)
#LOGISTIC REGRESSION FIT#
glm.out<-glm(Mature~CL,family=binomial(link=logit),data=LowerChatham)
plot(Mature~CL,data=LowerChatham)
lines(LowerChatham$CL,glm.out$fitted,col="red")
title(main="Lower Chatham")
summary(glm.out)
segments(98.9,0,98.9,0.5,col=1,lty=3,lwd=3)
SAM<-data.frame(CL=98.97)
predict(glm.out,SAM,type="response")
I would like to bootstrap the statistic CL=98.97, since I am interested in the size at which 50% of crabs are mature, but I have no idea how to set up my function to specify that statistic, let alone the bootstrap call in general, to get my 95% C.I. Any help would be greatly appreciated! Thanks!
In each bootstrap iteration, you want to do something like
range <- 1:100 # this could be any substantively meaningful range
p <- predict(glm.out, newdata = data.frame(CL=range), "response")
range[match(TRUE,p>.5)] # smallest CL at which predicted probability of maturity exceeds 50%
where you specify a range of values of CL to whatever precision you need. Then calculate the predicted probability of maturity at each of those values, and find the value in the range where the predicted probability crosses 0.5. This is the statistic it sounds like you want to bootstrap.
You also don't need the boot package to do this. If you define a function that resamples the data and returns that statistic as its result, you can just do replicate(1000, myfun()) to get your bootstrap distribution, as follows:
myfun <- function(){
srows <- sample(1:nrow(LowerChatham),nrow(LowerChatham),TRUE)
glm.out <- glm(Mature ~ CL, family=binomial(link=logit), data=LowerChatham[srows,])
range <- 1:100 # this could be any substantively meaningful range
p <- predict(glm.out, newdata = data.frame(CL=range), "response")
return(range[match(TRUE,p>.5)]) # smallest CL at which predicted probability of maturity exceeds 50%
}
bootdist <- replicate(1000, myfun()) # your distribution
quantile(unlist(bootdist),c(.025,.975)) # 95% CI
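If you do want to go through the boot package the question mentions, a minimal sketch would be (sam_stat is just an illustrative name; the statistic is the same as in myfun above):
library(boot)
sam_stat <- function(data, indices) {
  fit <- glm(Mature ~ CL, family = binomial(link = logit), data = data[indices, ])
  cl.range <- 1:100
  p <- predict(fit, newdata = data.frame(CL = cl.range), type = "response")
  cl.range[match(TRUE, p > .5)]   # smallest CL with predicted maturity above 50%
}
b <- boot(LowerChatham, sam_stat, R = 1000)
boot.ci(b, type = "perc")         # percentile 95% CI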
Hey out there, how can I export a table of the results used to make the chart I generated for the linear regression model below?
d <- data.frame(x=c(200110,86933,104429,240752,255332,75998,
204302,97321,342812,220522,110990,259706,65733),
y=c(200000,110000,165363,225362,313284,113972,
137449,113106,409020,261733,171300,344437,89000))
lm1 <- lm(y~x,data=d)
p_conf1 <- predict(lm1,interval="confidence")
nd <- data.frame(x=seq(0,80000,length=510000))
p_conf2 <- predict(lm1,interval="confidence",newdata=nd)
plot(y~x,data=d,ylim=c(-21750,600000),xlim=c(0,600000)) ## data
abline(lm1) ## fit
matlines(d$x,p_conf1[,c("lwr","upr")],col=2,lty=1,type="b",pch="+")
matlines(nd$x,p_conf2[,c("lwr","upr")],col=4,lty=1,type="b",pch="+")
Still not entirely sure what you want but this would seem to be reasonable:
dat1 <- data.frame(d,p_conf1)
dat2 <- data.frame(nd,y=NA,p_conf2)
write.csv(rbind(dat1,dat2),file="linpredout.csv")
It includes x, y (equal to the observed value, or NA for points without observations), the predicted value (fit), and the lwr/upr confidence bounds.
This will return a matrix that has some of the information needed to construct the confidence intervals:
> coef(summary(lm1))
Estimate Std. Error t value Pr(>|t|)
(Intercept) 21749.037058 2.665203e+04 0.8160369 4.317954e-01
x 1.046954 1.374353e-01 7.6177997 1.037175e-05
Any text on linear regression should have the formula for the confidence interval; you may need to calculate some ancillary quantities, depending on which formula you're using. The code for predict is visible; just type at the console:
predict.lm
And don't forget that confidence intervals are different than prediction intervals.
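As a sketch of that by-hand calculation (the standard textbook confidence interval for the mean response; x0 = 200000 is just an example point, and d and lm1 come from the question):
x0   <- 200000                               # example value of x (illustrative)
n    <- nrow(d)
s    <- summary(lm1)$sigma                   # residual standard error
Sxx  <- sum((d$x - mean(d$x))^2)
fit0 <- coef(lm1)[1] + coef(lm1)[2] * x0     # fitted mean response at x0
se   <- s * sqrt(1/n + (x0 - mean(d$x))^2 / Sxx)
fit0 + c(-1, 1) * qt(0.975, df = n - 2) * se
# should agree with predict(lm1, newdata = data.frame(x = x0), interval = "confidence")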