I am trying to figure out how to get confidence intervals for predicted values from a model fitted with medrc (an nlme-based mixed-effects dose-response model). The same code worked with a regular drc package model, which does not use random effects, so I assume there is something I am not doing right with this nlme model, because I get errors when I request the CIs.
Below is an example data frame of the data I am using:
df <- data.frame(Geno = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,7,7,7,7,8,8,8,8,
9,9,9,9,10,10,10,10,11,11,11,11,12,12,12,12,13,13,13,13,14,14,14,14),
Treatment = c(3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",
3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",
3,6,9,"MMM",3,6,9,"MMM"),
Temp = c(32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,
32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,
32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,
32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,
32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535),
PAM = c(0.62225,0.593,0.35775,0.654,0.60625,0.5846667,0.316,0.60875,0.62275,0.60875,0.32125,
0.63725,0.60275,0.588,0.32275,0.60875,0.65225,0.6185,0.29925,0.64525,0.61925,0.61775,
0.11725,0.596,0.603,0.6065,0.2545,0.59025,0.586,0.5895,0.27025,0.59125,0.6345,0.6135,
0.3755,0.622,0.53375,0.552,0.2485,0.51925,0.6375,0.6256667,0.3575,0.63975,0.59375,0.6055,
0.333,0.64125,0.55275,0.51025,0.319,0.55725,0.6375,0.64725,0.348,0.66125))
df$Geno <- as.factor(df$Geno)
With this data, I am running this dose-response curve model with three parameters: b = slope, d = max, e = ED50.
model <- medrm(PAM ~ Temp,
data=df,
random= d + e ~ 1|Geno,
fct=LL.3(),
control=nlmeControl(msMaxIter = 2000, maxIter=2000, minScale=0.00001, tolerance=0.1, pnlsTol=1))
summary(model)
plot(model)
From this model I want to generate predicted values at different temperatures along the fitted curve:
model_preddata = data.frame(Temp = seq(28,39, length.out = 100))
model_pred = as.data.frame(predict(model, newdata = model_preddata, interval = 'confidence'))
With this I get an error, but I can make it predict the PAM values if I add level = 0:
model_pred = as.data.frame(predict(model, newdata = model_preddata, interval = 'confidence', level = 0))
However, this does not give me the lower and upper bound columns like it does when I run this code with other non-mixed-effects models.
Can anyone help me figure out how to get the CIs for the predicted values of this model?
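One fallback I am considering is a nonparametric bootstrap over genotypes: resample the Geno levels with replacement, refit the model, and take quantiles of the population-level (level = 0) predictions. This is only a rough sketch; the seed, the 200 replicates, and the Lower/Upper column names are placeholders I chose, and failed refits are simply skipped:
set.seed(1)
boot_preds <- replicate(200, {
  ids <- sample(levels(df$Geno), replace = TRUE)  # resample genotypes
  boot_df <- do.call(rbind, lapply(seq_along(ids), function(i) {
    gd <- df[df$Geno == ids[i], ]
    gd$Geno <- factor(i)  # relabel so repeated genotypes stay distinct
    gd
  }))
  fit <- try(medrm(PAM ~ Temp, data = boot_df, random = d + e ~ 1 | Geno,
                   fct = LL.3(),
                   control = nlmeControl(msMaxIter = 2000, maxIter = 2000,
                                         minScale = 0.00001, tolerance = 0.1,
                                         pnlsTol = 1)),
             silent = TRUE)
  if (inherits(fit, "try-error")) rep(NA_real_, nrow(model_preddata))
  else as.numeric(predict(fit, newdata = model_preddata, level = 0))
})
model_pred$Lower <- apply(boot_preds, 1, quantile, probs = 0.025, na.rm = TRUE)
model_pred$Upper <- apply(boot_preds, 1, quantile, probs = 0.975, na.rm = TRUE)
Is something like this reasonable, or is there a built-in way to get the intervals?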
I am trying to predict and plot the estimated survival curve for a new observation in R. Using the survival package and the lung data set, I first fit a Cox proportional hazards model to the data. Then I tried to predict and plot the survival curve for a hypothetical new observation (I entered the details for this hypothetical observation in the list() call). However, this is not working.
I have attached my code below:
#load library
library(survival)
data(lung)
#create survival object
s <- with(lung,Surv(time,status))
#create model
modelA <- coxph(s ~ as.factor(sex)+age+ph.ecog+wt.loss+ph.karno,data=lung, model=TRUE)
summary(modelA)
#plot
plot(survfit(modelA), ylab="Probability of Survival",
xlab="Time", col=c("red", "black", "black"))
#predict for a hypothetical NEW observation (here is where the error is)
lines(predict(modelA, newdata=list(sex=1,
age = 56,
ph.ecog = 1,
ph.karno = 50,
wt.loss = 11),
type="quantile",
p=seq(.01,.99,by=.01)),
seq(.99,.01,by=-.01),
col="blue")
## Error in match.arg(type) :
## 'arg' should be one of “lp”, “risk”, “expected”, “terms”, “survival”
Does anyone know what I am doing wrong? Thanks
This is what the survfit() function is for. In your example you plot the survfit for the model, but you can pass a newdata argument to this function and it will produce the estimated survival curve for those data.
If we reproduce your example:
library(survival)
s <- with(lung, Surv(time, status))
modelA <- coxph(s ~ as.factor(sex) + age + ph.ecog + wt.loss + ph.karno,
data = lung, model = TRUE)
plot(survfit(modelA), ylab = "Probability of Survival",
xlab = "Time", col = c("red", "black", "black"))
Then we can create a survival curve given your specified covariates like this:
est <- survfit(modelA, newdata = data.frame(sex = 1,
age = 56,
ph.ecog = 1,
ph.karno = 50,
wt.loss = 11))
Now est is an S3 object whose members include time and surv, so we can plot a blue line tracking the estimated survival of individuals with the given covariates like this:
lines(est$time, est$surv, col = 'blue', type = 's')
Or plot it on its own with a 95% confidence interval:
plot(est, ylab = "Probability of Survival",
xlab = "Time", col = c("red", "black", "black"))
See the description of the predict() function (you can open it in R help by running ?predict.coxph):
type - the type of predicted value. Choices are the linear predictor
("lp"), the risk score exp(lp) ("risk"), the expected number of events
given the covariates and follow-up time ("expected"), and the terms of
the linear predictor ("terms"). The survival probability for a subject
is equal to exp(-expected).
You can see that your type = "quantile" does not match the expected input. If you call predict() without the type argument, in your case it defaults to "lp" (the linear predictor).
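For instance, these two calls are equivalent and both return the linear predictor:
lp_default  <- predict(modelA)              # type defaults to "lp"
lp_explicit <- predict(modelA, type = "lp")
all.equal(lp_default, lp_explicit)          # TRUE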
When you call predict() on your object modelA, R sees that it is of class coxph, so predict.coxph() is dispatched. Arguments like type = "quantile" and p = seq(.01, .99, by = .01) are not accepted by predict.coxph() (p is ignored, type raises the error). They belong to another method, predict.survreg(); for that to be called, your modelA object would have to be of class survreg, i.e. created with a survreg() call instead of a coxph() call.
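If quantile predictions are really what you want, a parametric model fitted with survreg() accepts exactly the arguments you used. A minimal sketch, assuming the default Weibull distribution is acceptable:
# Sketch: predict.survreg() supports type = "quantile" with a p vector
modelB <- survreg(Surv(time, status) ~ as.factor(sex) + age + ph.ecog +
                    wt.loss + ph.karno, data = lung)  # Weibull by default
q <- as.vector(predict(modelB,
                       newdata = data.frame(sex = 1, age = 56, ph.ecog = 1,
                                            ph.karno = 50, wt.loss = 11),
                       type = "quantile", p = seq(.01, .99, by = .01)))
plot(q, seq(.99, .01, by = -.01), type = "l",
     xlab = "Time", ylab = "Probability of Survival")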
I am trying to calculate a ROC curve and the AUC using ranger for a binary classification problem (classes 0 and 1), where the response variable is called BiClass.
Suppose I split a data frame into Train_Set and Test_Set (75% and 25% respectively) and compute binary class probabilities using:
library(ranger)
library(ROCR)
library(mlr)
library(pROC)
library(tidyverse)
Biclass.ranger <- ranger(BiClass ~ ., data = Train_Set, num.trees = 500, importance = "impurity", save.memory = TRUE, probability = TRUE)
pred <- predict(BiClass.ranger, data = Test_Set, num.trees = 500, type='response', verbose = TRUE)
My intention is now to compute the ROC curve (and AUC). I tried the following code, through which I get a ROC curve (using the ROCR package):
pred_object <- prediction(pred$predictions[,2], Test_Set$BiClass)
per_measure <- performance(pred_object, "tnr", "fnr")
plot(per_measure, col="red", lwd=1)
abline(a=0,b=1,lwd=1,lty=1,col="gray")
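As a sanity check, note that the conventional ROC orientation uses "tpr" against "fpr"; "tnr" versus "fnr" draws a mirrored version of the same curve:
roc_measure <- performance(pred_object, "tpr", "fpr")  # conventional ROC
plot(roc_measure, col = "red", lwd = 1)
abline(a = 0, b = 1, lwd = 1, lty = 1, col = "gray")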
Or, alternatively, using the pROC package:
probabilities <- as.data.frame(predict(Biclass.ranger, data = Test_Set, num.trees = 500, type='response', verbose = TRUE)$predictions)
probabilities$predic <- colnames(probabilities)[max.col(probabilities,ties.method="first")] # For each row, return the column name of the largest value from 0 and 1 columns (prediction column). This will be a character type
probabilities$prednum <- as.numeric(as.character(probabilities$predic)) # create prednum as a numeric data type in probabilities
probabilities <- dplyr::mutate_if(probabilities, is.character, as.factor) # convert character to factor
probabilities <- cbind(probabilities,BiClass=Test_Set$BiClass) # append BiClass. This data frame contains the response variable from the Test_Data, along with prediction (prednum) and probability classes (0 and 1)
ROC_ranger <- pROC::roc(probabilities$BiClass, pred$predictions[,2])
plot(ROC_ranger, col = "blue", main = "ROC - Ranger")
paste("Accuracy % of ranger: ", mean(Test_Set$BiClass == round(pred$predictions[,2], digits = 0))) # print the performance of each model
The ROC curve obtained is given below:
I have the following questions:
1) How can I set a threshold value and plot a confusion matrix for that threshold?
I compute the confusion matrix presently using:
probabilities <- as.data.frame(predict(Biclass.ranger, data = Test_Set, num.trees = 500, type='response', verbose = TRUE)$predictions)
max.col(probabilities) - 1
confusionMatrix(table(Test_Set$BiClass, max.col(probabilities) - 1)) # confusionMatrix() is from the caret package
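I assume a fixed custom cutoff would look something like the sketch below (the 0.6 threshold is just a placeholder), but I am not sure this is the right way:
threshold <- 0.6  # hypothetical cutoff, replace with the value you want
pred_class <- ifelse(probabilities$`1` >= threshold, 1, 0)  # classify by cutoff
confusionMatrix(table(Test_Set$BiClass, pred_class))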
2) How do I calculate the optimal threshold value (the global value at which I get more true positives or true negatives) through optimization?
Again, referring to the pROC package and the guidelines proposed by its author, using:
myroc <- pROC::roc(probabilities$BiClass, probabilities$`1`)
mycoords <- pROC::coords(myroc, "all", transpose = FALSE)
plot(mycoords$threshold, mycoords$specificity, type="l", col="red", xlab="Cutoff", ylab="Performance")
lines(mycoords$threshold, mycoords$sensitivity, type="l", col="blue")
legend(0.23,0.2, c("Specificity", "Sensitivity"), col=c("red", "blue"), lty=1)
best.coords <- coords(myroc, "best", best.method="youden", transpose = FALSE)
abline(v=best.coords$threshold, lty=2, col="grey")
abline(h=best.coords$specificity, lty=2, col="red")
abline(h=best.coords$sensitivity, lty=2, col="blue")
I was able to draw this curve using the Youden index:
Does it mean there isn't a lot of freedom to vary the threshold to play with specificity and sensitivity, since the dashed blue and red lines are not far away from each other?
3) How do I evaluate the AUC?
I calculated AUC using pROC again following the guidelines of its author. See below:
ROC_ranger <- pROC::roc(probabilities$BiClass, probabilities$`1`)
ROC_ranger_auc <- pROC::auc(ROC_ranger)
paste("Area under curve of random forest: ", ROC_ranger_auc) # AUC of the model
The final goal is to increase the true negatives, which are presently defined by 1 in BiClass, and of course the true positives (defined by 0 in BiClass) in the confusion matrix. At present, the accuracy of my classification algorithm is 0.74 and the AUC is 0.81.
I am constructing a ROC curve from my SVM model, but the curve came out inverted. Although my SVM predictions have high accuracy (~93%), the ROC curve says the area under the curve is just about 2.7%. Moreover, it tells me that the optimal cutoff value is infinity, which is not what I expected from my model fit.
I fitted my SVM model using the svm() function (from the e1071 package) as in the code below, then predicted with predict(). After that I computed prediction(), performance(), the cutoff value, and the AUC (all code shown below):
library(e1071)  # svm()
library(ROCR)   # prediction(), performance()
svm.fit <- svm(label ~ NDAI + SD + CORR, data = trainSet, scale = FALSE, kernel = "radial", cost = 2, probability = TRUE)
svm.pred <- predict(svm.fit, testSet, probability=TRUE)
mean(svm.pred== testSet$label)*100
prediction.svm <- prediction(attr(svm.pred, "probabilities")[,2], testSet$label)
eval.svm <- performance(prediction.svm, "acc")
roc.svm <- performance(prediction.svm, "tpr", "fpr")
#identify best values and cutoff
max_index.svm <- which.max(slot(eval.svm, "y.values")[[1]])
max.acc_svm <- (slot(eval.svm, "y.values")[[1]])[max_index.svm]
opt.cutoff_svm <- (slot(eval.svm, "x.values")[[1]])[max_index.svm][[1]]
#AUC
auc.svm <- performance(prediction.svm, "auc")
auc.svm <- unlist(slot(auc.svm, "y.values"))
auc.svm <- round(auc.svm, 4)
plot(roc.svm,colorize=TRUE)
points(0.072, 0.93, pch= 20)
legend(.6,.2, auc.svm, title = "AUC", cex = 0.8)
legend(.8,.2, round(opt.cutoff_svm,4), title = "cutoff", cex = 0.8)
I expect the AUC to be close to 1 and the optimal cutoff to be small, close to 0.5. Has anyone encountered a similar problem? If so, how should I fix my code?
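One guess I have is that I picked the wrong probability column, since a near-zero AUC usually means the scores correspond to the opposite class. The sketch below selects the column by the positive class level instead of by position (treating levels(testSet$label)[2] as the positive level is an assumption):
prob <- attr(svm.pred, "probabilities")
colnames(prob)                    # column names are the class levels
pos <- levels(testSet$label)[2]   # assumed positive level
prediction.svm <- prediction(prob[, pos], testSet$label)  # scores for the positive class
Would something like this be the right fix?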