Specificity/Sensitivity vs cut-off points using pROC package - plot

I need to plot the following graph so I can choose the optimal threshold for a logistic regression model.
However, I can't use the packages (epi and roc) that are used in most of the examples I have found. I do have the package pROC. Is there any way to plot the graph using this package? Also, how else could I choose the optimal threshold? How does it work using only the ROC curve?

If you are using the pROC package, the first step is to extract the coordinates of the curve. For instance:
library(pROC)
data(aSAH)
myroc <- roc(aSAH$outcome, aSAH$ndka)
# current pROC returns a data frame with one row per threshold;
# older releases returned a matrix indexed as mycoords["threshold", ]
mycoords <- coords(myroc, "all")
Once you have that, you can plot anything you like. This should be somewhat close to your example.
plot(mycoords$threshold, mycoords$specificity, type="l",
col="red", xlab="Cutoff", ylab="Performance")
lines(mycoords$threshold, mycoords$sensitivity, type="l",
col="blue")
legend(100, 0.4, c("Specificity", "Sensitivity"),
col=c("red", "blue"), lty=1)
Choosing the "optimal" cutpoint is as difficult as defining what is optimal in the first place. It highly depends on the context and your application.
A common shortcut is to use the Youden index, which simply picks the cutoff that maximises specificity + sensitivity. Again with pROC:
best.coords <- coords(myroc, "best", best.method="youden")
# [[ ]] extracts the value whether coords() returns a data frame or a named vector
abline(v=best.coords[["threshold"]], lty=2, col="grey")
abline(h=best.coords[["specificity"]], lty=2, col="red")
abline(h=best.coords[["sensitivity"]], lty=2, col="blue")
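To see what the Youden criterion does without relying on any package, here is a minimal base R sketch. The data are simulated and the variable names are made up for illustration. It uses every observed score as a candidate cutoff, which is a simplification and not necessarily the set of thresholds pROC evaluates.

```r
# Simulated scores and binary outcomes (illustrative data, not from the question)
set.seed(42)
outcome <- rep(c(0, 1), each = 100)
score   <- c(rnorm(100, mean = 0), rnorm(100, mean = 1.5))

# Candidate cutoffs: every observed score, in increasing order
cutoffs <- sort(unique(score))

# A case is classified as positive when its score is >= the cutoff
sens <- sapply(cutoffs, function(k) mean(score[outcome == 1] >= k))
spec <- sapply(cutoffs, function(k) mean(score[outcome == 0] <  k))

# Youden's J; maximising it is the same as maximising sens + spec
youden      <- sens + spec - 1
best        <- which.max(youden)
best_cutoff <- cutoffs[best]

c(cutoff = best_cutoff, sensitivity = sens[best], specificity = spec[best])
```

As the cutoff increases, sensitivity can only fall and specificity can only rise, which is why the two curves cross in the plot above.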
With pROC you can change the criteria for the "best" threshold. See the ?coords help page and the best.method and best.weights arguments for quick ways to tune it. You may want to look at the OptimalCutpoints package for more advanced ways to select your own optimum.
The output plot should look something like this:

Related

R: Holt Model. Unable to plot timeseries prediction (predict)

I have been able to use a polynomial lm model to fit and predict some time series data. However, when I switch to a Holt model, I get an error in the R console.
Here is what I am trying to do:
library(ggplot2)
library(matrixStats)
library(forecast)
df_input <- read.csv("postprocessed.csv")
x <- df_input$time
y <- df_input$value
df <- data.frame(x, y)
#poly4model <- lm(y~poly(x, degree=4), data=df)
holtmodel <- holt(df$y) # might need df$value here ?
v <- seq(1, 44)
v2 <- seq(44, 55)
pdf("postprocessed_holts.pdf")
plot(df, xlim=c(0, 55))
##lines(v, predict(poly4model, data.frame(x=v)), col="blue", pch=20, lwd=3)
##lines(v2, predict(poly4model, data.frame(x=v2)), col="red", pch=20, lwd=3)
lines(v, predict(holtmodel, data.frame(x=v)), col="blue", pch=20, lwd=3)
lines(v2, predict(holtmodel, data.frame(x=v2)), col="red", pch=20, lwd=3)
dev.off()
This is the error which shows up
Error in xy.coords(x, y) : 'x' and 'y' lengths differ
I am a bit confused as to what x and y refer to here. The objects x and y which are in the Environment (R Studio Environment) both have length 44.
The code appears to error on both lines starting with lines.
Here's a copy of the input data...
"","time","value"
"1",1,2.61066016308988
"2",2,3.41246054742996
"3",3,3.8608767964033
"4",4,4.28686048552237
"5",5,4.4923132964825
"6",6,4.50557049744317
"7",7,4.50944447661246
"8",8,4.51097373134893
"9",9,4.48788748823809
"10",10,4.34603985656981
"11",11,4.28677073671406
"12",12,4.20065901625172
"13",13,4.02514194962519
"14",14,3.91360194972916
"15",15,3.85865748409081
"16",16,3.81318053258601
"17",17,3.70380706527433
"18",18,3.61552922363713
"19",19,3.61405310598722
"20",20,3.64591327503384
"21",21,3.70234435835577
"22",22,3.73503970503372
"23",23,3.81003078640584
"24",24,3.88201196162666
"25",25,3.89872518158949
"26",26,3.97432743542362
"27",27,4.2523675144599
"28",28,4.34654855854847
"29",29,4.49276038902684
"30",30,4.67830892029687
"31",31,4.91896819673664
"32",32,5.04350767355202
"33",33,5.09073406942046
"34",34,5.18510849382162
"35",35,5.18353176529036
"36",36,5.2210776270173
"37",37,5.22643491929207
"38",38,5.11137006553725
"39",39,5.01052467981257
"40",40,5.0361056705898
"41",41,5.18149486951409
"42",42,5.36334869132276
"43",43,5.43053620818444
"44",44,5.60001072279525
Edit
I tried an alternative method as well. I noticed that the object holtmodel contains two objects which might be useful. They are fitted and mean. As far as I can tell this is the fitted timeseries and the mean timeseries for the next 10 steps/predictions.
I tried plotting these objects with
lines(holtmodel$fitted, col="orange", lwd=2)
lines(holtmodel$mean, col="blue", lwd=2)
however the second of these fails to plot anything, despite no error being produced in the console. The first line plots an orange timeseries as expected.
Your issue
The objects you are trying to add as lines don't have the same length:
length(predict(holtmodel, data.frame(x=v)))
# 10
length(v)
# 44
length(predict(holtmodel, data.frame(x=v2)))
# 10
length(v2)
# 12
This means you can't add them as new lines.
Also, you can't predict the way you would with a linear regression, by passing arbitrary x values (including past ones) to the fitted model. Exponential smoothing methods use the historical data points to project forward from the end of the series, so there is nothing to evaluate at x values of your choosing.
Also, you are not specifying the number of periods to forecast (h); see the documentation of the holt function. Its output is already a forecast of future values, so calling predict() on it doesn't change the result:
holt_predict <- predict(holtmodel)
length(setdiff(holt_predict, holtmodel))
# 0 which means they are the same objects
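To see where a forecast sits on the time axis, here is a small sketch that uses base R's HoltWinters() with gamma = FALSE as a stand-in for forecast::holt(). It is not the same implementation, and the smoothing parameters are fixed arbitrarily here instead of being estimated. n.ahead plays the role that h plays in holt().

```r
# First ten values from the question's data, rounded to two decimals
y <- ts(c(2.61, 3.41, 3.86, 4.29, 4.49, 4.51, 4.51, 4.51, 4.49, 4.35))

# Holt's linear trend method in base R (no seasonal component);
# alpha and beta are arbitrary illustrative values
fit <- HoltWinters(y, alpha = 0.8, beta = 0.2, gamma = FALSE)

# Forecast five periods ahead
fc <- predict(fit, n.ahead = 5)

range(time(y))   # time index of the observed data
range(time(fc))  # time index of the forecast, which starts after the data
```

Because the forecast is a time series whose index continues after the last observation, it falls outside a plot whose x-axis only covers the observed range.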
Solution
What you can do is use mean and fitted directly and plot them with lines, widening the plot area with xlim and ylim so that the predicted values are visible. You can plot holtmodel$fitted and holtmodel$mean straight onto your chart, since they are time series objects:
plot(df, xlim=c(0, 60), ylim=c(2.5, 10))
lines(holtmodel$fitted, col="blue", pch=20, lwd=3)
lines(holtmodel$mean, col="red", pch=20, lwd=3)
And the result:
Easy alternative
To save yourself the hassle of this kind of solution, there is an easier method. Have you tried the autoplot function? It is a ggplot2 generic for which the forecast package provides a method, and it gives you what you want directly (unless you don't want the prediction intervals). It is very straightforward and will probably yield results close to what you want:
autoplot(holtmodel)

Plot a regression equation with mean standard error in R

I would like to plot, in R, the values from the regression model shown in this image:
https://imgur.com/DDbP29T
My plan is to create the equation and plot it with the curve function:
eqn = function(x){ZZZ}
curve(eqn, from=0, to=50, n=50)
I expect a logarithmic growth curve
Something along these "lines", as it were. Consult ?curve and ?plot for further details about the arguments used, and your stats books for why I picked 1.96 as a multiplier for the sum (roughly) of the se values:
eqn = function(x){62.259+11.395*log(x) -10.268}
curve(eqn, from=0, to=50, n=50, ylim=c(0,100))
lines(x=seq(0,50),y=eqn(seq(0,50))+ 1.96*8.5, lty=3)
lines(x=seq(0,50),y=eqn(seq(0,50))- 1.96*8.5, lty=3)
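The same band can be tabulated instead of drawn, which makes it easier to check the numbers. This is only a sketch: the coefficients and the 8.5 are copied from the code above, and the column names are made up. The grid starts at 1 because log(0) is -Inf, so the point at x = 0 can never appear on the plot.

```r
# Fitted curve, with the coefficients used in the answer above
eqn <- function(x) 62.259 + 11.395 * log(x) - 10.268

# Value the answer multiplies by 1.96 (taken as given)
se <- 8.5

# Evaluate on a grid that avoids x = 0, where log(x) is -Inf
x <- seq(1, 50)
band <- data.frame(
  x     = x,
  fit   = eqn(x),
  lower = eqn(x) - 1.96 * se,
  upper = eqn(x) + 1.96 * se
)

head(band, 3)
eqn(0)  # -Inf
```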
(I'm guessing you are trying to reproduce a plot you have seen constructed in the SAS product named "JMP". I used it for a while around 2008-2009, but found it too limited in its capabilities, not to mention becoming quite expensive.)

How do I plot an abline() when I don't have any data points (in R)

I have to plot a few different simple linear models on a chart, the main point being to comment on them. I have no data for the models. I can't get R to create a plot with appropriate axes, i.e. I can't get the range of the axes correct. I think I'd like my y-axis to 0-400 and x to be 0-50.
Models are:
$$
\widehat y=108+0.20x_1
$$
$$
\widehat y=101+2.15x_1
$$
$$
\widehat y=132+0.20x_1
$$
$$
\widehat y=119+8.15x_1
$$
I know I could possibly do this much more easily in a different software or create a dataset from the model and estimate and plot the model from that but I'd love to know if there is a better way in R.
As @Glen_b noticed, type = "n" in plot produces an empty plot. Since plot still requires an x argument, you have to pass something: it can be NA or some data. If you provide actual data, plot works out the axis ranges from it; otherwise you have to set them by hand with the xlim and ylim arguments. Next, use abline, which takes a and b for the intercept and slope (or h and v if you just want a horizontal or vertical line).
plot(x=NA, type="n", ylim=c(100, 250), xlim=c(0, 50),
xlab=expression(x[1]), ylab=expression(hat(y)))
abline(a=108, b=0.2, col="red")
abline(a=101, b=2.15, col="green")
abline(a=132, b=0.2, col="blue")
abline(a=119, b=8.15, col="orange")
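If you would rather not guess ylim, one option is to derive it from the models themselves. The sketch below stores the four intercepts and slopes in a data frame (the object and column names are made up), evaluates each line at both ends of the x range, and uses the resulting range for the y-axis.

```r
# Intercepts (a) and slopes (b) of the four models from the question
models <- data.frame(
  a   = c(108, 101, 132, 119),
  b   = c(0.20, 2.15, 0.20, 8.15),
  col = c("red", "green", "blue", "orange")
)

xlim <- c(0, 50)

# Value of each line at both ends of the x range (one row per model)
y_ends <- outer(models$b, xlim) + models$a
ylim   <- range(y_ends)

# Empty plot with the computed limits, then one abline per model
plot(NA, type = "n", xlim = xlim, ylim = ylim,
     xlab = expression(x[1]), ylab = expression(hat(y)))
for (i in seq_len(nrow(models))) {
  abline(a = models$a[i], b = models$b[i], col = models$col[i])
}
```

With these models the steepest line reaches 526.5 at x = 50, so a y-axis of 0 to 400 would cut it off.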

How do I plot the 'inverse' of a survival function?

I am trying to plot the inverse of a survival function, as the data I have is actually an increase in the proportion of an event over time. I can produce Kaplan-Meier survival plots, but I want to produce the 'opposite' of these. I can kind of get what I want using fun="cloglog":
plot(survfit(Surv(Days_until_workers,Workers)~Queen_Number+Treatment,data=xdata),
fun="cloglog", lty=c(1:4), lwd=2, ylab="Colonies with Workers",
xlab="Days", las=1, font.lab=2, bty="n")
But I don't understand quite what this has done to the time (i.e. doesn't start at 0 and distance decreases?), and why the survival lines extend above the y axis.
Would really appreciate some help with this!
Cheers
Use fun="event" to get the desired output
library(survival)  # provides survfit(), Surv() and the aml example data
fit <- survfit(Surv(time, status) ~ x, data = aml)
par(mfrow=1:2, las=1)
plot(fit, col=2:3)
plot(fit, col=2:3, fun="event")
fun="cloglog" distorts the axes because it does not plot a proportion at all. According to ?plot.survfit, it plots this instead:
"cloglog" creates a complimentary log-log survival plot (f(y) = log(-log(y)) along with log scale for the x-axis)
Moreover, the fun argument is not limited to predefined functions like "event" or "cloglog", so you can easily give it your own custom function.
plot(fit, col=2:3, fun=function(y) 3*sqrt(1-y))
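The difference between the two transformations is easy to check numerically. This sketch uses the same aml example as above and applies both by hand to the estimated survival probabilities stored in fit$surv.

```r
library(survival)  # recommended package shipped with R; aml is one of its datasets

fit <- survfit(Surv(time, status) ~ x, data = aml)

# What fun="event" draws: the cumulative proportion with the event, 1 - S(t)
event_prob <- 1 - fit$surv

# What fun="cloglog" draws: log(-log(S(t))), which is not a proportion
cloglog <- log(-log(fit$surv))

range(event_prob)              # stays within [0, 1]
range(cloglog, finite = TRUE)  # not confined to [0, 1]
```

The cloglog values are unbounded, which is why those curves run past an axis meant for proportions.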

R: ylim and xlab/ylab in plot() for grofit package not working

I am trying to plot a growth spline for the variable "ipen" against "year" in this data. The code I am using is:
grofit <- read.csv('http://dl.dropbox.com/u/1791181/grofit.csv')
# install.packages(c("grofit"), dependencies = TRUE)
require(grofit)
growth = gcFitSpline(grofit$year, grofit$ipen)
plot(growth)
This works fine, and produces the plot below. But the problem is that (a) I can't change the default labels with the xlab or ylab options, and (b) I can't change the scale of the y-axis to be (0,100) with ylim=c(0,100) in the plot() statement. I could not find any pointers in the grofit documentation.
Don't directly use plot on the gcFitSpline result, but rather use str(growth) to explore the structure of the curve fit object and determine what you want to plot, or look at plot.gcFitSpline code (just type the name of the function without parentheses in the console and press Enter!)
For instance:
growth = gcFitSpline(grofit$year, grofit$ipen)
plot(grofit$year, grofit$ipen, pch=20, cex=0.8, col="gray", ylim=c(0, 100),
ylab="Growth", xlab="Year", las=1)
points(growth$fit.time, growth$fit.data, type="l", lwd=2)
This gives:
Actually, I think the grofit documentation does have a pointer about this:
... Other graphical parameters may also passed as arguments [sic]. This has currently no effect and is only meant to fulfill the requirements of a generic function.
You may have to rewrite the plot.gcFitSpline function specific to the graphical parameters you want, but there might be a reason that the function is the way it is.
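The general pattern (fit first, then draw the raw data and the fitted curve yourself) does not depend on grofit. Here is a sketch with simulated data and base R's smooth.spline() standing in for gcFitSpline(). It is a different smoother, used only to show that the labels and axis limits are fully under your control once you call plot() on the data directly.

```r
# Simulated growth data (illustrative only, not the data from the question)
set.seed(1)
year <- 1:30
ipen <- 90 / (1 + exp(-(year - 15) / 3)) + rnorm(30, sd = 2)

# Base R smoothing spline as a stand-in for gcFitSpline()
fit <- smooth.spline(year, ipen)

# Plot the raw data with your own labels and limits, then add the fitted curve
plot(year, ipen, pch = 20, cex = 0.8, col = "gray", ylim = c(0, 100),
     xlab = "Year", ylab = "Growth", las = 1)
lines(fit$x, fit$y, lwd = 2)
```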
BTW, I'm not sure it's a great idea to give your data the same name as the package you're using.

Resources