R: pROC package: plot ROC curve across specific range? - r

I would like to plot a segment of an ROC curve over a specific range of x values, instead of plotting the entire curve. I don't want to change the range of the x axis itself. I just want to plot only part of the ROC curve, within a range of x values that I specify.
library(pROC)
data(aSAH)
rocobj <- roc(aSAH$outcome, aSAH$wfns)
plot(rocobj)
That code plots the whole ROC curve. Let's say I just wanted to plot the curve from x=1 to x=.5. How could I do that? Thank you.

The default plot function for roc objects plots the rocobj$sensitivities as a function of rocobj$specificities.
So
plot(rocobj$specificities,rocobj$sensitivities,type="l",xlim=c(1.5,-0.5))
abline(1,-1)
achieves the same as
plot(rocobj)
And
plot(rocobj$specificities[2:6],rocobj$sensitivities[2:6],type="l",xlim=c(1.5,-0.5),ylim=c(0,1))
abline(1,-1)
Gets close to what I think you are after (plots from 0.514 to 1.0). I don't know enough about the package to know if the sensitivities can be calculated at a specific range, or resolution of specificities.

The plot function of pROC uses the usual R semantics for plotting, so you can use the xlim argument as you would for any other plot:
plot(rocobj, xlim = c(1, .5))

Related

Line style in q-q plot in R

Why I did get lines instead of standard bubbles in my q-q plot?
My code:
data <- read.csv("C:\\Users\\anton\\SanFrancisco.csv")
x <- data$ï..San.Francisco
head(x)
library("fitdistrplus")
fitnor <- fitdist(x, "norm")
fitlogis <- fitdist(x, "logis")
qqcomp(list(fitnor, fitlogis), legendtext=c("Normal", "Logistic"))
From the documentation for qqcomp - get to it by ?qqcomp.
qqcomp provides a plot of the quantiles of each theoretical
distribution (x-axis) against the empirical quantiles of the data
(y-axis), by default defining probability points as (1:n - 0.5)/n for
theoretical quantile calculation (data are assumed continuous). For
large dataset (n > 1e4), lines are drawn instead of points and
customized with the fitpch parameter.
This is a design feature. Your data must have more than 10000 values. If that is the case, the bubbles on the q-q plot would be difficulty to individually distinguish. Additionally, they are large enough that the bubbles for one model would cover those for the other.

How to plot exponential function on barplot R?

So I have a barplot in which the y axis is the log (frequencies). From just eyeing it, it appears that bars decrease exponentially, but I would like to know this for sure. What I want to do is also plot an exponential on this same graph. Thus, if my bars fall below the exponential, I would know that my bars to decrease either exponentially or faster than exponential, and if the bars lie on top of the exponential, I would know that they dont decrease exponentially. How do I plot an exponential on a bar graph?
Here is my graph if that helps:
If you're trying to fit density of an exponential function, you should probably plot density histogram (not frequency). See this question on how to plot distributions in R.
This is how I would do it.
x.gen <- rexp(1000, rate = 3)
hist(x.gen, prob = TRUE)
library(MASS)
x.est <- fitdistr(x.gen, "exponential")$estimate
curve(dexp(x, rate = x.est), add = TRUE, col = "red", lwd = 2)
One way of visually inspecting if two distributions are the same is with a Quantile-Quantile plot, or Q-Q plot for short. Typically this is done when inspecting if a distribution follows standard normal.
The basic idea is to plot your data, against some theoretical quantiles, and if it matches that distribution, you will see a straight line. For example:
x <- qnorm(seq(0,1,l=1002)) # Theoretical normal quantiles
x <- x[-c(1, length(x))] # Drop ends because they are -Inf and Inf
y <- rnorm(1000) # Actual data. 1000 points drawn from a normal distribution
l.1 <- lm(sort(y)~sort(x))
qqplot(x, y, xlab="Theoretical Quantiles", ylab="Actual Quantiles")
abline(coef(l.1)[1], coef(l.1)[2])
Under perfect conditions you should see a straight line when plotting the theoretical quantiles against your data. So you can do the same plotting your data against the exponential function you think it will follow.

How do I plot the 'inverse' of a survival function?

I am trying to plot the inverse of a survival function, as the data I'm is actually an increase in proportion of an event over time. I can produce Kaplan-Meier survival plots, but I want to produce the 'opposite' of these. I can kind of get what I want using the following fun="cloglog":
plot(survfit(Surv(Days_until_workers,Workers)~Queen_Number+Treatment,data=xdata),
fun="cloglog", lty=c(1:4), lwd=2, ylab="Colonies with Workers",
xlab="Days", las=1, font.lab=2, bty="n")
But I don't understand quite what this has done to the time (i.e. doesn't start at 0 and distance decreases?), and why the survival lines extend above the y axis.
Would really appreciate some help with this!
Cheers
Use fun="event" to get the desired output
fit <- survfit(Surv(time, status) ~ x, data = aml)
par(mfrow=1:2, las=1)
plot(fit, col=2:3)
plot(fit, col=2:3, fun="event")
The reason for fun="cloglog" screwing up the axes is that it does not plot a fraction at all. It is instead plotting this according to ?plot.survfit:
"cloglog" creates a complimentary log-log survival plot (f(y) = log(-log(y)) along with log scale for the x-axis)
Moreover, the fun argument is not limited to predefined functions like "event" or "cloglog", so you can easily give it your own custom function.
plot(fit, col=2:3, fun=function(y) 3*sqrt(1-y))

Plot histogram and density function curve on one chart

I have a density function f, and I do MCMC sampling for it. To evaluate the goodness of the sampling, I need to plot the hist and curve within the same chart. The problem of
hist(samples);
curve(dfun,add=TRUE);
is that they are on the different scale: the frequency of a certain bin is usually hundreds, while the maximum of a density function is about 1 or so. What I want to do is to configure two plots at the same height, with one y-axis on the left and the other on the right. Can anyone help? Thank you.
Use the prob=TRUE argument to hist:
hist(samples, prob=TRUE)
curve(dfun,add=TRUE)
Also see this SO question

Problem with axis limits when plotting curve over histogram [duplicate]

This question already has an answer here:
How To Avoid Density Curve Getting Cut Off In Plot
(1 answer)
Closed 6 years ago.
newbie here. I have a script to create graphs that has a bit that goes something like this:
png(Test.png)
ht=hist(step[i],20)
curve(insert_function_here,add=TRUE)
I essentially want to plot a curve of a distribution over an histogram. My problem is that the axes limits are apparently set by the histogram instead of the curve, so that the curve sometimes gets out of the Y axis limits. I have played with par("usr"), to no avail. Is there any way to set the axis limits based on the maximum values of either the histogram or the curve (or, in the alternative, of the curve only)?? In case this changes anything, this needs to be done within a for loop where multiple such graphs are plotted and within a series of subplots (par("mfrow")).
Inspired by other answers, this is what i ended up doing:
curve(insert_function_here)
boundsc=par("usr")
ht=hist(A[,1],20,plot=FALSE)
par(usr=c(boundsc[1:2],0,max(boundsc[4],max(ht$counts))))
plot(ht,add=TRUE)
It fixes the bounds based on the highest of either the curve or the histogram.
You could determine the mx <- max(curve_vector, ht$counts) and set ylim=(0, mx), but I rather doubt the code looks like that since [] is not a proper parameter passing idiom and step is not an R plotting function, but rather a model selection function. So I am guessing this is code in Matlab or some other idiom. In R, try this:
set.seed(123)
png("Test.png")
ht=hist(rpois(20,1), plot=FALSE, breaks=0:10-0.1)
# better to offset to include discrete counts that would otherwise be at boundaries
plot(round(ht$breaks), dpois( round(ht$breaks), # plot a Poisson density
mean(ht$counts*round(ht$breaks[-length(ht$breaks)]))),
ylim=c(0, max(ht$density)+.1) , type="l")
plot(ht, freq=FALSE, add=TRUE) # plot the histogram
dev.off()
You could plot the curve first, then compute the histogram with plot=FALSE, and use the plot function on the histogram object with add=TRUE to add it to the plot.
Even better would be to calculate the the highest y-value of the curve (there may be shortcuts to do this depending on the nature of the curve) and the highest bar in the histogram and give this value to the ylim argument when plotting the histogram.

Resources