Plot histogram and density function curve on one chart - r

I have a density function f, and I do MCMC sampling from it. To evaluate the quality of the sampling, I need to plot the histogram and the curve in the same chart. The problem with
hist(samples);
curve(dfun,add=TRUE);
is that they are on different scales: the frequency of a given bin is usually in the hundreds, while the maximum of a density function is about 1 or so. What I want is to draw the two plots at the same height, with one y-axis on the left and the other on the right. Can anyone help? Thank you.

Use the prob=TRUE argument to hist, which plots densities instead of counts so that the histogram and the curve share the same scale:
hist(samples, prob=TRUE)
curve(dfun,add=TRUE)
Also see this SO question
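A minimal sketch of the suggestion above, assuming for illustration that the target density is the standard normal. The names `samples` and `dfun` are stand-ins here, not the asker's actual MCMC output or density.

```r
set.seed(1)
samples <- rnorm(5000)          # hypothetical stand-in for the MCMC draws
dfun <- function(x) dnorm(x)    # hypothetical stand-in for the target density f

# prob = TRUE rescales the bar heights to densities, so the bar areas sum to 1
h <- hist(samples, prob = TRUE, breaks = 40)
curve(dfun, add = TRUE, col = "red", lwd = 2)

# total area of the bars; on the density scale this should be 1
area <- sum(h$density * diff(h$breaks))
```

Because both layers are now densities, a single y-axis is enough and no second axis on the right is needed.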

Related

how to create a wormplot (detrended Q-Q plot) in ggplot?

A Q-Q plot is a useful graphical device for checking, for example, the normality of residuals. It is constructed by putting theoretical quantiles on the x-axis and observed quantiles on the y-axis. In ggplot, this can easily be done using geom_qq and stat_qq. I would like to produce a wormplot, which is like a Q-Q plot, except that the y-axis shows the difference between theoretical and observed quantiles (see the figure).
Is there a way to do this in ggplot? For example, is there a simple way to change the y-axis of geom_qq to show the difference between theoretical and observed quantiles? I know it should be possible to calculate the observed quantiles manually, but this would not work well if I wanted to create plots of multiple groups or use facets, since I would then need to calculate the observed quantiles manually for each group separately.
The blog post mentioned in the comments contains a guide to coding your own stat functions to create such plots yourself.
Otherwise, the qqplotr package (https://aloy.github.io/qqplotr/index.html) has an option detrend=TRUE, which produces wormplots with accompanying confidence bands.
If you want lines rather than a band, just use fill=NA, color='black', size=0.5
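For comparison, the manual per-group calculation that the question mentions can be sketched in base R. This is only an illustration under stated assumptions: the data frame `d`, its columns `g` and `y`, and the choice to standardize each group before comparing with normal quantiles are all hypothetical, and this is not how qqplotr computes its detrended plots.

```r
set.seed(1)
# hypothetical data: two groups, the second one skewed
d <- data.frame(
  g = rep(c("a", "b"), each = 50),
  y = c(rnorm(50), rexp(50))
)

# per group: standardize, sort, and subtract the theoretical normal quantiles
worm <- do.call(rbind, lapply(split(d, d$g), function(s) {
  theo <- qnorm(ppoints(nrow(s)))
  obs  <- sort(as.numeric(scale(s$y)))
  data.frame(g = s$g[1], theoretical = theo, deviation = obs - theo)
}))

# one panel per group, using base graphics
op <- par(mfrow = c(1, 2))
for (grp in unique(worm$g)) {
  w <- worm[worm$g == grp, ]
  plot(w$theoretical, w$deviation, main = grp,
       xlab = "Theoretical quantile", ylab = "Observed - theoretical")
  abline(h = 0, lty = 2)
}
par(op)
```

The resulting `worm` data frame has one row per observation, so it could also be handed to ggplot with geom_point and facet_wrap(~ g).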

R: pROC package: plot ROC curve across specific range?

I would like to plot a segment of an ROC curve over a specific range of x values, instead of plotting the entire curve. I don't want to change the range of the x axis itself. I just want to plot only part of the ROC curve, within a range of x values that I specify.
library(pROC)
data(aSAH)
rocobj <- roc(aSAH$outcome, aSAH$wfns)
plot(rocobj)
That code plots the whole ROC curve. Let's say I just wanted to plot the curve from x=1 to x=.5. How could I do that? Thank you.
The default plot function for roc objects plots the rocobj$sensitivities as a function of rocobj$specificities.
So
plot(rocobj$specificities,rocobj$sensitivities,type="l",xlim=c(1.5,-0.5))
abline(1,-1)
achieves the same as
plot(rocobj)
And
plot(rocobj$specificities[2:6],rocobj$sensitivities[2:6],type="l",xlim=c(1.5,-0.5),ylim=c(0,1))
abline(1,-1)
gets close to what I think you are after (it plots from 0.514 to 1.0). I don't know enough about the package to know whether the sensitivities can be calculated over a specific range, or at a specific resolution, of specificities.
The plot function of pROC uses the usual R semantics for plotting, so you can use the xlim argument as you would for any other plot:
plot(rocobj, xlim = c(1, .5))
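The first answer's idea of subsetting the sensitivity and specificity vectors can be made less dependent on hard-coded indices by using a logical mask. The sketch below computes a ROC curve by hand in base R on hypothetical data (the `outcome` and `score` vectors are invented for illustration), so it does not rely on any pROC internals. With a pROC object, the same mask could presumably be applied to rocobj$specificities and rocobj$sensitivities.

```r
set.seed(42)
outcome <- rep(c(0, 1), each = 50)           # hypothetical class labels
score <- c(rnorm(50, 0), rnorm(50, 1.5))     # hypothetical classifier scores

# sensitivity and specificity at every observed threshold
thr  <- sort(unique(score))
sens <- sapply(thr, function(t) mean(score[outcome == 1] >= t))
spec <- sapply(thr, function(t) mean(score[outcome == 0] < t))

# keep only the part of the curve with specificity between 1 and 0.5
keep <- spec >= 0.5

# the x-axis still spans the full range (1 down to 0); only a segment is drawn
plot(spec[keep], sens[keep], type = "l", xlim = c(1, 0), ylim = c(0, 1),
     xlab = "Specificity", ylab = "Sensitivity")
abline(1, -1, lty = 2)
```

Unlike setting xlim = c(1, .5), this leaves the axis range untouched, which is what the question asks for.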

How to hide the underlying data in a ggplot scatterplot

I am trying to make a scatter plot in ggplot and would like to lose the points and show only the smooth line and the confidence interval. I checked the geom_point() function and there is no option for turning it off so that the points/underlying data are hidden. Any suggestions? Much appreciated.
Joseph
To plot a smooth line with a confidence interval around it, use geom_smooth() and simply leave out geom_point(). By default this fits the smoothed line using loess if there are fewer than 1000 observations and gam otherwise, but you can change the smoothing method with the method= argument.
ggplot(mtcars,aes(wt,mpg))+geom_smooth()

Plotting histograms with R; y axis keeps changing to frequency from proportion/probability

I am trying to overlay two histograms in the same plot, but the option probability=TRUE (relative frequencies) in hist() has no effect with the code below. This is a problem because the two samples have very different sizes (length(cl1)=9 and length(cl2)=339) and, with this script, I cannot visualize the differences between the two histograms because each shows frequencies. How can I overlay two histograms with the same bin width, showing relative frequencies?
c1<-hist(dataList[["cl1"]],xlim=range(minx,maxx),breaks=seq(minx,maxx,pasx),col=rgb(1,0,0,1/4),main=paste(paramlab,"Group",groupnum,"cl1",sep=" "),xlab="",probability=TRUE)
c2<-hist(dataList[["cl2"]],xlim=range(minx,maxx),breaks=seq(minx,maxx,pasx),col=rgb(0,0,1,1/4),main=paste(paramlab,"Group",groupnum,"cl2",sep=" "),xlab="",probability=TRUE)
plot(c1, col=rgb(1,0,0,1/4), xlim=c(minx,maxx), main=paste(paramlab,"Group",groupnum,sep=" "),xlab="")# first histogram
plot(c2, col=rgb(0,0,1,1/4), xlim=c(minx,maxx), add=T)
cl1Col <- rgb(1,0,0,1/4)
cl2Col <- rgb(0,0,1,1/4)
legend('topright',c('Cl1','Cl2'),
fill = c(cl1Col , cl2Col ), bty = 'n',
border = NA)
Thanks in advance for your help!
When you call plot on an object of class histogram (like c1), it dispatches to the S3 method for histograms, namely plot.histogram. You can see the code for this function by typing graphics:::plot.histogram, and you can find its help under ?plot.histogram. The help file for that function states:
freq logical; if TRUE, the histogram graphic is to present a
representation of frequencies, i.e, x$counts; if FALSE, relative
frequencies (probabilities), i.e., x$density, are plotted. The default
is true for equidistant breaks and false otherwise.
So, when plot renders a histogram, it doesn't use the previously specified probability or freq arguments; it tries to figure it out for itself. The reason for this is obvious if you dig around inside c1: it contains all of the data necessary for the plot, but does not specify how it should be rendered.
So, the solution is to reiterate the argument freq=FALSE when you run the plot functions. Notably, freq=FALSE works whereas probability=TRUE does not because plot.histogram does not have a probability option. So, your plot code will be:
plot(c1, col=rgb(1,0,0,1/4), xlim=c(minx,maxx), main=paste(paramlab,"Group",groupnum,sep=" "),xlab="",freq=FALSE)# first histogram
plot(c2, col=rgb(0,0,1,1/4), xlim=c(minx,maxx), add=T, freq=FALSE)
This all seems like an oversight/idiosyncratic decision (or lack thereof) on the part of the R devs. To their credit, it is appropriately documented and is not "unexpected behavior" (although I certainly didn't expect it). I wonder where such oddness should be reported, if it should be reported at all.
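A self-contained sketch of the same fix, with invented data in place of dataList (the `small` and `large` vectors and the 0.5 bin width are assumptions for illustration): the histograms are computed once with plot = FALSE, and freq = FALSE is passed to plot() itself.

```r
set.seed(2)
small <- rnorm(9, mean = 1)     # hypothetical small sample
large <- rnorm(339)             # hypothetical large sample

# shared breaks covering both samples, so the bin widths match
both <- c(small, large)
brk <- seq(floor(min(both)), ceiling(max(both)), by = 0.5)

h1 <- hist(small, breaks = brk, plot = FALSE)
h2 <- hist(large, breaks = brk, plot = FALSE)

# freq = FALSE must be given to plot(); ylim covers the taller histogram
ymax <- max(h1$density, h2$density)
plot(h1, freq = FALSE, col = rgb(1, 0, 0, 1/4), ylim = c(0, ymax),
     main = "Relative frequencies", xlab = "")
plot(h2, freq = FALSE, col = rgb(0, 0, 1, 1/4), add = TRUE)
```

Setting ylim from both density vectors keeps the taller histogram from being clipped, whichever one is drawn first.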

Problem with axis limits when plotting curve over histogram [duplicate]

Newbie here. I have a script to create graphs that has a bit that goes something like this:
png("Test.png")
ht=hist(step[i],20)
curve(insert_function_here,add=TRUE)
I essentially want to plot the curve of a distribution over a histogram. My problem is that the axis limits are apparently set by the histogram instead of the curve, so the curve sometimes goes beyond the y-axis limits. I have played with par("usr"), to no avail. Is there any way to set the axis limits based on the maximum values of either the histogram or the curve (or, alternatively, of the curve only)? In case this changes anything, this needs to be done within a for loop where multiple such graphs are plotted, and within a series of subplots (par("mfrow")).
Inspired by other answers, this is what I ended up doing:
curve(insert_function_here)
boundsc=par("usr")
ht=hist(A[,1],20,plot=FALSE)
par(usr=c(boundsc[1:2],0,max(boundsc[4],max(ht$counts))))
plot(ht,add=TRUE)
It fixes the bounds based on the highest of either the curve or the histogram.
You could determine mx <- max(curve_vector, ht$counts) and set ylim=c(0, mx), but I rather doubt the code looks like that, since [] is not a proper parameter-passing idiom and step is not an R plotting function but rather a model selection function. So I am guessing this is code in Matlab or some other idiom. In R, try this:
set.seed(123)
png("Test.png")
ht=hist(rpois(20,1), plot=FALSE, breaks=0:10-0.1)
# better to offset to include discrete counts that would otherwise be at boundaries
vals=round(ht$breaks[-length(ht$breaks)])   # the integer value covered by each bin
lambda=sum(ht$counts*vals)/sum(ht$counts)   # sample mean recovered from the bin counts
plot(round(ht$breaks), dpois(round(ht$breaks), lambda),   # plot a Poisson density
     ylim=c(0, max(ht$density)+.1), type="l")
plot(ht, freq=FALSE, add=TRUE)   # plot the histogram
dev.off()
You could plot the curve first, then compute the histogram with plot=FALSE, and use the plot function on the histogram object with add=TRUE to add it to the plot.
Even better would be to calculate the highest y-value of the curve (there may be shortcuts for this, depending on the nature of the curve) and the highest bar in the histogram, and pass the larger of the two to the ylim argument when plotting the histogram.
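The ylim approach described above might look like the following sketch. The data `x` and the function `f` are hypothetical, and the curve's peak is approximated by evaluating it on a grid, which is only one possible shortcut. Both layers are drawn on the density scale (freq = FALSE) so that their heights are comparable.

```r
set.seed(3)
x <- rnorm(30, mean = 0, sd = 0.2)     # hypothetical data
f <- function(v) dnorm(v, 0, 0.2)      # hypothetical curve to overlay

ht <- hist(x, breaks = 20, plot = FALSE)

# evaluate the curve on a grid over the histogram's x range to find its peak
grid <- seq(min(ht$breaks), max(ht$breaks), length.out = 200)
ymax <- max(f(grid), ht$density)

# the y-axis now accommodates whichever of the two is taller
plot(ht, freq = FALSE, ylim = c(0, ymax))
curve(f, add = TRUE)
```

Since nothing here depends on the state of the graphics device, the same lines should work unchanged inside a for loop with par(mfrow = ...).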

Resources