how to create a wormplot (detrended Q-Q plot) in ggplot? - r

Q-Q plot is a useful graphical device used to check for example normality of residuals. Q-Q plot is constructed by putting theoretical quantiles on x-axis and observed quantiles on the y-axis. In ggplot, this can be easily done using geom_qq and stat_qq. I would like to produce a wormplot, which is like a Q-Q plot, but on the y-axis, it has a difference between theoretical and observed quantiles (see the figure).
Is there a way to do this in ggplot? For example, is there a simple way to change the y-axis of the geom_qq to show the difference between theoretical and observed quantiles? I know it should be possible to calculate observed quantiles manually, but this would not work well if I would like to create plots of multiple groups or using facets, since then I would also need to calculate the observed quantiles manually for each group separately.

blogpost mentioned in comments contains a guide to code your own statfunctions to create such plots yourself
Otherwise, library qqplotr https://aloy.github.io/qqplotr/index.html contains an option detrend=True which basically produce wormplots with accompanying confidence bands.
If you want lines, and not a band, just do fill=NA, color='black', size=0.5

Related

How to plot histogram axis in a marginal plot in R?

I have created a series of marginal plots looking at two variables using ggMarginal and ggplot2. However, I want to include an axis on the histogram that has the scale so I know roughly how many values fall into each bin.
I have already checked the documentation for ggMarginal and am at a loss.

Different lowess curves in plot and qplot in R

I am comparing two graphs with a non-parametric lo(w)ess curve superimposed in each case. The problem is that the curves look very different, despite the fact that their arguments, such as span, are identical.
y<-rnorm(100)
x<-rgamma(100,2,2)
qplot(x,y)+stat_smooth(span=2/3,se=F)+theme_bw()
plot(x,y)
lines(lowess(y~x))
There seems to be a lot more curvatute in the graph generated by qplot(). As you know detecting curvature is very important in the diagnostics of regression analysis and I fear that If I am to use ggplot2, I would reach erroneous conclusions.
Could you please tell me how I could produce the same curve in ggplot2?
Thank you
Or, you can use loess(..., degree=1). This produces a very similar, but not quite identical result to lowess(...)
set.seed(1) # for reproducibility
y<-rnorm(100)
x<-rgamma(100,2,2)
plot(x,y)
points(x,loess(y~x,data.frame(x,y),degree=1)$fitted,pch=20,col="red")
lines(lowess(y~x))
With ggplot
qplot(x,y)+stat_smooth(se=F,degree=1)+
theme_bw()+
geom_point(data=as.data.frame(lowess(y~x)),aes(x,y),col="red")
Here is a new stat function for use with ggplot2 that uses lowess(): https://github.com/harrelfe/Hmisc/blob/master/R/stat-plsmo.r. You need to load the proto package for this to work. I like using lowess because it is fast for any sample size and allows outlier detection to be turned off for binary Y. But it doesn't provide confidence bands.

R - logistic curve plot with aggregate points

Let's say I have the following dataset
bodysize=rnorm(20,30,2)
bodysize=sort(bodysize)
survive=c(0,0,0,0,0,1,0,1,0,0,1,1,0,1,1,1,0,1,1,1)
dat=as.data.frame(cbind(bodysize,survive))
I'm aware that the glm plot function has several nice plots to show you the fit,
but I'd nevertheless like to create an initial plot with:
1)raw data points
2)the loigistic curve and both
3)Predicted points
4)and aggregate points for a number of predictor levels
library(Hmisc)
plot(bodysize,survive,xlab="Body size",ylab="Probability of survival")
g=glm(survive~bodysize,family=binomial,dat)
curve(predict(g,data.frame(bodysize=x),type="resp"),add=TRUE)
points(bodysize,fitted(g),pch=20)
All fine up to here.
Now I want to plot the real data survival rates for a given levels of x1
dat$bd<-cut2(dat$bodysize,g=5,levels.mean=T)
AggBd<-aggregate(dat$survive,by=list(dat$bd),data=dat,FUN=mean)
plot(AggBd,add=TRUE)
#Doesn't work
I've tried to match AggBd to the dataset used for the model and all sort of other things but I simply can't plot the two together. Is there a way around this?
I basically want to overimpose the last plot along the same axes.
Besides this specific task I often wonder how to overimpose different plots that plot different variables but have similar scale/range on two-dimensional plots. I would really appreciate your help.
The first column of AggBd is a factor, you need to convert the levels to numeric before you can add the points to the plot.
AggBd$size <- as.numeric (levels (AggBd$Group.1))[AggBd$Group.1]
to add the points to the exisiting plot, use points
points (AggBd$size, AggBd$x, pch = 3)
You are best specifying your y-axis. Also maybe using par(new=TRUE)
plot(bodysize,survive,xlab="Body size",ylab="Probability of survival")
g=glm(survive~bodysize,family=binomial,dat)
curve(predict(g,data.frame(bodysize=x),type="resp"),add=TRUE)
points(bodysize,fitted(g),pch=20)
#then
par(new=TRUE)
#
plot(AggBd$Group.1,AggBd$x,pch=30)
obviously remove or change the axis ticks to prevent overlap e.g.
plot(AggBd$Group.1,AggBd$x,pch=30,xaxt="n",yaxt="n",xlab="",ylab="")
giving:

Plot histogram and density function curve on one chart

I have a density function f, and I do MCMC sampling for it. To evaluate the goodness of the sampling, I need to plot the hist and curve within the same chart. The problem of
hist(samples);
curve(dfun,add=TRUE);
is that they are on the different scale: the frequency of a certain bin is usually hundreds, while the maximum of a density function is about 1 or so. What I want to do is to configure two plots at the same height, with one y-axis on the left and the other on the right. Can anyone help? Thank you.
Use the prob=TRUE argument to hist:
hist(samples, prob=TRUE)
curve(dfun,add=TRUE)
Also see this SO question

Problem with axis limits when plotting curve over histogram [duplicate]

This question already has an answer here:
How To Avoid Density Curve Getting Cut Off In Plot
(1 answer)
Closed 6 years ago.
newbie here. I have a script to create graphs that has a bit that goes something like this:
png(Test.png)
ht=hist(step[i],20)
curve(insert_function_here,add=TRUE)
I essentially want to plot a curve of a distribution over an histogram. My problem is that the axes limits are apparently set by the histogram instead of the curve, so that the curve sometimes gets out of the Y axis limits. I have played with par("usr"), to no avail. Is there any way to set the axis limits based on the maximum values of either the histogram or the curve (or, in the alternative, of the curve only)?? In case this changes anything, this needs to be done within a for loop where multiple such graphs are plotted and within a series of subplots (par("mfrow")).
Inspired by other answers, this is what i ended up doing:
curve(insert_function_here)
boundsc=par("usr")
ht=hist(A[,1],20,plot=FALSE)
par(usr=c(boundsc[1:2],0,max(boundsc[4],max(ht$counts))))
plot(ht,add=TRUE)
It fixes the bounds based on the highest of either the curve or the histogram.
You could determine the mx <- max(curve_vector, ht$counts) and set ylim=(0, mx), but I rather doubt the code looks like that since [] is not a proper parameter passing idiom and step is not an R plotting function, but rather a model selection function. So I am guessing this is code in Matlab or some other idiom. In R, try this:
set.seed(123)
png("Test.png")
ht=hist(rpois(20,1), plot=FALSE, breaks=0:10-0.1)
# better to offset to include discrete counts that would otherwise be at boundaries
plot(round(ht$breaks), dpois( round(ht$breaks), # plot a Poisson density
mean(ht$counts*round(ht$breaks[-length(ht$breaks)]))),
ylim=c(0, max(ht$density)+.1) , type="l")
plot(ht, freq=FALSE, add=TRUE) # plot the histogram
dev.off()
You could plot the curve first, then compute the histogram with plot=FALSE, and use the plot function on the histogram object with add=TRUE to add it to the plot.
Even better would be to calculate the the highest y-value of the curve (there may be shortcuts to do this depending on the nature of the curve) and the highest bar in the histogram and give this value to the ylim argument when plotting the histogram.

Resources