A Q-Q plot is a useful graphical device for checking, for example, the normality of residuals. It is constructed by putting the theoretical quantiles on the x-axis and the observed quantiles on the y-axis. In ggplot2, this is easily done with geom_qq and stat_qq. I would like to produce a worm plot, which is like a Q-Q plot, but its y-axis shows the difference between the theoretical and observed quantiles (see the figure).
Is there a way to do this in ggplot? For example, is there a simple way to change the y-axis of geom_qq to show the difference between the theoretical and observed quantiles? I know I could calculate the observed quantiles manually, but that becomes awkward when plotting multiple groups or using facets, because I would then have to compute the observed quantiles separately for each group.
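A minimal sketch of one possible approach, assuming ggplot2 3.3.0 or later (needed for after_stat()): geom_qq already computes the sorted observations (sample) and the theoretical normal quantiles (theoretical) for each group, so the y mapping can be overridden to plot their difference. Because the stat runs per group and per panel, facets work without manual bookkeeping. The data frame df with columns r and g is made up for illustration, and the sign convention (observed minus theoretical) is a choice, not something geom_qq prescribes.

```r
library(ggplot2)

# Illustrative data: two groups of standard-normal "residuals".
set.seed(1)
df <- data.frame(r = rnorm(100), g = rep(c("a", "b"), each = 50))

# stat_qq computes `sample` (sorted observations) and `theoretical`
# (normal quantiles) within each group; after_stat() lets y use their difference.
p <- ggplot(df, aes(sample = r)) +
  geom_qq(aes(y = after_stat(sample - theoretical))) +
  geom_hline(yintercept = 0, linetype = "dashed") +  # a perfect fit lies on this line
  facet_wrap(~ g) +
  labs(x = "Theoretical quantiles", y = "Observed - theoretical")
```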
The blog post mentioned in the comments contains a guide to writing your own stat functions, so you can create such plots yourself.
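As a sketch of the custom-stat route (this is not the blog post's code, which is not reproduced here), the ggproto pattern from the "Extending ggplot2" vignette can wrap the worm-plot computation in its own layer. StatWorm and stat_worm are hypothetical names, and the stat assumes the sample is already on a standard-normal scale (for example, normalised quantile residuals), so it compares against qnorm without estimating a mean or standard deviation.

```r
library(ggplot2)

# Hypothetical stat: for each group, sort the sample, pair it with standard-normal
# quantiles, and return the deviation (observed - theoretical) as y.
StatWorm <- ggproto("StatWorm", Stat,
  required_aes = "sample",
  compute_group = function(data, scales) {
    sample <- sort(data$sample)
    theoretical <- qnorm(ppoints(length(sample)))
    data.frame(x = theoretical, y = sample - theoretical)
  }
)

# Hypothetical layer constructor, following the vignette's stat_chull example.
stat_worm <- function(mapping = NULL, data = NULL, geom = "point",
                      position = "identity", na.rm = FALSE,
                      show.legend = NA, inherit.aes = TRUE, ...) {
  layer(stat = StatWorm, data = data, mapping = mapping, geom = geom,
        position = position, show.legend = show.legend,
        inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...))
}

# Illustrative data; the stat is computed separately within each facet panel.
set.seed(1)
df <- data.frame(r = rnorm(100), g = rep(c("a", "b"), each = 50))
p <- ggplot(df, aes(sample = r)) +
  stat_worm() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  facet_wrap(~ g)
```

Because the computation lives in compute_group, grouping aesthetics such as colour or facets are handled by ggplot2 itself rather than by hand.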
Alternatively, the qqplotr package (https://aloy.github.io/qqplotr/index.html) has a detrend = TRUE option, which essentially produces worm plots with accompanying confidence bands.
If you want lines rather than a filled band, just pass fill = NA, color = 'black', size = 0.5 to stat_qq_band().
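A minimal sketch of both variants, assuming qqplotr is installed and that stat_qq_band(), stat_qq_line() and stat_qq_point() all accept detrend, as in the package documentation linked above. The data are simulated for illustration. Depending on your ggplot2 version, the outline width may need linewidth instead of size, since ggplot2 3.4.0 moved line widths to the linewidth aesthetic.

```r
library(ggplot2)
library(qqplotr)

set.seed(1)
df <- data.frame(r = rnorm(100), g = rep(c("a", "b"), each = 50))

# Worm plot: detrend = TRUE on all three layers, so the band, the reference line
# and the points are all drawn as deviations from the theoretical quantiles.
p_band <- ggplot(df, aes(sample = r)) +
  stat_qq_band(detrend = TRUE) +
  stat_qq_line(detrend = TRUE) +
  stat_qq_point(detrend = TRUE) +
  facet_wrap(~ g)

# Same plot with the band drawn as black outline lines instead of a filled area.
p_lines <- ggplot(df, aes(sample = r)) +
  stat_qq_band(detrend = TRUE, fill = NA, color = "black", size = 0.5) +
  stat_qq_line(detrend = TRUE) +
  stat_qq_point(detrend = TRUE) +
  facet_wrap(~ g)
```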
I would like to draw ridgeline plots of 3 different time series on the same axes, showing the actual values, NOT a density plot as ridgeline plots usually show.
I tried using Henrik Lindberg's code here: https://github.com/halhen/viz-pub/tree/master/sports-time-of-day
It does what it is supposed to do, but it cannot produce smoothing.
I also tried the code from the ggridges manual (below):
ggplot(df,aes(x = time, y = activity, height = p)) + geom_density_ridges()
ggridges produces density plots, not the time series I want. Henrik's code produces the desired time series, but without the smoothing I wanted from a ridgeline plot.
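One possible sketch, assuming a data frame shaped like the question's df (columns time, activity, p; the values here are simulated): smooth each series separately with loess, then draw each smoothed curve as a ribbon above its own baseline using plain ggplot2. The span and height_scale values are arbitrary and meant to be tuned; drop height_scale if you want the ridge heights in the original units. If you prefer ggridges, its geom_ridgeline() also takes a precomputed height aesthetic, so the smoothed column could be passed to it instead of geom_ribbon().

```r
library(ggplot2)

# Simulated stand-in for the question's df: three noisy, non-negative series
# sampled every 15 minutes over a day.
set.seed(42)
time <- seq(0, 24, by = 0.25)
df <- do.call(rbind, lapply(c("run", "swim", "cycle"), function(a) {
  signal <- sin((time + nchar(a)) / 4)
  data.frame(time = time, activity = a,
             p = pmax(0, signal + rnorm(length(time), sd = 0.2)))
}))

# Smooth one series with loess; fitted values are clipped at 0 so a ridge
# never dips below its own baseline.
smooth_series <- function(d, span = 0.3) {
  fit <- loess(p ~ time, data = d, span = span)
  d$p_smooth <- pmax(0, predict(fit))
  d
}
sm <- do.call(rbind, lapply(split(df, df$activity), smooth_series))

# Put each series on its own baseline (1, 2, 3) and rescale heights so the
# tallest ridge stays just below the next baseline.
height_scale <- 0.9 / max(sm$p_smooth)
sm$base <- as.numeric(factor(sm$activity))
sm$top  <- sm$base + sm$p_smooth * height_scale

p <- ggplot(sm, aes(x = time, group = activity)) +
  geom_ribbon(aes(ymin = base, ymax = top, fill = activity),
              alpha = 0.6, colour = "black") +
  scale_y_continuous(breaks = 1:3, labels = levels(factor(sm$activity))) +
  labs(y = NULL)
```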
Long story short, I decided to run a simulation to get some insight into the reproducibility of my data.
However, the plot looks pretty awful, and I would like to smooth the lines a bit.
The plot is as follows:
Scatter: actual data
Black Line: Simulated means
Red: +/- 2 Standard Deviations
You could try ggplot2's geom_smooth():
library(ggplot2)
ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
  geom_point() +
  geom_smooth(method = "loess")
However, this code won't give you the standard deviations, since I don't have sample data resembling your data structure. Still, you could start from here: set geom_smooth(se = FALSE) to remove the confidence-interval area, and draw your standard deviations with geom_ribbon() instead (geom_area() always fills down to zero, so it cannot show a band between two curves).
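A sketch of that suggestion under assumed data (the question's data are not available, so the observed points and the 200 simulated replicates below are generated for illustration): summarise the simulations per x value, smooth the mean and the +/- 2 SD bounds with loess, and draw the bounds with geom_ribbon(). The smoothing is precomputed with loess() rather than done by geom_smooth(), so the band and the mean line are smoothed the same way; span = 0.5 is an arbitrary choice.

```r
library(ggplot2)

# Illustrative stand-in: 50 observed points and 200 simulated replicates
# of the same linear-plus-noise model.
set.seed(1)
x <- 1:50
obs  <- data.frame(x = x, y = 2 + 0.1 * x + rnorm(50))
sims <- replicate(200, 2 + 0.1 * x + rnorm(50))  # 50 x 200 matrix, one column per replicate

# Per-x summary of the simulations: mean and +/- 2 standard deviations.
summ <- data.frame(x = x, mean = rowMeans(sims), sd = apply(sims, 1, sd))
summ$lower <- summ$mean - 2 * summ$sd
summ$upper <- summ$mean + 2 * summ$sd

# Return loess fitted values for one curve.
smooth_col <- function(v, x, span = 0.5) {
  predict(loess(v ~ x, data = data.frame(v = v, x = x), span = span))
}
summ$mean_s  <- smooth_col(summ$mean,  summ$x)
summ$lower_s <- smooth_col(summ$lower, summ$x)
summ$upper_s <- smooth_col(summ$upper, summ$x)

# Red band for +/- 2 SD, black line for the simulated mean, points for the data.
p <- ggplot(summ, aes(x)) +
  geom_ribbon(aes(ymin = lower_s, ymax = upper_s), fill = "red", alpha = 0.2) +
  geom_line(aes(y = mean_s), colour = "black") +
  geom_point(data = obs, aes(x = x, y = y))
```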
I am trying to make a scatter plot in ggplot and would like to lose the points and show only the smooth line and the confidence interval. I checked the geom_point() function, and there is no option for turning it off so that the points (the underlying data) are hidden. Any suggestions? Much appreciated.
Joseph
To plot a smooth line with a confidence interval around it, use geom_smooth() on its own, without geom_point(). By default, the line is fitted with loess when there are fewer than 1,000 observations and with a GAM otherwise, but you can change the smoothing method with the method argument.
ggplot(mtcars,aes(wt,mpg))+geom_smooth()
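For illustration with the built-in mtcars data, both plots below contain only a smoothing layer, so no points are drawn. The second sets the method, formula and confidence level explicitly; method = "lm" is just one example of an alternative smoother.

```r
library(ggplot2)

# Only a smoothing layer: without geom_point(), the raw observations are not drawn.
p_default <- ggplot(mtcars, aes(wt, mpg)) +
  geom_smooth()  # method chosen automatically (loess here, since mtcars has < 1000 rows)

# Same idea with an explicit linear fit and a 99% confidence band.
p_lm <- ggplot(mtcars, aes(wt, mpg)) +
  geom_smooth(method = "lm", formula = y ~ x, level = 0.99)
```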
I'm trying to plot some data with 2d density contours using ggplot2 in R.
I'm getting one slightly odd result.
First I set up my ggplot object:
p <- ggplot(data, aes(x=Distance,y=Rate, colour = Company))
I then plot this with geom_point and geom_density2d. I want geom_density2d to be weighted by the organisation's size (the OrgSize variable). However, when I add OrgSize as a weighting variable, nothing changes in the plot:
This:
p+geom_point()+geom_density2d()
Gives an identical plot to this:
p+geom_point()+geom_density2d(aes(weight = OrgSize))
However, if I do the same with a loess line using geom_smooth, the weighting does make a clear difference.
This:
p+geom_point()+geom_smooth()
Gives a different plot to this:
p+geom_point()+geom_smooth(aes(weight=OrgSize))
I was wondering whether I'm using geom_density2d inappropriately. Should I instead be using contours and supplying OrgSize as the 'height'? If so, why does geom_density2d accept a weighting factor?
Code below:
require(ggplot2)
Company <- c("One","One","One","One","One","Two","Two","Two","Two","Two")
Store <- c(1,2,3,4,5,6,7,8,9,10)
Distance <- c(1.5,1.6,1.8,5.8,4.2,4.3,6.5,4.9,7.4,7.2)
Rate <- c(0.1,0.3,0.2,0.4,0.4,0.5,0.6,0.7,0.8,0.9)
OrgSize <- c(500,1000,200,300,1500,800,50,1000,75,800)
data <- data.frame(Company,Store,Distance,Rate,OrgSize)
p <- ggplot(data, aes(x=Distance,y=Rate))
# Difference is apparent between these two
p+geom_point()+geom_smooth()
p+geom_point()+geom_smooth(aes(weight = OrgSize))
# Difference is not apparent between these two
p+geom_point()+geom_density2d()
p+geom_point()+geom_density2d(aes(weight = OrgSize))
geom_density2d "accepts" the weight aesthetic but does not pass it on to MASS::kde2d, because that function has no weights argument. As a consequence, you will need to use a different 2D density method.
(I realize my answer does not address why the help page says that geom_density2d "understands" the weight aesthetic, but whenever I have tried to calculate weighted 2D KDEs, I have needed packages other than MASS. Maybe this is a TODO that @hadley put in the help page that then got overlooked?)
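One way to get weighted contours, sketched here rather than taken from any package: compute the same bivariate Gaussian product-kernel estimate that MASS::kde2d uses, but multiply each observation's kernel by its normalised weight, then draw the grid with geom_contour(). kde2d_weighted is a hypothetical helper name; it reuses kde2d's default bandwidths (MASS::bandwidth.nrd divided by 4), its 25 x 25 grid and its limits, so with equal weights it should reproduce kde2d's output. Dedicated weighted-KDE packages may offer better bandwidth selection.

```r
library(ggplot2)

# Data from the question.
Company  <- c("One","One","One","One","One","Two","Two","Two","Two","Two")
Distance <- c(1.5,1.6,1.8,5.8,4.2,4.3,6.5,4.9,7.4,7.2)
Rate     <- c(0.1,0.3,0.2,0.4,0.4,0.5,0.6,0.7,0.8,0.9)
OrgSize  <- c(500,1000,200,300,1500,800,50,1000,75,800)
data <- data.frame(Company, Distance, Rate, OrgSize)

# Hypothetical helper: weighted version of the kde2d estimate.
kde2d_weighted <- function(x, y, w, n = 25,
                           h = c(MASS::bandwidth.nrd(x), MASS::bandwidth.nrd(y)),
                           lims = c(range(x), range(y))) {
  w  <- w / sum(w)                         # weights sum to 1, playing the role of kde2d's 1/n
  h  <- h / 4                              # kde2d divides its bandwidths by 4 as well
  gx <- seq(lims[1], lims[2], length.out = n)
  gy <- seq(lims[3], lims[4], length.out = n)
  kx <- dnorm(outer(gx, x, "-") / h[1])    # n x length(x) kernel values along x
  ky <- dnorm(outer(gy, y, "-") / h[2])    # n x length(y) kernel values along y
  z  <- kx %*% (w * t(ky)) / (h[1] * h[2]) # z[i, j] = sum_k w_k * kx[i, k] * ky[j, k] / (h1 * h2)
  list(x = gx, y = gy, z = z)
}

dens <- kde2d_weighted(data$Distance, data$Rate, data$OrgSize)
grid <- expand.grid(Distance = dens$x, Rate = dens$y)  # x varies fastest, matching as.vector(z)
grid$density <- as.vector(dens$z)

p <- ggplot(data, aes(Distance, Rate)) +
  geom_point() +
  geom_contour(data = grid, aes(z = density))
```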