Second layer in ggplot2 is shifted by one - r

I'm trying to plot a scatter plot with two layers. I want the size of each point to represent its number of answers, and then I need a smooth curve laid over it, so I use two datasets to achieve this.
The problem is that when I add the second layer with the smoother using the original dataset, the smoother is shifted by one point to the left on the x-scale.
Does anyone know how to correct this in the R code? Is there maybe something wrong with it?
I thought about adding 1 to the x variable, but I don't want to have to go that far.
library(ggplot2)
q.tab <- xtabs(~x + y, mydata)
q.df <- as.data.frame(q.tab)
pointsize <- q.df$Freq
qplot(x, y, data=q.df) + geom_point(aes(size=as.factor(pointsize))) +
geom_smooth(data=mydata, method="loess", span=1)

With ggplot2, when you think in terms of layers it is better to use the ggplot function rather than qplot.
I generate your data (the sample function is very convenient for generating data):
mydata <- data.frame(x = sample(1:10, 100, replace=TRUE),
                     y = sample(1:10, 100, replace=TRUE))
q.tab <- xtabs(~x + y, mydata)
q.df <- as.data.frame(q.tab)
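ggplot version:
library(ggplot2)
# Freq lives in q.df, where x and y are factors; convert them back to numbers
q.df$x <- as.numeric(as.character(q.df$x))
q.df$y <- as.numeric(as.character(q.df$y))
ggplot(data=mydata, aes(x, y)) +
geom_point(data=q.df, aes(size=Freq)) +
geom_smooth(method="loess", span=1)
qplot version:
qplot(data=q.df, x=x, y=y, size=Freq, geom='point') +
geom_smooth(data=mydata, aes(x, y), inherit.aes=FALSE, method="loess", span=1)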
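A likely explanation for the shift, offered as an assumption because the question does not show `mydata`: `as.data.frame()` on an `xtabs` table returns `x` and `y` as factors. A discrete axis places factor levels at positions 1, 2, 3, and so on, so if the smallest answer is 0, every point sits one unit to the right of its numeric value and the smoother, drawn from the numeric data, appears shifted to the left. The sketch below uses made-up data (`toy` is hypothetical) to show the difference between level codes and original values.

```r
# Hypothetical data: answers on a 0-5 scale (not taken from the question).
toy <- data.frame(x = c(0, 0, 1, 2, 2, 5), y = c(1, 1, 3, 2, 2, 4))

tab_df <- as.data.frame(xtabs(~ x + y, toy))

# x and y come back as factors, not numbers.
is.factor(tab_df$x)

# Level codes are the positions a discrete axis uses: they start at 1.
codes <- as.integer(tab_df$x)

# Going through character recovers the original values, which start at 0.
values <- as.numeric(as.character(tab_df$x))

# The codes run over the number of levels, the values over the real scale.
range(codes)
range(values)
```

Converting the factor columns with `as.numeric(as.character(...))` before plotting keeps both layers on the same continuous scale, which avoids having to add 1 by hand.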

Related

How to add geom_point() to autolayer() line?

I'm trying to add geom_point() to an autolayer() line ("fitted" in the picture). autolayer() is a wrapper that is part of autoplot() for ggplot2 in Rob Hyndman's forecast package (there's a base autoplot/autolayer in ggplot2 too, so the same likely applies there).
The problem is (I'm no ggplot2 expert, and the autoplot wrapper makes it trickier) that geom_point() applies fine to the main call, but how do I apply something similar to the autolayer (fitted values)?
I tried type="b" as with a normal geom_line(), but it's not a parameter of autolayer().
require(fpp2)
model.ses <- ets(mdeaths, model="ANN", alpha=0.4)
model.ses.fc <- forecast(model.ses, h=5)
forecast::autoplot(mdeaths) +
forecast::autolayer(model.ses.fc$fitted, series="Fitted") + # cannot set to show points, and type="b" not allowed
geom_point() # this works fine against the main autoplot call
This seems to work:
library(forecast)
library(fpp2)
model.ses <- ets(mdeaths, model="ANN", alpha=0.4)
model.ses.fc <- forecast(model.ses, h=5)
# Pre-compute the fitted layer so we can extract the data out of it with
# layer_data()
fitted_layer <- forecast::autolayer(model.ses.fc$fitted, series="Fitted")
fitted_values <- fitted_layer$layer_data()
plt <- forecast::autoplot(mdeaths) +
fitted_layer +
geom_point() +
geom_point(data = fitted_values, aes(x = timeVal, y = seriesVal))
There might be a way to make forecast::autolayer do what you want directly but this solution works. If you want the legend to look right, you'll want to merge the input data and fitted values into a single data.frame.
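The single-data.frame idea can be sketched as follows. This is only an illustration under stated assumptions: the fitted values come from base R's `HoltWinters()` as a stand-in for `model.ses.fc$fitted` (so the sketch runs without the forecast package, and the numbers will not match `ets()` exactly), and `ts_to_df` is a hypothetical helper, not part of any package.

```r
# Stand-in for model.ses.fc$fitted: simple exponential smoothing from base R.
# Assumption: close in spirit to ets(model = "ANN", alpha = 0.4), not identical.
hw <- HoltWinters(mdeaths, alpha = 0.4, beta = FALSE, gamma = FALSE)
fitted_ts <- hw$fitted[, "xhat"]

# Hypothetical helper: turn a ts into a long data.frame with a series label.
ts_to_df <- function(x, series) {
  data.frame(time = as.numeric(time(x)),
             value = as.numeric(x),
             series = series)
}

plot_df <- rbind(ts_to_df(mdeaths, "Observed"),
                 ts_to_df(fitted_ts, "Fitted"))

# One data.frame and one colour mapping, so lines and points share a legend.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(plot_df, aes(x = time, y = value, colour = series)) +
    geom_line() +
    geom_point()
  print(p)
}
```

With the real model, `ts_to_df(model.ses.fc$fitted, "Fitted")` would replace the stand-in series.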

R geom_line not plotting as expected

I am using the following code to plot a stacked area graph and I get the expected plot.
P <- ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) + #ggplot initial parameters
geom_ribbon(position='fill', aes(ymin=0, ymax=1))
But when I add lines that read the same data source, I get misaligned results towards the right side of the graph:
P + geom_line(position='fill', aes(group=model, ymax=1))
Does anyone know why this may be? Both plots read the same data source, so I can't figure out what the problem is.
Actually, if all you wanted to do was draw an outline around the areas, then you could do the same using the colour aesthetic.
ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) +
geom_ribbon(position='fill', aes(ymin=0, ymax=1), colour = "black")
I have an answer, I hope it works for you, it looks good but very different from your original graph:
library(ggplot2)
DATA2 <- read.csv("C:/Users/corcoranbarriosd/Downloads/porsche model volumes.csv", header = TRUE, stringsAsFactors = FALSE)
In my experience you want x to be a numeric variable, and you have it as a string. If that is not the case I can change that, but this will transform your bucket into a numeric vector:
bucket.list <- strsplit(unlist(DATA2$bucket), "[^0-9]+")
x=numeric()
for (i in 1:length(bucket.list)) {
x[i] <- bucket.list[[i]][2]
}
DATA2$bucket <- as.numeric(x)
P <- ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) +
geom_ribbon(aes(ymin=0, ymax=volume))+ geom_line(aes(group=model, ymax=volume))
It gives me the area and the line tracking each other; I hope that's what you needed.
If you switch to using geom_path in place of geom_line, it all seems to work as expected. I don't think the ordering of geom_line is behaving the same as geom_ribbon (and suspect that geom_line -- like geom_area -- assumes a zero base y value)
ggplot(DATA2, aes(x=bucket, y=volume, ymin=0, ymax=1,
group=model, fill=model, label=volume)) +
geom_ribbon(position='fill') +
geom_path(position='fill')

How to reverse axis order and use a predefined scale in ggplot?

I've read a past post asking about using scale_reverse and scale_log10 at the same time. I have a similar issue, except my scale I'm seeking to "reverse" is a pre-defined scale in the "scales" package. Here is my code:
##Defining y-breaks for probability scale
ybreaks <- c(1,2,5,10,20,30,40,50,60,70,80,90,95,98,99)/100
#Random numbers, and their corresponding Weibull probability values (which I'm trying to plot)
x <- c(.3637, .1145, .8387, .9521, .330, .375, .139, .662, .824, .899)
p <- c(.647, .941, .255, .059, .745, .549, .853, .451, .352, .157)
df <- data.frame(x, p)
require(scales)
require(ggplot2)
ggplot(df)+
geom_point(aes(x=x, y=p, size=2))+
stat_smooth(method="lm", se=FALSE, linetype="dashed", aes(x=x, y=p))+
scale_x_continuous(trans='probit',
breaks=ybreaks,
minor_breaks=qnorm(ybreaks))+
scale_y_log10()
For more information, the scale I'm trying to achieve is the probability plotting scale, which has finer resolution on either end of the scale (at 0 and 1) to show extreme events, with ever-decreasing resolution toward the median value (0.5).
I want to be able to use scale_x_reverse concurrently with my scale_x_continuous probability scale, but I don't know how to build that in any sort of custom scale. Any guidance on this?
Arguments in scale_(x|y)_reverse() are passed to scale_(x|y)_continuous() so you should simply do:
scale_x_reverse(trans='probit', breaks = ybreaks, minor_breaks=qnorm(ybreaks))
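Depending on the ggplot2 version, passing `trans` to `scale_x_reverse()` may conflict with the reverse transformation that function sets itself, so the two may not combine. An alternative sketch, assuming the scales package's `trans_new()` is available, is to build one transformation that does both steps. The names `rev_probit`, `rev_probit_inv` and `reverse_probit_trans` are made up for this illustration.

```r
# Forward: probit, then negate so larger probabilities sit on the left.
rev_probit <- function(p) -qnorm(p)
# Inverse: undo the negation, then map back to a probability.
rev_probit_inv <- function(z) pnorm(-z)

ybreaks <- c(1, 2, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 98, 99) / 100

# The inverse should undo the forward transformation.
all.equal(rev_probit_inv(rev_probit(ybreaks)), ybreaks)

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("scales", quietly = TRUE)) {
  library(ggplot2)
  reverse_probit_trans <- scales::trans_new(
    name = "reverse_probit",
    transform = rev_probit,
    inverse = rev_probit_inv
  )
  df <- data.frame(
    x = c(.3637, .1145, .8387, .9521, .330, .375, .139, .662, .824, .899),
    p = c(.647, .941, .255, .059, .745, .549, .853, .451, .352, .157)
  )
  # Newer ggplot2 releases prefer the argument name `transform`; `trans` is
  # used here to match the question and is assumed to still be accepted.
  p_plot <- ggplot(df, aes(x = x, y = p)) +
    geom_point(size = 2) +
    scale_x_continuous(trans = reverse_probit_trans,
                       breaks = ybreaks, minor_breaks = NULL) +
    scale_y_log10()
  print(p_plot)
}
```

Because the breaks are given in data units (probabilities), the axis labels stay readable while the spacing follows the reversed probit scale.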
Rather than try to combine two transformations, why not transform your existing data and then plot it?
The following looks like it should be right.
#http://r.789695.n4.nabble.com/Inverse-Error-Function-td802691.html
erf.inv <- function(x) qnorm((x + 1)/2)/sqrt(2)
#http://en.wikipedia.org/wiki/Probit#Computation
probit <- function(x) sqrt(2)*erf.inv((2*x)-1)
# probit(0.3637)
df$z <- probit(df$x)
ggplot(df)+
geom_point(aes(x=z, y=p), size=2)+
stat_smooth(method="lm", se=FALSE, linetype="dashed", aes(x=z, y=p))+
scale_x_reverse(breaks = qnorm(ybreaks), labels = ybreaks, # x is now on the probit scale
minor_breaks=qnorm(ybreaks))+
scale_y_log10()

Using ggplot2: Create faceted scatterplot with scaled and moved density

I would like to plot some data as a scatter plot using facet_wrap, while superimposing some information such as a linear regression and the density.
I managed to do all that, but the density values are out of proportion with respect to my points, which is to be expected since these points are far away. Nevertheless, I'd like to scale and move my density curve so that it is clearly visible; I don't care about its real values but more about its shape.
Here is an exaggerated minimum working example of what I have:
library(ggplot2)
library(reshape2) # for melt()
set.seed(48151623)
mydf <- data.frame(x1=rnorm(mean=5,n=100),x2=rnorm(n=100,mean=10),x3=rnorm(n=100,mean=20,sd=3))
mydf$var <- mydf$x1 + mydf$x2 * mydf$x3
mydf.wide <- melt(mydf,id.vars='var',measure.vars=c(1:3))
ggplot(data=mydf.wide,aes(x=value,y=var)) +
geom_point(colour='red') +
geom_smooth(method='lm') +
stat_density(aes(x=value,y=..scaled..),position='identity',geom='line') +
facet_wrap(~variable,scale='free_x')
What I would like resembles this ugly hack:
stat_density(aes(x=value,y=..scaled..*100+200),position='identity',geom='line')
Ideally, I would use y=..scaled..* diff(range(value)) + min(value) but when I do this I get an error saying that 'value' was not found. I suspect the problem is related to the faceting, but I would prefer to keep my facets.
How can I scale and move the density curve in this case?
I suggest making two plots and combining them with grid.arrange:
p1 <- ggplot(data=mydf.wide,aes(x=value,y=var)) +
geom_point(colour='red') +
geom_smooth(method='lm') +
facet_wrap(~variable,scale='free_x') +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
plot.margin = unit(c(1, 1, 0, 0.5), "lines"))
p2 <- ggplot(data=mydf.wide,aes(x=value,y=var)) +
stat_density(aes(x=value,y=..scaled..),position='identity',geom='line') +
facet_wrap(~variable,scale='free_x') +
theme(strip.background=element_blank(),
strip.text=element_blank(),
plot.margin = unit(c(-1, 1, 0.5, 0.35), "lines"))
library(gridExtra)
grid.arrange(p1, p2, heights = c(2,1))
I'm not sure if this completely answers your question, but it was too long to put in a comment, so... In response to your second chunk of code in your question, since you've already defined x=value, you can use x instead of value in your definition of y.
stat_density(aes(x=value,y=..scaled..*diff(range(x)) +
min(x)),position='identity',geom='line')
This seems to fix your error.
The only problem is, of course, if you have data with low y-values, then you're still going to overlap your density curves with your scatterplot. But, if this isn't the case, I personally think this is a fairly informative figure, as long as you can communicate effectively that the y axis values aren't important in interpreting the density curves--only the shapes of the curves are important.
I appreciate everyone's answers, which led me to better understand ggplot's underlying mechanisms. I also realize how awkward my requirement is; ggplot is not going to solve my problem.
I managed to do what I wanted not by using ggplot stat_density but to directly calculate my densities in another data frame:
library(ggplot2)
library(reshape2) # for melt()
set.seed(48151623)
mydf <- data.frame(x1=rnorm(mean=5,n=100),x2=rnorm(n=100,mean=10),x3=rnorm(n=100,mean=20,sd=3))
mydf$var <- mydf$x1 + mydf$x2 * mydf$x3
mydf.wide <- melt(mydf,id.vars='var',measure.vars=c(1:3))
mydf.densities <- do.call('rbind',lapply(unique(mydf.wide$variable), function(var) {
tmp <- mydf.wide[which(mydf.wide$variable==var),c('var','value')]
dfit <- density(tmp$value,cut=0)
scaledy <-dfit$y/max(dfit$y) * diff(range(tmp$var)) + min(tmp$var)
data.frame(x=dfit$x,y=scaledy,variable=rep(var,length(dfit$x)))
}))
ggplot(data=mydf.wide,aes(x=value,y=var)) +
geom_point(colour='red') +
geom_smooth(method='lm') +
geom_line(aes(x=x,y=y),data=mydf.densities) +
facet_wrap(~variable,scale='free_x')
(I know that the construction of mydf.densities is a bit obfuscated, but I will work on that later).
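One possible way to tidy up the construction of the densities, sketched under assumptions: `scaled_density` is a hypothetical helper name, and the long-format data is built with base R instead of `melt()` so the sketch has no package dependency for the computation itself. The scaling formula is the same one used above.

```r
# Hypothetical helper: density of x, scaled the same way as above so that the
# peak of the curve reaches max(y).
scaled_density <- function(x, y) {
  d <- density(x, cut = 0)
  data.frame(x = d$x,
             y = d$y / max(d$y) * diff(range(y)) + min(y))
}

set.seed(48151623)
mydf <- data.frame(x1 = rnorm(100, mean = 5),
                   x2 = rnorm(100, mean = 10),
                   x3 = rnorm(100, mean = 20, sd = 3))
mydf$var <- mydf$x1 + mydf$x2 * mydf$x3

# Base-R stand-in for melt(): stack the three columns into long form.
long <- data.frame(var = rep(mydf$var, 3),
                   variable = rep(c("x1", "x2", "x3"), each = nrow(mydf)),
                   value = c(mydf$x1, mydf$x2, mydf$x3))

# One scaled density per facet, bound into a single data.frame.
pieces <- lapply(split(long, long$variable), function(d) {
  cbind(scaled_density(d$value, d$var), variable = d$variable[1])
})
densities <- do.call(rbind, pieces)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = value, y = var)) +
    geom_point(colour = "red") +
    geom_smooth(method = "lm") +
    geom_line(aes(x = x, y = y), data = densities) +
    facet_wrap(~variable, scales = "free_x")
  print(p)
}
```

Splitting the data first and keeping the scaling in its own function separates the two concerns, which makes it easier to change the scaling rule later.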
I'm giving out the bounty to the most voted solution at the end of the day, for your troubles.

How to plot stacked point histograms?

What's the ggplot2 equivalent of "dotplot" histograms? With stacked points instead of bars? Similar to this solution in R:
Plot Histogram with Points Instead of Bars
Is it possible to do this in ggplot2? Ideally with the points shown as stacks and a faint line showing the smoothed line "fit" to these points (which would make a histogram shape.)
ggplot2 does dotplots; see the geom_dotplot page in the manual.
Here is an example:
library(ggplot2)
set.seed(789); x <- data.frame(y = sample(1:20, 100, replace = TRUE))
ggplot(x, aes(y)) + geom_dotplot()
In order to make it behave like a simple dotplot, we should do this:
ggplot(x, aes(y)) + geom_dotplot(binwidth=1, method='histodot')
To address the density issue, you'll have to add another term, ylim(), so that your plot call will have the form ggplot() + geom_dotplot() + ylim()
More specifically, you'll write ylim(0, A), where A will be the number of stacked dots necessary to count 1.00 density. In the example above, the best you can do is see that 7.5 dots reach the 0.50 density mark. From there, you can infer that 15 dots will reach 1.00.
So your new call looks like this:
ggplot(x, aes(y)) + geom_dotplot(binwidth=1, method='histodot') + ylim(0, 15)
Usually, this kind of eyeball estimate will work for dotplots, but of course you can try other values to fine-tune your scale.
Notice how changing the ylim values doesn't affect how the data is displayed, it just changes the labels in the y-axis.
As @joran pointed out, we can use geom_dotplot
require(ggplot2)
ggplot(mtcars, aes(x = mpg)) + geom_dotplot()
Edit: (moved useful comments into the post):
The label "count" is misleading because this is actually a density estimate; maybe you could suggest changing this label to "density" by default. The ggplot implementation of dotplot follows the original one by Leland Wilkinson, so if you want to understand clearly how it works, take a look at his paper.
On an easy transformation to make the y axis actually show counts, i.e. "number of observations", the help page says:
When binning along the x axis and stacking along the y axis, the numbers on y axis are not meaningful, due to technical limitations of ggplot2. You can hide the y axis, as in one of the examples, or manually scale it to match the number of dots.
So you can use this code to hide y axis:
ggplot(mtcars, aes(x = mpg)) +
geom_dotplot(binwidth = 1.5) +
scale_y_continuous(name = "", breaks = NULL)
I introduce an exact approach using @Waldir Leoncio's latter method.
library(ggplot2); library(grid)
set.seed(789)
x <- data.frame(y = sample(1:20, 100, replace = TRUE))
g <- ggplot(x, aes(y)) + geom_dotplot(binwidth=0.8)
g # output to read parameter
### calculation of width and height of panel
grid.ls(view=TRUE, grob=FALSE)
real_width <- convertWidth(unit(1,'npc'), 'inch', TRUE)
real_height <- convertHeight(unit(1,'npc'), 'inch', TRUE)
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$panel$ranges[[1]]$x.range)
real_binwidth <- real_width / width_coordinate_range * 0.8 # 0.8 is the argument binwidth
num_balls <- real_height / 1.1 / real_binwidth # the number of stacked balls. 1.1 is expanding value.
# num_balls is the value of A
g + ylim(0, num_balls)
Apologies : I don't have enough reputation to 'comment'.
I like cuttlefish44's "exact approach", but to make it work (with ggplot2 [2.2.1]) I had to change the following line from :
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$panel$ranges[[1]]$x.range)
to
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$layout$panel_ranges[[1]]$x.range)
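Since the location of the panel ranges has moved between ggplot2 versions, a small lookup helper may save editing the line by hand. This is a sketch, not a tested compatibility layer: `panel_x_range` is a hypothetical name, the first location (`layout$panel_params`) is an assumption about ggplot2 3.x, and the other two are the ones quoted above.

```r
# Hypothetical helper: return the x range of the first panel from the result
# of ggplot_build(), trying the locations used by different ggplot2 versions.
panel_x_range <- function(built) {
  candidates <- list(
    function(b) b$layout$panel_params[[1]]$x.range,  # assumed: ggplot2 3.x
    function(b) b$layout$panel_ranges[[1]]$x.range,  # ggplot2 2.2.1 (see above)
    function(b) b$panel$ranges[[1]]$x.range          # older versions (see above)
  )
  for (f in candidates) {
    # A missing location either errors or yields NULL; skip it in both cases.
    rng <- tryCatch(f(built), error = function(e) NULL)
    if (!is.null(rng)) return(rng)
  }
  stop("could not find x.range in the built plot")
}

# Mock objects that mimic the two layouts quoted above, for illustration only.
mock_221 <- list(layout = list(panel_ranges = list(list(x.range = c(0, 21)))))
mock_old <- list(panel = list(ranges = list(list(x.range = c(0, 21)))))

diff(panel_x_range(mock_221))
diff(panel_x_range(mock_old))
```

In the exact approach above, the call would then read `width_coordinate_range <- diff(panel_x_range(ggplot_build(g)))`.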

Resources