Why is facet_grid placing the distributions in the wrong quadrants? - r

When using facet_grid(x ~ y) with ggplot2 I've seen in various examples and read in the documentation that the x variable is laid out vertically and the y variable horizontally. However, when I run the following:
set.seed(1)
b = c(rnorm(10000,mean=0,sd=0.5),rnorm(10000,mean=5,sd=0.5),
rnorm(10000,mean=7,sd=0.5),rnorm(10000,mean=10,sd=0.5))
x = c(rep('xL', 20000), rep('xR',20000))
y = c(rep('yL',10000), rep('yR',20000), rep('yL',10000))
foo = data.frame(x=x,y=y,b=b)
ggplot(data=foo, aes(foo$b)) +
geom_histogram(aes(y=..density..),breaks=seq(-5,12,by=.2),col='steelblue',fill='steelblue2') +
geom_density(col='black') +
facet_grid(x ~ y, scales='free_y')
I get the plot below (sorry for the quality). Even though, per the code above, the distribution with mean 10 is the one with (x,y) of 'xR,yL', it appears in the bottom-right quadrant, which is labelled 'xR,yR'. What am I doing wrong?

Change aes(foo$b) to aes(x = b) so that the aesthetic is mapped to the column b of the data frame you passed to ggplot.
You want ggplot to find the column labelled b in the correct scope, i.e. in the data it has been given, rather than as a free-standing vector. For example, ggplot may rearrange your data after you pass it in, in which case the vector foo$b no longer lines up row for row with the x and y columns used for faceting.
I'm not saying this is exactly what happened here. It is just an example of why referring to the aesthetic in the correct scope is important.
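A minimal sketch of the corrected call, assuming a ggplot2 version recent enough to provide after_stat() (3.3.0 or later; on older versions keep ..density.. as in the question). The aggregate() call at the end is only a sanity check added here to show which (x, y) combination holds which mean.

```r
library(ggplot2)

set.seed(1)
foo <- data.frame(
  x = rep(c("xL", "xR"), each = 20000),
  y = c(rep("yL", 10000), rep("yR", 20000), rep("yL", 10000)),
  b = c(rnorm(10000, mean = 0, sd = 0.5), rnorm(10000, mean = 5, sd = 0.5),
        rnorm(10000, mean = 7, sd = 0.5), rnorm(10000, mean = 10, sd = 0.5))
)

# Map the column by name so it is looked up inside foo, not as foo$b.
p <- ggplot(foo, aes(x = b)) +
  geom_histogram(aes(y = after_stat(density)),
                 breaks = seq(-5, 12, by = 0.2),
                 colour = "steelblue", fill = "steelblue2") +
  geom_density(colour = "black") +
  facet_grid(x ~ y, scales = "free_y")

# Sanity check: mean of b for each (x, y) combination.
group_means <- aggregate(b ~ x + y, data = foo, FUN = mean)
print(group_means)
```

With the column mapped by name, the panel labelled xR / yL should be the one centred near 10, matching the table printed by aggregate().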

Related

R - Bar Plot with transparency based on values?

I have a dataset myData which contains x and y values for various Samples. I can create a line plot for a dataset which contains a few Samples with the following pseudocode, and it is a good way to represent this data:
myData <- data.frame(x = 290:450, X52241 = c(..., ..., ...), X75123 = c(..., ..., ...))
myData <- myData %>% gather(Sample, y, -x)
ggplot(myData, aes(x, y)) + geom_line(aes(color=Sample))
Which generates:
This turns into a Spaghetti Plot when I have a lot more Samples added, which makes the information hard to understand, so I want to represent the "hills" of each sample in another way. Preferably, I would like to represent the data as a series of stacked bars, one for each myData$Sample, with transparency inversely related to what is in myData$y. I've tried to represent that data in photoshop (badly) here:
Is there a way to do this? Creating faceted plots using facet_wrap() or facet_grid() doesn't give me what I want (far too many Samples). I would also be open to stacked ridgeline plots using ggridges, but I am not understanding how I would be able to convert absolute values to a stat(density) value needed to plot those.
Any suggestions?
Thanks to u/Joris for the helpful suggestion! Since I did not find this question elsewhere, I'll go ahead and post the pretty simple solution here for others to find.
Basically, I needed to apply the alpha aesthetic via aes(alpha=y, ...). In theory, I could apply this over any geom. I tried geom_col(), which worked, but the best solution was to use geom_segment(), since all my "bars" were going to be the same length. Also note that I had to "slice" up the segments in order to avoid the problem of overplotting similar to those found here, here, and here.
ggplot(myData, aes(x, Sample)) +
geom_segment(aes(x=x, xend=x-1, y=Sample, yend=Sample, alpha=y), color='blue3', size=14)
That gives us the nice gradient:
Since the max y values are not the same for both lines, the intensities do not match. To "match" them, I normalized the data (myDataNorm) and made the same plot from that. In my particular case, I preferred bars without a gradient that showed a hard edge around the maximum values of y. Here was one solution:
ggplot(myDataNorm, aes(x, Sample)) +
geom_segment(aes(x=x, xend=x-1, y=Sample, yend=Sample, alpha=ifelse(y>0.9,1,0)), color='blue3', size=14) +
theme(legend.position='none')
Better, but I did not like the faint-colored areas that were left. The final code is what gave me something that perfectly captured what I was looking for. I simply moved the ifelse() statement to apply to the x aesthetic, so the parts of the segment drawn were only those with high enough y values. Note my data "starts" at x=290 here. Probably more elegant ways to combine those x and xend terms, but whatever:
ggplot(myDataNorm, aes(x, Sample)) +
geom_segment(aes(
x=ifelse(y>0.9,x,290), xend=ifelse(y>0.9,x-1,290),
y=Sample, yend=Sample), color='blue3', size=14) +
xlim(290,400) # needed to show entire scale
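The normalization step is mentioned above but not shown. Here is one possible sketch, assuming that "normalized" means scaling each Sample so its maximum y is 1. The data are a hypothetical stand-in (two made-up Gaussian "hills"), since the real values are elided in the question. The second plot filters rows with subset() instead of using ifelse() on x, which is one alternative way to draw only the high-y parts.

```r
library(ggplot2)

# Hypothetical stand-in for myData (long format): one "hill" per sample.
x <- 290:450
myData <- rbind(
  data.frame(x = x, Sample = "X52241", y = 3 * dnorm(x, mean = 330, sd = 15)),
  data.frame(x = x, Sample = "X75123", y = 1 * dnorm(x, mean = 390, sd = 25))
)

# Scale y within each Sample so that every sample peaks at exactly 1.
myDataNorm <- transform(myData, y = y / ave(y, Sample, FUN = max))

# Gradient version: alpha follows the normalized y.
# (size is kept from the answer; recent ggplot2 versions prefer linewidth.)
p_gradient <- ggplot(myDataNorm, aes(x, Sample)) +
  geom_segment(aes(xend = x - 1, yend = Sample, alpha = y),
               color = "blue3", size = 14)

# Hard-edge version: keep only the rows above the threshold.
peaks <- subset(myDataNorm, y > 0.9)
p_hard <- ggplot(peaks, aes(x, Sample)) +
  geom_segment(aes(xend = x - 1, yend = Sample), color = "blue3", size = 14) +
  xlim(290, 450)  # show the full x range of the data
```

Filtering the rows avoids having to collapse unwanted segments onto x=290.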

Plot with two different x axis for the same variable in R

I am trying to create a plot that displays a line with two x axes: one continuous numeric and the other discrete.
This is an example of the data:
df <-cbind.data.frame("Category"=c("A","A","A","A","A","B","B","B","B","B"),
"Y"=c(5,6,4,8,9,4,5,3,7,8),
"X1"=c(0,10,20,30,40,0,10,20,30,40),
"X2"=c(0,0,1,1,2,0,1,2,2,3))
I tried to add a secondary axis and re-scale it, but since my two variables are not proportional, I don't know how to re-scale it so that the same Y point on the line fits both x axes.
ggplot(data=df) +
geom_path(aes(y=Y,x=X1),color="red")+
geom_path(aes(y=Y,x=X2*10),color="blue")+
facet_wrap(~Category)+
scale_y_continuous("Y")+
scale_x_continuous("X1",sec.axis = sec_axis(~ .*1/10, "X2"))
I read different problems with two axis, but was not able to find a solution for my problem.
I am looking for something like this:
I will appreciate a lot any help on this!
The plot you provide does not evidence a clear algebraic relationship, so I'm going to give you an example of a completely-arbitrary second x-axis.
library(ggplot2)
ggplot(mtcars, aes(mpg, disp)) +
geom_point() +
scale_x_continuous(sec.axis=sec_axis(~., breaks=c(15,20,30), labels=c('a','b','c')))
The first argument is the transformation "~." (essentially x2=x1) and is required, so in this case it's a 1-for-1 transformation. The other two are relatively clear, you place 'a' at x=15, 'b' at x=20, etc. I don't think there's a way to put both on the same axis (with ggplot2 alone).
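As a sketch of how this might be applied to the data in the question: the X2 values can be used as labels placed at the matching X1 positions. This assumes only one Category is plotted at a time, because a single scale definition serves every facet, so the labels cannot differ per panel this way. dfA is a name introduced here for illustration.

```r
library(ggplot2)

df <- data.frame(
  Category = rep(c("A", "B"), each = 5),
  Y  = c(5, 6, 4, 8, 9, 4, 5, 3, 7, 8),
  X1 = c(0, 10, 20, 30, 40, 0, 10, 20, 30, 40),
  X2 = c(0, 0, 1, 1, 2, 0, 1, 2, 2, 3)
)

# Work on one category only (hypothetical helper data frame).
dfA <- subset(df, Category == "A")

# Identity transformation; X2 values are written at the X1 positions.
p <- ggplot(dfA, aes(X1, Y)) +
  geom_path(color = "red") +
  scale_x_continuous(
    "X1",
    sec.axis = sec_axis(~ ., name = "X2",
                        breaks = dfA$X1,
                        labels = as.character(dfA$X2))
  )
```

The top axis then reads 0, 0, 1, 1, 2 above the X1 ticks 0 to 40. Category B would need its own plot with its own labels.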

Dual y axis (second axis) use in ggplot2

I have run into a problem plotting two different data sets with the secondary axis function, as described in this previous post: how-to-use-facets-with-a-dual-y-axis-ggplot.
I am trying to use geom_point and geom_bar, but since the geom_bar data has a different range, it does not show up on the graph.
Here is what I have tried;
point_data=data.frame(gr=seq(1,10),point_y=rnorm(10,0.25,0.1))
bar_data=data.frame(gr=seq(1,10),bar_y=rnorm(10,5,1))
library(ggplot2)
sec_axis_plot <- ggplot(point_data, aes(y=point_y, x=gr,col="red")) + #Enc vs Wafer
geom_point(size=5.5,alpha=1,stat='identity')+
geom_bar(data=bar_data,aes(x = gr, y = bar_y, fill = gr),stat = "identity") +
scale_y_continuous(sec.axis = sec_axis(trans=~ .*15,
name = 'bar_y',breaks=seq(0,10,0.5)),breaks=seq(0.10,0.5,0.05),limits = c(0.1,0.5),expand=c(0,0))+
facet_wrap(~gr, strip.position = 'bottom',nrow=1)+
theme_bw()
As can be seen, bar_data is removed. Is it possible to plot them together in this context?
thx
You're running into problems here because the transformation of the second axis is only used to create the second axis -- it has no impact on the data. Your bar_data is still being plotted on the original axis, which only goes up to 0.5 because of your limits. This prevents the bars from appearing.
In order to make the data show up in the same range, you have to normalize the bar data so that it falls in the same range as the point data. Then, the axis transformation has to undo this normalization so that you get the appropriate tick labels. Like so:
# Normalizer to bring bar data into point data range. This makes
# highest bar equal to highest point. You can use a different
# normalization if you want (e.g., this could be the constant 15
# like you had in your example, though that's fragile if the data
# changes).
normalizer <- max(bar_data$bar_y) / max(point_data$point_y)
sec_axis_plot <- ggplot(point_data,
aes(y=point_y, x=gr)) +
# Plot the bars first so they're on the bottom. Use geom_col,
# which creates bars with specified height as y.
geom_col(data=bar_data,
aes(x = gr,
y = bar_y / normalizer)) + # NORMALIZE Y !!!
# stat="identity" and alpha=1 are defaults for geom_point
geom_point(size=5.5) +
# Create second axis. Notice that the transformation undoes
# the normalization we did for bar_y in geom_col.
scale_y_continuous(sec.axis = sec_axis(trans= ~.*normalizer,
name = 'bar_y')) +
theme_bw()
This gives you the following plot:
I removed some of your bells and whistles to make the axis-specific stuff more clear, but you should be able to add it back in no problem. A couple of notes though:
Remember that the second axis is created by a 1-1 transformation of the primary axis, so make sure they cover the same limits under the transformation. If you have bars that should go to zero, the primary axis should include the untransformed analogue of zero.
Make sure that the data normalization and the axis transformation undo each other so that your axis lines up with the values you're plotting.
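The second note can be checked numerically before plotting. A small sketch using the same simulated data as the question (the seed is an assumption added here for reproducibility, and scaled_bar_y is a name introduced for illustration):

```r
set.seed(42)  # assumed seed, not in the original question
point_data <- data.frame(gr = seq(1, 10), point_y = rnorm(10, 0.25, 0.1))
bar_data   <- data.frame(gr = seq(1, 10), bar_y   = rnorm(10, 5, 1))

normalizer <- max(bar_data$bar_y) / max(point_data$point_y)

# What geom_col() draws on the primary axis.
scaled_bar_y <- bar_data$bar_y / normalizer

# What the secondary axis shows for those heights (~ . * normalizer).
recovered <- scaled_bar_y * normalizer

# The round trip should give back the original bar values ...
round_trip_ok <- isTRUE(all.equal(recovered, bar_data$bar_y))
# ... and the tallest bar should reach the highest point.
same_max <- isTRUE(all.equal(max(scaled_bar_y), max(point_data$point_y)))
print(c(round_trip_ok = round_trip_ok, same_max = same_max))
```

If either check fails, the tick labels on the secondary axis will not correspond to the bar heights.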

How can I plot the relative proportions of two groups using a fill aesthetic in ggplot2?

I am asking this question here because several other answers on this topic seem incorrect (ex1, ex2, and ex3), but Cross Validated seems to have functionally banned R specific questions (CV meta). ..density.. is conceptually related to, but distinct from proportions (ex4 and ex5). So the correct answer does not seem to involve density.
Example:
set.seed(1200)
test <- data.frame(
test1 = factor(sample(letters[1:2], 100, replace = TRUE,prob=c(.25,.75)),ordered=TRUE,levels=letters[1:2]),
test2 = factor(sample(letters[3:8], 100, replace = TRUE),ordered=TRUE,levels=letters[3:8])
)
ggplot(test, aes(test2)) + geom_bar(aes(y = ..density.., group=test1, fill=test1) ,position="dodge")
#For example, the plotted data shows level a x c as being slightly in excess of .15, but a manual calculation shows a value of .138
counts <- with(test,table(test1,test2))
counts/matrix(rowSums(counts),nrow=2,ncol=6)
The answer that seems to yield an output that is correct resorts to a solution that doesn't use ggplot2 (calculating it outside of ggplot2) or requires that a panel be used rather than a fill aesthetic.
Edit: Digging into stat_bin shows that the function ultimately called is bin, but bin is only passed the values of the x aes. Without rewriting stat_bin (or writing another stat_), the hack applied in the above-referenced answer can be generalized to the fill aes, in the absence of the group aes, with the following code for the y aes: y = ..count../sapply(fill, FUN=function(x) sum(count[fill == x])). This just replaces PANEL (the hidden column present at the end of StatBin) with fill. Presumably other hidden variables could get the same treatment.
This is an awful hack, but it seems to do what you want...
ggplot(test, aes(test2)) + geom_bar(aes(y = ..count../rep(c(sum(..count..[1:6]), sum(..count..[7:12])), each=6),
group=test1, fill=test1) ,position="dodge") +
scale_y_continuous(name="proportion")
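An alternative sketch that avoids hard-coding the index ranges 1:6 and 7:12. It relies on the prop variable computed by stat_count (the stat behind geom_bar), which is the proportion within each group, and assumes a ggplot2 version that provides after_stat() (3.3.0 or later; older versions would use ..prop..). It is not the method from the answer above, just another way to get the same kind of bars.

```r
library(ggplot2)

set.seed(1200)
test <- data.frame(
  test1 = factor(sample(letters[1:2], 100, replace = TRUE, prob = c(.25, .75)),
                 ordered = TRUE, levels = letters[1:2]),
  test2 = factor(sample(letters[3:8], 100, replace = TRUE),
                 ordered = TRUE, levels = letters[3:8])
)

# group = test1 makes stat_count compute prop within each level of test1.
p <- ggplot(test, aes(test2, group = test1, fill = test1)) +
  geom_bar(aes(y = after_stat(prop)), position = "dodge") +
  scale_y_continuous(name = "proportion")

# Manual row proportions to compare with the plotted bar heights.
counts <- with(test, table(test1, test2))
manual <- prop.table(counts, margin = 1)
print(round(manual, 3))
```

The bar heights within each fill colour should sum to 1 and can be compared against the manual table.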
