stat_summary_hex coloured by ratio - r

Let's say I have a data frame with the following columns: x, y, num, denom, and I would like to produce a hex plot with the colours of the hexagons set by sum(num)/sum(denom).
I assumed that the answer would involve stat_summary_hex so I naively tried:
ggplot(data, aes(x=x, y=y)) + stat_summary_hex(fun=function(d) {sum(d$num)/sum(d$denom) })
but the output is:
Error: stat_summaryhex requires the following missing aesthetics: z
and I understand why (I didn't give it a z aesthetic), but I'm not sure what to try next: how can I pass in 2 z aesthetics (i.e. num and denom)?

I ended up finding a hack to do what I wanted, which I will record here:
ggplot(data, aes(x=x, y=y, z=complex(real=num, imaginary=denom))) +
  stat_summary_hex(fun=function(z) { sum(Re(z)) / sum(Im(z)) })
Essentially, I did provide a z parameter, which was a column of complex numbers. Complex numbers are numbers, so ggplot lets them through, and they have two parts, a real and an imaginary part, so the aggregation function is able to compute the ratio I wanted.
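To see why the encoding works, here is a minimal check outside ggplot2. The num and denom values are made up for illustration and are not from the original data. Re() and Im() pull the two columns back out of the complex vector, so each can be summed on its own:

```r
# Made-up values for illustration only
num   <- c(1, 2, 3)
denom <- c(2, 4, 10)

# Pack both columns into one complex vector: real part = num, imaginary part = denom
z <- complex(real = num, imaginary = denom)

# Same aggregation as the function passed to stat_summary_hex
ratio <- sum(Re(z)) / sum(Im(z))

# Matches the ratio of sums computed directly: 6 / 16 = 0.375
stopifnot(isTRUE(all.equal(ratio, sum(num) / sum(denom))))
```

Whether ggplot2 hands complex z values to fun unchanged may depend on the ggplot2 version, so the plotting call itself is worth verifying on your own setup.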

Related

R - Bar Plot with transparency based on values?

I have a dataset myData which contains x and y values for various Samples. I can create a line plot for a dataset which contains a few Samples with the following pseudocode, and it is a good way to represent this data:
myData <- data.frame(x = 290:450, X52241 = c(..., ..., ...), X75123 = c(..., ..., ...))
myData <- myData %>% gather(Sample, y, -x)
ggplot(myData, aes(x, y)) + geom_line(aes(color=Sample))
Which generates:
This turns into a Spaghetti Plot when I have a lot more Samples added, which makes the information hard to understand, so I want to represent the "hills" of each sample in another way. Preferably, I would like to represent the data as a series of stacked bars, one for each myData$Sample, with transparency inversely related to what is in myData$y. I've tried to represent that data in photoshop (badly) here:
Is there a way to do this? Creating faceted plots using facet_wrap() or facet_grid() doesn't give me what I want (far too many Samples). I would also be open to stacked ridgeline plots using ggridges, but I am not understanding how I would be able to convert absolute values to a stat(density) value needed to plot those.
Any suggestions?
Thanks to u/Joris for the helpful suggestion! Since I did not find this question elsewhere, I'll post the pretty simple solution here for others to find.
Basically, I needed to map y to the alpha aesthetic via aes(alpha=y, ...). In theory, this works with any geom. I tried geom_col(), which worked, but the best solution was geom_segment(), since all my "bars" were going to be the same length. Also note that I had to "slice" each bar into short segments, one per x value, to avoid overplotting problems like those described in other questions.
ggplot(myData, aes(x, Sample)) +
geom_segment(aes(x=x, xend=x-1, y=Sample, yend=Sample, alpha=y), color='blue3', size=14)
That gives us the nice gradient:
Since the maximum y values differ between the two samples, I normalized the data (myDataNorm) to "match" the intensities and made the same plot again. In my particular case, I preferred bars without a gradient that instead showed a hard edge around the maximum values of y. Here was one solution:
ggplot(myDataNorm, aes(x, Sample)) +
  geom_segment(aes(x=x, xend=x-1, y=Sample, yend=Sample, alpha=ifelse(y>0.9,1,0)), color='blue3', size=14) +
  theme(legend.position='none')
Better, but I did not like the faint-colored areas that were left. The final code is what gave me something that perfectly captured what I was looking for. I simply moved the ifelse() statement to apply to the x aesthetic, so the parts of the segment drawn were only those with high enough y values. Note my data "starts" at x=290 here. Probably more elegant ways to combine those x and xend terms, but whatever:
ggplot(myDataNorm, aes(x, Sample)) +
geom_segment(aes(
x=ifelse(y>0.9,x,290), xend=ifelse(y>0.9,x-1,290),
y=Sample, yend=Sample), color='blue3', size=14) +
xlim(290,400) # needed to show entire scale
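The normalization that produces myDataNorm is not shown above. A minimal sketch, assuming each sample is scaled by its own maximum and using synthetic Gaussian "hills" in place of the real data (the peak positions and heights are invented):

```r
library(ggplot2)

# Synthetic stand-in for the long-format myData (assumed shapes, not the real samples)
x <- 290:450
myData <- rbind(
  data.frame(x = x, Sample = "X52241", y = 3 * dnorm(x, mean = 330, sd = 10)),
  data.frame(x = x, Sample = "X75123", y = 8 * dnorm(x, mean = 360, sd = 15))
)

# Divide each y by the maximum y of its own sample, so every sample peaks at 1
myDataNorm <- transform(myData, y = y / ave(y, Sample, FUN = max))

# Same gradient plot as before, now with comparable intensities across samples
# (linewidth is the ggplot2 >= 3.4.0 name; older versions use size)
p_norm <- ggplot(myDataNorm, aes(x, Sample)) +
  geom_segment(aes(x = x, xend = x - 1, y = Sample, yend = Sample, alpha = y),
               color = "blue3", linewidth = 14)
```

With every sample peaking at 1, a fixed cutoff such as y > 0.9 means the same thing for each bar.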

geom_histogram does not show all values in x axis

Trying to run a simple and quick analysis of some variables. I run this code:
ggplot(data, aes(var1)) +
geom_bar()
This produces a histogram; however, although var1 has only 6 possible values, the x axis shows only 2, 4, 6. Is it possible to easily include all 6 possible values as labels?
You want a frequency bar plot for six individual numbers, and you want to see all of them on the x axis. That suggests you are really treating them as categorical rather than numeric data, so what you want is a categorical x axis, which shows every value. Turning x into a factor should do the trick:
data <- data.frame(var1=floor(6*runif(200) + 1))
ggplot(data, aes(factor(var1))) + geom_bar()
Below: left - without factor, right - with factor.
What does your data look like?
Assuming you have a numeric x, adding scale_x_continuous(breaks = seq(1, 6, by = 1)) should work.
Of course this would only work if the x values go from 1 to 6... Otherwise you can replace the seq call with a vector that contains the values you want.
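As a sketch of that last point, the breaks can be taken from the data itself instead of being hard-coded. The values below are invented so that they are deliberately not 1 to 6:

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, for reproducibility only
# Invented data with six irregularly spaced values
data <- data.frame(var1 = sample(c(1, 2, 3, 5, 8, 13), 200, replace = TRUE))

# Use the values actually present as breaks, rather than assuming 1..6
brks <- sort(unique(data$var1))

p <- ggplot(data, aes(var1)) +
  geom_bar() +
  scale_x_continuous(breaks = brks)
```

This keeps the axis numeric, so the bars stay spaced by value. The factor approach spaces them evenly instead.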

Why is facet_grid placing the distributions in the wrong quadrants?

When using facet_grid(x ~ y) with ggplot2 I've seen in various examples and read in the documentation that the x variable is laid out vertically and the y variable horizontally. However, when I run the following:
set.seed(1)
b = c(rnorm(10000,mean=0,sd=0.5),rnorm(10000,mean=5,sd=0.5),
rnorm(10000,mean=7,sd=0.5),rnorm(10000,mean=10,sd=0.5))
x = c(rep('xL', 20000), rep('xR',20000))
y = c(rep('yL',10000), rep('yR',20000), rep('yL',10000))
foo = data.frame(x=x,y=y,b=b)
ggplot(data=foo, aes(foo$b)) +
geom_histogram(aes(y=..density..),breaks=seq(-5,12,by=.2),col='steelblue',fill='steelblue2') +
geom_density(col='black') +
facet_grid(x ~ y, scales='free_y')
I get the plot below (sorry for the quality). Even though, from the code above, the distribution with mean 10 is the one with (x,y) of 'xR,yL', it appears in the bottom right quadrant, which is labelled 'xR,yR'. What am I doing wrong?
Change aes(foo$b) to aes(x = b) to make sure the aesthetics are mapping correctly.
You want to make sure ggplot is finding the column labelled b from the correct scope i.e. from the data that it has been passed. For example, it may be the case that ggplot rearranged your data when you passed it, so mapping the variable foo$b no longer aligns with what you want.
I'm not saying this is what happened - just an example of why calling the aesthetic from the correct scope is important.
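For reference, a sketch of the question's code with only the mapping changed to the bare column name. It also uses after_stat(density), the newer spelling of ..density.., which assumes ggplot2 3.3.0 or later:

```r
library(ggplot2)

set.seed(1)
b <- c(rnorm(10000, mean = 0, sd = 0.5), rnorm(10000, mean = 5, sd = 0.5),
       rnorm(10000, mean = 7, sd = 0.5), rnorm(10000, mean = 10, sd = 0.5))
x <- c(rep("xL", 20000), rep("xR", 20000))
y <- c(rep("yL", 10000), rep("yR", 20000), rep("yL", 10000))
foo <- data.frame(x = x, y = y, b = b)

# Map the bare column name so ggplot2 looks it up inside the data it was given
p <- ggplot(data = foo, aes(x = b)) +
  geom_histogram(aes(y = after_stat(density)), breaks = seq(-5, 12, by = 0.2),
                 col = "steelblue", fill = "steelblue2") +
  geom_density(col = "black") +
  facet_grid(x ~ y, scales = "free_y")
```

In the data, the rows with x = 'xR' and y = 'yL' are the ones drawn with mean 10, so that distribution should land in the panel carrying those two labels.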

Simple analog for plotting a line from a table object in ggplot2

I have been unable to find a simple analog for plotting a line graph from a table object in ggplot2. Given the elegance and utility of the package, I feel I must be missing something quite obvious. As an illustration consider a data frame with yearly observations:
dat <- data.frame(year=sample(2001:2010, 1000, replace=TRUE))
And a quick time series plot in base R:
plot(table(dat$year), type="l")
Switching to qplot, returns the error "attempt to apply a non-function":
qplot(table(dat$year), geom="line")
ggplot2 requires a data frame. Fair enough. But this returns the same error.
qplot(year, data=dat, geom="line")
After some searching and fiddling, I abandoned qplot, and came up with the following approach which involves specifying a line geometry, binning the counts, and dropping final values to avoid plotting zeros.
ggplot(dat, aes(year) ) + geom_line(stat = "bin", binwidth=1, drop=TRUE)
It seems like rather a long walk around the block. And it is still not entirely satisfactory, since the bins don't align precisely with the mid-year values on the x-axis. Where have I gone wrong?
Maybe still more complicated than you want, but:
qplot(Var1,Freq,data=as.data.frame(table(dat$year)),geom="line",group=1)
(the group=1 is necessary because the Year variable (Var1) is returned as a factor ...)
If you didn't need it as a one-liner you could use ytab <- as.data.frame(table(dat$year)) first to extract the table and convert it to a data frame ...
Following Brian Diggs's answer, if you're willing to construct a bit more fortify machinery you can condense this a bit more:
A utility function that converts a factor to numeric if possible:
conv2num <- function(x) {
  xn <- suppressWarnings(as.numeric(as.character(x)))
  if (!all(is.na(xn))) xn else x
}
And a fortify method that turns the table into a data frame and then tries to make the columns numeric:
fortify.table <- function(x,...) {
  z <- as.data.frame(x)
  facs <- sapply(z,is.factor)
  z[facs] <- lapply(z[facs],conv2num)
  z
}
Now this works almost as you would like it to:
qplot(Var1,Freq,data=table(dat$year),geom="line")
(It would be nice/easier if there were a table option to preserve the numeric nature of cross-classifying factors ...)
Expanding on Ben's answer, the "standard" approach would be to create the data frame from the table, at which point you can convert the years back into numbers.
ytab <- as.data.frame(table(dat$year))
ytab$Var1 <- as.numeric(as.character(ytab$Var1))
Then either of the following will work:
ggplot(ytab, aes(Var1, Freq)) + geom_line()
qplot(Var1, Freq, data=ytab, geom="line")
The other approach is to create a fortify function which will transform the table into a data frame, and use that.
fortify.table <- as.data.frame.table
Then you can pass the table directly instead of a data frame. But Var1 is now still a factor and so you need group=1 to connect the line across years.
ggplot(table(dat$year), aes(Var1, Freq)) + geom_line(aes(group=1))
qplot(Var1, Freq, data=table(dat$year), geom="line", group=1)
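Putting the "standard" approach together as one self-contained sketch, with an arbitrary seed added for reproducibility. The last line is an alternative not mentioned in the answers above: it assumes a ggplot2 version that provides stat_count (2.0.0 or later) and lets that stat tally the years, so no table is needed:

```r
library(ggplot2)

set.seed(123)  # arbitrary seed, for reproducibility only
dat <- data.frame(year = sample(2001:2010, 1000, replace = TRUE))

# Table -> data frame, then turn the factor of years back into numbers
ytab <- as.data.frame(table(dat$year))
ytab$Var1 <- as.numeric(as.character(ytab$Var1))

# Line through the yearly counts; no group=1 needed because Var1 is numeric
p_table <- ggplot(ytab, aes(Var1, Freq)) + geom_line()

# Alternative that skips the table: let stat_count tally each year
p_count <- ggplot(dat, aes(year)) + geom_line(stat = "count")
```

Both versions place the points exactly on the year values, which avoids the bin misalignment from the stat = "bin" attempt in the question.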

Resources