Showing multiple axis labels using ggplot2 with facet_wrap in R

I've got a nice facet_wrap density plot that I have created with ggplot2. I would like for each panel to have x and y axis labels instead of only having the y axis labels along the left side and the x labels along the bottom. What I have right now looks like this:
library(ggplot2)
myGroups <- sample(c("Mo", "Larry", "Curly"), 100, replace=T)
myValues <- rnorm(300)
df <- data.frame(myGroups, myValues)
p <- ggplot(df) +
geom_density(aes(myValues), fill = alpha("#335785", .6)) +
facet_wrap(~ myGroups)
p
Which returns:
(source: cerebralmastication.com)
It seems like this should be simple, but my Google Fu has been too poor to find an answer.

You can do this by including the scales="free" option in your facet_wrap call:
myGroups <- sample(c("Mo", "Larry", "Curly"), 100, replace=T)
myValues <- rnorm(300)
df <- data.frame(myGroups, myValues)
p <- ggplot(df) +
geom_density(aes(myValues), fill = alpha("#335785", .6)) +
facet_wrap(~ myGroups, scales="free")
p
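One side effect of scales="free" is that each panel also gets its own axis range. A minimal sketch, assuming you want axes on every panel but the same range everywhere, is to fix the limits with coord_cartesian(). The seed and the limits below are illustrative values, not part of the original question.

```r
library(ggplot2)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
df <- data.frame(
  myGroups = sample(c("Mo", "Larry", "Curly"), 300, replace = TRUE),
  myValues = rnorm(300)
)

# scales = "free" draws axes on every panel; coord_cartesian() then
# zooms all panels to the same (illustrative) limits
p <- ggplot(df, aes(myValues)) +
  geom_density(fill = "#335785", alpha = 0.6) +
  facet_wrap(~ myGroups, scales = "free") +
  coord_cartesian(xlim = c(-4, 4), ylim = c(0, 0.6))
p
```

Because coord_cartesian() only zooms, no observations are dropped before the densities are computed.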

Short answer: You can't do that. It might make sense with 3 graphs, but what if you had a big lattice of 32 graphs? That would look noisy and bad. ggplot2's philosophy is about doing the right thing with a minimum of customization, which means, naturally, that you can't customize things as much as in other packages.
Long answer: You could fake it by constructing three separate ggplot objects and combining them. But it's not a very general solution. Here's some code from Hadley's book that assumes you've created ggplot objects a, b, and c. It puts a in the top row, with b and c in the bottom row.
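library(grid)  # viewport(), grid.layout() and pushViewport() come from grid
grid.newpage()
pushViewport(viewport(layout=grid.layout(2,2)))
# helper: a viewport at a given row/column of the layout
vplayout <- function(x, y)
  viewport(layout.pos.row=x, layout.pos.col=y)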
print(a,vp=vplayout(1,1:2))
print(b,vp=vplayout(2,1))
print(c,vp=vplayout(2,2))
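The code above assumes that a, b, and c already exist. One possible way to build them, sketched here with the data from the question (the seed and the per-group titles are my additions), is to split the data by group and create one standalone plot per piece, so that each plot carries its own x and y axes:

```r
library(ggplot2)

set.seed(1)  # assumption: seed added only for reproducibility
df <- data.frame(
  myGroups = sample(c("Mo", "Larry", "Curly"), 300, replace = TRUE),
  myValues = rnorm(300)
)

# one standalone plot per group, titled with the group name
plots <- lapply(split(df, df$myGroups), function(d) {
  ggplot(d, aes(myValues)) +
    geom_density(fill = "#335785", alpha = 0.6) +
    ggtitle(as.character(d$myGroups[1]))
})

# split() orders the pieces alphabetically: Curly, Larry, Mo
a <- plots[[1]]
b <- plots[[2]]
c <- plots[[3]]
```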

Related

log-scaled density plot: ggplot2 and freqpoly, but with points instead of lines

What I really want to do is plot a histogram with the y-axis on a log scale. Obviously this is a problem with the ggplot2 geom_histogram, since the bottom of each bar is at zero, and the log of zero gives you trouble.
My workaround is to use the freqpoly geom, and that more or less does the job. The following code works just fine:
ggplot(zcoorddist) +
geom_freqpoly(aes(x=zcoord,y=..density..),binwidth = 0.001) +
scale_y_continuous(trans = 'log10')
The issue is that at the edges of my data, I get a couple of garish vertical lines that really throw you off visually when combining a bunch of these freqpoly curves in one plot. What I'd like to be able to do is use points at every vertex of the freqpoly curve, with no lines connecting them. Is there a way to do this easily?
The easiest way to get the desired plot is to just recast your data. Then you can use geom_point. Since you don't provide an example, I used the standard example for geom_histogram to show this:
# load packages
require(ggplot2)
require(reshape)
# get data (in recent ggplot2 versions, movies lives in the ggplot2movies package)
data(movies)
movies <- movies[, c("title", "rating")]
# here's the equivalent of your plot
ggplot(movies) + geom_freqpoly(aes(x=rating, y=..density..), binwidth=.001) +
scale_y_continuous(trans = 'log10')
# recast the data
df1 <- recast(movies, value~., measure.var="rating")
names(df1) <- c("rating", "number")
# alternative way to recast data
df2 <- as.data.frame(table(movies$rating))
names(df2) <- c("rating", "number")
df2$rating <- as.numeric(as.character(df2$rating))
# plot
p <- ggplot(df1, aes(x=rating)) + scale_y_continuous(trans="log10", name="density")
# with lines
p + geom_linerange(aes(ymax=number, ymin=.9))
# only points
p + geom_point(aes(y=number))
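Another way to get density points rather than counts, sketched here under the assumption that binning outside ggplot2 is acceptable, is to let base R's hist() do the binning and then plot the bin midpoints. The simulated zcoord vector and the number of breaks are stand-ins for the asker's data, not values from the question.

```r
library(ggplot2)

set.seed(42)                       # assumption: simulated stand-in data
zcoord <- rnorm(5000, sd = 0.01)

# bin outside ggplot2; hist() returns bin midpoints and densities
h <- hist(zcoord, breaks = 50, plot = FALSE)
pts <- data.frame(zcoord = h$mids, density = h$density)

# empty bins have density 0, which has no finite log, so drop them
pts <- pts[pts$density > 0, ]

ggplot(pts, aes(zcoord, density)) +
  geom_point() +
  scale_y_log10()
```

Dropping the empty bins up front also avoids the vertical drops at the edges of the data, since nothing is drawn where the density is zero.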

Scatterplot with ugly margins when using log scale

I have a somewhat "weird" two-dimensional distribution (not normal with some uniform values, but it kinda looks like this.. this is just a minimal reproducible example), and want to log-transform the values and plot them.
library("ggplot2")
library("scales")
df <- data.frame(x = c(rep(0,200),rnorm(800, 4.8)), y = c(rnorm(800, 3.2),rep(0,200)))
Without the log transformation, the scatterplot (incl. rug plot which I need) works (quite) well, apart from a marginally narrower rug plot on the x axis:
p <- ggplot(df, aes(x, y)) + geom_point() + geom_rug(alpha = I(0.5)) + theme_minimal()
p
When plotting the same with a log10-transform though, the points at the margin (at x = 0 and y = 0, respectively) are plotted outside the rug plot or just on the axis (with other data, and only one half side of a point is visible).
p + scale_x_log10() + scale_y_log10()
How can I "rescale" the axes so that all the points are contained fully within the grid and the rug plots are unaffected, as in the first example?
Maybe you want
p + scale_x_log10(oob=squish_infinite) + scale_y_log10(oob=squish_infinite)
I don't really know what you expect to happen for those values that can be negative or infinite, but a general piece of advice when transformations don't do what you want is to perform them outside of ggplot2. Something like this might be useful:
library(plyr)
df2 <- colwise(log10)(df) # log transform columns
df2 <- colwise(squish_infinite)(df2) # do something with infinites
p %+% df2 # plot the transformed data
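To see why the zeros cause trouble and what squish_infinite() does about it, here is a small standalone sketch. The input vector and the range c(-1, 2) are illustrative choices of mine; when squish_infinite() is used as an oob function, the scale supplies the range itself.

```r
library(scales)

x <- c(0, 1, 10, 100)
logged <- log10(x)   # log10(0) is -Inf, which cannot be placed on the axis
logged

# squish_infinite() replaces -Inf and Inf with the ends of `range`
squished <- squish_infinite(logged, range = c(-1, 2))
squished
```

With this range, the -Inf that came from x = 0 ends up at -1, the lower end, while the finite values are left unchanged.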

How to plot stacked point histograms?

What's the ggplot2 equivalent of "dotplot" histograms? With stacked points instead of bars? Similar to this solution in R:
Plot Histogram with Points Instead of Bars
Is it possible to do this in ggplot2? Ideally with the points shown as stacks and a faint line showing the smoothed line "fit" to these points (which would make a histogram shape.)
ggplot2 does dotplots with geom_dotplot (see its entry in the manual).
Here is an example:
library(ggplot2)
set.seed(789); x <- data.frame(y = sample(1:20, 100, replace = TRUE))
ggplot(x, aes(y)) + geom_dotplot()
In order to make it behave like a simple dotplot, we should do this:
ggplot(x, aes(y)) + geom_dotplot(binwidth=1, method='histodot')
You should get this:
To address the density issue, you'll have to add another term, ylim(), so that your plot call will have the form ggplot() + geom_dotplot() + ylim()
More specifically, you'll write ylim(0, A), where A will be the number of stacked dots necessary to count 1.00 density. In the example above, the best you can do is see that 7.5 dots reach the 0.50 density mark. From there, you can infer that 15 dots will reach 1.00.
So your new call looks like this:
ggplot(x, aes(y)) + geom_dotplot(binwidth=1, method='histodot') + ylim(0, 15)
Which will give you this:
Usually, this kind of eyeball estimate will work for dotplots, but of course you can try other values to fine-tune your scale.
Notice how changing the ylim values doesn't affect how the data is displayed, it just changes the labels in the y-axis.
As @joran pointed out, we can use geom_dotplot
require(ggplot2)
ggplot(mtcars, aes(x = mpg)) + geom_dotplot()
Edit: (moved useful comments into the post):
The label "count" is misleading because this is actually a density estimate; maybe you could suggest that this label be changed to "density" by default. The ggplot2 implementation of dotplot follows the original one by Leland Wilkinson, so if you want to understand clearly how it works, take a look at this paper.
As for an easy transformation to make the y axis actually show counts, i.e. "number of observations", the help page says:
When binning along the x axis and stacking along the y axis, the numbers on y axis are not meaningful, due to technical limitations of ggplot2. You can hide the y axis, as in one of the examples, or manually scale it to match the number of dots.
So you can use this code to hide y axis:
ggplot(mtcars, aes(x = mpg)) +
geom_dotplot(binwidth = 1.5) +
scale_y_continuous(name = "", breaks = NULL)
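If you need the actual number of observations per stack, one option is to count outside ggplot2. A small sketch, assuming integer-valued data like the simulated example earlier on this page, so that each distinct value forms its own stack when binwidth is 1:

```r
set.seed(789)
x <- data.frame(y = sample(1:20, 100, replace = TRUE))

counts <- table(x$y)   # observations per distinct value
counts
max(counts)            # height, in dots, of the tallest stack
```

For continuous data the stacks depend on how geom_dotplot bins the values, so table() would not match them directly.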
I introduce an exact approach using @Waldir Leoncio's latter method.
library(ggplot2); library(grid)
set.seed(789)
x <- data.frame(y = sample(1:20, 100, replace = TRUE))
g <- ggplot(x, aes(y)) + geom_dotplot(binwidth=0.8)
g # output to read parameter
### calculation of width and height of panel
grid.ls(view=TRUE, grob=FALSE)
real_width <- convertWidth(unit(1,'npc'), 'inch', TRUE)
real_height <- convertHeight(unit(1,'npc'), 'inch', TRUE)
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$panel$ranges[[1]]$x.range)
real_binwidth <- real_width / width_coordinate_range * 0.8 # 0.8 is the argument binwidth
num_balls <- real_height / 1.1 / real_binwidth # the number of stacked balls. 1.1 is expanding value.
# num_balls is the value of A
g + ylim(0, num_balls)
Apologies : I don't have enough reputation to 'comment'.
I like cuttlefish44's "exact approach", but to make it work (with ggplot2 [2.2.1]) I had to change the following line from :
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$panel$ranges[[1]]$x.range)
to
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$layout$panel_ranges[[1]]$x.range)

Different Plottypes in facet_grid

Once again I'm confronted with a complicated ggplot. I want to plot different plot types within one plot using facet_grid.
I hope I can make my point clear using the following example:
I want to produce a plot similar to the first picture but the upper plot should look like the second picture.
I already found the trick using the subset function but I can't add vertical lines to only one plot let alone two or three (or specify the color).
CODE:
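library(ggplot2)
library(reshape2) # melt(); assumed here, the older reshape package also provides it
library(plyr)     # .() used in the subset= arguments below
a <- rnorm(100)
b <- rnorm(100,8,1)
c <- rep(c(0,1),50)
dfr <- data.frame(a=a,b=b,c=c,d=seq(1:100))
dfr_melt <- melt(dfr,id.vars="d")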
#I want only two grids, not three
ggplot(dfr_melt,aes(x=d,y=value)) + facet_grid(variable~.,scales="free")+
geom_line(subset=.(variable=="a")) + geom_line(subset=.(variable=="b"))
#Upper plot should look like this
ggplot(dfr,aes(x=d,y=a)) + geom_line() + geom_line(aes(y=c,color="c"))+
geom_hline(aes(yintercept=1),linetype="dashed")+
geom_hline(aes(yintercept=-2),linetype="dashed")
If I understand your question correctly, you just need to add a variable column to dfr in order to allow the faceting to work:
dfr$variable = "a"
ggplot(subset(dfr_melt, variable=="a"),aes(x=d,y=value)) +
facet_grid(variable~.,scales="free")+
geom_line(data=subset(dfr_melt,variable=="a")) +
geom_line(data=subset(dfr_melt, variable=="b")) +
geom_line(data=dfr, aes(y=c, colour=factor(c))) +
geom_hline(aes(yintercept=1),linetype="dashed")+
geom_hline(aes(yintercept=-2),linetype="dashed")
Notice that my plot doesn't have the zig-zag line; this is because I changed:
#This is almost certainly not what you want
geom_line(data=dfr, aes(y=c, colour="c"))
to
#I made c a factor since it only takes the values 0 or 1
geom_line(data=dfr, aes(y=c, colour=factor(c)))
##Alternatively, you could have
geom_line(data=dfr, aes(y=c), colour="red") #or
geom_line(data=dfr, aes(y=c, colour=c)) #or
To my knowledge, you can't put multiple plot types in a single plot using facet_grid(). Your two options, as far as I can see, are
to put empty data in the first facet, so the lines are 'there' but not displayed, or
to combine multiple plots into one using viewports.
I think the second solution is more general, so that's what I did:
#name each of your plots
p2 <- ggplot(subset(dfr_melt, variable=="a"),aes(x=d,y=value)) + facet_grid(variable~.,scales="free")+
geom_line(subset=.(variable=="a")) + geom_line(subset=.(variable=="b"))
#Upper plot should look like this
p1 <- ggplot(dfr,aes(x=d,y=a)) + geom_line() + geom_line(aes(y=c,color="c"))+
geom_hline(aes(yintercept=1),linetype="dashed")+
geom_hline(aes(yintercept=-2),linetype="dashed")
#From Wickham ggplot2, p154
vplayout <- function(x,y) {
viewport(layout.pos.row=x, layout.pos.col=y)
}
require(grid)
png("myplot.png", width = 600, height = 300) #or use a different device, e.g. quartz for onscreen display on a mac
grid.newpage()
pushViewport(viewport(layout=grid.layout(2, 1)))
print(p1, vp=vplayout(1, 1))
print(p2, vp=vplayout(2, 1))
dev.off()
You might need to fiddle a bit to get them to line up exactly right. Turning off the faceting on the upper plot, and moving the legend on the lower plot to the bottom, should do the trick.

ggplot2 clustering in R

Can someone point me in the right direction for making a plot like this one with ggplot2? Even just the function type would help.
I've been looking around in ggplot2 and can't find anything like this.
I'm assuming that the essential features of the plot are that:
a.) the x-axis is categorical,
b.) the x-positions of the points are varied slightly, and
c.) some summary statistic is shown for each group (I used medians).
If that's what you're looking for,
require(ggplot2)
require(plyr)
#define the data
lev <- gl(2, 10, 20, labels=c("I", "II"))
y <- runif(20)
df <- data.frame(lev, y)
#calculate the medians - I'm guessing that's what the horiz lines are?
meds <- ddply(df, .(lev), summarise, med = median(y))
ggplot(df, aes(x=lev, y=y, colour=lev)) +
geom_point(position="jitter") +
theme_bw() +
scale_colour_manual(values=c("red", "darkblue")) +
geom_errorbar(data=meds, aes(x=lev, y=med, ymin=med, ymax=med))
You can use annotate() to add the numbers and the little bracket if that is important.
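If you would rather not load plyr just for the medians, here is a sketch of the same summary using base R's aggregate(). The seed is my addition for reproducibility, and the column is renamed to med so it matches the geom_errorbar call above.

```r
set.seed(1)  # assumption: seed added only for reproducibility
lev <- gl(2, 10, 20, labels = c("I", "II"))
df <- data.frame(lev, y = runif(20))

# one median per level, using base R instead of plyr::ddply()
meds <- aggregate(y ~ lev, data = df, FUN = median)
names(meds)[names(meds) == "y"] <- "med"
meds
```

The result has one row per level of lev, the same shape that ddply() produces in the answer above.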
