Add text to ggplot with facetted densities - r

I'm encountering a problem when trying to make a density plot with ggplot.
The data look a bit like in the example here.
require(ggplot2)
require(plyr)
mms <- data.frame(deliciousness = rnorm(100),
type=sample(as.factor(c("peanut", "regular")), 100, replace=TRUE),
color=sample(as.factor(c("red", "green", "yellow", "brown")), 100, replace=TRUE))
mms.cor <- ddply(.data=mms, .(type, color), summarize, n=paste("n =", length(deliciousness)))
plot <- ggplot(data=mms, aes(x=deliciousness)) + geom_density() + facet_grid(type ~ color) + geom_text(data=mms.cor, aes(x=1.8, y=5, label=n), colour="black", inherit.aes=FALSE, parse=FALSE)
Labelling each facet works quite well unless the scales for each facet vary. Does anyone have an idea how I could put the labels at the same location when the scales differ per facet?
Best,
daniel

Something like this?
plot <- ggplot(data=mms, aes(x=deliciousness)) +
geom_density(aes(y=..scaled..)) + facet_grid(type ~ color) +
geom_text(data=mms.cor, aes(x=1.2, y=1.2, label=n), colour="black")
plot
There is a way to get the limits set internally by ggplot with scales="free", but it involves hacking the grob (graphics object). Since you seem to want the density plots to have equal height (???), you can do that with aes(y=..scaled..). Then setting the location for the labels is straightforward.
EDIT (Response to OP's comment)
This is what I meant by hacking the grob. Note that this takes advantage of the internal structure used by ggplot. The problem is that this could change at any time with a new version (and in fact it is already different from older versions). So there is no guarantee this code will work in the future.
plot <- ggplot(data=mms, aes(x=deliciousness)) +
geom_density() +
facet_grid(type ~ color, scales="free")
# version-dependent internals: the element name "panel" may differ in other ggplot2 releases
panels <- ggplot_build(plot)[["panel"]]
limits <- do.call(rbind,lapply(panels$ranges,
function(range)c(range$x.range,range$y.range)))
colnames(limits) <- c("x.lo","x.hi","y.lo","y.hi")
mms.cor <- cbind(mms.cor,limits)
plot +
geom_text(data=mms.cor, aes(x=x.hi, y=y.hi, label=n), hjust=1,colour="black")
The basic idea is to generate plot without the text, then build the graphics object using ggplot_build(plot). From this we can extract the x- and y-limits, and bind those to the labels in your mms.cor data frame. Now render the plot with the text, using these limits.
Note that the plots are different from my earlier answer because you did not use set.seed(...) in your code to generate the dataset (and I forgot to add it...).
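A sketch of an alternative that avoids the ggplot internals altogether: ggplot2 places infinite positions at the panel edge, so mapping x and y to Inf pins each label to the top-right corner of its facet whatever the scales are. The seed, the use of base aggregate() instead of ddply, and the hjust/vjust values are my own assumptions for illustration, not part of the original code.

```r
library(ggplot2)

set.seed(42)  # assumed seed, only so the example is reproducible
mms <- data.frame(
  deliciousness = rnorm(100),
  type  = sample(c("peanut", "regular"), 100, replace = TRUE),
  color = sample(c("red", "green", "yellow", "brown"), 100, replace = TRUE)
)

# One row per facet; the deliciousness column now holds the group size
mms.cor <- aggregate(deliciousness ~ type + color, data = mms, FUN = length)
mms.cor$n <- paste("n =", mms.cor$deliciousness)

p <- ggplot(mms, aes(x = deliciousness)) +
  geom_density() +
  facet_grid(type ~ color, scales = "free") +
  # Inf/Inf is the top-right corner of every panel; hjust/vjust nudge the text inside
  geom_text(data = mms.cor, aes(x = Inf, y = Inf, label = n),
            hjust = 1.1, vjust = 1.5, inherit.aes = FALSE)
p
```

Using -Inf for x (with hjust slightly below 0) would put the labels in the top-left corner instead.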

Related

Plotting normal curve over histogram using ggplot2: Code produces straight line at 0

this forum already helped me a lot for producing the code, which I expected to return a histogram of a specific variable overlayed with its empirical normal curve. I used ggplot2 and stat_function to write the code.
Unfortunately, the code produced a plot with the correct histogram but the normal curve is a straight line at zero (red line in plot produced by the following code).
For this minimal example I used the mtcars dataset - the same behavior of ggplot and stat_function is observed with my original data set.
This is the code I wrote and used:
library(ggplot2)
mtcars
hist_staff <- ggplot(mtcars, aes(x = mtcars$mpg)) +
geom_histogram(binwidth = 2, colour = "black", aes(fill = ..count..)) +
scale_fill_gradient("Count", low = "#DCDCDC", high = "#7C7C7C") +
stat_function(fun = dnorm, colour = "red")
print(hist_staff)
I also tried to specify dnorm:
stat_function(fun = dnorm(mtcars$mpg, mean = mean(mtcars$mpg), sd = sd(mtcars$mpg))
That did not work out either - an error message returned stating that the arguments are not numerical.
I hope you people can help me! Thanks a lot in advance!
Best, Jannik
Your curve and histograms are on different y scales and you didn't check the help page on stat_function, otherwise you'd've put the arguments in a list as it clearly shows in the example. You also aren't doing the aes right in your initial ggplot call. I sincerely suggest hitting up more tutorials and books (or at a minimum the help pages) vs learn ggplot piecemeal on SO.
Once you fix the stat_function arg problem and the ggplot aes issue, you need to tackle the y axis scale difference. To do that, you'll need to switch the y for the histogram to use the density from the underlying stat_bin calculated data frame:
library(ggplot2)
gg <- ggplot(mtcars, aes(x=mpg))
gg <- gg + geom_histogram(binwidth=2, colour="black",
aes(y=..density.., fill=..count..))
gg <- gg + scale_fill_gradient("Count", low="#DCDCDC", high="#7C7C7C")
gg <- gg + stat_function(fun=dnorm,
color="red",
args=list(mean=mean(mtcars$mpg),
sd=sd(mtcars$mpg)))
gg
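If you would rather keep counts on the y axis, a rough sketch of the reverse approach is to rescale the curve instead of the histogram: the expected count in a bin is approximately density * n * binwidth. The helper name scaled_dnorm and the plain grey fill are my own choices for illustration.

```r
library(ggplot2)

binwidth <- 2
n  <- nrow(mtcars)
mu <- mean(mtcars$mpg)
s  <- sd(mtcars$mpg)

# Hypothetical helper: normal density rescaled to the count scale of the histogram
scaled_dnorm <- function(x) dnorm(x, mean = mu, sd = s) * n * binwidth

gg_counts <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = binwidth, colour = "black", fill = "grey70") +
  stat_function(fun = scaled_dnorm, colour = "red")
gg_counts
```

The binwidth passed to geom_histogram and the one used in the helper have to stay in sync, which is why both read from the same variable.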

multiple histogram color count with facet_grid

So my problem is, I have a data frame that I am plotting with geom_hex from ggplot2, and my command looks like this:
ggplot(data, aes(x=var1,y=var2))+geom_hex(bins=20)+facet_grid(fac1 ~ fac2,scales="free")
The problem I am having is that the colouring scheme for the counts is shared across all graphs. I am wondering if there is any quick way to generate a count color scheme per row (or column) of graphs. I tried playing with scales, but it seems that this only works on the x and y axis scales, and not on the histogram colors and the histogram color legend. thnx!
Here is an example of the data:
fac1<-c(rep(1, 6000), rep(2, 1000))
fac2<-c(rep("a", 3000), rep("b", 3000),rep("a", 500), rep("b", 500))
var1<-rnorm(7000)
var2<-rnorm(7000)
data<-data.frame(fac1,fac2,var1,var2)
ggplot(data, aes(x=var1,y=var2))+geom_hex(bins=20)+facet_grid(fac1 ~ fac2,scales="free")
Because there is so much more data from one factor, the color scheme is dominated by the first row of graphs, and I would like to have the same coloring scheme but adjusted by the counts of every row.
Here's my comment expanded into an answer. Your safest bet is the following approach:
library(gridExtra)
p1 <- ggplot(data[data$fac1==1, ], aes(x=var1,y=var2)) +
geom_hex(bins=20) + facet_grid(fac1 ~ fac2,scales="free") + xlab("")
p2 <- ggplot(data[data$fac1==2, ], aes(x=var1,y=var2)) +
geom_hex(bins=20) + facet_grid(fac1 ~ fac2,scales="free") +
scale_fill_gradient(low = "red", high = "white")
grid.arrange(p1, p2)
Good question. In this 2010 answer to "Different legends and fill colours for facetted ggplot?", Hadley Wickham indicates that you cannot have multiple legend scales per plot.
A simple way to get around this issue in your case would be to use the gridExtra package.
require('gridExtra')
p1<-ggplot(data[data$fac1==1 & data$fac2=="a",], aes(x=var1,y=var2))+geom_hex(bins=20)
p2<-ggplot(data[data$fac1==2 & data$fac2=="a",], aes(x=var1,y=var2))+geom_hex(bins=20)
p3<-ggplot(data[data$fac1==1 & data$fac2=="b",], aes(x=var1,y=var2))+geom_hex(bins=20)
p4<-ggplot(data[data$fac1==2 & data$fac2=="b",], aes(x=var1,y=var2))+geom_hex(bins=20)
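library(grid) # textGrob() and gpar() come from grid
# the title argument is `top` in gridExtra >= 2.0; older versions called it `main`
grobframe <- arrangeGrob(p1,p2,p3,p4 ,ncol=2, nrow=2,
top = textGrob("Plots", gp = gpar(fontsize=12, fontface="bold.italic")))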
Drawing grobframe (for example with grid.draw(grobframe)) produces the following plot, which I believe is what you want.
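Both answers build one plot per subset by hand. A minimal sketch of the same idea with split() and lapply(), so that every row of the grid gets its own fill scale and legend without copy-pasting. It assumes a gridExtra version that accepts the grobs argument and that the hexbin package (needed by geom_hex) is installed; the seed is added only for reproducibility.

```r
library(ggplot2)
library(gridExtra)

set.seed(1)  # assumed seed
data <- data.frame(
  fac1 = c(rep(1, 6000), rep(2, 1000)),
  fac2 = c(rep("a", 3000), rep("b", 3000), rep("a", 500), rep("b", 500)),
  var1 = rnorm(7000),
  var2 = rnorm(7000)
)

# One ggplot per level of fac1; each plot trains its own fill scale on its own counts
plots <- lapply(split(data, data$fac1), function(d) {
  ggplot(d, aes(x = var1, y = var2)) +
    geom_hex(bins = 20) +
    facet_grid(fac1 ~ fac2)
})

# Stack the per-row plots vertically
grid.arrange(grobs = plots, ncol = 1)
```

Splitting on fac2 instead would give one colour scale per column.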

easiest way to discretize continuous scales for ggplot2 color scales?

Suppose I have this plot:
ggplot(iris) + geom_point(aes(x=Sepal.Width, y=Sepal.Length, colour=Sepal.Length)) + scale_colour_gradient()
what is the correct way to discretize the color scale, like the plot shown below the accepted answer here (gradient breaks in a ggplot stat_bin2d plot)?
ggplot correctly recognizes discrete values and uses discrete scales for these, but my question is if you have continuous data and you want a discrete colour bar for it (with each square corresponding to a value, and squares colored in a gradient still), what is the best way to do it? Should the discretizing/binning happen outside of ggplot and get put in the dataframe as a separate discrete-valued column, or is there a way to do it within ggplot? an example of what I'm looking for is similar to the scale shown here:
except I'm plotting a scatter plot and not something like geom_tile/heatmap.
thanks.
The solution is slightly complicated, because you want a discrete scale. Otherwise you could probably simply use round.
library(ggplot2)
bincol <- function(x,low,medium,high) {
breaks <- function(x) pretty(range(x), n = nclass.Sturges(x), min.n = 1)
colfunc <- colorRampPalette(c(low, medium, high))
binned <- cut(x,breaks(x))
res <- colfunc(length(unique(binned)))[as.integer(binned)]
names(res) <- as.character(binned)
res
}
labels <- unique(names(bincol(iris$Sepal.Length,"blue","yellow","red")))
breaks <- unique(bincol(iris$Sepal.Length,"blue","yellow","red"))
breaks <- breaks[order(labels,decreasing = TRUE)]
labels <- labels[order(labels,decreasing = TRUE)]
ggplot(iris) +
geom_point(aes(x=Sepal.Width, y=Sepal.Length,
colour=bincol(Sepal.Length,"blue","yellow","red")), size=4) +
scale_color_identity("Sepal.Length", labels=labels,
breaks=breaks, guide="legend")
You could try the following; I have modified your example code appropriately below:
#I am not so great at R, so I'll just make a data frame this way
#I am convinced there are better ways. Oh well.
df<-data.frame()
for(x in 1:10){
for(y in 1:10){
newrow<-c(x,y,sample(1:1000,1))
df<-rbind(df,newrow)
}
}
colnames(df)<-c('X','Y','Val')
#This is the bit you want
p<- ggplot(df, aes(x=X,y=Y,fill=cut(Val, c(0,100,200,300,400,500,Inf))))
p<- p + geom_tile() + scale_fill_brewer(type="seq",palette = "YlGn")
p<- p + guides(fill=guide_legend(title="Legend!"))
#Tight borders
p<- p + scale_x_continuous(expand=c(0,0)) + scale_y_continuous(expand=c(0,0))
p
Note the strategic use of cut to discretize the data followed by the use of color brewer to make things pretty.
The result looks as follows.
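For the scatter plot in the question, the same cut() idea can be sketched like this. The break points (one bin per unit of Sepal.Length), the new column name and the palette are arbitrary choices of mine for illustration.

```r
library(ggplot2)

iris_binned <- iris
# Bin the continuous variable; breaks at 4, 5, 6, 7, 8 cover the whole range of Sepal.Length
iris_binned$Sepal.Length.bin <- cut(iris_binned$Sepal.Length,
                                    breaks = seq(4, 8, by = 1),
                                    include.lowest = TRUE)

p_binned <- ggplot(iris_binned) +
  geom_point(aes(x = Sepal.Width, y = Sepal.Length, colour = Sepal.Length.bin),
             size = 3) +
  # A sequential brewer palette keeps the gradient look with one legend key per bin
  scale_colour_brewer("Sepal.Length", palette = "YlGnBu")
p_binned
```

Here the binning happens in the data frame before plotting, which keeps the legend labels under your control.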

Splitting distribution visualisations on the y-axis in ggplot2 in r

The most commonly cited example of how to visualize a logistic fit using ggplot2 seems to be something very much like this:
data("kyphosis", package="rpart")
ggplot(data=kyphosis, aes(x=Age, y = as.numeric(Kyphosis) - 1)) +
geom_point() +
stat_smooth(method="glm", method.args=list(family="binomial")) # current ggplot2 takes family via method.args
This visualisation works great if you don't have too much overlapping data, and the first suggestion for crowded data seems to be to use injected jitter in the x and y coordinates of the points then adjust the alpha value of the points. When you get to the point where individual points aren't useful but distributions of points are, is it possible to use geom_density(), geom_histogram(), or something else to visualise the data but continue to split the categorical variable along the y-axis as it is done with geom_point()?
From what I have found, geom_density() and geom_histogram() can easily be split/grouped by the categorical variable and both levels can easily be reversed using scale_y_reverse() but I can't figure out if it is even possible to move only one of the categorical variable distributions to the top of the plot. Any help/suggestions would be appreciated.
The annotate() function in ggplot allows you to add geoms to a plot with properties that "are not mapped from the variables of a data frame, but are instead passed in as vectors," meaning that you can add layers that are unrelated to your data frame. In this case your two density curves are related to the data frame (since the variables are in it), but because you're trying to position them differently, using annotate() is useful.
Here's one way to go about it:
data("kyphosis", package="rpart")
model.only <- ggplot(data=kyphosis, aes(x=Age, y = as.numeric(Kyphosis) - 1)) +
stat_smooth(method="glm", method.args=list(family="binomial")) # current ggplot2 takes family via method.args
absents <- subset(kyphosis, Kyphosis=="absent")
presents <- subset(kyphosis, Kyphosis=="present")
dens.absents <- density(absents$Age)
dens.presents <- density(presents$Age)
scaling.factor <- 10 # Make the density plots taller
model.only + annotate("line", x=dens.absents$x, y=dens.absents$y*scaling.factor) +
annotate("line", x=dens.presents$x, y=dens.presents$y*scaling.factor + 1)
This adds two annotated layers with scaled density plots for each of the kyphosis groups. For the presents variable, y is scaled and increased by 1 to shift it up.
You can also fill the density plots instead of just using a line. Instead of annotate("line"...) you need to use annotate("polygon"...), like so:
model.only + annotate("polygon", x=dens.absents$x, y=dens.absents$y*scaling.factor, fill="red", colour="black", alpha=0.4) +
annotate("polygon", x=dens.presents$x, y=dens.presents$y*scaling.factor + 1, fill="green", colour="black", alpha=0.4)
Technically you could use annotate("density"...), but that won't work when you shift the present plot up by one. Instead of shifting, it fills the whole plot:
model.only + annotate("density", x=dens.absents$x, y=dens.absents$y*scaling.factor, fill="red") +
annotate("density", x=dens.presents$x, y=dens.presents$y*scaling.factor + 1, fill="green")
The only way around that problem is to use a polygon instead of a density geom.
One final variant: flipping the top density plot along y-axis = 1:
model.only + annotate("polygon", x=dens.absents$x, y=dens.absents$y*scaling.factor, fill="red", colour="black", alpha=0.4) +
annotate("polygon", x=dens.presents$x, y=(1 - dens.presents$y*scaling.factor), fill="green", colour="black", alpha=0.4)
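The annotate() layers above can also be sketched with an ordinary data frame and geom_ribbon, which gives a fill legend for free. This is only a sketch: the data frame dens_df and its columns ymin/ymax are names I made up, and it assumes a ggplot2 version where the glm family is passed through method.args.

```r
library(ggplot2)
data("kyphosis", package = "rpart")

scaling.factor <- 10  # make the density curves taller, as in the answer

# One density per group, shifted so that "absent" sits on y = 0 and "present" on y = 1
dens_df <- do.call(rbind, lapply(split(kyphosis, kyphosis$Kyphosis), function(d) {
  dens <- density(d$Age)
  baseline <- as.numeric(d$Kyphosis[1]) - 1
  data.frame(Kyphosis = d$Kyphosis[1],
             Age  = dens$x,
             ymin = baseline,
             ymax = baseline + dens$y * scaling.factor)
}))

p_dens <- ggplot(kyphosis, aes(x = Age)) +
  stat_smooth(aes(y = as.numeric(Kyphosis) - 1), method = "glm",
              method.args = list(family = "binomial")) +
  geom_ribbon(data = dens_df, aes(ymin = ymin, ymax = ymax, fill = Kyphosis),
              alpha = 0.4)
p_dens
```

To flip the top density downwards, as in the last variant, you could set ymax to baseline - dens$y * scaling.factor for the "present" group.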
I am not sure I get your point, but here is an attempt:
dat <- rbind(kyphosis,kyphosis)
dat$grp <- factor(rep(c('smooth','dens'),each = nrow(kyphosis)),
levels = c('smooth','dens'))
ggplot(dat,aes(x=Age)) +
facet_grid(grp~.,scales = "free_y") +
#geom_point(data=subset(dat,grp=='smooth'),aes(y = as.numeric(Kyphosis) - 1)) +
stat_smooth(data=subset(dat,grp=='smooth'),aes(y = as.numeric(Kyphosis) - 1),
method="glm", method.args=list(family="binomial")) + # current ggplot2 takes family via method.args
geom_density(data=subset(dat,grp=='dens'))

ggplot2 multiple stat_binhex() plots with different color gradients in one image

I'd like to use ggplot2's stat_binhex() to simultaneously plot two independent variables on the same chart, each with its own color gradient using scale_colour_gradientn().
If we disregard the fact that the x-axis units do not match, a reproducible example would be to plot the following in the same image while maintaining separate fill gradients.
d <- ggplot(diamonds, aes(x=carat,y=price))+
stat_binhex(colour="white",na.rm=TRUE)+
scale_fill_gradientn(colours=c("white","blue"),name = "Frequency",na.value=NA)
try(ggsave(plot=d,filename=<some file>,height=6,width=8))
d <- ggplot(diamonds, aes(x=depth,y=price))+
stat_binhex(colour="white",na.rm=TRUE)+
scale_fill_gradientn(colours=c("yellow","black"),name = "Frequency",na.value=NA)
try(ggsave(plot=d,filename=<some other file>,height=6,width=8))
I found some conversation of a related issue in ggplot2 google groups here.
Here is another possible solution: I have taken @mnel's idea of mapping bin count to alpha transparency, and I have transformed the x-variables so they can be plotted on the same axes.
library(ggplot2)
# Transforms range of data to 0, 1.
rangeTransform = function(x) (x - min(x)) / (max(x) - min(x))
dat = diamonds
dat$norm_carat = rangeTransform(dat$carat)
dat$norm_depth = rangeTransform(dat$depth)
p1 = ggplot(data=dat) +
theme_bw() +
stat_binhex(aes(x=norm_carat, y=price, alpha=..count..), fill="#002BFF") +
stat_binhex(aes(x=norm_depth, y=price, alpha=..count..), fill="#FFD500") +
guides(fill=FALSE, alpha=FALSE) +
xlab("Range Transformed Units")
ggsave(plot=p1, filename="plot_1.png", height=5, width=5)
Thoughts:
I tried (and failed) to display a sensible color/alpha legend. Seems tricky, but should be possible given all the legend-customization features of ggplot2.
X-axis unit labeling needs some kind of solution. Plotting two sets of units on one axis is frowned upon by many, and ggplot2 has no such feature.
Interpretation of cells with overlapping colors seems clear enough in this example, but could get very messy depending on the datasets used, and the chosen colors.
If the two colors are additive complements, then wherever they overlap equally you will see a neutral gray. Where the overlap is unequal, the gray would shift to more yellow, or more blue. My colors are not quite complements, judging by the slightly pink hue of the gray overlap cells.
I think what you want goes against the principles of ggplot2 and the grammar of graphics approach more generally. Until the issue is addressed (for which I would not hold my breath), you have a couple of choices:
Use facet_wrap and alpha
This will not produce a nice legend, but takes you some way to what you want.
You can set the alpha value to scale by the computed frequency, accessed by ..count..
I don't think you can merge the legends nicely though.
library(reshape2)
# in long format
dm <- melt(diamonds, measure.var = c('depth','carat'))
ggplot(dm, aes(y = price, fill = variable, x = value)) +
facet_wrap(~variable, ncol = 1, scales = 'free_x') +
stat_binhex(aes(alpha = ..count..), colour = 'grey80') +
scale_alpha(name = 'Frequency', range = c(0,1)) +
theme_bw() +
scale_fill_manual('Variable', values = setNames(c('darkblue','yellow4'), c('depth','carat')))
Use gridExtra with grid.arrange or arrangeGrob
You can create separate plots and use gridExtra::grid.arrange to arrange on a single image.
d_carat <- ggplot(diamonds, aes(x=carat,y=price))+
stat_binhex(colour="white",na.rm=TRUE)+
scale_fill_gradientn(colours=c("white","blue"),name = "Frequency",na.value=NA)
d_depth <- ggplot(diamonds, aes(x=depth,y=price))+
stat_binhex(colour="white",na.rm=TRUE)+
scale_fill_gradientn(colours=c("yellow","black"),name = "Frequency",na.value=NA)
library(gridExtra)
grid.arrange(d_carat, d_depth, ncol =1)
If you want this to work with ggsave (thanks to @bdemarest's comment below and @baptiste)
replace grid.arrange with arrangeGrob something like.
ggsave(plot=arrangeGrob(d_carat, d_depth, ncol=1), filename="plot_2.pdf", height=12, width=8)
