So my problem is this: I have a data frame that I am plotting with geom_hex from ggplot2, and my command looks like this:
ggplot(data, aes(x=var1,y=var2))+geom_hex(bins=20)+facet_grid(fac1 ~ fac2,scales="free")
The problem is that the colour scheme for the counts is shared across all panels. Is there a quick way to generate a count colour scheme per row (or column) of panels? I tried playing with scales, but it seems that this only affects the x and y axis scales, not the fill colours or the fill legend. Thanks!
Here is an example of the data:
fac1<-c(rep(1, 6000), rep(2, 1000))
fac2<-c(rep("a", 3000), rep("b", 3000),rep("a", 500), rep("b", 500))
var1<-rnorm(7000)
var2<-rnorm(7000)
data<-data.frame(fac1,fac2,var1,var2)
ggplot(data, aes(x=var1,y=var2))+geom_hex(bins=20)+facet_grid(fac1 ~ fac2,scales="free")
Because there is so much more data for one level of fac1, the colour scheme is dominated by the first row of panels. I would like to keep the same colour scheme but have it adjusted to the counts in each row.
Here's my comment expanded into an answer. Your safest bet is the following approach:
library(gridExtra)
p1 <- ggplot(data[data$fac1==1, ], aes(x=var1,y=var2)) +
geom_hex(bins=20) + facet_grid(fac1 ~ fac2,scales="free") + xlab("")
p2 <- ggplot(data[data$fac1==2, ], aes(x=var1,y=var2)) +
geom_hex(bins=20) + facet_grid(fac1 ~ fac2,scales="free") +
scale_fill_gradient(low = "red", high = "white")
grid.arrange(p1, p2)
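The same idea can be sketched for any number of rows by splitting the data on the row factor and building one plot per piece. This is only an illustration, assuming gridExtra >= 2.0 (for the grobs argument) and that the hexbin package required by geom_hex is installed; row_plots is a name made up for this example.

```r
library(ggplot2)
library(gridExtra)

# Same fake data as in the question
set.seed(1)
data <- data.frame(
  fac1 = c(rep(1, 6000), rep(2, 1000)),
  fac2 = c(rep("a", 3000), rep("b", 3000), rep("a", 500), rep("b", 500)),
  var1 = rnorm(7000),
  var2 = rnorm(7000)
)

# One ggplot per level of fac1; each plot trains its own fill scale,
# so every row gets a legend adjusted to its own counts
row_plots <- lapply(split(data, data$fac1), function(d) {
  ggplot(d, aes(x = var1, y = var2)) +
    geom_hex(bins = 20) +
    facet_grid(fac1 ~ fac2, scales = "free")
})

# Stack the rows in a single column
grid.arrange(grobs = row_plots, ncol = 1)
```

Because each element of row_plots is an ordinary ggplot object, a different scale_fill_gradient() can still be added to any single row afterwards.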
Good question. In an answer from 2010 to "Different legends and fill colours for facetted ggplot?", Hadley Wickham indicates that you cannot have multiple legend scales per plot.
A simple way to get around this issue in your case would be to use the gridExtra package.
library(ggplot2)
library(gridExtra)
library(grid) # For textGrob, gpar and grid.draw
p1<-ggplot(data[data$fac1==1 & data$fac2=="a",], aes(x=var1,y=var2))+geom_hex(bins=20)
p2<-ggplot(data[data$fac1==2 & data$fac2=="a",], aes(x=var1,y=var2))+geom_hex(bins=20)
p3<-ggplot(data[data$fac1==1 & data$fac2=="b",], aes(x=var1,y=var2))+geom_hex(bins=20)
p4<-ggplot(data[data$fac1==2 & data$fac2=="b",], aes(x=var1,y=var2))+geom_hex(bins=20)
# In gridExtra >= 2.0 the title argument is `top` (it was `main` in older versions)
grobframe <- arrangeGrob(p1,p2,p3,p4 ,ncol=2, nrow=2,
top = textGrob("Plots", gp = gpar(fontsize=12, fontface="bold.italic")))
Drawing grobframe, e.g. with grid.draw(grobframe), produces a 2 x 2 grid of plots, each with its own fill legend, which I believe is what you want.
I have a rather long time series that I want to plot in ggplot, but it is long enough that even using the full width of the page it's barely readable.
What I want to do instead is to divide the plot into 2 (or more, in the general case) panels stacked one on top of the other.
I could do it manually, but that is cumbersome and it also makes it hard to give the axes the same scale. Ideally I would like to have something like this:
ggplot(data, aes(time, y)) +
geom_line() +
facet_time(time, n = 2)
And then get something like this:
(This plot was made using facet_wrap(~(year(as.Date(time)) > 2000), ncol = 1, scales = "free_x"), which messes up x axis scale, it works only for 2 panels, and doesn't work well with geom_smooth())
Ideally it would also handle summary statistics correctly, for example by using the correct data for geom_smooth(). Facetting wouldn't do that, because at the beginning of each facet the smoother would not use the data in the last chunk of the previous one.
Is there a way to do this?
Thank you!
Below I create two separate plots, one for the period 1982-1999 and one for 1999-2016 and then lay them out using grid.arrange from the gridExtra package. The horizontal axes are scaled equivalently in both plots.
I also generate regression lines outside of ggplot using the loess function so that it can be added using geom_line (you can of course use any regression function here, such as lm, gam, splines, etc). With this approach the regression can be run on the entire time series, ensuring continuity of the regression line across the two panels, even though we break the time series into two halves for plotting.
library(ggplot2) # For the plots themselves
library(dplyr) # For the chaining (%>%) operator
library(purrr) # For the map function
library(gridExtra) # For the grid.arrange function
Function to extract a legend from a ggplot. We'll use this to get one legend across two separate plots.
# http://stackoverflow.com/questions/12539348/ggplot-separate-legend-and-plot
g_legend<-function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
legend
}
# Fake data
set.seed(255)
dat = data.frame(time=rep(seq(1982,2016,length.out=500),2),
value= c(arima.sim(list(ar=c(0.4, 0.05, 0.5)), n=500),
arima.sim(list(ar=c(0.3, -0.3, 0.6)), n=500)),
group=rep(c("A","B"), each=500))
Generate smoother lines using loess: We want a separate regression line for each level of group, so we use group_by with the chaining operator from dplyr:
dat = dat %>% group_by(group) %>%
mutate(smooth = predict(loess(value ~ time, span=0.1)))
Create a list of two plots, one for each time period: We use map to create separate plots for each time period and return a list with the two plot objects as elements (you can also use base lapply for this instead of map):
pl = map(list(c(1982,1999), c(1999,2016)),
~ ggplot(dat %>% filter(time >= .x[1], time <= .x[2]),
aes(colour=group)) +
geom_line(aes(time, value), alpha=0.5) +
geom_line(aes(time, smooth), size=1) +
scale_x_continuous(breaks=1982:2016, expand=c(0.01,0)) +
scale_y_continuous(limits=range(dat$value)) +
theme_bw() +
labs(x="", y="", colour="") +
theme(strip.background=element_blank(),
strip.text=element_blank(),
axis.title=element_blank()))
# Extract legend as a separate graphics object
leg = g_legend(pl[[1]])
Finally, we lay out both plots (after removing legends) plus the extracted legend:
# guides(colour="none") drops the legend; older code used colour=FALSE, which is deprecated
grid.arrange(arrangeGrob(grobs=map(pl, function(p) p + guides(colour="none")), ncol=1),
leg, ncol=2, widths=c(10,1), left="Value", bottom="Year")
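To see why the smoother is fit once on the whole series rather than once per panel, here is a small base R sketch. The data, the 1999 break and the spans are made up for illustration. It compares the step across the break for a single full-series loess fit with the step you get from two independent per-chunk fits, which is roughly what faceting would do.

```r
set.seed(1)
df <- data.frame(time = seq(1982, 2016, length.out = 500))
df$value <- sin(df$time / 3) + rnorm(500, sd = 0.3)  # made-up series

# One fit on the full series: both panels would draw pieces of this curve
full_fit <- predict(loess(value ~ time, data = df, span = 0.1))

# Independent fits per chunk; span doubled so each fit uses a similar
# number of neighbouring points as the full-series fit
early <- df$time <= 1999
chunk_fit <- numeric(nrow(df))
chunk_fit[early]  <- predict(loess(value ~ time, data = df[early, ],  span = 0.2))
chunk_fit[!early] <- predict(loess(value ~ time, data = df[!early, ], span = 0.2))

# Size of the step between the last "early" point and the first "late" point
i <- max(which(early))
c(full    = abs(full_fit[i + 1] - full_fit[i]),
  chunked = abs(chunk_fit[i + 1] - chunk_fit[i]))
```

The full-series curve is a single smooth function evaluated at two neighbouring time points, while the chunked curves meet at the break only by coincidence.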
You can do this by storing the plot object, then printing it twice. Each time add an option coord_cartesian:
orig_plot <- ggplot(data, aes(time, y)) +
geom_line()
early <- orig_plot + coord_cartesian(xlim = c(1982, 2000))
late <- orig_plot + coord_cartesian(xlim = c(2000, 2016))
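Because coord_cartesian only zooms the view and does not drop observations outside the limits, both plots use all the data, so statistics such as geom_smooth() are still computed on the full series.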
To plot them on the same page, use grid (I got this from the ggplot2 book, which is probably around as a pdf somewhere):
library(grid)
vp1 <- viewport(width = 1, height = .5, just = c("center", "bottom"))
vp2 <- viewport(width = 1, height = .5, just = c("center", "top"))
print(early, vp = vp1)
print(late, vp = vp2)
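For the general case of n panels, the coord_cartesian idea can be wrapped in a small helper. This is a rough sketch, not an existing ggplot2 feature: zoom_panels and the fake data are hypothetical, and gridExtra >= 2.0 is assumed for laying out the result.

```r
library(ggplot2)
library(gridExtra)

# Hypothetical helper: return n copies of plot `p`, each zoomed into one of
# n consecutive, equally wide windows of the x variable `x`
zoom_panels <- function(p, x, n = 2) {
  edges <- seq(min(x), max(x), length.out = n + 1)
  lapply(seq_len(n), function(i) {
    p + coord_cartesian(xlim = c(edges[i], edges[i + 1]))
  })
}

# Fake data standing in for the long time series
set.seed(1)
df <- data.frame(time = seq(1982, 2016, length.out = 500),
                 y = cumsum(rnorm(500)))

p <- ggplot(df, aes(time, y)) +
  geom_line() +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.2)

panels <- zoom_panels(p, df$time, n = 3)
grid.arrange(grobs = panels, ncol = 1)
```

Every panel carries the full data set, so the smoother is identical across panels and only the visible window changes. The y axis is shared too, because coord_cartesian leaves the y scale untouched.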
I have data from 2 populations.
I'd like to get the histogram and density plot of both on the same graphic,
with one color for one population and another color for the other one.
I've tried this (example):
library(ggplot2)
AA <- rnorm(100000, 70,20)
BB <- rnorm(100000,120,20)
valores <- c(AA,BB)
grupo <- c(rep("AA", 100000),c(rep("BB", 100000)))
todo <- data.frame(valores, grupo)
ggplot(todo, aes(x=valores, fill=grupo, color=grupo)) +
geom_histogram(aes(y=..density..), binwidth=3)+ geom_density(aes(color=grupo))
But I'm just getting a graphic with a single line and a single color.
I would like to have different colors for the two density lines, and if possible for the histograms as well.
I've done it with ggplot2 but base R would also be OK.
or I don't know what I've changed and now I get this:
ggplot(todo, aes(x=valores, fill=grupo, color=grupo)) +
geom_histogram( position="identity", binwidth=3, alpha=0.5)+
geom_density(aes(color=grupo))
but the density lines were not plotted.
or even strange things like
I suggest this ggplot2 solution:
ggplot(todo, aes(valores, color=grupo)) +
geom_histogram(position="identity", binwidth=3, aes(y=..density.., fill=grupo), alpha=0.5) +
geom_density()
@skan: Your attempt was close, but you plotted the frequencies (counts) instead of density values in the histogram.
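The difference in scale between counts and densities is what hides the density line in the count version. A small base R sketch, using one simulated group like the question's AA and a bin width of 3, shows how the two are related:

```r
set.seed(1)
AA <- rnorm(100000, 70, 20)

# Histogram with a fixed bin width of 3, computed but not drawn
bin_width <- 3
breaks <- seq(floor(min(AA)), ceiling(max(AA)) + bin_width, by = bin_width)
h <- hist(AA, breaks = breaks, plot = FALSE)

# Counts are in the thousands, densities are a few hundredths
max(h$counts)
max(h$density)
max(density(AA)$y)

# Density is just count / (n * bin width)
all.equal(h$density, h$counts / (length(AA) * bin_width))
```

A kernel density curve drawn on the count axis therefore sits almost flat along the bottom of the plot, which is why both layers need to be on the density scale.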
A base R solution could be:
hist(AA, probability = T, col = rgb(1,0,0,0.5), border = rgb(1,0,0,1),
xlim=range(AA,BB), breaks= 50, ylim=c(0,0.025), main="AA and BB", xlab = "")
hist(BB, probability = T, col = rgb(0,0,1,0.5), border = rgb(0,0,1,1), add=T)
lines(density(AA))
lines(density(BB), lty=2)
For alpha I used rgb, but there are other ways to set it; see alpha() in the scales package, for instance. I also added the breaks parameter to the plot of AA to increase the number of bins (i.e. to use narrower bins) compared to the BB group.
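As one illustration of another way to get a semi-transparent colour without extra packages, adjustcolor() from grDevices (part of base R) can add alpha to a named colour. This is a small sketch, not part of the original answer:

```r
# Two ways to build a half-transparent red
red_rgb  <- rgb(1, 0, 0, 0.5)                   # explicit red/green/blue/alpha
red_name <- adjustcolor("red", alpha.f = 0.5)   # start from a colour name

identical(red_rgb, red_name)

# Either value can be passed as `col` to hist()
set.seed(1)
hist(rnorm(1000), probability = TRUE, col = red_name, border = "red",
     main = "Semi-transparent fill", xlab = "")
```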
If I have a dataframe like this:
obs<-rnorm(20)
d<-data.frame(year=2000:2019,obs=obs,pred=obs+rnorm(20,.1))
d$pup<-d$pred+.5
d$plow<-d$pred-.5
d$obs[20]<-NA
d
And I want the observation and model prediction error bars to look something like:
(p1<-ggplot(data=d)+aes(x=year)
+geom_point(aes(y=obs),color='red',shape=19)
+geom_point(aes(y=pred),color='blue',shape=3)
+geom_errorbar(aes(ymin=plow,ymax=pup))
)
How do I add a legend/scale/key identifying the red points as observations and the blue plusses with error bars as point predictions with ranges?
Here is one solution, melting pred/obs into one column.
library(ggplot2)
obs <- rnorm(20)
d <- data.frame(dat=c(obs,obs+rnorm(20,.1)))
d$pup <- d$dat+.5
d$plow <- d$dat-.5
d$year <- rep(2000:2019,2)
d$lab <- c(rep("Obs", 20), rep("Pred", 20))
p1<-ggplot(data=d, aes(x=year)) +
geom_point(aes(y = dat, colour = factor(lab), shape = factor(lab))) +
geom_errorbar(data = d[21:40,], aes(ymin=plow,ymax=pup), colour = "blue") +
scale_shape_manual(name = "Legend Title", values=c(6,1)) +
scale_colour_manual(name = "Legend Title", values=c("red", "blue"))
p1
Here is a ggplot solution that does not require melting and grouping.
set.seed(1) # for reproducible example
obs <- rnorm(20)
d <- data.frame(year=2000:2019,obs,pred=obs+rnorm(20,.1))
d$obs[20]<-NA
library(ggplot2)
ggplot(d,aes(x=year))+
geom_point(aes(y=obs,color="obs",shape="obs"))+
geom_point(aes(y=pred,color="pred",shape="pred"))+
geom_errorbar(aes(ymin=pred-0.5,ymax=pred+0.5))+
scale_color_manual("Legend",values=c(obs="red",pred="blue"))+
scale_shape_manual("Legend",values=c(obs=19,pred=3))
This creates a color scale and a shape scale with two components each ("obs" and "pred"). It then uses scale_*_manual(...) to set the values for those scales: ("red","blue") for color and (19,3) for shape.
Generally, if you have only two categories, like "obs" and "pred", then this is a reasonable way to use ggplot, and it avoids merging everything into one data frame. If you have more than two categories, or if they are integral to the dataset (e.g., actual categorical variables), then you are much better off doing this as in the other answer.
Note that your example left out the column year so your code does not run.
I'm encountering a problem when trying to make a density plot with ggplot.
The data look a bit like in the example here.
require(ggplot2)
require(plyr)
mms <- data.frame(deliciousness = rnorm(100),
type=sample(as.factor(c("peanut", "regular")), 100, replace=TRUE),
color=sample(as.factor(c("red", "green", "yellow", "brown")), 100, replace=TRUE))
mms.cor <- ddply(.data=mms, .(type, color), summarize, n=paste("n =", length(deliciousness)))
plot <- ggplot(data=mms, aes(x=deliciousness)) + geom_density() + facet_grid(type ~ color) + geom_text(data=mms.cor, aes(x=1.8, y=5, label=n), colour="black", inherit.aes=FALSE, parse=FALSE)
Labelling each facet this way works quite well unless the scales vary between facets. Does anyone have an idea how I could put the labels at the same location in every panel when the scales per facet differ?
Best,
daniel
Something like this?
plot <- ggplot(data=mms, aes(x=deliciousness)) +
geom_density(aes(y=..scaled..)) + facet_grid(type ~ color) +
geom_text(data=mms.cor, aes(x=1.2, y=1.2, label=n), colour="black")
plot
There is a way to get the limits set internally by ggplot with scales="free", but it involves hacking the grob (graphics object). Since you seem to want the density plots to have equal height (?), you can do that with aes(y=..scaled..). Then setting the location for the labels is straightforward.
EDIT (Response to OP's comment)
This is what I meant by hacking the grob. Note that this takes advantage of the internal structure used by ggplot. The problem is that this could change at any time with a new version (and in fact it is already different from older versions). So there is no guarantee this code will work in the future.
plot <- ggplot(data=mms, aes(x=deliciousness)) +
geom_density() +
facet_grid(type ~ color, scales="free")
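# The original answer read ggplot_build(plot)[["panel"]]$ranges, which only exists in old ggplot2 versions.
# In ggplot2 >= 3.0 the per-panel ranges are kept in $layout$panel_params instead.
panels <- ggplot_build(plot)$layout$panel_params
limits <- do.call(rbind,lapply(panels,
function(range)c(range$x.range,range$y.range)))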
colnames(limits) <- c("x.lo","x.hi","y.lo","y.hi")
mms.cor <- cbind(mms.cor,limits)
plot +
geom_text(data=mms.cor, aes(x=x.hi, y=y.hi, label=n), hjust=1,colour="black")
The basic idea is to generate plot without the text, then build the graphics object using ggplot_build(plot). From this we can extract the x- and y-limits, and bind those to the labels in your mms.cor data frame. Now render the plot with the text, using these limits.
Note that the plots are different from my earlier answer because you did not use set.seed(...) in your code to generate the dataset (and I forgot to add it...).
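If a fixed corner of each panel is an acceptable label position, a sketch that avoids ggplot internals altogether is to place the text at x = Inf, y = Inf, which ggplot2 draws at the panel edge whatever the panel's scale. The data below mirror the question's example; counts is a name made up for this sketch, and base aggregate() stands in for plyr.

```r
library(ggplot2)

set.seed(1)
mms <- data.frame(
  deliciousness = rnorm(100),
  type  = sample(c("peanut", "regular"), 100, replace = TRUE),
  color = sample(c("red", "green", "yellow", "brown"), 100, replace = TRUE)
)

# One row per facet with the number of observations in it
counts <- aggregate(deliciousness ~ type + color, data = mms, FUN = length)
counts$n <- paste("n =", counts$deliciousness)

# Inf/Inf puts the label in the top-right corner of every panel;
# hjust/vjust values above 1 nudge it inwards from the edge
p <- ggplot(mms, aes(x = deliciousness)) +
  geom_density() +
  facet_grid(type ~ color, scales = "free") +
  geom_text(data = counts, aes(x = Inf, y = Inf, label = n),
            hjust = 1.1, vjust = 1.5, inherit.aes = FALSE)
p
```

Using -Inf for x or y moves the label to the left or bottom edge instead, with hjust and vjust adjusted accordingly.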
I'd like to use ggplot2's stat_binhex() to simultaneously plot two independent variables on the same chart, each with its own color gradient using scale_colour_gradientn().
If we disregard the fact that the x-axis units do not match, a reproducible example would be to plot the following in the same image while maintaining separate fill gradients.
d <- ggplot(diamonds, aes(x=carat,y=price))+
stat_binhex(colour="white",na.rm=TRUE)+
scale_fill_gradientn(colours=c("white","blue"),name = "Frequency",na.value=NA)
try(ggsave(plot=d,filename=<some file>,height=6,width=8))
d <- ggplot(diamonds, aes(x=depth,y=price))+
stat_binhex(colour="white",na.rm=TRUE)+
scale_fill_gradientn(colours=c("yellow","black"),name = "Frequency",na.value=NA)
try(ggsave(plot=d,filename=<some other file>,height=6,width=8))
I found some conversation of a related issue in ggplot2 google groups here.
Here is another possible solution: I have taken @mnel's idea of mapping bin count to alpha transparency, and I have transformed the x-variables so they can be plotted on the same axes.
library(ggplot2)
# Transforms range of data to 0, 1.
rangeTransform = function(x) (x - min(x)) / (max(x) - min(x))
dat = diamonds
dat$norm_carat = rangeTransform(dat$carat)
dat$norm_depth = rangeTransform(dat$depth)
p1 = ggplot(data=dat) +
theme_bw() +
stat_binhex(aes(x=norm_carat, y=price, alpha=..count..), fill="#002BFF") +
stat_binhex(aes(x=norm_depth, y=price, alpha=..count..), fill="#FFD500") +
guides(fill="none", alpha="none") + # "none" replaces the deprecated FALSE
xlab("Range Transformed Units")
ggsave(plot=p1, filename="plot_1.png", height=5, width=5)
Thoughts:
I tried (and failed) to display a sensible color/alpha legend. Seems tricky, but should be possible given all the legend-customization features of ggplot2.
X-axis unit labeling needs some kind of solution. Plotting two sets of units on one axis is frowned upon by many, and ggplot2 has no such feature.
Interpretation of cells with overlapping colors seems clear enough in this example, but could get very messy depending on the datasets used, and the chosen colors.
If the two colors are additive complements, then wherever they overlap equally you will see a neutral gray. Where the overlap is unequal, the gray would shift to more yellow, or more blue. My colors are not quite complements, judging by the slightly pink hue of the gray overlap cells.
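On the x-axis labelling point, one possible starting place is to invert the range transform, so that tick positions on the shared 0 to 1 axis can be expressed in each variable's original units. This is a base R sketch; range_inverse and the example values are made up for illustration and are not the diamonds data.

```r
# Transform a variable to [0, 1], as in the answer above
range_transform <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical inverse: map positions u in [0, 1] back to the units of x
range_inverse <- function(u, x) u * (max(x) - min(x)) + min(x)

carat <- c(0.2, 0.7, 1.5, 5.0)   # illustrative values only
depth <- c(43, 60, 62, 79)       # illustrative values only

# Tick positions on the shared axis and their values in original units
ticks <- seq(0, 1, by = 0.25)
data.frame(normalised = ticks,
           carat = range_inverse(ticks, carat),
           depth = range_inverse(ticks, depth))
```

The two unit columns could then be pasted together into tick labels, although that still leaves the readability concerns about double axes mentioned above.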
I think what you want goes against the principles of ggplot2 and the grammar of graphics approach more generally. Until the issue is addressed (for which I would not hold my breath), you have a couple of choices:
Use facet_wrap and alpha
This will not produce a nice legend, but it takes you some way towards what you want.
You can set the alpha value to scale with the computed frequency, accessed via ..count..
I don't think you can merge the legends nicely though.
library(ggplot2) # For the plot and the diamonds data
library(reshape2)
# in long format
dm <- melt(diamonds, measure.vars = c('depth','carat'))
ggplot(dm, aes(y = price, fill = variable, x = value)) +
facet_wrap(~variable, ncol = 1, scales = 'free_x') +
stat_binhex(aes(alpha = ..count..), colour = 'grey80') +
scale_alpha(name = 'Frequency', range = c(0,1)) +
theme_bw() +
scale_fill_manual('Variable', values = setNames(c('darkblue','yellow4'), c('depth','carat')))
Use gridExtra with grid.arrange or arrangeGrob
You can create separate plots and use gridExtra::grid.arrange to arrange on a single image.
d_carat <- ggplot(diamonds, aes(x=carat,y=price))+
stat_binhex(colour="white",na.rm=TRUE)+
scale_fill_gradientn(colours=c("white","blue"),name = "Frequency",na.value=NA)
d_depth <- ggplot(diamonds, aes(x=depth,y=price))+
stat_binhex(colour="white",na.rm=TRUE)+
scale_fill_gradientn(colours=c("yellow","black"),name = "Frequency",na.value=NA)
library(gridExtra)
grid.arrange(d_carat, d_depth, ncol =1)
If you want this to work with ggsave (thanks to @bdemarest's comment below and @baptiste),
replace grid.arrange with arrangeGrob, something like this:
ggsave(plot=arrangeGrob(d_carat, d_depth, ncol=1), filename="plot_2.pdf", height=12, width=8)