I am trying to make a manual colour scale for my bar graph using plyr to summarize the data and ggplot2 to present the graph.
The data has two variables:
Region (displayed on the X-axis)
Genotype (displayed by the fill)
I have managed to do this already; however, I have not been able to find a way to personalize the colours - it simply gives me two default colours.
Could someone please help me figure out what I am missing here?
I have included my code and an image of the graph below. The graph basically has the appearance I want it to, except that I can't personalize the colours.
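# summarySE() is the Cookbook for R helper (also available in the Rmisc package), which uses plyr internally
ggplotdata <- summarySE(data, measurevar="Density", groupvars=c("Genotype", "Region"))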
ggplotdata
#Plot the data
ggplotdata$Genotype <- factor(ggplotdata$Genotype, levels = c("WT","KO"))
Mygraph <-ggplot(ggplotdata, aes(x=Region, y=Density, fill=Genotype)) +
geom_bar(position=position_dodge(), stat="identity",
colour="black",
size=.2) +
geom_errorbar(aes(ymin=Density-se, ymax=Density+se),
width=.2,
position=position_dodge(.9)) +
xlab(NULL) +
ylab("Density (cells/mm2)") +
scale_colour_manual(name=NULL,
breaks=c("KO", "WT"),
labels=c("KO", "WT"),
values=c("#FFFFFF", "#3366FF")) +
ggtitle("X") +
scale_y_continuous(breaks=0:17*500) +
theme_minimal()
Mygraph
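The answer here was to use scale_fill_manual instead: the bars get their colours from the fill aesthetic (fill=Genotype), and colour is not mapped to anything, so scale_colour_manual has nothing to act on. Thank you @dc37.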
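A minimal sketch of the corrected plot follows. The data frame is a made-up stand-in for the summarySE() output (the question does not include the real numbers), and the assignment of white to KO and blue to WT is an assumption based on the order of breaks and values in the question. Passing a named values vector ties each colour to a factor level explicitly, so the result does not depend on the order of the levels or breaks.

```r
library(ggplot2)

# Hypothetical summary table standing in for the summarySE() result
ggplotdata <- data.frame(
  Genotype = factor(rep(c("WT", "KO"), times = 2), levels = c("WT", "KO")),
  Region   = rep(c("RegionA", "RegionB"), each = 2),
  Density  = c(3000, 2500, 4200, 3900),
  se       = c(200, 150, 300, 250)
)

Mygraph <- ggplot(ggplotdata, aes(x = Region, y = Density, fill = Genotype)) +
  geom_bar(position = position_dodge(), stat = "identity", colour = "black") +
  geom_errorbar(aes(ymin = Density - se, ymax = Density + se),
                width = .2, position = position_dodge(.9)) +
  # fill is the mapped aesthetic, so the fill scale controls the bar colours;
  # named values pin each colour to a Genotype level
  scale_fill_manual(name = NULL,
                    values = c(WT = "#3366FF", KO = "#FFFFFF")) +
  theme_minimal()
```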
Problem
I have 4 graphs that I want to display using grid.arrange(). When I display them individually, they look like this:
But when I use grid.arrange(), they become distorted
with them individually looking like
Specific Issues:
The x-axis labels do not scale and overlap, making them unreadable.
The subtitles get cut off.
Goal
I want to reproduce each plot exactly like the first ideal case in a grid with grid.arrange(). One possible way might be to convert each plot to an image and then use grid.arrange(), but I don't know how to do this.
Reproducible Example
Below is an example reproducible code that shows the problem I am having.
library(ggplot2)
library(gridExtra)
p1 <- ggplot(subset(mtcars, cyl == 4), aes(wt, mpg, colour = cyl)) + geom_point() + labs(title = "TITLE-TITLE-TITLE-TITLE-TITLE-TITLE", subtitle = "-subtitle-subtitle-subtitle-subtitle-subtitle-subtitle-subtitle-") + theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5))
p2 <- ggplot(subset(mtcars, cyl == 4), aes(wt, mpg, colour = cyl)) + geom_point() + labs(title = "TITLE-TITLE-TITLE-TITLE-TITLE-TITLE", subtitle = "-subtitle-subtitle-subtitle-subtitle-subtitle-subtitle-subtitle-") + theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5))
grid.arrange(p1, p2, ncol = 2)
When you display those graphs individually they simply have more space. So, those are natural distortions and there are perhaps only three ways to solve that.
When exporting the combined graph, make it big enough. If the individual one looks good in 6x5 inches, then surely the combined one will look good in 12x10 inches.
Give correspondingly less space for the problematic parts: x-axis labels and the subtitle. For instance, use something like element_text(size = 6) for plot.subtitle and axis.text.x, add \n to the subtitles and even x-axis labels, and try something like element_text(angle = 30) for the latter as well.
Get rid of something unnecessary. As @Richard Telford suggests in the comments, using facet_wrap should work better. That would be due to, e.g., not repeating the y-axis labels and, hence, giving more horizontal space.
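Options 1 and 2 can be combined in a short sketch. The font size and angle are illustrative values, and the 12 x 5 inch output assumes each plot looked right at about 6 x 5 inches on its own (two plots side by side, so double the width). arrangeGrob() builds the same layout as grid.arrange() without drawing it, so it can be passed to ggsave().

```r
library(ggplot2)
library(gridExtra)

# Option 2: shrink the subtitle and angle the x-axis tick labels
compact <- theme(plot.title    = element_text(hjust = 0.5),
                 plot.subtitle = element_text(hjust = 0.5, size = 6),
                 axis.text.x   = element_text(angle = 30, hjust = 1))

p1 <- ggplot(subset(mtcars, cyl == 4), aes(wt, mpg)) +
  geom_point() +
  labs(title = "TITLE-TITLE-TITLE", subtitle = "-subtitle-subtitle-subtitle-") +
  compact
p2 <- p1  # second copy, just to have two plots side by side

# Option 1: build the combined layout as a grob, then save it large enough
combined <- arrangeGrob(p1, p2, ncol = 2)
out_file <- file.path(tempdir(), "combined.pdf")
ggsave(out_file, combined, width = 12, height = 5)
```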
I am creating a number of plots using ggplot2 in R and want a way to standardize implementation of a cutoff line. I have data on a number of different measures for four cities over a ~10 year time period. I've plotted them as line graphs with each city a different color within a given graph. I will be creating a plot for each of the different measures I have (around 20).
On each of these graphs, I need to put two cutoff lines (with a word next to them) representing implementation of some policy so that people reading the graphs can easily identify the difference between performance before and after the implementation. Below is approximately the code I'm currently using.
gg_plot1<- ggplot(data=ggdata, aes(x=Year, y=measure1, group=Area, color=Area)) +
geom_vline(xintercept=2011, color="#EE0000") +
geom_text(aes(x=2011, label="City1\n", y=0.855), color="#EE0000", angle=90, hjust=0, family="serif") +
geom_vline(xintercept=2007, color="#000099") +
geom_text(aes(x=2007, label="City2", y=0.855), color="#000099", angle=0, hjust=1, family="serif") +
geom_line(size=.75) +
geom_point(size=1.5) +
scale_y_continuous(breaks=round(seq(min(ggdata$measure1, na.rm=T), max(ggdata$measure1, na.rm=T), by=0.01), 2)) +
scale_x_continuous(breaks=min(ggdata$Year):max(ggdata$Year)) +
scale_color_manual(values=c("#EE0000", "#00DDFF", "#009900", "#000099")) +
theme(axis.text.x = element_text(angle=90, vjust=1),
panel.background = element_rect(fill="white", color="white"),
panel.grid.major = element_line(color="grey95"),
text = element_text(size=11, family="serif"))
The problem with this implementation is that it relies on placing the two geom_text() on a particular place on the specific graph. These different measures all have different ranges so in order to do this I'd need to go plot by plot and find a spot to place them. What I'd prefer to do is something like force the range of each plot down by X% and put the geom_text() aligned to the bottom of the range. The lines shouldn't need adjusting (same year in every plot), just the position of the text. I found some similar questions here but none that had to do with the specific problem of placing something in the same position on different graphs with different ranges.
Is there a way to do what I'm looking for? If I had to guess, it'd be something like using relative positioning rather than absolute, but I haven't been able to find a way to do that within ggplot. For the record, I'm aware the two geom_text()s are oriented differently. I did that to compare which we preferred but left it for you all. We will ultimately be going with the one that has the text rotated 90 degrees. Additionally, some of these will be faceted together so that might provide an extra layer of difficulty. Haven't gotten to that point yet.
Sidebar: an alternative way to visualize this would be to change the line from solid to dotted at the cutoff year. Is this possible? I'm not sure the client would want that but I'd love to present it as an option if anyone can point me in the direction of where to learn about how to do that.
Edit to add:
Sample data which shows what happens when running it with different y-ranges
ggdata <- data.frame(Area=rep(c("City1", "City2", "City3", "City4"), times=7),
Year=c(rep(2006,4), rep(2007,4), rep(2008,4), rep(2009,4), rep(2010,4), rep(2011,4), rep(2012,4)),
measure1=rnorm(28,10,2),
measure2=rnorm(28,50,10))
Sample plot which has the geom_text()s in the proper position, but this was done using the code above with a fixed position within the plot. When I replicate the code using a different measure that has a different y-range, it ends up stretching the plot window.
You can use the y-range of the data to position the text labels. I've set the y-limits explicitly in the example below, but that's not absolutely necessary unless you want to change them from the defaults. You can also adjust the x-position of the text labels using the x-range of the data. The code below will position the labels at the bottom of the plot, regardless of the y-range of the data.
I've also switched from geom_text to annotate. geom_text overplots the text labels multiple times, once for each row in the data. annotate plots the label once.
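The overplotting is easy to see by counting the rows each text layer has after the plot is built. This sketch uses random sample data shaped like the question's and layer_data() (available in ggplot2 2.0 and later); the geom_text() layer gets one row per data row, while annotate() gets a single row.

```r
library(ggplot2)

set.seed(1)
ggdata <- data.frame(Area = rep(paste0("City", 1:4), times = 7),
                     Year = rep(2006:2012, each = 4),
                     measure1 = rnorm(28, 10, 2))

p <- ggplot(ggdata, aes(x = Year, y = measure1, colour = Area)) +
  geom_line() +
  # layer 2: constant aesthetics in aes() are recycled over every data row
  geom_text(aes(x = 2011, y = 8, label = "City1"), colour = "black") +
  # layer 3: annotate() builds its own one-row layer
  annotate(geom = "text", x = 2007, y = 8, label = "City2")

nrow(layer_data(p, 2))  # one label per row of ggdata, drawn on top of each other
nrow(layer_data(p, 3))  # a single label
```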
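library(ggplot2)
# Put the labels 0.5% of the y-range above the lowest data value
ypos = min(ggdata$measure1) + 0.005*diff(range(ggdata$measure1))
# Horizontal offsets from the cutoff lines, as fractions of the x-range:
# xv for the rotated (vertical) label, xh for the horizontal one
xv = 0.02
xh = 0.01
xadj = diff(range(ggdata$Year))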
ggplot(data=ggdata, aes(x=Year, y=measure1, group=Area, color=Area)) +
geom_vline(xintercept=2011, color="#EE0000") +
geom_vline(xintercept=2007, color="#000099") +
geom_line(size=.75) +
geom_point(size=1.5) +
annotate(geom="text", x=2011 - xv*xadj, label="City1", y=ypos, color="#EE0000", angle=90, hjust=0, family="serif") +
annotate(geom="text", x=2007 - xh*xadj, label="City2", y=ypos, color="#000099", angle=0, hjust=1, family="serif") +
scale_y_continuous(limits=range(ggdata$measure1),
breaks=round(seq(min(ggdata$measure1, na.rm=T), max(ggdata$measure1, na.rm=T), by=1), 0)) +
scale_x_continuous(breaks=min(ggdata$Year):max(ggdata$Year)) +
scale_color_manual(values=c("#EE0000", "#00DDFF", "#009900", "#000099")) +
theme(axis.text.x = element_text(angle=90, vjust=1),
panel.background = element_rect(fill="white", color="white"),
panel.grid.major = element_line(color="grey95"),
text = element_text(size=11, family="serif"))
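An alternative that needs no range arithmetic at all (not part of the original answer): ggplot2 interprets y = -Inf as the bottom edge of the panel, whatever that panel's range is. That also holds per panel once the measures are faceted with free y-scales. A sketch under those assumptions, with the two sample measures stacked into long format so they can be faceted:

```r
library(ggplot2)

set.seed(1)
ggdata <- data.frame(Area = rep(paste0("City", 1:4), times = 7),
                     Year = rep(2006:2012, each = 4),
                     measure1 = rnorm(28, 10, 2),
                     measure2 = rnorm(28, 50, 10))

# Stack the measures into long format so each becomes one facet
long <- rbind(
  data.frame(ggdata[c("Area", "Year")], measure = "measure1", value = ggdata$measure1),
  data.frame(ggdata[c("Area", "Year")], measure = "measure2", value = ggdata$measure2)
)

p_inf <- ggplot(long, aes(x = Year, y = value, colour = Area)) +
  geom_line() +
  geom_vline(xintercept = 2011, colour = "#EE0000") +
  geom_vline(xintercept = 2007, colour = "#000099") +
  # y = -Inf is the bottom edge of each panel; hjust slightly below 0 starts
  # the rotated text just above that edge, vjust = -0.5 moves it off the line
  annotate("text", x = 2011, y = -Inf, label = "City1", angle = 90,
           hjust = -0.1, vjust = -0.5, colour = "#EE0000") +
  facet_wrap(~ measure, scales = "free_y")
```

Because annotate() has no facet variable, its label is repeated in every panel, each time anchored to that panel's own bottom edge.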
UPDATE: To respond to your comment, here's how you can create a separate plot for each "measure" column in your data frame.
First, we create reproducible data with three measure columns:
library(ggplot2)
library(gridExtra)
library(scales)
set.seed(4)
ggdata <- data.frame(Year=rep(2006:2012,each=4),
Area=rep(paste0("City",1:4), 7),
measure1=rnorm(28,10,2),
measure2=rnorm(28,50,10),
measure3=rnorm(28,-50,5))
Now, we take the code from above and package it in a function. The function takes an argument called measure_var. This is the data column, provided as a character string, that will provide the y-values for the plot. Note that we now use aes_string instead of aes inside ggplot.
plot_func = function(measure_var) {
ypos = min(ggdata[ , measure_var]) + 0.005*diff(range(ggdata[ , measure_var]))
xv = 0.02
xh = 0.01
xadj = diff(range(ggdata$Year))
ggplot(data=ggdata, aes_string(x="Year", y=measure_var, group="Area", color="Area")) +
geom_vline(xintercept=2011, color="#EE0000") +
geom_vline(xintercept=2007, color="#000099") +
geom_line(size=.75) +
geom_point(size=1.5) +
annotate(geom="text", x=2011 - xv*xadj, label="City1", y=ypos,
color="#EE0000", angle=90, hjust=0, family="serif") +
annotate(geom="text", x=2007 - xh*xadj, label="City2", y=ypos,
color="#000099", angle=0, hjust=1, family="serif") +
scale_y_continuous(limits=range(ggdata[ , measure_var]),
breaks=pretty_breaks(5)) +
scale_x_continuous(breaks=min(ggdata$Year):max(ggdata$Year)) +
scale_color_manual(values=c("#EE0000", "#00DDFF", "#009900", "#000099")) +
theme(axis.text.x = element_text(angle=90, vjust=1),
panel.background = element_rect(fill="white", color="white"),
panel.grid.major = element_line(color="grey95"),
text = element_text(size=11, family="serif")) +
ggtitle(paste("Plot of", measure_var))
}
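aes_string() has been deprecated in later ggplot2 releases. A sketch of the same idea with the .data pronoun, which looks a column up by name inside aes(); plot_func2 is a hypothetical, trimmed-down name used only for illustration (annotations and theme omitted).

```r
library(ggplot2)

set.seed(4)
ggdata <- data.frame(Year = rep(2006:2012, each = 4),
                     Area = rep(paste0("City", 1:4), 7),
                     measure1 = rnorm(28, 10, 2),
                     measure2 = rnorm(28, 50, 10),
                     measure3 = rnorm(28, -50, 5))

# .data[[measure_var]] selects the column named by the string measure_var
plot_func2 <- function(measure_var) {
  ggplot(ggdata, aes(x = Year, y = .data[[measure_var]], colour = Area)) +
    geom_line() +
    geom_point() +
    labs(y = measure_var, title = paste("Plot of", measure_var))
}

measure_cols <- grep("measure", names(ggdata), value = TRUE)
plot_list2 <- lapply(measure_cols, plot_func2)
```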
We can now run the function once like this: plot_func("measure1"). However, let's run it on all the measure columns in one go by using lapply. We give lapply a vector with the names of the measure columns (names(ggdata)[grepl("measure", names(ggdata))]), and it runs plot_func on each of these columns in turn, storing the resulting plots in the list plot_list.
plot_list = lapply(names(ggdata)[grepl("measure", names(ggdata))], plot_func)
Now if we wish, we can lay them all out together using grid.arrange. In this case, we only need one legend, rather than a separate legend for each plot, so we extract the legend as a separate graphical object and lay it out beside the three plots.
# Function to get legend from a ggplot as a separate graphical object
# Source: https://github.com/tidyverse/ggplot2/wiki/Share-a-legend-between-two-ggplot2-graphs/047381b48b0f0ef51a174286a595817f01a0dfad
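g_legend<-function(a.gplot){
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  # ggplot2 before 3.5.0 names the legend "guide-box"; 3.5.0 and later use
  # position-specific names such as "guide-box-right" (default position assumed)
  leg <- which(tmp$layout$name %in% c("guide-box", "guide-box-right"))
  legend <- tmp$grobs[[leg]]
  return(legend)
}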
# Get legend
leg = g_legend(plot_list[[1]])
# Lay out all of the plots together with a single legend
grid.arrange(arrangeGrob(grobs=lapply(plot_list, function(x) x + guides(colour="none"))),
leg,
ncol=2, widths=c(10,1))
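The sidebar question (switching from a solid to a dotted line at the cutoff year) is not covered above. One possible sketch: label each row as before or after the cutoff, put the cutoff year in both pieces so they join without a gap, and map that label to linetype. The cutoff of 2011 and the names split_data and p_dotted are assumptions for illustration.

```r
library(ggplot2)

set.seed(1)
ggdata <- data.frame(Area = rep(paste0("City", 1:4), times = 7),
                     Year = rep(2006:2012, each = 4),
                     measure1 = rnorm(28, 10, 2))

cutoff <- 2011

# The cutoff year goes into both pieces so solid and dotted segments meet
before <- subset(ggdata, Year <= cutoff)
before$period <- "before"
after <- subset(ggdata, Year >= cutoff)
after$period <- "after"
split_data <- rbind(before, after)

p_dotted <- ggplot(split_data, aes(x = Year, y = measure1, colour = Area,
                                   group = interaction(Area, period),
                                   linetype = period)) +
  geom_line() +
  scale_linetype_manual(values = c(before = "solid", after = "dotted"))
```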
I'm encountering a problem when trying to make a density plot with ggplot.
The data look a bit like in the example here.
library(ggplot2)
library(plyr)
mms <- data.frame(deliciousness = rnorm(100),
type=sample(as.factor(c("peanut", "regular")), 100, replace=TRUE),
color=sample(as.factor(c("red", "green", "yellow", "brown")), 100, replace=TRUE))
mms.cor <- ddply(.data=mms, .(type, color), summarize, n=paste("n =", length(deliciousness)))
plot <- ggplot(data=mms, aes(x=deliciousness)) + geom_density() + facet_grid(type ~ color) + geom_text(data=mms.cor, aes(x=1.8, y=5, label=n), colour="black", inherit.aes=FALSE, parse=FALSE)
Labelling each facet works quite well unless the scales for each facet vary. Does anyone have an idea how I could put the labels at the same location when the scales per facet differ?
Best,
daniel
Something like this?
plot <- ggplot(data=mms, aes(x=deliciousness)) +
geom_density(aes(y=..scaled..)) + facet_grid(type ~ color) +
geom_text(data=mms.cor, aes(x=1.2, y=1.2, label=n), colour="black")
plot
There is a way to get the limits set internally by ggplot with scales="free", but it involves hacking the grob (graphics object). Since you seem to want the density plots to have equal height (???), you can do that with aes(y=..scaled..). Then setting the location for the labels is straightforward.
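Another option that avoids both rescaling and grob hacking (not from the original answer): ggplot2 treats x = Inf and y = Inf as the right and top edges of each panel, so the labels land in the same corner of every facet even with scales = "free". A sketch, with set.seed(1) and base R counts (the name counts is illustrative) standing in for the plyr summary:

```r
library(ggplot2)

set.seed(1)
mms <- data.frame(deliciousness = rnorm(100),
                  type  = sample(factor(c("peanut", "regular")), 100, replace = TRUE),
                  color = sample(factor(c("red", "green", "yellow", "brown")), 100, replace = TRUE))

# Per-facet counts with base R: columns type, color, Freq
counts <- as.data.frame(table(type = mms$type, color = mms$color))
counts$n <- paste("n =", counts$Freq)

plot_inf <- ggplot(mms, aes(x = deliciousness)) +
  geom_density() +
  facet_grid(type ~ color, scales = "free") +
  # Inf/Inf is the top-right corner of each panel's own range;
  # hjust and vjust above 1 pull the text back inside the panel
  geom_text(data = counts, aes(x = Inf, y = Inf, label = n),
            hjust = 1.1, vjust = 1.5, inherit.aes = FALSE)
```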
EDIT (Response to OP's comment)
This is what I meant by hacking the grob. Note that this takes advantage of the internal structure used by ggplot. The problem is that this could change at any time with a new version (and in fact it is already different from older versions). So there is no guarantee this code will work in the future.
plot <- ggplot(data=mms, aes(x=deliciousness)) +
geom_density() +
facet_grid(type ~ color, scales="free")
panels <- ggplot_build(plot)[["panel"]]
limits <- do.call(rbind,lapply(panels$ranges,
function(range)c(range$x.range,range$y.range)))
colnames(limits) <- c("x.lo","x.hi","y.lo","y.hi")
mms.cor <- cbind(mms.cor,limits)
plot +
geom_text(data=mms.cor, aes(x=x.hi, y=y.hi, label=n), hjust=1,colour="black")
The basic idea is to generate plot without the text, then build the graphics object using ggplot_build(plot). From this we can extract the x- and y-limits, and bind those to the labels in your mms.cor data frame. Now render the plot with the text, using these limits.
Note that the plots are different from my earlier answer because you did not use set.seed(...) in your code to generate the dataset (and I forgot to add it...).
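As predicted, the internals have since changed: in ggplot2 3.x, ggplot_build() returns a layout element instead of panel, and the per-panel ranges live in layout$panel_params. A sketch of the same idea against that structure, still relying on unexported internals and so equally fragile. Here the limits are joined to the labels by facet variables rather than by row order; the names counts and label_df are illustrative.

```r
library(ggplot2)

set.seed(1)
mms <- data.frame(deliciousness = rnorm(100),
                  type  = sample(factor(c("peanut", "regular")), 100, replace = TRUE),
                  color = sample(factor(c("red", "green", "yellow", "brown")), 100, replace = TRUE))
counts <- as.data.frame(table(type = mms$type, color = mms$color))
counts$n <- paste("n =", counts$Freq)

plot_free <- ggplot(mms, aes(x = deliciousness)) +
  geom_density() +
  facet_grid(type ~ color, scales = "free")

built <- ggplot_build(plot_free)
# layout$layout: one row per panel, with the facet variables as columns;
# layout$panel_params: one entry per panel, in the same order
ranges <- do.call(rbind, lapply(built$layout$panel_params, function(pp)
  data.frame(x.hi = pp$x.range[2], y.hi = pp$y.range[2])))
panel_info <- cbind(built$layout$layout[c("type", "color")], ranges)
label_df <- merge(counts, panel_info, by = c("type", "color"))

plot_labelled <- plot_free +
  geom_text(data = label_df, aes(x = x.hi, y = y.hi, label = n),
            hjust = 1, vjust = 1, inherit.aes = FALSE)
```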
I am new to ggplot, and using ggplot to show box plots of my data corresponding to different types like this. There are four types. I found that I can use facet_wrap to generate four different graphs.
ggplot(o.xp.sample, aes(power, reduction, fill=interaction(type,power), dodge=type)) +
stat_boxplot(geom ='errorbar')+
geom_boxplot() +
facet_wrap(~type)
My question is, I want to combine all the four graphs into one graph such that each type has a different color (and slightly transparent to show other plots through). Is this possible?
Here is the data https://gist.github.com/anonymous/9589729
Try this:
library(ggplot2)
o.xp.sample = read.csv("C:\\...\\data.csv",sep=",")
ggplot(o.xp.sample, aes(factor(power), reduction, fill=interaction(type,power), dodge=type)) +
stat_boxplot(geom ='errorbar') +
geom_boxplot() +
theme_bw() +
guides(fill = guide_legend(ncol = 3)) #added line as suggested by Paulo Cardoso
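The code above gives one colour per type-power combination. If the goal is one colour per type, mapping fill to type and setting alpha gives semi-transparent boxes. The data here is a hypothetical stand-in for the gist (four types, three power levels, random reductions). With position = "identity" instead of dodging, the boxes would overlap and the transparency would let the others show through.

```r
library(ggplot2)

# Hypothetical stand-in for o.xp.sample
set.seed(1)
o.xp.sample <- data.frame(
  type      = rep(paste0("type", 1:4), each = 60),
  power     = rep(rep(c(10, 20, 30), each = 20), times = 4),
  reduction = runif(240, 0, 100)
)

p_box <- ggplot(o.xp.sample, aes(factor(power), reduction, fill = type)) +
  # dodge both layers by the same width so whisker caps line up with boxes
  stat_boxplot(geom = "errorbar", position = position_dodge(width = 0.75)) +
  geom_boxplot(alpha = 0.5, position = position_dodge(width = 0.75)) +
  labs(x = "power") +
  theme_bw()
```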