I have this data.frame which I want to plot in facets using ggplot + facet_wrap:
set.seed(1)
df <- data.frame(val=rnorm(36),
gt=c(sapply(c("wt","pd","md","bd"),function(x) rep(x,9))),
ts=rep(c(sapply(c("cb","hp","ac"),function(x) rep(x,3))),4),
col=c(sapply(c("darkgray","darkblue","darkred","darkmagenta"),function(x) rep(x,9))),
index=rep(1:9,4),
stringsAsFactors=F)
df$xlab <- paste(df$ts,df$index,sep=".")
df$gt <- factor(df$gt,levels=c("wt","pd","md","bd"))
Here's how I'm trying to plot:
require(ggplot2)
ggplot(df,aes(x=index,y=val,color=gt))+geom_point(size=3)+facet_wrap(~gt,ncol=4)+
scale_fill_manual(values=c("darkgray","darkblue","darkred","darkmagenta"),labels=levels(df$gt),name="gt",guide=F)+
scale_colour_manual(values=c("darkgray","darkblue","darkred","darkmagenta"),labels=levels(df$gt),name="gt",guide=F)+
labs(x="replicate",y="val")+scale_x_continuous(breaks=df$index,labels=df$xlab)+
theme_bw()+theme(axis.text=element_text(size=6),axis.title=element_text(size=7),legend.text=element_text(size=6),legend.key=element_blank(),panel.border=element_blank(),strip.background=element_blank())
Which gives:
The problem is that the x-axis tick labels repeat themselves, since I'm calling scale_x_continuous. How do I get it right with facet_wrap?
Use the actual x-values in xlab as the x aesthetic, along with scales="free_x" in facet_wrap and delete the call to scale_x_continuous. Note, however, that the axis labels are still the same in each panel, because they are the same for each level of gt in the data.
ggplot(df,aes(x=xlab, y=val, color=gt)) +
geom_point(size=3, show.legend=FALSE) +
facet_wrap(~gt, ncol=4, scales="free_x") +
# scale_fill_manual(values=c("darkgray","darkblue","darkred","darkmagenta"), labels=levels(df$gt), name="gt", guide=F) +
scale_colour_manual(values=c("darkgray","darkblue","darkred","darkmagenta")) +
labs(x="replicate", y="val") +
#scale_x_continuous(breaks=df$index, labels=df$xlab)+
theme_bw() +
theme(axis.text=element_text(size=8),
axis.title=element_text(size=7),
legend.text=element_text(size=6),
legend.key=element_blank(),
panel.border=element_blank(),
strip.background=element_blank())
Now let's change xlab, just to see how this works when different panels really do have different labels:
df$xlab[10:20] = LETTERS[1:11]
Now run the same plot code again to get the following:
One more contingency is the case where not all the panels have the same number of x-values. In that case, you can switch to facet_grid and add space="free_x" if you want the width of each panel to be proportional to the number of x-values in each panel.
ggplot(df[-c(1:5),], aes(x=xlab, y=val, color=gt)) +
geom_point(size=3, show.legend=FALSE) +
facet_grid(.~gt, space="free_x", scales="free_x") +
scale_colour_manual(values=c("darkgray","darkblue","darkred","darkmagenta")) +
labs(x="replicate", y="val") +
theme_bw() +
theme(axis.text=element_text(size=8),
axis.title=element_text(size=7),
legend.text=element_text(size=6),
legend.key=element_blank(),
panel.border=element_blank(),
strip.background=element_blank())
A few other things:
You don't need to add color names to your data frame. If you want to change the default colors, you can just set them using one of the scale_colour_*** functions (as you did in your code).
For future reference, this c(sapply(c("darkgray","darkblue","darkred","darkmagenta"),function(x) rep(x,9))) can be changed to this rep(c("darkgray","darkblue","darkred","darkmagenta"), each=9).
You can remove the scale_fill_manual line, as you don't have a fill aesthetic in your graph.
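The equivalence of the two constructions in the second point can be checked directly in base R. A minimal sketch, using the color vector from the question (the names cols, via_sapply and via_rep are only for illustration):

```r
cols <- c("darkgray", "darkblue", "darkred", "darkmagenta")

# Original construction: sapply() builds a 9 x 4 character matrix,
# and c() flattens it column by column
via_sapply <- c(sapply(cols, function(x) rep(x, 9)))

# Shorter equivalent: repeat each element 9 times in place
via_rep <- rep(cols, each = 9)

# Both are plain character vectors of length 36 with the same contents
stopifnot(identical(via_sapply, via_rep))
```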
Related
I'm trying to figure out how to add legends to my R ggplot2 graphs, but clearly I'm not getting the syntax right.
# basic plot layout
ggplot() +
labs(x="random values", y="frequency", title="Examples for F-Test") +
theme_minimal() +
# histogram of distributions
geom_histogram(data=data.frame(random.data.1), aes(x=random.data.1), fill="forestgreen", color="grey", alpha=0.5, binwidth=0.5) +
geom_histogram(data=data.frame(random.data.2), aes(x=random.data.2), fill="orange", color="black", alpha=0.5, binwidth=0.5) +
# manual text annotations
annotate("text", x=10, y=5, label=paste("F-Test p-value =", signif(F.test[[3]], digits=3)), color="firebrick", fontface="bold") +
# add legend?
scale_color_manual(name="Distributions", values=c("grey", "black"))
ggplot2 usually works better if you concatenate your data into long-form columns, as I've done here, with one or more additional columns that indicate the variables or datasets that you want to use to group formatting options. In this case, since you wanted to split by dataset, I just used "1" and "2" for the fake datasets. That column should be a factor (if it's not, then R will assume that the variable is continuous). The command you are specifically looking for is guides(), I think.
Reshaping data can be done easily with either the "reshape2" package or the "tidyr" package. This post compares them.
library(ggplot2)
random.data.1 = runif(10)
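random.data.2 = runif(10)
F.test = var.test(random.data.1, random.data.2) # assumption: the question does not show F.test; var.test() is base R's F test, and its 3rd element is the p-value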
df = data.frame(vals = c(random.data.1,random.data.2))
df$dset<-c(rep(1,10),rep(2,10)) #Indicates the dataset
df$dset<-factor(df$dset)
df
ggplot(data=df,aes(x=vals,color=dset,fill=dset,group=dset)) +
labs(x="random values", y="frequency", title="Examples for F-Test") +
#theme_minimal() +
# histogram of distributions (now you only need one line!)
geom_histogram(position="stack",alpha=0.5, binwidth=0.5) +
# manual text annotations
annotate("text", x=10, y=5, label=paste("F-Test p-value =", signif(F.test[[3]], digits=3)), color="firebrick", fontface="bold") +
# add legend?
#These lines set the colors
scale_color_manual(values=c("grey", "black")) +
scale_fill_manual(values=c("forest green","orange")) +
#and these set the legend manually
guides(color = guide_legend(title = "Distributions")) +
guides(fill="none") #don't show the fill legend (fill=FALSE is deprecated in recent ggplot2 versions)
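The reshaping step described above can also be done without extra packages. A minimal sketch using base R's stack(), assuming two numeric vectors as in the question (the names wide and long are illustrative; tidyr or reshape2 would work equally well):

```r
set.seed(42)
wide <- data.frame(random.data.1 = runif(10),
                   random.data.2 = runif(10))

# stack() concatenates the columns into one "values" column and
# records the source column in a factor called "ind"
long <- stack(wide)

str(long)          # 20 rows: values (numeric), ind (factor with 2 levels)
levels(long$ind)   # one level per original column
```

Because ind is already a factor, it can be mapped straight to a grouping aesthetic, for example aes(x = values, fill = ind).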
I have been trying to look for an answer to my particular problem but I have not been successful, so I have just made a MWE to post here.
I tried the answers here with no success.
The task I want to do seems easy enough, but I cannot figure it out, and the results I get are making me have some fundamental questions...
I just want to overlay points and error bars on a bar plot, using ggplot2.
I have a long format data frame that looks like the following:
mydf <- data.frame(cell=paste0("cell", rep(1:3, each=12)),
scientist=paste0("scientist", rep(rep(rep(1:2, each=3), 2), 3)),
timepoint=paste0("time", rep(rep(1:2, each=6), 3)),
rep=paste0("rep", rep(1:3, 12)),
value=runif(36)*100)
I have attempted to get the plot I want the following way:
library(ggplot2)
library(RColorBrewer) # provides brewer.pal()
myPal <- brewer.pal(3, "Set2")[1:2]
myPal2 <- brewer.pal(3, "Set1")
outfile <- "test.pdf"
pdf(file=outfile, height=10, width=10)
print(#or ggsave()
ggplot(mydf, aes(cell, value, fill=scientist )) +
geom_bar(stat="identity", position=position_dodge(.9)) +
geom_point(aes(cell, color=rep), position=position_dodge(.9), size=5) +
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_manual(values=myPal) +
scale_color_manual(values=myPal2)
)
dev.off()
But I obtain this:
The problem is, there should be 3 "rep" values per "scientist" bar, but the values are ordered by "rep" instead (they should be 1,2,3,1,2,3, instead of 1,1,2,2,3,3).
Besides, I would like to add error bars with geom_errorbar but I didn't manage to get a working example...
Furthermore, overlaying the actual value points on the bars makes me wonder what is actually being plotted here: are the values taken properly for each bar, and why is the max value (or so it seems) plotted by default?
The way I think this should be properly plotted is with the median (or mean), adding the error bars like the whiskers in a boxplot (min and max value).
Any idea how to...
... have the "rep" value points appear in proper order?
... change the value shown by the bars from max to median?
... add error bars with max and min values?
I restructured your plotting code a little to make things easier.
The secret is to use proper grouping (which is otherwise inferred from fill and color). Also, since you're dodging on multiple levels, dodge2 has to be used.
When you are unsure about "what is plotted where" in bar/column charts, it always helps to add the option color="black", which reveals that things are still stacked on top of each other because of your use of dodge instead of dodge2.
p = ggplot(mydf, aes(x=cell, y=value, group=paste(scientist,rep))) +
geom_col(aes(fill=scientist), position=position_dodge2(.9)) +
geom_point(aes(cell, color=rep), position=position_dodge2(.9), size=5) +
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_brewer(palette = "Set2")+
scale_color_brewer(palette = "Set1")
ggsave(filename = outfile, plot=p, height = 10, width = 10)
gives:
Regarding error bars
Since there are only three replicates, I would show the original data points and maybe a violin plot. For completeness' sake I also added a geom_errorbar.
ggplot(mydf, aes(x=cell, y=value,group=paste(cell,scientist))) +
geom_violin(aes(fill=scientist),position=position_dodge(),color="black") +
geom_point(aes(cell, color=rep), position=position_dodge(0.9), size=5) +
geom_errorbar(stat="summary",position=position_dodge())+
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_brewer(palette = "Set2")+
scale_color_brewer(palette = "Set1")
gives
Update after comment
As I mentioned in my comment below, the stacking of the percentages leads to an undesirable outcome.
ggplot(mydf, aes(x=paste(cell, scientist), y=value)) +
geom_bar(aes(fill=rep),stat="identity", position=position_stack(),color="black") +
geom_point(aes(color=rep), position=position_dodge(.9), size=3) +
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_brewer(palette = "Set2")+
scale_color_brewer(palette = "Set1")
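The median bars with min/max error bars asked about in the question could be sketched as follows. This is only one possible approach, not part of the answer above: it assumes the same mydf layout as in the question, summarises the replicates with base R's aggregate(), and uses the illustrative names summ, value.med, value.lo and value.hi.

```r
library(ggplot2)

set.seed(1)
# Same layout as the question's mydf
mydf <- data.frame(cell = paste0("cell", rep(1:3, each = 12)),
                   scientist = paste0("scientist", rep(rep(rep(1:2, each = 3), 2), 3)),
                   timepoint = paste0("time", rep(rep(1:2, each = 6), 3)),
                   rep = paste0("rep", rep(1:3, 12)),
                   value = runif(36) * 100)

# One row per bar: median, minimum and maximum of the three replicates
summ <- aggregate(value ~ cell + scientist + timepoint, data = mydf,
                  FUN = function(v) c(med = median(v), lo = min(v), hi = max(v)))
# Flatten the matrix column into value.med, value.lo and value.hi
summ <- do.call(data.frame, summ)

p <- ggplot(summ, aes(x = cell, y = value.med, fill = scientist)) +
  geom_col(position = position_dodge(0.9)) +
  geom_errorbar(aes(ymin = value.lo, ymax = value.hi),
                position = position_dodge(0.9), width = 0.3) +
  # Raw replicates, grouped by scientist so they sit on the matching bar
  geom_point(data = mydf, aes(y = value, group = scientist, color = rep),
             position = position_dodge(0.9), size = 2) +
  facet_grid(timepoint ~ .) +
  scale_y_continuous("% of total cells")
print(p)
```

Summarising first keeps the bar height explicit, so there is no doubt about which value each bar shows.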
I have two geom_point commands applied to different data frames and would like to have a legend to specify them. However, I am not sure how to group them right for the legend. I appreciate it if you can take a look at the simple example below and help me figure out why no legend appears on the figure. Thanks!
library(data.table)
library(ggplot2)
df1=data.table(x1=c(-1,0,1), y1=c(-1,0,1))
df2=data.table(x2=c(-1,0,1), y2=c(-2,0,2))
ggplot()+
geom_point(data=df1, aes(x=x1, y=y1), color='red', group=1) +
geom_point(data=df2, aes(x=x2, y=y2), color='blue', group=2) +
xlab("X Label")+ylab("Y Label") +
scale_colour_manual(name = "My Legend",
values = group,
labels = c("database1", "database2"))
As suggested, ggplot2 likes a "tidy" way of dealing with data. In this case, it involves combining the data with an additional variable to differentiate the groups:
colnames(df2) <- c("x1","y1")
df <- rbind(transform(df1, grp='red'), transform(df2, grp='blue'))
ggplot()+
geom_point(data=df, aes(x=x1, y=y1, color=grp), group=1) +
xlab("X Label")+ylab("Y Label") +
scale_color_identity(guide="legend")
I used scale_color_identity for simplicity here, but it isn't hard to go back to the scale_colour_manual approach you started with and relabel the groups.
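A minimal sketch of the scale_colour_manual variant, assuming plain data frames instead of data.table and using the group labels from the question as the values of grp (the name p is illustrative):

```r
library(ggplot2)

df1 <- data.frame(x1 = c(-1, 0, 1), y1 = c(-1, 0, 1))
df2 <- data.frame(x2 = c(-1, 0, 1), y2 = c(-2, 0, 2))
names(df2) <- names(df1)   # align column names so the frames can be stacked

# grp holds the legend labels; the colors are assigned in the scale instead
df <- rbind(transform(df1, grp = "database1"),
            transform(df2, grp = "database2"))

p <- ggplot(df, aes(x = x1, y = y1, colour = grp)) +
  geom_point() +
  labs(x = "X Label", y = "Y Label") +
  scale_colour_manual(name = "My Legend",
                      values = c(database1 = "red", database2 = "blue"))
print(p)
```

Naming the elements of values ties each color to a group explicitly, so the mapping does not depend on the order of the levels.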
I'm creating a plot with ggplot that uses colored points, vertical lines, and horizontal lines to display the data. Ideally, I'd like to use two different color or linetype scales for the geom_vline and geom_hline layers, but ggplot discourages/disallows multiple variables mapped to the same aesthetic.
# Create example data
library(tidyverse)
library(lubridate)
set.seed(1234)
# tibble() replaces the deprecated data_frame()
example.df <- tibble(dt = seq(ymd("2016-01-01"), ymd("2016-12-31"), by="1 day"),
value = rnorm(366),
grp = sample(LETTERS[1:3], 366, replace=TRUE))
date.lines <- tibble(dt = ymd(c("2016-04-01", "2016-10-31")),
dt.label = c("April Fools'", "Halloween"))
value.lines <- tibble(value = c(-1, 1),
value.label = c("Threshold 1", "Threshold 2"))
If I set linetype aesthetics for both geom_*lines, they get put in the linetype legend together, which doesn't necessarily make logical sense:
ggplot(example.df, aes(x=dt, y=value, colour=grp)) +
geom_hline(data=value.lines, aes(yintercept=value, linetype=value.label)) +
geom_vline(data=date.lines, aes(xintercept=as.numeric(dt), linetype=dt.label)) +
geom_point(size=1) +
scale_x_date() +
theme_minimal()
Alternatively, I could set one of the lines to use a colour aesthetic, but then that again puts the legend lines in an illogical legend grouping:
ggplot(example.df, aes(x=dt, y=value, colour=grp)) +
geom_hline(data=value.lines, aes(yintercept=value, colour=value.label)) +
geom_vline(data=date.lines, aes(xintercept=as.numeric(dt), linetype=dt.label)) +
geom_point(size=1) +
scale_x_date() +
theme_minimal()
The only partial solution I've found is to use a fill aesthetic instead of colour in geom_point and to set shape=21 to use a fillable shape, but that forces a black border around the points. I can get rid of the border by manually setting colour="white", but then the white border covers up points. If I set colour=NA, no points are plotted.
ggplot(example.df, aes(x=dt, y=value, fill=grp)) +
geom_hline(data=value.lines, aes(yintercept=value, colour=value.label)) +
geom_vline(data=date.lines, aes(xintercept=as.numeric(dt), linetype=dt.label)) +
geom_point(shape=21, size=2, colour="white") +
scale_x_date() +
theme_minimal()
This might be a case where ggplot's "you can't have two variables mapped to the same aesthetic" rule can/should be broken, but I can't figure out a clean way around it. Using fill with geom_point shows the most promise, but there's no way to remove the point borders.
Any ideas for plotting two different color or linetype aesthetics here?
I have data that plots over time with four different variables. I would like to combine them in one plot using facet_grid, where each variable gets its own sub-plot. The following code resembles my data and the way I'm presenting it:
require(ggplot2)
require(reshape2)
subm <- melt(economics, id='date', c('psavert','uempmed','unemploy'))
mcsm <- melt(data.frame(date=economics$date, q=quarters(economics$date)), id='date')
mcsm$value <- factor(mcsm$value)
ggplot(subm, aes(date, value, col=variable, group=1)) + geom_line() +
facet_grid(variable~., scale='free_y') +
geom_step(data=mcsm, aes(date, value)) +
scale_y_discrete(breaks=levels(mcsm$value))
If I leave out scale_y_discrete, R complains that I'm trying to combine a discrete value with a continuous scale. If I include scale_y_discrete, my continuous series lose their scale.
Is there any neat way of solving this issue, i.e. getting all scales correct? I also see that the legend is sorted alphabetically; can I change that so the legend is in the same order as the sub-plots?
The problem with your data is that in data frame subm the value column is numeric (continuous), but in mcsm it is a factor (discrete). You can't use the same scale for numeric and discrete values, so you get y values only for the last facet (discrete). It is also not possible to use two scale_y...() functions in one plot.
My approach would be to convert the mcsm values to numeric (saved as value2) and plot those; the quarters will then appear as 1, 2, 3 and 4. To solve the problem with the legend, use scale_color_discrete() and provide breaks= in the order you need.
mcsm$value2<-as.numeric(mcsm$value)
ggplot(subm, aes(date, value, col=variable, group=1)) + geom_line()+
facet_grid(variable~., scale='free_y') + geom_step(data=mcsm, aes(date, value2)) +
scale_color_discrete(breaks=c('psavert','uempmed','unemploy','q'))
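The conversion works because as.numeric() on a factor returns the integer level codes. A small base-R sketch with made-up dates (the names d and q are illustrative):

```r
d <- as.Date(c("2020-02-15", "2020-05-15", "2020-08-15", "2020-11-15"))
q <- factor(quarters(d))   # levels "Q1" "Q2" "Q3" "Q4"

as.numeric(q)              # level codes 1 2 3 4, usable on a continuous y scale
levels(q)[as.numeric(q)]   # maps the codes back to the quarter labels
```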
UPDATE - solution using grobs
Another approach is to use grobs and library gridExtra to plot your data as separate plots.
First, save the plot with all legends and data (code as above) as object p. Then use the functions ggplot_build() and ggplot_gtable() to save the plot as a grob object gp. From gp, extract only the part that draws the legend (saved as object gp.leg); in this case it is list element number 17, though the index can differ between plots and ggplot2 versions.
library(gridExtra)
p<-ggplot(subm, aes(date, value, col=variable, group=1)) + geom_line()+
facet_grid(variable~., scale='free_y') + geom_step(data=mcsm, aes(date, value2)) +
scale_color_discrete(breaks=c('psavert','uempmed','unemploy','q'))
gp<-ggplot_gtable(ggplot_build(p))
gp.leg<-gp$grobs[[17]]
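Since a hard-coded index such as 17 depends on the plot layout, the legend can also be looked up by name. A hedged sketch on a simple mtcars plot rather than the plot above; it assumes the legend cell in the gtable layout has a name starting with "guide-box", and that unused legend slots, where a version creates them, are zeroGrob placeholders:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) + geom_point()
gp <- ggplotGrob(p)   # same as ggplot_gtable(ggplot_build(p))

# Find the layout rows whose name starts with "guide-box"
idx <- grep("^guide-box", gp$layout$name)
# Drop empty placeholders, which are zeroGrob objects
idx <- idx[!vapply(gp$grobs[idx], inherits, logical(1), what = "zeroGrob")]

gp.leg <- gp$grobs[[idx[1]]]   # the legend grob, ready for arrangeGrob()
```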
Make two new plots, p1 and p2: the first plots only the data of subm and the second only the data of mcsm. Use scale_color_manual() to set the same colors as used for plot p. For the first plot, remove the x axis title, text and ticks, and use plot.margin= to set the lower margin to a negative number. For the second plot, change the upper margin to a negative number. facet_grid() should be used for both plots to get the faceted look.
p1 <- ggplot(subm, aes(date, value, col=variable, group=1)) + geom_line()+
facet_grid(variable~., scale='free_y')+
theme(plot.margin = unit(c(0.5,0.5,-0.25,0.5), "lines"),
axis.text.x=element_blank(),
axis.title.x=element_blank(),
axis.ticks.x=element_blank())+
scale_color_manual(values=c("#F8766D","#00BFC4","#C77CFF"),guide="none")
p2 <- ggplot(data=mcsm, aes(date, value,group=1,col=variable)) + geom_step() +
facet_grid(variable~., scale='free_y')+
theme(plot.margin = unit(c(-0.25,0.5,0.5,0.5), "lines"))+ylab("")+
scale_color_manual(values="#7CAE00",guide="none")
Save both plots p1 and p2 as grob objects and then set for both plots the same widths.
gp1 <- ggplot_gtable(ggplot_build(p1))
gp2 <- ggplot_gtable(ggplot_build(p2))
maxWidth = grid::unit.pmax(gp1$widths[2:3],gp2$widths[2:3])
gp1$widths[2:3] <- as.list(maxWidth)
gp2$widths[2:3] <- as.list(maxWidth)
With functions grid.arrange() and arrangeGrob() arrange both plots and legend in one plot.
grid.arrange(arrangeGrob(arrangeGrob(gp1,gp2,heights=c(3/4,1/4),ncol=1),
gp.leg,widths=c(7/8,1/8),ncol=2))