I need to add horizontal lines (whisker caps) at the ends of the whiskers in my multiple box plots.
You can find my dataset here: (link to data is broken)
In other words, I am plotting three variables (Mat, Ita, and Log), split by Gender (F and M), to compare their box plots. I need to add a horizontal line at the end of both vertical lines (the whiskers) in each box plot.
I am using the ggplot2 package, and this is my code so far (it creates the box plots as I need them; I only need to add the horizontal lines):
ggplot(newdata,aes(x=variable,y=value)) +
geom_boxplot(aes(fill=Gender)) +
xlab("Subject") +
ylab("Quiz score") +
ggtitle("Boxplots for quiz score and gender") +
scale_fill_manual(values=c("pink","lightblue"),labels=c("Female","Male")) +
theme(plot.title = element_text(face="bold"))
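You could use stat_boxplot(geom = 'errorbar'), which draws an error bar spanning the whisker range, with a horizontal cap at each end.
Here is an example:
library(ggplot2)
bp <- ggplot(iris, aes(factor(Species), Sepal.Width, fill = Species))
# draw the error bars first so the boxes cover their vertical segment; width sets the cap length
bp + stat_boxplot(geom = 'errorbar', width = 0.5) + geom_boxplot()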
Result:
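The iris example has one box per x position. When the boxes are split by a fill variable, as with Gender in the question, the cap layer has to be dodged by exactly the same amount as the boxes. A minimal sketch, assuming newdata is in long format with columns variable, value, and Gender; the data below are made up to stand in for the broken link, and the widths 0.75 and 0.5 are arbitrary choices:

```r
library(ggplot2)

# Hypothetical stand-in for the question's newdata (the original link is broken)
set.seed(123)
newdata <- data.frame(
  variable = rep(c("Mat", "Ita", "Log"), each = 40),
  Gender   = rep(c("F", "M"), times = 60),
  value    = round(runif(120, 0, 10))
)

dodge <- position_dodge(width = 0.75)  # one dodge shared by caps and boxes

p <- ggplot(newdata, aes(x = variable, y = value, fill = Gender)) +
  # caps first, so the boxes are drawn over the vertical part of the error bar
  stat_boxplot(geom = "errorbar", width = 0.5, position = dodge) +
  geom_boxplot(position = dodge) +
  xlab("Subject") +
  ylab("Quiz score") +
  scale_fill_manual(values = c("pink", "lightblue"), labels = c("Female", "Male"))

print(p)
```

Because both layers use the same position_dodge() width, each cap is centred on its box even though the caps are narrower.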
I have been looking for an answer to my particular problem without success, so I made a minimal working example (MWE) to post here.
I tried the answers here with no success.
The task seems easy enough, but I cannot figure it out, and the results I get are raising some fundamental questions for me...
I just want to overlay points and error bars on a bar plot, using ggplot2.
I have a long format data frame that looks like the following:
mydf <- data.frame(cell=paste0("cell", rep(1:3, each=12)),
                   scientist=paste0("scientist", rep(rep(rep(1:2, each=3), 2), 3)),
                   timepoint=paste0("time", rep(rep(1:2, each=6), 3)),
                   rep=paste0("rep", rep(1:3, 12)),
                   value=runif(36)*100)
I tried to get the plot I want this way:
library(ggplot2)
library(RColorBrewer)  # provides brewer.pal()
myPal <- brewer.pal(3, "Set2")[1:2]
myPal2 <- brewer.pal(3, "Set1")
outfile <- "test.pdf"
pdf(file=outfile, height=10, width=10)
print( # or use ggsave()
ggplot(mydf, aes(cell, value, fill=scientist )) +
geom_bar(stat="identity", position=position_dodge(.9)) +
geom_point(aes(cell, color=rep), position=position_dodge(.9), size=5) +
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_manual(values=myPal) +
scale_color_manual(values=myPal2)
)
dev.off()
But I obtain this:
The problem is that there should be three "rep" points per "scientist" bar, but the points are ordered by "rep" instead (1,1,2,2,3,3 instead of 1,2,3,1,2,3).
Besides, I would like to add error bars with geom_errorbar but I didn't manage to get a working example...
Furthermore, overlaying the actual value points on the bars makes me wonder what is actually being plotted: whether the values are taken properly for each bar, and why the maximum value (or so it seems) is plotted by default.
I think the proper way to plot this is with the median (or mean) as the bar, plus error bars like the whiskers of a boxplot (min and max values).
Any idea how to...
... have the "rep" value points appear in proper order?
... change the value shown by the bars from max to median?
... add error bars with max and min values?
I restructured your plotting code a little to make things easier.
The key is to use proper grouping (otherwise the grouping is inferred from fill and color). Also, since you're dodging on multiple levels, position_dodge2() has to be used.
When you are unsure what is plotted where in a bar or column chart, it always helps to add color="black": here it reveals that the bars are still drawn on top of each other, because you used position_dodge() instead of position_dodge2().
p = ggplot(mydf, aes(x=cell, y=value, group=paste(scientist,rep))) +
geom_col(aes(fill=scientist), position=position_dodge2(.9)) +
geom_point(aes(cell, color=rep), position=position_dodge2(.9), size=5) +
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_brewer(palette = "Set2")+
scale_color_brewer(palette = "Set1")
ggsave(filename = outfile, plot=p, height = 10, width = 10)
gives:
Regarding error bars
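Since there are only three replicates, I would show the original data points and maybe a violin plot. For completeness' sake, I also added a geom_errorbar() (with stat="summary" and no summary function given, it defaults to mean_se(), i.e. the mean plus/minus one standard error).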
ggplot(mydf, aes(x=cell, y=value,group=paste(cell,scientist))) +
geom_violin(aes(fill=scientist),position=position_dodge(),color="black") +
geom_point(aes(cell, color=rep), position=position_dodge(0.9), size=5) +
geom_errorbar(stat="summary",position=position_dodge())+
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_brewer(palette = "Set2")+
scale_color_brewer(palette = "Set1")
gives
Update after comment
As I mentioned in my comment below, the stacking of the percentages leads to an undesirable outcome.
ggplot(mydf, aes(x=paste(cell, scientist), y=value)) +
geom_bar(aes(fill=rep),stat="identity", position=position_stack(),color="black") +
geom_point(aes(color=rep), position=position_dodge(.9), size=3) +
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_brewer(palette = "Set2")+
scale_color_brewer(palette = "Set1")
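The question also asked for bars showing the median and error bars spanning the min and max values. One way to get both, not taken from the answer above, is stat_summary(); this sketch assumes ggplot2 3.3.0 or later (for the fun argument) and rebuilds mydf with a seed so it runs on its own:

```r
library(ggplot2)

# Same layout as the question's mydf; a seed is added for reproducibility
set.seed(1)
mydf <- data.frame(cell = paste0("cell", rep(1:3, each = 12)),
                   scientist = paste0("scientist", rep(rep(rep(1:2, each = 3), 2), 3)),
                   timepoint = paste0("time", rep(rep(1:2, each = 6), 3)),
                   rep = paste0("rep", rep(1:3, 12)),
                   value = runif(36) * 100)

dodge <- position_dodge(width = 0.9)

p <- ggplot(mydf, aes(x = cell, y = value, fill = scientist)) +
  # bar height = median of the three replicates
  stat_summary(fun = median, geom = "col", position = dodge) +
  # error bar from the minimum to the maximum replicate
  stat_summary(fun = median, fun.min = min, fun.max = max,
               geom = "errorbar", position = dodge, width = 0.3) +
  # raw replicates, dodged by scientist so they sit on their own bar
  geom_point(aes(colour = rep, group = scientist), position = dodge, size = 3) +
  facet_grid(timepoint ~ .) +
  scale_y_continuous("% of total cells")

print(p)
```

Because each bar is now a single summarised value, nothing is drawn on top of anything else, and the points show which replicates drive the min and max.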
I want to overlay two sets of faceted ggplot panels (one panel per country) so that each country gets a single panel, without rescaling either plot, but ggplot always rescales one or the other.
I tried using a single ggplot for both variables, ggplot(df, aes(x=t, y=a)), and adding geom_point and geom_smooth for the second variable (y=b) within it, but this rescales variable a.
# plot 1
g <-ggplot(df, aes(x=year, y=a))
p <-g + geom_point(alpha=0.7) + geom_smooth(method="auto") + facet_wrap(~country, scales="free") + theme_bw() +
xlab("Year") + ylab(bquote('a')) +
scale_x_continuous(breaks=seq(1960, 2020, 15))
# plot 2
a <-ggplot(df, aes(x=year, y=b))
b <-a + geom_point(alpha=0.7, color="green") + geom_smooth(method="auto", color="darkgreen") +
facet_wrap(~country, scales="free") + theme_bw() +
xlab("Year") + ylab(bquote('b')) +
scale_x_continuous(breaks=seq(1960, 2020, 15))
I would like to overlay these two ggplots in a single set of panels, with both y-axes appearing exactly as they do when each is plotted alone (including units). I would then need to move one of the y-axes to the right of the panels, so that there is one y-axis on each side.
Image 1. ggplot rescales the left y-axis. I don't want this to happen.
Image 2. What I want instead is to merge these images so that there is a single panel per country, showing both the green and the blue lines with the scales shown here.
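ggplot2 (version 2.2.0 and later) can draw a second y-axis with sec_axis(), but only as one fixed transformation of the primary axis. A minimal sketch for a single panel, assuming one linear mapping from b onto the range of a is acceptable: b is rescaled into a's range for plotting, and the secondary axis applies the inverse mapping so its labels are in b's units. The data and the names k and m are made up for illustration. With facet_wrap(scales = "free"), the same formula has to serve every panel, so a different mapping per country is not possible this way.

```r
library(ggplot2)

# Hypothetical single-country data standing in for the question's df
set.seed(10)
df <- data.frame(year = 1960:2020)
df$a <- 50 + cumsum(rnorm(nrow(df)))             # variable for the left axis
df$b <- 2000 + cumsum(rnorm(nrow(df), sd = 30))  # variable for the right axis

# Linear map that sends range(b) onto range(a)
ra <- range(df$a)
rb <- range(df$b)
k  <- diff(ra) / diff(rb)
m  <- ra[1] - k * rb[1]

p <- ggplot(df, aes(x = year)) +
  geom_point(aes(y = a), alpha = 0.7) +
  geom_smooth(aes(y = a), method = "loess", formula = y ~ x) +
  geom_point(aes(y = k * b + m), alpha = 0.7, colour = "green") +
  geom_smooth(aes(y = k * b + m), method = "loess", formula = y ~ x,
              colour = "darkgreen") +
  # the right-hand axis undoes the mapping, so it is labelled in b's units
  scale_y_continuous("a", sec.axis = sec_axis(~ (. - m) / k, name = "b")) +
  scale_x_continuous("Year", breaks = seq(1960, 2020, 15)) +
  theme_bw()

print(p)
```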
I'm trying to construct a 5 x 6 matrix of plots in R using ggplot2 and gridExtra. For simplicity, I can show my issue with a 2 x 2 matrix and some fake data.
#Load libraries
library(ggplot2); library(gridExtra); library(grid)  # grid provides textGrob()
#Data
data = rbind(data.frame(x=rnorm(100,0,1),ALP='A',NUM=1),data.frame(x=rnorm(100,20000,1000),ALP='A',NUM=2),data.frame(x=rnorm(100,100,10),ALP='B',NUM=1),data.frame(x=rnorm(5000,1000),ALP='B',NUM=2))
#Ggplot2 facet_grid
ggplot(data,aes(x=x,y=..scaled..,fill='red')) + geom_density() + facet_grid(ALP~NUM,scales='free') + guides(fill=FALSE)
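The result doesn't look good, because the x-scales are so different across panels: with facet_grid(scales='free'), x can only vary between columns, so panels in the same column still share one x-scale. I tried to do it manually with gridExtra.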
#Assemble grobs
plt1 = ggplot(subset(data,ALP=='A'&NUM==1),aes(x=x,y=..scaled..,fill=ALP)) + geom_density() + facet_grid(.~NUM,scales='free') + guides(fill=FALSE) + theme(axis.title.x=element_blank(),axis.title.y=element_blank())
plt2 = ggplot(subset(data,ALP=='A'&NUM==2),aes(x=x,y=..scaled..,fill=ALP)) + geom_density() + facet_grid(ALP~NUM,scales='free') + guides(fill=FALSE) + theme(axis.text.y=element_blank(),axis.ticks.y=element_blank(),axis.title.y=element_blank(),axis.title.x=element_blank())
plt3 = ggplot(subset(data,ALP=='B'&NUM==1),aes(x=x,y=..scaled..,fill=ALP)) + geom_density() + guides(fill=FALSE) + theme(axis.title.x=element_blank(),axis.title.y=element_blank())
plt4 = ggplot(subset(data,ALP=='B'&NUM==2),aes(x=x,y=..scaled..,fill=ALP)) + geom_density() + facet_grid(ALP~.,scales='free') + guides(fill=FALSE) + theme(axis.text.y=element_blank(),axis.ticks.y=element_blank(),axis.title.y=element_blank(),axis.title.x=element_blank())
#Plot it out
grid.arrange(plt1,plt2,plt3,plt4,nrow=2,ncol=2,left=textGrob("scaled",rot=90,vjust=1),bottom=textGrob("x"))
I'm almost there; unfortunately, the plotting panel in the upper right-hand corner is smaller than all the rest, and the plotting panel in the lower left-hand corner is bigger than all the rest. I would like all of the plotting panels to be the same height and width. I found some code using gtable, but it only seems to work consistently when the grobs don't have facet labels. The effect is even more exaggerated as the number of rows/columns increases.
As an alternative to faceting, you could work with gtable:
library(grid)  # for grid.newpage() and grid.draw()
plt <- lapply(list(plt1,plt2, plt3,plt4), ggplotGrob)
left <- rbind(plt[[1]], plt[[3]])
right <- rbind(plt[[2]], plt[[4]])
all <- cbind(left, right)
grid.newpage()
grid.draw(all)
With this layout, the panel sizes should all be equal (1null).
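Another route, not part of the answer above: facet_wrap() with scales = "free" gives every panel its own x- and y-scale (facet_grid() can only free x per column and y per row), and all panels keep the same size. A sketch with the question's fake data; after_stat() assumes ggplot2 3.3.0 or later, and the trade-off is that strips appear above every panel rather than once per row and column.

```r
library(ggplot2)

# The question's fake data; a seed is added for reproducibility
set.seed(42)
data <- rbind(
  data.frame(x = rnorm(100, 0, 1),        ALP = "A", NUM = 1),
  data.frame(x = rnorm(100, 20000, 1000), ALP = "A", NUM = 2),
  data.frame(x = rnorm(100, 100, 10),     ALP = "B", NUM = 1),
  data.frame(x = rnorm(5000, 1000),       ALP = "B", NUM = 2)
)

p <- ggplot(data, aes(x = x, y = after_stat(scaled))) +
  geom_density(fill = "red") +
  # each panel gets its own x- and y-scale; panels stay the same size
  facet_wrap(~ ALP + NUM, nrow = 2, scales = "free", labeller = label_both) +
  labs(y = "scaled")

print(p)
```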
I am trying to change the style settings of this kind of chart and hope you can help me.
R code:
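theme_set(theme_bw())
cglac$pred2 <- as.factor(cglac$pred)
# each + must end a line; a leading + would start a new expression
ggplot(cglac, aes(x=depth, colour=pred2)) +
  # geom_histogram() replaces the original geom_bar(), which no longer accepts binwidth
  geom_histogram(aes(y=..density..), binwidth=3, alpha=.5, position="stack") +
  geom_density(alpha=.2) +
  xlab("Depth (m)") +
  ylab("Counts & Density") +
  coord_flip() +
  scale_x_reverse() +
  theme_bw()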
which produces this graph:
Here are some points:
I want the density lines to be black and distinguished by line type (dashed, dotted, etc.) rather than by colour.
The other thing is the histogram itself. How do I get rid of the grey background in the bars?
Can I also change the bars to black-and-white patterns (hatching, etc.) so that they match the density lines?
Last but not least, I want to add a second x-axis (a y-axis here, because of coord_flip()). The one I see right now is for the density; the other one should show the counts for the pred2 variable.
Thanks for helping.
Best,
Moritz
Different line types: inside aes(), put linetype = pred2. To make the lines black, add color = "black" inside geom_density().
The "background" of the bars is called "fill". Inside geom_bar, you can set fill = NA for no fill. A more common approach is to fill the bars with colors by specifying fill = pred2 inside aes(). You might also consider faceting by your variable; + facet_wrap(~ pred2, nrow = 1) might look very nice.
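Putting the first two suggestions together, a black-and-white version could look like the sketch below. It uses mtcars as a stand-in for cglac (hp for depth, cyl for pred2), converts cyl to a factor because linetype needs a discrete variable, and draws the histograms unstacked (position = "identity") as outlines only; binwidth = 25 is an arbitrary choice.

```r
library(ggplot2)

# mtcars as a stand-in for cglac: hp plays depth, cyl plays pred2
mt <- transform(mtcars, cyl = factor(cyl))

p <- ggplot(mt, aes(x = hp, linetype = cyl)) +
  # outline-only bars: fill = NA removes the grey fill
  geom_histogram(aes(y = after_stat(density)), binwidth = 25,
                 fill = NA, colour = "black", position = "identity") +
  # black density curves, told apart by linetype only
  geom_density(colour = "black") +
  xlab("hp") +
  ylab("Density") +
  coord_flip() +
  theme_bw()

print(p)
```

Unstacked outlines overlap, so with many groups the faceted version further down is easier to read.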
Shaded bars in ggplot? No, you can't do that easily. See the answers to this question for other options and hacks.
Second y-axis: as with the shaded bars, you can't do this easily, because the ggplot2 creator considers a second y-axis a terrible design choice. Here's a related question, including Hadley's point of view:
I believe plots with separate y scales (not y-scales that are transformations of each other) are fundamentally flawed.
It's definitely worth considering his point of view, and asking yourself if those design choices are really what you want.
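That said, ggplot2 2.2.0 and later can add a secondary axis with sec_axis() when it is a transformation of the primary one, and a count axis for a density histogram is such a case: for a single group of n observations with bin width w, count = density × n × w. A sketch with mtcars as a stand-in; it assumes one group, because with several groups each group's density is scaled by its own n, so a single count axis would not fit all of them.

```r
library(ggplot2)

bw <- 25            # bin width, an arbitrary choice here
n  <- nrow(mtcars)  # observations in the single group

p <- ggplot(mtcars, aes(x = hp)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = bw,
                 fill = NA, colour = "black") +
  geom_density(linetype = "dashed") +
  # secondary axis: count = density * n * binwidth
  scale_y_continuous("Density",
                     sec.axis = sec_axis(~ . * n * bw, name = "Count")) +
  coord_flip() +
  theme_bw()

print(p)
```

Because of coord_flip(), the density axis ends up at the bottom and the count axis at the top.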
Different linetypes for densities
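Here's a version of what you're trying to do with built-in data (cyl is converted to a factor, because linetype cannot be mapped to a continuous variable):
mt <- transform(mtcars, cyl = factor(cyl))
ggplot(mt, aes(x = hp,
               linetype = cyl,
               group = cyl,
               color = cyl)) +
  geom_histogram(aes(y=..density.., fill = cyl),
                 alpha=.5, position="stack") +
  geom_density(color = "black") +
  coord_flip() +
  theme_bw()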
And here is what I think you should do instead. This version uses facets instead of stacking, colors, and linetypes. You seem to be aiming for black and white, which is no problem at all in this version.
ggplot(mtcars, aes(x = hp,
group = cyl)) +
geom_histogram(aes(y=..density..),
alpha=.5) +
geom_density() +
facet_wrap(~ cyl, nrow = 1) +
coord_flip() +
theme_bw()
I am new to ggplot and am using it to show box plots of my data for different types, like this. There are four types, and I found that I can use facet_wrap to generate four separate graphs.
ggplot(o.xp.sample, aes(power, reduction, fill=interaction(type,power), dodge=type)) +
stat_boxplot(geom ='errorbar')+
geom_boxplot() +
facet_wrap(~type)
My question is: I want to combine all four graphs into one graph, with each type in a different color (and slightly transparent, so the other plots show through). Is this possible?
Here is the data https://gist.github.com/anonymous/9589729
Try this:
library(ggplot2)
o.xp.sample = read.csv("C:\\...\\data.csv",sep=",")
ggplot(o.xp.sample, aes(factor(power), reduction, fill=interaction(type,power), dodge=type)) +
stat_boxplot(geom ='errorbar') +
geom_boxplot() +
theme_bw() +
guides(fill = guide_legend(ncol = 3)) #added line as suggested by Paulo Cardoso
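The answer above places the boxes side by side. If you would rather overlay the four types at each power with transparency, as the question describes, one option is to map fill (and colour) to type and use position = "identity". The gist data is not reproduced here, so the data frame below is made up, with the same column names as o.xp.sample (type, power, reduction); alpha = 0.3 is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical stand-in for o.xp.sample (same column names, made-up values)
set.seed(7)
o.xp.sample <- expand.grid(type = paste0("type", 1:4),
                           power = c(10, 20, 30),
                           rep = 1:20)
o.xp.sample$reduction <- rnorm(nrow(o.xp.sample),
                               mean = as.integer(o.xp.sample$type) + o.xp.sample$power / 10)

p <- ggplot(o.xp.sample, aes(factor(power), reduction, fill = type, colour = type)) +
  # caps and boxes share position = "identity", so the types overlap
  stat_boxplot(geom = "errorbar", width = 0.4, position = "identity") +
  geom_boxplot(alpha = 0.3, position = "identity") +
  xlab("power") +
  theme_bw()

print(p)
```

Heavily overlapping boxes can be hard to read, so the dodged version may still be the clearer choice when the types have similar distributions.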