I am trying to produce a bar graph that has multiple groupings of factors. An example from excel of what I am attempting to create, subgrouped by Variety and Irrigation treatment:
I know I could produce multiple graphs using facet_wrap(), but I would like to produce multiple graphs for this same type of data for multiple years of similar data. An example of the data I used in this example:
Year Trt Variety geno yield SE
2010-2011 Irr Variety.2 1 6807 647
2010-2011 Irr Variety.2 2 5901 761
2010-2011 Irr Variety.1 1 6330 731
2010-2011 Irr Variety.1 2 5090 421
2010-2011 Dry Variety.2 1 3953 643
2010-2011 Dry Variety.2 2 3438 683
2010-2011 Dry Variety.1 1 3815 605
2010-2011 Dry Variety.1 2 3326 584
Is there a way to create multiple groupings in ggplot2? I have searched for quite some time and have yet to see an example of something like the example graph above.
Thanks for any help you may have!
This may be a start.
dodge <- position_dodge(width = 0.9)
ggplot(df, aes(x = interaction(Variety, Trt), y = yield, fill = factor(geno))) +
geom_bar(stat = "identity", position = position_dodge()) +
geom_errorbar(aes(ymax = yield + SE, ymin = yield - SE), position = dodge, width = 0.2)
Update: labelling of x axis
I have added:
coord_cartesian, to set limits of y axis, mainly the lower limit to avoid the default expansion of the axis.
annotate, to add the desired labels. I have hard-coded the x positions, which I find OK in this fairly simple example.
theme_classic, to remove the gray background and the grid.
theme, increase lower plot margin to have room for the two-row label, remove default labels.
Last set of code: Because the text is added below the x-axis, it 'disappears' outside the plot area, and we need to remove the 'clipping'. That's it!
library(grid)
g1 <- ggplot(data = df, aes(x = interaction(Variety, Trt), y = yield, fill = factor(geno))) +
geom_bar(stat = "identity", position = position_dodge()) +
geom_errorbar(aes(ymax = yield + SE, ymin = yield - SE), position = dodge, width = 0.2) +
coord_cartesian(ylim = c(0, 7500)) +
annotate("text", x = 1:4, y = - 400,
label = rep(c("Variety 1", "Variety 2"), 2)) +
annotate("text", c(1.5, 3.5), y = - 800, label = c("Irrigated", "Dry")) +
theme_classic() +
theme(plot.margin = unit(c(1, 1, 4, 1), "lines"),
axis.title.x = element_blank(),
axis.text.x = element_blank())
# remove clipping of x axis labels
g2 <- ggplot_gtable(ggplot_build(g1))
g2$layout$clip[g2$layout$name == "panel"] <- "off"
grid.draw(g2)
Related
I have a dataframe df1 as below.
df1<-data.frame(Hour=c(0,3,6,9,12,15,18,21),
n=c(18426,19345,20123,21450,23456,23510,21453,18456),
mean=c(23.9234, 34.9456,28.9891,44.6452,47.4567,38.9483,34.9632,29.8765),
ci=c(4.2345,6.3345,12.1345,17.3445,13.1545,12.1745,10.1945,28.2445))
df1$Hour<-as.factor(df1$Hour)
df1
Hour n mean ci
1 0 18426 23.9234 4.2345
2 3 19345 34.9456 6.3345
3 6 20123 28.9891 12.1345
4 9 21450 44.6452 17.3445
5 12 23456 47.4567 13.1545
6 15 23510 38.9483 12.1745
7 18 21453 34.9632 10.1945
8 21 18456 29.8765 28.2445
I created the plot that I show below after some search on the internet, with one x-axis and two y-axis. The left y-axis showing the variable n and the right y-axis showing the variable mean. The "orange" line represents the confidence interval (ci) for the mean (right y-axis).
ggplot() +
geom_bar(mapping = aes(x = df1$Hour, y = df1$n), stat = "identity", fill = "grey") +
geom_line(mapping = aes(x = df1$Hour, y = df1$mean*494.2611, group=1 ), size = 1, color = "blue") + # 525.3868 result from the division of 23456/44.6452
geom_point(mapping = aes(x = df1$Hour, y = df1$mean*494.2611 )) +
scale_y_continuous(name = "n",
sec.axis = sec_axis(~./494.2611, name = "Mean")) +
theme(
axis.title.y = element_text(color = "darkgrey"),
axis.title.y.right = element_text(color = "blue")) +
labs(x = "Hour") +
geom_errorbar(mapping= aes(x = df1$Hour, ymin = df1$mean*494.2611 - df1$ci, ymax = df1$mean*494.2611 + df1$ci),
position = position_dodge(0.9),
width = 0.4, colour = "orange",
alpha = 0.9, size = 0.5)
The problem is that the errorbars appear too small compare to the scale of the variable mean, and hence all look the same way. I think that the errorbars are scaled to the left y-axis.
Does anyone know where is the mistake?
You forgot your parentheses. Instead of
ymin = df1$mean*494.2611 - df1$ci
use
ymin = (df1$mean - df1$ci) * 494.2611
(same for ymax)
Explanation: dual scales in ggplot is a hack. You have to manually rescale the data to fit the primary scale. For error bars, you need ymin and ymax, and these are the values that you need to scale. But ymin=df1$mean - df1$ci, so you need to multiply the whole value, not just the mean.
I have a set of paired data, and I'm using ggplot2.boxplot (of the easyGgplot2 package) with added (jittered) individual data points:
ggplot2.boxplot(data=INdata,xName='condition',yName='vicarious_pain',groupName='condition',showLegend=FALSE,
position="dodge",
addDot=TRUE,dotSize=3,dotPosition=c("jitter", "jitter"),jitter=0.2,
ylim=c(0,100),
backgroundColor="white",xtitle="",ytitle="Pain intenstity",mainTitle="Pain intensity",
brewerPalette="Paired")
INdata:
ID,condition,pain
1,Treatment,4.5
3,Treatment,12.5
4,Treatment,16
5,Treatment,61.75
6,Treatment,23.25
7,Treatment,5.75
8,Treatment,5.75
9,Treatment,5.75
10,Treatment,44.5
11,Treatment,7.25
12,Treatment,40.75
13,Treatment,17.25
14,Treatment,2.75
15,Treatment,15.5
16,Treatment,15
17,Treatment,25.75
18,Treatment,17
19,Treatment,26.5
20,Treatment,27
21,Treatment,37.75
22,Treatment,26.5
23,Treatment,15.5
25,Treatment,1.25
26,Treatment,5.75
27,Treatment,25
29,Treatment,7.5
1,No Treatment,34.5
3,No Treatment,46.5
4,No Treatment,34.5
5,No Treatment,34
6,No Treatment,65
7,No Treatment,35.5
8,No Treatment,48.5
9,No Treatment,35.5
10,No Treatment,54.5
11,No Treatment,7
12,No Treatment,39.5
13,No Treatment,23
14,No Treatment,11
15,No Treatment,34
16,No Treatment,15
17,No Treatment,43.5
18,No Treatment,39.5
19,No Treatment,73.5
20,No Treatment,28
21,No Treatment,12
22,No Treatment,30.5
23,No Treatment,33.5
25,No Treatment,20.5
26,No Treatment,14
27,No Treatment,49.5
29,No Treatment,7
The resulting plot looks like this:
However, since this is paired data, I want to represent this in the plot - specifically to add lines between paired datapoints. I've tried adding
... + geom_line(aes(group = ID))
..but I am not able to implement this into the ggplot2.boxplot code. Instead, I get this error:
Error in if (addMean) p <- p + stat_summary(fun.y = mean, geom = "point", :
argument is not interpretable as logical
In addition: Warning message:
In if (addMean) p <- p + stat_summary(fun.y = mean, geom = "point", :
the condition has length > 1 and only the first element will be used
Grateful for any input on this!
I do not know the package from which ggplot2.boxplot comes from but I will show you how perform the requested operation in ggplot.
The requested output is a bit problematic for ggplot since you want both points and lines connecting them to be jittered by the same amount. One way to perform that is to jitter the points prior making the plot. But the x axis is discrete, here is a workaround:
b <- runif(nrow(df), -0.1, 0.1)
ggplot(df) +
geom_boxplot(aes(x = as.numeric(condition), y = pain, group = condition))+
geom_point(aes(x = as.numeric(condition) + b, y = pain)) +
geom_line(aes(x = as.numeric(condition) + b, y = pain, group = ID)) +
scale_x_continuous(breaks = c(1,2), labels = c("No Treatment", "Treatment"))+
xlab("condition")
First I have made a vector to jitter by called b, and converted the x axis to numeric so I could add b to the x axis coordinates. Latter I relabeled the x axis.
I do agree with eipi10's comment that the plot works better without jitter:
ggplot(df, aes(condition, pain)) +
geom_boxplot(width=0.3, size=1.5, fatten=1.5, colour="grey70") +
geom_point(colour="red", size=2, alpha=0.5) +
geom_line(aes(group=ID), colour="red", linetype="11") +
theme_classic()
and the updated plot with jittered points eipi10 style:
ggplot(df) +
geom_boxplot(aes(x = as.numeric(condition),
y = pain,
group = condition),
width=0.3,
size=1.5,
fatten=1.5,
colour="grey70")+
geom_point(aes(x = as.numeric(condition) + b,
y = pain),
colour="red",
size=2,
alpha=0.5) +
geom_line(aes(x = as.numeric(condition) + b,
y = pain,
group = ID),
colour="red",
linetype="11") +
scale_x_continuous(breaks = c(1,2),
labels = c("No Treatment", "Treatment"),
expand = c(0.2,0.2))+
xlab("condition") +
theme_classic()
Although I like the oldschool way of plotting with ggplot as shown by #missuse's answer, I wanted to check whether using your ggplot2.boxplot-based code this was also possible.
I loaded your data:
'data.frame': 52 obs. of 3 variables:
$ ID : int 1 3 4 5 6 7 8 9 10 11 ...
$ condition: Factor w/ 2 levels "No Treatment",..: 2 2 2 2 2 2 2 2 2 2 ...
$ pain : num 4.5 12.5 16 61.8 23.2 ...
And called your code, adding geom_line at the end as you suggested your self:
ggplot2.boxplot(data = INdata,xName = 'condition', yName = 'pain', groupName = 'condition',showLegend = FALSE,
position = "dodge",
addDot = TRUE, dotSize = 3, dotPosition = c("jitter", "jitter"), jitter = 0,
ylim = c(0,100),
backgroundColor = "white",xtitle = "",ytitle = "Pain intenstity", mainTitle = "Pain intensity",
brewerPalette = "Paired") + geom_line(aes(group = ID))
Note that I set jitter to 0. The resulting graph looks like this:
If you don't set jitter to 0, the lines still run from the middle of each boxplot, ignoring the horizontal location of the dots.
Not sure why your call gives an error. I thought it might be a factor issue, but I see that my ID variable is not factor class.
I implemented missuse's jitter solution into the ggplot2.boxplot approach in order to align the dots and lines. Instead of using "addDot", I had to instead add dots using geom_point (and lines using geom_line) after, so I could apply the same jitter vector to both dots and lines.
b <- runif(nrow(df), -0.2, 0.2)
ggplot2.boxplot(data=df,xName='condition',yName='pain',groupName='condition',showLegend=FALSE,
ylim=c(0,100),
backgroundColor="white",xtitle="",ytitle="Pain intenstity",mainTitle="Pain intensity",
brewerPalette="Paired") +
geom_point(aes(x=as.numeric(condition) + b, y=pain),colour="black",size=3, alpha=0.7) +
geom_line(aes(x=as.numeric(condition) + b, y=pain, group=ID), colour="grey30", linetype="11", alpha=0.7)
I have the following stripped-down version of my real use-case where I create a donuts plot and would like to have those long labels fitting inside the whole plot but outside the donuts:
library(ggplot2)
df <- data.frame(group = c("Cars", "Trucks", "Motorbikes"),n = c(25, 25, 50),
label2=c("Cars are blah blah blah", "Trucks some of the best in town", "Motorbikes are great if you ..."))
df$ymax = cumsum(df$n)
df$ymin = cumsum(df$n)-df$n
df$ypos = df$ymin+df$n/2
df$hjust = c(0,0,1)
df
#> df
# group n label2 ymax ymin ypos hjust
#1 Cars 25 Cars are blah blah blah 25 0 12.5 0
#2 Trucks 25 Trucks some of the best in town 50 25 37.5 0
#3 Motorbikes 50 Motorbikes are great if you ... 100 50 75.0 1
bp <- ggplot(df, aes(x="", y=n, fill=group)) + geom_rect(aes_string(ymax="ymax", ymin="ymin", xmax="2.5", xmin="2.0")) +
theme(axis.title=element_blank(),axis.text=element_blank(),panel.background = element_rect(fill = "white", colour = "grey50"),
panel.grid=element_blank(),axis.ticks.length=unit(0,"cm"),axis.ticks.margin=unit(0,"cm"),
legend.position="none",panel.spacing=unit(0,"lines"),plot.margin=unit(c(0,0,0,0),"lines"),complete=TRUE) + geom_text(ggplot2::aes_string(label="label2",x="2.6",y="ypos",hjust="hjust"))
pie <- bp + coord_polar("y", start=0) + scale_x_discrete(labels=df$label2)
pie
but no matter what I do this is what I get:
I have tried lowering the xmax="2.5" in an attempt to decrease the size of the Donuts plot and updating the x of the custom text accordingly e.g. +0.1 x="2.6". However, this leads somehow to the whole plot re-scaling back to the same situation.
I have played with various plot sizes and margins but can't find the way ...
If I understand correctly, you wish to shrink the donut relative to the entire plot area, so that there's more space to fit long annotations? I got the following to work based on the sample data. You may wish to tweak the parameters for your actual use case:
library(dplyr); library(stringr)
ggplot(df %>%
mutate(label2 = str_wrap(label2, width = 10)), #change width to adjust width of annotations
aes(x="", y=n, fill=group)) +
geom_rect(aes_string(ymax="ymax", ymin="ymin", xmax="2.5", xmin="2.0")) +
expand_limits(x = c(2, 4)) + #change x-axis range limits here
# no change to theme
theme(axis.title=element_blank(),axis.text=element_blank(),
panel.background = element_rect(fill = "white", colour = "grey50"),
panel.grid=element_blank(),
axis.ticks.length=unit(0,"cm"),axis.ticks.margin=unit(0,"cm"),
legend.position="none",panel.spacing=unit(0,"lines"),
plot.margin=unit(c(0,0,0,0),"lines"),complete=TRUE) +
geom_text(aes_string(label="label2",x="3",y="ypos",hjust="hjust")) +
coord_polar("y", start=0) +
scale_x_discrete()
this must be a FAQ, but I can't find an exactly similar example in the other answers (feel free to close this if you can point a similar Q&A). I'm still a newbie with ggplot2 and can't seem to wrap my head around it quite so easily.
I have 2 data.frames (that come from separate mixed models) and I'm trying to plot them both into the same graph. The data.frames are:
newdat
id Type pred SE
1 1 15.11285 0.6966029
2 1 13.68750 0.9756909
3 1 13.87565 0.6140860
4 1 14.61304 0.6187750
5 1 16.33315 0.6140860
6 1 16.19740 0.6140860
1 2 14.88805 0.6966029
2 2 13.46270 0.9756909
3 2 13.65085 0.6140860
4 2 14.38824 0.6187750
5 2 16.10835 0.6140860
6 2 15.97260 0.6140860
and
newdat2
id pred SE
1 14.98300 0.6960460
2 13.25893 0.9872502
3 13.67650 0.6150701
4 14.39590 0.6178266
5 16.37662 0.6171588
6 16.08426 0.6152017
As you can see, the second data.frame doesn't have Type, whereas the first does, and therefore has 2 values for each id.
What I can do with ggplot, is plot either one, like this:
fig1
fig2
As you can see, in fig 1 ids are stacked by Type on the x-axis to form two groups of 6 ids. However, in fig 2 there is no Type, but instead just the 6 ids.
What I would like to accomplish is to plot fig2 to the left/right of fig1 with similar grouping. So the resulting plot would look like fig 1 but with 3 groups of 6 ids.
The problem is also, that I need to label and organize the resulting figure so that for newdat the x-axis would include a label for "model1" and for newdat2 a label for "model2", or some similar indicator that they are from different models. And to make things even worse, I need some labels for Type in newdat.
My (hopefully) reproducible (but obviously very bad) code for fig 1:
library(ggplot2)
pd <- position_dodge(width=0.6)
ggplot(newdat,aes(x=Type,y=newdat$pred,colour=id))+
geom_point(position=pd, size=5)
geom_linerange(aes(ymin=newdat$pred-1.96*SE,ymax=newdat$pred+1.96*SE), position=pd, size=1.5, linetype=1) +
theme_bw() +
scale_colour_grey(start = 0, end = .8, name="id") +
coord_cartesian(ylim=c(11, 18)) +
scale_y_continuous(breaks=seq(10, 20, 1)) +
scale_x_discrete(name="Type", limits=c("1","2"))
Code for fig 2 is identical, but without the limits in the last line and with id defined for x-axis in ggplot(aes())
As I understand it, defining stuff at ggplot() makes that stuff "standard" along the whole graph, and I've tried to remove the common stuff and separately define geom_point and geom_linerange for both newdat and newdat2, but no luck so far... Any help is much appreciated, as I'm completely stuck.
How about adding first adding some new variables to each dataset and then combining them:
newdat$model <- "model1"
newdat2$model <- "model2"
newdat2$Type <- 3
df <- rbind(newdat, newdat2)
# head(df)
Then we can plot with:
library(ggplot2)
ggplot(df, aes(x = interaction(model, factor(Type)), y = pred, color = factor(id))) +
geom_point(position = position_dodge(width = 0.6), size = 5) +
geom_linerange(aes(ymin = pred - 1.96 * SE, ymax = pred + 1.96 * SE),
position = position_dodge(width = 0.6),
size = 1.5, linetype = 1)
Alternatively, you pass an additional aesthetic to geom_linerange to further delineate the model type:
ggplot(df, aes(x = interaction(model, factor(Type)), y = pred, color = factor(id))) +
geom_point(position = position_dodge(width = 0.6), size = 5) +
geom_linerange(aes(ymin = pred - 1.96 * SE, ymax = pred + 1.96 * SE, linetype = model),
position = position_dodge(width = 0.6),
size = 1.5)
Finally, you may want to considered facets:
ggplot(df, aes(x = interaction(model, factor(Type)), y = pred, color = factor(id))) +
geom_point(position = position_dodge(width = 0.6), size = 5) +
geom_linerange(aes(ymin = pred - 1.96 * SE, ymax = pred + 1.96 * SE),
position = position_dodge(width = 0.6),
size = 1.5) +
facet_wrap(~ id)
I have a data set where I need to represent a stacked bar chart for two cohorts over three time periods. Currently, I am faceting by year, and filling based on probability values for my DV (# of times,t, that someone goes to a nursing home; pr that t=0, t=1, ... t >= 5). I am trying to figure out if it is possible to introduce another color scale, so that each of the "Comparison" bars would be filled with a yellow gradient, and the treatmetn bars would be filled with a blue gradient. I figure the best way to do this may to be to overlay the two plots, but I'm not sure if it is possible to do this in ggplot (or some other package.) Code and screenshot are below:
tempPlot <- ggplot(tempDF,aes(x = HBPCI, y = margin, fill=factor(prob))) +
scale_x_continuous(breaks=c(0,1), labels=c("Comparison", "Treatment"))+
scale_y_continuous(labels = percent_format())+
ylab("Prob snf= x")+
xlab("Program Year")+
ggtitle(tempFlag)+
geom_bar(stat="identity")+
scale_fill_brewer(palette = "Blues")+ #can change the color scheme here.
theme(axis.title.y =element_text(vjust=1.5, size=11))+
theme(axis.title.x =element_text(vjust=0.1, size=11))+
theme(axis.text.x = element_text(size=10,angle=-45,hjust=.5,vjust=.5))+
theme(axis.text.y = element_text(size=10,angle=0,hjust=1,vjust=0))+
facet_grid(~yearQual, scales="fixed")
You may want to consider using interaction() -- here's a reproducible solution:
year <- c("BP", "PY1", "PY2")
type <- c("comparison", "treatment")
df <- data.frame(year = sample(year, 100, T),
type = sample(type, 100, T),
marg = abs(rnorm(100)),
fact = sample(1:5, 100, T))
head(df)
# year type marg fact
# 1 BP comparison 0.2794279 3
# 2 PY2 comparison 1.6776371 1
# 3 BP comparison 0.8301721 2
# 4 PY1 treatment 0.6900511 1
# 5 PY2 comparison 0.6857421 3
# 6 PY1 treatment 1.4835672 3
library(ggplot2)
blues <- RColorBrewer::brewer.pal(5, "Blues")
oranges <- RColorBrewer::brewer.pal(5, "Oranges")
ggplot(df, aes(x = type, y = marg, fill = interaction(factor(fact), type))) +
geom_bar(stat = "identity") +
facet_wrap(~ year) +
scale_fill_manual(values = c(blues, oranges))