Implementing paired lines into boxplot.ggplot2 - r

I have a set of paired data, and I'm using ggplot2.boxplot (of the easyGgplot2 package) with added (jittered) individual data points:
ggplot2.boxplot(data=INdata,xName='condition',yName='pain',groupName='condition',showLegend=FALSE,
position="dodge",
addDot=TRUE,dotSize=3,dotPosition=c("jitter", "jitter"),jitter=0.2,
ylim=c(0,100),
backgroundColor="white",xtitle="",ytitle="Pain intensity",mainTitle="Pain intensity",
brewerPalette="Paired")
INdata:
ID,condition,pain
1,Treatment,4.5
3,Treatment,12.5
4,Treatment,16
5,Treatment,61.75
6,Treatment,23.25
7,Treatment,5.75
8,Treatment,5.75
9,Treatment,5.75
10,Treatment,44.5
11,Treatment,7.25
12,Treatment,40.75
13,Treatment,17.25
14,Treatment,2.75
15,Treatment,15.5
16,Treatment,15
17,Treatment,25.75
18,Treatment,17
19,Treatment,26.5
20,Treatment,27
21,Treatment,37.75
22,Treatment,26.5
23,Treatment,15.5
25,Treatment,1.25
26,Treatment,5.75
27,Treatment,25
29,Treatment,7.5
1,No Treatment,34.5
3,No Treatment,46.5
4,No Treatment,34.5
5,No Treatment,34
6,No Treatment,65
7,No Treatment,35.5
8,No Treatment,48.5
9,No Treatment,35.5
10,No Treatment,54.5
11,No Treatment,7
12,No Treatment,39.5
13,No Treatment,23
14,No Treatment,11
15,No Treatment,34
16,No Treatment,15
17,No Treatment,43.5
18,No Treatment,39.5
19,No Treatment,73.5
20,No Treatment,28
21,No Treatment,12
22,No Treatment,30.5
23,No Treatment,33.5
25,No Treatment,20.5
26,No Treatment,14
27,No Treatment,49.5
29,No Treatment,7
The resulting plot looks like this:
However, since this is paired data, I want to represent this in the plot - specifically to add lines between paired datapoints. I've tried adding
... + geom_line(aes(group = ID))
but I am not able to work this into the ggplot2.boxplot code. Instead, I get this error:
Error in if (addMean) p <- p + stat_summary(fun.y = mean, geom = "point", :
argument is not interpretable as logical
In addition: Warning message:
In if (addMean) p <- p + stat_summary(fun.y = mean, geom = "point", :
the condition has length > 1 and only the first element will be used
Grateful for any input on this!

I do not know which package ggplot2.boxplot comes from, but I will show you how to perform the requested operation in ggplot.
The requested output is a bit problematic for ggplot, since you want both the points and the lines connecting them to be jittered by the same amount. One way to do that is to jitter the points before making the plot. Because the x axis is discrete, this needs a workaround:
library(ggplot2)
# One horizontal offset per row, shared by the points and the lines
b <- runif(nrow(df), -0.1, 0.1)
ggplot(df) +
  geom_boxplot(aes(x = as.numeric(condition), y = pain, group = condition)) +
  geom_point(aes(x = as.numeric(condition) + b, y = pain)) +
  geom_line(aes(x = as.numeric(condition) + b, y = pain, group = ID)) +
  scale_x_continuous(breaks = c(1, 2), labels = c("No Treatment", "Treatment")) +
  xlab("condition")
First I made a vector of jitter offsets called b and converted the x axis to numeric, so that b could be added to the x coordinates. Later I relabeled the x axis.
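As a minimal sketch of the same idea, the offsets can also be stored as a column so that both layers read identical x positions. The three-subject data frame and the column name x_jit are assumptions made for illustration (the values are copied from IDs 1, 3 and 4 of the question). Note that as.numeric(condition) only gives 1 and 2 when condition is a factor; read.csv in R 4.0 and later returns character columns by default, which would need factor() first.

```r
library(ggplot2)

# Small stand-in for the question's data (IDs 1, 3 and 4 only).
df <- data.frame(
  ID        = rep(c(1, 3, 4), times = 2),
  condition = factor(rep(c("Treatment", "No Treatment"), each = 3)),
  pain      = c(4.5, 12.5, 16, 34.5, 46.5, 34.5)
)

# as.numeric() on a factor returns the level codes (1 = "No Treatment",
# 2 = "Treatment"); on a character column it would return NA instead.
set.seed(1)  # only to make the jitter reproducible
df$x_jit <- as.numeric(df$condition) + runif(nrow(df), -0.1, 0.1)

# Points and lines both use x_jit, so each line ends exactly on its points.
p <- ggplot(df, aes(y = pain)) +
  geom_boxplot(aes(x = as.numeric(condition), group = condition)) +
  geom_point(aes(x = x_jit)) +
  geom_line(aes(x = x_jit, group = ID)) +
  scale_x_continuous(breaks = 1:2, labels = levels(df$condition)) +
  xlab("condition")
p
```

Taking the axis labels from levels(df$condition) keeps them in step with the numeric codes, instead of typing them by hand.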
I do agree with eipi10's comment that the plot works better without jitter:
ggplot(df, aes(condition, pain)) +
geom_boxplot(width=0.3, size=1.5, fatten=1.5, colour="grey70") +
geom_point(colour="red", size=2, alpha=0.5) +
geom_line(aes(group=ID), colour="red", linetype="11") +
theme_classic()
and the updated plot with jittered points eipi10 style:
ggplot(df) +
geom_boxplot(aes(x = as.numeric(condition),
y = pain,
group = condition),
width=0.3,
size=1.5,
fatten=1.5,
colour="grey70")+
geom_point(aes(x = as.numeric(condition) + b,
y = pain),
colour="red",
size=2,
alpha=0.5) +
geom_line(aes(x = as.numeric(condition) + b,
y = pain,
group = ID),
colour="red",
linetype="11") +
scale_x_continuous(breaks = c(1,2),
labels = c("No Treatment", "Treatment"),
expand = c(0.2,0.2))+
xlab("condition") +
theme_classic()

Although I like the old-school way of plotting with ggplot shown in @missuse's answer, I wanted to check whether this is also possible with your ggplot2.boxplot-based code.
I loaded your data:
'data.frame': 52 obs. of 3 variables:
$ ID : int 1 3 4 5 6 7 8 9 10 11 ...
$ condition: Factor w/ 2 levels "No Treatment",..: 2 2 2 2 2 2 2 2 2 2 ...
$ pain : num 4.5 12.5 16 61.8 23.2 ...
And called your code, adding geom_line at the end as you suggested yourself:
ggplot2.boxplot(data = INdata,xName = 'condition', yName = 'pain', groupName = 'condition',showLegend = FALSE,
position = "dodge",
addDot = TRUE, dotSize = 3, dotPosition = c("jitter", "jitter"), jitter = 0,
ylim = c(0,100),
backgroundColor = "white",xtitle = "",ytitle = "Pain intensity", mainTitle = "Pain intensity",
brewerPalette = "Paired") + geom_line(aes(group = ID))
Note that I set jitter to 0. The resulting graph looks like this:
If you don't set jitter to 0, the lines still run from the middle of each boxplot, ignoring the horizontal location of the dots.
Not sure why your call gives an error. I thought it might be a factor issue, but I see that my ID variable is not factor class.

I implemented missuse's jitter solution in the ggplot2.boxplot approach in order to align the dots and lines. Instead of using "addDot", I had to add the dots with geom_point (and the lines with geom_line) afterwards, so that I could apply the same jitter vector to both.
b <- runif(nrow(df), -0.2, 0.2)
ggplot2.boxplot(data=df,xName='condition',yName='pain',groupName='condition',showLegend=FALSE,
ylim=c(0,100),
backgroundColor="white",xtitle="",ytitle="Pain intensity",mainTitle="Pain intensity",
brewerPalette="Paired") +
geom_point(aes(x=as.numeric(condition) + b, y=pain),colour="black",size=3, alpha=0.7) +
geom_line(aes(x=as.numeric(condition) + b, y=pain, group=ID), colour="grey30", linetype="11", alpha=0.7)

Related

Reordering data in ggplot creates mismatch between data and bars

I have the following data based off responses to a survey:
1 ABSENCE OF BULLYING 0.665
2 SENSE OF SAFETY 0.614
3 FAIRNESS OF DISCIPLINE 0.677
4 FEELINGS OF EMOTIONAL SAFETY 0.585
5 PERCEPTION OF EQUITABLE TREATMENT 0.691
6 COMFORT OF PHYSICAL ENVIRONMENT 0.509
I want to create a bar graph with this data and cut down the excess text in the labels, so I did the following:
fig4 = read_excel(data)
order = c("bullying", "safety", "discipline", "emotional", "equity", "environment")
fig4$domain = as.factor(order)
levels(fig4$domain) = order
ggplot(fig4, aes(x = reorder(domain, desc(domain)), y = pct_responses)) +
geom_bar(position = "dodge", stat="identity", width=0.5, fill="#3bbae0") +
vertical_theme +
labs(y = "% of Affirmative Responses", x = "") +
scale_y_continuous(expand = c(0,0),
limits = c(0,1),
breaks = seq(0,1,0.2),
labels = function(x) paste0(x*100, "%")) +
coord_flip() +
geom_text(aes(label= paste0(round(pct_responses*100),"%")), position = position_dodge(width = 0.5), size=scale_factor * 6, hjust=-0.25)
However, this created a graph where the bars don't match with the right axis label. Rows 2, 3, and 6 have the wrong numbers.
I fixed the problem by using this code instead:
fig4$domain = c("bullying", "safety", "discipline", "emotional", "equity", "environment")
fig4$domain = factor(fig4$domain, levels = fig4$domain)
This solved the issue. However, I'm not sure why the first way I did it messed up my graph. Can someone please explain what happened?
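One likely explanation, sketched here in base R with only the six short labels from the question (the variable names are made up for illustration): as.factor() sorts its levels alphabetically, and assigning to levels() afterwards renames those sorted levels position by position rather than reordering them.

```r
labels_in_row_order <- c("bullying", "safety", "discipline",
                         "emotional", "equity", "environment")

# First approach: as.factor() sorts the levels alphabetically ...
bad <- as.factor(labels_in_row_order)
levels(bad)

# ... and `levels<-` then renames those sorted levels position by position,
# so some rows silently end up with a different label.
levels(bad) <- labels_in_row_order
as.character(bad)

# Second approach: the level order is given when the factor is created,
# so every row keeps its own label and only the ordering changes.
good <- factor(labels_in_row_order, levels = labels_in_row_order)
as.character(good)
```

Comparing as.character(bad) with as.character(good) shows which rows were relabelled, which would explain bars appearing next to the wrong axis label.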

Restrict stat_ellipse to certain data points

I have a dataset with four categories of vowel, akin to the following:
speaker vowel_category f1 f2
1 a x x
1 b x x
1 c x x
1 d x x
2 a x x
2 b x x...
This geom_point code plots them all on one graph with stat_ellipse and is 90% of what I need:
ggplot(data = topicsubset_ikf, aes(x = F2, y = F1, shape = CATEGORY)) +
geom_point() +
scale_y_reverse() +
scale_x_reverse() +
xlab("F2") +
ylab("F1") +
labs(title = "All speakers with KIT and FLEECE tokens") +
coord_cartesian(xlim = c(1.9, 1.1), ylim = c(0.3, 1.5)) +
facet_wrap(~ SPEAKER) +
scale_color_manual(values = c("#000000", "#FF8F00", "#000000", "#A200FF")) +
stat_ellipse(geom = "polygon", alpha = 1 / 2, aes(fill = CATEGORY))
However, it would be ideal if I could draw ellipses round just two of the four categories (say, a and b), rather than all 4, so I can look at the spread of c & d relative to a and b. I haven't been able to find a way so far - I've tried combining multiple datasets on one graph to no avail. Any suggestions?
I had the same problem. I found that you can now just specify the group in stat_ellipse, like this:
stat_ellipse(geom = "polygon", alpha = 1 / 2, aes(fill = CATEGORY, group = CATEGORY))
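Another option, shown here only as a sketch, is to give the ellipse layer its own data argument containing just the categories of interest, since every ggplot2 layer accepts one. The data frame below is randomly generated stand-in data, not the asker's measurements, and the names vowels and ellipse_data are made up for the example.

```r
library(ggplot2)

# Random stand-in data using the column names from the question's code.
set.seed(1)
vowels <- data.frame(
  SPEAKER  = rep(1:2, each = 40),
  CATEGORY = rep(c("a", "b", "c", "d"), times = 20),
  F1       = runif(80, 0.3, 1.5),
  F2       = runif(80, 1.1, 1.9)
)

# Only categories a and b are handed to the ellipse layer.
ellipse_data <- subset(vowels, CATEGORY %in% c("a", "b"))

p <- ggplot(vowels, aes(x = F2, y = F1, shape = CATEGORY)) +
  geom_point() +                                   # all four categories
  stat_ellipse(data = ellipse_data, geom = "polygon", alpha = 1 / 2,
               aes(fill = CATEGORY)) +             # ellipses for a and b only
  scale_y_reverse() +
  scale_x_reverse() +
  facet_wrap(~ SPEAKER)
p
```

The points layer still uses the full data, so c and d remain visible against the a and b ellipses.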

R, ggplot: Change linetype within a series

I am using ggplot geom_smooth to plot turnover data of a customer group from the previous year against the current year (based on calendar weeks). As the last week is not complete, I would like to use a dashed linetype for the last week. However, I can't figure out how to do that. I can change the linetype for the entire plot or for an entire series, but not within a series (depending on the value of x).
To keep it simple, let's just use the following example:
set.seed(42)
frame <- data.frame(series = rep(c('a','b'),50),x = 1:100, y = runif(100))
ggplot(frame,aes(x = x,y = y, group = series, color=series)) +
geom_smooth(size=1.5, se=FALSE)
How would I have to change this to get dashed lines for x >= 75?
The goal would be something like this:
Thx very much for any help!
Edit, 2016-03-05
Of course I fail when trying to use this method on the original plot. The problem lies with the ribbon, which is calculated using stat_summary and a predefined function. I tried to use stat_summary on the original data (mdf) and geom_line on the smooth_data. Even when I comment out everything else, I still get "Error: Continuous value supplied to discrete scale". I believe the problem comes from the fact that the original x value (Kalenderwoche) was discrete, whereas the new, smoothed x is continuous. Do I have to somehow transform one into the other? What else could I do?
Here is what I tried (condensed to the essential lines):
quartiles <- function(x) {
x <- na.omit(x) # remove NA values
median <- median(x)
q1 <- quantile(x,0.25)
q3 <- quantile(x,0.75)
data.frame(y = median, ymin = median, ymax = q3)
}
g <- ggplot(mdf, aes(x=Kalenderwoche, y=value, group=variable, colour=variable,fill=variable))+
geom_smooth(size=1.5, method="auto", se=FALSE)
# Take out the data for smooth line
smooth_data <- ggplot_build(g)$data[[1]]
ggplot(mdf, aes(x=Kalenderwoche, y=value, group=variable, colour=variable,fill=variable))+
stat_summary(fun.data = quartiles,geom="ribbon", colour="NA", alpha=0.25)+
geom_line(data=smooth_data, aes(x=x, y=y, group=group, colour=group, fill=group))
mdf looks like this:
str(mdf)
'data.frame': 280086 obs. of 5 variables:
$ konto_id : int 1 1 1 1 1 1 1 1 1 1 ...
$ Kalenderwoche: Factor w/ 14 levels "2015-48","2015-49",..: 4 12 1 3 7 13 10 6 5 9 ...
$ variable : Factor w/ 2 levels "Umsatz","Umsatz Vorjahr": 1 1 1 1 1 1 1 1 1 1 ...
$ value : num 0 428.3 97.8 76 793.1 ...
There are many accounts (konto_id), and for each account and calendar week (Kalenderwoche), there is a current turnover value (Umsatz) and a turnover value from last year (Umsatz Vorjahr). I can provide a smaller version of the data.frame and the entire code, if required.
Thx very much for any help!
P.S. I am a total novice in R, so my code probably looks rather stupid to pros, sorry for that :(
Edit, 2016-03-06
I have uploaded a subset of the data (mdf):
mdf
The full code of the original graph is the following (looking somewhat weird with so little data, but that's not the point ;)
library(dtw)
library(reshape2)
library(ggplot2)
library(RODBC)
library(Cairo)
# custom breaks for X axis
breaks.custom <- unique(mdf$Kalenderwoche)[c(TRUE,rep(FALSE,0))]
# function called by stat_summary
quartiles <- function(x) {
x <- na.omit(x)
median <- median(x)
q1 <- quantile(x,0.25)
q3 <- quantile(x,0.75)
data.frame(y = median, ymin = median, ymax = q3)
}
# Positions for guidelines and labels
horizontal.center <- (length(unique(mdf$Kalenderwoche))+1)/2
kw.horizontal.center <- as.vector(sort(unique(mdf$Kalenderwoche))[c(horizontal.center-0.5,horizontal.center+0.5)])
vpos.P75.label <- max(quantile(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[1]],0.75)
,quantile(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[2]],0.75))+10
# use the higher P75 value of the two weeks around the center
vpos.mean.label <- min(mean(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[1]])
,mean(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[2]]))-10
vpos.median.label <- min(median(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[1]])
,median(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[2]]))-10
hpos.vline <- which(as.vector(sort(unique(mdf$Kalenderwoche))=="2016-03"))
# custom colour palette (2 colors)
cbPaletteLine <- c("#DA2626", "#2626DA")
cbPaletteFill <- c("#F0A8A8", "#7C7CE9")
# ggplot
ggplot(mdf, aes(x=Kalenderwoche, y=value, group=variable, colour=variable,fill=variable))+
geom_smooth(size=1.5, method="auto", se=FALSE)+
# SE=FALSE to suppress drawing of the SE of the fit.SE of the data shall be used instead:
stat_summary(fun.data = quartiles,geom="ribbon", colour="NA", alpha=0.25)+
scale_x_discrete(breaks=breaks.custom)+
scale_colour_manual(values=cbPaletteLine)+
scale_fill_manual(values=cbPaletteFill)+
#coord_cartesian(ylim = c(0, 250)) +
theme(legend.title = element_blank(), title = element_text(face="bold", size=12))+
#scale_color_brewer(palette="Dark2")+
labs(title = "Tranche 1", x = "Kalenderwoche", y = "Konto-Umsatz [CHF]")+
geom_vline(xintercept = hpos.vline, linetype=2)+
annotate("text", x=horizontal.center, y=vpos.median.label, label = "Median", size=4)+
annotate("text", x=horizontal.center, y=vpos.mean.label, label= "Mean", size=4)+
annotate("text", x=horizontal.center, y=vpos.P75.label, label = "P75%", size=4)+
theme(axis.text.x=element_text(angle = 90, hjust = 0.5, vjust = 0.5))
Edit, 2016-03-06
The final plot now looks like this (thx, Jason!!)
I am not sure how to smooth all the data and still use different line types for subsets within geom_smooth itself. My idea is to pull out the data that ggplot used to draw the smooth and reproduce it with geom_line. This is how I did it:
set.seed(42)
frame <- data.frame(series=rep(c('a','b'), 50),
x = 1:100, y = runif(100))
library(ggplot2)
g <- ggplot(frame, aes(x=x, y=y, color=series)) + geom_smooth(se=FALSE)
# Take out the data for smooth line
smooth_data <- ggplot_build(g)$data[[1]]
ggplot(smooth_data[smooth_data$x <= 76, ], aes(x=x, y=y, color=as.factor(group), group=group)) +
geom_line(size=1.5) +
geom_line(data=smooth_data[smooth_data$x >= 74, ], linetype="dashed", size=1.5) +
scale_color_discrete("Series", breaks=c("1", "2"), labels=c("a", "b"))
You're right. The problem is that you add a layer with a continuous x to a plot whose original layer has a discrete x. One way to deal with this is a lookup table, which is easy here because the discrete x positions are simply the sequence 1 to 14. We can convert back to the discrete x by indexing. It should work if you add the following to your code:
level <- levels(mdf$Kalenderwoche)
ggplot(mdf, aes(x=Kalenderwoche, y=value, group=variable, colour=variable,fill=variable))+
stat_summary(fun.data = quartiles,geom="ribbon", colour="NA", alpha=0.25) +
geom_line(data=smooth_data, aes(x=level[x], y=y, group=group, colour=as.factor(group), fill=NA))
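The indexing step above can be tried in isolation. This is a small base-R sketch with a made-up three-level factor (weeks and x_smooth are hypothetical names). One thing to be aware of is that R truncates fractional indices toward zero, so a smoothed x of 1.9 is looked up as level 1 unless it is rounded first.

```r
# Hypothetical factor with three calendar-week levels.
weeks <- factor(c("2015-48", "2015-49", "2015-50"))
level <- levels(weeks)

# Continuous positions of the kind a smoother returns.
x_smooth <- c(1, 1.9, 2.2, 3)

level[x_smooth]          # fractional indices are truncated toward zero
level[round(x_smooth)]   # rounds to the nearest level instead
```

Whether truncation or rounding is the better choice depends on how the smoothed positions should snap to the weeks; the answer's code uses plain indexing.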
Here is my attempt for the question:
g <- ggplot(mdf, aes(x=Kalenderwoche, y=value, group=variable, colour=variable,fill=variable)) +
geom_smooth(size=1.5, method="auto", se=FALSE) +
# SE=FALSE to suppress drawing of the SE of the fit.SE of the data shall be used instead:
stat_summary(fun.data = quartiles,geom="ribbon", colour="NA", alpha=0.25)
smooth_data <- ggplot_build(g)$data[[1]]
ribbon_data <- ggplot_build(g)$data[[2]]
# Use them as lookup table
level <- levels(mdf$Kalenderwoche)
clevel <- levels(mdf$variable)
ggplot(smooth_data[smooth_data$x <= 13, ], aes(x=level[x], y=y, group=group, color=as.factor(clevel[group]))) +
geom_line(size=1.5) +
geom_line(data=smooth_data[smooth_data$x >= 13, ], linetype="dashed", size=1.5) +
geom_ribbon(data=ribbon_data,
aes(x=x, ymin=ymin, ymax=ymax, fill=as.factor(clevel[group]), color=NA), alpha=0.25) +
scale_x_discrete(breaks=breaks.custom) +
scale_colour_manual(values=cbPaletteLine) +
scale_fill_manual(values=cbPaletteFill) +
#coord_cartesian(ylim = c(0, 250)) +
theme(legend.title = element_blank(), title = element_text(face="bold", size=12))+
#scale_color_brewer(palette="Dark2")+
labs(title = "Tranche 1", x = "Kalenderwoche", y = "Konto-Umsatz [CHF]")+
geom_vline(xintercept = hpos.vline, linetype=2)+
annotate("text", x=horizontal.center, y=vpos.median.label, label = "Median", size=4)+
annotate("text", x=horizontal.center, y=vpos.mean.label, label= "Mean", size=4)+
annotate("text", x=horizontal.center, y=vpos.P75.label, label = "P75%", size=4)+
theme(axis.text.x=element_text(angle = 90, hjust = 0.5, vjust = 0.5))
Note that the legend keys now have a border line.

ggplot2 plot two data sets into one picture

This must be a FAQ, but I can't find an exactly similar example in the other answers (feel free to close this if you can point to a similar Q&A). I'm still a newbie with ggplot2 and can't seem to wrap my head around it quite so easily.
I have 2 data.frames (that come from separate mixed models) and I'm trying to plot them both into the same graph. The data.frames are:
newdat
id Type pred SE
1 1 15.11285 0.6966029
2 1 13.68750 0.9756909
3 1 13.87565 0.6140860
4 1 14.61304 0.6187750
5 1 16.33315 0.6140860
6 1 16.19740 0.6140860
1 2 14.88805 0.6966029
2 2 13.46270 0.9756909
3 2 13.65085 0.6140860
4 2 14.38824 0.6187750
5 2 16.10835 0.6140860
6 2 15.97260 0.6140860
and
newdat2
id pred SE
1 14.98300 0.6960460
2 13.25893 0.9872502
3 13.67650 0.6150701
4 14.39590 0.6178266
5 16.37662 0.6171588
6 16.08426 0.6152017
As you can see, the second data.frame doesn't have Type, whereas the first does, and therefore has 2 values for each id.
What I can do with ggplot, is plot either one, like this:
fig1
fig2
As you can see, in fig 1 ids are stacked by Type on the x-axis to form two groups of 6 ids. However, in fig 2 there is no Type, but instead just the 6 ids.
What I would like to accomplish is to plot fig2 to the left/right of fig1 with similar grouping. So the resulting plot would look like fig 1 but with 3 groups of 6 ids.
The problem is also, that I need to label and organize the resulting figure so that for newdat the x-axis would include a label for "model1" and for newdat2 a label for "model2", or some similar indicator that they are from different models. And to make things even worse, I need some labels for Type in newdat.
My (hopefully) reproducible (but obviously very bad) code for fig 1:
library(ggplot2)
pd <- position_dodge(width=0.6)
ggplot(newdat,aes(x=Type,y=pred,colour=id))+
geom_point(position=pd, size=5) +
geom_linerange(aes(ymin=pred-1.96*SE,ymax=pred+1.96*SE), position=pd, size=1.5, linetype=1) +
theme_bw() +
scale_colour_grey(start = 0, end = .8, name="id") +
coord_cartesian(ylim=c(11, 18)) +
scale_y_continuous(breaks=seq(10, 20, 1)) +
scale_x_discrete(name="Type", limits=c("1","2"))
Code for fig 2 is identical, but without the limits in the last line and with id defined for x-axis in ggplot(aes())
As I understand it, defining stuff at ggplot() makes that stuff "standard" along the whole graph, and I've tried to remove the common stuff and separately define geom_point and geom_linerange for both newdat and newdat2, but no luck so far... Any help is much appreciated, as I'm completely stuck.
How about first adding some new variables to each dataset and then combining them:
newdat$model <- "model1"
newdat2$model <- "model2"
newdat2$Type <- 3
df <- rbind(newdat, newdat2)
# head(df)
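The combining step can be checked on a small stand-in before plotting. In this sketch the pred and SE values are placeholders, not the model output from the question; the point is only that rbind matches data frame columns by name, so the different column order in newdat2 does not matter.

```r
# Placeholder data with the same columns as in the question.
newdat  <- data.frame(id = rep(1:6, times = 2), Type = rep(1:2, each = 6),
                      pred = 14, SE = 0.6)
newdat2 <- data.frame(id = 1:6, pred = 14, SE = 0.6)

newdat$model  <- "model1"
newdat2$model <- "model2"
newdat2$Type  <- 3          # dummy Type so both frames have the same columns

df <- rbind(newdat, newdat2)   # columns are matched by name

# interaction() builds every model/Type combination; drop = TRUE keeps
# only the three that actually occur in the data.
table(interaction(df$model, factor(df$Type), drop = TRUE))
```

Without drop = TRUE the interaction has six levels, three of them empty; ggplot2's discrete scales drop unused levels by default, so only the occupied ones appear on the axis.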
Then we can plot with:
library(ggplot2)
ggplot(df, aes(x = interaction(model, factor(Type)), y = pred, color = factor(id))) +
geom_point(position = position_dodge(width = 0.6), size = 5) +
geom_linerange(aes(ymin = pred - 1.96 * SE, ymax = pred + 1.96 * SE),
position = position_dodge(width = 0.6),
size = 1.5, linetype = 1)
Alternatively, you can pass an additional aesthetic to geom_linerange to further delineate the model type:
ggplot(df, aes(x = interaction(model, factor(Type)), y = pred, color = factor(id))) +
geom_point(position = position_dodge(width = 0.6), size = 5) +
geom_linerange(aes(ymin = pred - 1.96 * SE, ymax = pred + 1.96 * SE, linetype = model),
position = position_dodge(width = 0.6),
size = 1.5)
Finally, you may want to consider facets:
ggplot(df, aes(x = interaction(model, factor(Type)), y = pred, color = factor(id))) +
geom_point(position = position_dodge(width = 0.6), size = 5) +
geom_linerange(aes(ymin = pred - 1.96 * SE, ymax = pred + 1.96 * SE),
position = position_dodge(width = 0.6),
size = 1.5) +
facet_wrap(~ id)

geom_lines not linking what they should with error bars plot in ggplot

I have the following dataset ready to plot an error bars and lines graph
> growth
treatment class variable N value sd se ci
1 elevated Dominant RBAI2012 18 0.014127713 0.009739951 0.002295728 0.004843564
2 elevated Dominant RBAI2013 18 0.021869978 0.013578741 0.003200540 0.006752549
3 elevated Codominant RBAI2012 40 0.011564725 0.013718591 0.002169100 0.004387418
4 elevated Codominant RBAI2013 41 0.011471512 0.011091167 0.001732149 0.003500804
5 elevated Subordinate RBAI2012 24 0.004419784 0.009286883 0.001895677 0.003921507
6 elevated Subordinate RBAI2013 24 0.004397105 0.008704831 0.001776866 0.003675728
7 ambient Dominant RBAI2012 13 0.025836265 0.011880315 0.003295007 0.007179203
8 ambient Dominant RBAI2013 13 0.025992636 0.015162901 0.004205432 0.009162850
9 ambient Codominant RBAI2012 26 0.018067329 0.011830940 0.002320238 0.004778620
10 ambient Codominant RBAI2013 26 0.015595275 0.012467140 0.002445007 0.005035587
11 ambient Subordinate RBAI2012 33 0.006073904 0.008287442 0.001442658 0.002938599
12 ambient Subordinate RBAI2013 35 0.003239033 0.006846507 0.001157271 0.002351857
I've tried the following code, resulting this plot:
p <- ggplot(growth,aes(class,value,colour=treatment,group=variable))
pd<-position_dodge(.9)
# se= standard error; ci=confidence interval
p + geom_errorbar(aes(ymin=value-se,ymax=value+se),width=.1,position=pd,colour="black") + geom_point(position=pd,size=4) + geom_line(position=pd) +
theme_bw() + theme(legend.position=c(1,1),legend.justification=c(1,1))
The lines should link the points of the same color within each x-axis category, but clearly they don't. Could you please help me draw the lines properly (e.g. blue with blue and red with red within the "Dominant" class, and different lines for the "Codominant" class)?
Also, do you know how to include in the x-axis labels the variables I am grouping by (i.e. "RBAI2012", "RBAI2013")?
Many thanks
To also distinguish between the levels of 'variable', you can introduce a fourth aesthetic: shape. First define a new grouping variable, a combination of 'treatment' and 'variable', which has four levels. Map group, colour and shape to this variable. Then use scale_colour_manual and scale_shape_manual to set two colours, corresponding to the two levels of 'treatment', and likewise two shapes for the two levels of 'variable'.
pd <- position_dodge(.9)  # same dodge as in the question
growth$grp <- paste0(growth$treatment, growth$variable)
ggplot(data = growth, aes(x = class, y = value, group = grp,
colour = grp, shape = grp)) +
geom_point(size = 4, position = pd) +
geom_line(position = pd) +
geom_errorbar(aes(ymin = value - se, ymax = value + se), colour = "black",
position = pd, width = 0.1) +
scale_colour_manual(name = "Treatment:Variable",
values = c("red", "red","blue", "blue")) +
scale_shape_manual(name = "Treatment:Variable",
values = c(19, 17, 19, 17)) +
theme_bw() +
theme(legend.position = c(1,1), legend.justification = c(1,1))
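The positional colour and shape vectors above depend on the alphabetical order of the four grp values. As a hedged variation, the sketch below builds named vectors so that each level is tied to its colour and shape explicitly. The combos data frame and the names grp_colours and grp_shapes are made up for illustration.

```r
# Stand-in for the four treatment/variable combinations in `growth`.
combos <- expand.grid(treatment = c("elevated", "ambient"),
                      variable  = c("RBAI2012", "RBAI2013"),
                      stringsAsFactors = FALSE)
combos$grp <- paste0(combos$treatment, combos$variable)

# A character variable is ordered alphabetically on a discrete scale.
grp_levels <- sort(unique(combos$grp))

# Named vectors: colour follows treatment, shape follows variable.
grp_colours <- setNames(ifelse(startsWith(grp_levels, "ambient"), "red", "blue"),
                        grp_levels)
grp_shapes  <- setNames(ifelse(endsWith(grp_levels, "2012"), 19, 17),
                        grp_levels)
grp_colours
grp_shapes
```

These can be passed as scale_colour_manual(values = grp_colours) and scale_shape_manual(values = grp_shapes), where ggplot2 matches the values to levels by name rather than by position.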
One option is using a facet plot like so:
p <- ggplot(growth, aes(x = class, y = value, group = treatment, color = treatment))
p + geom_point(size = 4) + facet_grid(. ~ variable) + geom_errorbar(aes(ymin=value-se,ymax=value+se),width=.1,colour="black") + geom_line()
If you want it on one graph, another option is defining a new variable that combines treatment and variable:
growth$treatment_variable <- paste(growth$treatment, growth$variable)
p <- ggplot(growth, aes(x = class, y = value, group = treatment_variable, colour = treatment_variable))
pd<-position_dodge(.2)
p + geom_point(size = 4, position=pd) + geom_errorbar(aes(ymin=value-se, ymax=value+se), width=.1, position=pd, colour="black") + geom_line(position=pd)
You have too many grouping variables (variable and treatment) and including them in a single plot may be a bit confusing. You might want to use faceting, like this:
p <- ggplot(growth,aes(class,value,colour=treatment,group=treatment))
pd<-position_dodge(.9)
p +
geom_errorbar(aes(ymin=value-se,ymax=value+se),width=.1,position=pd,colour="black") +
geom_point(position=pd,size=4) + geom_line(position=pd) +
theme_bw() + theme(legend.position=c(1,1),legend.justification=c(1,1)) +
facet_grid(variable~treatment)
It is possible to do this, but you need to hack it since you're essentially plotting a geom_line() on different groupings (variable + treatment) than with the geom_point() and geom_errorbar() calls.
You need to use ggplot_build() to get back the rendered data and draw a geom_line(), based on the existing points data, grouped by colour:
p <- ggplot(growth) # move the aes() into the individual charts
pd<-position_dodge(.9) # leave dodge as is
se<-0.01 # faked this
p <- p +
geom_point(aes(x=factor(class),y=value,colour=treatment,group=variable),position=pd,size=4) +
theme_bw() + theme(legend.position=c(1,1),legend.justification=c(1,1)) +
geom_errorbar(aes(x=factor(class),ymin=value-se,ymax=value+se,colour=treatment,group=variable),position=pd,width=.1,colour="black")
b<-ggplot_build(p)$data[[1]] # get the ggplot rendered data for the points layer
p + geom_line(data=b,aes(x,y,group=colour), color=b$colour) # plot the lines
