ggplot2 plot two data sets into one picture - r

This must be a FAQ, but I can't find an exactly matching example in the other answers (feel free to close this if you can point to a similar Q&A). I'm still a newbie with ggplot2 and can't seem to wrap my head around it quite so easily.
I have 2 data.frames (that come from separate mixed models) and I'm trying to plot them both into the same graph. The data.frames are:
newdat
id Type pred SE
1 1 15.11285 0.6966029
2 1 13.68750 0.9756909
3 1 13.87565 0.6140860
4 1 14.61304 0.6187750
5 1 16.33315 0.6140860
6 1 16.19740 0.6140860
1 2 14.88805 0.6966029
2 2 13.46270 0.9756909
3 2 13.65085 0.6140860
4 2 14.38824 0.6187750
5 2 16.10835 0.6140860
6 2 15.97260 0.6140860
and
newdat2
id pred SE
1 14.98300 0.6960460
2 13.25893 0.9872502
3 13.67650 0.6150701
4 14.39590 0.6178266
5 16.37662 0.6171588
6 16.08426 0.6152017
As you can see, the second data.frame doesn't have Type, whereas the first does, and therefore has 2 values for each id.
What I can do with ggplot, is plot either one, like this:
fig1
fig2
As you can see, in fig 1 ids are stacked by Type on the x-axis to form two groups of 6 ids. However, in fig 2 there is no Type, but instead just the 6 ids.
What I would like to accomplish is to plot fig2 to the left/right of fig1 with similar grouping. So the resulting plot would look like fig 1 but with 3 groups of 6 ids.
I also need to label and organize the resulting figure so that the x-axis carries a label such as "model1" for newdat and "model2" for newdat2, or some similar indicator that they come from different models. To make things worse, I also need labels for Type in newdat.
My (hopefully) reproducible (but obviously very bad) code for fig 1:
library(ggplot2)
pd <- position_dodge(width=0.6)
ggplot(newdat,aes(x=Type,y=newdat$pred,colour=id))+
geom_point(position=pd, size=5) +
geom_linerange(aes(ymin=newdat$pred-1.96*SE,ymax=newdat$pred+1.96*SE), position=pd, size=1.5, linetype=1) +
theme_bw() +
scale_colour_grey(start = 0, end = .8, name="id") +
coord_cartesian(ylim=c(11, 18)) +
scale_y_continuous(breaks=seq(10, 20, 1)) +
scale_x_discrete(name="Type", limits=c("1","2"))
Code for fig 2 is identical, but without the limits in the last line and with id defined for x-axis in ggplot(aes())
As I understand it, defining stuff at ggplot() makes that stuff "standard" along the whole graph, and I've tried to remove the common stuff and separately define geom_point and geom_linerange for both newdat and newdat2, but no luck so far... Any help is much appreciated, as I'm completely stuck.

How about first adding some new variables to each dataset and then combining them:
newdat$model <- "model1"
newdat2$model <- "model2"
newdat2$Type <- 3
df <- rbind(newdat, newdat2)
# head(df)
Then we can plot with:
library(ggplot2)
ggplot(df, aes(x = interaction(model, factor(Type)), y = pred, color = factor(id))) +
geom_point(position = position_dodge(width = 0.6), size = 5) +
geom_linerange(aes(ymin = pred - 1.96 * SE, ymax = pred + 1.96 * SE),
position = position_dodge(width = 0.6),
size = 1.5, linetype = 1)
Alternatively, you can pass an additional aesthetic to geom_linerange to further delineate the model type:
ggplot(df, aes(x = interaction(model, factor(Type)), y = pred, color = factor(id))) +
geom_point(position = position_dodge(width = 0.6), size = 5) +
geom_linerange(aes(ymin = pred - 1.96 * SE, ymax = pred + 1.96 * SE, linetype = model),
position = position_dodge(width = 0.6),
size = 1.5)
Finally, you may want to consider facets:
ggplot(df, aes(x = interaction(model, factor(Type)), y = pred, color = factor(id))) +
geom_point(position = position_dodge(width = 0.6), size = 5) +
geom_linerange(aes(ymin = pred - 1.96 * SE, ymax = pred + 1.96 * SE),
position = position_dodge(width = 0.6),
size = 1.5) +
facet_wrap(~ id)
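To address the labelling part of the question, here is a minimal sketch that rebuilds both data frames from the values posted above, combines them as described, and relabels the x-axis. The column name `group` and the label texts are assumptions chosen for illustration; they are not part of the original answer.

```r
library(ggplot2)

# Data copied from the question
newdat <- data.frame(
  id   = rep(1:6, times = 2),
  Type = rep(1:2, each = 6),
  pred = c(15.11285, 13.68750, 13.87565, 14.61304, 16.33315, 16.19740,
           14.88805, 13.46270, 13.65085, 14.38824, 16.10835, 15.97260),
  SE   = rep(c(0.6966029, 0.9756909, 0.6140860,
               0.6187750, 0.6140860, 0.6140860), times = 2)
)
newdat2 <- data.frame(
  id   = 1:6,
  pred = c(14.98300, 13.25893, 13.67650, 14.39590, 16.37662, 16.08426),
  SE   = c(0.6960460, 0.9872502, 0.6150701, 0.6178266, 0.6171588, 0.6152017)
)

# Tag each data set with its model and give newdat2 a placeholder Type
newdat$model  <- "model1"
newdat2$model <- "model2"
newdat2$Type  <- 3
df <- rbind(newdat, newdat2)  # rbind matches data frame columns by name

# One x-axis group per model/Type combination that actually occurs
# ('group' is a hypothetical column name)
df$group <- interaction(df$model, df$Type, drop = TRUE)

pd <- position_dodge(width = 0.6)
p <- ggplot(df, aes(x = group, y = pred, colour = factor(id))) +
  geom_point(position = pd, size = 3) +
  geom_linerange(aes(ymin = pred - 1.96 * SE, ymax = pred + 1.96 * SE),
                 position = pd) +
  # Named vector: names are the factor levels, values the displayed labels
  scale_x_discrete(name = NULL,
                   labels = c("model1.1" = "model1\nType 1",
                              "model1.2" = "model1\nType 2",
                              "model2.3" = "model2")) +
  scale_colour_grey(start = 0, end = 0.8, name = "id") +
  theme_bw()
```

Calling `print(p)` draws the figure with three groups of six ids, each group labelled by model and, for model1, by Type.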

Related

How to smooth line plot and extend graph beyond data --> recreating jamovi plot

I'm currently trying to recreate an estimated marginal means plot I created using the stats software Jamovi. Jamovi uses ggplot2 to create all its graphs and it spit out this one for me:
The problem is that the journal I'm trying to submit my manuscript to does not publish colored images and I will need to recreate the graphs in black and white (jamovi offers a grey color palette but it's hard to differentiate between the greys).
I managed to use ggplot2 in R to recreate this so far
with the following code:
ggplot(data, aes(x = xvar, y = yvar)) +
geom_line(aes(linetype = class), size = 1, show.legend = FALSE) +
facet_grid() +
geom_point(aes(shape = class), size = 2.5) +
theme(axis.text = element_text(size = 16, colour = "black"), axis.title = element_text(size=16),
title = element_text(size = 14), legend.text = element_text(size = 14)) +
labs(x = "", y = "", title = "") +
xlim(c(2,4)) + ylim(c(0,1)) +
stat_smooth(aes(x = xvar, y = yvar), method="lm", formula = y ~ x, se = FALSE, fullrange = TRUE)
I'm trying to smooth out the lines between the data points like in the first graph and also have the lines extend out beyond the current range it has (2.70 - 3.7) but I don't know how to do it in ggplot. It shouldn't be a data issue because I created the data using the data frame that jamovi's own R package gave me (which is the same data frame it uses to create its plots in ggplot).
I tried following the directions listed in this question but it only gives me that blue line you see in the second graph when I use stat_smooth. When I tried to follow the instructions precisely and used this code:
ggplot(data) +
geom_plot(aes(x = xvar, y = yvar), size = 2) +
stat_smooth(aes(x = xvar, y = yvar), method = "lm", formula = y ~ poly(x, 2), se = FALSE) +
coord_cartesian(ylim = c(0,1))
I instead got the following error message:
Error in geom_plot(aes(x = xvar, y = yvar), size = 2) :
could not find function "geom_plot"
My three questions are thus as follows:
1) How do I smooth out the lines between the data points?
2) How do I extend the lines out to the edge of the x-axes (from say, 1 to 5) like in the first graph?
3) Is there a way to anti-alias the lines like in the first graph?
Here's some sample data (not the data I used to create the plots but similar in the same format as what was given to me by jamovi where "class" is the grouping variable),
xvar class yvar
2 1 0.25
2 2 0.3
2 3 0.2
2 4 0.13
3 1 0.15
3 2 0.35
3 3 0.18
3 4 0.24
4 1 0.1
4 2 0.45
4 3 0.14
4 4 0.27
I figured it out after some trial and error. Adding the following layer fixed it:
geom_smooth(aes(group = class, linetype = class), colour = "black", method="lm", formula = y ~ poly(x,2), se=FALSE, fullrange=TRUE)
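For reference, a self-contained sketch of that fix using the sample data from the question. It assumes `class` is stored as a factor (needed for the linetype and shape mappings) and that the x-axis should run from 1 to 5, as asked in question 2; `fullrange = TRUE` then extends each fitted curve across the whole axis.

```r
library(ggplot2)

# Sample data from the question; class is converted to a factor
data <- data.frame(
  xvar  = rep(c(2, 3, 4), each = 4),
  class = factor(rep(1:4, times = 3)),
  yvar  = c(0.25, 0.30, 0.20, 0.13,
            0.15, 0.35, 0.18, 0.24,
            0.10, 0.45, 0.14, 0.27)
)

p <- ggplot(data, aes(x = xvar, y = yvar)) +
  # One quadratic fit per class, drawn in black with distinct line types
  geom_smooth(aes(group = class, linetype = class), colour = "black",
              method = "lm", formula = y ~ poly(x, 2),
              se = FALSE, fullrange = TRUE) +
  geom_point(aes(shape = class), size = 2.5) +
  # Widen the x scale so fullrange has room to extend the curves
  scale_x_continuous(limits = c(1, 5)) +
  # Zoom the y-axis without dropping the extrapolated parts of the curves
  coord_cartesian(ylim = c(0, 1))
```

With only three distinct x values per class, a degree-2 polynomial passes exactly through the points, so the curves outside the 2 to 4 range are pure extrapolation.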

How to Add Extra Labels on y-axis without Data in ggplot2

I am making a plot showing two sets of regression coefficients and standard errors, and the graph is as follows:
What I want to do further is to add extra variables without any data on the y-axis. For instance, put a label FeatGender on top of the label FeatGenderMale, or for another example, put a label FeatEU in between the label of FeatPartyIDLiberal Democrats and the label of FeatEUIntegrationSupportEUIntegration. Below is the reduced version of data:
coef se low high sex
1 -0.038848364 0.02104994 -0.080106243 0.002409514 Female
2 0.095831201 0.02793333 0.041081877 0.150580526 Female
3 0.050972670 0.02828353 -0.004463052 0.106408391 Female
4 -0.183558492 0.02454943 -0.231675377 -0.135441606 Female
5 0.044879447 0.02712518 -0.008285914 0.098044808 Female
6 -0.003858672 0.03005477 -0.062766024 0.055048681 Male
7 0.003048763 0.04687573 -0.088827676 0.094925203 Male
8 0.015343897 0.03948959 -0.062055700 0.092743494 Male
9 -0.132600259 0.04146323 -0.213868197 -0.051332322 Male
10 -0.029764559 0.04600719 -0.119938650 0.060409533 Male
Here is my code:
v_name <- c("FeatGenderMale", "FeatPartyIDLabourParty", "FeatPartyIDLiberalDemocrats",
"FeatEUIntegrationOpposeEUIntegration", "FeatEUIntegrationSupportEUIntegration")
t <- ggplot(temp, aes(x=c(v_name,v_name), y=coef, group=sex, colour=sex))
t +
geom_point(position = position_dodge(width = 0.3)) +
geom_errorbar(aes(ymin = low, ymax = high, width = 0), position = position_dodge(0.3)) +
coord_flip() +
scale_x_discrete(limits = rev(v_name)) +
geom_hline(yintercept = 0.0, linetype = "dotted") +
theme(legend.position = "bottom")
Thanks for the help!
Here's an approach that first adds v_name to the source data frame as a column, and then uses a longer version of the v_name vector, with the extra entries appended, for the axis limits.
library(ggplot2); library(dplyr)
# Add the v_name into the table
temp2 <- temp %>% group_by(sex) %>% mutate(v_name = v_name) %>% ungroup()
# Make the dummy label for axis with add'l entries
v_name2 <- append(v_name, "FeatGender", after = 0)
v_name2 <- append(v_name2, "FeatEU", after = 4)
# Plot using the new table
t <- ggplot(temp2, aes(x=v_name, y=coef, group=sex, colour=sex))
t +
geom_point(position = position_dodge(width = 0.3)) +
geom_errorbar(aes(ymin = low, ymax = high, width = 0), position = position_dodge(0.3)) +
coord_flip() +
# ... but use the larger list of axis names
scale_x_discrete(limits = rev(v_name2)) +
geom_hline(yintercept = 0.0, linetype = "dotted") +
theme(legend.position = "bottom")
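The two append() calls are the core of this approach, so it may help to see what they produce on their own. This base-R sketch only reuses the names from the question; `after = 0` inserts at the front, and `after = 4` inserts after the fourth element of the already extended vector.

```r
v_name <- c("FeatGenderMale", "FeatPartyIDLabourParty",
            "FeatPartyIDLiberalDemocrats",
            "FeatEUIntegrationOpposeEUIntegration",
            "FeatEUIntegrationSupportEUIntegration")

# Header label in front of the first coefficient
v_name2 <- append(v_name, "FeatGender", after = 0)
# Header label after the fourth element of the extended vector,
# i.e. directly after "FeatPartyIDLiberalDemocrats"
v_name2 <- append(v_name2, "FeatEU", after = 4)

print(v_name2)
```

Because the plot uses coord_flip() together with rev(v_name2), the first element of v_name2 ends up at the top of the axis, and the entries without data show up as labels with no points.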

Implementing paired lines into boxplot.ggplot2

I have a set of paired data, and I'm using ggplot2.boxplot (of the easyGgplot2 package) with added (jittered) individual data points:
ggplot2.boxplot(data=INdata,xName='condition',yName='vicarious_pain',groupName='condition',showLegend=FALSE,
position="dodge",
addDot=TRUE,dotSize=3,dotPosition=c("jitter", "jitter"),jitter=0.2,
ylim=c(0,100),
backgroundColor="white",xtitle="",ytitle="Pain intenstity",mainTitle="Pain intensity",
brewerPalette="Paired")
INdata:
ID,condition,pain
1,Treatment,4.5
3,Treatment,12.5
4,Treatment,16
5,Treatment,61.75
6,Treatment,23.25
7,Treatment,5.75
8,Treatment,5.75
9,Treatment,5.75
10,Treatment,44.5
11,Treatment,7.25
12,Treatment,40.75
13,Treatment,17.25
14,Treatment,2.75
15,Treatment,15.5
16,Treatment,15
17,Treatment,25.75
18,Treatment,17
19,Treatment,26.5
20,Treatment,27
21,Treatment,37.75
22,Treatment,26.5
23,Treatment,15.5
25,Treatment,1.25
26,Treatment,5.75
27,Treatment,25
29,Treatment,7.5
1,No Treatment,34.5
3,No Treatment,46.5
4,No Treatment,34.5
5,No Treatment,34
6,No Treatment,65
7,No Treatment,35.5
8,No Treatment,48.5
9,No Treatment,35.5
10,No Treatment,54.5
11,No Treatment,7
12,No Treatment,39.5
13,No Treatment,23
14,No Treatment,11
15,No Treatment,34
16,No Treatment,15
17,No Treatment,43.5
18,No Treatment,39.5
19,No Treatment,73.5
20,No Treatment,28
21,No Treatment,12
22,No Treatment,30.5
23,No Treatment,33.5
25,No Treatment,20.5
26,No Treatment,14
27,No Treatment,49.5
29,No Treatment,7
The resulting plot looks like this:
However, since this is paired data, I want to represent this in the plot - specifically to add lines between paired datapoints. I've tried adding
... + geom_line(aes(group = ID))
..but I am not able to implement this into the ggplot2.boxplot code. Instead, I get this error:
Error in if (addMean) p <- p + stat_summary(fun.y = mean, geom = "point", :
argument is not interpretable as logical
In addition: Warning message:
In if (addMean) p <- p + stat_summary(fun.y = mean, geom = "point", :
the condition has length > 1 and only the first element will be used
Grateful for any input on this!
I do not know which package ggplot2.boxplot comes from, but I will show you how to perform the requested operation in ggplot.
The requested output is a bit problematic for ggplot because the points and the lines connecting them must be jittered by the same amount. One way to do that is to jitter the points before making the plot. Since the x axis is discrete, here is a workaround:
b <- runif(nrow(df), -0.1, 0.1)
ggplot(df) +
geom_boxplot(aes(x = as.numeric(condition), y = pain, group = condition))+
geom_point(aes(x = as.numeric(condition) + b, y = pain)) +
geom_line(aes(x = as.numeric(condition) + b, y = pain, group = ID)) +
scale_x_continuous(breaks = c(1,2), labels = c("No Treatment", "Treatment"))+
xlab("condition")
First I made a vector to jitter by, called b, and converted the x axis to numeric so I could add b to the x axis coordinates. Later I relabeled the x axis.
I do agree with eipi10's comment that the plot works better without jitter:
ggplot(df, aes(condition, pain)) +
geom_boxplot(width=0.3, size=1.5, fatten=1.5, colour="grey70") +
geom_point(colour="red", size=2, alpha=0.5) +
geom_line(aes(group=ID), colour="red", linetype="11") +
theme_classic()
and the updated plot with jittered points eipi10 style:
ggplot(df) +
geom_boxplot(aes(x = as.numeric(condition),
y = pain,
group = condition),
width=0.3,
size=1.5,
fatten=1.5,
colour="grey70")+
geom_point(aes(x = as.numeric(condition) + b,
y = pain),
colour="red",
size=2,
alpha=0.5) +
geom_line(aes(x = as.numeric(condition) + b,
y = pain,
group = ID),
colour="red",
linetype="11") +
scale_x_continuous(breaks = c(1,2),
labels = c("No Treatment", "Treatment"),
expand = c(0.2,0.2))+
xlab("condition") +
theme_classic()
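A reproducible variant of the shared-jitter idea, as a sketch: it uses only the first four IDs per condition from the question's data, fixes the random seed, and stores the jittered position in a column (`x_jit` is a hypothetical name) so that points and lines are guaranteed to use the same offsets.

```r
library(ggplot2)

set.seed(1)  # make the jitter reproducible

# Subset of the question's data: IDs 1, 3, 4, 5 under both conditions
df <- data.frame(
  ID        = rep(c(1, 3, 4, 5), times = 2),
  condition = factor(rep(c("Treatment", "No Treatment"), each = 4)),
  pain      = c(4.5, 12.5, 16, 61.75, 34.5, 46.5, 34.5, 34)
)

# Numeric position of each condition plus one random offset per row
df$x_jit <- as.numeric(df$condition) + runif(nrow(df), -0.1, 0.1)

p <- ggplot(df, aes(y = pain)) +
  geom_boxplot(aes(x = as.numeric(condition), group = condition),
               width = 0.3) +
  geom_point(aes(x = x_jit)) +
  geom_line(aes(x = x_jit, group = ID)) +
  # Take the axis labels from the factor levels instead of hard-coding them
  scale_x_continuous(breaks = seq_along(levels(df$condition)),
                     labels = levels(df$condition)) +
  xlab("condition")
```

Deriving the labels from `levels(df$condition)` keeps them in step with the numeric positions even if the factor order changes.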
Although I like the old-school way of plotting with ggplot as shown in @missuse's answer, I wanted to check whether this is also possible with your ggplot2.boxplot-based code.
I loaded your data:
'data.frame': 52 obs. of 3 variables:
$ ID : int 1 3 4 5 6 7 8 9 10 11 ...
$ condition: Factor w/ 2 levels "No Treatment",..: 2 2 2 2 2 2 2 2 2 2 ...
$ pain : num 4.5 12.5 16 61.8 23.2 ...
And called your code, adding geom_line at the end as you suggested yourself:
ggplot2.boxplot(data = INdata,xName = 'condition', yName = 'pain', groupName = 'condition',showLegend = FALSE,
position = "dodge",
addDot = TRUE, dotSize = 3, dotPosition = c("jitter", "jitter"), jitter = 0,
ylim = c(0,100),
backgroundColor = "white",xtitle = "",ytitle = "Pain intenstity", mainTitle = "Pain intensity",
brewerPalette = "Paired") + geom_line(aes(group = ID))
Note that I set jitter to 0. The resulting graph looks like this:
If you don't set jitter to 0, the lines still run from the middle of each boxplot, ignoring the horizontal location of the dots.
Not sure why your call gives an error. I thought it might be a factor issue, but I see that my ID variable is not factor class.
I implemented missuse's jitter solution into the ggplot2.boxplot approach in order to align the dots and lines. Instead of using "addDot", I had to instead add dots using geom_point (and lines using geom_line) after, so I could apply the same jitter vector to both dots and lines.
b <- runif(nrow(df), -0.2, 0.2)
ggplot2.boxplot(data=df,xName='condition',yName='pain',groupName='condition',showLegend=FALSE,
ylim=c(0,100),
backgroundColor="white",xtitle="",ytitle="Pain intenstity",mainTitle="Pain intensity",
brewerPalette="Paired") +
geom_point(aes(x=as.numeric(condition) + b, y=pain),colour="black",size=3, alpha=0.7) +
geom_line(aes(x=as.numeric(condition) + b, y=pain, group=ID), colour="grey30", linetype="11", alpha=0.7)

Modifying the Legend in ggplot2

I've got a problem interacting with the labels in ggplot2.
I have two data sets (Temperature vs. Time) from two experiments but recorded at different timesteps. I've managed to merge the data frames and put them in a long fashion to plot them in the same graph, using the melt function from the reshape2 library.
So, the initial data frames look something like this:
> d1
step Temp
1 512.5 301.16
2 525.0 299.89
3 537.5 299.39
4 550.0 300.58
5 562.5 300.20
6 575.0 300.17
7 587.5 300.62
8 600.0 300.51
9 612.5 300.96
10 625.0 300.21
> d2
step Temp
1 520 299.19
2 540 300.39
3 560 299.67
4 580 299.43
5 600 299.78
6 620 300.74
7 640 301.03
8 660 300.39
9 680 300.54
10 700 300.25
I combine it like this:
> mrgd <- merge(d1, d2, by = "step", all = T)
step Temp.x Temp.y
1 512.5 301.16 NA
2 520.0 NA 299.19
...
And put it into long format for ggplot2 with this:
> melt1 <- melt(mrgd, id = "step")
> melt1
step variable value
1 512.5 Temp.x 301.16
2 520.0 Temp.x NA
...
Now, I want to for example do a histogram of the distribution of values. I do it like this:
p <- ggplot(data = melt1, aes(x = value, color = variable, fill = variable)) + geom_histogram(alpha = 0.4)
My problem is when I try to modify the Legend of this graph, I don't know how to! I've followed what is suggested in the R Graphics Cookbook book, but I've had no luck.
I've tried to do this, for example (to change the labels of the Legend):
> p + scale_fill_discrete(labels = c("d1", "d2"))
But I just create a "new" Legend box, like so
Or even removing the Legend completely
> p + scale_fill_discrete(guide = F)
I just get this
Finally, doing this also doesn't help
> p + scale_fill_discrete("")
Again, it just adds a new Legend box
Does anyone know what's happening here? It looks as if I'm actually modifying another label object, if that makes any sense. I've looked into other related questions on this site, but I haven't found anyone having the same problem as me.
Get rid of the aes(color = variable...) to remove the scale that belongs to aes(color = ...).
ggplot(data = melt1, aes(x = value, fill = variable)) +
geom_histogram(alpha = 0.4) +
scale_fill_discrete(labels = c("d1", "d2")) # Change the labels for `fill` scale
This second plot contains aes(color = variable...). Color in this case will draw colored outlines around the histogram bins. You can turn off the scale so that you only have one legend, the one created from fill
ggplot(data = melt1, aes(x = value, color = variable, fill = variable)) +
geom_histogram(alpha = 0.4) +
scale_fill_discrete(labels = c("d1", "d2")) +
scale_color_discrete(guide = "none") # Turn off the color (outline) scale
The most straightforward thing to do would be to not use reshape2 or merge at all, but instead to rbind your data frames:
dfNew <- rbind(data.frame(d1, Group = "d1"),
data.frame(d2, Group = "d2"))
ggplot(dfNew, aes(x = Temp, color = Group, fill = Group)) +
geom_histogram(alpha = 0.4) +
labs(fill = "", color = "")
If you wanted to vary alpha by group:
ggplot(dfNew, aes(x = Temp, color = Group, fill = Group, alpha = Group)) +
geom_histogram() +
labs(fill = "", color = "") +
scale_alpha_manual("", values = c(d1 = 0.4, d2 = 0.8))
Note also that the default position for geom_histogram is "stack", so the bars are stacked rather than overlaid. The bars will only overlap if you use geom_histogram(position = "identity").
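Putting the rbind approach and the position note together, a self-contained sketch with the d1 and d2 values from the question. The `binwidth` of 0.25 is an assumption picked for this small sample, not something from the original answers.

```r
library(ggplot2)

# Data from the question
d1 <- data.frame(
  step = seq(512.5, 625, by = 12.5),
  Temp = c(301.16, 299.89, 299.39, 300.58, 300.20,
           300.17, 300.62, 300.51, 300.96, 300.21)
)
d2 <- data.frame(
  step = seq(520, 700, by = 20),
  Temp = c(299.19, 300.39, 299.67, 299.43, 299.78,
           300.74, 301.03, 300.39, 300.54, 300.25)
)

# Stack the two experiments and record where each row came from
dfNew <- rbind(data.frame(d1, Group = "d1"),
               data.frame(d2, Group = "d2"))

p <- ggplot(dfNew, aes(x = Temp, fill = Group)) +
  # position = "identity" overlays the two histograms instead of stacking them
  geom_histogram(position = "identity", alpha = 0.4, binwidth = 0.25) +
  labs(fill = NULL)  # drop the legend title
```

Since only `fill` is mapped, there is a single legend, and its entries are already named d1 and d2 without any scale relabelling.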

ggplot2 multiple sub groups of a bar chart

I am trying to produce a bar graph that has multiple groupings of factors. An example from excel of what I am attempting to create, subgrouped by Variety and Irrigation treatment:
I know I could produce multiple graphs using facet_wrap(), but I would like to produce multiple graphs for this same type of data for multiple years of similar data. An example of the data I used in this example:
Year Trt Variety geno yield SE
2010-2011 Irr Variety.2 1 6807 647
2010-2011 Irr Variety.2 2 5901 761
2010-2011 Irr Variety.1 1 6330 731
2010-2011 Irr Variety.1 2 5090 421
2010-2011 Dry Variety.2 1 3953 643
2010-2011 Dry Variety.2 2 3438 683
2010-2011 Dry Variety.1 1 3815 605
2010-2011 Dry Variety.1 2 3326 584
Is there a way to create multiple groupings in ggplot2? I have searched for quite some time and have yet to see an example of something like the example graph above.
Thanks for any help you may have!
This may be a start.
dodge <- position_dodge(width = 0.9)
ggplot(df, aes(x = interaction(Variety, Trt), y = yield, fill = factor(geno))) +
geom_bar(stat = "identity", position = position_dodge()) +
geom_errorbar(aes(ymax = yield + SE, ymin = yield - SE), position = dodge, width = 0.2)
Update: labelling of x axis
I have added:
coord_cartesian, to set limits of y axis, mainly the lower limit to avoid the default expansion of the axis.
annotate, to add the desired labels. I have hard-coded the x positions, which I find OK in this fairly simple example.
theme_classic, to remove the gray background and the grid.
theme, increase lower plot margin to have room for the two-row label, remove default labels.
Last set of code: Because the text is added below the x-axis, it 'disappears' outside the plot area, and we need to remove the 'clipping'. That's it!
library(grid)
g1 <- ggplot(data = df, aes(x = interaction(Variety, Trt), y = yield, fill = factor(geno))) +
geom_bar(stat = "identity", position = position_dodge()) +
geom_errorbar(aes(ymax = yield + SE, ymin = yield - SE), position = dodge, width = 0.2) +
coord_cartesian(ylim = c(0, 7500)) +
annotate("text", x = 1:4, y = - 400,
label = rep(c("Variety 1", "Variety 2"), 2)) +
annotate("text", x = c(1.5, 3.5), y = - 800, label = c("Irrigated", "Dry")) +
theme_classic() +
theme(plot.margin = unit(c(1, 1, 4, 1), "lines"),
axis.title.x = element_blank(),
axis.text.x = element_blank())
# remove clipping of x axis labels
g2 <- ggplot_gtable(ggplot_build(g1))
g2$layout$clip[g2$layout$name == "panel"] <- "off"
grid.draw(g2)
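As an alternative to editing the gtable, newer ggplot2 releases (3.0.0 and later, as far as I know) accept `clip = "off"` in coord_cartesian(). The sketch below swaps that in for the ggplot_gtable() step; it is not the method used above. It also sets the factor levels explicitly so that the bar order (Irr before Dry, Variety.1 before Variety.2) matches the hard-coded annotation positions; that ordering is an assumption about how df was built.

```r
library(ggplot2)

# Data from the question, with explicit factor levels to fix the bar order
df <- data.frame(
  Trt     = factor(rep(c("Irr", "Dry"), each = 4), levels = c("Irr", "Dry")),
  Variety = factor(rep(rep(c("Variety.2", "Variety.1"), each = 2), times = 2),
                   levels = c("Variety.1", "Variety.2")),
  geno    = rep(1:2, times = 4),
  yield   = c(6807, 5901, 6330, 5090, 3953, 3438, 3815, 3326),
  SE      = c(647, 761, 731, 421, 643, 683, 605, 584)
)

dodge <- position_dodge(width = 0.9)

p <- ggplot(df, aes(x = interaction(Variety, Trt), y = yield,
                    fill = factor(geno))) +
  geom_col(position = dodge) +
  geom_errorbar(aes(ymin = yield - SE, ymax = yield + SE),
                position = dodge, width = 0.2) +
  # clip = "off" lets the annotations below the axis remain visible
  coord_cartesian(ylim = c(0, 7500), clip = "off") +
  annotate("text", x = 1:4, y = -400,
           label = rep(c("Variety 1", "Variety 2"), 2)) +
  annotate("text", x = c(1.5, 3.5), y = -800,
           label = c("Irrigated", "Dry")) +
  theme_classic() +
  theme(plot.margin = unit(c(1, 1, 4, 1), "lines"),
        axis.title.x = element_blank(),
        axis.text.x = element_blank())
```

If the factor levels are left at their alphabetical defaults, the Dry bars come first, and the second annotate() call would need its labels swapped.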

Resources