Connecting points on a graph within nested groups of data with ggplot2 - r

I'm having trouble working out how to draw lines between points on a ggplot when the data have a nested structure.
My data are broken down by three nested groups.
When plotted, the first group (Mutation) is used with facet_wrap to pair the subgroups, the second group (Type) splits the data into the initial experiment (HiSeq) and the replication experiment (MiSeq), and the third group (Grouping) sets the colour and shape of the points by the sample type they come from.
Where I'm stuck is that I'd like to link the two points (HiSeq/MiSeq) within a pair (Mutation) with a line, to make it easy to work out which two are linked. I've made a mock-up of the intended result.
However, I can't work out how to do this across the two groups (HiSeq/MiSeq) while staying within the top-level group (Mutation).
Does anyone have a solution? A fragment of the data and the code I'm using to build the current graph are below. It may end up being too messy to be presentable, but it would be useful to solve.
ggplot(test,aes(y=AR,x=Type,fill=Grouping,colour=Grouping,shape=Grouping)) +
geom_point(binaxis='y',stackdir='center',position=position_dodge(width = 0.2),size=7) +
facet_wrap(~ Mutation,nrow=1) +
xlab("") +
ylab("Allelic Ratio") +
theme_minimal(base_size=20)
example data:
structure(list(Mutation = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("chr1:51910329",
"chr1:72951069"), class = "factor"), Type = structure(c(1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L), .Label = c("HiSeq", "MiSeq"), class = "factor"), Grouping = structure(c(3L,
3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,
2L, 2L, 1L, 3L, 3L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("Offspring (M)", "Offspring (P)", "Proband"
), class = "factor"), Name = c(288458773L, 288458773L, 423125012L,
423125012L, 344991226L, 344991226L, 422977809L, 422977809L, 420753074L,
420753074L, 351142406L, 351142406L, 422743921L, 422743921L, 425596544L,
425596544L, 422595517L, 422595517L, 477342393L, 477342393L, 288458773L,
288458773L, 423125012L, 423125012L, 344991226L, 344991226L, 422977809L,
422977809L, 420753074L, 420753074L, 351142406L, 351142406L, 477342393L,
477342393L, 480773638L, 480773638L), AR = c(0.38, 0.3, 0, 0,
0.375, 0.545, 0.41, 0.388, 0.35, 0.42, 0, 0, NA, 0.59, NA, 0,
0, 0.05, 0, 0, 0.1875, 0.078379734, 0.4, 0.505582473, 0, 0.002394493,
0, 0.002023547, 0, 0.001600569, 0.6, 0.510240797, 0.6, 0.490997813,
0, 0.001785424)), .Names = c("Mutation", "Type", "Grouping",
"Name", "AR"), class = "data.frame", row.names = c(NA, -36L))

I think this may be what you want. Look into geom_line and its group aesthetic; grouping by Name draws one line per sample within each facet (df here is the example data from the question):
ggplot(df, aes(x = Type, y = AR, fill = Grouping, color = Grouping, shape = Grouping)) +
geom_point(size = 5) +
geom_line(aes(group = Name)) +
facet_wrap(~ Mutation)
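To make the effect of the group aesthetic concrete, here is a minimal sketch on toy data. The sample names s1 to s3 and the AR values are made up for illustration; only the column names follow the question. It counts the line groups ggplot2 actually builds, which should be one per sample per facet.

```r
library(ggplot2)

# Toy data (hypothetical values): 2 mutations x 3 samples x 2 platforms
toy <- expand.grid(
  Type = c("HiSeq", "MiSeq"),
  Name = c("s1", "s2", "s3"),
  Mutation = c("chr1:51910329", "chr1:72951069"),
  stringsAsFactors = FALSE
)
toy$Grouping <- ifelse(toy$Name == "s1", "Proband", "Offspring (P)")
toy$AR <- seq(0.05, 0.6, length.out = nrow(toy))

p <- ggplot(toy, aes(x = Type, y = AR, colour = Grouping, shape = Grouping)) +
  geom_point(size = 5) +
  # group = Name overrides the default grouping, so each sample's
  # HiSeq and MiSeq points are joined by their own line
  geom_line(aes(group = Name)) +
  facet_wrap(~ Mutation, nrow = 1)

# Inspect the data computed for the line layer (layer 2)
line_data <- layer_data(p, 2)
n_lines <- nrow(unique(line_data[c("PANEL", "group")]))
```

Because facetting splits the data before the lines are drawn, the same Name appearing under two mutations gives two separate lines, one in each panel.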

Related

tidy eval ggplot2 NSE not rendering correctly

I'm trying to write a function that takes quoted items so I can construct multiple ggplots. The following code works great and does what I want.
fig2.data %>%
ggplot(aes(x = Surgery, y = BALF_Protein, fill = Exposure)) +
stat_summary(geom = "errorbar", fun.data = mean_se, position = "dodge") +
stat_summary(geom = "bar", fun = mean, position = "dodge") +
theme_classic() +
scale_fill_manual(values=c("lightgrey","darkgrey")) +
facet_grid(cols = vars(Duration))
Using this guide I constructed the following function and called the function.
plotf <- function(x, y, fill, facet){
x_var <- enquo(x)
y_var <- enquo(y)
facet_var <- enquo(facet)
fill_var <- enquo(fill)
ggplot(fig2.data, aes(x = !!x_var, y = !!y_var, fill = !!fill_var)) +
stat_summary(geom = "errorbar", fun.data = mean_se, position = "dodge") +
stat_summary(geom = "bar", fun = mean, position = "dodge") +
theme_classic() +
scale_fill_manual(values=c("lightgrey","darkgrey")) +
facet_grid(cols = vars(!!facet_var))
}
plotf(x = "Surgery", y = "BALF_Protein", fill = "Exposure", facet = "Duration")
My graph rendered without errors, but it is not rendered the same way.
What am I doing wrong?
Thank you, @Stefan.
I don't understand why, but calling it as you suggested worked. How will that work when I want to loop over a vector of variable names to call the function, since those will be passed as quoted strings? Should I use syms()?
plotf(x = Surgery, y = BALF_Protein, fill = Exposure, facet = Duration)
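The reason the string call renders differently is that enquo() captures the string itself, so aes() maps x to the constant "Surgery" rather than to the column. For the looping case, one option is to take column names as strings and look them up with the .data pronoun. The sketch below assumes ggplot2 3.3.0 or later (for the fun argument); plotf_chr and the toy data are hypothetical names and values, not part of the original post.

```r
library(ggplot2)

# Toy data (hypothetical values) with the same column names as fig2.data
toy <- expand.grid(
  Surgery  = c("SHAM", "HEP VAG"),
  Exposure = c("Air", "Ozone"),
  Duration = c("1d", "2d"),
  rep      = 1:2
)
toy$BALF_Protein <- seq(60, 210, length.out = nrow(toy))

# Variant of plotf() that takes column names as character strings;
# .data[[name]] looks each column up in the plot data
plotf_chr <- function(data, x, y, fill, facet) {
  ggplot(data, aes(x = .data[[x]], y = .data[[y]], fill = .data[[fill]])) +
    stat_summary(geom = "bar", fun = mean, position = "dodge") +
    facet_grid(cols = vars(.data[[facet]]))
}

p <- plotf_chr(toy, "Surgery", "BALF_Protein", "Exposure", "Duration")

# Summarised bars: one per Surgery x Exposure combination in each facet
bars <- layer_data(p, 1)
```

With this signature, a loop such as lapply(c("Surgery", "Exposure"), function(v) plotf_chr(toy, v, "BALF_Protein", "Exposure", "Duration")) passes strings straight through. Converting each string with rlang::sym() and unquoting it with !! is an alternative that should behave similarly.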
Reproducible data are below; they include some rnorm() noise, so the bar heights in your plot might differ slightly.
fig2.data <- structure(list(Surgery = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("SHAM", "HEP VAG"
), class = "factor"), Exposure = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Air",
"Ozone"), class = "factor"), Duration = structure(c(2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1d",
"2d"), class = "factor"), BALF_Protein = c(64.2302655135303,
75.8662498743628, 66.944160651771, 64.3494818599307, 93.5733806883362,
93.9843061725941, 94.9296956493259, 85.5985055395191, 80.4974511604734,
70.6316004306272, 85.3439438112908, 79.4666853120619, 84.7319693413318,
224.606438793638, 78.4487502522719, 78.2128699744882, 92.0151032176434,
79.2127901600167, 83.0909690767245, 92.0325415462662, 60.6200784843927,
97.7183404856683, 68.7510921525122, 41.9625493809036, 311.769822036931,
450.597937801349, 283.639976251784, 190.840750069959, 187.810222461528,
203.735530975931, 547.003463243173, 517.871472878502, 164.167773487012,
202.777306107217, 666.896662547508, 361.46103562071, 270.119121964956,
234.635143377769, 94.4541075117046, 91.1060986818939, 142.774777316869,
300.021992736686, 279.775933301683, 246.554185364089, 298.964364163939,
193.737945537319, 232.918974192744, 150.384203703162)), row.names = c(NA,
-48L), class = "data.frame")

Removing specific strips in a double-strip plot

I'm trying to remove the redundant "pro/retro" labels on the second row of panels on my plot. However, I still want to keep the top row of panel labels intact. I've tried for the past hour to selectively remove the 1st strip on the 2nd panel row and I was wondering if anyone here knows how to do this. See below for technical details.
I have the following plot:
It was generated from the following data:
absBtwnDat <- structure(list(setSize = structure(c(1L, 2L, 3L, 4L, 5L, 6L,
7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L,
4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L,
6L, 7L), .Label = c("2", "3", "4", "5", "6", "7", "8"), class = "factor"),
Measure = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), .Label = c("Actual", "Predicted"), class = "factor"),
Location = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), .Label = c("fix", "forced"), class = "factor"),
JudgementType = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("pro", "retro"), class = "factor"),
Accuracy = c(1.91388888888889, 2.95555555555556, 3.74861111111111,
4.37777777777778, 4.21527777777778, 3.0875, 2.85277777777778,
2, 2.99444444444444, 4, 4.77222222222222, 5.24444444444444,
5.18472222222222, 5.20277777777778, 1.98888888888889, 3,
3.97222222222222, 4.85972222222222, 5.70555555555556, 6.56944444444444,
7.27222222222222, 2, 3, 3.99444444444444, 4.99444444444444,
5.86944444444444, 6.75555555555556, 7.57777777777778, 1.96111111111111,
2.97777777777778, 3.78333333333333, 3.97222222222222, 4.22361111111111,
3.64722222222222, 3.68888888888889, 2, 3, 3.97222222222222,
4.67777777777778, 5.26944444444444, 5.4625, 5.8, 2, 3, 3.98333333333333,
4.87777777777778, 5.73055555555556, 6.48333333333333, 7.62916666666667,
2, 3, 3.98333333333333, 4.96666666666667, 5.96944444444444,
6.94444444444444, 7.93333333333333), LL = c(1.85, 2.87777777777778,
3.59861111111111, 4.15555555555556, 3.78888888888889, 2.73055555555556,
2.55555555555556, 2, 2.96111111111111, 4, 4.64444444444444,
5.01666666666667, 4.88333333333333, 4.88611111111111, 1.91111111111111,
3, 3.89444444444444, 4.73611111111111, 5.47777777777778,
6.20277777777778, 6.71666666666667, 2, 3, 3.96666666666667,
4.95555555555556, 5.65096686319131, 6.48333333333333, 7.17222222222222,
1.86637442123568, 2.92222222222222, 3.65, 3.61666666666667,
3.88333333333333, 3.17092476055122, 3.18888888888889, 2,
3, 3.92222222222222, 4.49444444444444, 5.0375, 5.09444444444444,
5.40555555555556, 2, 3, 3.92777777777778, 4.72222222222222,
5.52777777777778, 6.24444444444444, 7.37361111111111, 2,
3, 3.95, 4.88888888888889, 5.93333333333333, 6.88333333333333,
7.73065763697428), UL = c(1.95555555555556, 2.98333333333333,
3.84444444444444, 4.56666666666667, 4.6, 3.43611111111111,
3.17916666666667, 2, 3, 4, 4.86111111111111, 5.42777777777778,
5.48656054159421, 5.58611111111111, 2, 3, 4, 4.93888888888889,
5.83888888888889, 6.76944444444444, 7.6, 2, 3, 4, 5, 5.94166666666667,
6.88888888888889, 7.78888888888889, 1.98888888888889, 2.99444444444444,
3.87777777777778, 4.22777777777778, 4.53611111111111, 4.19722222222222,
4.20555555555556, 2, 3, 3.98888888888889, 4.78333333333333,
5.45555555555556, 5.79583333333333, 6.16666666666667, 2,
3, 3.99444444444444, 4.95, 5.85972222222222, 6.67222222222222,
7.80138888888889, 2, 3, 3.99444444444444, 4.98888888888889,
5.9875, 6.97222222222222, 7.98333333333333)), .Names = c("setSize",
"Measure", "Location", "JudgementType", "Accuracy", "LL", "UL"
), row.names = c(NA, -56L), class = "data.frame")
I visualized it using the following code:
library(ggplot2)
p1 <- ggplot(data = absBtwnDat, aes(x = as.numeric(as.character(setSize)),
y = Accuracy, group = Measure,
colour = Measure))+
geom_point()+
geom_line(aes(linetype = Measure))+
scale_x_continuous("Trial Set Size", breaks = 2:8)+
scale_y_continuous("Accuracy (# Correct)", breaks = 0:8, limits = c(0, 8))+
geom_errorbar(aes(ymin = LL, ymax = UL), width = .1, size = .75)+
scale_colour_grey(start = .8, end = .4)+
facet_wrap(~JudgementType+Location, dir = "v")+
theme(legend.position = "top")
Just to be certain, I've highlighted the unwanted strips in the following image:
With this you'll only have one row of labels per panel, but they still include both words.
p1 <- ggplot(data = absBtwnDat,
aes(x = as.numeric(as.character(setSize)), y = Accuracy,
group = Measure,
colour = Measure))+
geom_point()+
geom_line(aes(linetype = Measure))+
scale_x_continuous("Trial Set Size", breaks = 2:8)+
scale_y_continuous("Accuracy (# Correct)",
breaks = 0:8, limits = c(0, 8))+
geom_errorbar(aes(ymin = LL, ymax = UL),
width = .1, size = .75)+
scale_colour_grey(start = .8, end = .4)+
facet_wrap(~JudgementType + Location,
dir = "v",
labeller = label_wrap_gen(multi_line=FALSE)) +
theme(legend.position = "top")
p1
Here is a possible solution:
g1 <- ggplotGrob(p1)
k <- which(g1$layout$name=="strip-t-1-2")
g1$grobs[[k]]$grobs[[1]]$children[[2]]$children[[1]]$label <- ""
g1$grobs[[k]]$grobs[[1]]$children[[1]]$gp$fill <- NA
k <- which(g1$layout$name=="strip-t-2-2")
g1$grobs[[k]]$grobs[[1]]$children[[2]]$children[[1]]$label <- ""
g1$grobs[[k]]$grobs[[1]]$children[[1]]$gp$fill <- NA
library(grid)
grid.draw(g1)
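The layout names used above (such as "strip-t-1-2") and the nesting of children inside each strip grob are internal details that may differ between ggplot2 versions. Before editing a strip, it can help to list the strip entries of the gtable. The following is a sketch on toy data (the values are made up) and only inspects the layout without changing anything.

```r
library(ggplot2)

# Toy data (hypothetical values) with two facetting variables
toy <- expand.grid(x = 1:3, a = c("pro", "retro"), b = c("fix", "forced"))
toy$y <- seq_len(nrow(toy))

p <- ggplot(toy, aes(x, y)) +
  geom_point() +
  facet_wrap(~ a + b, dir = "v")

# Convert the plot to a gtable and keep only the strip entries of its layout
g <- ggplotGrob(p)
strip_rows <- g$layout[grepl("^strip", g$layout$name), c("name", "t", "l")]
print(strip_rows)
```

The t and l columns give the gtable row and column of each strip, which helps to identify which names belong to the second row of panels.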

Boxplot with two levels and multiple data.frames

I have 4 data.frames with two factor levels in each data.frame. df1 is reproduced below. Please duplicate df1 to produce df2...df4.
How can I produce boxplots with ggplot2 such that my final figure looks very similar to the figure below? The seasons in the figure correspond to the data.frame names, present and future correspond to the factor level names, and the legend corresponds to heavy, heavier and heaviest in the data reproduced here.
Ignore the dotted horizontal red line.
df1= structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("NN", "SS"), class = "factor"),
heavy = c(0.136230125, 0.136281211, 0.136038018, 0.135392862,
0.137088902, 0.136028293, 0.13640057, 0.135317058, 0.13688615,
0.136448994, 0.137089424, 0.136810847, 0.135865471, 0.136130096,
0.136361327, 0.137796714, 0.136052839, 0.135892646, 0.13544437,
0.136452363, 0.135367421, 0.135617509, 0.138202559, 0.135396942,
0.135930092, 0.135661805, 0.135666, 0.135860128, 0.137648687,
0.136057353, 0.136057731, 0.135162399, 0.136080113, 0.135285036,
0.136204839, 0.138058091, 0.137215664, 0.135696637, 0.135863902,
0.135733243, 0.138274445, 0.136632122, 0.137787919, 0.135033093,
0.136926798, 0.136766413, 0.13690947, 0.135203152, 0.138370968,
0.136862356, 0.136083112, 0.138212845, 0.135964773, 0.13583601,
0.134923731, 0.135828965, 0.136272539, 0.138127602, 0.137028323,
0.136526836, 0.136407397, 0.137025373, 0.138358757, 0.137858521,
0.135464076, 0.136302506, 0.135528362, 0.137540677, 0.136455865,
0.138470144, 0.137227895, 0.136296955, 0.136792631, 0.135875782,
0.13815733, 0.136383864, 0.136696618, 0.13857652, 0.136700903,
0.136743873, 0.136033619, 0.135970522, 0.135816385, 0.136003984,
0.136583925, 0.136768202, 0.136292002, 0.136316737, 0.136540075,
0.136051218, 0.135924119, 0.136736303, 0.136946894, 0.136266073,
0.136263692, 0.136399301, 0.13611577, 0.135857095, 0.136769488,
0.136072466, 0.135564224, 0.136496131, 0.137659507, 0.136704681,
0.136542173, 0.136777403, 0.135771538, 0.13665463, 0.136984748,
0.137717859, 0.138195237, 0.136232227, 0.135956814), heavier = c(0.227332679,
0.227200132, 0.227299118, 0.227289816, 0.22724478, 0.227082442,
0.227861315, 0.227055561, 0.227112284, 0.228651438, 0.228158412,
0.228789678, 0.227188949, 0.228850198, 0.227246991, 0.227359368,
0.227359531, 0.227310607, 0.229490445, 0.227295226, 0.227958185,
0.228104958, 0.227254823, 0.22715392, 0.228062515, 0.227509559,
0.227143662, 0.230048719, 0.227860836, 0.228467792, 0.227263728,
0.227222794, 0.227165592, 0.227140611, 0.228424335, 0.227356425,
0.227243374, 0.228936267, 0.227320467, 0.22738371, 0.227694891,
0.227270428, 0.227751798, 0.228803279, 0.227330453, 0.229679261,
0.228999206, 0.227227604, 0.227247085, 0.227198567, 0.229234921,
0.227211613, 0.23007234, 0.226793036, 0.226474338, 0.226654333,
0.229964991, 0.22880328, 0.22700099, 0.226640822, 0.227522393,
0.227463578, 0.227832692, 0.227293936, 0.230154101, 0.229813709,
0.22761097, 0.227445308, 0.228669159, 0.22660539, 0.229017398,
0.230421347, 0.227041103, 0.227583471, 0.229547568, 0.22676335,
0.226737661, 0.229922588, 0.226907188, 0.227102239, 0.226469073,
0.230680908, 0.227763879, 0.226882448, 0.226741993, 0.226693024,
0.22671415, 0.226773662, 0.227795194, 0.226983096, 0.226647946,
0.226799552, 0.226759218, 0.22692942, 0.226601519, 0.227098192,
0.226886889, 0.226959012, 0.226552119, 0.226809761, 0.226786285,
0.226709252, 0.226834015, 0.228033943, 0.226693494, 0.22748613,
0.227608804, 0.22685023, 0.226586619, 0.227718907, 0.228890098,
0.226701909, 0.230919944), heaviest = c(0.316870607, 0.316772978,
0.316851707, 0.317017543, 0.316673994, 0.317224709, 0.319234458,
0.31861305, 0.319804304, 0.318605816, 0.316930034, 0.31688398,
0.316789552, 0.320783976, 0.317094325, 0.31809319, 0.317134565,
0.318173976, 0.317213167, 0.317084404, 0.321712205, 0.317128056,
0.316866913, 0.3170489, 0.31712423, 0.31684494, 0.319497635,
0.316932301, 0.316864646, 0.317279005, 0.316887692, 0.317134437,
0.316792589, 0.320894499, 0.319883014, 0.316924639, 0.316575642,
0.31686389, 0.316985994, 0.321566256, 0.316683995, 0.320299883,
0.317308965, 0.318151948, 0.316479828, 0.319857732, 0.317171909,
0.322137849, 0.316526917, 0.316870364, 0.322205784, 0.317055758,
0.320329144, 0.318015397, 0.318719989, 0.317910658, 0.317292016,
0.321348723, 0.319915048, 0.317160762, 0.318773245, 0.319627925,
0.31869767, 0.322422407, 0.32082693, 0.318034899, 0.318760783,
0.318325502, 0.320739086, 0.317216142, 0.32284544, 0.319466593,
0.318740499, 0.317489944, 0.319064923, 0.322014928, 0.317353897,
0.318904583, 0.317931141, 0.323295254, 0.318924712, 0.318965677,
0.317700019, 0.31793468, 0.317699508, 0.317168657, 0.318903983,
0.317493401, 0.317511406, 0.317483897, 0.31748495, 0.317776804,
0.318893431, 0.317663608, 0.316978585, 0.317473467, 0.317500429,
0.317144259, 0.317330826, 0.317610353, 0.317881476, 0.31707787,
0.317728374, 0.317452137, 0.31938939, 0.317199373, 0.31898747,
0.318878952, 0.317987024, 0.318951952, 0.318419561, 0.319568088,
0.321165413)), .Names = c("id", "heavy", "heavier", "heaviest"
), class = "data.frame", row.names = c(NA, -113L))
## create some data.frames: this results in a list of four dfs
library(ggplot2)
library(reshape2)
createDF <- quote(data.frame(id=sample(c("NN", "SS"), 100, replace=TRUE),
heavy=runif(100),
heavier=runif(100),
heaviest=runif(100)))
dfs <- lapply(1:4, function(i) eval(createDF))
## join and shape them
dat <- do.call(rbind, dfs)
dat$dfid <- paste("df", rep(1:4, times=sapply(dfs, nrow)))
dat <- melt(dat, id.vars=c("id", "dfid"))
## one box per weight variable within each id, one facet per data.frame
ggplot(dat, aes(id, value, group=interaction(variable, id), fill=variable)) +
geom_boxplot() +
facet_grid(~dfid)
Something like this?
library(reshape2)
library(ggplot2)
## tag each data.frame with its season
df1$season<- 'winter'
df2$season<- 'spring'
df3$season<- 'summer'
df4$season<- 'fall'
## melt to long format: 'weightCat' holds heavy/heavier/heaviest, 'weight' holds the values
df1.m <- melt(df1, id.vars=c('id', 'season'), variable.name='weightCat', value.name='weight')
df2.m <- melt(df2, id.vars=c('id', 'season'), variable.name='weightCat', value.name='weight')
df3.m <- melt(df3, id.vars=c('id', 'season'), variable.name='weightCat', value.name='weight')
df4.m <- melt(df4, id.vars=c('id', 'season'), variable.name='weightCat', value.name='weight')
df.all <- rbind(df1.m, df2.m, df3.m, df4.m)
ggplot(df.all, aes(x=id, y=weight, fill=weightCat)) + geom_boxplot() + facet_grid(. ~ season)
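The same reshaping can be written without repeating each step four times by keeping the data.frames in a named list. This is a sketch on randomly generated stand-in data; make_df and the season-to-data.frame assignment are assumptions for illustration, not something given in the question.

```r
library(reshape2)
library(ggplot2)

set.seed(1)
# Stand-in for df1..df4 (hypothetical random values, same column names)
make_df <- function(n = 20) {
  data.frame(id = sample(c("NN", "SS"), n, replace = TRUE),
             heavy = runif(n), heavier = runif(n), heaviest = runif(n))
}
dfs <- list(winter = make_df(), spring = make_df(),
            summer = make_df(), fall = make_df())

# Add a season column to each data.frame, taken from the list names
tagged <- Map(function(d, s) { d$season <- s; d }, dfs, names(dfs))

# Stack the four data.frames, then melt to one row per id/season/weight category
df.all <- melt(do.call(rbind, tagged), id.vars = c("id", "season"),
               variable.name = "weightCat", value.name = "weight")

# Keep the facets in list order rather than alphabetical order
df.all$season <- factor(df.all$season, levels = names(dfs))

p <- ggplot(df.all, aes(x = id, y = weight, fill = weightCat)) +
  geom_boxplot() +
  facet_grid(. ~ season)
```

Setting the factor levels explicitly is a design choice: without it, facet_grid orders the seasons alphabetically.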

ggplot: geom_boxplot and geom_jitter

Align the data points with the box plot.
DATA:
data<-structure(list(score = c(0.058, 0.21, -0.111, -0.103, 0.051,
0.624, -0.023, 0.01, 0.033, -0.815, -0.505, -0.863, -0.736, -0.971,
-0.137, -0.654, -0.689, -0.126), clin = structure(c(1L, 1L, 1L,
1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L), .Label =
c("Non-Sensitive",
"Sensitive "), class = "factor"), culture = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c("Co-culture", "Mono-culture"), class = "factor"),
status = structure(c(2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L,
2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L), .Label = c("new", "old"
), class = "factor")), .Names = c("score", "clin", "culture",
"status"), class = "data.frame", row.names = c(NA, -18L))
CODE:
p<-ggplot(data, aes(culture, as.numeric(score),fill=status))
p+geom_boxplot(outlier.shape = NA)+
theme_bw()+scale_fill_grey(start = 0.8, end = 1)+
labs(title="title", x="", y="score",fill="", colour="")+
geom_jitter(aes(colour = clin), alpha=0.9,
position=position_jitter(w=0.1,h=0.1))
As you can see, the data points plotted using geom_jitter do not align with the boxplots. I know that I need to provide aes elements to geom_jitter as well, but I'm not sure how to do it correctly.
I don't think you can do this because the positions of the boxplots are being driven by the dodge algorithm as opposed to an explicit aesthetic, though I'd be curious if someone else figures out a way of doing it. Here is a workaround:
p<-ggplot(data, aes(status, as.numeric(score),fill=status))
p+geom_boxplot(outlier.shape = NA)+
theme_bw()+scale_fill_grey(start = 0.8, end = 1)+
labs(title="title", x="", y="score",fill="", colour="")+
geom_jitter(aes(colour = clin), alpha=0.9,
position=position_jitter(w=0.1,h=0.1)) +
facet_wrap(~ culture)
By using facets for culture, we can assign an explicit aesthetic (x) to status, which then allows the geom_jitter points to line up with the geom_boxplot boxes. Hopefully this is close enough for your purposes.
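In ggplot2 versions that provide position_jitterdodge(), there may be a more direct route than facetting: dodge the points by the same variable that dodges the boxes, and jitter them within that slot. The sketch below uses toy data with made-up values. Mapping group = status in the point layer is an assumption made so that colouring by clin does not create extra dodge groups.

```r
library(ggplot2)

# Toy data (hypothetical values) with the same column names as the question
toy <- data.frame(
  culture = rep(c("Co-culture", "Mono-culture"), each = 6),
  status  = rep(c("new", "old"), times = 6),
  clin    = rep(c("Non-Sensitive", "Sensitive"), each = 3, times = 2),
  score   = c(0.06, 0.21, -0.11, -0.10, 0.05, 0.62,
              -0.82, -0.51, -0.86, -0.74, -0.97, -0.14)
)

p <- ggplot(toy, aes(culture, score, fill = status)) +
  geom_boxplot(outlier.shape = NA) +
  geom_point(
    # group = status keeps the dodging aligned with the boxes,
    # while colour still distinguishes clin
    aes(colour = clin, group = status),
    position = position_jitterdodge(jitter.width = 0.1, jitter.height = 0,
                                    dodge.width = 0.75)
  ) +
  theme_bw()

# Horizontal offset of each point from the centre of its culture category
pts <- layer_data(p, 2)
offset <- as.numeric(pts$x) - round(as.numeric(pts$x))
```

The dodge.width of 0.75 is chosen to match the default boxplot width; if the boxes use a different width, the points would need the same value to stay centred on them.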

Bar chart with several non mutually exclusive characteristics

I have a data set with a variable that has several other characteristics, which are not mutually exclusive. Here's the data.
df <- structure(list(cont1 = structure(c(2L, 2L, 4L, 1L, 2L, 3L, 2L, 4L, 4L, 1L, 2L, 2L, 4L, 1L, 1L, 2L, 2L), .Label = c("Africa", "Asia", "Europe", "LAC"), class = "factor"), SIDS = structure(c(2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("No", "SIDS"), class = "factor"), LDC = structure(c(2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("LDC", "No"), class = "factor")), .Names = c("cont1",
"SIDS", "LDC"), class = "data.frame", row.names = c(NA, -17L))
When I put it into long format with df.m <- melt(df, id.vars = c("cont1")) (melt is from reshape2), I can build the plot with ggplot2, but I get all the NAs (coded as "No") in the plot. If I exclude them, the proportions are distorted because there are more NAs in one of the categories.
ggplot(df.m, aes(x = cont1, fill = value)) + geom_bar()
ggplot(df.m[df.m$value != "No",], aes(x = cont1, fill = value)) + geom_bar()
Is there a way to have a bar plot of the variable cont1 with value as the fill, without the NAs distorting the proportions? That is, can I use a different length for the fill in ggplot2?
