post hoc lettering mismatch ggplot2 - r

I have been trying to plot the result of an lsmeans model, where boxes indicate the LS mean, error bars indicate the 95% confidence interval of the LS mean, and means sharing a letter are not significantly different. I would like to plot the following table, cld.mixed.lme, with ggplot2:
dput(cld.mixed.lme)
structure(list(hor = structure(c(3L, 3L, 3L, 1L, 1L, 1L, 2L,
2L, 2L), .Label = c("L", "F", "H"), class = "factor"), managem = structure(c(1L,
3L, 2L, 3L, 1L, 2L, 1L, 2L, 3L), .Label = c("WTH", "CH", "CHF"
), class = "factor"), response = c(23.6794086785122, 23.8174295982324,
24.4481975946679, 27.7814605969773, 28.6059616644958, 28.7459261527063,
37.1161977750334, 40.0618072489354, 40.062016186989), SE = c(2.47194303396734,
2.47194303396734, 2.47194303396734, 2.47194303396734, 2.47194303396734,
2.47194303396734, 2.47194303396734, 2.47194303396734, 2.47194303396734
), df = c(12.8849763292624, 12.8849763292851, 12.8849763290692,
12.8849763293197, 12.8849763292728, 12.8849763291023, 12.8849763292846,
12.88497632933, 12.8849763292846), lower.CL = c(15.4642399103678,
15.602260830088, 16.2330288265235, 19.5662918288329, 20.3907928963513,
20.5307573845618, 28.901029006889, 31.846638480791, 31.8468474188446
), upper.CL = c(31.8945774466566, 32.0325983663769, 32.6633663628123,
35.9966293651217, 36.8211304326402, 36.9610949208507, 45.3313665431779,
48.2769760170799, 48.2771849551334), .group = c("a", "ab", "ab",
"abc", "abcde", "abd", "bcde", "ce", "de")), .Names = c("hor",
"managem", "response", "SE", "df", "lower.CL", "upper.CL", ".group"
), row.names = c(8L, 5L, 2L, 6L, 9L, 3L, 7L, 1L, 4L), class = "data.frame")
It looks like this:
     hor  managem  response  SE     df     lower.CL  upper.CL  .group
8    H    WTH      23.68     2.472  12.88  15.46     31.89     a
5    H    CHF      23.82     2.472  12.88  15.6      32.03     ab
2    H    CH       24.45     2.472  12.88  16.23     32.66     ab
6    L    CHF      27.78     2.472  12.88  19.57     36        abc
9    L    WTH      28.61     2.472  12.88  20.39     36.82     abcde
3    L    CH       28.75     2.472  12.88  20.53     36.96     ab d
7    F    WTH      37.12     2.472  12.88  28.9      45.33     bcde
1    F    CH       40.06     2.472  12.88  31.85     48.28     c e
4    F    CHF      40.06     2.472  12.88  31.85     48.28     de
After running the following code, the plot is drawn, but the .group letters are placed over the wrong responses.
For example, in the resulting plot, under hor = L and managem = WTH I get the .group letters "abc" instead of "abcde" (which appear under managem = CH instead).
Here is the code:
library(ggplot2)
pd = position_dodge(0.7)
plot.mixed.lme<-ggplot(cld.mixed.lme,aes(x = hor,y=response, color=managem, label=.group))+
theme_bw()+
geom_point(shape = 15, size = 4, position = pd) +
geom_errorbar(aes(ymin = lower.CL,ymax = upper.CL),width = 0.2,size = 0.7,position = pd) +
theme(axis.title = element_text(face = "bold"),
axis.text = element_text(face = "bold"),
plot.caption = element_text(hjust = 0)) +
geom_text(nudge_x = c(-0.3, 0, 0.3, -0.3, 0, 0.3,-0.3, 0, 0.3),
nudge_y = c(4.5, 4.5, 4.5,4.5, 4.5, 4.5,4.5, 4.5, 4.5),
color = "black")
plot.mixed.lme
Here is the resulting plot:
I welcome any suggestions and many thanks in advance for your help,
BAlpine

I found a workaround, but it is time consuming. Basically, I reordered the nudge_x values in
geom_text to match the row order of the table:
geom_text(nudge_x = c(-0.3, 0.3, 0, 0.3, -0.3, 0,-0.3, 0, 0.3),
Any idea how to match the labels automatically?
Many thanks
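One possible way to get the letters to follow their points automatically, offered as a sketch rather than a tested answer from the thread: the nudge_x values are applied in row order, while position_dodge() orders the points by the levels of managem, so the two can disagree. Giving geom_text the same position_dodge() object and an explicit group = managem lets ggplot2 place each label over its own point. The data frame cld below is a rounded copy of the table above, and anchoring the labels at upper.CL with vjust is my own choice, not part of the original code.

```r
library(ggplot2)

# Rounded copy of the cld table from the question, in its original row order.
cld <- data.frame(
  hor      = factor(rep(c("H", "L", "F"), each = 3), levels = c("L", "F", "H")),
  managem  = factor(c("WTH", "CHF", "CH", "CHF", "WTH", "CH", "WTH", "CH", "CHF"),
                    levels = c("WTH", "CH", "CHF")),
  response = c(23.68, 23.82, 24.45, 27.78, 28.61, 28.75, 37.12, 40.06, 40.06),
  lower.CL = c(15.46, 15.60, 16.23, 19.57, 20.39, 20.53, 28.90, 31.85, 31.85),
  upper.CL = c(31.89, 32.03, 32.66, 36.00, 36.82, 36.96, 45.33, 48.28, 48.28),
  .group   = c("a", "ab", "ab", "abc", "abcde", "abd", "bcde", "ce", "de")
)

# One dodge object shared by points, error bars and labels.
pd <- position_dodge(width = 0.7)

# group = managem is set explicitly: the constant color = "black" in geom_text
# replaces the colour mapping, which would otherwise define the dodge groups.
p <- ggplot(cld, aes(x = hor, y = response, color = managem, group = managem)) +
  theme_bw() +
  geom_point(shape = 15, size = 4, position = pd) +
  geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), width = 0.2, position = pd) +
  geom_text(aes(y = upper.CL, label = .group), position = pd,
            vjust = -0.5, color = "black")
p
```

With this approach no nudge vectors have to be kept in sync with the row order of the table.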

Related

Add gradient color within groups in ggplot2

I need help adding colors to ggplot objects (specifically geom_bar).
Here is my data
Names Family Groups Values
H.sapiens A G1 2
H.erectus A G1 6
H.erectus B G2 12
M.griseus C G2 3
A.mellifera D G3 3
L.niger D G3 8
H.erectus D G3 2
L.niger A G1 3
L.niger B G2 3
A.mellifera A G1 8
So far I have succeeded in creating this plot:
with this code:
library(ggplot2)
library(ggstance)
library(ggthemes)
ggplot(table, aes(fill=Family, y=Names, x=Values)) +
geom_barh(stat="identity",colour="white")+ theme_minimal() +
scale_x_continuous(limits = c(0,60), expand = c(0, 0))
Now I would like to change the colors depending on Groups. More precisely, I would like to choose a major color for each group, for instance: G1 = blue; G2 = green; G3 = red,
and for each Family to get a gradient within that color. For instance, B would be darkblue and C lightblue.
Does anyone have an idea, please?
Here are the data:
dput(table)
structure(list(Names = structure(c(3L, 2L, 2L, 5L, 1L, 4L, 2L,
4L, 4L, 1L), .Label = c("A.mellifera", "H.erectus", "H.sapiens",
"L.niger", "M.griseus"), class = "factor"), Family = structure(c(1L,
1L, 2L, 3L, 4L, 4L, 4L, 1L, 2L, 1L), .Label = c("A", "B", "C",
"D"), class = "factor"), Groups = structure(c(1L, 1L, 2L, 2L,
3L, 3L, 3L, 1L, 2L, 1L), .Label = c("G1", "G2", "G3"), class = "factor"),
Values = c(2L, 6L, 12L, 3L, 3L, 8L, 2L, 3L, 3L, 8L)), class = "data.frame", row.names = c(NA,
-10L))
You could tweak this one to suit your requirements (I have changed your sample data a bit to show different gradients within the same Group):
df <- read.table(header = T, text = "Names Family Groups Values
H.sapiens A G1 2
H.erectus B G1 6
H.erectus B G2 12
M.griseus C G2 3
A.mellifera D G3 3
L.niger D G3 8
H.erectus A G3 2
L.niger A G1 3
L.niger B G2 3
A.mellifera C G1 8")
library(tidyverse)
df %>% ggplot() +
geom_col(aes(x = Names, y = Values, fill = Groups, alpha = as.integer(as.factor(Family)))) +
coord_flip() +
scale_fill_manual(name = "Groups", values = c("blue", "green", 'red')) +
scale_alpha_continuous(name = "Family", range = c(0.2, 0.7)) +
theme_classic()
Created on 2021-06-12 by the reprex package (v2.0.0)
We can create a range of colours for each Group and then match on the order of Family. You might need to play around with the colours to make the differences more prominent:
cols <- lapply(list(G1 = c("darkblue", "lightblue"),
G2 = c("darkgreen", "lightgreen"),
G3 = c("red4", "red")),
function(i) colorRampPalette(i)(length(unique(table$Family))))
table$col <- mapply(function(g, i) cols[[ g ]][ i ],
g = table$Groups, i = as.numeric(table$Family))
ggplot(table, aes(x = Values, y = Names, fill = col )) +
geom_barh(stat = "identity", colour = "white") +
scale_x_continuous(limits = c(0, 60), expand = c(0, 0)) +
scale_fill_identity() +
theme_minimal()
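To make the lookup step above a little more concrete, here is a minimal base R sketch of what colorRampPalette() returns for one group. The names families and ramp_g1 are hypothetical and only used for illustration.

```r
# Family levels as in the question's data.
families <- c("A", "B", "C", "D")

# colorRampPalette() returns a function; calling it with n gives n colours
# interpolated between the two end colours.
ramp_g1 <- colorRampPalette(c("darkblue", "lightblue"))(length(families))
names(ramp_g1) <- families

# Shade that would be used for Family "B" inside group G1.
shade_b <- ramp_g1[["B"]]
print(ramp_g1)
```

The first and last shades are the two end colours themselves, so the darker and lighter ends of each group's gradient can be chosen directly.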

Plot single bar or rectangle with colors based on group

I am trying to create a single bar or rectangle plot in R with colors based on some groupings and order based on some values. See example below:
For those interested in more detail, this is what I am trying to replicate: http://www.broadinstitute.org/cmap/help_topics_linkified.jsp (lots of examples of this plot at the bottom of the page)
EDIT (based on comments): The y-axis values are ranks that change with the score column. The colors represent a grouping with positive score values in green, negative in red, and black lines for a set of "selected" rows. This is not a stacked bar plot. The values on the y-axis (be it rank or score) are not cumulative and the group region for the "selected" (black) group can be distributed across the other three group regions (as shown in example data below).
Example:
structure(list(group = structure(c(1L, 1L, 4L, 1L, 1L, 3L, 3L,
4L, 3L, 2L, 4L, 2L, 2L), .Label = c("positive", "negative", "null",
"selected"), class = "factor"), rank = c(1, 2, 3, 4, 5, 7.5,
7.5, 7.5, 7.5, 10, 11, 12, 13), xaxis = c(1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1), score = c(0.85, 0.7, 0.55, 0.4, 0.25, 0, 0,
0, 0, -0.5, -0.65, -0.8, -0.95)), .Names = c("group", "rank",
"xaxis", "score"), row.names = c(NA, -13L), class = "data.frame")
group rank xaxis score
1 positive 1.0 1 0.85
2 positive 2.0 1 0.70
3 selected 3.0 1 0.55
4 positive 4.0 1 0.40
5 positive 5.0 1 0.25
6 null 7.5 1 0.00
7 null 7.5 1 0.00
8 selected 7.5 1 0.00
9 null 7.5 1 0.00
10 negative 10.0 1 -0.50
11 selected 11.0 1 -0.65
12 negative 12.0 1 -0.80
13 negative 13.0 1 -0.95
I tried the following but I am looking for a bar or rectangle, not points.
ggplot(df, aes(xaxis,rank,colour=group)) +
geom_point(size=3) +
scale_colour_manual(values=c("positive"="green", "negative"="red", "null"="grey", "selected"="black")) +
theme_bw() + scale_y_reverse() + scale_x_discrete(breaks=NULL)
stacked geom_bar() and geom_rect() don't seem to work with continuous y values.
Any help would be appreciated. Thanks!
UPDATE (using @bjoseph's solution to replicate the exact plot shown in the link above)
df$size = as.factor(1)
df$height = 1
ggplot(df, aes(x=size,y=height,fill=group,group=rank)) +
geom_bar(stat='identity') +
scale_fill_manual(values=c("positive"="green", "negative"="red", "null"="grey", "selected"="black")) + theme_bw() +
scale_y_reverse(breaks=NULL) + scale_x_discrete(breaks=NULL)
This works
df = structure(list(group = structure(c(1L, 1L, 4L, 1L, 1L, 3L, 3L,
4L, 3L, 2L, 4L, 2L, 2L), .Label = c("A", "B", "C", "D"), class = "factor"),
value = 1:13), .Names = c("group", "value"), row.names = c(NA,
-13L), class = "data.frame")
df$size=as.factor(1)
df$height=1
ggplot(df, aes(x=size,y=height,fill=group,group=value)) +
geom_bar(stat='identity',color="black") +
theme_bw()
It produces the attached plot.
The color="black" argument inside geom_bar draws black outlines around each segment. You can also suppress or manually label the y-axis if you need to.
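As an alternative sketch (not from the answer above), the same single bar can be drawn with geom_tile by giving every row its own position, so tied ranks such as 7.5 do not overlap. The column pos is a hypothetical helper I introduce here; the colours are the ones from the question.

```r
library(ggplot2)

# Groups and scores from the question's example data, already sorted by score.
df <- data.frame(
  group = factor(c("positive", "positive", "selected", "positive", "positive",
                   "null", "null", "selected", "null",
                   "negative", "selected", "negative", "negative"),
                 levels = c("positive", "negative", "null", "selected")),
  score = c(0.85, 0.7, 0.55, 0.4, 0.25, 0, 0, 0, 0, -0.5, -0.65, -0.8, -0.95)
)

# One slot per row instead of the tied rank values.
df$pos <- seq_len(nrow(df))

p <- ggplot(df, aes(x = 1, y = pos, fill = group)) +
  geom_tile(width = 1, height = 1) +
  scale_fill_manual(values = c(positive = "green", negative = "red",
                               null = "grey", selected = "black")) +
  scale_y_reverse(breaks = NULL) +   # highest score at the top
  scale_x_continuous(breaks = NULL) +
  theme_bw()
p
```

Because each tile has height 1 and its own position, nothing is stacked or accumulated, which matches the "not a stacked bar plot" requirement in the question.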

Applying scale_fill_gradient in ggplot2 conditionally

I am plotting the following data using geom_tile and geom_text in ggplot2
mydf
Var1 Var2 dc1 bin
1 H G 0.93333333 0
2 G H 0.06666667 1
3 I G 0.80000000 0
4 G I 0.20000000 1
5 J G 0.33333333 1
6 G J 0.66666667 0
7 K G 0.57894737 1
8 G K 0.42105263 0
9 I H 0.80000000 0
10 H I 0.20000000 1
11 J H 0.25000000 0
12 H J 0.75000000 1
13 K H 0.20000000 0
14 H K 0.80000000 1
15 J I 0.12500000 0
16 I J 0.87500000 1
17 K I 0.32000000 0
18 I K 0.68000000 1
19 K J 0.28571429 0
20 J K 0.71428571 1
I am plotting 'Var1' vs 'Var2', and then using the 'bin' variable as my geom_text. Currently, I have filled each tile based upon scale_fill_gradient using the variable 'dc1'.
### Plotting
ggplot(mydf, aes(Var2, Var1, fill = dc1)) +
geom_tile(colour="gray20", size=1.5, family="bold", stat="identity", height=1, width=1) +
geom_text(data=mydf, aes(Var2, Var1, label = bin), color="black", size=rel(4.5)) +
scale_fill_gradient(low = "white", high = "firebrick3", space = "Lab", na.value = "gray20",
guide = "colourbar") +
scale_x_discrete(expand = c(0, 0)) +
scale_y_discrete(expand = c(0, 0)) +
xlab("") +
ylab("") +
theme(axis.text.x = element_text(vjust = 1),
axis.text.y = element_text(hjust = 0.5),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(fill=NA,color="gray20", size=0.5, linetype="solid"),
axis.line = element_blank(),
axis.ticks = element_blank(),
axis.text = element_text(color="white", size=rel(1.5)),
panel.background = element_rect(fill="gray20"),
plot.background = element_rect(fill="gray20"),
legend.position = "none"
)
Which gives this:
What I am trying to do (unsuccessfully) is to make the fill conditional upon the 'bin' variable. If bin==1 then I would like to fill according to 'dc1'. If bin==0 then I would like to fill with 'white'.
This would give the following which I have manually created as an example desired plot:
I tried messing around with scale_fill_gradient to try and introduce a second fill option, but cannot seem to figure this out. Thanks for any help/pointers.
This is the dput for mydf:
structure(list(Var1 = structure(c(4L, 5L, 3L, 5L, 2L, 5L, 1L,
5L, 3L, 4L, 2L, 4L, 1L, 4L, 2L, 3L, 1L, 3L, 1L, 2L), .Label = c("K",
"J", "I", "H", "G"), class = "factor"), Var2 = structure(c(1L,
2L, 1L, 3L, 1L, 4L, 1L, 5L, 2L, 3L, 2L, 4L, 2L, 5L, 3L, 4L, 3L,
5L, 4L, 5L), .Label = c("G", "H", "I", "J", "K"), class = "factor"),
dc1 = c(0.933333333333333, 0.0666666666666667, 0.8, 0.2,
0.333333333333333, 0.666666666666667, 0.578947368421053,
0.421052631578947, 0.8, 0.2, 0.25, 0.75, 0.2, 0.8, 0.125,
0.875, 0.32, 0.68, 0.285714285714286, 0.714285714285714),
bin = c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1)), .Names = c("Var1", "Var2", "dc1", "bin"), row.names = c(NA,
-20L), class = "data.frame")
Perhaps replace fill = dc1 with fill = dc1 * bin? A stripped-down version of your code:
ggplot(data = mydf, aes(x = Var2, y = Var1, fill = dc1 * bin, label = bin)) +
geom_tile() +
geom_text() +
scale_fill_gradient(low = "white", high = "firebrick3")
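The multiplication works because every bin==0 tile gets the value 0, which the scale maps to its low colour, white. A variation, sketched here as an assumption rather than as part of the answer, is to set those tiles to NA and let na.value paint them white, with fixed limits so the bin==1 tiles keep the same gradient. Only the first four rows of mydf are reproduced.

```r
library(ggplot2)

# First four rows of mydf from the question (dc1 rounded).
mydf <- data.frame(
  Var1 = c("H", "G", "I", "G"),
  Var2 = c("G", "H", "G", "I"),
  dc1  = c(0.9333, 0.0667, 0.8, 0.2),
  bin  = c(0, 1, 0, 1)
)

# Tiles with bin == 0 become NA and are drawn with na.value;
# limits = c(0, 1) keeps the gradient independent of which rows are NA.
p <- ggplot(mydf, aes(x = Var2, y = Var1,
                      fill = ifelse(bin == 1, dc1, NA), label = bin)) +
  geom_tile(colour = "gray20") +
  geom_text() +
  scale_fill_gradient(low = "white", high = "firebrick3",
                      limits = c(0, 1), na.value = "white", name = "dc1")
p
```

This keeps "no fill" separate from "dc1 equal to zero", which may matter if dc1 can legitimately be 0 for a bin==1 tile.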

Displaying multiple boxplots per group in R

I have data of the form:
Day A B
1 1 4
1 2 5
1 3 6
2 2 2
2 3 4
2 5 6
3 6 7
3 4 6
And I would like to display this on a single chart, with Day along the x-axis, and with each x-position having a boxplot for each of A and B (colour coded).
Here's a (slight) modification of an example from the ?boxplot help page. The examples show off many common uses of the function.
tg <- data.frame(
dose=ToothGrowth$dose[1:30],
A=ToothGrowth$len[1:30],
B=ToothGrowth$len[31:60]
)
head(tg)
# dose A B
# 1 0.5 4.2 15.2
# 2 0.5 11.5 21.5
# 3 0.5 7.3 17.6
# 4 0.5 5.8 9.7
# 5 0.5 6.4 14.5
# 6 0.5 10.0 10.0
boxplot(A ~ dose, data = tg,
boxwex = 0.25, at = 1:3 - 0.2,
col = "yellow",
main = "Guinea Pigs' Tooth Growth",
xlab = "Vitamin C dose mg",
ylab = "tooth length",
xlim = c(0.5, 3.5), ylim = c(0, 35), yaxs = "i")
boxplot(B ~ dose, data = tg, add = TRUE,
boxwex = 0.25, at = 1:3 + 0.2,
col = "orange")
legend(2, 9, c("A", "B"),
fill = c("yellow", "orange"))
Try:
library(reshape2)  # provides melt()
library(ggplot2)
ddf = structure(list(Day = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L), A = c(1L,
2L, 3L, 2L, 3L, 5L, 6L, 4L), B = c(4L, 5L, 6L, 2L, 4L, 6L, 7L,
6L)), .Names = c("Day", "A", "B"), class = "data.frame", row.names = c(NA,
-8L))
# Reshape to long format: one row per Day and variable (A or B)
mm = melt(ddf, id='Day')
ggplot(mm)+geom_boxplot(aes(x=factor(Day), y=value, fill=variable))
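The key step in the ggplot2 approach is reshaping the data to long format so that A and B become levels of one column that can be mapped to fill. If reshape2 is not available, a similar result can be sketched with tidyr::pivot_longer(); using tidyr here is my assumption, not something the answer relies on.

```r
library(ggplot2)
library(tidyr)

# Data from the question.
ddf <- data.frame(
  Day = c(1, 1, 1, 2, 2, 2, 3, 3),
  A   = c(1, 2, 3, 2, 3, 5, 6, 4),
  B   = c(4, 5, 6, 2, 4, 6, 7, 6)
)

# Long format: columns A and B are stacked into variable/value pairs.
long <- pivot_longer(ddf, cols = c(A, B),
                     names_to = "variable", values_to = "value")

# One box per Day and variable, dodged side by side and colour coded.
p <- ggplot(long, aes(x = factor(Day), y = value, fill = variable)) +
  geom_boxplot()
p
```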

Why ggplot2 pie-chart facet confuses the facet labelling

I have two types of data that look like this:
Type 1 (http://dpaste.com/1697615/plain/)
Cluster-6 abTcells 1456.74119
Cluster-6 Macrophages 5656.38478
Cluster-6 Monocytes 4415.69078
Cluster-6 StemCells 1752.11026
Cluster-6 Bcells 1869.37056
Cluster-6 gdTCells 1511.35291
Cluster-6 NKCells 1412.61504
Cluster-6 DendriticCells 3326.87741
Cluster-6 StromalCells 2008.20603
Cluster-6 Neutrophils 12867.50224
Cluster-3 abTcells 471.67118
Cluster-3 Macrophages 1000.98164
Cluster-3 Monocytes 712.92273
Cluster-3 StemCells 557.88648
Cluster-3 Bcells 599.94109
Cluster-3 gdTCells 492.61994
Cluster-3 NKCells 524.42522
Cluster-3 DendriticCells 647.28811
Cluster-3 StromalCells 876.27875
Cluster-3 Neutrophils 1025.24105
And type two, (http://dpaste.com/1697602/plain/).
These values are identical with Cluster-6 in type 1 above:
abTcells 1456.74119
Macrophages 5656.38478
Monocytes 4415.69078
StemCells 1752.11026
Bcells 1869.37056
gdTCells 1511.35291
NKCells 1412.61504
DendriticCells 3326.87741
StromalCells 2008.20603
Neutrophils 12867.50224
But when I plot the type 1 data with this code, the facet labels get confused:
library(ggplot2);
library(RColorBrewer);
filcol <- brewer.pal(10, "Set3")
dat <- read.table("http://dpaste.com/1697615/plain/")
ggplot(dat,aes(x=factor(1),y=dat$V3,fill=dat$V2))+
facet_wrap(~V1)+
xlab("") +
ylab("") +
geom_bar(width=1,stat="identity",position = "fill") +
scale_fill_manual(values = filcol,guide = guide_legend(title = "")) +
coord_polar(theta="y")+
theme(strip.text.x = element_text(size = 8, colour = "black", angle = 0))
Ready data:
> dput(dat)
structure(list(V1 = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Cluster-3",
"Cluster-6"), class = "factor"), V2 = structure(c(1L, 5L, 6L,
9L, 2L, 4L, 8L, 3L, 10L, 7L, 1L, 5L, 6L, 9L, 2L, 4L, 8L, 3L,
10L, 7L), .Label = c("abTcells", "Bcells", "DendriticCells",
"gdTCells", "Macrophages", "Monocytes", "Neutrophils", "NKCells",
"StemCells", "StromalCells"), class = "factor"), V3 = c(1456.74119,
5656.38478, 4415.69078, 1752.11026, 1869.37056, 1511.35291, 1412.61504,
3326.87741, 2008.20603, 12867.50224, 471.67118, 1000.98164, 712.92273,
557.88648, 599.94109, 492.61994, 524.42522, 647.28811, 876.27875,
1025.24105)), .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA,
-20L))
This generates the following figure:
Notice that the facet labels are misplaced: the panel labelled Cluster-3 should be Cluster-6,
where Neutrophils take the largest proportion.
How can I resolve the problem?
The type 2 data cause no problem at all:
library(ggplot2)
df <- read.table("http://dpaste.com/1697602/plain/");
library(RColorBrewer);
filcol <- brewer.pal(10, "Set3")
ggplot(df,aes(x=factor(1),y=V2,fill=V1))+
geom_bar(width=1,stat="identity")+coord_polar(theta="y")+
theme(axis.title = element_blank())+
scale_fill_manual(values = filcol,guide = guide_legend(title = "")) +
theme(strip.text.x = element_text(size = 8, colour = "black", angle = 0))
Ready data:
> dput(df)
structure(list(V1 = structure(c(1L, 5L, 6L, 9L, 2L, 4L, 8L, 3L,
10L, 7L), .Label = c("abTcells", "Bcells", "DendriticCells",
"gdTCells", "Macrophages", "Monocytes", "Neutrophils", "NKCells",
"StemCells", "StromalCells"), class = "factor"), V2 = c(1456.74119,
5656.38478, 4415.69078, 1752.11026, 1869.37056, 1511.35291, 1412.61504,
3326.87741, 2008.20603, 12867.50224)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA,
-10L))
It's because you use the data frame name in aes(...). This fixes the problem.
ggplot(dat,aes(x=factor(1),y=V3,fill=V2))+
facet_wrap(~V1)+
xlab("") +
ylab("") +
geom_bar(width=1,stat="identity",position = "fill") +
scale_fill_manual(values = filcol,guide = guide_legend(title = "")) +
coord_polar(theta="y")+
theme(strip.text.x = element_text(size = 8, colour = "black", angle = 0))
In defining the facets, you reference V1 in the context of the default dataset, and ggplot sorts alphabetically by level (so "Cluster-3" comes first). In your call to aes(...) you reference dat$V3 directly, so ggplot goes out of the context of the default dataset to the original dataframe. There, Cluster-6 is first.
As a general comment, one should never reference data in aes(...) outside the context of the dataset defined with data=.... So:
ggplot(data=dat, aes(y=V3...)) # good
ggplot(data=dat, aes(y=dat$V3...)) # bad
Your problem is a perfect example of why the second option is bad.
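A small check of the "good" form, as a sketch on a cut-down version of the data (two cell types per cluster, values rounded): with bare column names inside aes(), the Neutrophils share in the Cluster-6 panel is the share computed from the Cluster-6 rows.

```r
library(ggplot2)

# Cut-down version of the type 1 data, Cluster-6 first as in the original file.
dat <- data.frame(
  V1 = rep(c("Cluster-6", "Cluster-3"), each = 2),
  V2 = rep(c("Neutrophils", "Monocytes"), times = 2),
  V3 = c(12867.5, 4415.7, 1025.2, 712.9)
)

# Bare column names: every aesthetic is evaluated inside the faceted data.
p <- ggplot(dat, aes(x = factor(1), y = V3, fill = V2)) +
  geom_col(width = 1, position = "fill") +
  facet_wrap(~V1) +
  coord_polar(theta = "y")

# Panels are ordered alphabetically, so PANEL 2 is Cluster-6.
ld <- layer_data(p, 1)
neutro_c6 <- ld[ld$PANEL == 2 & ld$group == 2, ]
share <- neutro_c6$ymax - neutro_c6$ymin
print(share)
```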
