ggplot: filling color based on condition - r

I want to plot two categorical variables (group, condition) and one numeric variable (value). In addition, I want to base the filling color on the significance of the values (significant bars should be grey, the rest white). With the following code, however, only some significant bars are colored in grey.
plot <- ggplot(dat, aes(group, value))+
geom_col(aes(fill = condition), position = position_dodge(0.8), width = .7, color= "black") +
scale_fill_manual(values = ifelse(dat$significance > .05, "white", "grey")) +
geom_linerange(aes(group = condition, ymin = ci_lower, ymax= ci_upper), position = position_dodge(0.8)) +
coord_flip(ylim =c(-.2,1))
plot
here is my data:
dat <- structure(list(group = c("friends", "parent", "esm", "friends", "parent", "esm"),
value = c(0.25, 0.44, 0.33, 0.47, 0.25, 0.32),
significance = c(0.08, 0, 0, 0, 0.01, 0),
condition = c("S1", "S1", "S1", "S2", "S2", "S2"),
trait = c("E", "E", "E", "E", "E", "E"),
ci_lower = c(0.52, 0.74, 0.53, 0.67, 0.44, 0.49),
ci_upper = c(-0.03, 0.14, 0.14, 0.27, 0.06, 0.15)),
row.names = c(1L,2L, 3L, 16L, 17L, 18L), class = "data.frame")

You can add an inline mutate to create a column to specify the color group based on significance. The key here is to use the group aesthetic so the bars can still be dodged and positioned correctly based on the condition variable.
library(dplyr)
library(ggplot2)
dat %>%
mutate(sig = significance < .05) %>%
ggplot(aes(group, value, group = condition)) +
geom_col(
aes(fill = sig),
position = position_dodge(0.8),
color = "black",
width = .7
) +
scale_fill_manual(values = c("white", "grey")) +
geom_linerange(aes(ymin = ci_lower, ymax = ci_upper),
position = position_dodge(0.8)) +
coord_flip(ylim = c(-.2, 1))
Gives this plot:
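To see why the original attempt colored only some bars, it may help to look at what was actually passed to scale_fill_manual. The sketch below uses only base R and two columns copied from the question. It shows that ifelse() returns one color per row, while the fill scale (mapped to condition) has just two levels. As far as I understand it, unnamed values are matched to the scale's levels in order, so only the first two colors are used, and they are assigned per condition rather than per bar. The names row_colors, fill_levels and used_colors are purely illustrative.

```r
# Significance values and conditions copied from the question's data
significance <- c(0.08, 0, 0, 0, 0.01, 0)
condition <- c("S1", "S1", "S1", "S2", "S2", "S2")

# One color per row: six values in total
row_colors <- ifelse(significance > .05, "white", "grey")

# fill = condition has only two levels, so only the first two
# unnamed values end up being used, one per condition
fill_levels <- sort(unique(condition))
used_colors <- setNames(row_colors[seq_along(fill_levels)], fill_levels)
print(used_colors)
```

Under that assumption, every S1 bar is white and every S2 bar is grey, regardless of each bar's own significance, which matches the symptom described in the question.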
However, I think you need another aesthetic to distinguish condition in addition to significance. Color is one option, but this is a nice place to use ggpattern, which is more obvious than the outline color and keeps the B&W look.
Here's an example:
library(ggpattern)
dat %>%
mutate(sig = significance < .05) %>%
ggplot(aes(group, value, group = condition)) +
geom_col_pattern(
aes(fill = sig, pattern_angle = condition),
position = position_dodge(0.8),
pattern_fill = "black",
pattern_spacing = 0.025,
pattern = "stripe",
width = .7,
color = "black"
) +
scale_pattern_angle_discrete(range = c(45, 135)) +
scale_fill_manual(values = c("white", "grey")) +
geom_linerange(aes(ymin = ci_lower, ymax = ci_upper),
position = position_dodge(0.8)) +
coord_flip(ylim = c(-.2, 1))
Which gives this plot:
Finally, it's worth noting that the color of a bar is not usually used to denote the significance of a statistical metric. A much more common convention is to use asterisks to indicate relevant p-value thresholds (e.g. ** p < 0.01) or letters to indicate membership in a grouped analysis such as an ANOVA. These can be easily implemented using the ggpubr package. That would leave fill color free to indicate the grouping by condition.
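As a rough illustration of the asterisk convention mentioned above, here is a minimal sketch that builds the labels by hand with cut() and draws them with geom_text, rather than going through ggpubr. The thresholds (0.001, 0.01, 0.05), the "ns" label and the column name stars are assumptions made for this example, not something taken from the question.

```r
library(ggplot2)

# Subset of the question's data (confidence limits omitted for brevity)
dat <- data.frame(
  group = c("friends", "parent", "esm", "friends", "parent", "esm"),
  value = c(0.25, 0.44, 0.33, 0.47, 0.25, 0.32),
  significance = c(0.08, 0, 0, 0, 0.01, 0),
  condition = c("S1", "S1", "S1", "S2", "S2", "S2")
)

# Assumed thresholds: *** p < 0.001, ** p < 0.01, * p < 0.05, otherwise "ns".
# right = FALSE makes each interval closed on the left, e.g. [0.01, 0.05)
dat$stars <- as.character(cut(
  dat$significance,
  breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
  labels = c("***", "**", "*", "ns"),
  right = FALSE
))

# Fill now encodes condition; the labels carry the significance.
# group = condition keeps the text dodged the same way as the bars.
p <- ggplot(dat, aes(group, value, fill = condition)) +
  geom_col(position = position_dodge(0.8), width = 0.7, color = "black") +
  geom_text(aes(label = stars, group = condition),
            position = position_dodge(0.8), hjust = -0.3) +
  scale_fill_manual(values = c(S1 = "white", S2 = "grey")) +
  coord_flip(ylim = c(-0.2, 1))
```

Printing p draws the plot. Setting group explicitly in geom_text is a deliberate choice here, since the label column is itself discrete and could otherwise change how the text is grouped for dodging.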

The following alternative can also be useful:
library(ggplot2)
#Code
ggplot(dat, aes(group, value))+
geom_col(aes(fill = interaction(condition,significance > .05)),
position = position_dodge(0.8), width = .7, color= "black") +
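# Name the values so each color is tied to its level rather than to its position
scale_fill_manual(values = c("S1.FALSE" = "grey", "S2.FALSE" = "grey", "S1.TRUE" = "white"),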
breaks = c('S2.FALSE','S1.TRUE'),
labels=c('S2','S1')) +
geom_linerange(aes(group = condition, ymin = ci_lower, ymax= ci_upper), position = position_dodge(0.8)) +
coord_flip(ylim =c(-.2,1))+
labs(fill='Var')
Output:

Related

Rearranging trendline colors in ggplot

I created a plot that turned out mostly how I'd like it in ggplot but I need the lines to appear in a slightly different color arrangement.
Basically, I need all "mean" lines to appear in blue and all "odd" lines to appear in red. Pref 1 can appear in either the lighter or the darker shade, with Pref 2 in the other. As you can see, ggplot has not quite done that.
p2 <- ggplot(asd_pref_plot_groups, aes(x, pref_plot_groups$predicted, col = combined)) +
geom_line(size=1.5) +
scale_color_manual(values = c("blue","deepskyblue","red","pink")) +
geom_ribbon(aes(ymin=conf.low,ymax=conf.high, fill=combined),alpha=.2,colour=NA) +
scale_fill_manual(values = c("blue","deepskyblue","red","pink")) +
geom_point(data=summStats,aes(trial,mean,col = combined),size=2) +
scale_color_manual(values = c("blue","deepskyblue","red","pink")) +
theme_bw() +
xlab('Trial') +
ylab('Prediction Error') +
ggtitle('ASD learning about TD vs. ASD \n learning about ASD') +
theme(text=element_text(size=20),
plot.title = element_text(hjust = 0.5),
panel.border = element_blank())
Above is my code. I thought I could reorder the colors in scale_color_manual as needed, but that doesn't seem to work. Is there an easy fix, or does the problem extend to my data frames? Thank you.
Your question didn't include any example data, so I have had to try to recreate your data set (see footnote).
To ensure we are on the right track, I will use exactly your plotting code to get a very similar plot:
ggplot(asd_pref_plot_groups, aes(x, pref_plot_groups$predicted, col = combined)) +
geom_line(size=1.5) +
scale_color_manual(values = c("blue","deepskyblue","red","pink")) +
geom_ribbon(aes(ymin = conf.low, ymax = conf.high, fill = combined),
alpha = 0.2, colour = NA) +
scale_fill_manual(values = c("blue","deepskyblue","red","pink")) +
geom_point(data = summStats, aes(trial, mean,col = combined), size = 2) +
scale_color_manual(values = c("blue","deepskyblue","red","pink")) +
theme_bw() +
xlab('Trial') +
ylab('Prediction Error') +
ggtitle('ASD learning about TD vs. ASD \n learning about ASD') +
theme(text=element_text(size=20),
plot.title = element_text(hjust = 0.5),
panel.border = element_blank())
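All we need to do here is remove the redundant scale_color_manual call (you currently have two) and change the ordering of the colors in both the fill and color scales. The values are matched to the levels of combined in order (alphabetical for a character column), so the blues and reds need to alternate: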
ggplot(asd_pref_plot_groups, aes(x, pref_plot_groups$predicted,
col = combined, fill = combined)) +
geom_line(size = 1.5) +
geom_ribbon(aes(ymin = conf.low, ymax = conf.high),
alpha = 0.2, colour = NA) +
scale_fill_manual(values = c("blue","red", "deepskyblue", "pink")) +
scale_color_manual(values = c("blue","red","deepskyblue", "pink")) +
geom_point(data = summStats, aes(trial, mean,col = combined), size = 2) +
theme_bw() +
xlab('Trial') +
ylab('Prediction Error') +
ggtitle('ASD learning about TD vs. ASD \n learning about ASD') +
theme(text=element_text(size=20),
plot.title = element_text(hjust = 0.5),
panel.border = element_blank())
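An alternative that avoids depending on level order is to name the color vector. The sketch below assumes the level names from the recreated data in the footnote (pref1_mean, pref1_odd, pref2_mean, pref2_odd); the real data may use different names. It also shows, in base R, which order an unnamed vector would need to be in.

```r
# Level names taken from the recreated data in the footnote (an assumption)
pal <- c(
  pref1_mean = "blue",
  pref2_mean = "deepskyblue",
  pref1_odd  = "red",
  pref2_odd  = "pink"
)

# A character column is ordered alphabetically on a discrete scale, so an
# unnamed vector would have to be supplied in this order:
level_order <- sort(names(pal))
print(unname(pal[level_order]))

# With names, the order no longer matters and the same vector can be reused:
# scale_color_manual(values = pal) + scale_fill_manual(values = pal)
```

The printed order is blue, red, deepskyblue, pink, which is the ordering used in the corrected plot code above.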
Footnote: Reproducible data to approximate data in question
set.seed(1)
asd_pref_plot_groups <- data.frame(x = rep(c(1, 60), 4),
combined = rep(c('pref1_mean', 'pref1_odd',
'pref2_mean', 'pref2_odd'),
each = 2),
predicted = c(1.3, 1.3, 1.45, 1.3,
2, 1.75, 2.05, 1.77),
conf.high = c(1.35, 1.35, 1.5, 1.35,
2.05, 1.8, 2.1, 1.82),
conf.low = c(1.25, 1.25, 1.4, 1.25,
1.95, 1.7, 2, 1.72))
pref_plot_groups <- asd_pref_plot_groups
summStats <- data.frame(trial = rep(1:60, 4),
combined = rep(c('pref1_mean', 'pref1_odd',
'pref2_mean', 'pref2_odd'),
each = 60),
mean = c(rnorm(60, seq(1.3, 1.3, length = 60), 0.05),
rnorm(60, seq(1.45, 1.3, length = 60), 0.05),
rnorm(60, seq(2, 1.75, length = 60), 0.05),
rnorm(60, seq(2.05, 1.77, length = 60), 0.05)))

Geom_bar_pattern not treating x-axis categories as different

Take the following data:
df <- data.frame(replicate(2,sample(0:1,30,rep=TRUE)))
df <- reshape(data=df, varying=list(1:2),
direction="long",
times = names(df),
timevar="Type",
v.names="Score")
plotted like this:
plot <- ggbarplot(df, x = "Type", y = "Score",
color = "black", fill = "Type", add = "mean_ci")
I want to add stripes only to X1:
plot +
geom_bar_pattern(stat = "summary", fun = "mean", position="dodge", color="black", width=1,pattern_angle = 45, pattern_density = 0.4,pattern_spacing = 0.025, pattern_key_scale_factor = 0.6) +
scale_pattern_manual(values = c(X1 = "stripe", X2 = "none"))
However, stripes are added to both x-axis categories (scale_pattern_manual does not seem to work).
Any help is much appreciated.
You could build your error bars with stat_summary instead of using ggpubr::ggbarplot. Then you would get this:
library(ggplot2)
library(ggpattern)
df <- data.frame(replicate(2,sample(0:1,30,rep=TRUE)))
df <- reshape(data=df, varying=list(1:2),
direction="long",
times = names(df),
timevar="Type",
v.names="Score")
ggplot(df, aes(x = Type, y = Score, pattern = Type,
fill = Type)) +
geom_bar_pattern(stat = "summary",
fun = "mean",
position="dodge",
color="black",
width=1, pattern_angle = 45,
pattern_density = 0.4, pattern_spacing = 0.025,
pattern_key_scale_factor = 0.6) +
scale_pattern_manual(values = c("stripe", "none")) +
# mean_cl_normal needs the Hmisc package to be installed
stat_summary(fun.data=mean_cl_normal, geom="errorbar", col="black", width=.1)
Created on 2021-05-19 by the reprex package (v2.0.0)
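For readers who want to check what those error bars represent: as far as I know, mean_cl_normal wraps Hmisc::smean.cl.normal, which returns the mean with t-based confidence limits (95% by default). A minimal base R sketch of that calculation might look like the following. The helper name mean_ci_t and the example scores are made up for illustration, and this is not the package's actual implementation.

```r
# Hypothetical helper: mean with t-based confidence limits
mean_ci_t <- function(x, conf = 0.95) {
  n <- length(x)
  m <- mean(x)
  se <- sd(x) / sqrt(n)                        # standard error of the mean
  half <- qt((1 + conf) / 2, df = n - 1) * se  # half-width of the interval
  c(y = m, ymin = m - half, ymax = m + half)
}

scores <- c(0, 1, 1, 0, 1, 1, 0, 1)            # made-up 0/1 scores
ci <- mean_ci_t(scores)
print(ci)
```

The y, ymin and ymax names mirror what stat_summary expects from a fun.data function, so a helper of this shape could in principle be passed as fun.data if Hmisc is not available.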
As far as I know, scale_pattern_manual has no effect in this setting, because no variable is mapped to the pattern aesthetic.
To avoid stripes being added to both columns, add aes(pattern = Type) to geom_bar_pattern.
See Gallery of ggpattern package
plot <- ggbarplot(df, x = "Type", y = "Score",
color = "black", fill = "Type", add = "mean_ci")
plot +
geom_bar_pattern(
stat = "summary",
fun = "mean",
position="dodge",
color="white",
width=0.7,
pattern_angle = 45,
pattern_density = 1,
pattern_spacing = 0.025,
pattern_key_scale_factor = 0.8,
aes(pattern = Type))

Function scale_size_manual() isn't affecting the size of the points when using ggplot() in R

I'm using the function scale_size_manual() for the first time. I'm trying to decrease the size of the points using the script below:
p2<-ggplot(data = dfnew, aes(x = Area, y = Proportion, group=linegroup)) +
geom_point(aes(shape = as.character(Collar)), size = 6, stroke = 0,
position = myjit)+
geom_line(aes(group = linegroup),linetype = "dotted",size=1, position = myjit) +
theme(axis.text=element_text(size=15),
axis.title=element_text(size=20)) +
geom_errorbar(aes(ymin = Lower, ymax = Upper), width=0.3, size=1,
position = myjit) + scale_shape_manual(values=c("41361´"=19,"41365´"=17)) + scale_size_manual(values=c(2,2)) +
scale_color_manual(values = c("SNP" = "black",
"LGCA" = "black")) + labs(shape="Collar ID") + ylim(0.05, 0.4)
However, the size of the points doesn't change regardless of the number entered. I've seen other internet posts implementing this function in the same way, so I wondered if somebody could set me on the right track?
Thanks in advance!
P.S. My data:
> dput(dfnew)
structure(list(Proportion = c(0.181, 0.289, 0.099, 0.224), Lower = c(0.148,
0.242, 0.096, 0.217), Upper = c(0.219, 0.341, 0.104, 0.232),
Area = c("LGCA", "SNP", "LGCA", "SNP"), Collar = c("41361´",
"41361´", "41365´", "41365´"), ymin = c(0.033, 0.047, 0.003,
0.00700000000000001), ymax = c(0.4, 0.63, 0.203, 0.456),
linegroup = c("LGCA 41361´", "SNP 41361´", "LGCA 41365´",
"SNP 41365´")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-4L))
You're setting the size of your points in the geom_point layer by using the argument size.
geom_point(aes(shape = as.character(Collar)), size = 6, stroke = 0,
position = myjit)+
The only way to decrease the size of the points in your case is by decreasing size, e.g. to 4.
geom_point(aes(shape = as.character(Collar)), size = 4, stroke = 0,
position = myjit)+
You can only use scale_size_manual if the size is mapped within aes using a variable within data, like in the example below.
ggplot(data = dfnew, aes(x = Area, y = Proportion, group = linegroup, size = Area))
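To make the difference concrete, here is a small sketch in which size is mapped inside aes() so that scale_size_manual takes effect. It uses a simplified copy of the question's data (collar IDs written without the trailing accent, error bars and jitter left out), and the sizes 2 and 4 are arbitrary values chosen for the example.

```r
library(ggplot2)

# Simplified version of the question's data (collar IDs without the accent)
dfnew <- data.frame(
  Area = c("LGCA", "SNP", "LGCA", "SNP"),
  Proportion = c(0.181, 0.289, 0.099, 0.224),
  Collar = c("41361", "41361", "41365", "41365")
)

# size is mapped inside aes(), so scale_size_manual() now controls it
p <- ggplot(dfnew, aes(x = Area, y = Proportion)) +
  geom_point(aes(shape = Collar, size = Collar)) +
  scale_shape_manual(values = c("41361" = 19, "41365" = 17)) +
  scale_size_manual(values = c("41361" = 2, "41365" = 4))

# Inspect the sizes that ggplot2 computed for the point layer
point_sizes <- layer_data(p, 1)$size
print(sort(unique(point_sizes)))
```

If all points should simply be the same, smaller size, setting the constant size argument in geom_point (as described above) remains the simpler route; the mapped version is only needed when size should vary with a variable.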

Repel geom label and text in ggplot. And ordering geom points based on size

I have 2 data frames such as these:
df1 <- data.frame(
party = c("Blue Party", "Red Party"),
dim1 = c(0.03, -0.04),
dim2 = c(-0.05, 0.02),
sz = c(34, 42)
)
df2 <- data.frame(
var = c("Economic", "Gov trust", "Inst trust", "Nationalism", "Religiosity"),
dim1 = c(0.1, -0.5, 0, 0.6, 0.4),
dim2 = c(0.1, 0.6, 0, 0, 0.3)
)
I want to plot the parties from df1 as points defined by size and include arrows based on df2 on the same graph. I've used ggplot to do this:
ggplot(df1, aes(x = dim1, y = dim2, color = party)) +
geom_point(size = df1$sz) +
scale_size_area() +
scale_x_continuous(limits = c(-1.5, 1.5)) +
scale_y_continuous(limits = c(-1.5, 1.5)) +
geom_label_repel(aes(label = party),
box.padding = 1,
point.padding = 1.5,
force = 1) +
geom_segment(aes(xend=0, yend=0, x=dim1, y=dim2), data=df2,
arrow=arrow(length=unit(0.20,"cm"), ends="first", type = "closed"), color="black") +
geom_text_repel(aes(x=dim1, y=dim2, label=var),
data = df2, color = "black", size = 3, force = 1)
Resulting in this:
The functions geom_label_repel and geom_text_repel prevent the party labels and the texts from overlapping, but how can I repel the labels and texts from each other?
My second problem is that I want to order the points, with the smallest in the front and the largest at the back. How could this be done?
Appreciate the help!

How to avoid overlapping plots in ggplot2

I want to plot estimates for three age groups (agecat) by two exposures (expo). The code below produces overlapping points with the age groups rearranged alphabetically. How can I avoid the overlap and keep the existing order of the age groups?
I used this code:
ggplot(mydf, aes(x = agecat, y = est,ymin = lcl, ymax = ucl, group=agecat,color=agecat,shape=agecat)) +
geom_point(position="dodge",size = 4) +
geom_linerange(position="dodge",size =0.7) +
geom_hline(aes(yintercept = 0)) +
labs(colour="Age Group", shape="Age Group") + theme(axis.title=element_text(face="bold",size="12"),axis.text=element_text(size=12,face="bold"))
Sample data:
> dput(mydf)
structure(list(expo = c(0, 1, 0, 1, 0, 1), est = c(0.290780632898979,
0.208093573361601, 0.140524761247529, 0.156713614649751, 0.444402395010579,
0.711469870845916), lcl = c(0.0679784035303221, -0.00413163014975071,
-0.208866152400888, -0.175393089838871, -0.227660022186016, 0.0755871550441212
), ucl = c(0.514078933380535, 0.420769190852455, 0.491138970050864,
0.489925205664665, 1.12099179726843, 1.35139300089608), agecat = c("young",
"young", "middle", "middle", "old", "old")), .Names = c("expo",
"est", "lcl", "ucl", "agecat"), row.names = c(2L, 4L, 6L, 8L,
10L, 12L), class = "data.frame")
I would do this by using expo as a variable in the plot. This would let ggplot know that you have overlap and so you need dodging at each level of your x variable. Once you do this, you can use position = position_dodge() directly in the two geoms and set the width argument to whatever you'd like. See the help page for position_dodge for examples of when you need to set width explicitly.
Here I'll replace group = agecat with group = expo. Using group instead of an aesthetic like shape means that there is no indication which point represents which expo level on the graphic.
mydf$agecat = factor(mydf$agecat, levels = c("young", "middle", "old"))
ggplot(mydf, aes(x = agecat, y = est, ymin = lcl, ymax = ucl, group = expo, color = agecat, shape = agecat)) +
geom_point(position = position_dodge(width = .5), size = 4) +
geom_linerange(position = position_dodge(width = .5), size = 0.7) +
geom_hline(aes(yintercept = 0)) +
labs(colour="Age Group", shape="Age Group") +
theme(axis.title = element_text(face="bold", size="12"),
axis.text = element_text(size=12, face="bold"))
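The factor() call at the top of the code above is what restores the age-group order. The base R sketch below shows the default alphabetical ordering that caused the rearrangement, the explicit levels used in the answer, and a variant that keeps the order of first appearance in the data. The variable name in_data_order is just illustrative.

```r
agecat <- c("young", "young", "middle", "middle", "old", "old")

# Default: levels are sorted alphabetically, which is what reorders the axis
print(levels(factor(agecat)))

# Explicit levels, as in the answer above
print(levels(factor(agecat, levels = c("young", "middle", "old"))))

# Alternative: keep the order of first appearance in the data
in_data_order <- factor(agecat, levels = unique(agecat))
print(levels(in_data_order))
```

The unique() variant can be handy when the data are already sorted the way the axis should read, since the levels then do not need to be typed out by hand.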
You can convert the column agecat to factor with the levels in the desired order. Then, as Heroka pointed out in the comments, we can achieve a similar effect using facet_wrap:
mydf$agecat <- factor(mydf$agecat, levels=c("young", "middle", "old"))
ggplot(mydf, aes(x = agecat, y = est, ymin = lcl, ymax = ucl, group=agecat,color=agecat, shape=agecat)) +
geom_linerange(size =0.7) +
geom_hline(aes(yintercept = 0)) + labs(colour="Age Group", shape="Age Group") +
facet_wrap(agecat~est, scales="free_x", ncol=6) + geom_point(size = 4)+ theme(axis.title=element_text(face="bold",size="12"),axis.text=element_text(size=12,face="bold"),strip.text.x = element_blank())
