Re-create graph design - r

[Update]: I found a great graph design for bar charts (it's from 538) that I'd like to recreate in R, but I'm having difficulty with some of the major elements. Below is a picture of the graph and my progress so far.
Here's the graph I'm trying to recreate
Here's my code:
library(ggplot2)
library(reshape2)  # provides melt()
convicted <- c(0.68, 0.33)
incarcerated <- c(0.48, 0.12)
group <- c("GENERAL POPULATION", "LAW ENFORCEMENT")
df <- data.frame(convicted, incarcerated, group)
mdf <- melt(df, id.vars = "group")
ggplot(mdf) +
  geom_bar(aes(x=variable, y=1), stat="identity", alpha=.1, position=position_dodge(1)) +
  geom_bar(aes(x=variable, y=value, fill=group), stat="identity", position=position_dodge(1)) +
  scale_fill_manual(values=c("#058cd3", "#ff2700"))
Here's what I'm not sure how to do:
- Get the "group" label to sit on top of each group and separate them *(key design element)
- Create a title and gray subheader
- Get the gray bars to separate the same distance as the colored bars
- Get the value labels to dodge with the bars
I will add that in my ideal recreation, the colors would be separated (so incarcerated would be the same color in both groups).
Would love help re-creating this chart as precisely as possible. I'm pretty sure this was created in R, so I know it can be done. Thanks for the help!
[Update]: thanks to the help of hfty I'm getting very close, but I get a weird border effect, which I couldn't upload to the comment section, so I've added it here. What's going on with this?

This was most likely not created solely with R. If it was, it probably was subsequently edited in Illustrator or something similar. However, here are some ways ggplot2 can get you close to the desired result:
Get the "group" label to sit on top of each group and separate them *(key design element)
Using a combination of facet_wrap() to separate the plots and coord_flip() to flip it should get you there.
ggplot(mdf, aes(x=variable, y=value, fill=group)) +
facet_wrap(~group, ncol=1) +
geom_bar(stat="identity", position=position_dodge(1)) +
coord_flip() + ...
Create a title and gray subheader
No easy way to do this with ggplot. I would suggest editing it later, e.g. with Illustrator. However, you can add a bold title e.g. like this:
... + ggtitle(expression(atop(bold("What Percentage of Criminal Defendants Are\nConvicted and Incarcerated?")))) + ...
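More recent ggplot2 releases (2.2.0 and later) accept a subtitle through labs(), which may remove the need for external editing. Here is a minimal sketch, assuming a simplified two-row data frame. The subtitle wording and the grey shade are placeholders, not taken from the original chart:

```r
library(ggplot2)

# Simplified stand-in for the question's data (general population only)
demo <- data.frame(variable = c("convicted", "incarcerated"),
                   value    = c(0.68, 0.48))

p <- ggplot(demo, aes(x = variable, y = value)) +
  geom_col(fill = "#058cd3") +
  coord_flip() +
  labs(title    = "What Percentage of Criminal Defendants Are\nConvicted and Incarcerated?",
       subtitle = "Placeholder subtitle text") +  # hypothetical wording
  theme(plot.title    = element_text(face = "bold", size = 20),
        plot.subtitle = element_text(colour = "grey50"))  # assumed shade of grey

p
```

The title and subtitle are styled independently through plot.title and plot.subtitle, so the bold headline and the grey subheader do not need plotmath.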
get the color gray bars to separate the same distance as the colored bars
You were almost there:
... + geom_bar(aes(x=variable, y=1), stat="identity", alpha=.1,
position=position_dodge(1), fill = "#aaaaaa") + ...
get the value labels to dodge with bar charts
Putting it all together with a few other tweaks, like using ggthemr to clean up the default style:
# devtools::install_github('cttobin/ggthemr') # Install ggthemr
library(ggthemr)
ggthemr('fresh')
ggplot(mdf, aes(x=variable, y=value, fill=group)) + facet_wrap(~group, ncol=1) +
geom_bar(stat="identity", position=position_dodge(1)) +
geom_bar(aes(x=variable, y=1), stat="identity", alpha=.1, position=position_dodge(1), fill = "#aaaaaa") +
geom_text(aes(label=round(100*value)), hjust=-0.5) +
scale_fill_manual(values=c("#058cd3", "#ff2700")) +
theme(strip.text.x = element_text(hjust=-0.15),
axis.text.x = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank(),
legend.title = element_blank(),
axis.title.y=element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position = "none"
) +
coord_flip() +
ggtitle(expression(atop(bold("What Percentage of Criminal Defendants Are\nConvicted and Incarcerated?")))) +
theme(plot.title = element_text(size = 20, hjust=-0.4, vjust=0.2))

Related

ggplot2 geom_jitterdodge points with overlaid dodged boxplots: I want to preserve color in points but force boxplots to be black

Using ggplot2, I want to plot a jitter-dodged swarm of points with overlaid dodged boxplots. The trick is that I want the boxplots to be black, not colored like the points. The point plot looks like this:
It's easy enough to get boxplots in place:
The code for that looks like this:
D_cohort1 %>%
filter(!is.na(pssa_ela_code)) %>%
ggplot(aes(x=timepoint,
y=dibels_lnf,
color=pssa_ela_code)) +
geom_point(alpha=1/6, size=2,
position=position_jitterdodge(jitter.width=1/3, jitter.height=0)) +
geom_boxplot(fill=NA, outlier.shape=NA,
position=position_dodge2(padding=.3)) +
facet_grid(rows=vars(school_type)) +
guides(colour = guide_legend(override.aes = list(alpha=1))) +
labs(title="Figure A.1: DIBELS LNF Scores at each Timepoint") +
theme_cowplot() +
theme(plot.background=element_rect(fill="aliceblue"),
panel.border=element_rect(color="black", fill=NA),
legend.position = c(.85,.87),
legend.text = element_text(size = rel(.7)))
For visibility's sake, I want the boxplot lines to be black, but I can't quite figure out how to get there. The closest I've come is this (same as before except for the call to geom_boxplot()):
D_cohort1 %>%
filter(!is.na(pssa_ela_code)) %>%
ggplot(aes(x=timepoint,
y=dibels_lnf,
color=pssa_ela_code)) +
geom_point(alpha=1/6, size=2,
position=position_jitterdodge(jitter.width=1/3, jitter.height=0)) +
geom_boxplot(aes(color=NULL, group=fct_cross(timepoint, pssa_ela_code)),
fill=NA, outlier.shape=NA,
position=position_dodge2(padding=.3)) +
facet_grid(rows=vars(school_type)) +
guides(colour = guide_legend(override.aes = list(alpha=1))) +
labs(title="Figure A.1: DIBELS LNF Scores at each Timepoint") +
theme_cowplot() +
theme(plot.background=element_rect(fill="aliceblue"),
panel.border=element_rect(color="black", fill=NA),
legend.position = c(.85,.87),
legend.text = element_text(size = rel(.7)))
That gets the color effect I want, but positions the boxplots incorrectly. Shown here:
How can I achieve the effect I want: correctly positioned, black boxplots over colored points?
Ok. I slept on it and was able to come up with a solution this morning. The effect I want is shown below. The code used to get there is this:
D_cohort1 %>%
filter(!is.na(pssa_ela_code)) %>%
ggplot(aes(x=timepoint,
y=dibels_lnf,
color=pssa_ela_code)) +
geom_point(alpha=1/6, size=2,
position=position_jitterdodge(jitter.width=1/3, jitter.height=0)) +
geom_boxplot(aes(color=NULL, fill=pssa_ela_code),
outlier.shape=NA, alpha=0,
position=position_dodge2(padding=.3)) +
facet_grid(rows=vars(school_type)) +
guides(colour = guide_legend(override.aes = list(alpha=1))) +
labs(title="Figure A.1: DIBELS LNF Scores at each Timepoint") +
theme_cowplot() +
theme(plot.background=element_rect(fill="aliceblue"),
panel.border=element_rect(color="black", fill=NA),
legend.position = c(.85,.87),
legend.text = element_text(size = rel(.7)))
It's the same as before except for the call to geom_boxplot(). The fix was to override the color aesthetic and map fill instead: the fill mapping keeps one box per group, so the boxes still dodge correctly, and alpha=0 makes the fill fully transparent, which is what I want.
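The same idea can be tried on a built-in dataset. The sketch below does not use the asker's data: it uses ggplot2's mpg data, with drv and year standing in for timepoint and pssa_ela_code. The fixed colour = "black" is an addition here (the code above relies on the default outline colour instead), and the jitter width is an arbitrary choice:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = drv, y = hwy, colour = factor(year))) +
  # Coloured points, jittered within each dodged group
  geom_point(alpha = 1/3, size = 2,
             position = position_jitterdodge(jitter.width = 0.2,
                                             jitter.height = 0)) +
  # fill keeps one box per drv/year group; alpha = 0 hides the fill itself;
  # the fixed colour replaces the inherited colour mapping for this layer
  geom_boxplot(aes(fill = factor(year)),
               colour = "black", alpha = 0, outlier.shape = NA,
               show.legend = FALSE,
               position = position_dodge2(padding = 0.3))

p
```

Setting colour as a fixed parameter without the fill mapping would leave the layer with no grouping variable, so the boxes would collapse to one per x value.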

changing axis size in ggplot

I'm working on a plot where I would like to change the axis thickness to match the border of the facet labels. Somehow axis.line = element_line(color="black", size=0.5) doesn't work. Any ideas why?
This is my code...
ggplot(datgg_final, aes(y = total_GLS, x = timing)) +
geom_boxplot(aes(fill = genotype)) +
facet_grid(cols=vars(genotype)) +
theme(legend.position = "none") +
scale_fill_manual(values=c("#0496FF", "#53A548")) +
ggtitle("Effect of Timing") +
xlab("Days since Defence Induction") +
ylab("Total Glucosinolates (µmol g^-1 DW)") +
theme(strip.background = element_rect(color = "black", fill ="white", size=0.5, linetype="solid"),
axis.line = element_line(color="black", size=0.5))
... and the plot:
Even in most basic plots I cannot change any axis settings (except the linetype), this code just shows the normal boxplot, no red axes, no change in line size:
ggplot(datgg_final, aes(y=total_GLS, x=timing)) +
geom_boxplot() +
theme(axis.line=element_line(size=0.5, color="red"))
Fortunately, this seems to be a simple clipping issue. Unfortunately, it can't be addressed with the normal ggplot interface (as far as I know), but you can modify the gtable to produce the plot you want.
Consider the following plot:
library(ggplot2)
library(grid)
g <- ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
geom_point() +
facet_grid(~ Species) +
theme(strip.background.x = element_rect(colour = "black", fill = "white",
size = 0.5, linetype = "solid"),
axis.line = element_line(colour = "black", size = 0.5))
g
You can see that the apparent linewidths of the facet strips and the axes are unequal. We can turn off the clipping by editing the gtable:
# Convert plot to gtable
gt <- ggplotGrob(g)
# Find the strips
is_strip <- grep("strip", gt$layout$name)
# Turn off clipping at highest level
gt$layout$clip[is_strip] <- "off"
# Turn off clipping at the strip level
gt$grobs[is_strip] <- lapply(gt$grobs[is_strip], function(strip) {
strip$layout$clip <- "off"
strip
})
# Plot
grid.newpage(); grid.draw(gt)
Now the apparent linewidths match the intended linewidths, but it took quite a few extra steps to get there. If somebody has a more elegant solution, feel free to post an alternative.
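If the same fix is needed for several plots, the steps above can be wrapped in a small function. This is only a sketch: unclip_strips() is a hypothetical helper name, not something ggplot2 provides, and it assumes the strips appear in the gtable layout under names containing "strip", as in the example above:

```r
library(ggplot2)
library(grid)

# Hypothetical helper: convert a plot to a gtable and turn off strip clipping
unclip_strips <- function(p) {
  gt <- ggplotGrob(p)
  is_strip <- grep("strip", gt$layout$name)
  # Turn off clipping at the top level
  gt$layout$clip[is_strip] <- "off"
  # Turn off clipping inside each strip that is itself a non-empty gtable
  gt$grobs[is_strip] <- lapply(gt$grobs[is_strip], function(strip) {
    if (inherits(strip, "gtable") && nrow(strip$layout) > 0) {
      strip$layout$clip <- "off"
    }
    strip
  })
  gt
}

g <- ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
  geom_point() +
  facet_grid(~ Species)

gt <- unclip_strips(g)
grid.newpage(); grid.draw(gt)
```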

Displaying multiple factors with Sina plots

NOTE: I have updated this post following discussion with Z. Lin. Originally, I had simplified my problem to a two factor design (see section "Original question"). However, my actual data consists of four factors, requiring facet_grid. I am therefore providing an example for a four factor design further below (see section "Edit").
Original question
Let's assume I have a two factor design with dv as my dependent variable and iv.x and iv.y as my factors/independent variables. Some quick sample data:
DF <- data.frame(dv = rnorm(900),
iv.x = sort(rep(letters[1:3], 300)),
iv.y = rep(sort(rep(rev(letters)[1:3], 100)), 3))
My goal is to display each condition separately as can nicely be done with violin plots:
ggplot(DF, aes(iv.x, dv, colour=iv.y)) + geom_violin()
I have recently come across Sina plots and would like to do the same here. Unfortunately Sina plots don't do this, collapsing the data instead.
ggplot(DF, aes(iv.x, dv, colour=iv.y)) + geom_sina()
An explicit call to position dodge doesn't help either, as this produces an error message:
ggplot(DF, aes(iv.x, dv, colour=iv.y)) + geom_sina(position = position_dodge(width = 0.5))
The authors of Sina plots have already been made aware of this issue in 2016:
https://github.com/thomasp85/ggforce/issues/47
My problem is more in terms of time. We soon want to submit a manuscript and Sina plots would be a great way to display our data. Can anyone think of a workaround for Sina plots such that I can still display two factors as in the example with violin plots above?
Edit
Sample data for a four factor design:
DF <- data.frame(dv=rnorm(400),
iv.w=sort(rep(letters[1:2],200)),
iv.x=rep(sort(rep(letters[3:4],100)), 2),
iv.y=rep(sort(rep(rev(letters)[1:2],50)),4),
iv.z=rep(sort(rep(letters[5:6],25)),8))
An example with violin plots of what I would like to create using Sina plots:
ggplot(DF, aes(iv.x, dv, colour=iv.y)) +
facet_grid(iv.w ~ iv.z) +
geom_violin(aes(y = dv, fill = iv.y),
position = position_dodge(width = 1))+
stat_summary(aes(y = dv, fill = iv.y), fun.y=mean, geom="point",
colour="black", show.legend = FALSE, size=.2,
position=position_dodge(width=1))+
stat_summary(aes(y = dv, fill = iv.y), fun.data=mean_cl_normal, geom="errorbar",
position=position_dodge(width=1), width=.2, show.legend = FALSE,
colour="black", size=.2)
Edited solution, since OP clarified that facets are required:
ggplot(DF, aes(x = interaction(iv.y, iv.x),
y = dv, fill = iv.y, colour = iv.y)) +
facet_grid(iv.w ~ iv.z) +
geom_sina() +
stat_summary(fun.y=mean, geom="point",
colour="black", show.legend = FALSE, size=.2,
position=position_dodge(width=1))+
stat_summary(fun.data=mean_cl_normal, geom="errorbar",
position=position_dodge(width=1), width=.2,
show.legend = FALSE,
colour="black", size=.2) +
scale_x_discrete(name = "iv.x",
labels = c("c", "", "d", "")) +
theme(panel.grid.major.x = element_blank(),
axis.text.x = element_text(hjust = -4),
axis.ticks.x = element_blank())
Instead of using facets to simulate dodging between colours, this approach creates a new variable interaction(colour.variable, x.variable) to be mapped to the x-axis.
The rest of the code in scale_x_discrete() and theme() is there to hide the default x-axis labels / ticks / grid lines.
axis.text.x = element_text(hjust = -4) is a hack that shifts x-axis labels to approximately the right position. It's ugly, but considering the use case is for a manuscript submission, I assume the size of plots will be fixed, and you just need to tweak it once.
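To see why the labels in scale_x_discrete() are given as c("c", "", "d", ""), it may help to look at the level order that interaction() produces. A small sketch with toy vectors that mirror the levels of iv.x and iv.y in the four-factor example (the vectors themselves are made up for illustration):

```r
# Toy vectors with the same levels as iv.x and iv.y in the example data
iv.x <- rep(c("c", "d"), each = 2)
iv.y <- rep(c("y", "z"), times = 2)

combined <- interaction(iv.y, iv.x)

# By default the first factor varies fastest, so the two iv.y levels
# sit side by side within each iv.x level
levels(combined)
# "y.c" "z.c" "y.d" "z.d"
```

Positions 1 and 3 on the x-axis therefore start the "c" and "d" groups, which is where the non-empty labels go.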
Original solution:
Assuming your plots don't otherwise require facetting, you can simulate the appearance with facets:
ggplot(DF, aes(x = iv.y, y = dv, colour = iv.y)) +
geom_sina() +
facet_grid(~iv.x, switch = "x") +
labs(x = "iv.x") +
theme(axis.text.x = element_blank(), # hide iv.y labels
axis.ticks.x = element_blank(), # hide iv.y ticks
strip.background = element_blank(), # make facet strip background transparent
panel.spacing.x = unit(0, "mm")) # remove horizontal space between facets

How to make box plots within the same column to represent the soil column

I am trying to demonstrate the soil type (soil column) at different depths in the ground using box plots. However, as the sampling interval is not consistent, there are also gaps in between the samples.
My questions are as follows:
Is it possible to put the box plots within the same column? i.e. all box plots in 1 straight column
Is it possible to remove the x-axis labels and ticks when using ggdraw? I tried to remove it when using plot, but appears again when I use ggdraw.
My code looks like this:
SampleID <- c("Rep-1", "Rep-2", "Rep-3", "Rep-4")
From <- c(0,2,4,9)
To <- c(1,4,8,10)
Mid <- (From+To)/2
ImaginaryVal <- c(1,1,1,1)
Soiltype <- c("organic", "silt","clay", "sand")
df <- data.frame(SampleID, From, To, Mid, ImaginaryVal, Soiltype)
plot <- ggplot(df, aes(x=ImaginaryVal, ymin=From, lower=From, fill=Soiltype,
middle=Mid, upper=To, ymax=To)) +
geom_boxplot(colour="black", stat="identity") + scale_y_reverse(breaks = seq(0,10,0.5)) +
xlab('Soiltype') + ylab('Depth (m)') +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
ggdraw(switch_axis_position(plot + theme_bw(8), axis = 'x'))
In the image I have pointed out what I want, using the red arrows and lines.
You can use position = position_dodge() like so:
plot <- ggplot(df, aes(x=ImaginaryVal, ymin=From, lower=From,fill=Soiltype, middle=Mid, upper=To, ymax=To)) +
geom_boxplot(colour= "black", stat="identity", position = position_dodge(width=0)) +
scale_y_reverse(breaks = seq(0,10,0.5)) +
xlab('Soiltype') +
ylab('Depth (m)') +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
edit: I don't think you need cowplot at all, if this is what you want your plot to look like:
ggplot(df, aes(x=ImaginaryVal, ymin=From, lower=From,fill=Soiltype, middle=Mid, upper=To, ymax=To)) +
geom_boxplot(colour= "black", stat="identity", position = position_dodge(width=0)) +
scale_y_reverse(breaks = seq(0,10,0.5)) +
xlab('Soiltype') +
ylab('Depth (m)') +
theme_bw() +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
xlab("") +
ggtitle("Soiltype")

Truncate highest bar in ggplot

Please consider the following:
library(ggplot2)
data <- data.frame(qnt=c(10,20,22,12,14,9,1000),lbl=c("A","B","C","D","E","F","G"))
ggplot(data=data, aes(x=lbl, y=qnt)) + geom_histogram(stat="identity")
which produces
Which options should I consider to truncate the highest bar G in the plot? (of course explaining to the viewer what I did)
If you want to fiddle with it, you can use the gridExtra package and plot two (or more) trimmed-out sections of the graph. I've tinkered with the margins to make them line up, but a better plan would probably be to format the axis labels to the same text width:
require(ggplot2)
require(gridExtra)
data <- data.frame(qnt=c(10,20,22,12,14,9,1000),lbl=c("A","B","C","D","E","F","G"))
g1<-ggplot(data=data, aes(x=lbl, y=qnt)) +
geom_histogram(stat="identity")+
coord_cartesian(ylim=c(-10,50)) +
labs(x=NULL, y=NULL)+
theme(plot.margin=unit(c(2,2,6,3),"mm"))
g2<-ggplot(data=data, aes(x=lbl, y=qnt)) +
geom_histogram(stat="identity") +
coord_cartesian(ylim=c(990,1010)) +
theme(axis.text.x = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.ticks.x = element_blank()) +
labs(x=NULL, y=NULL) +
theme(plot.margin=unit(c(5,2,0,0),"mm"))
grid.arrange(g2,g1, heights=c(1/4, 3/4), ncol=1, nrow=2)
You can use coord_cartesian() and change the limits for the y axis: coord_cartesian() will "zoom" the plot to the limits you provide. I also used geom_bar(), as you are plotting factors on the x axis.
ggplot(data=data, aes(x=lbl, y=qnt)) + geom_bar(stat="identity")+
coord_cartesian(ylim=c(0,100))
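Since the question asks for the truncation to be explained to the viewer, one option is to label the cut-off bar with its real value. A minimal sketch, assuming the same data as above; the label text, its position, and its colour are arbitrary choices:

```r
library(ggplot2)

data <- data.frame(qnt = c(10, 20, 22, 12, 14, 9, 1000),
                   lbl = c("A", "B", "C", "D", "E", "F", "G"))

p <- ggplot(data, aes(x = lbl, y = qnt)) +
  geom_col() +
  # Zoom only: the data behind bar G are kept, the bar is just cut off at 100
  coord_cartesian(ylim = c(0, 100)) +
  # Tell the viewer what the truncated bar really represents
  annotate("text", x = "G", y = 95, label = "1000\n(truncated)",
           colour = "white", size = 3)

p
```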
Another possibility is to use a logarithmic scale for the y values.
ggplot(data=data, aes(x=lbl, y=qnt)) + geom_bar(stat="identity")+
scale_y_log10()

Resources