How to add a minor grid in boxplots in ggplot2? - r

I want to add vertical minor gridlines between the major gridlines of a ggplot2 boxplot that has a discrete x variable.
Here is an example:
library(ggplot2)
boxplot <- ggplot(data = mtcars, aes(x = as.factor(cyl), y = wt, fill = as.factor(am))) +
  geom_boxplot()
boxplot
As the plot shows, it can be unclear which box belongs to which x label, because each major vertical gridline runs through the middle of its category, between the two dodged boxes. That is not a real problem here, but it becomes one when there are many x categories and narrow boxes. So I would like to add a minor gridline halfway between each pair of major gridlines. I tried panel.grid.minor.x, as shown below, but no lines appeared:
boxplot + theme(panel.grid.minor.x = element_line(color="black"))
I've looked through related posts on setting gridlines, but they focus on continuous x variables and don't seem to apply to boxplots with a discrete x axis.
Thank you in advance.

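Update, thanks to @Allan Cameron:
ggplot(data = mtcars, aes(x = factor(cyl), y = wt, fill = as.factor(am))) +
  geom_boxplot() +
  # one dashed line halfway between each pair of neighbouring categories;
  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_vline(xintercept = c(1.5, 2.5), linetype = "dashed", colour = "green", linewidth = 1)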
First answer:
Are you looking for something like this?
library(ggplot2)
ggplot(data = mtcars, aes(x = factor(cyl), y = wt, fill = as.factor(am))) +
  geom_boxplot() +
  # constant positions are plain parameters, so they do not need aes()
  geom_vline(xintercept = 1.5, linetype = "dashed", colour = "green", linewidth = 1) +
  geom_vline(xintercept = 2.5, linetype = "dashed", colour = "green", linewidth = 1)
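Why panel.grid.minor.x draws nothing in the question: a discrete scale has no minor breaks. Its categories are drawn at positions 1, 2, ..., n, so the boundaries between them sit at k + 0.5. The sketch below computes those positions from the number of levels instead of hard-coding 1.5 and 2.5; the colour and line width are arbitrary choices, and linewidth assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Discrete x positions are drawn at 1, 2, ..., n, so the gaps between
# neighbouring categories sit at 1.5, 2.5, ..., n - 0.5.
n_levels <- nlevels(factor(mtcars$cyl))
separators <- seq_len(n_levels - 1) + 0.5

p_sep <- ggplot(mtcars, aes(x = factor(cyl), y = wt, fill = factor(am))) +
  geom_boxplot() +
  geom_vline(xintercept = separators, linetype = "dashed",
             colour = "grey40", linewidth = 0.5)
p_sep
```

Because separators is derived from nlevels(), the same code keeps working when the number of x categories changes.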

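Alternatively, you can restyle the existing gridlines and tune the theme values until you find what works for you. Note that panel.grid.minor.x has no visible effect on a discrete x axis, because discrete scales have no minor breaks: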
boxplot +
  theme(
    panel.grid.major.y = element_line(color = "blue",
                                      linewidth = 0.5,
                                      linetype = 2),
    panel.grid.minor.y = element_line(color = "red",
                                      linewidth = 0.25,
                                      linetype = 1),
    panel.grid.major.x = element_line(color = "green",
                                      linewidth = 0.5,
                                      linetype = 3)
  )
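If you want the separators to be real minor gridlines controlled by the theme, one workaround (not from the original answers) is to map the categories to numeric positions so the x scale becomes continuous; continuous scales accept minor_breaks, so panel.grid.minor.x is drawn. The group aesthetic keeps one box per cyl/am combination. A sketch, assuming the default boxplot dodging behaves as it does on the discrete axis:

```r
library(ggplot2)

cyl_levels <- levels(factor(mtcars$cyl))

# Map the categories to numeric positions 1..n so the x scale is continuous;
# continuous scales support minor_breaks, so panel.grid.minor.x is drawn.
p_minor <- ggplot(mtcars,
                  aes(x = as.numeric(factor(cyl)), y = wt,
                      fill = factor(am),
                      # one box per cyl/am combination
                      group = interaction(cyl, am))) +
  geom_boxplot() +
  scale_x_continuous(
    name = "cyl",
    breaks = seq_along(cyl_levels),                       # majors at the categories
    labels = cyl_levels,
    minor_breaks = seq_len(length(cyl_levels) - 1) + 0.5  # minors between them
  ) +
  theme(panel.grid.minor.x = element_line(colour = "black"))
p_minor
```

The trade-off is that the axis labels now come from the labels argument rather than from the factor itself, so they have to be kept in sync with the data.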

Related

How to change colour of histograms with facet_grid and the position of the text?

How can I change the colour of the two histograms made with facet_grid?
I would like the histogram for "Active" to be green and the one for "Failed" red.
Also, how can I move the text label of the line a bit lower?
Here is my code:
library(dplyr)  # for %>%
library(ggplot2)
df %>%
  ggplot(aes(x = `Banks/turnover%Last avail. yr`)) +
  geom_histogram(
    # bins is evaluated outside aes(), so the column must be taken from df
    bins = nclass.Sturges(df$`Banks/turnover%Last avail. yr`), colour = "black") +
  geom_vline(data = status_means, aes(xintercept = avg), color = "red", linetype = "dashed") +
  geom_text(data = status_means, aes(x = avg, label = avg), y = Inf) +
  theme(legend.position = "none") +
  facet_grid(~general_status)
You need to map a factor to the fill aesthetic. Here I use active-failed as a placeholder, because I'm not sure how that distinction is encoded in your data; since you facet by general_status, that column is probably the one to use.
ggplot(df_all,
aes(x = `Banks/turnover%Last avail. yr`, fill = as.factor(active-failed)))
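Then add a scale to define the colours. Naming the values (assuming the levels are "Active" and "Failed") ties each colour to its level rather than to the order of the levels:
scale_fill_manual(values = c(Active = "green", Failed = "red"))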
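As @teunbrand said, you can use vjust to move the text down so that it is not cut off. With y = Inf, vjust = 1 aligns the top of the label with the top of the panel, and larger values such as 1.5 push it further down.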
Remember to include a reproducible example; sharing some of your data via dput(head(df)) helps a lot.
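Putting the pieces of this answer together, here is a self-contained sketch. The data are simulated, and the column name turnover is a hypothetical stand-in for the question's `Banks/turnover%Last avail. yr`; fill is mapped to general_status, which is an assumption about how the two groups are encoded.

```r
library(ggplot2)

# Hypothetical stand-in data; the real column names in the question may differ.
set.seed(1)
df <- data.frame(
  turnover = c(rnorm(200, mean = 5), rnorm(200, mean = 3)),
  general_status = rep(c("Active", "Failed"), each = 200)
)
# Per-panel means, used for the dashed line and its label
status_means <- aggregate(turnover ~ general_status, data = df, FUN = mean)
names(status_means)[2] <- "avg"

p_hist <- ggplot(df, aes(x = turnover, fill = general_status)) +
  geom_histogram(bins = nclass.Sturges(df$turnover), colour = "black") +
  geom_vline(data = status_means, aes(xintercept = avg),
             colour = "red", linetype = "dashed") +
  # y = Inf pins the label to the top edge; vjust > 1 moves it down into the panel
  geom_text(data = status_means, aes(x = avg, label = round(avg, 2)),
            y = Inf, vjust = 1.5) +
  scale_fill_manual(values = c(Active = "green", Failed = "red")) +
  theme(legend.position = "none") +
  facet_grid(~general_status)
p_hist
```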

ggplot2 dodged boxplot with geom_point dodging and unequal number of subgroups

I am trying to plot a dodged boxplot, but I run into a couple of difficulties. The x-axis has two levels of grouping. The letter groups (A, B, C, etc.) are the main groups, which I map to the x aesthetic (X_main_group). Within each main group there are subgroups (X_group), and the boxes are coloured by subgroup. The difficulty is that the number of subgroups differs between letter groups: for x = A there are 4 subgroups, but for x = B only one.
This causes two problems. First, the dodged points no longer line up with the dodged boxplots (see the example plot below). Second, the boxes are no longer centred on their x-axis tick; this is most obvious for x = B. How can I fix this?
I would also like small x-axis ticks below each subgroup (4 ticks for x = A, 1 for x = B, 3 for x = C, etc.), but this has lower priority. In the attached figure I drew in red some examples of the tick marks I want. The ggplot2 code is below. I would like to provide reproducible code, but I cannot work out how to create a data frame with unequal numbers of subgroups that others can run; I can only make "symmetrical" data frames.
cbpalette <- c("#999999", "#666666", "#333333", "#000000", "#003300")
p1 <- ggplot(data = df, aes(x = X_main_group, y = Intensity, colour = factor(X_group))) +
  stat_boxplot(geom = "errorbar", width = 0.4,
               position = position_dodge(0.5, preserve = "single")) +
  geom_boxplot(width = 0.5, outlier.shape = NA,
               position = position_dodge(preserve = "single")) +
  theme_classic() +
  geom_point(position = position_jitterdodge(), alpha = 0.3)
p2 <- p1 +
  scale_colour_manual(values = cbpalette) +
  theme(legend.position = "none",
        axis.ticks.length = unit(-0.1, "cm"),
        axis.text.x = element_text(size = 30, vjust = -0.4),
        axis.text.y = element_text(size = 35, hjust = 0.5, angle = 45),
        axis.title = element_blank())
p3 <- p2 +
  theme(axis.text.x = element_text(margin = margin(t = 0.5, unit = "cm")),
        axis.text.y = element_text(margin = margin(r = 0.5, unit = "cm")))
p3
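A reproducible data frame with unequal subgroup counts can be built by generating each main group separately and binding the pieces together. The sketch below uses made-up counts (4, 1 and 3 subgroups, 10 observations each) and random intensities. It also tries position_dodge2(preserve = "single"), which in ggplot2 keeps box widths equal while centring each main group's boxes on its tick; it does not address aligning the jittered points.

```r
library(ggplot2)

set.seed(42)
# Made-up design: 4, 1 and 3 subgroups per main group, 10 observations each
n_sub <- c(A = 4, B = 1, C = 3)
df <- do.call(rbind, lapply(names(n_sub), function(g) {
  data.frame(
    X_main_group = g,
    X_group = rep(paste0(g, seq_len(n_sub[[g]])), each = 10),
    Intensity = rnorm(10 * n_sub[[g]], mean = 10)
  )
}))
table(df$X_main_group, df$X_group)  # shows the unbalanced design

# position_dodge2(preserve = "single") keeps every box the same width and
# centres the boxes of each main group on its tick.
p_unbal <- ggplot(df, aes(x = X_main_group, y = Intensity, colour = X_group)) +
  geom_boxplot(width = 0.5, outlier.shape = NA,
               position = position_dodge2(preserve = "single"))
p_unbal
```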

Problem of different x-axis position when using grid.arrange and legend on bottom

I have to arrange two plots with the same axes side by side, which I did with ggplot2 and grid.arrange. For a tidier layout, the legends have to be placed at the bottom. Unfortunately, the left plot sometimes has more legend entries than the right one and therefore needs a second legend row, which puts the two x-axes at different heights. Not only does this look untidy, it also defeats the purpose of comparing the plots side by side.
Can anybody help?
library(ggplot2)
library(grid)       # textGrob(), gpar()
library(gridExtra)  # grid.arrange()
plot_Volume_left <- some_ggplot2_fct(variable, left) +
  theme(legend.position = "bottom",
        legend.background = element_rect(linewidth = 0.5, linetype = "solid", colour = "black"))
plot_Volume_right <- some_ggplot2_fct(variable, right, f) +
  theme(legend.position = "bottom",
        legend.background = element_rect(linewidth = 0.5, linetype = "solid", colour = "black"))
# use the same y limits on both plots so they are easier to compare
upper_lim <- max(plot_Volume_right$data$value, plot_Volume_left$data$value)
lower_lim <- min(plot_Volume_right$data$value, plot_Volume_left$data$value)
plot_Volume_left <- plot_Volume_left + ylim(c(lower_lim, upper_lim))
plot_Volume_right <- plot_Volume_right + ylim(c(lower_lim, upper_lim))
# arrange the plots in a grid
grid.arrange(plot_Volume_left, plot_Volume_right,
             ncol = 2,
             top = textGrob(strTitle,
                            gp = gpar(fontfamily = "Raleway", fontsize = 15, font = 2)))
In the picture you can see the result:
Do you know an easy way to solve this without changing too much code? (The underlying framework is quite large.)
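One option that keeps the existing grid.arrange() call: convert both plots to gtables with ggplotGrob() and bind them with cbind(..., size = "max") from the gtable package, so that corresponding rows (including the panels) get the same height in both plots. This assumes both plots share the same layout (same legend position, no differing facets); the mpg-based plots below are hypothetical stand-ins for the output of some_ggplot2_fct().

```r
library(ggplot2)
library(grid)       # textGrob(), gpar()
library(gridExtra)  # grid.arrange()

# Stand-in plots whose bottom legends have different numbers of entries
plot_left <- ggplot(mpg, aes(displ, hwy, colour = class)) + geom_point() +
  theme(legend.position = "bottom")
plot_right <- ggplot(mpg, aes(displ, hwy, colour = drv)) + geom_point() +
  theme(legend.position = "bottom")

g_left <- ggplotGrob(plot_left)
g_right <- ggplotGrob(plot_right)
# cbind() on gtables places them side by side; size = "max" gives each row
# the larger of the two heights, so the panels and x-axes share the same rows.
g_both <- cbind(g_left, g_right, size = "max")

grid.arrange(g_both,
             top = textGrob("Title", gp = gpar(fontsize = 15, fontface = "bold")))
```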

Stat summary for each factor in scatter plot ggplot2: What about fun.x, fun.y combinations?

I have data on people touching bacteria, with up to 5 contacts each, and I'm comparing how much they pick up with and without gloves. I'd like to plot the mean for each level of the factor NumberContacts in red, like the red dots in the graphs below.
So far I have:
library(tidyverse)
library(reshape2)
# Make some data
df <- data.frame(Yes = rnorm(n = 100),
                 No = rnorm(n = 100),
                 NumberContacts = factor(rep(1:5, each = 20)))
# Calculate the mean for each group (NumberContacts)
centroids <- aggregate(data = melt(df, id.vars = "NumberContacts"),
                       value ~ NumberContacts + variable, mean)
# Get them into two columns
centYes <- subset(centroids, variable == "Yes", select = c("NumberContacts", "value"))
centNo <- subset(centroids, variable == "No", select = "value")
centroids <- cbind(centYes, centNo)
colnames(centroids) <- c("NumberContacts", "Gloved", "Ungloved")
# Make an ugly plot
ggplot(df, aes(x = Yes, y = No)) +
  geom_point() +
  geom_abline(slope = 1, linetype = 2) +
  stat_ellipse(type = "norm", linetype = 2, level = 0.975) +
  geom_point(data = centroids, aes(x = Gloved, y = Ungloved), size = 5, color = "red") +
  # stat_summary(fun.y = "mean", colour = "red") +  # doesn't work
  facet_wrap(~NumberContacts, nrow = 2) +
  theme_classic()
Is there a more elegant way using stat_summary? Also, how can I change the look of the boxes (facet strips) at the top of each panel?
stat_summary is not an option because, as ?stat_summary puts it:
stat_summary operates on unique x
That is, it can take the mean of y, but x stays fixed. We can still do something very concise, though:
library(ggplot2)
library(dplyr)
ggplot(df, aes(x = Yes, y = No, group = NumberContacts)) +
  geom_point() +
  geom_abline(slope = 1, linetype = 2) +
  stat_ellipse(type = "norm", linetype = 2, level = 0.975) +
  # per-facet means of Yes and No, drawn as large red points
  geom_point(data = df %>% group_by(NumberContacts) %>% summarise_all(mean),
             size = 5, color = "red") +
  facet_wrap(~NumberContacts, nrow = 2) +
  theme_classic() +
  theme(strip.background = element_rect(fill = "black"),
        strip.text = element_text(color = "white"))
This also shows that the boxes above the panels are styled through the strip elements of theme(), such as strip.background and strip.text.
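On dplyr 1.0.0 or later, summarise_all() is superseded by across(). A sketch of the same centroid computation with across(), using data simulated like the question's (the seed is an addition for reproducibility):

```r
library(dplyr)

set.seed(1)
df <- data.frame(Yes = rnorm(100), No = rnorm(100),
                 NumberContacts = factor(rep(1:5, each = 20)))

# across() applies mean() to the listed columns within each group
centroids <- df %>%
  group_by(NumberContacts) %>%
  summarise(across(c(Yes, No), mean))
centroids
```

The result can be passed as the data argument of the red geom_point() layer in place of the summarise_all() call.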

Change number of minor gridlines in ggplot2 (per major ones), r

I read this answer: How to control number of minor grid lines in ggplot2?
However, I couldn't figure out how to reconcile it with my requirements.
I want a way to specify the number of minor gridlines between two major ticks (or, equivalently, the ratio of the minor to the major grid spacing). Say I want to divide each interval into 5 parts (4 minor gridlines). How do I do that?
Since there will be many graphs whose axis limits I won't know in advance, I can't hard-code the size of one minor step. I want to keep whatever algorithm ggplot2 uses to pick the major gridlines and just add 4 minor gridlines inside each major interval.
I'd like the R graph on the right to look like the Excel graph on the left.
Code (in case it helps):
ggtheme <- theme(axis.line = element_line(linewidth = 0.5, colour = "black"),
                 panel.background = element_rect(fill = "white"),
                 panel.grid.major = element_line(colour = "grey90", linewidth = rel(0.5)),
                 panel.grid.minor = element_line(colour = "grey95", linewidth = rel(0.25)))
ggp_sctr2 <- ggplot(sub2_ac_data, aes(x = sub2_ac_data[, i],
                                       y = sub2_ac_data[, rescol],
                                       colour = factor(sub2_ac_data[, topfac[1]]),
                                       shape = factor(sub2_ac_data[, topfac[1]]))) +
  geom_point(size = 2.5) +
  scale_shape_manual(values = symlist[Nmsn_sub1 + 1:20]) +
  scale_colour_manual(values = unname(cols[Nmsn_sub1 + 1:16])) +
  geom_smooth(method = "lm", formula = y ~ splines::bs(x, 3),
              linetype = "solid", linewidth = 0.25, fill = NA)
print(ggp_sctr2 +
        ggtitle(paste(scxnam[1], nomvar, "vs", colnames(sub2_ac_data[i]), i, sep = " ")) +
        theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = 8)) +
        labs(x = colnames(sub2_ac_data[i]), y = colnames(sub2_ac_data[rescol]),
             colour = colnames(sub2_ac_data[topfac[1]]),
             shape = colnames(sub2_ac_data[topfac[1]])) +
        ggtheme +
        theme(plot.title = element_text(face = "bold", size = 16, hjust = 0.5)))
I found one possible solution, although it isn't very elegant.*
Basically, you extract the positions of the major gridlines and then build a sequence of minor gridlines from them using a multiplier.
I've posted this on a related question (as a modification of an answer by Eric Watt). It only needed a change of syntax, which isn't explained well in the documentation.
This is the link: ggplot2 integer multiple of minor breaks per major break
Here is the code:
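library(ggplot2)
df <- data.frame(x = 0:10,
                 y = 10:20)
p <- ggplot(df, aes(x, y)) + geom_point()
# Major x breaks chosen by ggplot2. In ggplot2 >= 3.3.0 they are in
# panel_params[[1]]$x$breaks (earlier versions used $x.major_source);
# breaks outside the plotted range come back as NA, so drop them.
majors <- ggplot_build(p)$layout$panel_params[[1]]$x$breaks
majors <- majors[!is.na(majors)]
majors
multiplier <- 5  # split each major interval into this many parts
minors <- seq(from = min(majors),
              to = max(majors),
              length.out = (length(majors) - 1) * multiplier + 1)
minors
p + scale_x_continuous(minor_breaks = minors)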
*Not very elegant because:
It fails when the graph has no data (which can happen in a loop, so those cases need special handling).
It only draws minor gridlines between the smallest and largest major gridlines, not across the whole plotting area where points can appear.
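A hedged alternative that sidesteps both caveats is to pass a function to minor_breaks instead of a precomputed vector. ggplot2 calls such a function with the axis range, so nothing has to be built first and the sequence can extend past the outer major breaks. This sketch assumes an untransformed continuous scale with default breaks, whose majors come from scales::extended_breaks(); minor_every() is a hypothetical helper name.

```r
library(ggplot2)

# Hypothetical helper: returns a function suitable for `minor_breaks`.
# Assumes an identity-transformed continuous scale with default `breaks`,
# whose major breaks are computed by scales::extended_breaks().
minor_every <- function(multiplier = 5) {
  function(limits) {
    # Recompute the major breaks the same way the default scale does
    majors <- scales::extended_breaks()(limits)
    majors <- majors[is.finite(majors)]
    if (length(majors) < 2) return(NULL)
    step <- (majors[2] - majors[1]) / multiplier
    # Extend the sequence past the outer majors so it covers the whole range
    from <- majors[1] - ceiling((majors[1] - limits[1]) / step) * step
    to <- majors[length(majors)] +
      ceiling((limits[2] - majors[length(majors)]) / step) * step
    seq(from, to, by = step)
  }
}

df <- data.frame(x = 0:10, y = 10:20)
p <- ggplot(df, aes(x, y)) +
  geom_point() +
  scale_x_continuous(minor_breaks = minor_every(5))
p
```

Because the minor step is derived from the major spacing, every major break also falls on the minor sequence, which gives the "5 parts per major interval" look without knowing the axis limits in advance.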
