How to change x-axis labels in ggboxplot (R)

I have a data frame including multiple factors. I used ggboxplot to get a box plot with comparisons for different categories. I am not satisfied with the x axis labels. I tried different ways but failed to get what I expected.
The code used to create a plot is:
df <- data.frame(country=sample(LETTERS[1:4], 1000, TRUE),
rating=round(rnorm(1000,70,15),1),
sex =rep(c("Female","Male"),500),
school=sample(c("public","private"),1000,TRUE))
df$group <- paste(df$school,df$sex,sep=".")
df <- df[order(df$group),]
my_comparisons <- list(c("public.Female","public.Male") , c("private.Female","private.Male"))
library(ggpubr)
ggboxplot(df, x = "group",y = "rating",
color = "group", palette = "simpsons",
add = "jitter",facet.by="country",legend="none", ylab="Rating") +
theme(strip.text.x=element_text(size=10, color="red", face="bold.italic"),
axis.text.x = element_text(angle = 45, hjust = 1),
axis.title.x = element_blank()) +
stat_compare_means(method = "t.test",comparisons = my_comparisons,
label.y = 110,label = "p.signif")
The expected plot looks like:

This gets you close to what you're looking for (I couldn't figure out the line separator). You may also have to play around with the positioning of the labels to get them just right, as well as sizes.
ggboxplot(df, x = "group",y = "rating",
color = "group", palette = "simpsons",
add = "jitter", facet.by="country", legend="none", ylab="Rating") +
scale_x_discrete(labels=rep(c("F","M"),4)) +
theme(strip.text.x=element_text(size=10, color="red", face="bold.italic"),
axis.title.x = element_blank(),
plot.margin=unit(c(2,2,15,2), "mm")) +
stat_compare_means(method = "t.test",comparisons = my_comparisons,
label.y = 110, label = "p.signif") +
coord_cartesian(ylim=c(20,120), xlim=c(1,4), clip="off") +
annotate("text", x=1.5, y=0, label=c("","","Private","Private")) +
annotate("text", x=3.5, y=0, label=c("","","Public","Public")) +
annotate("text", x=0.5, y=10, label=c("","","Sex",""), hjust=1) +
annotate("text", x=0.5, y=0, label=c("","","School",""), hjust=1)
Additions include scale_x_discrete() to change x-axis labels, plot.margin and coord_cartesian to allow annotations outside the plot area, and annotate for each annotation, where the labels for each facet panel are given as a vector, with blanks for panels which shouldn't get labels.
There may be a cleaner way to do this, but the faceted nature of the plot means that annotations get replicated across facets which you don't want in this case.
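One possibly cleaner route, sketched here with plain ggplot2 rather than ggpubr: put the labels in their own data frame that includes the facet variable, and draw them with geom_text(). A label then only shows up in the panel named in its row, so no blank placeholders are needed. The data frame name panel_labels, the choice of panel "C", and the simulated data are assumptions for illustration.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  country = sample(LETTERS[1:4], 200, TRUE),
  rating  = round(rnorm(200, 70, 15), 1),
  group   = rep(c("private.Female", "private.Male",
                  "public.Female", "public.Male"), 50)
)

# One row per label; the `country` column restricts each label to one panel.
# (hypothetical helper data frame, not part of the original answer)
panel_labels <- data.frame(
  country = "C",
  x       = c(1.5, 3.5),
  y       = 0,
  label   = c("Private", "Public")
)

p <- ggplot(df, aes(group, rating)) +
  geom_boxplot() +
  facet_wrap(~country) +
  scale_x_discrete(labels = rep(c("F", "M"), 2)) +
  # inherit.aes = FALSE so the label layer uses only its own x/y/label
  geom_text(data = panel_labels, aes(x = x, y = y, label = label),
            inherit.aes = FALSE) +
  # clip = "off" lets text at y = 0 be drawn below the panel area
  coord_cartesian(ylim = c(20, 120), clip = "off") +
  theme(axis.title.x = element_blank(),
        plot.margin = unit(c(2, 2, 15, 2), "mm"))
```

The label positions will likely still need some manual tweaking, as with the annotate() version.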

Related

Putting horizontal lines on grouped boxplots

I am trying to make a boxplot with this basic code:
design=c("Red","Green","Blue")
actions=c("1","2","3","4","5","6","7","8")
proportion=(seq(1:240)+sample(1:500, 240, replace=T))/2000
df=data.frame(design, actions , proportion)
ggplot(df, aes(x=actions, y=proportion, fill=design)) +
geom_boxplot()+
xlab("group")+
ylab("Y value")+
ggtitle("Y values for each group stratified by color")
Producing something like this:
I want to add horizontal lines for "true" Y values that are different for each group.
Does anyone have any tips for doing this? I don't know how to extract the width of each group of boxes, otherwise I could use geom_segment.
Here is a MWE with a non-grouped boxplot:
dBox <- data.frame(y = rnorm(10),group="1")
dBox=rbind(dBox,data.frame(y=rnorm(10),group="2"))
dLines <- data.frame(X =c(-0.36, 0.015),
Y = c(0.4, -0.2),
Xend = c(-0.015, 0.36),
Yend=c(0.4, -0.2),
group = c("True", "True"),
color = c("black", "red"))
ggplot(dBox, aes(x=0, y=y,fill=group)) +
geom_boxplot(outlier.shape = 1)+
geom_segment(data = dLines, aes(x = X, xend = Xend, y = Y, yend = Yend),color="red",size=1.5,linetype=1) +
theme(legend.background = element_rect(fill = "white", size = 0.1, linetype = "solid", colour = "black"))
This produces something like this:
However, it's difficult to make the geom_segments line up with the boxes exactly, and to then extend this to the grouped boxplot setting.
Thanks!
This can be done using a workaround with facets:
design=c("Red","Green","Blue")
actions=c("1","2","3","4","5","6","7","8")
proportion=(seq(1:240)+sample(1:500, 240, replace=T))/2000
df=data.frame(design, actions , proportion)
lines = data.frame(actions = 1:8, proportion=abs(rnorm(8)))
p = ggplot(df, aes(x=actions, y=proportion, fill=design)) +
geom_boxplot()+
xlab("group")+
ylab("Y value")+
ggtitle("Y values for each group stratified by color") +
facet_grid(~actions, scales='free_x') +
theme(
panel.spacing.x = unit(0, "lines"),
strip.background = element_blank(),
strip.text.x = element_blank())
p + geom_hline(aes(yintercept = proportion), lines)
You could probably fiddle around with removing the spaces between the facets to make it look more like what you intended.
Thanks to @eugene100hickey for pointing out how to remove spacing between facets.
theme(panel.spacing.x) can remove those pesky lines:
p + geom_hline(aes(yintercept = proportion), lines) +
theme(panel.spacing.x = unit(0, "lines"))
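If you would rather avoid facets, here is a rough sketch of the geom_segment() idea from the question. On a discrete x-axis the i-th group is centred at x = i, so a segment from i - half_width to i + half_width spans the whole group of boxes. The value 0.375 assumes the boxes keep geom_boxplot()'s default total width of 0.75 per group; adjust it if you set width yourself. The true_vals data frame and its values are made up for illustration.

```r
library(ggplot2)

set.seed(42)
df <- data.frame(
  design     = rep(c("Red", "Green", "Blue"), times = 80),
  actions    = rep(as.character(1:8), each = 30),
  proportion = runif(240)
)

# Assumption: half of the default total box width (0.75) per x group
half_width <- 0.375

# Hypothetical "true" values, one per group on the x-axis
true_vals <- data.frame(pos = 1:8, value = seq(0.2, 0.9, length.out = 8))
true_vals$x    <- true_vals$pos - half_width
true_vals$xend <- true_vals$pos + half_width

p <- ggplot(df, aes(x = actions, y = proportion)) +
  geom_boxplot(aes(fill = design)) +
  # numeric x positions are placed on the discrete axis by group index
  geom_segment(data = true_vals,
               aes(x = x, xend = xend, y = value, yend = value),
               inherit.aes = FALSE, colour = "red")
```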

Manually change order of y axis items on complicated stacked bar chart in ggplot2

I've been stuck on an issue and can't find a solution. I've tried many suggestions on Stack Overflow and elsewhere about manually ordering a stacked bar chart, since that should be a pretty simple fix, but those suggestions don't work with the huge complicated mess of code I plucked from many places. My only issue is y-axis item ordering.
I'm making a series of stacked bar charts, and ggplot2 changes the ordering of the items on the y-axis depending on which dataframe I am trying to plot. I'm trying to make 39 of these plots and want them to all have the same ordering. I think ggplot2 only wants to plot them in ascending order of their numeric mean or something, but I'd like all of the bar charts to first display the group "Bird Advocates" and then "Cat Advocates." (This is also the order they appear in my data frame, but that ordering is lost at the coord_flip() point in plotting.)
I think that taking the data frame through so many changes is why I can't just add something simple at the end or use the reorder() function. Adding things into aes() also doesn't work, since the stacked bar chart I'm creating seems to depend on those items being exactly a certain way.
Here's one of my data frames where ggplot2 is ordering my y-axis items incorrectly, plotting "Cat Advocates" before "Bird Advocates":
Group,Strongly Opposed,Opposed,Slightly Opposed,Neutral,Slightly Support,Support,Strongly Support
Bird Advocates,0.005473026,0.010946052,0.012509773,0.058639562,0.071149335,0.31118061,0.530101642
Cat Advocates,0.04491726,0.07013396,0.03624901,0.23719464,0.09141056,0.23404255,0.28605201
And here's all the code that takes that and turns it into a plot:
library(ggplot2)
library(reshape2)
library(plotly)
#Importing data from a .csv file
data <- read.csv("data.csv", header=TRUE)
data$s.Strongly.Opposed <- 0-data$Strongly.Opposed-data$Opposed-data$Slightly.Opposed-.5*data$Neutral
data$s.Opposed <- 0-data$Opposed-data$Slightly.Opposed-.5*data$Neutral
data$s.Slightly.Opposed <- 0-data$Slightly.Opposed-.5*data$Neutral
data$s.Neutral <- 0-.5*data$Neutral
data$s.Slightly.Support <- 0+.5*data$Neutral
data$s.Support <- 0+data$Slightly.Support+.5*data$Neutral
data$s.Strongly.Support <- 0+data$Support+data$Slightly.Support+.5*data$Neutral
#to percents
data[,2:15]<-data[,2:15]*100
#melting
mdfr <- melt(data, id=c("Group"))
mdfr<-cbind(mdfr[1:14,],mdfr[15:28,3])
colnames(mdfr)<-c("Group","variable","value","start")
#remove dot in level names
mylevels<-c("Strongly Opposed","Opposed","Slightly Opposed","Neutral","Slightly Support","Support","Strongly Support")
mdfr$variable<-droplevels(mdfr$variable)
levels(mdfr$variable)<-mylevels
pal<-c("#bd7523", "#e9aa61", "#f6d1a7", "#999999", "#c8cbc0", "#65806d", "#334e3b")
ggplot(data=mdfr) +
geom_segment(aes(x = Group, y = start, xend = Group, yend = start+value, colour = variable,
text=paste("Group: ",Group,"<br>Percent: ",value,"%")), size = 5) +
geom_hline(yintercept = 0, color =c("#646464")) +
coord_flip() +
theme(legend.position="top") +
theme(legend.key.width=unit(0.5,"cm")) +
guides(col = guide_legend(ncol = 12)) + #has 7 real columns, using to adjust legend position
scale_color_manual("Response", labels = mylevels, values = pal, guide="legend") +
theme(legend.title = element_blank()) +
theme(axis.title.x = element_blank()) +
theme(axis.title.y = element_blank()) +
theme(axis.ticks = element_blank()) +
theme(axis.text.x = element_blank()) +
theme(legend.key = element_rect(fill = "white")) +
scale_y_continuous(breaks=seq(-100,100,100), limits=c(-100,100)) +
theme(panel.background = element_rect(fill = "#ffffff"),
panel.grid.major = element_line(colour = "#CBCBCB"))
The plot:
I think this works, though you may need to play around with the axis limits/breaks:
library(dplyr)
mdfr <- mdfr %>%
mutate(group_n = as.integer(case_when(Group == "Bird Advocates" ~ 2,
Group == "Cat Advocates" ~ 1)))
ggplot(data=mdfr) +
geom_segment(aes(x = group_n, y = start, xend = group_n, yend = start + value, colour = variable,
text=paste("Group: ",Group,"<br>Percent: ",value,"%")), size = 5) +
scale_x_continuous(limits = c(0,3), breaks = c(1, 2), labels = c("Cat", "Bird")) +
geom_hline(yintercept = 0, color =c("#646464")) +
theme(legend.position="top") +
theme(legend.key.width=unit(0.5,"cm")) +
coord_flip() +
guides(col = guide_legend(ncol = 12)) + #has 7 real columns, using to adjust legend position
scale_color_manual("Response", labels = mylevels, values = pal, guide="legend") +
theme(legend.title = element_blank()) +
theme(axis.title.x = element_blank()) +
theme(axis.title.y = element_blank()) +
theme(axis.ticks = element_blank()) +
theme(axis.text.x = element_blank()) +
theme(legend.key = element_rect(fill = "white"))+
scale_y_continuous(breaks=seq(-100,100,100), limits=c(-100,100)) +
theme(panel.background = element_rect(fill = "#ffffff"),
panel.grid.major = element_line(colour = "#CBCBCB"))
produces this plot:
You want to convert the 'Group' variable to a factor whose levels are in the order in which you want the bars to appear.
mdfr$Group <- factor(mdfr$Group, levels = c("Bird Advocates", "Cat Advocates"))
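One thing to watch with the factor approach: after coord_flip() the first factor level is drawn at the bottom of the axis, so the levels may need to be listed in reverse of the top-to-bottom order you want. A minimal sketch with made-up values (the value column and the wanted_top_to_bottom name are illustrative, not the real survey data):

```r
library(ggplot2)

# Toy stand-in for the melted data frame
mdfr <- data.frame(
  Group = c("Bird Advocates", "Cat Advocates"),
  value = c(60, 40)
)

# Order you want to read from top to bottom in the flipped plot
wanted_top_to_bottom <- c("Bird Advocates", "Cat Advocates")

# coord_flip() draws the first level at the bottom, so reverse the order
mdfr$Group <- factor(mdfr$Group, levels = rev(wanted_top_to_bottom))

p <- ggplot(mdfr, aes(x = Group, y = value)) +
  geom_col() +
  coord_flip()
```

Because the ordering lives in the factor, the same two lines can be reused for each of the 39 data frames before plotting.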

Space specific areas in x axis ggplot

I have the below script:
testFigure <- ggplot(data = final_df, aes(x=final_df$`ng DNA`,
y=final_df$`count`)) +
geom_point(col = "darkmagenta") + ggtitle("ng VS Number") +
xlab(expression(paste("ng"))) + ylab("Num (#)") +
theme(plot.title = element_text(hjust = 0.5, color="orange", size=18,
face="bold.italic"),
axis.title.x = element_text(color="#993333", size=10, face = "bold"),
axis.title.y = element_text(color="#993333", size=10,face = "bold")) +
scale_y_log10(breaks=c(0,10,50,200,600))
testFigure+scale_x_continuous(breaks=c(5,50,100,150,200,250,300,350,400))
Which generates the plot:
I'd like to stretch the x-axis so that the lower values (0-10, and especially 0-5) are spread out and easier to read, while keeping the spacing of the other ticks.
Any suggestions how to do that?
Solution 1:
You already use a log scale, but on the y-axis rather than the x-axis:
scale_y_log10(breaks=c(0,10,50,200,600))
To spread out the small x values, apply the same kind of scale to the x-axis. Note that 0 cannot be drawn on a log scale, so that break is left out here:
scale_x_log10(breaks=c(10,50,200,600))
Solution 2:
scale_x_discrete(limits=0:5)
You can put a scale on each axis and set its limits:
library(ggplot2)
dt<-data.frame("Name"=sample(c("A","B"),10,replace = T),
x=sample(1:10,10),y=sample(1:10,10))
ggplot(dt, aes( x= x , y= y))+
geom_point(stat='identity', aes(shape=Name,colour = Name))+
scale_x_discrete(limits=1:12)+
scale_y_discrete(limits=1:12)
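Another option, if the data include values at or near 0 where a log scale breaks down: a square-root scale also stretches the low end of the axis but can still show 0. This is only a sketch; the column names ng and count and the numbers are invented stand-ins for final_df.

```r
library(ggplot2)

# Invented data with several small x values and a few large ones
final_df <- data.frame(
  ng    = c(0.5, 1, 2, 5, 10, 50, 100, 200, 400),
  count = c(3, 8, 12, 30, 55, 120, 210, 380, 590)
)

p <- ggplot(final_df, aes(x = ng, y = count)) +
  geom_point(colour = "darkmagenta") +
  # sqrt scale: spacing between 0 and 10 is widened relative to 100 to 400
  scale_x_sqrt(breaks = c(0, 1, 5, 10, 50, 100, 200, 400)) +
  scale_y_log10(breaks = c(10, 50, 200, 600))
```

The breaks are still given in the original units, so the tick labels stay readable.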

Displaying multiple factors with Sina plots

NOTE: I have updated this post following discussion with Z. Lin. Originally, I had simplified my problem to a two factor design (see section "Original question"). However, my actual data consists of four factors, requiring facet_grid. I am therefore providing an example for a four factor design further below (see section "Edit").
Original question
Let's assume I have a two factor design with dv as my dependent variable and iv.x and iv.y as my factors/independent variables. Some quick sample data:
DF <- data.frame(dv = rnorm(900),
iv.x = sort(rep(letters[1:3], 300)),
iv.y = rep(sort(rep(rev(letters)[1:3], 100)), 3))
My goal is to display each condition separately as can nicely be done with violin plots:
ggplot(DF, aes(iv.x, dv, colour=iv.y)) + geom_violin()
I have recently come across Sina plots and would like to do the same here. Unfortunately Sina plots don't do this, collapsing the data instead.
ggplot(DF, aes(iv.x, dv, colour=iv.y)) + geom_sina()
An explicit call to position dodge doesn't help either, as this produces an error message:
ggplot(DF, aes(iv.x, dv, colour=iv.y)) + geom_sina(position = position_dodge(width = 0.5))
The authors of Sina plots have already been made aware of this issue in 2016:
https://github.com/thomasp85/ggforce/issues/47
My problem is more in terms of time. We soon want to submit a manuscript and Sina plots would be a great way to display our data. Can anyone think of a workaround for Sina plots such that I can still display two factors as in the example with violin plots above?
Edit
Sample data for a four factor design:
DF <- data.frame(dv=rnorm(400),
iv.w=sort(rep(letters[1:2],200)),
iv.x=rep(sort(rep(letters[3:4],100)), 2),
iv.y=rep(sort(rep(rev(letters)[1:2],50)),4),
iv.z=rep(sort(rep(letters[5:6],25)),8))
An example with violin plots of what I would like to create using Sina plots:
ggplot(DF, aes(iv.x, dv, colour=iv.y)) +
facet_grid(iv.w ~ iv.z) +
geom_violin(aes(y = dv, fill = iv.y),
position = position_dodge(width = 1))+
stat_summary(aes(y = dv, fill = iv.y), fun=mean, geom="point",
colour="black", show.legend = FALSE, size=.2,
position=position_dodge(width=1))+
stat_summary(aes(y = dv, fill = iv.y), fun.data=mean_cl_normal, geom="errorbar",
position=position_dodge(width=1), width=.2, show.legend = FALSE,
colour="black", size=.2)
Edited solution, since OP clarified that facets are required:
ggplot(DF, aes(x = interaction(iv.y, iv.x),
y = dv, fill = iv.y, colour = iv.y)) +
facet_grid(iv.w ~ iv.z) +
geom_sina() +
stat_summary(fun=mean, geom="point",
colour="black", show.legend = FALSE, size=.2,
position=position_dodge(width=1))+
stat_summary(fun.data=mean_cl_normal, geom="errorbar",
position=position_dodge(width=1), width=.2,
show.legend = FALSE,
colour="black", size=.2) +
scale_x_discrete(name = "iv.x",
labels = c("c", "", "d", "")) +
theme(panel.grid.major.x = element_blank(),
axis.text.x = element_text(hjust = -4),
axis.ticks.x = element_blank())
Instead of using facets to simulate dodging between colours, this approach creates a new variable interaction(colour.variable, x.variable) to be mapped to the x-axis.
The rest of the code in scale_x_discrete() & theme() are there to hide the default x-axis labels / ticks / grid lines.
axis.text.x = element_text(hjust = -4) is a hack that shifts x-axis labels to approximately the right position. It's ugly, but considering the use case is for a manuscript submission, I assume the size of plots will be fixed, and you just need to tweak it once.
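To see why the labels vector is c("c", "", "d", ""), it helps to look at the level order that interaction() produces. A small base R check, using toy vectors that mirror the iv.x and iv.y levels from the four factor example:

```r
# Toy vectors with the same levels as iv.x (c, d) and iv.y (y, z)
iv.x <- rep(c("c", "d"), each = 2)
iv.y <- rep(c("y", "z"), times = 2)

# The first argument varies fastest in the resulting level order
lv <- levels(interaction(iv.y, iv.x))
print(lv)
# "y.c" "z.c" "y.d" "z.d"
```

So positions 1 and 2 on the x-axis belong to iv.x = c, and positions 3 and 4 to iv.x = d, which is why only the first label of each pair is kept.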
Original solution:
Assuming your plots don't otherwise require facetting, you can simulate the appearance with facets:
ggplot(DF, aes(x = iv.y, y = dv, colour = iv.y)) +
geom_sina() +
facet_grid(~iv.x, switch = "x") +
labs(x = "iv.x") +
theme(axis.text.x = element_blank(), # hide iv.y labels
axis.ticks.x = element_blank(), # hide iv.y ticks
strip.background = element_blank(), # make facet strip background transparent
panel.spacing.x = unit(0, "mm")) # remove horizontal space between facets

Cowplot: How to add tick marks and corresponding data labels to a marginal plot? [closed]

R Packages: cowplot / ggplot2
Use Case: Scatter plot with marginal histograms.
Issue: For histograms, I can't add bin sizes or reference lower/upper class intervals on the x-axis. Without these, histograms are difficult to read.
In cowplot, is there any way to add tick marks and corresponding data labels (on the x-axis) to marginal plots when required, e.g. for histograms in marginal plots?
Basic scatter + marginal histogram plot using cowplot
require(ggplot2)
require(cowplot)
Main Plot:
pmain <- ggplot(data = mpg, aes(x = cty, y = hwy)) +
geom_point() +
xlab("City driving") +
ylab("Highway driving") +
theme_grey()
Marginal plot:
xbox <- axis_canvas(pmain, axis = "x") +
geom_histogram(
data = mpg,
aes(x = cty),
colour = "black"
)
Combined Plot:
p1 <- insert_xaxis_grob(pmain, xbox, grid::unit(0.5, "in"), position = "top")
ggdraw(p1)
However, I'd want the following plot xbox2 to be displayed as x-axis marginal plot:
xbox2.1 <- ggplot() +
geom_histogram(
data = mpg,
aes(x = cty),
colour = "black"
)
hist_tab <- ggplot_build(xbox2.1)$data[[1]]
xbox2 <- xbox2.1 +
scale_x_continuous(
breaks = c(round(hist_tab$xmin,1),
round(hist_tab$xmax[length(hist_tab$xmax)],1))
) +
labs(x = NULL, y = NULL) +
theme(
axis.text.x = element_text(angle = 90, size=7,vjust=0.5),
axis.line = element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank()
)
xbox2
But I can't create a scatter + marginal histogram (xbox2). I get the same plot as the first one:
p2 <- insert_xaxis_grob(pmain, xbox2, grid::unit(0.5, "in"), position = "top")
ggdraw(p2)
Package author here. What you're seeing is the documented behavior. From the documentation of the grob argument of insert_xaxis_grob():
The grob to insert. This will generally have been obtained via get_panel() from a ggplot2 object, in particular one generated with axis_canvas(). If a ggplot2 plot is provided instead of a grob, then get_panel() is called to extract the panel grob.
This function is specifically not meant to stack plots. You could turn your entire plot into a grob and then insert using this function, but I'm not sure that makes a lot of sense. What you're trying to do is equivalent to stacking two plots with the same x-axis range. I think it's better to just code it like that explicitly.
library(cowplot)
xlimits <- c(6, 38)
pmain <- ggplot(data = mpg, aes(x = cty, y = hwy)) +
geom_point() +
xlab("City driving") +
ylab("Highway driving") +
scale_x_continuous(limits = xlimits, expand = c(0, 0)) +
theme_grey() +
theme(plot.margin = margin(0, 5.5, 5.5, 5.5))
xhist <- ggplot() +
geom_histogram(
data = mpg,
aes(x = cty),
colour = "black",
binwidth = 1,
center = 10
) +
scale_x_continuous(limits = xlimits, expand = c(0, 0), breaks = 8:35) +
labs(x = NULL, y = NULL) +
theme(
axis.text.x = element_text(angle = 90, size=7, vjust=0.5),
axis.line = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
plot.margin = margin(5.5, 5.5, 0, 5.5)
)
plot_grid(xhist, pmain, ncol = 1, align = "v", rel_heights = c(0.2, 1))
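If you want to check where the bins actually fall before choosing breaks, you can pull the computed bin table out of the histogram layer, much as the question does with ggplot_build(). A small sketch using layer_data(); with binwidth = 1 and center = 10 the bin centres should land on whole numbers, so integer breaks label the bins directly.

```r
library(ggplot2)

# Same binning as the histogram in the answer above
xhist <- ggplot(mpg, aes(x = cty)) +
  geom_histogram(binwidth = 1, center = 10, colour = "black")

# Computed bins: lower edge, centre, upper edge and count per bin
bins <- layer_data(xhist, 1)
head(bins[, c("xmin", "x", "xmax", "count")])
```

The xmin and xmax columns can be passed to breaks instead if you would rather label the bin edges than the centres.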
