Create a dodged barplot with ggplot2 - r

I have the dataset below:
Database<-c("Composite","DB","TC","RH","DGI","DCH","DCH","DCH","LDP")
Unique_Drugs<-c(12672,5130,1425,3090,6100,2019,250,736,1182)
Unique_Targets<-c(3987,2175,842,2308,2413,1441,198,327,702)
db<-data.frame(Database,Unique_Drugs,Unique_Targets)
and I would like to create a dodged bar chart like the picture below:
This plot came from a dataframe like:
The difference is that on the x-axis I want the 7 unique Database names, and the fill should distinguish Unique_Drugs from Unique_Targets so that each database gets 2 colored bars displaying their values. I'm not sure how to make it work.
My code is:
p <- ggplot(data = db, aes(Database)) +
geom_bar(position = position_dodge(preserve = "single"), stat="count", aes(fill = colnames(db[2:4])), color = "black")+
coord_flip()+
theme(legend.position="top",
legend.title=element_blank(),
axis.title.x=element_text(size=18, face="bold", color="#000000"), # this changes the x axis title
axis.text.x = element_text(size=14, face="bold", color="#000000"), #This changes the x axis ticks text
axis.title.y=element_text(size=18, face="bold", color="#000000"), # this changes the y axis title
axis.text.y = element_text(size=14, face="bold", color="#000000"))+ #This changes the y axis ticks text
labs(x = "Database") +
labs(y = "Value") +
scale_x_discrete(limits = rev(factor(Database))) +
scale_fill_manual("Databases", values = c("tomato","steelblue3"))

Here's one way to achieve what you want:
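library(ggplot2)
library(reshape2)
# melt() stacks Unique_Drugs and Unique_Targets into "variable"/"value" columns
ggplot(melt(db, id.vars = "Database"), aes(x = Database, y = value, fill = variable)) +
  geom_col(position = "dodge") + ylab(NULL) + theme_minimal() +
  scale_fill_discrete(NULL, labels = c("Drugs", "Targets"))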
If you wanted a bar plot only for drugs, there would be no need for melt, as you could use y = Unique_Drugs to specify the bar heights (note that since we already have the heights, we use geom_col rather than geom_bar). In this case, however, we want to specify two kinds of heights. Your requirement that the fill should be Unique_Drugs and Unique_Targets is exactly why a transformation is needed: ggplot doesn't accept two variables for the same aesthetic. So, using melt, we collect all the heights into a single column (value) and get a single column (variable) to map to fill.
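To make the reshaping step concrete, here is a small sketch that melts db and inspects the result. The name db_long is introduced only for illustration, and id.vars is spelled out here rather than left for melt to guess.

```r
library(reshape2)

# Data from the question
Database <- c("Composite", "DB", "TC", "RH", "DGI", "DCH", "DCH", "DCH", "LDP")
Unique_Drugs <- c(12672, 5130, 1425, 3090, 6100, 2019, 250, 736, 1182)
Unique_Targets <- c(3987, 2175, 842, 2308, 2413, 1441, 198, 327, 702)
db <- data.frame(Database, Unique_Drugs, Unique_Targets)

# Wide -> long: one row per (Database, measure) pair
db_long <- melt(db, id.vars = "Database")

str(db_long)              # columns: Database, variable, value
levels(db_long$variable)  # the order that scale_fill_discrete(labels = ...) follows
```

Each of the 9 original rows becomes two rows, one per measure, which is what lets a single fill mapping produce two bars per database.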

Related

How to display number of cases per group in a stacked bar plot?

I am attempting to produce a stacked bar plot that has the fill color defined by a variable and also shows the number of cases represented by each of the filled sections.
Reproducible example:
library(tidyverse)
data(mpg)
ggplot(mpg,aes(manufacturer))+
geom_bar(position = "fill",stat = "count",aes(fill=drv))+
theme_classic()+
theme(text = element_text(size=20),
axis.text.x = element_text(angle = 45,
vjust = 0.5))
which produces a stacked bar chart with each bar filled by drv (image not shown).
Here is a pared-down version of what I would like to produce programmatically (image not shown), where the n=... labels are centered on each group's filled section and display the number of cases per group (drv) in each category (manufacturer).
Additionally, I have tried (unsuccessfully) incorporating code from this post and this post, which seem close to what I want, but when I incorporate the code from this post the following error is thrown:
Error: StatBin requires a continuous x variable: the x variable is discrete. Perhaps you want stat="count"?
I am not sure why this error is thrown because I do define stat="count" in the geom_bar() function call.
Use position_fill(vjust = 0.5) and label with after_stat(count):
ggplot(mpg, aes(manufacturer, fill = drv)) +
geom_bar(position = "fill", stat = "count")+
geom_text(aes(label = paste0("n=", after_stat(count))), stat='count', position = position_fill(vjust = 0.5)) +
theme_classic()
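As a sanity check on the labels, the same per-group counts can be tabulated outside ggplot. This is only a sketch in base R; the name counts is made up for illustration and is not part of the answer above.

```r
library(ggplot2)  # only needed here for the mpg data set

# Number of cars per manufacturer/drv combination
counts <- as.data.frame(table(manufacturer = mpg$manufacturer, drv = mpg$drv))

# Drop combinations that never occur, since they get no bar section
counts <- counts[counts$Freq > 0, ]

head(counts)
```

The Freq column should match the n=... values that after_stat(count) prints in each filled section.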

R ggplot2 - align theme grid with axis tick lines

I have a ggplot2 barchart for which I changed the axis ticks. However, the panel grid is adding additional lines that I do not want. How do I remove them?
My problems:
I only want the vertical grid lines that match the x-axis ticks.
The position dodge preserve is not working correctly due to having a group and a fill.
My code:
ggplot(byyear, aes(x = year, y = count, group = venue, colour = venue, fill = type)) +
geom_bar(stat = "identity", position=position_dodge(preserve = "single")) +
# BORDER SO I CAN DISTINGUISH THEM
scale_colour_manual(name = "Venue", values = c("#FFFFFF", "#FFFFFF")) +
# MAKE ALL YEARS APPEAR
scale_y_continuous(labels = number_format(accuracy = 1)) +
scale_x_continuous(breaks = unique(byyear$year)) +
theme(legend.position="bottom",
axis.text.x = element_text(angle = 90, hjust = 1))
The data is of the structure:
year,venue,type,count
2010,venue1,type1,163
2010,venue1,type2,18
2011,venue1,type1,16
...
The plot that I'm obtaining is the following (I removed the legend on the plot)

Manually change order of y axis items on complicated stacked bar chart in ggplot2

I've been stuck on an issue and can't find a solution. I've tried many suggestions on Stack Overflow and elsewhere about manually ordering a stacked bar chart, since that should be a pretty simple fix, but those suggestions don't work with the huge complicated mess of code I plucked from many places. My only issue is y-axis item ordering.
I'm making a series of stacked bar charts, and ggplot2 changes the ordering of the items on the y-axis depending on which dataframe I am trying to plot. I'm trying to make 39 of these plots and want them to all have the same ordering. I think ggplot2 only wants to plot them in ascending order of their numeric mean or something, but I'd like all of the bar charts to first display the group "Bird Advocates" and then "Cat Advocates." (This is also the order they appear in my data frame, but that ordering is lost at the coord_flip() point in plotting.)
I think that taking the data frame through so many changes is why I can't just add something simple at the end or use the reorder() function. Adding things into aes() also doesn't work, since the stacked bar chart I'm creating seems to depend on those items being exactly a certain way.
Here's one of my data frames where ggplot2 is ordering my y-axis items incorrectly, plotting "Cat Advocates" before "Bird Advocates":
Group,Strongly Opposed,Opposed,Slightly Opposed,Neutral,Slightly Support,Support,Strongly Support
Bird Advocates,0.005473026,0.010946052,0.012509773,0.058639562,0.071149335,0.31118061,0.530101642
Cat Advocates,0.04491726,0.07013396,0.03624901,0.23719464,0.09141056,0.23404255,0.28605201
And here's all the code that takes that and turns it into a plot:
library(ggplot2)
library(reshape2)
library(plotly)
#Importing data from a .csv file
data <- read.csv("data.csv", header=TRUE)
data$s.Strongly.Opposed <- 0-data$Strongly.Opposed-data$Opposed-data$Slightly.Opposed-.5*data$Neutral
data$s.Opposed <- 0-data$Opposed-data$Slightly.Opposed-.5*data$Neutral
data$s.Slightly.Opposed <- 0-data$Slightly.Opposed-.5*data$Neutral
data$s.Neutral <- 0-.5*data$Neutral
data$s.Slightly.Support <- 0+.5*data$Neutral
data$s.Support <- 0+data$Slightly.Support+.5*data$Neutral
data$s.Strongly.Support <- 0+data$Support+data$Slightly.Support+.5*data$Neutral
#to percents
data[,2:15]<-data[,2:15]*100
#melting
mdfr <- melt(data, id=c("Group"))
mdfr<-cbind(mdfr[1:14,],mdfr[15:28,3])
colnames(mdfr)<-c("Group","variable","value","start")
#remove dot in level names
mylevels<-c("Strongly Opposed","Opposed","Slightly Opposed","Neutral","Slightly Support","Support","Strongly Support")
mdfr$variable<-droplevels(mdfr$variable)
levels(mdfr$variable)<-mylevels
pal<-c("#bd7523", "#e9aa61", "#f6d1a7", "#999999", "#c8cbc0", "#65806d", "#334e3b")
ggplot(data=mdfr) +
geom_segment(aes(x = Group, y = start, xend = Group, yend = start+value, colour = variable,
text=paste("Group: ",Group,"<br>Percent: ",value,"%")), size = 5) +
geom_hline(yintercept = 0, color =c("#646464")) +
coord_flip() +
theme(legend.position="top") +
theme(legend.key.width=unit(0.5,"cm")) +
guides(col = guide_legend(ncol = 12)) + #has 7 real columns, using to adjust legend position
scale_color_manual("Response", labels = mylevels, values = pal, guide="legend") +
theme(legend.title = element_blank()) +
theme(axis.title.x = element_blank()) +
theme(axis.title.y = element_blank()) +
theme(axis.ticks = element_blank()) +
theme(axis.text.x = element_blank()) +
theme(legend.key = element_rect(fill = "white")) +
scale_y_continuous(breaks=seq(-100,100,100), limits=c(-100,100)) +
theme(panel.background = element_rect(fill = "#ffffff"),
panel.grid.major = element_line(colour = "#CBCBCB"))
The plot:
I think this works, you may need to play around with the axis limits/breaks:
library(dplyr)
mdfr <- mdfr %>%
mutate(group_n = as.integer(case_when(Group == "Bird Advocates" ~ 2,
Group == "Cat Advocates" ~ 1)))
ggplot(data=mdfr) +
geom_segment(aes(x = group_n, y = start, xend = group_n, yend = start + value, colour = variable,
text=paste("Group: ",Group,"<br>Percent: ",value,"%")), size = 5) +
scale_x_continuous(limits = c(0,3), breaks = c(1, 2), labels = c("Cat", "Bird")) +
geom_hline(yintercept = 0, color =c("#646464")) +
theme(legend.position="top") +
theme(legend.key.width=unit(0.5,"cm")) +
coord_flip() +
guides(col = guide_legend(ncol = 12)) + #has 7 real columns, using to adjust legend position
scale_color_manual("Response", labels = mylevels, values = pal, guide="legend") +
theme(legend.title = element_blank()) +
theme(axis.title.x = element_blank()) +
theme(axis.title.y = element_blank()) +
theme(axis.ticks = element_blank()) +
theme(axis.text.x = element_blank()) +
theme(legend.key = element_rect(fill = "white"))+
scale_y_continuous(breaks=seq(-100,100,100), limits=c(-100,100)) +
theme(panel.background = element_rect(fill = "#ffffff"),
panel.grid.major = element_line(colour = "#CBCBCB"))
produces this plot:
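You want to convert the 'Group' variable to a factor whose levels are in the order in which you want the bars to appear. Note that coord_flip() draws the first level at the bottom, so reverse the levels if "Bird Advocates" should end up on top.
mdfr$Group <- factor(mdfr$Group, levels = c("Bird Advocates", "Cat Advocates"))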
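The effect of explicit levels can be checked without plotting. The sketch below uses a small made-up vector (group, wanted, and top_down are illustrative names) and assumes the goal is for the bars to read top to bottom after coord_flip().

```r
# Hypothetical stand-in for mdfr$Group
group <- c("Cat Advocates", "Bird Advocates", "Cat Advocates")

# Default: levels are sorted alphabetically, regardless of data order
levels(factor(group))

# Desired reading order from top to bottom
wanted <- c("Bird Advocates", "Cat Advocates")

# coord_flip() places the first level at the bottom, so reverse the order
top_down <- factor(group, levels = rev(wanted))
levels(top_down)
```

Setting the levels once on every data frame before plotting keeps the ordering identical across all of the plots.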

Displaying multiple factors with Sina plots

NOTE: I have updated this post following discussion with Z. Lin. Originally, I had simplified my problem to a two factor design (see section "Original question"). However, my actual data consists of four factors, requiring facet_grid. I am therefore providing an example for a four factor design further below (see section "Edit").
Original question
Let's assume I have a two factor design with dv as my dependent variable and iv.x and iv.y as my factors/independent variables. Some quick sample data:
DF <- data.frame(dv = rnorm(900),
iv.x = sort(rep(letters[1:3], 300)),
iv.y = rep(sort(rep(rev(letters)[1:3], 100)), 3))
My goal is to display each condition separately as can nicely be done with violin plots:
ggplot(DF, aes(iv.x, dv, colour=iv.y)) + geom_violin()
I have recently come across Sina plots and would like to do the same here. Unfortunately Sina plots don't do this, collapsing the data instead.
ggplot(DF, aes(iv.x, dv, colour=iv.y)) + geom_sina()
An explicit call to position dodge doesn't help either, as this produces an error message:
ggplot(DF, aes(iv.x, dv, colour=iv.y)) + geom_sina(position = position_dodge(width = 0.5))
The authors of Sina plots have already been made aware of this issue in 2016:
https://github.com/thomasp85/ggforce/issues/47
My problem is more in terms of time. We soon want to submit a manuscript and Sina plots would be a great way to display our data. Can anyone think of a workaround for Sina plots such that I can still display two factors as in the example with violin plots above?
Edit
Sample data for a four factor design:
DF <- data.frame(dv=rnorm(400),
iv.w=sort(rep(letters[1:2],200)),
iv.x=rep(sort(rep(letters[3:4],100)), 2),
iv.y=rep(sort(rep(rev(letters)[1:2],50)),4),
iv.z=rep(sort(rep(letters[5:6],25)),8))
An example with violin plots of what I would like to create using Sina plots:
ggplot(DF, aes(iv.x, dv, colour=iv.y)) +
facet_grid(iv.w ~ iv.z) +
geom_violin(aes(y = dv, fill = iv.y),
position = position_dodge(width = 1))+
stat_summary(aes(y = dv, fill = iv.y), fun.y=mean, geom="point",
colour="black", show.legend = FALSE, size=.2,
position=position_dodge(width=1))+
stat_summary(aes(y = dv, fill = iv.y), fun.data=mean_cl_normal, geom="errorbar",
position=position_dodge(width=1), width=.2, show.legend = FALSE,
colour="black", size=.2)
Edited solution, since OP clarified that facets are required:
ggplot(DF, aes(x = interaction(iv.y, iv.x),
y = dv, fill = iv.y, colour = iv.y)) +
facet_grid(iv.w ~ iv.z) +
geom_sina() +
stat_summary(fun.y=mean, geom="point",
colour="black", show.legend = FALSE, size=.2,
position=position_dodge(width=1))+
stat_summary(fun.data=mean_cl_normal, geom="errorbar",
position=position_dodge(width=1), width=.2,
show.legend = FALSE,
colour="black", size=.2) +
scale_x_discrete(name = "iv.x",
labels = c("c", "", "d", "")) +
theme(panel.grid.major.x = element_blank(),
axis.text.x = element_text(hjust = -4),
axis.ticks.x = element_blank())
Instead of using facets to simulate dodging between colours, this approach creates a new variable interaction(colour.variable, x.variable) to be mapped to the x-axis.
The rest of the code in scale_x_discrete() & theme() are there to hide the default x-axis labels / ticks / grid lines.
axis.text.x = element_text(hjust = -4) is a hack that shifts x-axis labels to approximately the right position. It's ugly, but considering the use case is for a manuscript submission, I assume the size of plots will be fixed, and you just need to tweak it once.
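To see why the labels are given as c("c", "", "d", ""), it helps to look at the levels that interaction() generates. A minimal sketch with toy vectors standing in for the iv.y and iv.x columns:

```r
# Toy stand-ins for the iv.y and iv.x columns of DF
iv.y <- c("y", "z", "y", "z")
iv.x <- c("c", "c", "d", "d")

# By default the first argument varies fastest in the level order
x_levels <- levels(interaction(iv.y, iv.x))
x_levels
# "y.c" "z.c" "y.d" "z.d"
```

The four x positions are therefore grouped in pairs by iv.x, so labelling only the first of each pair (and shifting it with hjust) gives one visible label per iv.x value.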
Original solution:
Assuming your plots don't otherwise require facetting, you can simulate the appearance with facets:
ggplot(DF, aes(x = iv.y, y = dv, colour = iv.y)) +
geom_sina() +
facet_grid(~iv.x, switch = "x") +
labs(x = "iv.x") +
theme(axis.text.x = element_blank(), # hide iv.y labels
axis.ticks.x = element_blank(), # hide iv.y ticks
strip.background = element_blank(), # make facet strip background transparent
panel.spacing.x = unit(0, "mm")) # remove horizontal space between facets

ggplot2: legend of two different data sets

I am drawing a map with ggplot using a shape file. Then I add arcs using geom_line. The arcs are colored according to their type (oneway or twoway) and then I add nodes using geom_point. The nodes are colored according to their type (Origin, Destination, Node, Parking lot). I want to have two different legends: one for the node types and one for the arc types. Unfortunately, ggplot merges the legends and produces just one legend.
Here is the code (sorry that I can't provide a workable example. I can't send the shape files):
cityplot <- ggplot(data = s_zurich, aes(x = long, y = lat, group = id), fill = "white") +
geom_polygon(data = s_zurich, fill = "white") +
ylab("") + xlab("") +
theme(axis.text.x = element_blank(), axis.text.y = element_blank(), axis.ticks = element_blank())
cityplot_arcs <- cityplot +
geom_line(data = allarcs, aes(x = X1, y = X2, group = Id, colour = Direction), size = 1) +
xlab("") + ylab("")
cityplot_arcs_nodes <- cityplot_arcs + geom_point(aes(x = lon, y = lat, colour = Type), shape = 15, size = 4, inherit.aes = FALSE, data = allnodes) +
theme(legend.position = "none")
Any help would be appreciated.
Here is a possible workaround. If you can keep your geom_polygon fill out of the aes() call, as looks to be the case above, then you can use a filled shape for the points (21 is a circle) and map the node type to fill rather than colour in the aes() call. The arcs then use the colour scale and the nodes use the fill scale, so each gets its own legend. See below:
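library(ggplot2)
mock_data <-
  data.frame(x = sample(1:10, 20, TRUE),
             y = sample(1:10, 20, TRUE),
             direction = sample(c("1way", "2way"), 20, TRUE),
             type = sample(c("origin", "destination", "node", "lot"), 20, TRUE))
# The background polygon gets its own 4-row data frame, so its aesthetics
# do not have to match the 20 rows of mock_data
background <- data.frame(x = c(0, 12, 12, 0), y = c(0, 0, 12, 12))
ggplot(mock_data) +
  geom_polygon(data = background, aes(x = x, y = y), fill = "white") +
  geom_point(aes(x = x, y = y, fill = type), size = 10, shape = 21) +
  geom_line(aes(x = x, y = y, color = direction), size = 2) +
  scale_fill_brewer(palette = "Greens") + scale_color_brewer(palette = "Set1")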
Failing that, you can plot a mock legend only using ggplot() and use grid.arrange() to plot it next to your graph minus the default legend. Let me know in the comments if you need help with that.
