I work inside a research environment and can't copy-paste the code I used there, but I have previously generated this plot, and various people have helped me label it with the counts. The problem arises when I screenshot the plot from inside the research environment: the legends are illegible. I am hoping I can address this by making all the labels (including the X-axis label) bold.
I used some mock-data outside the environment and this is what I have so far.
library(ggplot2)
library(reshape2)
md.df = melt(df, id.vars = c('Group.1'))
tmp = c("virginica","setosa","versicolor")
md.df2 = md.df[order(match(md.df$Group.1, tmp)),]
md.df2$Group.1 = factor(as.character(md.df2$Group.1), levels = unique(md.df2$Group.1))
ggplot(md.df2, aes(x = Group.1, y = value, group = variable, fill = variable)) +
geom_bar(stat="identity",color='black', position = "dodge") +
xlab('Species') + ylab('Values') + theme_bw()+
ylim(0,8)+
theme(text = element_text(size=16),
axis.text.x = element_text(angle=0, hjust=.5),
plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5))+
ggtitle("Order variables in barplot")+
geom_text(aes(label=value), vjust=-0.3, size=4, # adding values
position = position_dodge(0.9))+ element_text(face="bold")
I need to make the labels bold, and element_text isn't working, probably because I am using it in the wrong way.
Another plot, which I asked about in the past and for which I haven't been able to find mock data to re-create outside the environment, also needs its axis ticks made bold, because it is illegible once it leaves the environment.
I've tried addressing the illegibility by saving all my plots with ggsave at a resolution of 300, but they are still very hard to read.
I'd appreciate any help with this, and thank you for taking the time.
As I mentioned in my comment to make the value labels bold use geom_text(..., fontface = "bold") and to make the axis labels bold use axis.text.x = element_text(angle=0, hjust=.5, face = "bold").
Using a minimal reproducible example based on the ggplot2::mpg dataset:
library(ggplot2)
library(dplyr)
# Create example data
md.df2 <- mpg |>
count(Group.1 = manufacturer, name = "value") |>
mutate(
variable = value >= max(value),
Group.1 = reorder(Group.1, -value)
)
ggplot(md.df2, aes(x = Group.1, y = value, group = variable, fill = variable)) +
geom_col(color = "black", position = "dodge") +
geom_text(aes(label = value), vjust = -0.3, size = 4, position = position_dodge(0.9), fontface = "bold") +
labs(x = "Species", y = "Values", title = "Order variables in barplot") +
theme_bw() +
theme(
text = element_text(size = 16),
axis.text.x = element_text(angle = 90, vjust = .5, face = "bold"),
plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5)
)
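The question also asks for the axis titles and the legend to be bold. A minimal sketch of how that could look, assuming a small made-up data frame (`toy` and its values are hypothetical and only stand in for the real data):

```r
library(ggplot2)

# Hypothetical toy data standing in for the data inside the research environment
toy <- data.frame(
  Species  = rep(c("virginica", "setosa", "versicolor"), each = 2),
  variable = rep(c("Sepal.Length", "Petal.Length"), times = 3),
  value    = c(6.6, 5.6, 5.0, 1.5, 5.9, 4.3)
)

p <- ggplot(toy, aes(x = Species, y = value, fill = variable)) +
  geom_col(color = "black", position = "dodge") +
  # fontface controls the value labels drawn by geom_text()
  geom_text(aes(label = value), vjust = -0.3, size = 4,
            position = position_dodge(0.9), fontface = "bold") +
  labs(x = "Species", y = "Values") +
  theme_bw(base_size = 16) +
  theme(
    axis.title   = element_text(face = "bold"),  # "Species" and "Values"
    axis.text    = element_text(face = "bold"),  # tick labels on both axes
    legend.title = element_text(face = "bold"),
    legend.text  = element_text(face = "bold")
  )
p
```

Note that element_text() only has an effect inside theme(); adding it to the plot with + on its own, as in the question, does not style anything.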
In addition to @stefan's answer, you can also set tick length and thickness like so:
## ... plot code +
theme(## other settings ...,
axis.ticks = element_line(linewidth = 5),
axis.ticks.length = unit(10, 'pt'))
However, the example "phenotype conditions" plot will probably remain hard to read, regardless of optimization on the technical level. On the conceptual level it might help to display aggregates, e.g. condition counts per Frequency, and to supplement them with a textual list of conditions (sorted alphabetically and by Frequency, or the other way round) for those readers who do want to look up a specific condition.
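Regarding the ggsave attempt in the question: text sizes in ggplot2 are given in points, so they are relative to the physical width and height of the saved image. Raising dpi alone adds pixels but does not make the text larger relative to the plot. A sketch that might help, assuming the built-in mpg data and a hypothetical output path:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = class)) +
  geom_bar() +
  theme_bw(base_size = 16) +  # larger base font for all text elements
  theme(axis.text  = element_text(face = "bold"),
        axis.title = element_text(face = "bold"))

# Hypothetical output location; replace with your own path
out_file <- file.path(tempdir(), "barplot.png")

# A smaller width/height at the same dpi makes the text relatively larger
ggsave(out_file, plot = p, width = 6, height = 4, units = "in", dpi = 300)
```

If the labels are still too small, reducing width and height or increasing base_size usually has more effect than increasing dpi.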
I wanted the barplot to appear in two forms, so I created repeated data and used it as an input.
So I used the data in the form below.
I put the data in the form above and wrote the following code to use it.
Select <- "Mbp"
if(Select == "Mbp"){
Select <- "Amount of sequence (Mbp)"
} else if (Select == "Gbp"){
Select <- "Amount of sequence (Gbp)"
}
ggplot(G4, aes(x = INDV, y = Bp, fill = Group)) + theme_light() +
geom_bar(stat = 'identity', position = 'dodge', width = 0.6) + coord_flip() +
scale_x_discrete(limits = rev(unname(unlist(RAW_TRIM[1])))) +
scale_fill_discrete(breaks = c("Raw data","Trimmed data"))+
scale_y_continuous(labels = scales::comma, position = "right") +
theme(axis.text = element_text(colour = "black", face = "bold", size = 15)) +
theme(legend.position = "bottom", legend.text = element_text(face = "bold", size = 15),
legend.title = element_blank()) + ggtitle(Select) + xlab("") + ylab("") +
theme(plot.title = element_text(size = 25, face = "bold", hjust = 0.5))
This gives me a plot like the one below, but I want the red bars to be on top of the green bars.
I also tried changing the order of the data, and I tried several solutions from Stack Overflow and other sites, but none of them solved the problem.
If you know a solution, please let me know how to modify the code or change the data.
Thank you.
You seem to be asking more than one question at once here, but the main one is: why do the bars for Raw appear under those for Trimmed? The short answer is: factor levels and the behaviour of coord_flip().
Let's make a toy dataset:
library(tidyverse)
G4 <- data.frame(INDV = c("C_01", "C_01", "C_41", "C_41"),
Group = c("Raw data", "Trimmed data", "Raw data", "Trimmed data"),
Bp = c(200, 100, 500, 400))
A simple dodged bar chart. Note that Raw comes before Trimmed, because R is before T in the alphabet:
G4 %>%
ggplot(aes(INDV, Bp)) +
geom_col(aes(fill = Group),
position = "dodge")
Now we coord_flip:
G4 %>%
ggplot(aes(INDV, Bp)) +
geom_col(aes(fill = Group),
position = "dodge") +
coord_flip()
This has the effect of reversing the variables, so Raw is now below Trimmed.
We can fix that by altering factor levels. As there are only two groups we can just reverse them using fct_rev() from the forcats package:
G4 %>%
ggplot(aes(INDV, Bp)) +
geom_col(aes(fill = fct_rev(Group)),
position = "dodge") +
coord_flip()
The bar for Raw is now on top but unfortunately, the colours are now reversed so that Raw bars are green. We can fix that using scale_fill_manual():
G4 %>%
ggplot(aes(INDV, Bp)) +
geom_col(aes(fill = fct_rev(Group)),
position = "dodge") +
coord_flip() +
scale_fill_manual(values = c("#00BFC4", "#F8766D"))
Now the Raw bars are on top, and they are red.
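An alternative sketch of the same idea, under the assumption that you would rather set the level order and colours explicitly than rely on fct_rev() and on the position of the colours in the vector (the name `group_cols` is made up for this example):

```r
library(ggplot2)

G4 <- data.frame(
  INDV  = c("C_01", "C_01", "C_41", "C_41"),
  Group = c("Raw data", "Trimmed data", "Raw data", "Trimmed data"),
  Bp    = c(200, 100, 500, 400)
)

# Put "Raw data" last so that it ends up on top after coord_flip()
G4$Group <- factor(G4$Group, levels = c("Trimmed data", "Raw data"))

# Named colours are matched to the levels by name, not by position
group_cols <- c("Raw data" = "#F8766D", "Trimmed data" = "#00BFC4")

p <- ggplot(G4, aes(INDV, Bp, fill = Group)) +
  geom_col(position = "dodge") +
  coord_flip() +
  scale_fill_manual(values = group_cols,
                    breaks = c("Raw data", "Trimmed data"))  # legend order
p
```

Because the colours are named, they stay attached to the right group even if the factor levels are reordered again later.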
I'm trying to draw alluvial plots using ggplot. It went well until I tried to clean the plot up.
As you can see in the plot, from left to right, the first stratum/column is the ID column, followed by a column of labels: disease risk. What I want to achieve in the output plot is that, instead of the patient IDs zigzagging, they are ordered by the disease risk column, so that all the high-risk IDs are together on top, followed by the low-risk ones and then the not-filled ones. That way it is much easier to see whether there are any relations.
I have looked at the arrange() and order() functions. They seem to do the trick for my actual input data, but once I pass that data frame to ggplot, the output figure is still scrambled.
I thought of converting the IDs to a factor and then using levels=..., but this is not very smart if the list of patient IDs keeps growing.
Is there a smarter way? Please enlighten me. I have attached a link to the sample data.
https://drive.google.com/file/d/16Pd8V3MCgEHmZEButVi2UjDiwZWklK-T/view?usp=sharing
My code to plot the graph :
library(tidyr)
library(ggplot2)
library(ggalluvial)
library(RColorBrewer)
# Define the number of colors you want
nb.cols <- 10
mycolor1 <- colorRampPalette(brewer.pal(8, "Set2"))(nb.cols)
mycolors <- c("Black")
#read the data
CLL3S.plusrec <- read.csv("xxxx.CSV", as.is = T)
CLL3S.plusrec$risk_by_DS <- factor(CLL3S.plusrec$risk_by_DS, levels = c("low_risk", "high_risk", "Not filled"))
CLL3S.plusrec$`Enriched response phenotype` <- factor(CLL3S.plusrec$`Enriched response phenotype`, levels = c("Live cells","Pre-dead", "TN & PDB", "PDB & Lenalidomide", "TN & STSVEN & Live cells","Mixed"))
#here I reorder the dataframe and it looks good
#but the output ggplot changes the order of ID in the output graph
OR <- with(CLL3S.plusrec, CLL3S.plusrec[order(risk_by_DS),])
d <-ggplot(OR, aes(y = count,
axis1= Patient.ID,
axis2= risk_by_DS,
axis3 = `Cluster assigned consensus`,
axis4 = `Cluster assigned single drug`,
axis5 = `Enriched response phenotype`
)) +
scale_x_discrete(limits = c("Patient ID","Disease Risk", "Consensus cluster", "Single-drug cluster", "Enriched drug response by Phenotype")) +
geom_alluvium(aes(fill=`Cluster assigned consensus`)) +
geom_stratum(width = 1/3, fill = c(mycolor1[1:69],mycolor1[1:3],mycolor1[1:8],mycolor1[1:8],mycolor1[1:6]), color = "red") +
#geom_stratum() +
geom_text(stat = "stratum", aes(label = after_stat(stratum)), size=3) +
theme(axis.title.x = element_text(size = 15, face="bold"))+
theme(axis.title.y = element_text(size = 15, face="bold"))+
theme(axis.text.x = element_text(size = 10, face="bold")) +
theme(axis.text.y = element_text(size = 10, face="bold")) +
labs(fill = "Consensus clusters")+
guides(fill=guide_legend(override.aes = list(color=mycolors)))+
ggtitle("Patient flow between the Consensus clusters and Single-drug treated clusters",
"3S stimulated patients")
print(d)
Not sure if this is what you want, but try formatting the risk column in this way:
library(tidyr)
library(ggplot2)
library(ggalluvial)
library(RColorBrewer)
# Define the number of colors you want
nb.cols <- 10
mycolor1 <- colorRampPalette(brewer.pal(8, "Set2"))(nb.cols)
mycolors <- c("Black")
#read the data
CLL3S.plusrec <- read.csv("test data.CSV", as.is = T)
CLL3S.plusrec$risk_by_DS <- factor(CLL3S.plusrec$risk_by_DS,
levels = c("high_risk","low_risk","Not filled"),ordered = T)
CLL3S.plusrec$Enriched.response.phenotype <- factor(CLL3S.plusrec$Enriched.response.phenotype, levels = c("Live cells","Pre-dead", "TN & PDB", "PDB & Lenalidomide", "TN & STSVEN & Live cells","Mixed"))
#here I reorder the dataframe and it looks good
#but the output ggplot changes the order of ID in the output graph
OR <- with(CLL3S.plusrec, CLL3S.plusrec[order(risk_by_DS),])
ggplot(OR, aes(y = count,
axis1= reorder(Patient.ID,risk_by_DS),
axis2= risk_by_DS,
axis3 = reorder(Cluster.assigned.consensus,risk_by_DS),
axis4 = reorder(Cluster.assigned.single.drug,risk_by_DS),
axis5 = reorder(Enriched.response.phenotype,risk_by_DS)
)) +
scale_x_discrete(limits = c("Patient ID","Disease Risk", "Consensus cluster", "Single-drug cluster", "Enriched drug response by Phenotype")) +
geom_alluvium(aes(fill=Cluster.assigned.consensus)) +
geom_stratum(width = 1/3, fill = c(mycolor1[1:69],mycolor1[1:3],mycolor1[1:8],mycolor1[1:8],mycolor1[1:6]), color = "red") +
#geom_stratum() +
geom_text(stat = "stratum", aes(label = after_stat(stratum)), size=3) +
theme(axis.title.x = element_text(size = 15, face="bold"))+
theme(axis.title.y = element_text(size = 15, face="bold"))+
theme(axis.text.x = element_text(size = 10, face="bold")) +
theme(axis.text.y = element_text(size = 10, face="bold")) +
labs(fill = "Consensus clusters")+
guides(fill=guide_legend(override.aes = list(color=mycolors)))+
ggtitle("Patient flow between the Consensus clusters and Single-drug treated clusters",
"3S stimulated patients")
Output:
Also, when I read your file with read.csv(), the spaces in the column names were replaced by dots (read.csv() does this by default via check.names = TRUE). That is why your original backtick-quoted variables have dots in my code.
Update:
#Update
OR <- with(CLL3S.plusrec, CLL3S.plusrec[order(risk_by_DS),])
OR <- OR[order(OR$risk_by_DS,OR$Patient.ID),]
OR$Patient.ID <- factor(OR$Patient.ID,levels = unique(OR$Patient.ID),ordered = T)
#Plot
ggplot(OR, aes(y = count,
axis1= reorder(Patient.ID,risk_by_DS),
axis2= risk_by_DS,
axis3 = reorder(Cluster.assigned.consensus,risk_by_DS),
axis4 = reorder(Cluster.assigned.single.drug,risk_by_DS),
axis5 = reorder(Enriched.response.phenotype,risk_by_DS)
)) +
scale_x_discrete(limits = c("Patient ID","Disease Risk", "Consensus cluster", "Single-drug cluster", "Enriched drug response by Phenotype")) +
geom_alluvium(aes(fill=Cluster.assigned.consensus)) +
geom_stratum(width = 1/3, fill = c(mycolor1[1:69],mycolor1[1:3],mycolor1[1:8],mycolor1[1:8],mycolor1[1:6]), color = "red") +
#geom_stratum() +
geom_text(stat = "stratum", aes(label = after_stat(stratum)), size=3) +
theme(axis.title.x = element_text(size = 15, face="bold"))+
theme(axis.title.y = element_text(size = 15, face="bold"))+
theme(axis.text.x = element_text(size = 10, face="bold")) +
theme(axis.text.y = element_text(size = 10, face="bold")) +
labs(fill = "Consensus clusters")+
guides(fill=guide_legend(override.aes = list(color=mycolors)))+
ggtitle("Patient flow between the Consensus clusters and Single-drug treated clusters",
"3S stimulated patients")
Output:
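The key step in the update is deriving the Patient.ID levels from the sorted data instead of typing them by hand, so it keeps working as the number of patients grows. A stripped-down sketch of just that step, using a made-up data frame (`toy` and its IDs are hypothetical):

```r
# Hypothetical toy data; the real data has many more patients and columns
toy <- data.frame(
  Patient.ID = c("P01", "P02", "P03", "P04", "P05", "P06"),
  risk_by_DS = c("low_risk", "high_risk", "Not filled",
                 "high_risk", "low_risk", "high_risk")
)

# Fix the order of the risk categories once
toy$risk_by_DS <- factor(toy$risk_by_DS,
                         levels = c("high_risk", "low_risk", "Not filled"))

# Sort the IDs by risk (then by ID) and use that order as the factor levels
id_order <- toy$Patient.ID[order(toy$risk_by_DS, toy$Patient.ID)]
toy$Patient.ID <- factor(toy$Patient.ID, levels = unique(id_order))

levels(toy$Patient.ID)
# "P02" "P04" "P06" "P01" "P05" "P03"
```

Sorting the rows of the data frame alone does not change how a plot orders a discrete variable; the factor levels do, which is why the reordered data frame in the question still produced a scrambled plot.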
I have a dataset that looks like the following:
df <- data.frame(Name=rep(c('Sarah', 'Casey', 'Mary', 'Tom'), 3),
Scale=rep(c('Scale1', 'Scale2', 'Scale3'), 4),
Score=sample(1:7, 12, replace=T))
I am trying to create a bar chart in ggplot2 that currently looks like this:
ggplot(df, aes(x=Name, y=Score, fill=Scale)) + geom_bar(stat='identity', position='dodge') +
coord_flip() +
scale_y_continuous(breaks=seq(0, 7, 1), limits = c(0, 7)) +
scale_x_discrete() +
scale_fill_manual(values=c('#253494', '#2c7fb8', '#000000')) +
theme(panel.background = element_blank(),
legend.position = 'right',
axis.line = element_line(),
axis.title = element_blank(),
axis.text = element_text(size=10))
However, I only want to show one observation (one Name) at a time. Is this possible to do without creating a ton of separate datasets, one for each person? I would like the end result to look like the example below, where I can just iterate through the names to produce a separate plot for each, or some similar process.
# Trying to avoid creating separate datasets, but for the sake of the example:
df2 <- data.frame(Name=rep(c('Sarah'), 3),
Scale=c('Scale1', 'Scale2', 'Scale3'),
Score=sample(1:7, 3, replace=T))
ggplot(df2, aes(x=Name, y=Score, fill=Scale)) + geom_bar(stat='identity', position='dodge') +
coord_flip() +
scale_y_continuous(breaks=seq(0, 7, 1), limits = c(0, 7)) +
scale_x_discrete() +
scale_fill_manual(values=c('#253494', '#2c7fb8', '#000000')) +
theme(panel.background = element_blank(),
legend.position = 'right',
axis.line = element_line(),
axis.title = element_blank(),
axis.text = element_text(size=10))
Since your data is already tidy, i.e. in long format, you can use facet_wrap as suggested and set the scales to "free", which creates one facet for each Name group.
df %>% ggplot(aes(y = Score, x = Name)) +
geom_bar(stat = "identity", aes(colour = Scale, fill = Scale),
position = "dodge") +
coord_flip() +
facet_wrap(~Name, scales = "free")
You can get rid of the facet labels or the axis labels depending which you prefer.
EDIT: in response to comment.
You can use the same data frame to create separate plots by just piping a filter in at the start, hence,
df %>%
filter(Name == "Sarah") %>%
ggplot(aes(y = Score, x = Name)) +
geom_bar(stat = "identity", aes(colour = Scale, fill = Scale),
position = "dodge") +
coord_flip()
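Since you are using Rmarkdown you could throw a for loop around that to plot all the names. Note that ggplot objects are not printed automatically inside a for loop, so the plot has to be wrapped in print():
for (i in c("Sarah", "Casey", "Mary", "Tom")) {
  p <- df %>%
    filter(Name == i) %>%
    ggplot(aes(y = Score, x = Name)) +
    geom_bar(stat = "identity", aes(colour = Scale, fill = Scale),
             position = "dodge") +
    coord_flip()
  print(p)  # without print() nothing is drawn inside the loop
}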
If you want to arrange all these into a group you can use ggpubr::ggarrange to place all the plots into the same object.
facet_grid(.~Name)
Maybe add this to your plot somehow. It will plot all the names, but each one in its own panel.
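If you want one separate plot object per name without creating separate datasets by hand, another option is to split the data frame and map a plotting function over the pieces. A rough sketch, assuming the df from the question (with a seed added so it is reproducible) and a made-up helper called `plot_one`:

```r
library(ggplot2)

set.seed(1)  # only so that the example scores are reproducible
df <- data.frame(Name  = rep(c("Sarah", "Casey", "Mary", "Tom"), 3),
                 Scale = rep(c("Scale1", "Scale2", "Scale3"), 4),
                 Score = sample(1:7, 12, replace = TRUE))

# Hypothetical helper: draws the bar chart for the rows of one person
plot_one <- function(d) {
  ggplot(d, aes(x = Name, y = Score, fill = Scale)) +
    geom_col(position = "dodge") +
    coord_flip() +
    scale_y_continuous(breaks = 0:7, limits = c(0, 7)) +
    scale_fill_manual(values = c("#253494", "#2c7fb8", "#000000")) +
    ggtitle(d$Name[1])
}

# split() gives one data frame per name; lapply() turns each into a plot
plots <- lapply(split(df, df$Name), plot_one)

# Print them one after another (print() is required outside the console)
for (p in plots) print(p)
```

The result is a named list, so a single plot can also be pulled out with plots[["Sarah"]].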
I've been stuck on an issue and can't find a solution. I've tried many suggestions on Stack Overflow and elsewhere about manually ordering a stacked bar chart, since that should be a pretty simple fix, but those suggestions don't work with the huge complicated mess of code I plucked from many places. My only issue is y-axis item ordering.
I'm making a series of stacked bar charts, and ggplot2 changes the ordering of the items on the y-axis depending on which dataframe I am trying to plot. I'm trying to make 39 of these plots and want them to all have the same ordering. I think ggplot2 only wants to plot them in ascending order of their numeric mean or something, but I'd like all of the bar charts to first display the group "Bird Advocates" and then "Cat Advocates." (This is also the order they appear in my data frame, but that ordering is lost at the coord_flip() point in plotting.)
I think that taking the data frame through so many changes is why I can't just add something simple at the end or use the reorder() function. Adding things into aes() also doesn't work, since the stacked bar chart I'm creating seems to depend on those items being exactly a certain way.
Here's one of my data frames where ggplot2 is ordering my y-axis items incorrectly, plotting "Cat Advocates" before "Bird Advocates":
Group,Strongly Opposed,Opposed,Slightly Opposed,Neutral,Slightly Support,Support,Strongly Support
Bird Advocates,0.005473026,0.010946052,0.012509773,0.058639562,0.071149335,0.31118061,0.530101642
Cat Advocates,0.04491726,0.07013396,0.03624901,0.23719464,0.09141056,0.23404255,0.28605201
And here's all the code that takes that and turns it into a plot:
library(ggplot2)
library(reshape2)
library(plotly)
#Importing data from a .csv file
data <- read.csv("data.csv", header=TRUE)
data$s.Strongly.Opposed <- 0-data$Strongly.Opposed-data$Opposed-data$Slightly.Opposed-.5*data$Neutral
data$s.Opposed <- 0-data$Opposed-data$Slightly.Opposed-.5*data$Neutral
data$s.Slightly.Opposed <- 0-data$Slightly.Opposed-.5*data$Neutral
data$s.Neutral <- 0-.5*data$Neutral
data$s.Slightly.Support <- 0+.5*data$Neutral
data$s.Support <- 0+data$Slightly.Support+.5*data$Neutral
data$s.Strongly.Support <- 0+data$Support+data$Slightly.Support+.5*data$Neutral
#to percents
data[,2:15]<-data[,2:15]*100
#melting
mdfr <- melt(data, id=c("Group"))
mdfr<-cbind(mdfr[1:14,],mdfr[15:28,3])
colnames(mdfr)<-c("Group","variable","value","start")
#remove dot in level names
mylevels<-c("Strongly Opposed","Opposed","Slightly Opposed","Neutral","Slightly Support","Support","Strongly Support")
mdfr$variable<-droplevels(mdfr$variable)
levels(mdfr$variable)<-mylevels
pal<-c("#bd7523", "#e9aa61", "#f6d1a7", "#999999", "#c8cbc0", "#65806d", "#334e3b")
ggplot(data=mdfr) +
geom_segment(aes(x = Group, y = start, xend = Group, yend = start+value, colour = variable,
text=paste("Group: ",Group,"<br>Percent: ",value,"%")), size = 5) +
geom_hline(yintercept = 0, color =c("#646464")) +
coord_flip() +
theme(legend.position="top") +
theme(legend.key.width=unit(0.5,"cm")) +
guides(col = guide_legend(ncol = 12)) + #has 7 real columns, using to adjust legend position
scale_color_manual("Response", labels = mylevels, values = pal, guide="legend") +
theme(legend.title = element_blank()) +
theme(axis.title.x = element_blank()) +
theme(axis.title.y = element_blank()) +
theme(axis.ticks = element_blank()) +
theme(axis.text.x = element_blank()) +
theme(legend.key = element_rect(fill = "white")) +
scale_y_continuous(breaks=seq(-100,100,100), limits=c(-100,100)) +
theme(panel.background = element_rect(fill = "#ffffff"),
panel.grid.major = element_line(colour = "#CBCBCB"))
The plot:
I think this works, you may need to play around with the axis limits/breaks:
library(dplyr)
mdfr <- mdfr %>%
mutate(group_n = as.integer(case_when(Group == "Bird Advocates" ~ 2,
Group == "Cat Advocates" ~ 1)))
ggplot(data=mdfr) +
geom_segment(aes(x = group_n, y = start, xend = group_n, yend = start + value, colour = variable,
text=paste("Group: ",Group,"<br>Percent: ",value,"%")), size = 5) +
scale_x_continuous(limits = c(0,3), breaks = c(1, 2), labels = c("Cat", "Bird")) +
geom_hline(yintercept = 0, color =c("#646464")) +
theme(legend.position="top") +
theme(legend.key.width=unit(0.5,"cm")) +
coord_flip() +
guides(col = guide_legend(ncol = 12)) + #has 7 real columns, using to adjust legend position
scale_color_manual("Response", labels = mylevels, values = pal, guide="legend") +
theme(legend.title = element_blank()) +
theme(axis.title.x = element_blank()) +
theme(axis.title.y = element_blank()) +
theme(axis.ticks = element_blank()) +
theme(axis.text.x = element_blank()) +
theme(legend.key = element_rect(fill = "white"))+
scale_y_continuous(breaks=seq(-100,100,100), limits=c(-100,100)) +
theme(panel.background = element_rect(fill = "#ffffff"),
panel.grid.major = element_line(colour = "#CBCBCB"))
produces this plot:
You want to factor the 'Group' variable in the order by which you want the bars to appear.
mdfr$Group <- factor(mdfr$Group, levels = c("Bird Advocates", "Cat Advocates"))
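One thing to watch out for: with coord_flip() the first factor level is drawn at the bottom of the axis, so to read "Bird Advocates" first from the top you may need to reverse the levels. A small sketch with made-up numbers (`mdfr_toy` is a hypothetical stand-in for mdfr):

```r
library(ggplot2)

# Hypothetical stand-in for the melted data frame
mdfr_toy <- data.frame(
  Group = c("Cat Advocates", "Bird Advocates"),
  value = c(28.6, 53.0)
)

# Desired reading order from top to bottom
wanted <- c("Bird Advocates", "Cat Advocates")

# Reverse it, because coord_flip() draws the first level at the bottom
mdfr_toy$Group <- factor(mdfr_toy$Group, levels = rev(wanted))

p <- ggplot(mdfr_toy, aes(x = Group, y = value)) +
  geom_col() +
  coord_flip()
p
```

Setting the levels once in a shared vector like `wanted` keeps the order identical across all 39 plots, independent of the values in each data frame.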