ggplot2 heatmaps: using different gradients for categories - r

This Learning R blog post shows how to make a heatmap of basketball stats using ggplot2. The finished heatmap looks like this:
My question (inspired by Jake who commented on the Learning R blog post) is: would it be possible to use different gradient colors for different categories of stats (offensive, defensive, other)?

First, recreate the graph from the post, updating it for the newer (0.9.2.1) version of ggplot2 which has a different theme system and attaches fewer packages:
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
nba$Name <- with(nba, reorder(Name, PTS))
library("ggplot2")
library("plyr")
library("reshape2")
library("scales")
nba.m <- melt(nba)
nba.s <- ddply(nba.m, .(variable), transform,
rescale = scale(value))
ggplot(nba.s, aes(variable, Name)) +
geom_tile(aes(fill = rescale), colour = "white") +
scale_fill_gradient(low = "white", high = "steelblue") +
scale_x_discrete("", expand = c(0, 0)) +
scale_y_discrete("", expand = c(0, 0)) +
theme_grey(base_size = 9) +
theme(legend.position = "none",
axis.ticks = element_blank(),
axis.text.x = element_text(angle = 330, hjust = 0))
Using different gradient colors for different categories is not entirely straightforward. The obvious approach, mapping the fill to interaction(rescale, Category) (where Category is Offensive/Defensive/Other; see below), doesn't work, because interacting a factor with a continuous variable gives a discrete variable, and a continuous fill gradient cannot be mapped to that.
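A quick way to see the problem is to check what interaction() returns for a numeric vector and a factor. This is a minimal sketch with made-up values, not the NBA data:

```r
# Toy values standing in for the rescaled statistic and its category.
rescaled <- c(-1.2, 0.4, 2.1)
category <- factor(c("Offensive", "Defensive", "Other"))

combined <- interaction(rescaled, category)

# The result is a factor (discrete), so it cannot drive a continuous gradient.
is.factor(combined)
is.numeric(combined)
```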
The workaround is to build this interaction by hand: shift rescale into non-overlapping ranges, one per value of Category, and then use scale_fill_gradientn to give each of these ranges its own color gradient.
First create the categories. I think these map to those in the comment, but I'm not sure; changing which variable is in which category is easy.
nba.s$Category <- nba.s$variable
levels(nba.s$Category) <-
list("Offensive" = c("PTS", "FGM", "FGA", "X3PM", "X3PA", "AST"),
"Defensive" = c("DRB", "ORB", "STL"),
"Other" = c("G", "MIN", "FGP", "FTM", "FTA", "FTP", "X3PP",
"TRB", "BLK", "TO", "PF"))
Since rescale is within a few (3 or 4) of 0, the different categories can be offset by a hundred to keep them separate. At the same time, determine where the endpoints of each color gradient should be, in terms of both rescaled values and colors.
nba.s$rescaleoffset <- nba.s$rescale + 100*(as.numeric(nba.s$Category)-1)
scalerange <- range(nba.s$rescale)
gradientends <- scalerange + rep(c(0,100,200), each=2)
colorends <- c("white", "red", "white", "green", "white", "blue")
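To make the endpoint arithmetic concrete, here is a small sketch with a hypothetical value range of -3 to 4 (the real range comes from the data). It shows how the six endpoints are laid out and how scales::rescale() turns them into the 0 to 1 positions that the values argument of scale_fill_gradientn expects:

```r
library(scales)

# Hypothetical range of the rescaled values (an assumption for illustration).
toy_range <- c(-3, 4)

# One (low, high) pair per category, offset by 0, 100 and 200.
toy_ends <- toy_range + rep(c(0, 100, 200), each = 2)
toy_ends
# -3 4 97 104 197 204

# Positions in [0, 1]; the first endpoint maps to 0 and the last to 1.
toy_positions <- rescale(toy_ends)
toy_positions
```

Each white/colour pair sits on one category's band. The stretches between bands (for example 4 to 97) are also assigned colours, but no data falls there, so they never show up in the plot.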
Now replace the fill variable with rescaleoffset and change the fill scale to use scale_fill_gradientn (remembering to rescale the values):
ggplot(nba.s, aes(variable, Name)) +
geom_tile(aes(fill = rescaleoffset), colour = "white") +
scale_fill_gradientn(colours = colorends, values = rescale(gradientends)) +
scale_x_discrete("", expand = c(0, 0)) +
scale_y_discrete("", expand = c(0, 0)) +
theme_grey(base_size = 9) +
theme(legend.position = "none",
axis.ticks = element_blank(),
axis.text.x = element_text(angle = 330, hjust = 0))
Reordering to get related stats together is another application of the reorder function on the various variables:
nba.s$variable2 <- reorder(nba.s$variable, as.numeric(nba.s$Category))
ggplot(nba.s, aes(variable2, Name)) +
geom_tile(aes(fill = rescaleoffset), colour = "white") +
scale_fill_gradientn(colours = colorends, values = rescale(gradientends)) +
scale_x_discrete("", expand = c(0, 0)) +
scale_y_discrete("", expand = c(0, 0)) +
theme_grey(base_size = 9) +
theme(legend.position = "none",
axis.ticks = element_blank(),
axis.text.x = element_text(angle = 330, hjust = 0))

Here's a simpler suggestion that uses ggplot2 aesthetics to map both gradients as well as color categories. Simply use an alpha-aesthetic to generate the gradient, and the fill-aesthetic for the category.
Here is the code to do so, refactoring Brian Diggs' response:
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
nba$Name <- with(nba, reorder(Name, PTS))
library("ggplot2")
library("plyr")
library("reshape2")
library("scales")
nba.m <- melt(nba)
nba.s <- ddply(nba.m, .(variable), transform,
rescale = scale(value))
nba.s$Category <- nba.s$variable
levels(nba.s$Category) <- list("Offensive" = c("PTS", "FGM", "FGA", "X3PM", "X3PA", "AST"),
"Defensive" = c("DRB", "ORB", "STL"),
"Other" = c("G", "MIN", "FGP", "FTM", "FTA", "FTP", "X3PP", "TRB", "BLK", "TO", "PF"))
Then, normalise the rescale variable to between 0 and 1:
nba.s$rescale = (nba.s$rescale-min(nba.s$rescale))/(max(nba.s$rescale)-min(nba.s$rescale))
And now, do the plotting:
ggplot(nba.s, aes(variable, Name)) +
geom_tile(aes(alpha = rescale, fill=Category), colour = "white") +
scale_alpha(range=c(0,1)) +
scale_x_discrete("", expand = c(0, 0)) +
scale_y_discrete("", expand = c(0, 0)) +
theme_grey(base_size = 9) +
theme(legend.position = "none",
axis.ticks = element_blank(),
axis.text.x = element_text(angle = 330, hjust = 0)) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
Note the use of alpha=rescale and then the scaling of the alpha range using scale_alpha(range=c(0,1)), which can be adapted to change the range appropriately for your plot.
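As an illustration of adapting the range, the sketch below uses a small made-up data frame (the column names stat, player and value01 are placeholders) and raises the lower alpha bound to 0.1, so the smallest values stay faintly tinted instead of becoming fully transparent:

```r
library(ggplot2)

# Toy data: two categories, values already normalised to [0, 1].
toy <- data.frame(
  stat = c("A", "B", "A", "B"),
  player = c("p1", "p1", "p2", "p2"),
  Category = c("Offensive", "Defensive", "Offensive", "Defensive"),
  value01 = c(0, 0.5, 1, 0.25)
)

# fill picks the hue per category; alpha carries the magnitude.
p <- ggplot(toy, aes(stat, player)) +
  geom_tile(aes(fill = Category, alpha = value01), colour = "white") +
  scale_alpha(range = c(0.1, 1))
```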

Related

ggplot: Add annotations using separate data above faceted chart

I'm trying to add a set of markers with text above the top of a faceted chart to indicate certain points of interest in the value of x. It's important that they appear in the right position left to right (as per the main scale), including when the overall ggplot changes size.
Something like this...
However, I'm struggling to:
place it in the right vertical position (above the facets). In my reprex below (a simplified version of the original), I tried using a value of the factor (Merc450 SLC), but this causes issues such as adding that to every facet including when it is not part of that facet and doesn't actually go high enough. I also tried converting the factor to a number using as.integer, but this causes every facet to include all factor values, when they obviously shouldn't
apply to the chart as a whole, not each facet
Note that in the full solution, the marker x values are independent of the main data.
I have tried using cowplot to draw it separately and overlay it, but that seems to:
affect the overall scale of the main plot, with the facet titles on the right being cropped
is not reliable in placing the markers at the exact location along the x scale
Any pointers welcome.
library(tidyverse)
mtcars2 <- rownames_to_column(mtcars, var = "car") %>%
mutate(make = stringr::word(car, 1)) %>%
filter(make >= "m" & make < "n")
markers <- data.frame(x = c(max(mtcars2$mpg), rep(runif(nrow(mtcars2), 1, max(mtcars2$mpg))), max(mtcars2$mpg))) %>%
mutate(name = paste0("marker # ", round(x)))
ggplot(mtcars2, aes()) +
# Main Plot
geom_tile(aes(x = mpg, y = car, fill = cyl), color = "white") +
# Add Markers
geom_point(data = markers, aes(x = x, y = "Merc450 SLC"), color = "red") +
# Marker Labels
geom_text(data = markers, aes(x = x, "Merc450 SLC",label = name), angle = 45, size = 2.5, hjust=0, nudge_x = -0.02, nudge_y = 0.15) +
facet_grid(make ~ ., scales = "free", space = "free") +
theme_minimal() +
theme(
# Facets
strip.background = element_rect(fill="Gray90", color = "white"),
panel.background = element_rect(fill="Gray95", color = "white"),
panel.spacing.y = unit(.7, "lines"),
plot.margin = margin(50, 20, 20, 20)
)
Perhaps draw two separate plots and assemble them together with patchwork:
library(patchwork)
p1 <- ggplot(markers, aes(x = x, y = 0)) +
geom_point(color = 'red') +
geom_text(aes(label = name),
angle = 45, size = 2.5, hjust=0, nudge_x = -0.02, nudge_y = 0.02) +
scale_y_continuous(limits = c(-0.01, 0.15), expand = c(0, 0)) +
theme_minimal() +
theme(axis.text = element_blank(),
axis.title = element_blank(),
panel.grid = element_blank())
p2 <- ggplot(mtcars2, aes(x = mpg, y = car, fill = cyl)) +
geom_tile(color = "white") +
facet_grid(make ~ ., scales = "free", space = "free") +
theme_minimal() +
theme(
strip.background = element_rect(fill="Gray90", color = "white"),
panel.background = element_rect(fill="Gray95", color = "white"),
panel.spacing.y = unit(.7, "lines")
)
p1/p2 + plot_layout(heights = c(1, 9))
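One caveat worth checking: as far as I know, patchwork lines up the panel edges of the stacked plots, but each plot still trains its x scale on its own data, so the markers only sit at the right x position if both plots cover the same x range. A hedged sketch with made-up data (main_df, marker_df and x_common are illustrative names) that gives both plots the same limits via coord_cartesian:

```r
library(ggplot2)
library(patchwork)

# Toy stand-ins for the main data and the independent marker positions.
main_df <- data.frame(x = c(10, 20, 30), y = c("a", "b", "c"))
marker_df <- data.frame(x = c(5, 25))

# Shared x limits computed over both data sets.
x_common <- range(c(main_df$x, marker_df$x))

top <- ggplot(marker_df, aes(x, 0)) +
  geom_point(colour = "red") +
  coord_cartesian(xlim = x_common) +
  theme_void()

bottom <- ggplot(main_df, aes(x, y)) +
  geom_point() +
  coord_cartesian(xlim = x_common)

# Stack the marker strip above the main plot.
combined <- top / bottom + plot_layout(heights = c(1, 9))
```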
This needs a workaround: draw the markers and the main chart as separate plots, then use cowplot's alignment functions to align them on the same x axis. Here is a solution:
library(tidyverse)
library(cowplot)
# define a common x axis range to ensure that both plots are on the same scale
# This may not be needed, as cowplot's align_plots also adjusts the scale; however,
# I tend to do this extra step to be sure.
x_axis_common <- c(min(mtcars2$mpg, markers$x) * .8,
max(mtcars2$mpg, markers$x) * 1.1)
# Plot containing only the markers
plot_marker <- ggplot() +
geom_point(data = markers, aes(x = x, y = 0), color = "red") +
# Marker Labels
geom_text(data = markers, aes(x = x, y = 0,label = name),
angle = 45, size = 2.5, hjust=0, nudge_x = 0, nudge_y = 0.001) +
# using coord_cartesian to set the zone of plot for some scales
coord_cartesian(xlim = x_axis_common,
ylim = c(-0.005, 0.03), expand = FALSE) +
# using theme_nothing from cowplot, which removes all elements
# except the drawing
theme_nothing()
# main plot with facet
main_plot <- ggplot(mtcars2, aes()) +
# Main Plot
geom_tile(aes(x = mpg, y = car, fill = cyl), color = "white") +
coord_cartesian(xlim = x_axis_common, expand = FALSE) +
# Add Markers
facet_grid(make ~ ., scales = "free_y", space = "free") +
theme_minimal() +
theme(
# Facets
strip.background = element_rect(fill="Gray90", color = "white"),
panel.background = element_rect(fill="Gray95", color = "white"),
panel.spacing.y = unit(.7, "lines"),
plot.margin = margin(0, 20, 20, 20)
)
Then align the plots and draw them using cowplot:
# align the plots together
temp <- align_plots(plot_marker, main_plot, axis = "rl",
align = "hv")
# plot them with plot_grid also from cowplot - using rel_heights for some
# adjustment
plot_grid(temp[[1]], temp[[2]], ncol = 1, rel_heights = c(1, 8))
Created on 2021-05-03 by the reprex package (v2.0.0)

r - column wise heatmap using ggplot2

I would really appreciate if anyone could guide me with the following challenge.
I am trying to build column wise heatmap. For each column, I want the lowest value to be green and highest value to be red. The current solution takes a matrix wide approach.
I saw the solution on Heat map per column with ggplot2. As you can see, I implemented the same code but I am not getting the desired result [picture below]
df <- data.frame(
F1 = c(0.66610194649319, 0.666123551800434,
0.666100611954119, 0.665991102703081,
0.665979885730484),
acc_of_pred = c(0.499541627510021, 0.49960260221954,
0.499646067768102, 0.499447308828986,
0.499379552967265),
expected_mean_return = c(2.59756065316356e-07, 2.59799087404167e-07,
2.86466725381146e-07, 2.37977452007967e-07,
2.94242908573705e-07),
win_loss_ratio = c(0.998168189343307, 0.998411671274781,
0.998585272507726, 0.997791676357902,
0.997521287688458),
corr_pearson = c(0.00161443345430616, -0.00248811119331013,
-0.00203407575954095, -0.00496817102369628,
-0.000140531627184482),
corr_spearman = c(0.00214838517340878, -0.000308343671725617,
0.00228492127281917, -0.000359577740835049,
0.000608090759428587),
roc_vec = c(0.517972308828151, 0.51743161463546,
0.518033230192484, 0.518033294993802,
0.517931553535524)
)
combo <- data.frame(combo = c("baseline_120", "baseline_20",
"baseline_60", "baseline_288",
"baseline_5760"))
df.scaled <- scale(df)
df.scaled <- cbind(df.scaled,combo)
df.melt <- melt(df.scaled, id.vars = "combo")
ggplot(df.melt, aes(combo, variable)) +
geom_tile(aes(fill = value), colour = "white") +
scale_fill_gradient(low = "green", high = "red") +
geom_text(aes(label=value)) +
theme_grey(base_size = 9) +
labs(x = "", y = "") + scale_x_discrete(expand = c(0, 0)) +
scale_y_discrete(expand = c(0, 0)) +
theme(legend.position = "none", axis.ticks = element_blank(),
axis.text.x = element_text(size = 9 * 0.8,
angle = 0, hjust = 0, colour = "grey50"))
You are nearly there. The plotting code you used is the same, but the person who asked that question did one extra step in data preparation: they added a rescaled variable.
If you rescale your values before plotting and use the rescaled variable as the fill aesthetic, it works (I just computed rescale and used it as the fill in the ggplot call):
library(ggplot2)
library(reshape2) # melt
library(plyr)     # ddply
library(scales)   # rescale
df.melt <- melt(df.scaled, id.vars = "combo")
df.melt <- ddply(df.melt, .(combo), transform, rescale = rescale(value))
ggplot(df.melt, aes(combo, variable)) +
geom_tile(aes(fill = rescale), colour = "white") +
scale_fill_gradient( low= "green", high = "red") +
geom_text(aes(label=round(value,4))) +
theme_grey(base_size = 9) +
labs(x = "", y = "") + scale_x_discrete(expand = c(0, 0)) +
scale_y_discrete(expand = c(0, 0)) +
theme(legend.position = "none", axis.ticks = element_blank(),
axis.text.x = element_text(size = 9 * 0.8,
angle = 0, hjust = 0, colour = "grey50"))
giving the plot:
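The grouping variable passed to ddply decides within which groups the colours run from lowest to highest, so it is worth checking that it matches the direction you want to compare. The same per-group rescaling can be sketched in base R with ave(); the data and the helper to_unit below are made up for illustration:

```r
# Toy long-format data: two groups on very different scales.
toy <- data.frame(
  group = rep(c("g1", "g2"), each = 3),
  value = c(1, 2, 3, 100, 150, 200)
)

# Hypothetical helper: min-max rescale a vector to [0, 1].
to_unit <- function(v) (v - min(v)) / (max(v) - min(v))

# Apply the helper separately within each group.
toy$rescaled <- ave(toy$value, toy$group, FUN = to_unit)
toy
```

After this step every group has its own 0 (mapped to green) and its own 1 (mapped to red), regardless of the original magnitudes.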

Removing axis labelling for one geom when multiple geoms are present

All I want is this R code to display the names of players inside the "topName" while hiding the names inside the "otherNames" by plotting both of them using two different geom_col().
epldata <- read.csv(file = 'epldata.csv')
epldata$srno <- c(1:461)
attach(epldata)
points <- epldata[order(-fpl_points),]
detach(epldata)
topNames[24:461]<-NA
epldata$topNames <- topNames
topPoints[24:461]<-NA
epldata$topPoints <- topPoints
epldata$otherNames <- NA
epldata$otherNames[24:461] <- as.character(points$name[c(24:461)])
epldata$otherPoints <- NA
epldata$otherPoints[24:461] <- as.numeric(points$fpl_points[c(24:461)])
ggplot(data = epldata)+
geom_col(aes(x=epldata$topNames, y=epldata$topPoints), fill = "red", alpha = 1) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
annotate("text", x=epldata$topNames, y=-50, #epldata$topPoints,
label = epldata$topNames, fontface = 1, size = 2, hjust = 0)+
geom_col(aes(x=epldata$otherNames, y=epldata$otherPoints), fill = "gray", alpha = 0.3)+
theme(legend.position = "none")+
#theme(axis.text.x = element_text(angle = 90, hjust = 1))+
xlab("Player Names")+
ylab("FPL Points")+
guides(fill=FALSE, color=FALSE, guide = FALSE) +
coord_flip() +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank())
This is the kind of output I am looking for, but without the annotate hack I am currently using; I would like the names plotted directly on the axis.
Update : have added the entire code and the link to the data set is below :
https://drive.google.com/open?id=1KTitWDcLIBmeBsz8mLcHXDIyhQLZnlhS
Once you've created a list of topNames, you can use scale_x_discrete to display only these axis labels:
scale_x_discrete(breaks = topNames)
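To show the effect of breaks in isolation, here is a minimal sketch on made-up data (toy, pts and top_names are illustrative names, not part of the EPL data set). All bars are drawn, but only the names listed in breaks receive an axis label:

```r
library(ggplot2)

# Toy data; the names and points are made up for illustration.
toy <- data.frame(
  name = c("a", "b", "c", "d"),
  pts = c(10, 40, 25, 5),
  stringsAsFactors = FALSE
)

# Names of the two highest-scoring rows.
top_names <- toy$name[order(-toy$pts)][1:2]

# Four bars, two axis labels.
p <- ggplot(toy, aes(x = name, y = pts)) +
  geom_col() +
  scale_x_discrete(breaks = top_names) +
  coord_flip()
```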
Also, rather than using two separate geom_col() geometries, you can create a new "highlight" column in the dataframe and use that with the fill and alpha aesthetics:
library(dplyr)
library(ggplot2)
# read data from google drive
id <- "1KTitWDcLIBmeBsz8mLcHXDIyhQLZnlhS" #google file ID
epldata <- read.csv(sprintf("https://docs.google.com/uc?id=%s&export=download", id),
stringsAsFactors = FALSE)
N <- 24 #number of players to highlight
#get list of names of top N players
topNames <- epldata %>%
arrange(-fpl_points) %>%
head(N) %>%
pull(name)
# make variable for highlighting
epldata <- epldata %>%
mutate(highlight = name %in% topNames)
ggplot(data = epldata,
aes(x = name, y = fpl_points, fill = highlight, alpha = highlight)) +
geom_col() +
scale_fill_manual(guide = FALSE,
values = c("gray", "red")) +
scale_alpha_manual(guide = FALSE,
values = c(0.4, 1)) +
scale_x_discrete(breaks = topNames) + #use breaks to determine axis labels
coord_flip() +
ylab("FPL Points") +
theme_classic() +
theme(axis.ticks.y = element_blank(),
axis.title.y = element_blank())
Created on 2018-09-19 by the reprex package (v0.2.1)

creating custom labels for multilayered plot using ggplot 2 in R

I have this plot
library(dplyr)
library(ggplot2)
indexYear <- as.numeric(2000:2010)
lDLatIndex <- rep.int(4,11)
LDLoneYear <- c(rep.int(3,5), rep.int(2,6))
hba1catIndex <- c(rep.int(8,6), rep.int(7.5,5))
hba1coneYear <- rep.int(7,11)
LDLeffect <- data.frame(indexYear, lDLatIndex, hba1catIndex, hba1coneYear)
LDLeffect %>%
ggplot(., aes(x = indexYear))+
geom_line(aes(y = lDLatIndex, colour=rgb(237/255, 115/255,116/255)), linetype = "solid" , size = 2)+
theme_classic()+
theme(legend.position = "bottom")+
ylab("mean LDL cholesterol (mmol/l) ")+
xlab("Calendar year")+
theme(axis.title = element_text(size = 17, face="bold"), axis.text = element_text(size = 17, face = "bold"))+
scale_x_continuous(breaks = seq(2000,2015, by=1),labels = c(2000,rep("",4),2005,rep("",4), 2010, rep("",4),2015))+
scale_y_continuous(sec.axis = sec_axis(~(.-2.15)*10.929, name = "mean HbA1c (%) "))+
geom_line(aes(y = LDLoneYear, colour=rgb(237/255, 115/255,116/255)), linetype = "dashed" , size = 2)+
geom_line(aes(y = hba1coneYear, colour=rgb(152/255, 201/255,139/255)), linetype = "twodash" , size = 2)+
geom_line(aes(y = hba1catIndex, colour=rgb(152/255, 201/255,139/255)), linetype = "F1" , size = 2)
I know that the best option is usually to supply data to ggplot in long format, but I couldn't get that to work. The plot above produces a strange legend, and I cannot understand where it comes from.
I can see that the names in the legend come from the colour definitions.
What I want are legend entries that show the colour, linetype and name of each plotted variable, preferably with the option of adding custom names. I looked through a lot of pages with suggestions, but most of them use long format, which I cannot figure out because I want different linetypes and colours in pairs. The rest didn't help me with this strange expression in the labelling.
Would the proposal below go in the right direction? The main points: melt from reshape2 brings the data into a ggplot-friendly (long) shape, and scale_linetype_manual and scale_colour_manual provide the colours and line types explicitly.
library(dplyr)
library(ggplot2)
library(reshape2) ## for "melt"
indexYear <- as.numeric(2000:2010)
lDLatIndex <- rep.int(4,11)
LDLoneYear <- c(rep.int(3,5), rep.int(2,6))
hba1catIndex <- c(rep.int(8,6), rep.int(7.5,5))
hba1coneYear <- rep.int(7,11)
LDLeffect <- data.frame(indexYear, lDLatIndex, hba1catIndex, hba1coneYear, LDLoneYear)
melted_df <- melt(LDLeffect, id.vars="indexYear", measure.vars=c("lDLatIndex", "hba1catIndex", "hba1coneYear", "LDLoneYear"))
ggplot(melted_df, aes(x=indexYear, value, colour=variable)) +
geom_line(aes(linetype=variable), size = 2) +
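# Named values tie each style to its variable, following the styling used in the question
scale_linetype_manual(values=c(lDLatIndex = "solid", LDLoneYear = "dashed", hba1coneYear = "twodash", hba1catIndex = "F1")) +
scale_colour_manual(values=c(lDLatIndex = rgb(237/255, 115/255,116/255), LDLoneYear = rgb(237/255, 115/255,116/255), hba1coneYear = rgb(152/255, 201/255,139/255), hba1catIndex = rgb(152/255, 201/255,139/255))) +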
theme_classic() +
theme(legend.position = "bottom")+
ylab("mean LDL cholesterol (mmol/l)")+
xlab("Calendar year")+
theme(axis.title = element_text(size = 17, face="bold"), axis.text = element_text(size = 17, face = "bold"))+
scale_x_continuous(breaks = seq(2000,2015, by=1),labels = c(2000,rep("",4),2005,rep("",4), 2010, rep("",4),2015))+
scale_y_continuous(sec.axis = sec_axis(~(.-2.15)*10.929, name = "mean HbA1c (%)"))
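The question also asked for custom legend names. One way to do that, sketched here on a made-up two-series data set, is to pass the same name, breaks and labels to both manual scales; when they agree, ggplot2 draws a single legend showing colour and linetype together. The variable names and label texts below are placeholders:

```r
library(ggplot2)

# Toy long-format data with two series.
long_df <- data.frame(
  year = rep(2000:2002, times = 2),
  variable = rep(c("ldl_index", "ldl_one_year"), each = 3),
  value = c(4, 4, 4, 3, 3, 2)
)

series <- c("ldl_index", "ldl_one_year")
legend_labels <- c("LDL at index", "LDL after one year")

p <- ggplot(long_df, aes(year, value, colour = variable, linetype = variable)) +
  geom_line() +
  # Same name, breaks and labels on both scales so the legends merge.
  scale_colour_manual(name = "Measure", breaks = series, labels = legend_labels,
                      values = c(ldl_index = "firebrick", ldl_one_year = "darkgreen")) +
  scale_linetype_manual(name = "Measure", breaks = series, labels = legend_labels,
                        values = c(ldl_index = "solid", ldl_one_year = "dashed"))
```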

How to add axis text in this negative and positive bars differently using ggplot2?

I've drawn a bar graph with negative and positive bars, a style that is common in this kind of research. However, my code is extremely inconvenient and verbose, using graphics::barplot() and graphics::text() as shown below. Try as I may, I could not find a solution using element_text to achieve this in ggplot2. Please help or give some ideas on how to do this in ggplot2. Thanks in advance.
# my data
df <- data.frame(genus=c("Prevotella","Streptococcus","YRC22","Phascolarctobacterium","SMB53","Epulopiscium",
"CF231","Anaerovibrio","Paludibacter","Parabacteroides","Desulfovibrio","Sutterella",
"Roseburia","Others__0_5_","Akkermansia","Bifidobacterium","Campylobacter","Fibrobacter",
"Coprobacillus","Bulleidia","f_02d06","Dorea","Blautia","Enterococcus","Eubacterium",
"p_75_a5","Clostridium","Coprococcus","Oscillospira","Escherichia","Lactobacillus"),
class=c(rep("groupA",18),rep("groupB",13)),
value=c(4.497311,4.082377,3.578472,3.567310,3.410453,3.390026,
3.363542,3.354532,3.335634,3.284165,3.280838,3.218053,
3.071454,3.026663,3.021749,3.004152,2.917656,2.811455,
-2.997631,-3.074314,-3.117659,-3.151276,-3.170631,-3.194323,
-3.225207,-3.274281,-3.299712,-3.299875,-3.689051,-3.692055,
-4.733154)
)
# bar graph
tiff(file="lefse.tiff",width=2000,height=2000,res=400)
par(mar=c(5,2,1,1))
barplot(df[,3],horiz=T,xlim=c(-6,6),xlab="LDA score (log 10)",
col=c(rep("forestgreen",length(which(df[,2]=="groupA"))),
rep("goldenrod",length(which(df[,2]=="groupB")))))
axis(1,at=seq(-6,6,by=1))
# add text
text(0.85,36.7,label=df[,1][31],cex=0.6);text(0.75,35.4,label=df[,1][30],cex=0.6)
text(0.75,34.1,label=df[,1][29],cex=0.6);text(0.85,33.0,label=df[,1][28],cex=0.6)
text(0.75,31.8,label=df[,1][27],cex=0.6);text(0.6,30.6,label=df[,1][26],cex=0.6)
text(0.8,29.5,label=df[,1][25],cex=0.6);text(0.85,28.3,label=df[,1][24],cex=0.6)
text(0.45,27.1,label=df[,1][23],cex=0.6);text(0.4,25.9,label=df[,1][22],cex=0.6)
text(0.55,24.7,label=df[,1][21],cex=0.6);text(0.55,23.5,label=df[,1][20],cex=0.6)
text(0.85,22.3,label=df[,1][19],cex=0.6);text(-0.75,21.1,label=df[,1][18],cex=0.6)
text(-1,19.9,label=df[,1][17],cex=0.6);text(-1,18.8,label=df[,1][16],cex=0.6)
text(-0.85,17.6,label=df[,1][15],cex=0.6);text(-0.85,16.3,label=df[,1][14],cex=0.6)
text(-0.7,15.1,label=df[,1][13],cex=0.6);text(-0.65,13.9,label=df[,1][12],cex=0.6)
text(-0.85,12.7,label=df[,1][11],cex=0.6);text(-1.05,11.5,label=df[,1][10],cex=0.6)
text(-0.85,10.3,label=df[,1][9],cex=0.6);text(-0.85,9.1,label=df[,1][8],cex=0.6)
text(-0.47,7.9,label=df[,1][7],cex=0.6);text(-0.85,6.7,label=df[,1][6],cex=0.6)
text(-0.49,5.5,label=df[,1][5],cex=0.6);text(-1.44,4.3,label=df[,1][4],cex=0.6)
text(-0.49,3.1,label=df[,1][3],cex=0.6);text(-0.93,1.9,label=df[,1][2],cex=0.6)
text(-0.69,0.7,label=df[,1][1],cex=0.6)
# add lines
segments(0,-1,0,40,lty=3,col="grey")
segments(2,-1,2,40,lty=3,col="grey")
segments(4,-1,4,40,lty=3,col="grey")
segments(6,-1,6,40,lty=3,col="grey")
segments(4,-1,4,40,lty=3,col="grey")
segments(-2,-1,-2,40,lty=3,col="grey")
segments(-4,-1,-4,40,lty=3,col="grey")
segments(-6,-1,-6,40,lty=3,col="grey")
legend("topleft",bty="n",cex=0.65,inset=c(0.01,-0.02),ncol=2,
legend=c("groupA","groupB"),
col=c("forestgreen", "goldenrod"),pch=c(15,15))
dev.off()
Here's a solution using dplyr to create some extra columns for the label position and the justification, and then theming the plot to match reasonably closely what you originally had:
library("dplyr")
library("ggplot2")
df <- df %>%
mutate(
genus = factor(genus, levels = genus[order(value, decreasing = TRUE)]),
label_y = ifelse(value < 0, 0.2, -0.2),
label_hjust = ifelse(value < 0, 0, 1)
)
my_plot <- ggplot(df, aes(x = genus, y = value, fill = class)) +
geom_bar(stat = "identity", col = "black") +
geom_text(aes(y = label_y, label = genus, hjust = label_hjust)) +
coord_flip() +
scale_fill_manual(values = c(groupA = "forestgreen", groupB = "goldenrod")) +
theme_minimal() +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank(),
legend.position = "top",
legend.justification = 0.05,
legend.title = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
panel.grid.major.x = element_line(colour = "grey80", linetype = "dashed"),
panel.grid.minor.x = element_blank()) +
scale_y_continuous(expression(log[10](italic("LDA score"))),
breaks = -6:6, limits = c(-6, 6))
print(my_plot)
ggsave("lefse.tiff", width = 5, height = 5, dpi = 400, my_plot)
I would try this:
library(ggplot2)
# change the factor levels so it will be displayed in correct order
df$genus <- factor(df$genus, levels = as.character(df$genus))
ggplot(df, aes(x = genus, y = value)) +
geom_bar(aes(fill = class), stat = 'identity') + # color by class
coord_flip() + # horizontal bars
geom_text(aes(y = 0, label = genus, hjust = as.numeric(value > 0))) + # label text based on value
theme(axis.text.y = element_blank())
In the above, hjust changes the direction of the text relative to its y position (flipped to x here), which is similar to the pos parameter of text in base R. So your code could also be simplified by passing a vector as the pos argument to the text function.
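To spell out what the hjust expression does, here is a small sketch on made-up values (toy and label_hjust are illustrative names). Positive values get hjust = 1, so the label is right-aligned at zero and extends to the left, away from the bar; negative values get hjust = 0 and the label extends to the right:

```r
library(ggplot2)

toy <- data.frame(
  genus = c("g1", "g2", "g3", "g4"),
  value = c(3.2, -2.9, 4.5, -4.7)
)

# 1 for positive bars (label left of zero), 0 for negative bars (label right of zero).
toy$label_hjust <- as.numeric(toy$value > 0)

p <- ggplot(toy, aes(x = genus, y = value)) +
  geom_col() +
  geom_text(aes(y = 0, label = genus, hjust = label_hjust)) +
  coord_flip()
```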
Two options:
library(ggplot2)
# my data
df <- data.frame(genus=c("Prevotella","Streptococcus","YRC22","Phascolarctobacterium","SMB53","Epulopiscium",
"CF231","Anaerovibrio","Paludibacter","Parabacteroides","Desulfovibrio","Sutterella",
"Roseburia","Others__0_5_","Akkermansia","Bifidobacterium","Campylobacter","Fibrobacter",
"Coprobacillus","Bulleidia","f_02d06","Dorea","Blautia","Enterococcus","Eubacterium",
"p_75_a5","Clostridium","Coprococcus","Oscillospira","Escherichia","Lactobacillus"),
class=c(rep("groupA",18),rep("groupB",13)),
value=c(4.497311,4.082377,3.578472,3.567310,3.410453,3.390026,
3.363542,3.354532,3.335634,3.284165,3.280838,3.218053,
3.071454,3.026663,3.021749,3.004152,2.917656,2.811455,
-2.997631,-3.074314,-3.117659,-3.151276,-3.170631,-3.194323,
-3.225207,-3.274281,-3.299712,-3.299875,-3.689051,-3.692055,
-4.733154)
)
ggplot(df, aes(reorder(genus, -value), value, fill = class)) +
geom_bar(stat = "identity") +
coord_flip() +
geom_text(aes(label = genus,
y = ifelse(value < 1, 1.5, -1.5)), size = 2.5) +
theme(axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())
Or this:
library(ggplot2)
# my data
df <- data.frame(genus=c("Prevotella","Streptococcus","YRC22","Phascolarctobacterium","SMB53","Epulopiscium",
"CF231","Anaerovibrio","Paludibacter","Parabacteroides","Desulfovibrio","Sutterella",
"Roseburia","Others__0_5_","Akkermansia","Bifidobacterium","Campylobacter","Fibrobacter",
"Coprobacillus","Bulleidia","f_02d06","Dorea","Blautia","Enterococcus","Eubacterium",
"p_75_a5","Clostridium","Coprococcus","Oscillospira","Escherichia","Lactobacillus"),
class=c(rep("groupA",18),rep("groupB",13)),
value=c(4.497311,4.082377,3.578472,3.567310,3.410453,3.390026,
3.363542,3.354532,3.335634,3.284165,3.280838,3.218053,
3.071454,3.026663,3.021749,3.004152,2.917656,2.811455,
-2.997631,-3.074314,-3.117659,-3.151276,-3.170631,-3.194323,
-3.225207,-3.274281,-3.299712,-3.299875,-3.689051,-3.692055,
-4.733154)
)
ggplot(df, aes(reorder(genus, -value), value, fill = class)) +
geom_bar(stat = "identity") +
coord_flip() +
xlab("genus")
