r - column wise heatmap using ggplot2 - r

I would really appreciate if anyone could guide me with the following challenge.
I am trying to build column wise heatmap. For each column, I want the lowest value to be green and highest value to be red. The current solution takes a matrix wide approach.
I saw the solution on Heat map per column with ggplot2. As you can see, I implemented the same code but I am not getting the desired result [picture below]
df <- data.frame(
F1 = c(0.66610194649319, 0.666123551800434,
0.666100611954119, 0.665991102703081,
0.665979885730484),
acc_of_pred = c(0.499541627510021, 0.49960260221954,
0.499646067768102, 0.499447308828986,
0.499379552967265),
expected_mean_return = c(2.59756065316356e-07, 2.59799087404167e-07,
2.86466725381146e-07, 2.37977452007967e-07,
2.94242908573705e-07),
win_loss_ratio = c(0.998168189343307, 0.998411671274781,
0.998585272507726, 0.997791676357902,
0.997521287688458),
corr_pearson = c(0.00161443345430616, -0.00248811119331013,
-0.00203407575954095, -0.00496817102369628,
-0.000140531627184482),
corr_spearman = c(0.00214838517340878, -0.000308343671725617,
0.00228492127281917, -0.000359577740835049,
0.000608090759428587),
roc_vec = c(0.517972308828151, 0.51743161463546,
0.518033230192484, 0.518033294993802,
0.517931553535524)
)
combo <- data.frame(combo = c("baseline_120", "baseline_20",
"baseline_60", "baseline_288",
"baseline_5760"))
df.scaled <- scale(df)
df.scaled <- cbind(df.scaled,combo)
df.melt <- melt(df.scaled, id.vars = "combo")
ggplot(df.melt, aes(combo, variable)) +
geom_tile(aes(fill = value), colour = "white") +
scale_fill_gradient(low = "green", high = "red") +
geom_text(aes(label=value)) +
theme_grey(base_size = 9) +
labs(x = "", y = "") + scale_x_discrete(expand = c(0, 0)) +
scale_y_discrete(expand = c(0, 0)) +
theme(legend.position = "none", axis.ticks = element_blank(),
axis.text.x = element_text(size = 9 * 0.8,
angle = 0, hjust = 0, colour = "grey50"))

You are nearly correct. The code you implemented is the same for plotting. But the person who asked the question did one step in data preparation, he added a scaling variable.
If you scale your variable before plotting it and using the scaled factor as fill argument it works (i just added the rescale in scale_fill_gradient in ggplot after calculating it):
df.melt <- melt(df.scaled, id.vars = "combo")
df.melt<- ddply(df.melt, .(combo), transform, rescale = rescale(value))
ggplot(df.melt, aes(combo, variable)) +
geom_tile(aes(fill = rescale), colour = "white") +
scale_fill_gradient( low= "green", high = "red") +
geom_text(aes(label=round(value,4))) +
theme_grey(base_size = 9) +
labs(x = "", y = "") + scale_x_discrete(expand = c(0, 0)) +
scale_y_discrete(expand = c(0, 0)) +
theme(legend.position = "none", axis.ticks = element_blank(),
axis.text.x = element_text(size = 9 * 0.8,
angle = 0, hjust = 0, colour = "grey50"))
giving the plot:

Related

Add an isoquant curve to a scatterplot made in ggplot

I was curious if anyone knew how to create a heatmap with an isoquant curve that identifies all x and y combinations whose product equals a certain constant. The final product should look like the following picture:
Scatterplot with isoquant curve
Here is the code I use to generate my plot, but as of right now I can't get the curve in the plot as depicted in the picture above:
vs.vpd.by.drop_days <- ggplot(event_drops, aes(vs, vpd)) +
geom_point(aes(color = day_since), size = 2, alpha = 0.2) +
scale_color_gradientn(colors = c("darkblue","green","yellow","red"),
breaks = c(0,25,50,75),
limits = c(0,75),
name = "Days since \n first drop") +
ggtitle("Drops by VPD and Wind Speed") +
theme(plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
axis.title = element_text(size = 15)) +
xlab(label = "Wind Speed (mph)") +
ylab(label = "Vapor Pressure Deficit") +
expand_limits(x = 0, y = 0) +
scale_x_continuous(expand = c(0, 0), limits = c(0,20)) +
scale_y_continuous(expand = c(0, 0), limits = c(0,5))
vs.vpd.by.drop_days
One option to achieve that would be geom_function.
Using mtcars as example data:
library(ggplot2)
ggplot(mtcars, aes(hp, mpg, color = disp)) +
geom_point() +
geom_function(fun = function(x) 3000 / x)

Convert col plot to pyramid format

Is this type of plot possible with ggplot? I've seen examples such as this one:
http://alburez.me/2018-03-20-Population-pyramids-in-R-for-beginners/
But actually I want something much more simpler like this one:
Is this possible in ggplot? My first instinct is to create a bar plot/col plot but cannot make the columns centered like the ones in the attached picture above.
Sure could this be achieved via ggplot2. One option would be to make use of geom_rect like so:
library(ggplot2)
dat <- data.frame(
name = factor(c("Gen Z", "Millenials", "Gen X")),
name_y = c(3:1),
pct = c(.3, .51, .19)
)
midpoint <- max(dat$pct) / 2
ggplot(dat) +
geom_rect(aes(xmin = midpoint - pct / 2, xmax = midpoint + pct / 2,
ymin = name_y - .45, ymax = name_y + .45),
fill = "steelblue", color = "white") +
geom_text(aes(x = midpoint, y = name_y, label = scales::percent(pct)), color = "white", fontface = "bold") +
scale_y_continuous(breaks = unique(dat$name_y), labels = unique(dat$name)) +
theme_minimal() +
labs(x = NULL, y = NULL) +
theme(axis.line.x = element_blank(), axis.text.x = element_blank(),
panel.grid = element_blank())

Combine legend for fill and colour ggplot to give only single legend

I am plotting a smooth to my data using geom_smooth and using geom_ribbon to plot shaded confidence intervals for this smooth. No matter what I try I cannot get a single legend that represents both the smooth and the ribbon correctly, i.e I am wanting a single legend that has the correct colours and labels for both the smooth and the ribbon. I have tried using + guides(fill = FALSE), guides(colour = FALSE), I also read that giving both colour and fill the same label inside labs() should produce a single unified legend.
Any help would be much appreciated.
Note that I have also tried to reset the legend labels and colours using scale_colour_manual()
The below code produces the below figure. Note that there are two curves here that are essentially overlapping. The relabelling and setting couours has worked for the geom_smooth legend but not the geom_ribbon legend and I still have two legends showing which is not what I want.
ggplot(pred.dat, aes(x = age.x, y = fit, colour = tagged)) +
geom_smooth(size = 1.2) +
geom_ribbon(aes(ymin = lci, ymax = uci, fill = tagged), alpha = 0.2, colour = NA) +
theme_classic() +
labs(x = "Age (days since hatch)", y = "Body mass (g)", colour = "", fill = "") +
scale_colour_manual(labels = c("Untagged", "Tagged"), values = c("#3399FF", "#FF0033")) +
theme(axis.title.x = element_text(face = "bold", size = 14),
axis.title.y = element_text(face = "bold", size = 14),
axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
legend.text = element_text(size = 12))
The problem is that you provide new labels for the color-aesthetic but not for the fill-aesthetic. Consequently ggplot shows two legends because the labels are different.
You can either also provide the same labels for the fill-aesthetic (code option #1 below) or you can set the labels for the levels of your grouping variable ("tagged") before calling ggplot (code option #2).
library(ggplot2)
#make some data
x = seq(0,2*pi, by = 0.01)
pred.dat <- data.frame(x = c(x,x),
y = c(sin(x), cos(x)) + rnorm(length(x) * 2, 0, 1),
tag = rep(0:1, each = length(x)))
pred.dat$lci <- c(sin(x), cos(x)) - 0.4
pred.dat$uci <- c(sin(x), cos(x)) + 0.4
#option 1: set labels within ggplot call
pred.dat$tagged <- as.factor(pred.dat$tag)
ggplot(pred.dat, aes(x = x, y = y, color = tagged, fill = tagged)) +
geom_smooth(size = 1.2) +
geom_ribbon(aes(ymin = lci, ymax = uci), alpha = 0.2, color = NA) +
scale_color_manual(labels = c("untagged", "tagged"), values = c("#F8766D", "#00BFC4")) +
scale_fill_manual(labels = c("untagged", "tagged"), values = c("#F8766D", "#00BFC4")) +
theme_classic() + theme(legend.title = element_blank())
#option 2: set labels before ggplot call
pred.dat$tagged <- factor(pred.dat$tag, levels = 0:1, labels = c("untagged", "tagged"))
ggplot(pred.dat, aes(x = x, y = y, color = tagged, fill = tagged)) +
geom_smooth(size = 1.2) +
geom_ribbon(aes(ymin = lci, ymax = uci), alpha = 0.2, color = NA) +
theme_classic() + theme(legend.title = element_blank())

How to add axis text in this negative and positive bars differently using ggplot2?

I've drawed bar graph with negative and positive bars which is familiar to the research. However, my code seems extremely inconvenient and verbose usinggraphics::plot() and graphics::text() as showed below. Try as I may, I could find the solution using element_text to fulfill in ggplot2. Please help or try to give some ideas how to achieve this in ggplot2.Thanks in advance.
# my data
df <- data.frame(genus=c("Prevotella","Streptococcus","YRC22","Phascolarctobacterium","SMB53","Epulopiscium",
"CF231","Anaerovibrio","Paludibacter","Parabacteroides","Desulfovibrio","Sutterella",
"Roseburia","Others__0_5_","Akkermansia","Bifidobacterium","Campylobacter","Fibrobacter",
"Coprobacillus","Bulleidia","f_02d06","Dorea","Blautia","Enterococcus","Eubacterium",
"p_75_a5","Clostridium","Coprococcus","Oscillospira","Escherichia","Lactobacillus"),
class=c(rep("groupA",18),rep("groupB",13)),
value=c(4.497311,4.082377,3.578472,3.567310,3.410453,3.390026,
3.363542,3.354532,3.335634,3.284165,3.280838,3.218053,
3.071454,3.026663,3.021749,3.004152,2.917656,2.811455,
-2.997631,-3.074314,-3.117659,-3.151276,-3.170631,-3.194323,
-3.225207,-3.274281,-3.299712,-3.299875,-3.689051,-3.692055,
-4.733154)
)
# bar graph
tiff(file="lefse.tiff",width=2000,height=2000,res=400)
par(mar=c(5,2,1,1))
barplot(df[,3],horiz=T,xlim=c(-6,6),xlab="LDA score (log 10)",
col=c(rep("forestgreen",length(which(df[,2]=="groupA"))),
rep("goldenrod",length(which(df[,2]=="groupB")))))
axis(1,at=seq(-6,6,by=1))
# add text
text(0.85,36.7,label=df[,1][31],cex=0.6);text(0.75,35.4,label=df[,1][30],cex=0.6)
text(0.75,34.1,label=df[,1][29],cex=0.6);text(0.85,33.0,label=df[,1][28],cex=0.6)
text(0.75,31.8,label=df[,1][27],cex=0.6);text(0.6,30.6,label=df[,1][26],cex=0.6)
text(0.8,29.5,label=df[,1][25],cex=0.6);text(0.85,28.3,label=df[,1][24],cex=0.6)
text(0.45,27.1,label=df[,1][23],cex=0.6);text(0.4,25.9,label=df[,1][22],cex=0.6)
text(0.55,24.7,label=df[,1][21],cex=0.6);text(0.55,23.5,label=df[,1][20],cex=0.6)
text(0.85,22.3,label=df[,1][19],cex=0.6);text(-0.75,21.1,label=df[,1][18],cex=0.6)
text(-1,19.9,label=df[,1][17],cex=0.6);text(-1,18.8,label=df[,1][16],cex=0.6)
text(-0.85,17.6,label=df[,1][15],cex=0.6);text(-0.85,16.3,label=df[,1][14],cex=0.6)
text(-0.7,15.1,label=df[,1][13],cex=0.6);text(-0.65,13.9,label=df[,1][12],cex=0.6)
text(-0.85,12.7,label=df[,1][11],cex=0.6);text(-1.05,11.5,label=df[,1][10],cex=0.6)
text(-0.85,10.3,label=df[,1][9],cex=0.6);text(-0.85,9.1,label=df[,1][8],cex=0.6)
text(-0.47,7.9,label=df[,1][7],cex=0.6);text(-0.85,6.7,label=df[,1][6],cex=0.6)
text(-0.49,5.5,label=df[,1][5],cex=0.6);text(-1.44,4.3,label=df[,1][4],cex=0.6)
text(-0.49,3.1,label=df[,1][3],cex=0.6);text(-0.93,1.9,label=df[,1][2],cex=0.6)
text(-0.69,0.7,label=df[,1][1],cex=0.6)
# add lines
segments(0,-1,0,40,lty=3,col="grey")
segments(2,-1,2,40,lty=3,col="grey")
segments(4,-1,4,40,lty=3,col="grey")
segments(6,-1,6,40,lty=3,col="grey")
segments(4,-1,4,40,lty=3,col="grey")
segments(-2,-1,-2,40,lty=3,col="grey")
segments(-4,-1,-4,40,lty=3,col="grey")
segments(-6,-1,-6,40,lty=3,col="grey")
legend("topleft",bty="n",cex=0.65,inset=c(0.01,-0.02),ncol=2,
legend=c("groupA","groupB"),
col=c("forestgreen", "goldenrod"),pch=c(15,15))
dev.off()
Here's a solution using dplyr to create some extra columns for the label position and the justification, and then theming the plot to match reasonably closely what you originally had:
library("dplyr")
library("ggplot2")
df <- df %>%
mutate(
genus = factor(genus, levels = genus[order(value, decreasing = TRUE)]),
label_y = ifelse(value < 0, 0.2, -0.2),
label_hjust = ifelse(value < 0, 0, 1)
)
my_plot <- ggplot(df, aes(x = genus, y = value, fill = class)) +
geom_bar(stat = "identity", col = "black") +
geom_text(aes(y = label_y, label = genus, hjust = label_hjust)) +
coord_flip() +
scale_fill_manual(values = c(groupA = "forestgreen", groupB = "goldenrod")) +
theme_minimal() +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank(),
legend.position = "top",
legend.justification = 0.05,
legend.title = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
panel.grid.major.x = element_line(colour = "grey80", linetype = "dashed"),
panel.grid.minor.x = element_blank()) +
scale_y_continuous(expression(log[10](italic("LDA score"))),
breaks = -6:6, limits = c(-6, 6))
print(my_plot)
ggsave("lefse.tiff", width = 5, height = 5, dpi = 400, my_plot)
I would try this:
library(ggplot2)
# change the factor levels so it will be displayed in correct order
df$genus <- factor(df$genus, levels = as.character(df$genus))
ggplot(df, aes(x = genus, y = value)) +
geom_bar(aes(fill = class), stat = 'identity') + # color by class
coord_flip() + # horizontal bars
geom_text(aes(y = 0, label = genus, hjust = as.numeric(value > 0))) + # label text based on value
theme(axis.text.y = element_blank())
In the above, hjust will change the direction of the text relative to its y position (flipped to x now), which is similar to pos parameter in base R plot. So you code could also be simplified with a vector for pos argument to text function.
Two options:
library(ggplot2)
# my data
df <- data.frame(genus=c("Prevotella","Streptococcus","YRC22","Phascolarctobacterium","SMB53","Epulopiscium",
"CF231","Anaerovibrio","Paludibacter","Parabacteroides","Desulfovibrio","Sutterella",
"Roseburia","Others__0_5_","Akkermansia","Bifidobacterium","Campylobacter","Fibrobacter",
"Coprobacillus","Bulleidia","f_02d06","Dorea","Blautia","Enterococcus","Eubacterium",
"p_75_a5","Clostridium","Coprococcus","Oscillospira","Escherichia","Lactobacillus"),
class=c(rep("groupA",18),rep("groupB",13)),
value=c(4.497311,4.082377,3.578472,3.567310,3.410453,3.390026,
3.363542,3.354532,3.335634,3.284165,3.280838,3.218053,
3.071454,3.026663,3.021749,3.004152,2.917656,2.811455,
-2.997631,-3.074314,-3.117659,-3.151276,-3.170631,-3.194323,
-3.225207,-3.274281,-3.299712,-3.299875,-3.689051,-3.692055,
-4.733154)
)
ggplot(df, aes(reorder(genus, -value), value, fill = class)) +
geom_bar(stat = "identity") +
coord_flip() +
geom_text(aes(label = genus,
y = ifelse(value < 1, 1.5, -1.5)), size = 2.5) +
theme(axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())
Or this:
library(ggplot2)
# my data
df <- data.frame(genus=c("Prevotella","Streptococcus","YRC22","Phascolarctobacterium","SMB53","Epulopiscium",
"CF231","Anaerovibrio","Paludibacter","Parabacteroides","Desulfovibrio","Sutterella",
"Roseburia","Others__0_5_","Akkermansia","Bifidobacterium","Campylobacter","Fibrobacter",
"Coprobacillus","Bulleidia","f_02d06","Dorea","Blautia","Enterococcus","Eubacterium",
"p_75_a5","Clostridium","Coprococcus","Oscillospira","Escherichia","Lactobacillus"),
class=c(rep("groupA",18),rep("groupB",13)),
value=c(4.497311,4.082377,3.578472,3.567310,3.410453,3.390026,
3.363542,3.354532,3.335634,3.284165,3.280838,3.218053,
3.071454,3.026663,3.021749,3.004152,2.917656,2.811455,
-2.997631,-3.074314,-3.117659,-3.151276,-3.170631,-3.194323,
-3.225207,-3.274281,-3.299712,-3.299875,-3.689051,-3.692055,
-4.733154)
)
ggplot(df, aes(reorder(genus, -value), value, fill = class)) +
geom_bar(stat = "identity") +
coord_flip() +
xlab("genus")

ggplot2 heatmaps: using different gradients for categories

This Learning R blog post shows how to make a heatmap of basketball stats using ggplot2. The finished heatmap looks like this:
My question (inspired by Jake who commented on the Learning R blog post) is: would it be possible to use different gradient colors for different categories of stats (offensive, defensive, other)?
First, recreate the graph from the post, updating it for the newer (0.9.2.1) version of ggplot2 which has a different theme system and attaches fewer packages:
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
nba$Name <- with(nba, reorder(Name, PTS))
library("ggplot2")
library("plyr")
library("reshape2")
library("scales")
nba.m <- melt(nba)
nba.s <- ddply(nba.m, .(variable), transform,
rescale = scale(value))
ggplot(nba.s, aes(variable, Name)) +
geom_tile(aes(fill = rescale), colour = "white") +
scale_fill_gradient(low = "white", high = "steelblue") +
scale_x_discrete("", expand = c(0, 0)) +
scale_y_discrete("", expand = c(0, 0)) +
theme_grey(base_size = 9) +
theme(legend.position = "none",
axis.ticks = element_blank(),
axis.text.x = element_text(angle = 330, hjust = 0))
Using different gradient colors for different categories is not all that straightforward. The conceptual approach, to map the fill to interaction(rescale, Category) (where Category is Offensive/Defensive/Other; see below) doesn't work because interacting a factor and continuous variable gives a discrete variable which fill can not be mapped to.
The way to get around this is to artificially do this interaction, mapping rescale to non-overlapping ranges for different values of Category and then use scale_fill_gradientn to map each of these regions to different color gradients.
First create the categories. I think these map to those in the comment, but I'm not sure; changing which variable is in which category is easy.
nba.s$Category <- nba.s$variable
levels(nba.s$Category) <-
list("Offensive" = c("PTS", "FGM", "FGA", "X3PM", "X3PA", "AST"),
"Defensive" = c("DRB", "ORB", "STL"),
"Other" = c("G", "MIN", "FGP", "FTM", "FTA", "FTP", "X3PP",
"TRB", "BLK", "TO", "PF"))
Since rescale is within a few (3 or 4) of 0, the different categories can be offset by a hundred to keep them separate. At the same time, determine where the endpoints of each color gradient should be, in terms of both rescaled values and colors.
nba.s$rescaleoffset <- nba.s$rescale + 100*(as.numeric(nba.s$Category)-1)
scalerange <- range(nba.s$rescale)
gradientends <- scalerange + rep(c(0,100,200), each=2)
colorends <- c("white", "red", "white", "green", "white", "blue")
Now replace the fill variable with rescaleoffset and change the fill scale to use scale_fill_gradientn (remembering to rescale the values):
ggplot(nba.s, aes(variable, Name)) +
geom_tile(aes(fill = rescaleoffset), colour = "white") +
scale_fill_gradientn(colours = colorends, values = rescale(gradientends)) +
scale_x_discrete("", expand = c(0, 0)) +
scale_y_discrete("", expand = c(0, 0)) +
theme_grey(base_size = 9) +
theme(legend.position = "none",
axis.ticks = element_blank(),
axis.text.x = element_text(angle = 330, hjust = 0))
Reordering to get related stats together is another application of the reorder function on the various variables:
nba.s$variable2 <- reorder(nba.s$variable, as.numeric(nba.s$Category))
ggplot(nba.s, aes(variable2, Name)) +
geom_tile(aes(fill = rescaleoffset), colour = "white") +
scale_fill_gradientn(colours = colorends, values = rescale(gradientends)) +
scale_x_discrete("", expand = c(0, 0)) +
scale_y_discrete("", expand = c(0, 0)) +
theme_grey(base_size = 9) +
theme(legend.position = "none",
axis.ticks = element_blank(),
axis.text.x = element_text(angle = 330, hjust = 0))
Here's a simpler suggestion that uses ggplot2 aesthetics to map both gradients as well as color categories. Simply use an alpha-aesthetic to generate the gradient, and the fill-aesthetic for the category.
Here is the code to do so, refactoring Brian Diggs' response:
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
nba$Name <- with(nba, reorder(Name, PTS))
library("ggplot2")
library("plyr")
library("reshape2")
library("scales")
nba.m <- melt(nba)
nba.s <- ddply(nba.m, .(variable), transform,
rescale = scale(value))
nba.s$Category <- nba.s$variable
levels(nba.s$Category) <- list("Offensive" = c("PTS", "FGM", "FGA", "X3PM", "X3PA", "AST"),
"Defensive" = c("DRB", "ORB", "STL"),
"Other" = c("G", "MIN", "FGP", "FTM", "FTA", "FTP", "X3PP", "TRB", "BLK", "TO", "PF"))
Then, normalise the rescale variable to between 0 and 1:
nba.s$rescale = (nba.s$rescale-min(nba.s$rescale))/(max(nba.s$rescale)-min(nba.s$rescale))
And now, do the plotting:
ggplot(nba.s, aes(variable, Name)) +
geom_tile(aes(alpha = rescale, fill=Category), colour = "white") +
scale_alpha(range=c(0,1)) +
scale_x_discrete("", expand = c(0, 0)) +
scale_y_discrete("", expand = c(0, 0)) +
theme_grey(base_size = 9) +
theme(legend.position = "none",
axis.ticks = element_blank(),
axis.text.x = element_text(angle = 330, hjust = 0)) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
Note the use of alpha=rescale and then the scaling of the alpha range using scale_alpha(range=c(0,1)), which can be adapted to change the range appropriately for your plot.

Resources