ggplot2: heatmap customize legend


I am trying to plot a heatmap (colored by odds ratios) using ggplot2. The odds ratio values range from 0 to 200. I would like my heatmap legend to show markings at certain values (0.1, 1, 10, 50, 100, 200). This is the code I am using, but my legend does not label all the values (see below).
Code below:
map is a sample data frame with columns segments, OR and tissue:
library(dplyr)
library(ggplot2)
library(scales)  # for rescale()
segments <- c("TssA", "TssBiv", "BivFlnk", "EnhBiv","ReprPC", "ReprPCWk", "Quies", "TssAFlnk", "TxFlnk", "Tx", "TxWk", "EnhG", "Enh", "ZNF/Rpts", "Het")
OR <- c(1.4787622, 46.99886002, 11.74417278, 4.49223136, 204.975818, 1.85228517, 0.85762414, 0.67926846, 0.33696213, 0.06532777, 0.10478027, 0.07462983, 0.06501252, 1.32922162, 0.32638438)
df <- data.frame(segments, OR)
map <- df %>% mutate(tissue = 'colon')
ggplot(map, aes(tissue,segments, fill = OR))+ geom_tile(colour="gray80")+
theme_bw()+coord_equal()+
scale_fill_gradientn(colours=c("lightskyblue1", "white","navajowhite","lightsalmon", "orangered2", "indianred1"),
values=rescale(c(0.1, 1, 10, 50, 100, 200)), guide="colorbar", breaks=c(0.1, 1, 10, 50, 150, 200))
I am looking for my legend to look something similar to this (using the values I specified):

With your map data, first transform OR to log(OR).
You may also want white to correspond to OR = 1. If so, the values/limits approach below can achieve that, although with your real data you may need to try different limits.
library(dplyr)
library(ggplot2)
library(scales)  # rescale()
map_1 <- map %>% mutate(OR = log(OR))
OR_max <- max(map$OR, na.rm = TRUE)
# the break positions must match the labels below (0.1, not 0.2)
log_list <- c(0.1, 1, 10, 50, 200) %>% log
ggplot(map_1, aes(tissue, segments, fill = OR)) + geom_tile(colour = "gray80") +
theme_bw() + coord_equal() +
scale_fill_gradientn(
colours = c("red3", "white", "navy"),
values = rescale(log_list),
guide = "colorbar",
breaks = log_list,
limits = c(1/OR_max, OR_max) %>% log,
labels = c("0.1", "1", "10", "50", "200")
)
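In scale_fill_gradientn(), values gives the position (from 0 to 1) of each colour along the scale, measured relative to the scale limits. When the fill is log(OR) and the limits are symmetric on the log scale, as above, those positions can be computed directly. A minimal sketch, assuming the limits are c(1/OR_max, OR_max) on the OR scale; log_positions() is a hypothetical helper, not part of ggplot2 or scales:

```r
# Hypothetical helper: position (0 to 1) of each OR value on a fill scale
# that runs from log(limits[1]) to log(limits[2]).
log_positions <- function(or_values, limits) {
  lv <- log(or_values)
  ll <- log(limits)
  (lv - ll[1]) / (ll[2] - ll[1])
}

OR_max <- 204.975818                      # largest OR in the sample data
lims <- c(1 / OR_max, OR_max)             # symmetric on the log scale
pos <- log_positions(c(1 / OR_max, 1, OR_max), lims)
pos   # approximately 0, 0.5, 1: OR = 1 sits in the middle
```

Passing values = pos together with colours = c("red3", "white", "navy") and limits = log(lims) should then place white at OR = 1.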

Related

How to plot very high odds ratios in R using forestplot? (Issue with max limit on clip)

I have been trying to display the odds of receiving a psychotropic medication for a list of psychiatric diagnoses, but I have not been able to show the entire range (on a log scale) because of the limits of the x axis.
Looking at the forestplot documentation, it appears that the clip argument is what sets the x limits. However, whenever I set its upper value to anything greater than 54, the upper number on the axis is not shown at all and the axis stops at 4. This is an issue for me because I need to plot values as high as 221 (the upper confidence limit for my highest odds ratio).
I am using the following code:
library(forestplot)
# Odds ratios with 95% confidence limits for each diagnosis
base_data <- tibble::tibble(mean = c(19.92 , 41.46, 11.67, 11.69, 25.44, 105.89, 145.45),
lower = c(17.09, 34.70, 9.04, 10.92, 19.78, 67.40, 95.64),
upper = c(23.22, 49.54, 15.07, 12.51, 32.73, 166.37, 221.22),
study = c("Autism", "Conduct Problems", "Tic Disorder", "ADHD",
"OCD", "Schizophrenia", "Manic Bipolar"),
OR = c("19.92" , "41.46", "11.67", "11.69", "25.44", "105.89", "145.45"))
base_data |>
forestplot(labeltext = c(study, OR),
clip = c(0.1, 54),
xlog = TRUE) |>
fp_set_style(box = "royalblue",
line = "darkblue",
summary = "royalblue") |>
fp_add_header(study = c("", "Study"),
OR = c("", "OR")) |>
fp_append_row(mean = 60.22,
lower = 41,
upper = 83,
study = "Summary",
OR = "60.22",
is.summary = TRUE) |>
fp_set_zebra_style("#EFEFEF")
Which creates this graph:
If I set clip to 220, I am able to plot all the intervals, but the x axis still stops at 4.
Does anyone know how to get past this issue and set the x-axis ticks to a very high number (e.g. 100+) while still using a log scale?
Keeping it on a log scale would mean equal distances between 1, 10 and 100, showing the entire range of results (up to the final value of 221) while still allowing one to see the differences between values at the lower end.
Any help is extremely appreciated. Thank you so much!

According to the docs:
xlog: The xlog outputs the axis in log() format but the input data
should be in antilog/exp format
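Odds ratios are already on that antilog scale, so your data can go in unchanged (applying exp() to them would turn an OR of 145.45 into a number around 10^63). What is needed instead is a clip range that covers your largest upper limit, plus explicit tick positions via xticks. Every tick must be positive, because 0 cannot be placed on a log axis. Here is some reproducible code:
library(forestplot)
base_data |>
forestplot(labeltext = c(study, OR),
clip = c(0.1, 250),
xlog = TRUE,
xticks = c(0.1, 1, 10, 100, 250)) |>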
fp_set_style(box = "royalblue",
line = "darkblue",
summary = "royalblue") |>
fp_add_header(study = c("", "Study"),
OR = c("", "OR")) |>
fp_append_row(mean = 60.22,
lower = 41,
upper = 83,
study = "Summary",
OR = "60.22",
is.summary = TRUE) |>
fp_set_zebra_style("#EFEFEF")
Created on 2023-01-24 with reprex v2.0.2
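To pick xticks for any data range rather than hard-coding them, one option is to use the powers of ten that bracket the confidence limits. A minimal sketch; log_ticks() is a hypothetical helper, not a forestplot function:

```r
# Hypothetical helper: powers of ten covering [lo, hi], for use as
# xticks on a log axis (all strictly positive).
log_ticks <- function(lo, hi) {
  stopifnot(lo > 0, hi > lo)
  10^(floor(log10(lo)):ceiling(log10(hi)))
}

# Confidence limits from the question's base_data
lower <- c(17.09, 34.70, 9.04, 10.92, 19.78, 67.40, 95.64)
upper <- c(23.22, 49.54, 15.07, 12.51, 32.73, 166.37, 221.22)

ticks <- log_ticks(min(lower), max(upper))
ticks  # 1 10 100 1000
```

These could then be passed as xticks = ticks, with clip = range(ticks), in the forestplot() call; they are equally spaced on the log axis and include 1, the no-effect line.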

Plotting every three rows from data frame

I would like to make some plots from my data. Unfortunately, it is hard to predict how many plots I will generate, because that depends on the data and may vary. That is why I would like the code to be easily adjustable. However, most often each plot will be made from a group of 3 rows.
So, I would like to plot rows 1-3, 4-6, 7-9, etc.
This is data:
> dput(DF_final)
structure(list(AC = c(0.0031682160632777, 0.00228591145206846,
0.00142094444568728, 0.000661218113472149, 0.0010078157353918,
0.000400289437089513, 40.4634784175177, 40.5055070858594, 0.0183737773741582
), SD = c(0.00250647379467532, 0.0013244185401148, 0.000469332241199189,
0.000294558308707343, 0.000385553400676202, 0.000104447914881357,
11.0693842400794, 8.78768774254084, 0.00696532251341454), ln_AC = c(-5.75458660556339,
-6.08099044923792, -6.556433525855, -7.32142679754668, -6.89996992823399,
-7.8233226797995, 3.70039979980691, 3.70143794229703, -3.99683077355773
), ln_SD = c(-5.98887837626238, -6.62678175351058, -7.66419963690747,
-8.13003358225542, -7.86083085139947, -9.16682203300101, 2.40418312097106,
2.17335162163583, -4.96681136795312), Percent_AC = c(126.401324043689,
172.597361244303, 302.758754023937, 224.477834753288, 261.394591157605,
383.243109777925, 365.544076706723, 460.934756361151, 263.789326894369
), Percent_SD = c(100, 100, 100, 100, 100, 100, 100, 100, 100
), TP = c(0, 40, 80, 0, 40, 80, 0, 40, 80)), row.names = c("Tim_0",
"Tim_40", "Tim_80", "Jack_0", "Jack_40", "Jack_80", "Tom_0",
"Tom_40", "Tom_80"), class = "data.frame")
Column ln_AC should be on the y axis and column TP on the x axis. First, I would like to have each group on a separate graph, next to each other (keeping in mind that the number of plots may be high at some point), and, if possible, also everything on the same graph. It should be a point plot with a trend line.
Is it also possible to show the slope, the SD of the slope and R^2 from the linear regression on the plot?
I managed to do it for a single plot, but the regression line looks strange...
The code below was used to generate this plot and regression line.
fit <- lm(DF_final$ln_AC~DF_final$TP, data=DF_final)
plot(DF_final[1:3,7], DF_final[1:3,3], type = "p", ylim = c(-10,0), xlim=c(0,100), col = "red")
lines(DF_final$TP, fitted(fit), col="blue")

In base R (without so many packages), you can do:
# split by person (the row-name prefix); here that gives one data frame per 3 rows
DF = split(DF_final, gsub("_[^ ]*", "", rownames(DF_final)))
# or split purely by position, every 3 rows:
# DF = split(DF_final, (seq_len(nrow(DF_final)) - 1) %/% 3)
To store your values:
slopes = vector("numeric", length(DF))
names(slopes) = names(DF)
rsq = vector("numeric", length(DF))
names(rsq) = names(DF)
To plot:
par(mfrow = c(1, length(DF)))
for(i in names(DF)){
fit <- lm(ln_AC~TP, data=DF[[i]])
plot(DF[[i]]$TP, DF[[i]]$ln_AC, type = "p", col = "red",main=i)
abline(fit, col="blue")
slopes[i]=round(fit$coefficients[2],digits=2)
rsq[i]=round(summary(fit)$r.squared,digits=2)
mtext(side=1,paste("slope=",slopes[i],"\nrsq=",rsq[i]),
padj=-2,cex=0.7)
}
And your values:
slopes
Jack Tim Tom
-0.01 -0.01 -0.10
rsq
Jack Tim Tom
0.29 0.99 0.75
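The question also asks for the "SD slope". Reading that as the standard error of the slope (an assumption), it can be taken from the coefficient table that summary() returns for each fit. A minimal base R sketch on the TP and ln_AC columns of DF_final; stats and slope_se are illustrative names:

```r
# TP and ln_AC columns from the question's DF_final, with the same row names
DF <- data.frame(
  TP = c(0, 40, 80, 0, 40, 80, 0, 40, 80),
  ln_AC = c(-5.75458660556339, -6.08099044923792, -6.556433525855,
            -7.32142679754668, -6.89996992823399, -7.8233226797995,
            3.70039979980691, 3.70143794229703, -3.99683077355773),
  row.names = c("Tim_0", "Tim_40", "Tim_80", "Jack_0", "Jack_40",
                "Jack_80", "Tom_0", "Tom_40", "Tom_80"))

# one data frame per person, named by the row-name prefix
groups <- split(DF, sub("_.*", "", rownames(DF)))

# one row per person: slope, its standard error, and R^2
stats <- t(sapply(groups, function(d) {
  s <- summary(lm(ln_AC ~ TP, data = d))
  c(slope    = s$coefficients["TP", "Estimate"],
    slope_se = s$coefficients["TP", "Std. Error"],
    r2       = s$r.squared)
}))
round(stats, 3)
```

Inside the plotting loop, the same numbers could go into mtext(), e.g. sprintf("slope = %.2f (SE %.2f), R2 = %.2f", ...). With only 3 points per fit there is a single residual degree of freedom, so the standard errors are very rough.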

If I understand correctly, the reason you want 3 observations per graph is that you have different individuals (Jack, Tim, Tom). Is that so?
If you don't want to worry about that number, you can do this:
library(ggplot2)
data <- DF_final
# move the person name from the row names into a column, dropping the _TP suffix
data$person <- gsub("\\_.*", "", rownames(data))
ggplot(data, aes(x = TP, y = ln_AC)) + geom_point() +
geom_smooth(method = "lm") + facet_grid(~person)
This results in a plot like @giocomai's, but it also works if you have 4, 5, 6 or any other number of persons in your data.
---- Edit
If you want to add R2 values, you can do something like this. Note that it may not be the most elegant solution, but it works.
library(ggplot2)
data <- DF_final
data$person <- gsub("\\_.*", "", rownames(data))
# run lm for each person and keep R^2 (summary(model) holds the other statistics too)
nomi <- unique(data$person)
r2 <- sapply(nomi, function(ita) {
model <- lm(ln_AC ~ TP, data = data[data$person == ita, ])
summary(model)$r.squared
})
# data frame used to annotate facet_grid; TP and ln_AC set where each label is drawn
lmStats <- data.frame(person = nomi, r2 = round(unname(r2), 2), TP = 40, ln_AC = 0)
lmStats$lab <- paste0("r2= ", lmStats$r2)
# add the r2 column to the data (not necessary, but I like to have everything I plot in the data)
data$r2 <- lmStats$r2[match(data$person, lmStats$person)]
ggplot(data, aes(x = TP, y = ln_AC)) + geom_point() +
geom_smooth(method = "lm") + facet_grid(~person) +
geom_text(data = lmStats, aes(label = lab))
An easier way (fewer steps) would be to use facet_grid(~r2), so that you have the R-squared value in the facet title.

If I understand correctly what you mean, assuming you will always have three observations per graph, your main issue is creating a categorical variable to separate them. Here's one way to accomplish it. Depending on the layout you prefer, you may want to check facet_wrap instead of facet_grid.
library("dplyr")
library("ggplot2")
DF_final <- structure(list(AC = c(0.0031682160632777, 0.00228591145206846,
0.00142094444568728, 0.000661218113472149, 0.0010078157353918,
0.000400289437089513, 40.4634784175177, 40.5055070858594, 0.0183737773741582
), SD = c(0.00250647379467532, 0.0013244185401148, 0.000469332241199189,
0.000294558308707343, 0.000385553400676202, 0.000104447914881357,
11.0693842400794, 8.78768774254084, 0.00696532251341454), ln_AC = c(-5.75458660556339,
-6.08099044923792, -6.556433525855, -7.32142679754668, -6.89996992823399,
-7.8233226797995, 3.70039979980691, 3.70143794229703, -3.99683077355773
), ln_SD = c(-5.98887837626238, -6.62678175351058, -7.66419963690747,
-8.13003358225542, -7.86083085139947, -9.16682203300101, 2.40418312097106,
2.17335162163583, -4.96681136795312), Percent_AC = c(126.401324043689,
172.597361244303, 302.758754023937, 224.477834753288, 261.394591157605,
383.243109777925, 365.544076706723, 460.934756361151, 263.789326894369
), Percent_SD = c(100, 100, 100, 100, 100, 100, 100, 100, 100
), TP = c(0, 40, 80, 0, 40, 80, 0, 40, 80)), row.names = c("Tim_0",
"Tim_40", "Tim_80", "Jack_0", "Jack_40", "Jack_80", "Tom_0",
"Tom_40", "Tom_80"), class = "data.frame")
DF_final %>%
mutate(id = as.character(sapply(1:(nrow(DF_final)/3), rep, 3))) %>%
ggplot(aes(x=TP, y=ln_AC)) +
geom_point() +
geom_smooth(method = "lm") +
facet_grid(~id)
Created on 2020-02-06 by the reprex package (v0.3.0)
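The mutate() step above relies on nrow(DF_final) being a multiple of 3, since sapply(1:(nrow(DF_final)/3), rep, 3) otherwise yields fewer ids than rows. A sketch of a variant based on integer division that also tolerates a leftover partial group; block_id() is a hypothetical helper name:

```r
# Hypothetical helper: label consecutive blocks of `size` rows 1, 1, 1, 2, 2, 2, ...
block_id <- function(n, size = 3) {
  (seq_len(n) - 1) %/% size + 1
}

block_id(9)  # 1 1 1 2 2 2 3 3 3
block_id(7)  # 1 1 1 2 2 2 3  (the last group has a single row)
```

In the pipeline it could replace the sapply() call, e.g. mutate(id = factor(block_id(n()))), and changing size covers groups of other lengths.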

Soil profiles with coloured volume fractions with "aqp" in R

I am trying to plot a soil profile in R using the package aqp (algorithms for quantitative pedology). The profile should represent the matrix colour plus the mottling colour and percentage. For that purpose, I am using the function addVolumeFraction, which works to some extent: it plots points on the profile that correspond to the right mottling percentage for each horizon, but it doesn't assign the corresponding colours. Here is an example:
library(aqp)
#Variables for the soil profile
id <- rep(1, 4)
hor <- c("H1", "H2", "H3", "H4")
tops <- c(0,15,35,60)
bottoms <- c(15, 35, 60, 95)
mx_Hex <- c("#695F59FF", "#A59181FF", "#9E9388FF", "#A59181FF")
mot_Hex <- c("#EEB422","#EEB422", "#CD4F39", "#CD4F39")
mot_perc <- c(5, 10, 40, 8)
#Soil profile df
soildf <- data.frame(id,hor,tops,bottoms, mx_Hex, mot_Hex, mot_perc)
soildf$mx_Hex <- as.character(mx_Hex) #the class "SoilProfileCollection" needs colors as characters
soildf$mot_Hex <- as.character(mot_Hex)
# Transform df to "SoilProfileCollection"
depths(soildf) <- id ~ tops + bottoms
#Plot
plot(soildf, name = "hor", color = "mx_Hex", divide.hz = TRUE)
addVolumeFraction(soildf, "mot_perc",pch = 19, cex.min = 0.4, cex.max = 0.5, col = soildf$mot_Hex)
[Image: soil profile plot]
As you can see on the plot, the mottles' colours are mixed along the profile. I would like to have mottles of a given colour for their corresponding horizon instead. Can anybody help me to solve this problem?
Thanks!!

This works as expected in the current version of aqp available on CRAN (v1.19 released in January 2020).
I modified your example below to use alternating black and white mottles in each horizon.
library(aqp)
#Variables for the soil profile
id <- rep(1, 4)
hor <- c("H1", "H2", "H3", "H4")
tops <- c(0,15,35,60)
bottoms <- c(15, 35, 60, 95)
mx_Hex <- c("#695F59FF", "#A59181FF", "#9E9388FF", "#A59181FF")
# change mottle colors to something obviously different in each horizon
mot_Hex <- c("#FFFFFF", "#000000", "#FFFFFF","#000000")
mot_perc <- c(5, 10, 40, 8)
#Soil profile df
soildf <- data.frame(id, hor, tops, bottoms, mx_Hex, mot_Hex, mot_perc)
#the class "SoilProfileCollection" needs colors as characters
soildf$mx_Hex <- as.character(mx_Hex)
soildf$mot_Hex <- as.character(mot_Hex)
# Transform df to "SoilProfileCollection"
depths(soildf) <- id ~ tops + bottoms
#Plot
plot(soildf,
name = "hor",
color = "mx_Hex",
divide.hz = TRUE)
addVolumeFraction(
soildf,
"mot_perc",
pch = 19,
cex.min = 0.4,
cex.max = 0.5,
col = soildf$mot_Hex
)
[Image: soil profile with alternating black and white mottles]

Loop functions with multiple variables for ggplot2

I want to build several plots from one large database, so that I have one plot for each Text (factor) and for each Measure (the many resulting measures of an eye-tracking study). The following is a much simpler example of what I am trying to do.
Let's say this is my dataset:
Text <- c(1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2)
Position <- c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4)
Modified <- c(1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0)
Line_on_page <- c(1, 1, 1, 1, 2,2,2,2 ,1 ,1,1,1,2,2,2,2)
IA_FIXATION_DURATION <- c(250.3, 70.82, 400, 120.12, 270, 120.5, 100.54, 212.43, 250.3, 70.82, 320.29, 123.12, 260, 121.5, 100.54, 272.43)
IA_FIXATION_COUNT <- c(1,0,1,1,3,2,0, 1, 1,0,1,2,3,2,0, 2)
IA_LABEL <- c("she", "did", "not", "know", "what", "to", "say", "to", "she", "did", "not", "know", "what", "to", "do", "to")
testDF <- data.frame(Text , Position , Line_on_page, Modified, IA_FIXATION_DURATION, IA_FIXATION_COUNT, IA_LABEL)
So I want a heatmap (or another graph) for each Text (1/2/3) and for each measure (IA_FIXATION_DURATION/IA_FIXATION_COUNT).
# so first I create my vectors
library(stringr)
library(reshape2)
library(ggplot2)
library(ggthemes)
library(tidyverse)
Text_list <- unique(testDF$Text)
Measure_list <- testDF %>% dplyr::select_if(is.numeric) %>% colnames() %>% as.vector()
# create graphing function
Heatmap_FN <- function(testDF, na.rm = TRUE, ...){
# create for loop to produce ggplot2 graphs
for (i in seq_along(Text_list)) {
for (j in seq_along(Measure_list)) {
# create plot for each text in dataset
plots <- ggplot(subset(testDF, testDF$Text==Text_list[i])) +
geom_tile(aes(x=Position,
y=Line_on_page,
fill = Measure_list[j])) +
geom_text(aes(x=Position,
y=Line_on_page,
label=IA_LABEL),
color = "white", size = 2, family = "sans") +
scale_fill_viridis_c(option = "C", na.value = "black") +
scale_y_reverse() +
facet_grid(Page ~ Modified)+
theme(legend.position = "bottom") +
ggtitle(paste(Text_list[i],j, 'Text \n'))
ggsave(plots, file=paste(Measure_list[j], "_T", Text_list[i], ".pdf", sep = ""), height = 8.27, width = 11.69, units = c("in"))
}
}
}
Heatmap_FN(testDF)
Now, I am pretty sure that the problem lies in the fill part of geom_tile, where I would like to tell the function to use the measure variables one by one to produce the plots.
Any ideas on how to fix that?
Thanks
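In the loop, Measure_list[j] is a character string, so aes(fill = Measure_list[j]) maps every tile to that constant string rather than to the column it names. A common fix is the .data pronoun, which looks the column up by name: fill = .data[[Measure_list[j]]]. A minimal sketch on the sample data, assuming ggplot2 3.0 or later. heatmap_plot() is a hypothetical helper; facet_grid(Page ~ Modified) is reduced to ~ Modified because the sample data has no Page column; the measures are listed explicitly because select_if(is.numeric) would also pick up Text, Position, Line_on_page and Modified; and ggsave() is left commented out:

```r
library(ggplot2)

# Sample data from the question
testDF <- data.frame(
  Text = c(1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2),
  Position = rep(1:4, 4),
  Modified = rep(c(1, 0), each = 8),
  Line_on_page = c(1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2),
  IA_FIXATION_DURATION = c(250.3, 70.82, 400, 120.12, 270, 120.5, 100.54, 212.43,
                           250.3, 70.82, 320.29, 123.12, 260, 121.5, 100.54, 272.43),
  IA_FIXATION_COUNT = c(1,0,1,1,3,2,0,1,1,0,1,2,3,2,0,2),
  IA_LABEL = c("she","did","not","know","what","to","say","to",
               "she","did","not","know","what","to","do","to"))

# One heatmap for one text; `measure` is a column name given as a string.
# .data[[measure]] looks that column up instead of using the string itself.
heatmap_plot <- function(df, text_id, measure) {
  ggplot(subset(df, Text == text_id),
         aes(x = Position, y = Line_on_page, fill = .data[[measure]])) +
    geom_tile() +
    geom_text(aes(label = IA_LABEL), color = "white", size = 2) +
    scale_fill_viridis_c(option = "C", na.value = "black") +
    scale_y_reverse() +
    facet_grid(~ Modified) +
    theme(legend.position = "bottom") +
    ggtitle(paste("Text", text_id, "-", measure))
}

measures <- c("IA_FIXATION_DURATION", "IA_FIXATION_COUNT")
plots <- list()
for (txt in unique(testDF$Text)) {
  for (m in measures) {
    p <- heatmap_plot(testDF, txt, m)
    plots[[paste0(m, "_T", txt)]] <- p
    # ggsave(paste0(m, "_T", txt, ".pdf"), p, height = 8.27, width = 11.69, units = "in")
  }
}
```

Keeping the plots in a named list, rather than overwriting a single plots object, also makes it possible to inspect any of them after the loop.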

a single ColorRamp palette yields different color schemes in two plotly scatterplots

As the title says, a single colorRampPalette palette, when used in two different scatterplots built with plotly, gives two slightly different-looking colorbars (note the middle part), even though the upper and lower bounds of the corresponding datasets are manually set to be identical in both plots.
I'd like to make the plots visually comparable, and for that I'd obviously have to have identical colorbars. Is there a way to do that?
Here's the code:
library(plotly)
myxaxis <- list(range = c(16, 44), dtick=2, gridwidth = 1, title = "Length of carbon chain") #setting the visible area of x axis
myyaxis <- list(range = c(0, 8), gridwidth = 1, title = "No. of double bonds") #setting the visible area of y axis
mycolors <- colorRampPalette(c('green', 'red', 'black'))(n = 100) #creating a 100-colour palette with grDevices::colorRampPalette
ch_new1 <- cbind.data.frame(c('PA', 'PA', 'PA', 'PA', 'PA', 'PA', 'PA', 'PA', 'PA', 'upper bound', 'lower bound'), c(4.571087, 6.522441, 6.522441, 5.081869, 4.471815, 5.744834, 7.329796, 5.472866, 5.744834, 1, 1), c(10.52337, 16.75454, 16.0976, 16.47356, 18.94973, 17.46351, 10.97607, 18.11186, 11.64033, 0.2085327, 71.18021), c(32, 34, 34, 36, 36, 36, 38, 38, 38, 100, 100), c(1, 1, 2, 2, 3, 4, 4, 5, 6, 100, 100), c(0.4128963, 16.68394, 26.52718, 23.50851, 16.02339, 3.971546, 6.854153, 3.24342, 2.774968, 1, 1)) #the first dataset
colnames(ch_new1) <- c('Species', 'log_inversed_pval','fold_difference', 'N_of_carbons','N_of_double_bonds', 'rel_abund')
d <- plot_ly(ch_new1, x=~N_of_carbons, y=~N_of_double_bonds, text = ~paste('Percent of total', Species, '=', round(rel_abund, 0)), size=~rel_abund, color=~fold_difference, colors = mycolors)%>% #producing the scatter plot
layout(
xaxis = myxaxis,
yaxis = myyaxis,
title = paste('PA', '2b')
)%>%
colorbar(title="Fold difference", ypad=20)
export(d)
ch_new2 <- cbind.data.frame(c('LPC', 'LPC', 'LPC', 'lower limit', 'upper limit'), c(7.329796, 7.329796, 5.081869, 1, 1), c(2.952345, 5.042931, 3.700331, 0.2085327, 71.18021), c(18, 20, 22, 100, 100), c(0, 3, 5, 100, 100), c(82.87528, 13.56943, 3.555281, 1, 1)) #the second dataset
colnames(ch_new2) <- c('Species', 'log_inversed_pval','fold_difference', 'N_of_carbons','N_of_double_bonds', 'rel_abund')
d <- plot_ly(ch_new2, x=~N_of_carbons, y=~N_of_double_bonds, text = ~paste('Percent of total', Species, '=', round(rel_abund, 0)), size=~rel_abund, color=~fold_difference, colors = mycolors)%>% #creating the second scatterplot
layout(
xaxis = myxaxis,
yaxis = myyaxis,
title = 'LPC'
)%>%
colorbar(title="Fold difference", ypad=20)
export(d)
[Image: chart #1, colorbar with a bright red middle]
[Image: chart #2, colorbar with a dim red middle]

I've solved the problem on my own.
It turns out that adding one or several "anchoring" dummy points placed beyond the margins of the plot (so they are not shown) makes the colorbars of the plots almost identical.
The initial dataset
ch_new1 <- cbind.data.frame(c(...)) #the first dataset
should be appended with anchoring dummy points:
ch_new1 <- cbind.data.frame(c(...)) #the first dataset
ch_new1 <- rbind(ch_new1, list('middle anchor point', 1, 50, 100, 100, 1))
ch_new1 <- rbind(ch_new1, list('quarter anchor point', 1, 25, 100, 100, 1))
tl;dr anchor the variable responsible for colorbar to multiple reference points (10, 20, 30, 40, 50, ...)
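The tl;dr can be generalised: instead of adding anchors one rbind() at a time, a whole set of evenly spaced anchor rows can be appended to each dataset. A sketch under the answer's assumptions (columns in the order used above, anchors hidden at N_of_carbons = N_of_double_bonds = 100); make_anchors() is a hypothetical helper, and the spacing of 10 is just the example from the tl;dr:

```r
# Second dataset from the question, including its two bound rows
ch_new2 <- cbind.data.frame(
  c('LPC', 'LPC', 'LPC', 'lower limit', 'upper limit'),
  c(7.329796, 7.329796, 5.081869, 1, 1),
  c(2.952345, 5.042931, 3.700331, 0.2085327, 71.18021),
  c(18, 20, 22, 100, 100),
  c(0, 3, 5, 100, 100),
  c(82.87528, 13.56943, 3.555281, 1, 1))
colnames(ch_new2) <- c('Species', 'log_inversed_pval', 'fold_difference',
                       'N_of_carbons', 'N_of_double_bonds', 'rel_abund')

# Hypothetical helper: one hidden anchor row per fold_difference value
make_anchors <- function(values) {
  data.frame(Species = paste('anchor', values),
             log_inversed_pval = 1,
             fold_difference = values,
             N_of_carbons = 100,        # outside the visible x range (16-44)
             N_of_double_bonds = 100,   # outside the visible y range (0-8)
             rel_abund = 1)
}

ch_new2 <- rbind(ch_new2, make_anchors(seq(10, 70, by = 10)))
```

Appending the same make_anchors() output to ch_new1 gives both plots the same set of fold_difference values along the colorbar, which is the idea behind the anchoring trick.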
