I have been really struggling to position the components of my heatmap.2 output.
I found this old answer explaining how the element positioning worked from #IanSudbery which seemed really clear and I thought it had given me the understanding I need, but I'm still not grasping something.
I understand that the elements are all essentially put in a lattice of windows but they aren't behaving in a way I understand.
Here is my code and the current output (at the very bottom is the bit of interest which orders the figure elements):
for(i in 1:length(ConditionsAbbr)) {
# creates its own colour palette
my_palette <- colorRampPalette(c("snow", "yellow", "darkorange", "red"))(n = 399)
# (optional) defines the colour breaks manually for a "skewed" colour transition
col_breaks = c(seq(0,0.09,length=100), #white 'snow'
seq(0.1,0.19,length=100), # for yellow
seq(0.2,0.29,length=100), # for orange 'darkorange'
seq(0.3,1,length=100)) # for red
# creates a 5 x 5 inch image
png(paste(SourceDir, "Heatmap_", ConditionsAbbr[i], "XYZ.png"), # create PNG for the heat map
width = 5*600, # 5 x 600 pixels
height = 5*600,
res = 300, # 300 pixels per inch
pointsize = 8) # smaller font size
heatmap.2(ConditionsMtx[[ConditionsAbbr[i]]],
cellnote = ConditionsMtx[[ConditionsAbbr[i]]], # same data set for cell labels
main = paste(ConditionsAbbr[i], "XYZ"), # heat map title
notecol="black", # change font color of cell labels to black
density.info="none", # turns off density plot inside color legend
trace="none", # turns off trace lines inside the heat map
margins =c(12,9), # widens margins around plot
col=my_palette, # use on color palette defined earlier
breaks=col_breaks, # enable color transition at specified limits
dendrogram="none", # No dendogram
srtCol = 0 , #correct angle of label numbers
asp = 1 , #this overrides layout methinks and for some reason makes it square
adjCol = c(NA, -35) ,
adjRow = c(53, NA) ,
keysize = 1.2 ,
Colv = FALSE , #turn off column clustering
Rowv = FALSE , # turn off row clustering
key.xlab = paste("Correlation") ,
lmat = rbind( c(0, 3), c(2,1), c(0,4) ),
lhei = c(0.9, 4, 0.5) )
dev.off() # close the PNG device
}
This gives:
As you can see, the key is right of the matrix, there are huge amounts of white space between the matrix, the title above and key below, and it's not even as if the title and matrix are centred in the PNG?
I think to myself "well I'll just create a 3x3 that is easy to understand and edit" e.g.
| |
| | (3)
| |
--------------------------
| (1) |
(2) | Matrix |
| |
--------------------------
| (4) |
| Key |
| |
And then I can get rid of the white space so it's more like this.
| |(3)
------------------
| (1) |
(2)| Matrix |
| |
------------------
|(4) Key |
I do this using:
lmat = rbind( c(0, 0, 3), c(2, 1, 0), c(0, 4, 0) ),
lhei = c(0.9, 4, 0.5) ,
lwid = c(1, 4, 1))
This is what it looks like:
As great as it is to see my matrix in the centre, my key is still aligned to the right of my matrix and my title is taking the Silk Road East? Not to mention all the excess white space?
How do I get these to align and to all move together so the figure components fit snugly together?
EDIT: reducing my margins helped to reduce the whitespace but it's still excessive.
Here are the final changes I made to get my results, however, I would recommend using the advice of Maurits Evers if you aren't too invested in heatmap.2. Don't overlook the changes I made to the image dimensions.
# creates my own colour palette
my_palette <- colorRampPalette(c("snow", "yellow", "darkorange", "red"))(n = 399)
# (optional) defines the colour breaks manually for a "skewed" colour transition
col_breaks = c(seq(0,0.09,length=100), #white 'snow'
seq(0.1,0.19,length=100), # for yellow
seq(0.2,0.29,length=100), # for orange 'darkorange'
seq(0.3,1,length=100)) # for red
# creates an image
png(paste(SourceDir, "Heatmap_XYZ.png" )
# create PNG for the heat map
width = 5*580, # 5 x 580 pixels
height = 5*420, # 5 x 420 pixels
res = 300, # 300 pixels per inch
pointsize =11) # smaller font size
heatmap.2(ConditionsMtx[[ConditionsAbbr[i]]],
cellnote = ConditionsMtx[[ConditionsAbbr[i]]], # same data set for cell labels
main = "XYZ", # heat map title
notecol="black", # change font color of cell labels to black
density.info="none", # turns off density plot inside color legend
trace="none", # turns off trace lines inside the heat map
margins=c(0,0), # widens margins around plot
col=my_palette, # use on color palette defined earlier
breaks=col_breaks, # enable color transition at specified limits
dendrogram="none", # only draw a row dendrogram
srtCol = 0 , #correct angle of label numbers
asp = 1 , #this overrides layout methinks and for some reason makes it square
adjCol = c(NA, -38.3) , #shift column labels
adjRow = c(77.5, NA) , #shift row labels
keysize = 2 , #alter key size
Colv = FALSE , #turn off column clustering
Rowv = FALSE , # turn off row clustering
key.xlab = paste("Correlation") , #add label to key
cexRow = (1.8) , # alter row label font size
cexCol = (1.8) , # alter column label font size
notecex = (1.5) , # Alter cell font size
lmat = rbind( c(0, 3, 0), c(2, 1, 0), c(0, 4, 0) ) ,
lhei = c(0.43, 2.6, 0.6) , # Alter dimensions of display array cell heighs
lwid = c(0.6, 4, 0.6) , # Alter dimensions of display array cell widths
key.par=list(mar=c(4.5,0, 1.8,0) ) ) #tweak specific key paramters
dev.off()
Here is the output, which I will continue to refine until all spacing and font sizes suit my aesthetic preference. I would tell you exactly what I've done but I'm not 100% sure, frankly it all feels like it's held together with old gum and bailer twine, but don't kick a gift horse in the code, as they say.
I don't know if you're open to non-heatmap.2-based solutions. In my opinion ggplot offers greater flexibility and with a bit of tweaking you can reproduce a heatmap similar to the one you're showing quite comfortably while maximising plotting "real-estate" and avoiding excessive whitespace.
I'm happy to remove this post if you're only looking for heatmap.2 solutions.
That aside, a ggplot2 solution may look like this:
First off, let's generate some sample data
set.seed(2018)
df <- as_tibble(matrix(runif(7*10), ncol = 10), .name_repair = ~seq(1:10))
Prior to plotting we need to reshape df from wide to long
library(tidyverse)
df <- df %>%
rowid_to_column("row") %>%
gather(col, Correlation, -row) %>%
mutate(col = as.integer(col))
Then to plot
ggplot(df, aes(row, col, fill = Correlation)) +
geom_tile() +
scale_fill_gradientn(colours = my_palette) + # Use your custom colour palette
theme_void() + # Minimal theme
labs(title = "Main title") +
geom_text(aes(label = sprintf("%2.1f", Correlation)), size = 2) +
theme(
plot.title = element_text(hjust = 1), # Right-aligned text
legend.position="bottom") + # Legend at the bottom
guides(fill = guide_colourbar(
title.position = "bottom", # Legend title below bar
barwidth = 25, # Extend bar length
title.hjust = 0.5))
An example with multiple heatmaps in a grid layout via facet_wrap
First off, let's generate more complex data.
set.seed(2018)
df <- replicate(
4,
as_tibble(matrix(runif(7*10), ncol = 10), .name_repair = ~seq(1:10)), simplify = F) %>%
setNames(., paste("data", 1:4, sep = "")) %>%
map(~ .x %>% rowid_to_column("row") %>%
gather(col, Correlation, -row) %>%
mutate(col = as.integer(col))) %>%
bind_rows(.id = "data")
Then the plotting is identical to what we did before plus an additional facet_wrap(~data, ncol = 2) statement
ggplot(df, aes(row, col, fill = Correlation)) +
geom_tile() +
scale_fill_gradientn(colours = my_palette) + # Use your custom colour palette
theme_void() + # Minimal theme
labs(title = "Main title") +
geom_text(aes(label = sprintf("%2.1f", Correlation)), size = 2) +
facet_wrap(~ data, ncol = 2) +
theme(
plot.title = element_text(hjust = 1), # Right-aligned text
legend.position="bottom") + # Legend at the bottom
guides(fill = guide_colourbar(
title.position = "bottom", # Legend title below bar
barwidth = 25, # Extend bar length
title.hjust = 0.5))
One final update
I thought it'd be fun/interesting to see how far we can get towards a complex heatmap similar to the one you link to from the paper.
The sample data is included at the end, as this takes up a bit of space.
We first construct three different ggplot2 plot objects that show the main heatmap (gg3), an additional smaller heatmap with missing values (gg2), and a bar denoting group labels for every row (gg1).
gg3 <- ggplot(df.cor, aes(col, row, fill = Correlation)) +
geom_tile() +
scale_fill_distiller(palette = "RdYlBu") +
theme_void() +
labs(title = "Main title") +
geom_text(aes(label = sprintf("%2.1f", Correlation)), size = 2) +
scale_y_discrete(position = "right") +
theme(
plot.title = element_text(hjust = 1),
legend.position="bottom",
axis.text.y = element_text(color = "black", size = 10)) +
guides(fill = guide_colourbar(
title.position = "bottom",
barwidth = 10,
title.hjust = 0.5))
gg2 <- ggplot(df.flag, aes(col, row, fill = Correlation)) +
geom_tile(colour = "grey") +
scale_fill_distiller(palette = "RdYlBu", guide = F, na.value = "white") +
theme_void() +
scale_x_discrete(position = "top") +
theme(
axis.text.x = element_text(color = "black", size = 10, angle = 90, hjust = 1, vjust = 0.5))
gg1 <- ggplot(df.bar, aes(1, row, fill = grp)) +
geom_tile() +
scale_fill_manual(values = c("grp1" = "orange", "grp2" = "green")) +
theme_void() +
theme(legend.position = "left")
We can now use egg::ggarrange to position all three plots such that the y axis ranges are aligned.
library(egg)
ggarrange(gg1, gg2, gg3, ncol = 3, widths = c(0.1, 1, 3))
Sample data
library(tidyverse)
set.seed(2018)
nrow <- 7
ncol <- 20
df.cor <- matrix(runif(nrow * ncol, min = -1, max = 1), nrow = nrow) %>%
as_tibble(.name_repair = ~seq(1:ncol)) %>%
rowid_to_column("row") %>%
gather(col, Correlation, -row) %>%
mutate(
row = factor(
paste("row", row, sep = ""),
levels = paste("row", 1:nrow, sep = "")),
col = factor(
paste("col", col, sep = ""),
levels = paste("col", 1:ncol, sep = "")))
nrow <- 7
ncol <- 10
df.flag <- matrix(runif(nrow * ncol, min = -1, max = 1), nrow = nrow) %>%
as_tibble(.name_repair = ~seq(1:ncol)) %>%
rowid_to_column("row") %>%
gather(col, Correlation, -row) %>%
mutate(
row = factor(
paste("row", row, sep = ""),
levels = paste("row", 1:nrow, sep = "")),
col = factor(
paste("col", col, sep = ""),
levels = paste("col", 1:ncol, sep = ""))) %>%
mutate(Correlation = ifelse(abs(Correlation) < 0.5, NA, Correlation))
df.bar <- data.frame(
row = 1:nrow,
grp = paste("grp", c(rep(1, nrow - 3), rep(2, 3)), sep = "")) %>%
mutate(
row = factor(
paste("row", row, sep = ""),
levels = paste("row", 1:nrow, sep = "")))
Related
I'm currently in the process of creating a heatmap with plotly. Below is the sample dataset:
library(tidyverse)
library(plotly)
library(hrbrthemes)
set.seed(9999)
df <- data.frame(group.int = rep(c(rep("Prevention", 3), "Diagnosis", rep("Intervention", 2)), 6),
int = rep(c("Prevention 1", "Prevention 2", "Prevention 3", "Diagnosis 1", "Intervention 1", "Intervention 2"), 6),
group.outcome = c(rep("Efficacy", 12), rep("Safety", 18), rep("Cost-effectiveness", 6)),
outcome = c(rep("Efficacy 1", 6), rep("Efficacy 2", 6), rep("Safety 1", 6), rep("Safety 2", 6), rep("Safety 3", 6), rep("Cost-effectiveness 1", 6)),
n = sample(50:250, 36, rep = TRUE)
)
df$group.int <- factor(df$group.int, levels = c("Prevention", "Diagnosis", "Intervention"))
df$group.outcome <- factor(df$group.outcome, levels = c("Efficacy", "Safety", "Cost-effectiveness"))
I want to make a heatmap based on the variable outcome against int, with n as the fill of each heatmap cell. Here is the desired plot:
I tried using ggplotly from the created ggplot:
plotly.df <- ggplot(df,
aes(x = int, y = outcome, fill= n)) +
geom_tile() +
scale_fill_gradient(low="white", high="darkred") +
scale_y_discrete(position = "right") +
facet_grid(group.outcome ~ group.int,
scales = "free", space = "free", switch = "x") +
theme_bw() +
theme(axis.ticks = element_blank(),
legend.position = "left",
strip.placement = "outside",
strip.background = element_blank())
ggplotly(plotly.df)
However, ggplotly seems to ignore space = "free" in facet_grid, so the size of the cells are not proportional:
Is there a way to adjust facet widths with ggplotly?
Thank you very much in advance
You don't have to reinvent the wheel. Go back to the first ggplotly object. Domain is what plotly uses to govern the spaces each facet (or as it is in plotly-subplot). You can retrieve this information by assigning the ggplotly graph to an object and calling plotly_json.
However, I've worked around layout shortcuts before. You can retrieve and modify the domains like this:
p = ggplotly(plotly.df)
p$x$layout$xaxis$domain <- c(0, 1/2) # 6 blocks, 3 in this group 1/6 * 3
p$x$layout$xaxis2$domain <- c(1/2, 2/3) # start at previous position, 1 in this group
p$x$layout$xaxis3$domain <- c(2/3, 1) # remaining space
p$x$layout$yaxis3$domain <- c(0, 1/6) # 1 block in bottom chunks
p$x$layout$yaxis2$domain <- c(1/6, 2/3) # 3 in mid group
p$x$layout$yaxis$domain <- c(2/3, 1) # remaining space
p
That got me this far:
Your bottom labels are still aligned, but the top is not. Additionally, the left bottom label is cut off.
To fix the top labels I used plotly_json to figure out where they were at then used the guess-and-check method. To adjust for labels, I modified the margin.
# prevention
p$x$layout$annotations[[3]]$x <- 1/4
# diagnosis
p$x$layout$annotations[[4]]$x <- 7/12
p %>% layout(margin = list(t = 40, r = 50, b = 80, l = 130))
Update based on comments
Consider the following as a replacement for everything that follows p = ggplotly(plotly.df) (So you won't use anything about this, but you'll see that the code above is still here.)
The facets
#------------- position and spacing facets -------------
p$x$layout$xaxis$domain <- c(0, 1/2) # 6 blocks, 3 in this group 1/6 * 3
p$x$layout$xaxis2$domain <- c(1/2, 2/3) # 1 in this group
p$x$layout$xaxis3$domain <- c(2/3, 1) # remaining space
p$x$layout$yaxis3$domain <- c(0, 1/6) # 1 block in bottom chunks
p$x$layout$yaxis2$domain <- c(1/6, 2/3) # 3 in mid group
p$x$layout$yaxis$domain <- c(2/3, 1) # remaining space
The labels
#------------- position and spacing labels -------------
# prevention
p$x$layout$annotations[[3]]$x <- 1/4
# diagnosis
p$x$layout$annotations[[4]]$x <- 7/12
# bottom group labels: prevention, diagnosis, intervention/ adjust down
lapply(3:5, function(i){
p$x$layout$annotations[[i]]$y <<- -0.1575
})
# efficacy, safety and cost effectiveness/ shift right
lapply(6:8, function(i){
p$x$layout$annotations[[i]]$x <<- 1.25
p$x$layout$annotations[[i]]$yanchor <<- "top"
})
# int
p$x$layout$annotations[[1]]$y <- -0.07
# outcome
p$x$layout$annotations[[2]]$x <- 1.475
p$x$layout$annotations[[2]]$textangle <- 90 # 180 degree flip
The legend
#------------- position and spacing legend -------------
# capture the font sizes of the other annotations
tf <- p$x$layout$xaxis$tickfont
# change the font of the group labels
lapply(3:8, function(i){
p$x$layout$annotations[[i]]$font <<- tf
})
# update the ticks to represent the values of n, not the scale
getCol <- data.frame(p$x$data[[10]]$marker$colorscale) # capture the scale
getCol$n <- seq(from = 50, to = 208, along.with = 1:300) %>% round(digits = 0)
summary(getCol)
(getVals <- filter(getCol, n %in% seq(50, 200, by = 50)))
# X1 X2 n
# 1 0.0000000 #FFFFFF 50
# 2 0.3143813 #E5B4A8 100
# 3 0.3177258 #E5B3A7 100
# 4 0.6321070 #C16B57 150
# 5 0.6354515 #C06A56 150
# 6 0.9464883 #941B0E 200
# 7 0.9498328 #931A0E 200
# the legend
p$x$data[[10]]$marker$colorbar <- list(x = -.2, tickfont = tf,
tickmode = "array",
ticktext = seq(50, 200, by = 50),
# from getVals output
tickvals = c(0, .318, .636, .95),
outlinewidth = 0,
thickness = 20)
and finally...
# legend and yaxis labels; the final plot
p %>% layout(margin = list(t = 10, r = 170, b = 120, l = 10),
yaxis = list(side = "right", anchor = "free", position = 1),
yaxis2 = list(side = "right", anchor = "free", position = 1),
yaxis3 = list(side = "right", anchor = "free", position = 1))
I just encountered such graph attached where two colors of geom_point are used (I believe it is made by ggplot2). Similarly, I would like to have dots of one color to range from size 1 to 5, and have another color for a series of dots for the range 10 to 50. I have however no clue on how to add two different ranges of point in one graph.
At the basic step I have:
a <- c(1,2,3,4,5)
b <- c(10,20,30,40,50)
Species <- factor(c("Species1","Species2","Species3","Species4","Species5"))
bubba <- data.frame(Sample1=a,Sample2=b,Species=Species)
bubba$Species=factor(bubba$Species, levels=bubba$Species)
xm=melt(bubba,id.vars = "Species", variable.name="Samples", value.name = "Size")
str(xm)
ggplot(xm,aes(x= Samples,y= fct_rev(Species)))+geom_point(aes(size=Size))+scale_size(range = range(xm$Size))+theme_bw()
Any would have clues where I should look into ? Thanks!
I've got an approach that gets 90% of the way there, but I'm not sure how to finish the deed. To get a single legend for size, I used a transformation to convert input size to display size. That makes the legend appearance conform to the display. What I don't have figured out yet is how to apply a similar transformation to the fill so that both can be integrated into the same legend.
Here's the transformation, which in this case shrinks everything 10 or more:
library(scales)
shrink_10s_trans = trans_new("shrink_10s",
transform = function(y){
yt = if_else(y >= 10, y*0.1, y)
return(yt)
},
inverse = function(yt){
return(yt) # Not 1-to-1 function, picking one possibility
}
)
Then we can use this transformation on the size to selectively shink only the dots that are 10 or larger. This works out nicely for the legend, aside from integrating the fill encoding with the size encoding.
ggplot(xm,aes(x= Samples,y= fct_rev(Species), fill = Size < 10))+
geom_point(aes(size=Size), shape = 21)+
scale_size_area(trans = shrink_10s_trans, max_size = 10,
breaks = c(1,2,3,10,20,30,40),
labels = c(1,2,3,10,20,30,40)) +
scale_fill_manual(values = c(rgb(136,93,100, maxColorValue = 255),
rgb(236,160,172, maxColorValue = 255))) +
theme_bw()
a <- c(1, 2, 3, 4, 5)
b <- c(10, 20, 30, 40, 50)
Species <- factor(c("Species1", "Species2", "Species3", "Species4", "Species5"))
bubba <- data.frame(Sample1 = a, Sample2 = b, Species = Species)
bubba$Species <- factor(bubba$Species, levels = bubba$Species)
xm <- reshape2::melt(bubba, id.vars = "Species", variable.name = "Samples", value.name = "Size")
ggplot(xm, aes(x = Samples, y = fct_rev(Species))) +
geom_point(aes(size = Size, color = Size)) +
scale_color_continuous(breaks = c(1,2,3,10,20,30), guide = guide_legend()) +
scale_size(range = range(xm$Size), breaks = c(1,2,3,10,20,30)) +
theme_bw()
Here's a cludge. I haven't got time to figure out the legend at the moment. Note that 1 and 10 are the same size, but a different colour, as are 3 and 40.
# Create data frame
a <- c(1, 2, 3, 4, 5)
b <- c(10, 20, 30, 40, 50)
Species <- factor(c("Species1", "Species2", "Species3", "Species4", "Species5"))
bubba <- data.frame(Sample1 = a, Sample2 = b, Species = Species)
# Restructure data
xm <- reshape2::melt(bubba, id.vars = "Species", variable.name = "Samples", value.name = "Size")
# Calculate bubble size
bubble_size <- function(val){
ifelse(val > 3, (1/15) * val + (1/3), val)
}
# Calculate bubble colour
bubble_colour <- function(val){
ifelse(val > 3, "A", "B")
}
# Calculate bubble size and colour
xm %<>%
mutate(bub_size = bubble_size(Size),
bub_col = bubble_colour(Size))
# Plot data
ggplot(xm, aes(x = Samples, y = fct_rev(Species))) +
geom_point(aes(size = bub_size, fill = bub_col), shape = 21, colour = "black") +
theme(panel.grid.major = element_line(colour = alpha("gray", 0.5), linetype = "dashed"),
text = element_text(family = "serif"),
legend.position = "none") +
scale_size(range = c(1, 20)) +
scale_fill_manual(values = c("brown", "pink")) +
ylab("Species")
I think you are looking for bubble plots in R
https://www.r-graph-gallery.com/bubble-chart/
That said, you probably want to build the right and left the side of the graphic separately and then combine.
I have a dataframe:
gene_symbol<-c("DADA","SDAASD","SADDSD","SDADD","ASDAD","XCVXCVX","EQWESDA","DASDADS","SDASDASD","DADADASD","sdaadfd","DFSD","SADADDAD","SADDADADA","DADSADSASDWQ","SDADASDAD","ASD","DSADD")
panel<-c("growth","growth","growth","growth","big","big","big","small","small","dfgh","DF","DF","DF","DF","DF","gh","DF","DF")
ASDDA<-c("normal","over","low","over","normal","over","low","over","normal","over","DF","DF","DF","DF","DF","DF","DF","DF")
ASDDb<-c("normal","over","low","over","normal","over","low","over","normal","over","DF","DF","DF","DF","DF","DF","DF","DF")
ASDDAf<-c("normal","over","low","over","normal","over","low","over","normal","over","DF","DF","DF","DF","DF","DF","DF","DF")
ASDDAf1<-c("normal","over","low","over","normal","over","low","over","normal","over","DF","DF","DF","DF","DF","DF","DF","DF")
ASDDAf2<-c("normal","over","low","over","normal","over","low","over","normal","over","DF","DF","DF","DF","DF","DF","DF","DF")
ASDDAf3<-c("normal","over","low","over","normal","over","low","over","normal","over","DF","DF","DF","DF","DF","DF","DF","DF")
ASDDAf4<-c("normal","over","low","over","normal","over","low","over","normal","over","DF","DF","DF","DF","DF","DF","DF","DF")
ASDDAf5<-c("normal","over","low","over","normal","over","low","over","normal","over","DF","DF","DF","DF","DF","DF","DF","DF")
ASDDA1<-c("normal","over","low","over","normal","over","low","over","normal","over","DF","DF","DF","DF","DF","DF","DF","DF")
ASDDb1<-c("normal","over","low","over","normal","over","low","over","normal","over","DF","DF","DF","DF","DF","DF","DF","DF")
ASDDAf1<-c("normal","over","low","over","normal","over","low","over","normal","over","DF","DF","DF","DF","DF","DF","DF","DF")
ASDDAf11<-c("normal","over","low","over","normal","over","low","over","normal","over","DF","DF","DF","DF","DF","DF","DF","DF")
ASDDAf21<-c("normal","over","low","over","normal","over","low","over","normal","over","DF","DF","DF","DF","DF","DF","DF","DF")
ASDDAf31<-c("normal","over","low","over","normal","over","low","over","normal","over","DF","DF","DF","DF","DF","DF","DF","DF")
ASDDAf41<-c("normal","over","low","over","normal","over","low","over","normal","over","DF","DF","DF","DF","DF","DF","DF","DF")
ASDDAf51<-c("normal","over","low","over","normal","over","low","over","normal","over","DF","DF","DF","DF","DF","DF","DF","DF")
Gene_states22 <- data.frame(gene_symbol, panel, ASDDA, ASDDb, ASDDAf, ASDDAf1, ASDDAf2,
ASDDAf3, ASDDAf4, ASDDAf5, ASDDA1, ASDDb1, ASDDAf1, ASDDAf11,
ASDDAf21, ASDDAf31, ASDDAf41, ASDDAf51)
And I create a heatmap with:
library(ggplot2); library(reshape2)
HG3 <- split(Gene_states22[,1:15], Gene_states22$panel)
HG4 <- melt(HG3, id.vars= c("gene_symbol","panel"))
HG4 <- HG4[,-5]
pp <- ggplot(HG4, aes(gene_symbol,variable)) +
geom_tile(aes(fill = value),
colour = "grey50") +
facet_grid(~panel, scales = "free" ,space = "free") +
scale_fill_manual(values = c("white", "red", "blue", "black", "yellow", "green", "brown"))
As you can see I use facet_grid to separate my heatmap into groups based on panel value. The problem is that when I use ggplotly(pp) the column width differs from group to group and my plot seems ugly.
In order to fix the issue I used adapted answer of Plotly and ggplot with facet_grid in R: How to to get yaxis labels to use ticktext value instead of range value?
:
library(plotly)
library(ggplot2)
library(data.table)
library(datasets)
#add fake model for use in facet
dt<-data.table(HG4[1:50,])
dt[,variable:=rownames(HG4)]
dt[,panel:=substr(variable,1,regexpr(" ",variable)-1)][panel=="",panel:=variable]
ggplot.test<-ggplot(dt,aes(gene_symbol,variable))+facet_grid(panel~.,scales="free_y",space="free",drop=TRUE)+
geom_tile(aes(fill = value),
colour = "grey50") +
scale_fill_manual(values = c("white", "red", "blue", "black", "yellow", "green", "brown")) +
labs(title = "Heatmap", x = "gene_symbol", y = "sample", fill = "value") +
guides(fill = FALSE)+
theme(panel.background = element_rect(fill = NA),
panel.spacing = unit(0.5, "lines"), ## It was here where you had a 0 for distance between facets. I replaced it by 0.5 .
strip.placement = "outside")
p <- ggplotly(ggplot.test)
len <- length(unique(HG4$panel))
total <- 1
for (i in 2:len) {
total <- total + length(p[['x']][['layout']][[paste('yaxis', i, sep='')]][['ticktext']])
}
spacer <- 0.01 #space between the horizontal plots
total_length = total + len * spacer
end <- 1
start <- 1
for (i in c('', seq(2, len))) {
tick_l <- length(p[['x']][['layout']][[paste('yaxis', i, sep='')]][['ticktext']]) + 1
#fix the y-axis
p[['x']][['layout']][[paste('yaxis', i, sep='')]][['tickvals']] <- seq(1, tick_l)
p[['x']][['layout']][[paste('yaxis', i, sep='')]][['ticktext']][[tick_l]] <- ''
end <- start - spacer
start <- start - (tick_l - 1) / total_length
v <- c(start, end)
#fix the size
p[['x']][['layout']][[paste('yaxis', i, sep='')]]$domain <- v
}
p[['x']][['layout']][['annotations']][[3]][['y']] <- (p[['x']][['layout']][['yaxis']]$domain[2] + p[['x']][['layout']][['yaxis']]$domain[1]) /2
p[['x']][['layout']][['shapes']][[2]][['y0']] <- p[['x']][['layout']][['yaxis']]$domain[1]
p[['x']][['layout']][['shapes']][[2]][['y1']] <- p[['x']][['layout']][['yaxis']]$domain[2]
#fix the annotations
for (i in 3:len + 1) {
#fix the y position
p[['x']][['layout']][['annotations']][[i]][['y']] <- (p[['x']][['layout']][[paste('yaxis', i - 2, sep='')]]$domain[1] + p[['x']][['layout']][[paste('yaxis', i - 2, sep='')]]$domain[2]) /2
#trim the text
p[['x']][['layout']][['annotations']][[i]][['text']] <- substr(p[['x']][['layout']][['annotations']][[i]][['text']], 1, length(p[['x']][['layout']][[paste('yaxis', i - 2, sep='')]][['ticktext']]) * 3 - 3)
}
#fix the rectangle shapes in the background
for (i in seq(0,(len - 2) * 2, 2)) {
p[['x']][['layout']][['shapes']][[i+4]][['y0']] <- p[['x']][['layout']][[paste('yaxis', i /2 + 2, sep='')]]$domain[1]
p[['x']][['layout']][['shapes']][[i+4]][['y1']] <- p[['x']][['layout']][[paste('yaxis', i /2 + 2, sep='')]]$domain[2]
}
p
But the heatmap is still not correct:
So first things first:
In your case I am not even sure whether a plotly heatmap is what you need. In addition you should never convert a complicated ggplot to plotly. It will fail! In 90% of cases. Try recreating your plot in plotly or whereever you want it to end up. Anything else ends up in coding hell.
I started by doing some research:
Here is a good description how to create heatmaps with different colors in plotly
This explains how you can create titles in subplots.
From post 1 I know that I have to create a matrix for each level in your data. So I wrote a function for that:
mymat<-as.matrix(Gene_states22[,-1:-2])
### Creates a 1-NA dummy matrix for each level. The output is stored in a list
dummy_mat<-function(mat,levels,names_col){
mat_list<-lapply(levels,function(x){
mat[mat!=x]=NA
mat[mat==x]=1
mymat=t(apply(mat,2,as.numeric))
colnames(mymat)=names_col
return(mymat)
})
names(mat_list)=levels
return(mat_list)
}
my_mat_list<-dummy_mat(mymat,c('DF','low','normal','over'),Gene_states22$gene_symbol)
### Optional: The heatmap type is peculiar - I created a text-NA matrix for each category as well
text_mat<-function(mat,levels,names_col){
mat_list<-lapply(levels,function(x){
mat[mat!=x]=NA
mat=t(mat)
colnames(mat)=names_col
return(mat)
})
names(mat_list)=levels
return(mat_list)
}
my_mat_list_t<-text_mat(mymat,c('DF','low','normal','over'),as.character(Gene_states22$gene_symbol))
In addition I needed colors for each level. These colors are created using some dataframe. You may write a similar (lapply-)loop here as well:
DF_Color <- data.frame(x = c(0,1), y = c("#DEDEDE", "#DEDEDE"))
colnames(DF_Color) <- NULL
lowColor <- data.frame(x = c(0,1), y = c("#00CCFF", "#00CCFF"))
colnames(lowColor) <- NULL
normColor <- data.frame(x = c(0,1), y = c("#DEDE00", "#DEDE00"))
colnames(normColor) <- NULL
overColor <- data.frame(x = c(0,1), y = c("#DE3333", "#DE3333"))
colnames(overColor) <- NULL
In addition we need the columns in the matrix for each panel-category:
mycols<-lapply(levels(Gene_states22$panel),function(x) grep(x,Gene_states22$panel))
I stored this in a list as well.
Next I use lapply-loop to plot. I store the values in a list and use subplot to put everything together:
library(plotly)
p_list<-lapply(1:length(mycols),function(j){
columns<-mycols[[j]]
p<-plot_ly(
type = "heatmap"
) %>% add_trace(
y=rownames(my_mat_list$DF),x=colnames(my_mat_list$DF)[columns],
z = my_mat_list$DF[,columns],
xgap=3,ygap=3, text=my_mat_list_t$DF[,columns],hoverinfo="x+y+text",
colorscale = DF_Color,
colorbar = list(
len = 0.3,
y = 0.3,
yanchor = 'top',
title = 'DF series',
tickvals = ''
)
) %>% add_trace(
y=rownames(my_mat_list$low),x=colnames(my_mat_list$low)[columns],
z = my_mat_list$low[,columns],
xgap=3,ygap=3,text=my_mat_list_t$low[,columns],hoverinfo="x+y+text",
colorscale = lowColor,
colorbar = list(
len = 0.3,
y = 0.3,
yanchor = 'top',
title = 'low series',
tickvals = ''
)
) %>% add_trace(
y=rownames(my_mat_list$normal),x=colnames(my_mat_list$normal)[columns],
z = my_mat_list$normal[,columns],
xgap=3,ygap=3,text=my_mat_list_t$normal[,columns],hoverinfo="x+y+text",
colorscale = normColor,
colorbar = list(
len = 0.3,
y = 1,
yanchor = 'top',
title = 'normal series',
tickvals = ''
)
) %>% add_trace(
y=rownames(my_mat_list$over),x=colnames(my_mat_list$over)[columns],
z = my_mat_list$over[,columns],
xgap=3,ygap=3,text=my_mat_list_t$over[,columns],hoverinfo="x+y+text",
colorscale = overColor,
colorbar = list(
len = 0.3,
y = 1,
yanchor = 'top',
title = 'over series',
tickvals = ''
)
)
return(p)
})
subplot(p_list[[1]],p_list[[2]],shareY = TRUE) %>%
layout(annotations = list(
list(x = 0.2 , y = 1.05, text = levels(Gene_states22$panel)[1], showarrow = F, xref='paper', yref='paper'),
list(x = 0.8 , y = 1.05, text = levels(Gene_states22$panel)[2], showarrow = F, xref='paper', yref='paper'))
)
POSSIBLE ISSUES:
You have to become create around categories like dfgh which occur only once. If only one column is selected in R, the output is automatically transformed into a (numeric or character) vector-type. Thus maybe add an as.matrix() to all z and text arguments
hover-text doesn't really work. But plotly has a good documentation there. You should be able to figure that out.
You also have to specify the width in the subplot-function. That will be fiddly if you have more than 10 categories.
Interactivity doesn't really work. You can't remove traces. Why? No idea. Do some research if you need it. I guess it is connected with the plot type.
I recommend specifying the extend of the plot(s) in px. That might make the tiles more similar.
Finally you will need some reference for the (subplot) titles and you will need to adjust the margins of your plot. Such that the titles are visible.
I have a dataframe like so:
set.seed(453)
year= as.factor(c(rep("1998", 20), rep("1999", 16)))
lepsp= c(letters[seq(from = 1, to = 20 )], c('a','b','c'),letters[seq(from =8, to = 20 )])
freq= c(sample(1:15, 20, replace=T), sample(1:18, 16,replace=T))
df<-data.frame(year, lepsp, freq)
df<-
df %>%
group_by(year) %>%
mutate(rank = dense_rank(-freq))
Frequencies freq of each lepsp within each year are ranked in the rank column. Larger freq values correspond to the smallest rank value and smaller freq values have the largest rank values. Some rankings are repeated if levels of lepsp have the same abundance.
I would like to split the df into multiple subsets by year. Then I would like to plot each subsetted dataframe in a multipanel figure. Essentially this is to create species abundance curves. The x-axis would be rank and the yaxis needs to be freq.
In my real dataframe I have 22 years of data. I would prefer the graphs to be displayed as 2 columns of 4 rows for a total of 8 graphs per page. Essentially I would have to repeat the solution offered here 3 times.
I also need to demarcate the 25%, 50% and 75% quartiles with vertical lines to look like this (desired result):
It would be great if each graph specified the year to which it belonged, but since all axis are the same name, I do not want x and y labels to be repeated for each graph.
I have tried to plot multiple lines on the same graph but it gets messy.
year.vec<-unique(df$year)
plot(sort(df$freq[df$year==year.vec[1]],
decreasing=TRUE),bg=1,type="b", ylab="Abundance", xlab="Rank",
pch=21, ylim=c(0, max(df$freq)))
for (i in 2:22){
points(sort(df$freq[df$year==year.vec[i]], decreasing=TRUE), bg=i,
type="b", pch=21)
}
legend("topright", legend=year.vec, pt.bg=1:22, pch=21)
I have also tried a loop, however it does not produce an output and is missing some of the arguments I would like to include:
jpeg('pract.jpg')
par(mfrow = c(6, 4)) # 4 rows and 2 columns
for (i in unique(levels(year))) {
plot(df$rank,df$freq, type="p", main = i)
}
dev.off()
Update
(Attempted result)
I found the following code after my post which gets me a little closer, but is still missing all the features I would like:
library(reshape2)
library(ggplot2)
library (ggthemes)
x <- ggplot(data = df2, aes(x = rank, y = rabun)) +
geom_point(aes(fill = "dodgerblue4")) +
theme_few() +
ylab("Abundance") + xlab("Rank") +
theme(axis.title.x = element_text(size = 15),
axis.title.y = element_text(size = 15),
axis.text.x = element_text(size = 15),
axis.text.y = element_text(size = 15),
plot.title = element_blank(), # we don't want individual plot titles as the facet "strip" will give us this
legend.position = "none", # we don't want a legend either
panel.border = element_rect(fill = NA, color = "darkgrey", size = 1.25, linetype = "solid"),
axis.ticks = element_line(colour = 'darkgrey', size = 1.25, linetype = 'solid')) # here, I just alter to colour and thickness of the plot outline and tick marks. You generally have to do this when faceting, as well as alter the text sizes (= element_text() in theme also)
x
x <- x + facet_wrap( ~ year, ncol = 4)
x
I prefer base R to modify graph features, and have not been able to find a method using base R that meets all my criteria above. Any help is appreciated.
Here's a ggplot approach. First off, I made some more data to get the 3x2 layout:
df = rbind(df, mutate(df, year = year + 4), mutate(df, year = year + 8))
Then We do a little manipulation to generate the quantiles and labels by group:
df_summ =
df %>% group_by(year) %>%
do(as.data.frame(t(quantile(.$rank, probs = c(0, 0.25, 0.5, 0.75)))))
names(df_summ)[2:5] = paste0("q", 0:3)
df_summ_long = gather(df_summ, key = "q", value = "value", -year) %>%
inner_join(data.frame(q = paste0("q", 0:3), lab = c("Common", "Rare-75% -->", "Rare-50% -->", "Rare-25% -->"), stringsAsFactors = FALSE))
With the data in good shape, plotting is fairly simple:
library(ggthemes)
library(ggplot2)
ggplot(df, aes(x = rank, y = freq)) +
geom_point() +
theme_few() +
labs(y = "Abundance (% of total)", x = "Rank") +
geom_vline(data = df_summ_long[df_summ_long$q != "q0", ], aes(xintercept = value), linetype = 4, size = 0.2) +
geom_text(data = df_summ_long, aes(x = value, y = Inf, label = lab), size = 3, vjust = 1.2, hjust = 0) +
facet_wrap(~ year, ncol = 2)
There's some work left to do - mostly in the rarity text overlapping. It might not be such an issue with your actual data, but if it is you could pull the max y values into df_summ_long and stagger them a little bit, actually using y coordinates instead of just Inf to get it at the top like I did.
From a data frame I want to plot a pie chart for five categories with their percentages as labels in the same graph in order from highest to lowest, going clockwise.
My code is:
League<-c("A","B","A","C","D","E","A","E","D","A","D")
data<-data.frame(League) # I have more variables
p<-ggplot(data,aes(x="",fill=League))
p<-p+geom_bar(width=1)
p<-p+coord_polar(theta="y")
p<-p+geom_text(data,aes(y=cumsum(sort(table(data)))-0.5*sort(table(data)),label=paste(as.character(round(sort(table(data))/sum(table(data)),2)),rep("%",5),sep="")))
p
I use
cumsum(sort(table(data)))-0.5*sort(table(data))
to place the label in the corresponding portion and
label=paste(as.character(round(sort(table(data))/sum(table(data)),2)),rep("%",5),sep="")
for the labels which is the percentages.
I get the following output:
Error: ggplot2 doesn't know how to deal with data of class uneval
I've preserved most of your code. I found this pretty easy to debug by leaving out the coord_polar... easier to see what's going on as a bar graph.
The main thing was to reorder the factor from highest to lowest to get the plotting order correct, then just playing with the label positions to get them right. I also simplified your code for the labels (you don't need the as.character or the rep, and paste0 is a shortcut for sep = "".)
League<-c("A","B","A","C","D","E","A","E","D","A","D")
data<-data.frame(League) # I have more variables
data$League <- reorder(data$League, X = data$League, FUN = function(x) -length(x))
at <- nrow(data) - as.numeric(cumsum(sort(table(data)))-0.5*sort(table(data)))
label=paste0(round(sort(table(data))/sum(table(data)),2) * 100,"%")
p <- ggplot(data,aes(x="", fill = League,fill=League)) +
geom_bar(width = 1) +
coord_polar(theta="y") +
annotate(geom = "text", y = at, x = 1, label = label)
p
The at calculation is finding the centers of the wedges. (It's easier to think of them as the centers of bars in a stacked bar plot, just run the above plot without the coord_polar line to see.) The at calculation can be broken out as follows:
table(data) is the number of rows in each group, and sort(table(data)) puts them in the order they'll be plotted. Taking the cumsum() of that gives us the edges of each bar when stacked on top of each other, and multiplying by 0.5 gives us the half the heights of each bar in the stack (or half the widths of the wedges of the pie).
as.numeric() simply ensures we have a numeric vector rather than an object of class table.
Subtracting the half-widths from the cumulative heights gives the centers each bar when stacked up. But ggplot will stack the bars with the biggest on the bottom, whereas all our sort()ing puts the smallest first, so we need to do nrow - everything because what we've actually calculate are the label positions relative to the top of the bar, not the bottom. (And, with the original disaggregated data, nrow() is the total number of rows hence the total height of the bar.)
Preface: I did not make pie charts of my own free will.
Here's a modification of the ggpie function that includes percentages:
library(ggplot2)
library(dplyr)
#
# df$main should contain observations of interest
# df$condition can optionally be used to facet wrap
#
# labels should be a character vector of same length as group_by(df, main) or
# group_by(df, condition, main) if facet wrapping
#
pie_chart <- function(df, main, labels = NULL, condition = NULL) {
# convert the data into percentages. group by conditional variable if needed
df <- group_by_(df, .dots = c(condition, main)) %>%
summarize(counts = n()) %>%
mutate(perc = counts / sum(counts)) %>%
arrange(desc(perc)) %>%
mutate(label_pos = cumsum(perc) - perc / 2,
perc_text = paste0(round(perc * 100), "%"))
# reorder the category factor levels to order the legend
df[[main]] <- factor(df[[main]], levels = unique(df[[main]]))
# if labels haven't been specified, use what's already there
if (is.null(labels)) labels <- as.character(df[[main]])
p <- ggplot(data = df, aes_string(x = factor(1), y = "perc", fill = main)) +
# make stacked bar chart with black border
geom_bar(stat = "identity", color = "black", width = 1) +
# add the percents to the interior of the chart
geom_text(aes(x = 1.25, y = label_pos, label = perc_text), size = 4) +
# add the category labels to the chart
# increase x / play with label strings if labels aren't pretty
geom_text(aes(x = 1.82, y = label_pos, label = labels), size = 4) +
# convert to polar coordinates
coord_polar(theta = "y") +
# formatting
scale_y_continuous(breaks = NULL) +
scale_fill_discrete(name = "", labels = unique(labels)) +
theme(text = element_text(size = 22),
axis.ticks = element_blank(),
axis.text = element_blank(),
axis.title = element_blank())
# facet wrap if that's happening
if (!is.null(condition)) p <- p + facet_wrap(condition)
return(p)
}
Example:
# sample data
resps <- c("A", "A", "A", "F", "C", "C", "D", "D", "E")
cond <- c(rep("cat A", 5), rep("cat B", 4))
example <- data.frame(resps, cond)
Just like a typical ggplot call:
ex_labs <- c("alpha", "charlie", "delta", "echo", "foxtrot")
pie_chart(example, main = "resps", labels = ex_labs) +
labs(title = "unfacetted example")
ex_labs2 <- c("alpha", "charlie", "foxtrot", "delta", "charlie", "echo")
pie_chart(example, main = "resps", labels = ex_labs2, condition = "cond") +
labs(title = "facetted example")
It worked on all included function greatly inspired from here
ggpie <- function (data)
{
# prepare name
deparse( substitute(data) ) -> name ;
# prepare percents for legend
table( factor(data) ) -> tmp.count1
prop.table( tmp.count1 ) * 100 -> tmp.percent1 ;
paste( tmp.percent1, " %", sep = "" ) -> tmp.percent2 ;
as.vector(tmp.count1) -> tmp.count1 ;
# find breaks for legend
rev( tmp.count1 ) -> tmp.count2 ;
rev( cumsum( tmp.count2 ) - (tmp.count2 / 2) ) -> tmp.breaks1 ;
# prepare data
data.frame( vector1 = tmp.count1, names1 = names(tmp.percent1) ) -> tmp.df1 ;
# plot data
tmp.graph1 <- ggplot(tmp.df1, aes(x = 1, y = vector1, fill = names1 ) ) +
geom_bar(stat = "identity", color = "black" ) +
guides( fill = guide_legend(override.aes = list( colour = NA ) ) ) +
coord_polar( theta = "y" ) +
theme(axis.ticks = element_blank(),
axis.text.y = element_blank(),
axis.text.x = element_text( colour = "black"),
axis.title = element_blank(),
plot.title = element_text( hjust = 0.5, vjust = 0.5) ) +
scale_y_continuous( breaks = tmp.breaks1, labels = tmp.percent2 ) +
ggtitle( name ) +
scale_fill_grey( name = "") ;
return( tmp.graph1 )
} ;
An example :
sample( LETTERS[1:6], 200, replace = TRUE) -> vector1 ;
ggpie(vector1)
Output