Draw heatmap tiles for all combinations of x-y - r

I asked a question about a heatmap, which was solved here: custom colored heatmap of categorical variables. I defined my scale_fill_manual for all combinations, as suggested in the accepted answer.
Based on that question, I would like to know how to tell ggplot2 to plot a heatmap with all combinations of the variables, not just the ones present in the data frame (they are already defined in scale_fill_manual but do not show up in the final plot).
How can I do this?
The current plotting code:
df <- data.frame(X = LETTERS[1:3],
Likelihood = c("Almost Certain","Likely","Possible"),
Impact = c("Catastrophic", "Major","Moderate"),
stringsAsFactors = FALSE)
df$color <- paste0(df$Likelihood,"-",df$Impact)
ggplot(df, aes(Impact, Likelihood)) + geom_tile(aes(fill = color),colour = "white") + geom_text(aes(label=X)) +
scale_fill_manual(values = c("Almost Certain-Catastrophic" = "red","Likely-Major" = "yellow","Possible-Moderate" = "blue"))
scale_fill_manual contains all combinations of Impact and Likelihood with their respective colors.

Similar to @aosmith, I tried expand.grid to get the finite set of combinations, but tidyr::complete() works nicely as well. Add the colors and letters, then fill using a fixed color range.
library(ggplot2)
library(magrittr) # provides %>%
df <- data.frame(Likelihood = c("Almost Certain","Likely","Possible"),
Impact = c("Catastrophic", "Major","Moderate"),
stringsAsFactors = FALSE)
df2 <- df %>% tidyr::complete(Likelihood,Impact) # all 9 combinations; alt expand.grid(df)
df2$X <- LETTERS[1:9] # Add letters here
df2$color <- paste0(df2$Likelihood,"-",df2$Impact) # Add colors
ggplot(df2, aes(Impact, Likelihood)) + geom_tile(aes(fill = color),colour = "white") + geom_text(aes(label=X)) +
scale_fill_manual(values = RColorBrewer::brewer.pal(9,"Pastel1"))
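The expand.grid route mentioned above could be sketched as follows. This is only an illustration under some assumptions: the original three labelled rows are kept and merged onto the full grid, the tiles without a label are left blank, and the palette from hcl.colors() is a placeholder for whatever colors your own scale_fill_manual defines.

```r
library(ggplot2)

# Original data: only three of the nine combinations are present
df <- data.frame(X = LETTERS[1:3],
                 Likelihood = c("Almost Certain", "Likely", "Possible"),
                 Impact = c("Catastrophic", "Major", "Moderate"),
                 stringsAsFactors = FALSE)

# Build every Likelihood x Impact combination
grid <- expand.grid(Likelihood = unique(df$Likelihood),
                    Impact = unique(df$Impact),
                    stringsAsFactors = FALSE)

# Keep the existing labels; combinations missing from df get X = NA
full <- merge(grid, df, by = c("Likelihood", "Impact"), all.x = TRUE)
full$color <- paste0(full$Likelihood, "-", full$Impact)

# Placeholder palette (an assumption), named so colors match by value
pal <- setNames(hcl.colors(nrow(full), "Pastel 1"), full$color)

p <- ggplot(full, aes(Impact, Likelihood)) +
  geom_tile(aes(fill = color), colour = "white") +
  geom_text(aes(label = X), na.rm = TRUE) +  # skip tiles without a label
  scale_fill_manual(values = pal)
p
```

Because the palette is named, each tile takes the color assigned to its own combination, not one based on its position in the data.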

Related

Plot legend for multiple histograms plotted on top of each other ggplot

I've made this multiple histogram plot in ggplot and now I want to add a legend for both the light purple part and the dark purple part. I know the conventional way is to do it with aes, but I can't figure out how to integrate that into my multiple histogram plot.
I don't shy away from manual labour, but more sophisticated solutions are preferred. Can anyone help me out?
#dataframe
set.seed(20)
df <- data.frame(expl = rbinom(n=100, size = 1, prob=0.08),
resp = sample(50:100, size = 100, replace = T))
#graph
graph <- ggplot(data = df, aes(x = resp))
graph +
geom_histogram(fill = "#BEBADA", alpha = 0.5, bins = 10) +
geom_histogram(data = subset(df, expl == '1'), fill = "#BEBADA", bins = 10)
Your data is already in the long format that is well suited for ggplot; you just need to map expl to alpha. In general, if you find yourself making multiples of the same geom, you probably want to rethink either the shape of your data or your approach for feeding it into geoms.
library(tidyverse)
set.seed(20)
df <- data.frame(expl = rbinom(n=100, size = 1, prob=0.08),
resp = sample(50:100, size = 100, replace = T))
To map expl onto alpha, make it a factor, and then assign that to alpha inside your aes. Then you can set the alpha scale to values of 0.5 and 1.
ggplot(df, aes(x = resp, alpha = as.factor(expl))) +
geom_histogram(fill = "#bebada", bins = 10) +
scale_alpha_manual(values = c(0.5, 1))
However, differentiating by alpha is a little awkward. You could instead map to fill and use light and dark purples:
ggplot(df, aes(x = resp, fill = as.factor(expl))) +
geom_histogram(bins = 10) +
scale_fill_manual(values = c("0" = "mediumpurple1", "1" = "mediumpurple4"))
Note also that you can adjust the position of the histogram bars if you need to, by assigning geom_histogram(position = ...), where you could fill in with something such as "dodge" if that's what you'd like.
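As a rough sketch of the position argument mentioned above, the fill version can be drawn with the two groups side by side. The legend title "expl" is an assumption chosen for illustration.

```r
library(ggplot2)

set.seed(20)
df <- data.frame(expl = rbinom(n = 100, size = 1, prob = 0.08),
                 resp = sample(50:100, size = 100, replace = TRUE))

# position = "dodge" places the bars for expl = 0 and expl = 1 next to
# each other within each bin instead of stacking them
p <- ggplot(df, aes(x = resp, fill = as.factor(expl))) +
  geom_histogram(bins = 10, position = "dodge") +
  scale_fill_manual(name = "expl",
                    values = c("0" = "mediumpurple1", "1" = "mediumpurple4"))
p
```

With position = "identity" the bars would overlap, which is closer to the layered look of the original plot.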
If you want a legend on the alpha value, the idea is to include it as an aesthetic rather than as a direct argument as you tried. In order to do this, a simple solution is to enrich the data frame used by ggplot:
df2 <- rbind(
cbind(df, filter="all lines"),
cbind(subset(df, expl == '1'), filter="expl==1")
)
df2 corresponds to df after appending the rows from your subset of interest, with a field filter indicating which copy each record comes from.
Then this solves your problem:
ggplot(df2, aes(resp, alpha=filter)) +
geom_histogram(fill="#BEBADA", bins=10, position="identity") +
scale_alpha_discrete(range=c(.5,1))

ggplot not respecting order of colours in scale_fill_manual()?

I am creating some violin plots and want to colour them. It works for a matrix of dimension 9, as in my previous question
ggplot violin plot, specify different colours by group?
but when I increase the dimension to 15, the order of the colours is not respected. Why is this happening?
Here it works (9 columns):
library(ggplot2)
dat <- matrix(rnorm(250*9),ncol=9)
# Violin plots for columns
mat <- reshape2::melt(data.frame(dat), id.vars = NULL)
mat$variable_grouping <- as.character(sort(rep(1:9,250)))
pp <- ggplot(mat, aes(x = variable, y = value, fill = variable_grouping)) +
geom_violin(scale="width",adjust = 1,width = 0.5) + scale_fill_manual(values=rep(c("red","green","blue"),3))
pp
Here it does not work (15 columns):
library(ggplot2)
dat <- matrix(rnorm(250*15),ncol=15)
# Violin plots for columns
mat <- reshape2::melt(data.frame(dat), id.vars = NULL)
mat$variable_grouping <- as.character(sort(rep(1:15,250)))
pp <- ggplot(mat, aes(x = variable, y = value, fill = variable_grouping)) +
geom_violin(scale="width",adjust = 1,width = 0.5) + scale_fill_manual(values=rep(c("red","green","blue"),5))
pp
This is related to how factor levels are set. Since variable_grouping is a character vector, ggplot2 converts it to a factor for plotting. The default factor order sorts the values as strings, so "10" comes before "2". In your example, 10 through 15 therefore all come before 2 in the legend, and the colours are assigned in that order.
You can manually set the factor order to avoid the default order. I use forcats::fct_inorder() for this because it's convenient in this case where you want the order of the factor to match the order of the variable. Note you can also use factor() directly and set the level order via the levels argument.
ggplot(mat, aes(x = variable, y = value, fill = forcats::fct_inorder(variable_grouping))) +
geom_violin(scale="width", adjust = 1, width = 0.5) +
scale_fill_manual(values=rep(c("red","green","blue"),5))
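The factor() alternative mentioned above can be sketched in base R. The vector x is a small stand-in for variable_grouping (two repeats per value instead of 250), used only to show the two level orders.

```r
# Stand-in for variable_grouping: "1", "1", "2", "2", ..., "15", "15"
x <- as.character(sort(rep(1:15, 2)))

# Default: levels are sorted as strings, so "10" lands right after "1"
default_order <- levels(factor(x))

# Explicit levels: keep the order in which the values first appear
data_order <- levels(factor(x, levels = unique(x)))

print(default_order)
print(data_order)
```

Passing factor(variable_grouping, levels = unique(variable_grouping)) to fill should then behave like the fct_inorder() call.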
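You can also name the color vector, so that colors are matched to the values of the fill variable by name instead of by order. The names must match the values of variable_grouping. For example:
my_values <- rep(c("red","green","blue"),5)
names(my_values) <- as.character(1:15)
... +
scale_fill_manual(values=my_values)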

ggplot violin plot, specify different colours by group?

I have a matrix of 9 columns and I want to create a violin plot using ggplot2. I would like to have different colours for groups of three columns, basically increasing order of "grayness". How can I do this?
I have tried passing lists of colours to the "fill=" option, but it does not work. See my example below. At the moment it uses "gray80", but I want to be able to specify the colour for each violin plot so that I can set the colour for groups of 3.
library(ggplot2)
dat <- matrix(rnorm(100*9),ncol=9)
# Violin plots for columns
mat <- reshape2::melt(data.frame(dat), id.vars = NULL)
pp <- ggplot(mat, aes(x = variable, y = value)) + geom_violin(scale="width",adjust = 1,width = 0.5,fill = "gray80")
pp
We can add a new column, called variable_grouping to your data, and then specify fill in aes:
mat <- reshape2::melt(data.frame(dat), id.vars = NULL)
mat$variable_grouping <- ifelse(mat$variable %in% c('X1', 'X2', 'X3'), 'g1',
ifelse(mat$variable %in% c('X4','X5','X6'),
'g2', 'g3'))
ggplot(mat, aes(x = variable, y = value, fill = variable_grouping)) +
geom_violin(scale="width",adjust = 1,width = 0.5)
You can control the groupings using the ifelse statement. scale_fill_manual can be used to specify the different colors used to fill the violins.
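To get the increasing "grayness" asked for in the question, the groups could be mapped to gray shades with scale_fill_manual. A minimal sketch: the long data frame is built in base R here instead of with reshape2::melt to keep the example self-contained, and the three gray values are arbitrary choices.

```r
library(ggplot2)

set.seed(1)
dat <- matrix(rnorm(100 * 9), ncol = 9)

# Long format, equivalent in shape to the melted data: one row per value
mat <- data.frame(
  variable = factor(rep(paste0("X", 1:9), each = 100), levels = paste0("X", 1:9)),
  value = as.vector(dat)  # column-major, so the first 100 values are X1
)

# Columns 1-3 -> g1, 4-6 -> g2, 7-9 -> g3
mat$variable_grouping <- paste0("g", (as.integer(mat$variable) - 1) %/% 3 + 1)

p <- ggplot(mat, aes(x = variable, y = value, fill = variable_grouping)) +
  geom_violin(scale = "width", adjust = 1, width = 0.5) +
  scale_fill_manual(values = c(g1 = "gray90", g2 = "gray60", g3 = "gray30"))
p
```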

Coloring a geom_histogram by gradient

I'm trying to plot a geom_histogram where the bars are colored by a gradient.
This is what I'm trying to do:
library(ggplot2)
set.seed(1)
df <- data.frame(id=paste("ID",1:1000,sep="."),val=rnorm(1000),stringsAsFactors=F)
ggplot(df, aes_string(x = "val", y = "..count..+1", fill = "val")) +
geom_histogram(binwidth = 1, pad = TRUE) +
scale_y_log10() +
scale_fill_gradient2("val", low = "darkblue", high = "darkred")
But getting:
Any idea how to get it colored by the defined gradient?
Not sure you can fill by val because each bar of the histogram represents a collection of points.
You can, however, fill by categorical bins using cut. For example:
ggplot(df, aes(val, fill = cut(val, 100))) +
geom_histogram(show.legend = FALSE)
Just for completeness.
If you would like to select the gradient colors manually, here's what I suggest:
data:
library(ggplot2)
set.seed(1)
df <- data.frame(id=paste("ID",1:1000,sep="."),val=rnorm(1000),stringsAsFactors=F)
colors:
bins <- 10
cols <- c("darkblue","darkred")
colGradient <- colorRampPalette(cols)
cut.cols <- colGradient(bins)
cuts <- cut(df$val,bins)
names(cuts) <- sapply(cuts,function(t) cut.cols[which(as.character(t) == levels(cuts))])
plot:
ggplot(df,aes(val,fill=cut(val,bins))) +
geom_histogram(show.legend=FALSE) +
scale_color_manual(values=cut.cols,labels=levels(cuts)) +
scale_fill_manual(values=cut.cols,labels=levels(cuts))
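A slightly more compact variant of the same idea, offered as a sketch: name the palette by the cut levels with setNames(), so that scale_fill_manual matches colors to bins by name. The object name pal is introduced here for illustration only.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(val = rnorm(1000))

bins <- 10
cuts <- cut(df$val, bins)

# One color per bin, interpolated between the two end colors
cut.cols <- colorRampPalette(c("darkblue", "darkred"))(bins)

# Named palette: each cut level gets its own color
pal <- setNames(cut.cols, levels(cuts))

p <- ggplot(df, aes(val, fill = cut(val, bins))) +
  geom_histogram(show.legend = FALSE) +
  scale_fill_manual(values = pal)
p
```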
Instead of binning manually another option would be to make use of the bins computed by stat_bin by mapping ..x.. (or factor(..x..) in case of a discrete scale) or after_stat(x) on the fill aesthetic.
An issue with computing the bins manually is that we end up with multiple groups per bin for which the count has to be computed (even if the count is zero most of the time) and which get stacked on top of each other in the histogram. Especially, this gets problematic if one would add labels of counts to the histogram as can be seen in this post, because in that case one ends up with multiple labels per bin.
library(ggplot2)
set.seed(1)
df <- data.frame(id = paste("ID", 1:1000, sep = "."), val = rnorm(1000), stringsAsFactors = F)
ggplot(df, aes(x = val, y = ..count.. + 1, fill = ..x..)) +
geom_histogram(binwidth = .1, pad = TRUE) +
scale_y_log10() +
scale_fill_gradient2(name = "val", low = "darkblue", high = "darkred")
#> Warning: Duplicated aesthetics after name standardisation: pad
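For newer ggplot2 versions (3.3.0 or later, where after_stat() is available), the same mapping could be written with after_stat(x) as mentioned above. This sketch drops the log scale and the pad argument to keep it short, so it is not a drop-in replacement for the plot above.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(val = rnorm(1000))

# after_stat(x) refers to the bin centers computed by stat_bin,
# so every bar gets exactly one fill value
p <- ggplot(df, aes(x = val, fill = after_stat(x))) +
  geom_histogram(binwidth = 0.1) +
  scale_fill_gradient2(name = "val", low = "darkblue", high = "darkred")
p
```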

How can I integrate multiple distinct plots for file bar with a common label and legend?

I am trying to integrate multiple plots for file bar, where each file has two stacked bar plots. The plots are easily confused unless they are wrapped into one single grid, so I would like to improve the result by adding a common legend and common labels for the whole graph. I tried several ways to present the plots for each file more clearly, since putting them into one grid for file bar would be more elegant and make the output easier to understand. I am fairly new to ggplot2 and was confused by the answers to several similar posts on SO, so I couldn't produce my desired plot in the end. Can anyone give me an idea of how to improve the current plot? How can I add a common label and legend for multiple graphs?
reproducible data.frame :
Qualified <- list(
hotan = data.frame( begin=c(7,13,19,25,31,37,43,49,55,67,79,103,31,49,55,67),
end= c(10,16,22,28,34,40,46,52,58,70,82,106,34,52,58,70),
pos.score=c(11,19,8,2,6,14,25,10,23,28,15,17,6,10,23,28)),
aksu = data.frame( begin=c(12,21,30,39,48,57,66,84,111,30,48,66,84),
end= c(15,24,33,42,51,60,69,87,114,33,51,69,87),
pos.score=c(5,11,15,23,9,13,2,10,16,15,9,2,10)),
korla = data.frame( begin=c(6,14,22,30,38,46,54,62,70,78,6,30,46,70),
end=c(11,19,27,35,43,51,59,67,75,83,11,35,51,75),
pos.score=c(9,16,12,3,20,7,11,13,14,17,9,3,7,14))
)
unQualified <- list(
hotan = data.frame( begin=c(21,33,57,69,81,117,129,177,225,249,333,345,33,81,333),
end= c(26,38,62,74,86,122,134,182,230,254,338,350,38,86,338),
pos.score=c(7,34,29,14,23,20,11,30,19,17,6,4,34,23,6)),
aksu = data.frame( begin=c(13,23,33,43,53,63,73,93,113,123,143,153,183,33,63,143),
end= c(19,29,39,49,59,69,79,99,119,129,149,159,189,39,69,149),
pos.score=c(5,13,32,28,9,11,22,12,23,3,6,8,16,32,11,6)),
korla = data.frame( begin=c(23,34,45,56,67,78,89,122,133,144,166,188,56,89,144),
end=c(31,42,53,64,75,86,97,130,141,152,174,196,64,97,152),
pos.score=c(3,10,19,17,21,8,18,14,4,9,12,22,17,18,9))
)
I am categorizing the data and getting multiple plots in this way (mainly influenced by @Jake Kaupp's idea):
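library(dplyr) # bind_rows, mutate, arrange, distinct, %>%
library(tidyr) # separate
library(purrr) # map
library(ggplot2)
library(gridExtra) # arrangeGrob, grid.arrange
multi_plot <- function(x) {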
p1 <- ggplot(x, aes(x = group)) +
geom_bar(aes(fill = elm), color = "black")
p2 <- ggplot(distinct(x), aes(x = elm)) +
geom_bar(aes(fill = group), color = "black")
arrangeGrob(p1, p2,nrow = 1, top = unique(x$list))
}
singleDF <-
bind_rows(c(Qualified = Qualified, Unqualified = unQualified), .id = "id") %>%
tidyr::separate(id, c("group", "list")) %>%
mutate(elm = ifelse(pos.score >= 10, "valid", "invalid")) %>%
arrange(list, group, desc(elm))
plot_data <- singleDF %>%
split(.$list) %>%
map(~multi_plot(.x))
grid.arrange(grobs = plot_data, nrow = 1)
I am trying to integrate multiple plots for file bar with common labels and a common legend position. For the common labels, I intend to call the X axis "sample" and the Y axis "observation"; for the common legend position, I intend to show the legend at the right side of the plot (only four common legend entries).
EDIT:
In my desired output plot, the stacked bar plots of group and elm must be put in one single grid for file bar. For the whole graph, a common label and legend are desired.
How can I achieve my desired output? What changes are needed in the original implementation? Sorry for this simple question. Thanks in advance.
combinedDF <-
bind_rows(mutate(singleDF, x = group, fill = elm),
mutate(singleDF, x = elm, fill = group) %>% distinct()) %>%
mutate(x = factor(x, levels = c('invalid', 'valid', 'Unqualified', 'Qualified')),
fill = factor(fill, levels = c('invalid', 'valid', 'Unqualified', 'Qualified')))
ggplot(combinedDF, aes(x = x, fill = fill)) +
geom_bar() +
geom_text(aes(label = ..count..), stat = 'count', position = 'stack') +
facet_grid(~list)
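The common axis labels and legend position requested in the question could be added to a faceted plot like the one above with labs() and theme(). A minimal sketch: toy is a hypothetical stand-in for combinedDF with made-up rows, used only so the example runs on its own.

```r
library(ggplot2)

# Hypothetical toy data standing in for combinedDF (values are made up)
toy <- data.frame(
  list = rep(c("aksu", "hotan"), each = 6),
  x    = rep(c("Qualified", "Qualified", "Unqualified",
               "valid", "valid", "invalid"), times = 2),
  fill = rep(c("valid", "invalid", "valid",
               "Qualified", "Unqualified", "Unqualified"), times = 2)
)

p <- ggplot(toy, aes(x = x, fill = fill)) +
  geom_bar() +
  facet_grid(~list) +
  labs(x = "sample", y = "observation", fill = NULL) +  # shared axis labels
  theme(legend.position = "right")                      # one legend for all panels
p
```

Because all panels come from one ggplot call, the axis titles and the legend are shared automatically, which is harder to get with separate grobs arranged by grid.arrange.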
