I've made this multiple histogram plot in ggplot and now I want to add a legend for both the light purple part and the dark purple part. I know the conventional way is to do it with aes, but I can't figure out how to integrate that into my multiple histogram plot.
I don't shy away from manual labour, but more sophisticated solutions are preferred. Can anyone help me out?
library(ggplot2)
#dataframe
set.seed(20)
df <- data.frame(expl = rbinom(n=100, size = 1, prob=0.08),
resp = sample(50:100, size = 100, replace = T))
#graph
graph <- ggplot(data = df, aes(x = resp))
graph +
geom_histogram(fill = "#BEBADA", alpha = 0.5, bins = 10) +
geom_histogram(data = subset(df, expl == '1'), fill = "#BEBADA", bins = 10)
Your data is already in the long format that is well suited for ggplot; you just need to map expl to alpha. In general, if you find yourself making multiples of the same geom, you probably want to rethink either the shape of your data or your approach for feeding it into geoms.
library(tidyverse)
set.seed(20)
df <- data.frame(expl = rbinom(n=100, size = 1, prob=0.08),
resp = sample(50:100, size = 100, replace = T))
To map expl onto alpha, make it a factor, and then assign that to alpha inside your aes. Then you can set the alpha scale to values of 0.5 and 1.
ggplot(df, aes(x = resp, alpha = as.factor(expl))) +
geom_histogram(fill = "#bebada", bins = 10) +
scale_alpha_manual(values = c(0.5, 1))
However, differentiating by alpha is a little awkward. You could instead map to fill and use light and dark purples:
ggplot(df, aes(x = resp, fill = as.factor(expl))) +
geom_histogram(bins = 10) +
scale_fill_manual(values = c("0" = "mediumpurple1", "1" = "mediumpurple4"))
Note also that you can adjust the position of the histogram bars if you need to, by assigning geom_histogram(position = ...), where you could fill in with something such as "dodge" if that's what you'd like.
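As a minimal sketch of the position argument mentioned above (assuming the same df as in the question, and reusing the fill mapping from the previous snippet), dodging puts the two groups side by side instead of stacking them. The legend title set through name is just an illustrative choice:

```r
library(ggplot2)

set.seed(20)
df <- data.frame(expl = rbinom(n = 100, size = 1, prob = 0.08),
                 resp = sample(50:100, size = 100, replace = TRUE))

# Same fill mapping as above, but with the bars of each group placed
# next to each other within a bin rather than stacked.
p <- ggplot(df, aes(x = resp, fill = as.factor(expl))) +
  geom_histogram(bins = 10, position = "dodge") +
  scale_fill_manual(name = "expl",
                    values = c("0" = "mediumpurple1", "1" = "mediumpurple4"))
p
```

With only a handful of expl == 1 rows, the dodged dark bars will be short; whether that reads better than stacking is a matter of taste.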
If you want a legend on the alpha value, the idea is to include it as an aesthetic rather than as a direct argument as you tried. In order to do this, a simple solution is to enrich the data frame used by ggplot:
df2 <- rbind(
cbind(df, filter="all lines"),
cbind(subset(df, expl == '1'), filter="expl==1")
)
df2 corresponds to df after appending the rows from your subset of interest (with a field filter recording which copy each record comes from).
Then this solves your problem:
ggplot(df2, aes(resp, alpha=filter)) +
geom_histogram(fill="#BEBADA", bins=10, position="identity") +
scale_alpha_discrete(range=c(.5,1))
I am creating some violin plots and want to colour them. It works for a matrix of dimension 9, as in my previous question
ggplot violin plot, specify different colours by group?
but when I increase the dimension to 15, the order of the colours is not respected. Why is this happening?
Here it works (9 columns):
library(ggplot2)
dat <- matrix(rnorm(250*9),ncol=9)
# Violin plots for columns
mat <- reshape2::melt(data.frame(dat), id.vars = NULL)
mat$variable_grouping <- as.character(sort(rep(1:9,250)))
pp <- ggplot(mat, aes(x = variable, y = value, fill = variable_grouping)) +
geom_violin(scale="width",adjust = 1,width = 0.5) + scale_fill_manual(values=rep(c("red","green","blue"),3))
pp
Here it does not work (15 columns):
library(ggplot2)
dat <- matrix(rnorm(250*15),ncol=15)
# Violin plots for columns
mat <- reshape2::melt(data.frame(dat), id.vars = NULL)
mat$variable_grouping <- as.character(sort(rep(1:15,250)))
pp <- ggplot(mat, aes(x = variable, y = value, fill = variable_grouping)) +
geom_violin(scale="width",adjust = 1,width = 0.5) + scale_fill_manual(values=rep(c("red","green","blue"),5))
pp
This is related to setting factor levels. Since variable_grouping is a character, ggplot2 converts it to a factor for plotting. It uses the default factor order, which sorts the strings alphabetically, so anything starting with 1 comes before 2. In your example, 10-15 all come before 2 in the legend.
You can manually set the factor order to avoid the default order. I use forcats::fct_inorder() for this because it's convenient in this case where you want the order of the factor to match the order of the variable. Note you can also use factor() directly and set the level order via the levels argument.
ggplot(mat, aes(x = variable, y = value, fill = forcats::fct_inorder(variable_grouping))) +
geom_violin(scale="width", adjust = 1, width = 0.5) +
scale_fill_manual(values=rep(c("red","green","blue"),5))
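The factor() alternative mentioned above can be sketched like this. It is only an illustration on a small stand-in vector (x, f_default and f_ordered are names made up for the example), not the full violin data:

```r
# Stand-in for mat$variable_grouping: the numbers 1..15 stored as text
x <- as.character(rep(1:15, each = 2))

# Default: levels are the sorted unique strings, so "10" sorts before "2"
f_default <- factor(x)
levels(f_default)

# Explicit levels: keeps the numeric order 1, 2, ..., 15
f_ordered <- factor(x, levels = as.character(1:15))
levels(f_ordered)
```

Assigning the second form back to mat$variable_grouping before plotting should have the same effect as forcats::fct_inorder() here, since the data are already in numeric order.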
You can also name the color vector. For example:
my_values <- rep(c("red","green","blue"),5)
# the names must match the values of the fill variable (here "1" to "15")
names(my_values) <- as.character(1:15)
... +
scale_fill_manual(values=my_values)
I'm trying to plot a geom_histogram where the bars are colored by a gradient.
This is what I'm trying to do:
library(ggplot2)
set.seed(1)
df <- data.frame(id=paste("ID",1:1000,sep="."),val=rnorm(1000),stringsAsFactors=F)
ggplot(df,aes_string(x="val",y="..count..+1",fill="val"))+geom_histogram(binwidth=1,pad=TRUE)+scale_y_log10()+scale_fill_gradient2("val",low="darkblue",high="darkred")
But getting:
Any idea how to get it colored by the defined gradient?
Not sure you can fill by val because each bar of the histogram represents a collection of points.
You can, however, fill by categorical bins using cut. For example:
ggplot(df, aes(val, fill = cut(val, 100))) +
geom_histogram(show.legend = FALSE)
Just for completeness.
If you'd like to select the colors of the gradient manually, here's what I suggest:
data:
library(ggplot2)
set.seed(1)
df <- data.frame(id=paste("ID",1:1000,sep="."),val=rnorm(1000),stringsAsFactors=F)
colors:
bins <- 10
cols <- c("darkblue","darkred")
colGradient <- colorRampPalette(cols)
cut.cols <- colGradient(bins)
cuts <- cut(df$val,bins)
names(cuts) <- sapply(cuts,function(t) cut.cols[which(as.character(t) == levels(cuts))])
plot:
ggplot(df,aes(val,fill=cut(val,bins))) +
geom_histogram(show.legend=FALSE) +
scale_color_manual(values=cut.cols,labels=levels(cuts)) +
scale_fill_manual(values=cut.cols,labels=levels(cuts))
Instead of binning manually another option would be to make use of the bins computed by stat_bin by mapping ..x.. (or factor(..x..) in case of a discrete scale) or after_stat(x) on the fill aesthetic.
An issue with computing the bins manually is that we end up with multiple groups per bin for which the count has to be computed (even if the count is zero most of the time) and which get stacked on top of each other in the histogram. Especially, this gets problematic if one would add labels of counts to the histogram as can be seen in this post, because in that case one ends up with multiple labels per bin.
library(ggplot2)
set.seed(1)
df <- data.frame(id = paste("ID", 1:1000, sep = "."), val = rnorm(1000), stringsAsFactors = F)
ggplot(df, aes(x = val, y = ..count.. + 1, fill = ..x..)) +
geom_histogram(binwidth = .1, pad = TRUE) +
scale_y_log10() +
scale_fill_gradient2(name = "val", low = "darkblue", high = "darkred")
#> Warning: Duplicated aesthetics after name standardisation: pad
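For reference, here is a sketch of the same idea written with after_stat(), which the paragraph above names as the alternative to the ..x.. notation. It assumes ggplot2 3.3.0 or later and leaves out the pad argument; otherwise it follows the code above:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(val = rnorm(1000))

# after_stat(x) is the bin centre computed by stat_bin, so each bar
# gets a single fill value and there is exactly one group per bin.
p <- ggplot(df, aes(x = val,
                    y = after_stat(count + 1),
                    fill = after_stat(x))) +
  geom_histogram(binwidth = 0.1) +
  scale_y_log10() +
  scale_fill_gradient2(name = "val", low = "darkblue", high = "darkred")
p
```

Because the fill is continuous, scale_fill_gradient2() applies directly and no manual palette is needed.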
I am trying to combine multiple plots for each file, where each file has two stacked bar plots; the plots are easily confused unless they are wrapped into one single grid. I would like to improve the result by adding a common legend and common labels for the whole graph. I tried several ways to arrange the plots for each file more clearly, since putting them into one grid per file would be more elegant and would make the output easier to understand. The answers to several similar posts on SO confused me and, being a bit new to ggplot2, I couldn't produce my desired plot in the end. Can anyone give me an idea of how to improve the current plot? How can I add common labels and a common legend for multiple graphs?
reproducible data.frame :
Qualified <- list(
hotan = data.frame( begin=c(7,13,19,25,31,37,43,49,55,67,79,103,31,49,55,67),
end= c(10,16,22,28,34,40,46,52,58,70,82,106,34,52,58,70),
pos.score=c(11,19,8,2,6,14,25,10,23,28,15,17,6,10,23,28)),
aksu = data.frame( begin=c(12,21,30,39,48,57,66,84,111,30,48,66,84),
end= c(15,24,33,42,51,60,69,87,114,33,51,69,87),
pos.score=c(5,11,15,23,9,13,2,10,16,15,9,2,10)),
korla = data.frame( begin=c(6,14,22,30,38,46,54,62,70,78,6,30,46,70),
end=c(11,19,27,35,43,51,59,67,75,83,11,35,51,75),
pos.score=c(9,16,12,3,20,7,11,13,14,17,9,3,7,14))
)
unQualified <- list(
hotan = data.frame( begin=c(21,33,57,69,81,117,129,177,225,249,333,345,33,81,333),
end= c(26,38,62,74,86,122,134,182,230,254,338,350,38,86,338),
pos.score=c(7,34,29,14,23,20,11,30,19,17,6,4,34,23,6)),
aksu = data.frame( begin=c(13,23,33,43,53,63,73,93,113,123,143,153,183,33,63,143),
end= c(19,29,39,49,59,69,79,99,119,129,149,159,189,39,69,149),
pos.score=c(5,13,32,28,9,11,22,12,23,3,6,8,16,32,11,6)),
korla = data.frame( begin=c(23,34,45,56,67,78,89,122,133,144,166,188,56,89,144),
end=c(31,42,53,64,75,86,97,130,141,152,174,196,64,97,152),
pos.score=c(3,10,19,17,21,8,18,14,4,9,12,22,17,18,9))
)
I am categorizing the data and getting multiple plots in this way (mainly influenced by @Jake Kaupp's idea):
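library(tidyverse)  # ggplot2, dplyr, tidyr, purrr
library(gridExtra)  # arrangeGrob(), grid.arrange()
multi_plot <- function(x) {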
p1 <- ggplot(x, aes(x = group)) +
geom_bar(aes(fill = elm), color = "black")
p2 <- ggplot(distinct(x), aes(x = elm)) +
geom_bar(aes(fill = group), color = "black")
arrangeGrob(p1, p2,nrow = 1, top = unique(x$list))
}
singleDF <-
bind_rows(c(Qualified = Qualified, Unqualified = unQualified), .id = "id") %>%
tidyr::separate(id, c("group", "list")) %>%
mutate(elm = ifelse(pos.score >= 10, "valid", "invalid")) %>%
arrange(list, group, desc(elm))
plot_data <- singleDF %>%
split(.$list) %>%
map(~multi_plot(.x))
grid.arrange(grobs = plot_data, nrow = 1)
I am trying to combine the multiple plots for each file with common labels and a common legend position. For the common labels, I intend to label the X axis "sample" and the Y axis "observation"; for the common legend, I intend to place it at the right side of the plot (only four common legend entries).
EDIT:
In my desired output plot, the stacked bar plots of group and elm must be put in one single grid for each file. For the whole graph, common labels and a common legend are desired.
How can I achieve my desired output? What has to change in the original implementation? Sorry for this simple question on SO. Thanks in advance.
combinedDF <-
bind_rows(mutate(singleDF, x = group, fill = elm),
mutate(singleDF, x = elm, fill = group) %>% distinct()) %>%
mutate(x = factor(x, levels = c('invalid', 'valid', 'Unqualified', 'Qualified')),
fill = factor(fill, levels = c('invalid', 'valid', 'Unqualified', 'Qualified')))
ggplot(combinedDF, aes(x = x, fill = fill)) +
geom_bar() +
geom_text(aes(label = ..count..), stat = 'count', position = 'stack') +
facet_grid(~list)
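The common axis labels and legend position asked for in the question could be added to a faceted plot like the one above roughly as follows. This is only a sketch on a made-up toy data frame (toy is hypothetical and merely mimics the columns of combinedDF); the label texts "sample" and "observation" are taken from the question:

```r
library(ggplot2)

# Hypothetical stand-in with the same columns as combinedDF
toy <- data.frame(
  list = rep(c("aksu", "hotan", "korla"), each = 4),
  x    = rep(c("invalid", "valid", "Unqualified", "Qualified"), times = 3),
  fill = rep(c("Qualified", "Unqualified", "valid", "invalid"), times = 3)
)

# One faceted plot already shares its axes and legend across panels;
# labs() names the axes once and theme() puts the legend on the right.
p <- ggplot(toy, aes(x = x, fill = fill)) +
  geom_bar() +
  facet_grid(~list) +
  labs(x = "sample", y = "observation", fill = NULL) +
  theme(legend.position = "right")
p
```

Since facet_grid() draws a single plot, there is no need to collect legends from separate grobs as there would be with grid.arrange().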