I am creating some violin plots and want to colour them. It works for a matrix with 9 columns, as in my previous question, "ggplot violin plot, specify different colours by group?", but when I increase the number of columns to 15, the order of the colours is not respected. Why is this happening?
Here it works (9 columns):
library(ggplot2)
dat <- matrix(rnorm(250*9),ncol=9)
# Violin plots for columns
mat <- reshape2::melt(data.frame(dat), id.vars = NULL)
mat$variable_grouping <- as.character(sort(rep(1:9,250)))
pp <- ggplot(mat, aes(x = variable, y = value, fill = variable_grouping)) +
geom_violin(scale="width",adjust = 1,width = 0.5) + scale_fill_manual(values=rep(c("red","green","blue"),3))
pp
Here it does not work (15 columns):
library(ggplot2)
dat <- matrix(rnorm(250*15),ncol=15)
# Violin plots for columns
mat <- reshape2::melt(data.frame(dat), id.vars = NULL)
mat$variable_grouping <- as.character(sort(rep(1:15,250)))
pp <- ggplot(mat, aes(x = variable, y = value, fill = variable_grouping)) +
geom_violin(scale="width",adjust = 1,width = 0.5) + scale_fill_manual(values=rep(c("red","green","blue"),5))
pp
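This is related to how factor levels are set. Since variable_grouping is a character vector, ggplot2 converts it to a factor for plotting. It uses the default factor order, which sorts the strings alphabetically, so "1" is followed by "10", "11", ..., "15" and only then by "2". In your example, 10-15 therefore all come before 2 in the legend, and the colours are assigned in that order.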
You can manually set the factor order to avoid the default order. I use forcats::fct_inorder() for this because it's convenient in this case where you want the order of the factor to match the order of the variable. Note you can also use factor() directly and set the level order via the levels argument.
ggplot(mat, aes(x = variable, y = value, fill = forcats::fct_inorder(variable_grouping))) +
geom_violin(scale="width", adjust = 1, width = 0.5) +
scale_fill_manual(values=rep(c("red","green","blue"),5))
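The ordering issue can be seen without plotting. The following is a minimal base-R sketch (the variable names are illustrative, not from the question) that compares the default level order of a character vector with the order of first appearance, which is what forcats::fct_inorder() produces:

```r
# Illustrative grouping labels "1".."15", stored as character as in the question
grouping <- as.character(rep(1:15, each = 2))

# Default: levels are the sorted unique strings, so "10" sorts before "2"
default_levels <- levels(factor(grouping))

# Levels in order of first appearance (what fct_inorder() would give here)
inorder_levels <- levels(factor(grouping, levels = unique(grouping)))

print(default_levels)
print(inorder_levels)
```

Because scale_fill_manual() assigns unnamed colours to levels by position, the two level orders lead to different colour assignments.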
You can also name the color vector, so that colours are matched to the levels of the fill variable by name rather than by position. For example:
my_values <- rep(c("red","green","blue"),5)
names(my_values) <- rep(c("Data1","Data2","Data3"),5)
... +
scale_fill_manual(values=my_values)
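For the 15-column data in the question, a named vector could be built as in the sketch below. It assumes the fill variable takes the values "1" to "15", as variable_grouping does; fill_levels is an illustrative name.

```r
# Assumed fill values: the character labels "1".."15" used in the question
fill_levels <- as.character(1:15)

# Name each colour after the level it should apply to
my_values <- setNames(rep(c("red", "green", "blue"), 5), fill_levels)

# Colours are now looked up by name, independent of the factor's level order
print(my_values[c("1", "2", "10")])
```

Passing this vector as scale_fill_manual(values = my_values) should then give level "10" the fourth repeat of red regardless of how the levels are sorted.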
The grouping variable for creating a geom_violin() plot in ggplot2 is expected to be discrete for obvious reasons. However my discrete values are numbers, and I would like to show them on a continuous scale so that I can overlay a continuous function of those numbers on top of the violins. Toy example:
library(tidyverse)
df <- tibble(x = sample(c(1,2,5), size = 1000, replace = T),
y = rnorm(1000, mean = x))
ggplot(df) + geom_violin(aes(x=factor(x), y=y))
This works as you'd imagine: violins with their x axis values (equally spaced) labelled 1, 2, and 5, with their means at y=1,2,5 respectively. I want to overlay a continuous function such as y=x, passing through the means. Is that possible? Adding + scale_x_continuous() predictably gives Error: Discrete value supplied to continuous scale. A solution would presumably spread the violins horizontally by the numeric x values, i.e. three times the spacing between 2 and 5 as between 1 and 2, but that is not the only thing I'm trying to achieve - overlaying a continuous function is the key issue.
If this isn't possible, alternative visualisation suggestions are welcome. I know I could replace violins with a simple scatter plot to give a rough sense of density as a function of y for a given x.
The functionality to plot violin plots on a continuous scale is directly built into ggplot.
The key is to keep the original continuous variable (instead of transforming it into a factor variable) and specify how to group it within the aesthetic mapping of geom_violin(). The width of the groups can be modified with the width argument of cut_width(), depending on the data at hand.
library(tidyverse)
df <- tibble(x = sample(c(1,2,5), size = 1000, replace = T),
y = rnorm(1000, mean = x))
ggplot(df, aes(x=x, y=y)) +
geom_violin(aes(group = cut_width(x, 1)), scale = "width") +
geom_smooth(method = 'lm')
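To see which groups cut_width() creates for the toy data, a small sketch along these lines may help. The short vector x is illustrative, and the call is guarded in case ggplot2 is not installed.

```r
# A few illustrative x values taken from the set used in the toy example
x <- c(1, 1, 2, 5, 5, 5)

# cut_width() bins a numeric vector into intervals of the given width
groups <- if (requireNamespace("ggplot2", quietly = TRUE)) {
  ggplot2::cut_width(x, width = 1)
} else {
  NULL
}

# Each distinct x value falls into its own bin; bins in between stay empty
if (!is.null(groups)) print(table(groups))
```

With a width of 1, each of the values 1, 2 and 5 ends up in a separate group, so each gets its own violin at its numeric position.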
By using this approach, all geoms for continuous data and their varying functionalities can be combined with the violin plots, e.g. we could easily replace the line with a loess curve and add a scatter plot of the points.
ggplot(df, aes(x=x, y=y)) +
geom_violin(aes(group = cut_width(x, 1)), scale = "width") +
geom_smooth(method = 'loess') +
geom_point()
More examples can be found in the ggplot helpfile for violin plots.
Try this. As you already guessed, spreading the violins by numeric values is the key to the solution. To this end I expand the df to include all x values in the interval min(x) to max(x) and use scale_x_discrete(drop = FALSE) so that all values are displayed.
Note: Thanks @ChrisW for the more general example of my approach.
library(tidyverse)
set.seed(42)
df <- tibble(x = sample(c(1,2,5), size = 1000, replace = T), y = rnorm(1000, mean = x^2))
# y = x^2
# add missing x values
x.range <- seq(from=min(df$x), to=max(df$x))
df <- df %>% right_join(tibble(x = x.range))
#> Joining, by = "x"
# Whatever the desired continuous function is:
df.fit <- tibble(x = x.range, y=x^2) %>%
mutate(x = factor(x))
ggplot() +
geom_violin(data=df, aes(x = factor(x, levels = 1:5), y=y)) +
geom_line(data=df.fit, aes(x, y, group=1), color = "red") +
scale_x_discrete(drop = FALSE)
#> Warning: Removed 2 rows containing non-finite values (stat_ydensity).
Created on 2020-06-11 by the reprex package (v0.3.0)
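The role of the explicit levels can be checked in base R. This sketch (with an illustrative vector x) shows that factor(x, levels = 1:5) keeps the levels 3 and 4 even though no observation has those values; scale_x_discrete(drop = FALSE) asks the scale to keep such levels on the axis.

```r
# Illustrative x values: only 1, 2 and 5 occur
x <- c(1, 2, 5, 5, 2)

# Declaring all levels up front keeps 3 and 4 as (empty) levels
x_factor <- factor(x, levels = 1:5)

print(levels(x_factor))
print(table(x_factor))
```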
I have a matrix of 9 columns and I want to create a violin plot using ggplot2. I would like to have different colours for groups of three columns, basically increasing order of "grayness". How can I do this?
I have tried passing lists of colours to the "fill=" option, but it does not work. See my example below. At the moment, it indicates "gray80", but I want to be able to specify the colour for each violin plot, in order to be able to specify the colour for groups of 3.
library(ggplot2)
dat <- matrix(rnorm(100*9),ncol=9)
# Violin plots for columns
mat <- reshape2::melt(data.frame(dat), id.vars = NULL)
pp <- ggplot(mat, aes(x = variable, y = value)) + geom_violin(scale="width",adjust = 1,width = 0.5,fill = "gray80")
pp
We can add a new column, called variable_grouping to your data, and then specify fill in aes:
mat <- reshape2::melt(data.frame(dat), id.vars = NULL)
mat$variable_grouping <- ifelse(mat$variable %in% c('X1', 'X2', 'X3'), 'g1',
ifelse(mat$variable %in% c('X4','X5','X6'),
'g2', 'g3'))
ggplot(mat, aes(x = variable, y = value, fill = variable_grouping)) +
geom_violin(scale="width",adjust = 1,width = 0.5)
You can control the groupings using the ifelse statement. scale_fill_manual can be used to specify the different colors used to fill the violins.
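As a sketch of how the grouping and the grey shades could be set up for the 9 columns, assuming the melted column names X1 to X9 from the question (the grey values and helper names are illustrative choices):

```r
# Column names as produced by melting data.frame(dat) with 9 columns
variable <- factor(paste0("X", 1:9), levels = paste0("X", 1:9))

# Integer division puts every three consecutive columns in one group
group_index <- (as.integer(variable) - 1) %/% 3 + 1
variable_grouping <- paste0("g", group_index)

# Illustrative greys, from light to dark, named after the groups
grey_values <- c(g1 = "gray90", g2 = "gray60", g3 = "gray30")

print(data.frame(variable, variable_grouping,
                 fill = unname(grey_values[variable_grouping])))
```

The same arithmetic can be applied to mat$variable, and adding scale_fill_manual(values = grey_values) to the plot should then fill each group of three violins with its shade.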
I'm trying to plot a geom_histogram where the bars are colored by a gradient.
This is what I'm trying to do:
library(ggplot2)
set.seed(1)
df <- data.frame(id=paste("ID",1:1000,sep="."),val=rnorm(1000),stringsAsFactors=F)
ggplot(df,aes_string(x="val",y="..count..+1",fill="val"))+geom_histogram(binwidth=1,pad=TRUE)+scale_y_log10()+scale_fill_gradient2("val",low="darkblue",high="darkred")
But getting:
Any idea how to get it colored by the defined gradient?
Not sure you can fill by val because each bar of the histogram represents a collection of points.
You can, however, fill by categorical bins using cut. For example:
ggplot(df, aes(val, fill = cut(val, 100))) +
geom_histogram(show.legend = FALSE)
Just for completeness.
If you want to choose the colours that define the gradient manually, here is what I suggest:
data:
library(ggplot2)
set.seed(1)
df <- data.frame(id=paste("ID",1:1000,sep="."),val=rnorm(1000),stringsAsFactors=F)
colors:
bins <- 10
cols <- c("darkblue","darkred")
colGradient <- colorRampPalette(cols)
cut.cols <- colGradient(bins)
cuts <- cut(df$val,bins)
names(cuts) <- sapply(cuts,function(t) cut.cols[which(as.character(t) == levels(cuts))])
plot:
ggplot(df,aes(val,fill=cut(val,bins))) +
geom_histogram(show.legend=FALSE) +
scale_color_manual(values=cut.cols,labels=levels(cuts)) +
scale_fill_manual(values=cut.cols,labels=levels(cuts))
Instead of binning manually another option would be to make use of the bins computed by stat_bin by mapping ..x.. (or factor(..x..) in case of a discrete scale) or after_stat(x) on the fill aesthetic.
An issue with computing the bins manually is that we end up with multiple groups per bin for which the count has to be computed (even if the count is zero most of the time) and which get stacked on top of each other in the histogram. Especially, this gets problematic if one would add labels of counts to the histogram as can be seen in this post, because in that case one ends up with multiple labels per bin.
library(ggplot2)
set.seed(1)
df <- data.frame(id = paste("ID", 1:1000, sep = "."), val = rnorm(1000), stringsAsFactors = F)
ggplot(df, aes(x = val, y = ..count.. + 1, fill = ..x..)) +
geom_histogram(binwidth = .1, pad = TRUE) +
scale_y_log10() +
scale_fill_gradient2(name = "val", low = "darkblue", high = "darkred")
#> Warning: Duplicated aesthetics after name standardisation: pad
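The same idea can be written with after_stat(), which replaces the ..x.. notation in newer versions of ggplot2 (3.3.0 or later is assumed here). This is a simplified sketch that leaves out the log scale and the pad argument:

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  set.seed(1)
  df <- data.frame(val = rnorm(1000))

  # after_stat(x) is the bin centre computed by stat_bin, one value per bar
  p <- ggplot(df, aes(x = val, fill = after_stat(x))) +
    geom_histogram(binwidth = 0.1) +
    scale_fill_gradient2(name = "val", low = "darkblue", high = "darkred")
  print(p)
}
```

Since the fill is computed per bin rather than per observation, each bar gets exactly one colour and nothing is stacked.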
I have a large number of data series that I want to plot using small multiples. A combination of ggplot2 and facet_wrap does what I want, typically resulting in a nice little block of 6 x 6 facets. Here's a simpler version:
The problem is that I don't have adequate control over the labels in facet strips. The names of the columns in the data frame are short and I want to keep them that way, but I want the labels in the facets to be more descriptive. I can use facet_grid so that I can take advantage of the labeller function but then there seems to be no straightforward way to specify the number of columns and a long row of facets just doesn't work for this particular task. Am I missing something obvious?
Q. How can I change the facet labels when using facet_wrap without changing the column names? Alternatively, how can I specify the number of columns and rows when using facet_grid?
Code for a simplified example follows. In real life I am dealing with multiple groups each containing dozens of data series, each of which changes frequently, so any solution would have to be automated rather than relying on manually assigning values.
require(ggplot2)
require(reshape)
# Random data with short column names
set.seed(123)
myrows <- 30
mydf <- data.frame(date = seq(as.Date('2012-01-01'), by = "day", length.out = myrows),
aa = runif(myrows, min=1, max=2),
bb = runif(myrows, min=1, max=2),
cc = runif(myrows, min=1, max=2),
dd = runif(myrows, min=1, max=2),
ee = runif(myrows, min=1, max=2),
ff = runif(myrows, min=1, max=2))
# Plot using facet wrap - we want to specify the columns
# and the rows and this works just fine, we have a little block
# of 2 columns and 3 rows
mydf <- melt(mydf, id = c('date'))
p1 <- ggplot(mydf, aes(y = value, x = date, group = variable)) +
geom_line() +
facet_wrap( ~ variable, ncol = 2)
print (p1)
# Problem: we want more descriptive labels without changing column names.
# We can change the labels, but doing so requires us to
# switch from facet_wrap to facet_grid
# However, in facet_grid we can't specify the columns and rows...
mf_labeller <- function(var, value){ # lifted bodily from the R Cookbook
value <- as.character(value)
if (var=="variable") {
value[value=="aa"] <- "A long label"
value[value=="bb"] <- "B Partners"
value[value=="cc"] <- "CC Inc."
value[value=="dd"] <- "DD Company"
value[value=="ee"] <- "Eeeeeek!"
value[value=="ff"] <- "Final"
}
return(value)
}
p2 <- ggplot(mydf, aes(y = value, x = date, group = variable)) +
geom_line() +
facet_grid( ~ variable, labeller = mf_labeller)
print (p2)
I don't quite understand. You've already written a function that converts your short labels to long, descriptive labels. What is wrong with simply adding a new column and using facet_wrap on that column instead?
mydf <- melt(mydf, id = c('date'))
mydf$variableLab <- mf_labeller('variable',mydf$variable)
p1 <- ggplot(mydf, aes(y = value, x = date, group = variable)) +
geom_line() +
facet_wrap( ~ variableLab, ncol = 2)
print (p1)
To change the label names, just change the factor levels of the factor you use in facet_wrap. These will be used in facet_wrap on the strips. You can use a similar setup as you would using the labeller function in facet_grid. Just do something like:
new_labels = sapply(levels(df$factor_variable), custom_labeller_function)
levels(df$factor_variable) = new_labels  # rename the levels; factor(x, levels = new_labels) would yield NAs
Now you can use factor_variable in facet_wrap.
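A base-R sketch of the relabelling step, using a lookup vector instead of a labeller function (facet_labels and variable_lab are illustrative names; the labels are taken from the question):

```r
# Lookup from short column names to descriptive labels
facet_labels <- c(aa = "A long label", bb = "B Partners", cc = "CC Inc.")

# Illustrative factor holding the short names
variable <- factor(c("aa", "bb", "cc", "aa"))

# Copy the factor and rename its levels through the lookup
variable_lab <- variable
levels(variable_lab) <- unname(facet_labels[levels(variable_lab)])

print(variable_lab)
```

Because the lookup is applied to the levels rather than typed out per value, it can be generated automatically for any number of series.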
Just add labeller = label_wrap_gen(width = 25, multi_line = TRUE) to the facet_wrap() arguments.
E.g.: ... + facet_wrap( ~ variable, labeller = label_wrap_gen(width = 25, multi_line = TRUE))
More info: ?ggplot2::label_wrap_gen
Simply add labeller = label_both to the facet_wrap() arguments.
... + facet_wrap( ~ variable, labeller = label_both)
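Since facet_wrap() accepts a labeller argument, another option is to pass a named lookup vector through as_labeller(), which leaves both the column names and the data untouched. The sketch below rebuilds a small long-format data frame similar to the question's; long_df and facet_labels are illustrative names, and a ggplot2 version whose facet_wrap() supports labeller is assumed.

```r
# Lookup from short names to descriptive strip labels (from the question)
facet_labels <- c(aa = "A long label", bb = "B Partners", cc = "CC Inc.",
                  dd = "DD Company", ee = "Eeeeeek!", ff = "Final")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  set.seed(123)
  # Long-format data: 30 days for each of the six series
  long_df <- data.frame(
    date = rep(seq(as.Date("2012-01-01"), by = "day", length.out = 30), times = 6),
    variable = rep(names(facet_labels), each = 30),
    value = runif(180, min = 1, max = 2)
  )

  # as_labeller() turns the named vector into a labeller function
  p <- ggplot(long_df, aes(x = date, y = value, group = variable)) +
    geom_line() +
    facet_wrap(~ variable, ncol = 2, labeller = as_labeller(facet_labels))
  print(p)
}
```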
When I use geom_tile() with ggplot2 and discrete scales the labels are in ascending order on the x-axis and in descending order on the y-axis:
#some sample data
a <- runif(400)
a <- matrix(a, ncol=20)
colnames(a) <- letters[seq( from = 1, to = 20 )]
rownames(a) <- letters[seq( from = 1, to = 20 )]
a <- melt(a)
When I plot the dataframe a this comes out:
ggplot(a, aes(X1, X2, fill = value)) + geom_tile() +
scale_fill_gradient(low = "white", high = "black", breaks=seq(from=0, to=1, by=.1), name="value") +
theme(axis.text.x = element_text(angle = -90, hjust = 0)) +  # opts()/theme_text() are no longer available in current ggplot2
scale_x_discrete(name="") + scale_y_discrete(name="")
and the coords are labeled differently for x and y:
I would like to have the labels sorted from a-z from top to bottom and from left to right. is there a quick way to do this?
The important point here is the order of the factor levels. The order in the levels is also the order in the plot. You can use rev to reverse the order of the levels like this (note that I just reorder one column in a data.frame):
df$X1 = with(df, factor(X1, levels = rev(levels(X1))))
Use this syntax to reorder your factors as needed.
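A quick base-R check of what the reversal does (x1 is an illustrative factor): only the level order changes, the values themselves stay the same.

```r
# Illustrative factor with levels in alphabetical order
x1 <- factor(c("a", "b", "c", "b"))

# Same values, levels reversed
x1_reversed <- factor(x1, levels = rev(levels(x1)))

print(levels(x1))
print(levels(x1_reversed))
```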
For the cases where you would prefer to not modify the order of the factor in underlying data, you can get the same result using the limits argument to scale_y_discrete:
ggplot(a, aes(X1, X2, fill = value)) +
geom_tile() +
scale_y_discrete(name="", limits = rev(levels(a$X2)))