Grouping scale_fill_gradient/continuous grouped bar chart - r

I am trying to make a grouped bar chart in which the bars are colored by one variable (binary, e.g. Group 1 and Group 2) and the transparency of the bars is based on another value (continuous, e.g. p-value). I want the transparency to be specific to each group's color, and I want the gradient and legend to be continuous.
I have been able to get close using the color, group, and fill options in geom_bar. You will see that I can get the overall gradient to work and the outlines of the bars are colored correctly, but I would like the fill to use the outline colors while retaining the transparency. I also tried scale_alpha, which maps the transparencies correctly but doesn't produce a continuous legend.
Here is a small data set like the one I am working with
## data set
d <- data.frame(ID = rep(c(123, 456), 2),
description = rep(c("cancer", "infection"), 2),
variable = c("G2", "G2", "G1", "G1"),
value = c(1.535709, 1.582127, 4.093683, 4.658328),
pvals = c(9.806872e-12, 1.160182e-09, 3.179635e-05, 1.132216e-04))
Here is the ggplot code
ggplot(d, aes(x=reorder(description, -pvals), y=value)) +
geom_bar(stat="identity", aes(col=variable, group=variable, fill=pvals), position="dodge") +
ylim(0, max(d$value) + 0.6) + xlab("") +
coord_flip() +
scale_fill_brewer(palette = "Set1",
name="",
breaks=c("G1", "G2"),
labels=c("Group 1", "Group 2")) +
scale_fill_continuous(trans = 'log10') # I am using log10 transformation because I have many small p-values and this makes the shading look better
Here is attempt 2 where the fill works but the legend does not.
ggplot(d, aes(x=reorder(description, -pvals), y=value)) +
geom_bar(stat="identity", aes(fill=variable, alpha = pvals), position="dodge") +
ylim(0, max(d$value) + 0.6) + xlab("") +
coord_flip() +
scale_fill_brewer(palette = "Set1",
name="",
breaks=c("G1", "G2"),
labels=c("G1", "G2")) +
scale_alpha(trans = "log10")

I've come up with an ugly hack, but it works, so here we are. The idea is to first draw the plot as you usually would, take its layer data, and use that as input to a new plot. In this new plot, we make two layers for G1 and G2 and use the ggnewscale package to map these layers to separate fill scales. There are a few caveats I'll warn about.
First, we'll make a plot and save it as a variable:
g <- ggplot(d, aes(x=reorder(description, -pvals), y=value)) +
geom_bar(stat="identity", aes(col=variable, group=variable, fill=pvals), position="dodge") +
ylim(0, max(d$value) + 0.6) + xlab("") +
coord_flip() +
scale_fill_brewer(palette = "Set1",
name="",
breaks=c("G1", "G2"),
labels=c("Group 1", "Group 2")) +
scale_fill_continuous(trans = 'log10')
Next, we'll take the coordinates from this layer's data and match them back to the original data. Note that this is highly dependent on having unique y-values in your original plot, but I suppose you could also figure this out in other ways.
ld <- layer_data(g)
ld <- ld[, c("xmin", "xmax", "ymin", "ymax")]
# Match back to original data
matches <- match(ld$ymax, d$value)
# Supplement with original data
ld$pvals <- log10(d$pvals[matches])
ld$descr <- d$description[matches]
ld$vars <- d$variable[matches]
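The matching step relies on every bar height being unique and on match() finding each ymax in d$value. A minimal sketch of a guard for that assumption, using the question's data and a simplified version of the plot (the stopifnot() check is an addition for illustration, not part of the original answer):

```r
library(ggplot2)

# Question's example data (ID column omitted, it is not needed here)
d <- data.frame(
  description = rep(c("cancer", "infection"), 2),
  variable = c("G2", "G2", "G1", "G1"),
  value = c(1.535709, 1.582127, 4.093683, 4.658328),
  pvals = c(9.806872e-12, 1.160182e-09, 3.179635e-05, 1.132216e-04)
)

# Simplified version of the dodged bar plot
g <- ggplot(d, aes(x = reorder(description, -pvals), y = value)) +
  geom_bar(stat = "identity", aes(group = variable, fill = pvals),
           position = "dodge")

# Bar tops (ymax) are matched back to the original values
ld <- layer_data(g)
matches <- match(ld$ymax, d$value)

# Fail early if heights are duplicated or a bar could not be matched
stopifnot(anyDuplicated(d$value) == 0, !anyNA(matches))
```

If the check fails, matching on a combination of columns (for example x position and height) would be one way around it.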
Now we'll make a new plot with geom_rects as layers, separated by vars. Between these layers, we add the first fill scale for G1 and then call new_scale_fill(). After that, we add the second geom_rect() and the second fill scale. Then we'll muddle around with the x-axis so that it somewhat resembles the original plot.
library(ggnewscale)
ggplot(mapping = aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax)) +
geom_rect(data = ld[ld$vars == "G1", ], aes(fill = pvals)) +
scale_fill_gradient(low = "red", high = "transparent",
limits = c(min(ld$pvals), 0),
name = "Log10 P-values G1") +
new_scale_fill() +
geom_rect(data = ld[ld$vars == "G2", ], aes(fill = pvals)) +
scale_fill_gradient(low = "blue", high = "transparent",
limits = c(min(ld$pvals), 0),
name = "Log10 P-values G2") +
scale_x_continuous(breaks = seq_along(unique(d$description)),
labels = c("cancer", "infection")) +
coord_flip()
And that's the ugly hack. I might have the x-axis labels wrong, but I've found no elegant way to automatically reproduce the x-axis labels without the code getting too long.
Note: ggnewscale is known to throw errors in older versions of R, but the GitHub version has fixed that error.

Here is a less verbose version of the script, in case that is what you're after.
library(ggplot2)
base <- ggplot(d, aes(reorder(description, -pvals), value)) + geom_bar(stat = "identity", aes(col=variable, group=variable, fill=pvals), position = "dodge")
base_axes_flip <- base + ylim(0, max(d$value) + 0.6) + xlab("") + coord_flip()
bax_color <- base_axes_flip + scale_color_manual(values=c('#800020','#00FFFF'),
name="",
breaks=c("G1", "G2"),
labels=c("Group 1", "Group 2"))
# Note here the scale_color_manual
bax_color + scale_fill_continuous(trans = 'log10')
This produces the following output and hope it helps.

Related

How to flip a geom_area to be under the line when using scale_y_reverse()

I had to flip the axis of my line, but still need the geom_area to be under the curve. However I cannot figure out how to do so.
This is the line of code I tried
ggplot(PalmBeachWell, aes(x=Date, y=Depth.to.Water.Below.Land.Surface.in.ft.)) +
geom_area(position= "identity", fill='lightblue') +
theme_classic() +
geom_line(color="blue") +
scale_y_reverse()
and here is what I got
One option would be to use a geom_ribbon to fill the area above the curve which after applying scale_y_reverse will result in a fill under the curve.
Using some fake example data based on the ggplot2::economics dataset:
library(ggplot2)
PalmBeachWell <- economics[c("date", "psavert")]
names(PalmBeachWell) <- c("Date", "Depth.to.Water.Below.Land.Surface.in.ft.")
ggplot(PalmBeachWell, aes(x = Date, y = Depth.to.Water.Below.Land.Surface.in.ft.)) +
geom_ribbon(aes(ymin = Depth.to.Water.Below.Land.Surface.in.ft., ymax = Inf),
fill = "lightblue"
) +
geom_line(color = "blue") +
scale_y_reverse() +
theme_classic()

Problem: ggplot legend different linetypes

legend <- c("score" = "black", "answer" = "red")
plot <- df_l %>% ggplot(aes(date, score, color = "score")) + geom_line() +
geom_vline(aes(xintercept = getDate(df_all %>% filter(name == List[5])), color = "answer"), linetype = "dashed", size = 1,) +
scale_color_manual(name = "Legend", values = legend) +
scale_x_date(labels = date_format("%m/%y"), breaks = date_breaks("months")) +
theme(axis.text.x = element_text(angle=45)) +
labs(title = "", x = "", y = "", colors = "Legend")
I get the result above and could not figure out how to resolve the problem that both lines are always mixed together in each legend key. One legend entry should of course show only the slim black line and the other only the dashed line. Thanks in advance!
The issue you have is that geom_vline results in a legend item that is a vertical line and geom_line gives you a horizontal line item. One solution is to create the legend kind of manually by specifying the color= aesthetic in geom_line... but not in geom_vline. You can then create a kind of "dummy" geom with geom_blank that serves as a holding object for the aesthetics of color=. You can then specify the colors for both of those items via scale_color_manual. Here's an example:
set.seed(12345)
df <- data.frame(x=1:100,y=rnorm(100))
ggplot(df, aes(x,y)) + theme_bw() +
geom_line(aes(color='score')) +
geom_vline(aes(xintercept=4), linetype=2, color='red', show.legend = FALSE) +
geom_blank(aes(color='my line')) +
scale_color_manual(name='Legend', values=c('my line'='red','score'='black'))
That creates the one legend for color... but unfortunately "my line" is solid red, when it should be dashed. To fix that, you just apply the linetype= aesthetic in the same way.
ggplot(df, aes(x,y)) + theme_bw() +
geom_line(aes(color='score', linetype='score')) +
geom_vline(aes(xintercept=4), linetype=2, color='red', show.legend = FALSE) +
geom_blank(aes(color='my line', linetype='my line')) +
scale_linetype_manual(name='Legend', values=c('my line'=2,'score'=1)) +
scale_color_manual(name='Legend', values=c('my line'='red','score'='black'))

Change the scale of x axis in ggplot

I have a ggplot bar and don't know how to change the scale of the x axis. At the moment it looks like on the image below. However I'd like to reorder the scale of the x axis so that 21% bar is higher than the 7% bar. How could I get the % to the axis? Thanks in advance!
df= data.frame("number" = c(7,21), "name" = c("x","y"))
df
ggplot(df, aes(x=name, y=number)) +
geom_bar(stat="identity", fill = "blue") + xlab("Title") + ylab("Title") +
ggtitle("Title")
Use the prop.table function on the y variable in the plot's aesthetics.
ggplot(df, aes(x=name, y=100*prop.table(number))) +
geom_bar(stat="identity", fill = "blue") +
xlab("Stichprobe") + ylab("Paketmenge absolut") +
ggtitle("Menge total")
If you want the % character on the y-axis labels, you can add scale_y_continuous to the plot as below:
library(scales)
ggplot(df, aes(x=name, y=prop.table(number))) +
geom_bar(stat="identity", fill = "blue") +
xlab("Stichprobe") + ylab("Paketmenge absolut") +
ggtitle("Menge total") +
scale_y_continuous(labels=percent)
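Note what prop.table() computes here: each value divided by the sum of the vector, so 7 and 21 become shares of 28 rather than 7% and 21%. A quick check in base R, using the question's numbers:

```r
number <- c(7, 21)

# prop.table() on a vector is the same as number / sum(number)
shares <- prop.table(number)
shares
# 0.25 0.75

# As percentages of the total
100 * shares
# 25 75
```

If the numbers are already percentages, dividing by 100 instead of calling prop.table() keeps the bars at 7% and 21%.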
The only way I am able to duplicate the original plot is, as @sconfluentus noted, for the 7% and 21% to be character strings. As an aside, the data frame column names need not be quoted.
df= data.frame(number = c('7%','21%'), name = c("x","y"))
df
ggplot(df, aes(x=name, y=number)) +
geom_bar(stat="identity", fill = "blue") + xlab("Title") + ylab("Title") +
ggtitle("Title")
Changing the numbers to c(0.07, 0.21) and adding, as @Mohanasundaram noted, scale_y_continuous(labels = scales::percent) corrects the situation.
To be pedantic, using breaks = c(0.07, 0.21) creates a nearly exact duplicate. See also here.
Hope this is helpful.
library(ggplot2)
library(scales)
df= data.frame(number = c(0.07,0.21), name = c("KG","MS"))
df
ggplot(df, aes(x=name, y=number)) +
geom_bar(stat="identity", fill = "blue") + xlab("Title") + ylab("Title") +
ggtitle("Title") + scale_y_continuous(labels = scales::percent, breaks = c(.07, .21))
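For reference, scales::percent() is the function doing the labelling here: it multiplies by 100 and appends a percent sign, which is why the data need to be stored as proportions rather than as 7 and 21. A small sketch, assuming the scales package is installed:

```r
library(scales)

# Proportions as used in the data frame above
props <- c(0.07, 0.21)

# The same function that scale_y_continuous(labels = scales::percent) applies to the breaks
pct_labels <- percent(props)
print(pct_labels)
```

Passing the raw values 7 and 21 to the same function would label them as 700% and 2100%, so the conversion to proportions matters.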

Breaks in scale_x_continuous doesn't seem to work

I am simply trying to show the breaks on the x axis of a plot (5, 4, 3, 2, 1, .5), but it will not show the .5. When I tried the code below, it resulted in no x-axis tick marks whatsoever and I don't know why.
labels <- c(5,4,3,2,1,.5)
ggplot(kobe_vs_kawhi, aes(desc(Time_Left), FG_Percentage, color = Player)) +
geom_point() +
geom_smooth() +
scale_x_continuous(breaks = labels) +
scale_color_manual(values = c("red4", "gold2"))
Difficult to answer without data but if you want all of those x-axis values displayed and in reverse try:
ggplot(kobe_vs_kawhi, aes(Time_Left, FG_Percentage, color = Player)) +
geom_point() +
geom_smooth() +
scale_x_reverse(breaks = labels) +
expand_limits(x = c(0, 5)) +
scale_color_manual(values = c("red4", "gold2"))
Note that there is no need to sort Time_Left or for labels to be in reverse order using this approach.
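A likely reason the original attempt showed no tick marks at all: dplyr::desc() turns a numeric vector into its negation, so the plotted x values were negative while the requested breaks were positive and fell outside the axis range. A small check, assuming dplyr is installed (the vector is made-up example data standing in for Time_Left):

```r
library(dplyr)

# Hypothetical values standing in for kobe_vs_kawhi$Time_Left
time_left <- c(5, 4, 3, 2, 1, 0.5)

# desc() negates numeric input, so these are the values that were actually plotted
x_plotted <- desc(time_left)
range(x_plotted)
# -5.0 -0.5

# None of the requested breaks lies inside that range
breaks <- c(5, 4, 3, 2, 1, 0.5)
in_range <- breaks >= min(x_plotted) & breaks <= max(x_plotted)
any(in_range)
# FALSE
```

Mapping Time_Left directly and reversing the axis with scale_x_reverse() keeps the data positive, so the same breaks land on the axis.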

Grouped scatterplot over grouped boxplot in R using ggplot2

I am creating a grouped boxplot with a scatterplot overlay using ggplot2. I would like to group each scatterplot datapoint with the grouped boxplot that it corresponds to.
However, I'd also like the scatterplot points to be different symbols. I seem to be able to get my scatterplot points to group with my grouped boxplots OR get my scatterplot points to be different symbols... but not both simultaneously. Below is some example code to illustrate what's happening:
library(scales)
library(ggplot2)
# Generates Data frame to plot
Gene <- c(rep("GeneA",24),rep("GeneB",24),rep("GeneC",24),rep("GeneD",24),rep("GeneE",24))
Clone <- c(rep(c("D1","D2","D3","D4","D5","D6"),20))
variable <- c(rep(c(rep("Day10",6),rep("Day20",6),rep("Day30",6),rep("Day40",6)),5))
value <- c(rnorm(24, mean = 0.5, sd = 0.5),rnorm(24, mean = 10, sd = 8),rnorm(24, mean = 1000, sd = 900),
rnorm(24, mean = 25000, sd = 9000), rnorm(24, mean = 8000, sd = 3000))
value <- sqrt(value*value)
Tdata <- cbind(Gene, Clone, variable)
Tdata <- data.frame(Tdata)
Tdata <- cbind(Tdata,value)
# Creates the Plot of All Data
# The below code groups the data exactly how I'd like but the scatter plot points are all the same shape
# and I'd like them to each have different shapes.
ln_clr <- "black"
bk_clr <- "white"
point_shapes <- c(0,15,1,16,2,17)
blue_cols <- c("#EFF2FB","#81BEF7","#0174DF","#0000FF","#0404B4")
lp1 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25,
size = 0.7, coef = 4) +
geom_boxplot( coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3,
alpha = 1, colour = ln_clr) +
geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7,
pch=15)
lp1 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000),
labels = trans_format("log10", math_format(10^.x)))
ggsave("Scatter Grouped-Wrong Symbols.png")
#*************************************************************************************************************************************
# The below code doesn't group the scatterplot data how I'd like but the points each have different shapes
lp2 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25,
size = 0.7, coef = 4) +
geom_boxplot( coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3,
alpha = 1, colour = ln_clr) +
geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7,
aes(shape=Clone))
lp2 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000),
labels = trans_format("log10", math_format(10^.x)))
ggsave("Scatter Ungrouped-Right Symbols.png")
If anyone has any suggestions I'd really appreciate it.
Thank you
Nathan
To get the boxplots to appear, the shape aesthetic needs to be inside geom_point, rather than in the main call to ggplot. The reason for this is that when the shape aesthetic is in the main ggplot call, it applies to all the geoms, including geom_boxplot. However, applying a shape=Clone aesthetic causes geom_boxplot to create a separate boxplot for each level of Clone. Since there's only one row of data for each combination of variable and Clone, no boxplot is produced.
That the shape aesthetic affects geom_boxplot seems counterintuitive to me, but maybe there's a reason for it that I'm not aware of. In any case, moving the shape aesthetic into geom_point solves the problem by applying the shape aesthetic only to geom_point.
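The grouping effect can be checked without plotting. By default ggplot2 sets the group to the interaction of all discrete variables mapped in the plot, so adding Clone to the mapping leaves one observation per group. A sketch in base R using the same design as the question's data (the value column is omitted, since only the counts matter):

```r
# Same design as the question: 5 genes x 4 days x 6 clones
Gene <- rep(paste0("Gene", LETTERS[1:5]), each = 24)
Clone <- rep(paste0("D", 1:6), 20)
variable <- rep(rep(paste0("Day", seq(10, 40, 10)), each = 6), 5)

# Observations per group when Clone takes part in the grouping
per_group_with_clone <- table(Gene, variable, Clone)
range(per_group_with_clone)
# 1 1

# Observations per group when only Gene and variable define the groups
per_group_without_clone <- table(Gene, variable)
range(per_group_without_clone)
# 6 6
```

With a single observation per group there is nothing for a boxplot to summarise, which matches the behaviour described above.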
Then, to get the points to appear with the correct boxplot, we need to group by Gene. I also added theme_classic to make it easier to see the plot (although it's still very busy):
ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
stat_boxplot(geom ='errorbar', width=0.25, size=0.7, coef=4, position=position_dodge(0.85)) +
geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, colour=ln_clr, position=position_dodge(0.85)) +
geom_point(position=position_jitterdodge(dodge.width=0.85), size=1.8, alpha=0.7,
aes(shape=Clone, group=Gene)) +
scale_fill_manual(values=blue_cols) + labs(y="Fold Change") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
labels=trans_format("log10", math_format(10^.x))) +
theme_classic()
I think the plot would be easier to understand if you use faceting for Gene and the x-axis for variable. Putting time on the x-axis seems more intuitive, while using facetting frees up the color aesthetic for the points. With six different clones, it's still difficult (for me at least) to differentiate the point markers, but this looks cleaner to me than the previous version.
library(dplyr)
ggplot(Tdata %>% mutate(Gene=gsub("Gene","Gene ", Gene)),
aes(x=gsub("Day","",variable), y=value)) +
stat_boxplot(geom='errorbar', width=0.25, size=0.7, coef=4) +
geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, colour=ln_clr, width=0.5) +
geom_point(aes(fill=Clone), position=position_jitter(0.2), size=1.5, alpha=0.7, shape=21) +
theme_classic() +
facet_grid(. ~ Gene) +
labs(y = "Fold Change", x="Day") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
labels=trans_format("log10", math_format(10^.x)))
If you really need to keep the points, maybe it would be better to separate the boxplots and points with some manual dodging:
set.seed(10)
ggplot(Tdata %>% mutate(Day=as.numeric(substr(variable,4,5)),
Gene = gsub("Gene","Gene ", Gene)),
aes(x=Day - 2, y=value, group=Day)) +
stat_boxplot(geom ='errorbar', width=0.5, size=0.5, coef=4) +
geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, width=4) +
geom_point(aes(x=Day + 2, fill=Clone), size=1.5, alpha=0.7, shape=21,
position=position_jitter(width=1, height=0)) +
theme_classic() +
facet_grid(. ~ Gene) +
labs(y="Fold Change", x="Day") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
labels=trans_format("log10", math_format(10^.x)))
One more thing: For future reference, you can simplify your data creation code:
Gene = rep(paste0("Gene",LETTERS[1:5]), each=24)
Clone = rep(paste0("D",1:6), 20)
variable = rep(rep(paste0("Day", seq(10,40,10)), each=6), 5)
value = rnorm(24*5, mean=rep(c(0.5,10,1000,25000,8000), each=24),
sd=rep(c(0.5,8,900,9000,3000), each=24))
Tdata = data.frame(Gene, Clone, variable, value)
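A quick way to confirm that the shorter construction produces the same design as the original, sketched in base R. One difference worth noting: the question's code also applied sqrt(value*value), which is abs(), to keep values positive for the log scale. The same step is added here on the assumption that it is still wanted (the seed is arbitrary):

```r
# Original, longer construction from the question
Gene_long <- c(rep("GeneA", 24), rep("GeneB", 24), rep("GeneC", 24),
               rep("GeneD", 24), rep("GeneE", 24))
Clone_long <- c(rep(c("D1", "D2", "D3", "D4", "D5", "D6"), 20))
variable_long <- c(rep(c(rep("Day10", 6), rep("Day20", 6),
                         rep("Day30", 6), rep("Day40", 6)), 5))

# Shorter construction from the answer
Gene_short <- rep(paste0("Gene", LETTERS[1:5]), each = 24)
Clone_short <- rep(paste0("D", 1:6), 20)
variable_short <- rep(rep(paste0("Day", seq(10, 40, 10)), each = 6), 5)

# Both versions yield identical label vectors
stopifnot(identical(Gene_long, Gene_short),
          identical(Clone_long, Clone_short),
          identical(variable_long, variable_short))

# Values, made non-negative as in the question's sqrt(value*value) step
set.seed(1)
value <- abs(rnorm(24 * 5,
                   mean = rep(c(0.5, 10, 1000, 25000, 8000), each = 24),
                   sd = rep(c(0.5, 8, 900, 9000, 3000), each = 24)))
Tdata <- data.frame(Gene = Gene_short, Clone = Clone_short,
                    variable = variable_short, value = value)
```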

Resources