ggplot2- geom_linerange with stat_smooth - r

Oh wise ones: I've got a question about the use of geom_linerange(). Below is what I hope is a workable example to illustrate my problem.
library(reshape2)  # melt()
library(ggplot2)
b <- c(100,110,90,100,120,130,170,150,150,120,140,150,120,90,90,100,40,50,40,40,20,60,30)
test <- data.frame(a = c(2,2,2,4,4,4,4,6,6,6,6,6,6,8,8,8,10,10,10,10,10,10,10),
                   b = b, c = b - 15)
testMelt <- melt(
  test,
  id.vars = "a",
  measure.vars = c("b", "c")
)
p <- ggplot(
aes(
x = factor(a),
y = value,
fill= variable
),
data = testMelt) +
geom_boxplot() +
stat_smooth(aes(group=variable,x=factor(a),y=value,fill=factor(variable)),data=testMelt)
My actual dataset is much larger, and the boxplots are a bit overwhelming. I think what I want is to use geom_linerange() somehow to show the range of the data, at "b" and "c", at each value of "a".
The best I've come up with is:
p<- p+ geom_linerange(aes(as.factor(a),ymin=min(value),ymax=value,color=variable))
I can assume the "c" values are always equal to or less than "b", but if the range is smaller, this "covers it up". Can I jitter the lines somehow? Is there a better solution?

In your geom_linerange call, add an additional argument position=position_dodge(width=0.3). You can adjust the absolute width to change the separation between the vertical lines.

My understanding of the question is that you want the line range to reflect the range for the combination a:b:c.
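geom_linerange(aes(as.factor(a),ymin=min(value),ymax=value,color=variable)) sets ymin to the minimum of the whole dataset, because min(value) is evaluated over all rows (hence all the lines share the same lower end). Likewise, ymax=value draws one line per observation rather than one per group.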
A couple of solutions.
Calculate the minima and maxima yourself
library(plyr)  # provides ddply(), .() and summarize()
test_range <- ddply(testMelt, .(a, variable), summarize,
                    val_min = min(value), val_max = max(value))
then run
ggplot(data = testMelt) +
geom_boxplot(aes(x = factor(a), y = value, fill = variable)) +
stat_smooth(aes(group = variable, x = factor(a), y = value,
fill = factor(variable))) +
geom_linerange(data = test_range, aes(x = as.factor(a), ymin = val_min,
ymax = val_max, color = variable),
position = position_dodge(width = 0.3))
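As a variant of the first solution, the per-group range can also be left to ggplot2 itself via stat_summary(). This is a minimal sketch, assuming ggplot2 3.3.0 or later (where the fun.min and fun.max arguments are available); the long-format data frame is rebuilt in base R here only so that the block runs on its own.

```r
library(ggplot2)

# Same data as in the question, reshaped to long format with base R
b <- c(100, 110, 90, 100, 120, 130, 170, 150, 150, 120, 140, 150,
       120, 90, 90, 100, 40, 50, 40, 40, 20, 60, 30)
a <- c(2, 2, 2, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, 8,
       10, 10, 10, 10, 10, 10, 10)
testMelt <- data.frame(
  a        = rep(a, times = 2),
  variable = rep(c("b", "c"), each = length(b)),
  value    = c(b, b - 15)
)

# stat_summary() computes min and max within each x/colour group,
# so no separate summary table is needed
p_range <- ggplot(testMelt, aes(x = factor(a), y = value, colour = variable)) +
  stat_summary(fun.min = min, fun.max = max, geom = "linerange",
               position = position_dodge(width = 0.3))
p_range
```

The trade-off is that the summary lives inside the plot, so the ddply() version is still preferable if the ranges are needed elsewhere.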
Or, for an alternative to boxplots / line range use a violin plot.
ggplot(data = testMelt) +
geom_violin(aes(x = factor(a), y = value, fill = variable)) +
stat_smooth(aes(group = variable, x = factor(a), y = value,
fill = factor(variable)))

Related

R: How to fill geom_hex with a numerical value and heat scale it?

I was wondering how I can scale geom_hex not on count, but rather by a variable and heat scale it? I am also having overfitting in my actual model and was wondering how to eliminate that? Here's an example:
ggplot(data = diamonds) +
  geom_hex(mapping = aes(x = x, y = price, fill = depth), bins = 25) +
  scale_fill_continuous(type = "viridis")
Thanks!
I think this will do the trick, assuming you want to colour the hexagons according to the mean of depth...
ggplot(diamonds, aes(x = x, y = price, z = depth)) +
stat_summary_hex(fun = mean, bins = 25) +
scale_fill_continuous(type = "viridis")

How do I line up my error bars with my bars in ggplot?

I'm creating a bar chart with a pattern for a subset of the bars, and I want to add error bars.
However, I'm having trouble lining up the error bars with the bars: I want them to appear centered on each bar. How do I do this? Moreover, the legend currently does not clearly distinguish the striped and non-striped bars as corresponding to not treated and treated groups.
Finally, I'd like to create a version of this plot which stacks adjacent bars (i.e. bars within each facet_grid panel). Any tips on how to do that would be much appreciated.
The code I'm using is:
library(ggplot2)
library(tidyverse)
library(ggpattern)
models = c("a", "b")
task = c("1","2")
ratios = c(0.3, 0.4)
standard_errors = c(0.02, 0.02)
ymax = ratios + standard_errors
ymin = ratios - standard_errors
colors = c("#F39B7FFF", "#8491B4FF")
df <- data.frame(task = task, ratios = ratios)
df <- df %>% mutate(filler = 1-ratios)
df <- df %>% gather(key = "obs", value = "ratios", -1)
df$upper <- df$ratios + c(standard_errors,standard_errors)
df$models <- c(models,models)
df$lower <- df$ratios - c(standard_errors,standard_errors)
df$col <- c(colors,colors)
df$group <- paste(df$task, df$models, sep="-")
df$treated <- "yes"
df[df$ratios<0.5,]$treated = "no"
p <- ggplot(df, aes(x = group, y = ratios, fill = col, ymin = lower, ymax = upper)) +
stat_summary(aes(pattern=treated),
fun = "mean", position=position_dodge(),
geom = "bar_pattern", pattern_fill="black", colour="black") +
geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2, position=position_dodge(0.9)) +
scale_pattern_manual(values=c("none", "stripe"))+ #edited part
facet_grid(.~task,
scales = "free_x", # Let the x axis vary across facets.
space = "free_x", # Let the width of facets vary and force all bars to have the same width.
switch = "x") + guides(colour = guide_legend(nrow = 1)) +
guides(fill = "none")
p
Here is an option
df %>%
ggplot(aes(x = models, y = ratios)) +
geom_col_pattern(
aes(fill = col, pattern = treated),
pattern_fill = "black",
colour = "black",
pattern_key_scale_factor = 0.2,
position = position_dodge()) +
geom_errorbar(
aes(ymin = lower, ymax = upper, group = interaction(task, treated)),
width = 0.2,
position = position_dodge(0.9)) +
facet_grid(~ task, scales = "free_x") +
scale_pattern_manual(values = c("none", "stripe")) +
scale_fill_identity()
A few comments:
I don't understand the point of creating group. IMO this is unnecessary. TBH, I also don't understand the point of models and task: if task = "1" then models = "a"; if task = "2" then models = "b"; so task and models are redundant as they encode the same thing (whether you call it "1"/"2" or "a"/"b").
The reason why you (originally) didn't see a pattern in the legend is because of the scale factor in the legend key. As per ?scale_col_pattern, you can adjust this with the pattern_key_scale_factor parameter. Here, I've chosen pattern_key_scale_factor = 0.2 but you may want to play with different values.
The reason why the error bars didn't align with the dodged bars was because geom_errorbar didn't know that there are different task-treated combinations. We can fix this by explicitly defining a group aesthetic given by the combination of task & treated values. The reason why you don't need this in geom_col_pattern is because you already allow for different treated values through the pattern aesthetic.
You want to use scale_fill_identity() if you already have actual colour values defined in the data.frame.
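The alignment point above does not depend on ggpattern. Here is a minimal sketch with plain ggplot2 and made-up data (the data frame bars_df and its columns are hypothetical): when the bars and the error bars share the same grouping and the same dodge width, they land on the same x positions.

```r
library(ggplot2)

# Hypothetical data: two bars per x position
bars_df <- data.frame(
  task    = rep(c("1", "2"), each = 2),
  treated = rep(c("no", "yes"), times = 2),
  ratio   = c(0.30, 0.70, 0.40, 0.60),
  se      = 0.02
)

# One dodge object reused by both layers keeps the offsets identical
dodge <- position_dodge(width = 0.9)

p_bars <- ggplot(bars_df, aes(x = task, y = ratio, group = treated)) +
  geom_col(aes(fill = treated), position = dodge) +
  geom_errorbar(aes(ymin = ratio - se, ymax = ratio + se),
                width = 0.2, position = dodge)
p_bars
```

Setting group once in the top-level aes() is a design choice here: every layer inherits it, so no layer can silently fall back to a different grouping.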

Smoothen Heatmap in ggplot

I have a dataframe that looks as follows:
X = c(6,6.2,6.4,6.6,6.8,5.6,5.8,6,6.2,6.4,6.6,6.8,7,7.2,7.4,7.6,7.8,8,2.8,3,3.2,3.4,3.6,3.8,4,4.2,4.4,4.6,4.8,5)
Y = c(2.2,2.2,2.2,2.2,2.2,2.6,2.6,2.6,2.6,2.6,2.6,2.6,2.6,2.6,2.6,2.6,2.6,2.6,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8)
Value = c(0,0.00683254,0,0.007595654,0.015517884,0,0,0,0,0,0,0,0,0,0.005219395,0,0,0,0,0,0,0,0,0,0,0,0.002892342,0,0.002758141,0)
table = data.frame(X, Y, Value)
I have put together a heatmap in R, based on the following command:
ggplot(data = table, mapping = aes(x = X, y = Y)) +
geom_tile(aes(fill = Value), colour = 'black') +
theme_void() +
scale_fill_gradient2(low = "white", high = "black") + xlab(label = "X") + ylab(label = "Y")
Since there is not a value for every X and Y, it leads to plots that appear as follows.
I am attempting to smoothen the plot and have the following question:
As there are small white spaces between the plotted values, how could one color these white spaces to be the median intensity? Said differently, how would I first create an initial layer with non-zero median 'Value' before plotting the non-zero 'Value' on top (overlayed)?
A sample is shown below, which has been 'smoothed', which looks closer to the desired output.
I'm not sure if it will totally fit your need, but from my understanding some combinations of X and Y are missing from your data.
So, you can use the complete function from tidyr to get all combinations of X and Y (those without values will be filled with NA). Then, by using the na.value argument of scale_fill_gradient2, you can give these NA cells the same color as the midpoint value:
library(tidyr)
library(dplyr)
library(ggplot2)
table %>% complete(X,Y) %>%
ggplot(aes(x = X, y = Y))+
geom_raster(aes(fill = Value), interpolate = TRUE)+
scale_fill_gradient2(low = "white", mid = "grey",high = "black",
na.value = "grey")
Does it answer your question?
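To see what complete() does on its own, here is a minimal sketch on a made-up 2 x 2 grid (the data frame sparse is hypothetical): the one missing X/Y combination is added as a row with Value set to NA, which is what na.value then colours.

```r
library(tidyr)

# Hypothetical 2 x 2 grid with one missing cell (X = 1, Y = 2)
sparse <- data.frame(X = c(1, 2, 2), Y = c(1, 1, 2), Value = c(0.1, 0.2, 0.3))

# complete() adds a row for every X/Y combination not present in the data
filled <- complete(sparse, X, Y)
filled
```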

ggplot: remove NA factor level in legend

How can I omit the NA level of a factor from a legend?
From the nycflights13 database, I created a new continuous variable called tot_delay, and then created a factor called delay_class with 4 levels. When I plot, I filter out NA values, but they still appear in the legend. Here's my code:
library(nycflights13); library(ggplot2); library(dplyr)
flights$tot_delay = flights$dep_delay + flights$arr_delay
flights$delay_class <- cut(flights$tot_delay,
c(min(flights$tot_delay, na.rm = TRUE), 0, 20 , 120,
max(flights$tot_delay, na.rm = TRUE)),
labels = c("none", "short","medium","long"))
filter(flights, !is.na(tot_delay)) %>%
ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill")
The parent example isn't a good illustration of the problem (of course unexpected NA values should be tracked down and eliminated), but this is the top result on Google, so it should be noted that there is now an option in scale_XXX_XXX to prevent NA levels from displaying in the legend by setting na.translate = F. For example:
# default
ggplot(data = data.frame(x = c(1,2,NA), y = c(1,1,NA), a = c("A","B",NA)),
aes(x, y, colour = a)) + geom_point(size = 4)
# with na.translate = F
ggplot(data = data.frame(x = c(1,2,NA), y = c(1,1,NA), a = c("A","B",NA)),
aes(x, y, colour = a)) + geom_point(size = 4) +
scale_colour_discrete(na.translate = F)
This works in ggplot2 3.1.0.
You have one data point where delay_class is NA, but tot_delay isn't. This point is not being caught by your filter. Changing your code to:
filter(flights, !is.na(delay_class)) %>%
ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill")
does the trick:
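A likely explanation for that single stray point (an inference from how cut() works, not something stated in the answer): cut() builds intervals that are open on the left by default, so the observation equal to the lowest break, here min(flights$tot_delay), falls outside every bin and becomes NA. A minimal sketch with made-up delays:

```r
# Hypothetical delays; -5 plays the role of the overall minimum
delays <- c(-5, 0, 10, 200)
breaks <- c(min(delays), 0, 20, 120, max(delays))
labs   <- c("none", "short", "medium", "long")

default_bins <- cut(delays, breaks, labels = labs)
fixed_bins   <- cut(delays, breaks, labels = labs, include.lowest = TRUE)

default_bins  # the minimum is NA because the first interval is (-5, 0]
fixed_bins    # include.lowest = TRUE closes the first interval: [-5, 0]
```

Applied to the question, adding include.lowest = TRUE to the cut() call should keep that flight in the "none" class instead of turning it into NA.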
Alternatively, if you absolutely must have that extra point, you can override the fill legend as follows:
filter(flights, !is.na(tot_delay)) %>%
ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill") +
scale_fill_manual( breaks = c("none","short","medium","long"),
values = scales::hue_pal()(4) )
UPDATE: As pointed out in @gatsky's answer, all discrete scales also include the na.translate argument. The feature has actually existed since ggplot2 2.2.0; I just wasn't aware of it at the time I posted my answer. For completeness, its usage in the original question would look like
filter(flights, !is.na(tot_delay)) %>%
ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill") +
scale_fill_discrete(na.translate=FALSE)
I like @Artem's method above, i.e., getting to the bottom of why there are NAs in your df. However, sometimes you know there are NAs, and you just want to exclude them. In that case, simply using na.omit should work:
na.omit(flights) %>% ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill")

ggplot2, error in filling the area under lines

I have this data set and I want to fill the area under each line. However I get an error saying:
Error: stat_bin() must not be used with a y aesthetic.
Additionally, I need to use alpha value for transparency. Any suggestions?
library(reshape2)
library(ggplot2)
dat <- data.frame(
a = rnorm(12, mean = 2, sd = 1),
b = rnorm(12, mean = 4, sd = 2),
month = c("JAN","FEB","MAR",'APR',"MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC"))
dat$month <- factor(dat$month,
levels = c("JAN","FEB","MAR",'APR',"MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC"),
ordered = TRUE)
dat <- melt(dat, id="month")
ggplot(data = dat, aes(x = month, y = value, colour = variable)) +
geom_line() +
geom_area(stat ="bin")
"I want to fill the area under each line"
This means we will need to specify the fill aesthetic.
"I get an error saying 'Error: stat_bin() must not be used with a y aesthetic.'"
This means we will need to delete your stat = "bin" code.
"Additionally, I need to use alpha value for transparency."
This means we need to put alpha = <some value> in the geom_area layer.
Two other things: (1) since you have a factor on the x-axis, we need to specify a grouping so ggplot knows which points to connect. In this case we can use variable as the grouper. (2) The default "position" of geom_area is to stack the areas rather than overlap them. Because you ask about transparency I assume you want them overlapping, so we need to specify position = 'identity'.
ggplot(data = dat, aes(x = month, y = value, colour = variable)) +
geom_line() +
geom_area(aes(fill = variable, group = variable),
alpha = 0.5, position = 'identity')
To get lines across categorical variables, use the group aesthetic:
ggplot(data = dat, aes(x = month, y = value, colour = variable, group = variable)) +
#geom_line(position = 'stack') + # redundant, but this is where lines are drawn
geom_area(alpha = 0.5)
To change the color inside, use the fill aesthetic.
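Putting the last two remarks together, a minimal sketch (assuming data shaped like the melted dat from the question, rebuilt here with an arbitrary fixed seed so the block runs on its own):

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, only to make the random data reproducible
months <- c("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
            "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")
dat <- data.frame(
  month    = factor(rep(months, times = 2), levels = months),
  variable = rep(c("a", "b"), each = 12),
  value    = c(rnorm(12, mean = 2, sd = 1), rnorm(12, mean = 4, sd = 2))
)

# group connects the points across the discrete x axis;
# fill colours the area, colour the outline
p_area <- ggplot(dat, aes(x = month, y = value, group = variable,
                          colour = variable, fill = variable)) +
  geom_area(alpha = 0.5)
p_area
```

This keeps geom_area's default stacking; adding position = "identity" to geom_area() would overlay the two areas instead.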
