ggplot2, error in filling the area under lines - r

I have this data set and I want to fill the area under each line. However I get an error saying:
Error: stat_bin() must not be used with a y aesthetic.
Additionally, I need to use an alpha value for transparency. Any suggestions?
library(reshape2)
library(ggplot2)
dat <- data.frame(
a = rnorm(12, mean = 2, sd = 1),
b = rnorm(12, mean = 4, sd = 2),
month = c("JAN","FEB","MAR",'APR',"MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC"))
dat$month <- factor(dat$month,
levels = c("JAN","FEB","MAR",'APR',"MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC"),
ordered = TRUE)
dat <- melt(dat, id="month")
ggplot(data = dat, aes(x = month, y = value, colour = variable)) +
geom_line() +
geom_area(stat ="bin")

I want to fill the area under each line
This means we will need to specify the fill aesthetic.
I get an error saying "Error: stat_bin() must not be used with a y aesthetic."
This means we will need to delete your stat ="bin" code.
Additionally, I need to use alpha value for transparency.
This means we need to put alpha = <some value> in the geom_area layer.
Two other things: (1) since you have a factor on the x-axis, we need to specify a grouping so ggplot knows which points to connect. In this case we can use variable as the grouper. (2) The default "position" of geom_area is to stack the areas rather than overlap them. Because you ask about transparency I assume you want them overlapping, so we need to specify position = 'identity'.
ggplot(data = dat, aes(x = month, y = value, colour = variable, group = variable)) +
geom_line() +
geom_area(aes(fill = variable),
alpha = 0.5, position = 'identity')
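To see why the grouping matters, one can count the groups ggplot2 assigns to a line layer. This is a minimal self-contained sketch: it rebuilds the long data with base R rather than reshape2::melt() (same month labels, random values with an arbitrary seed) and inspects the built layer with layer_data(). By default ggplot2 groups by the interaction of all discrete variables, and here that includes the month factor on the x-axis, so without an explicit group every point is its own group and geom_line() has nothing to connect.

```r
library(ggplot2)

months <- c("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
            "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")
set.seed(1)
# Long format equivalent to melt(dat, id = "month"): one row per month and series
dat_long <- data.frame(
  month    = factor(rep(months, 2), levels = months),
  variable = factor(rep(c("a", "b"), each = 12)),
  value    = c(rnorm(12, mean = 2, sd = 1), rnorm(12, mean = 4, sd = 2))
)

# No explicit group: month x variable interaction, i.e. one group per point
default_groups <- layer_data(
  ggplot(dat_long, aes(month, value, colour = variable)) + geom_line()
)$group

# group = variable: one group (and therefore one line) per series
explicit_groups <- layer_data(
  ggplot(dat_long, aes(month, value, colour = variable, group = variable)) +
    geom_line()
)$group

length(unique(default_groups))   # 24
length(unique(explicit_groups))  # 2
```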

To get lines across categorical variables, use the group aesthetic:
ggplot(data = dat, aes(x = month, y = value, colour = variable, group = variable)) +
#geom_line(position = 'stack') + # redundant, but this is where lines are drawn
geom_area(alpha = 0.5)
To change the color inside, use the fill aesthetic.
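The difference between the default stacking and position = 'identity' can be checked numerically. This sketch uses small made-up values (series a = 1 to 12, series b constant at 2, chosen only so the result is easy to predict) and compares the largest ymax in each built layer: stacked areas are piled on top of each other, so the top edge reaches a + b, while overlapping areas each start from zero.

```r
library(ggplot2)

# Made-up data for illustration only
toy <- data.frame(
  month    = factor(rep(month.abb, 2), levels = month.abb),
  variable = rep(c("a", "b"), each = 12),
  value    = c(1:12, rep(2, 12))
)

base <- ggplot(toy, aes(month, value, group = variable, fill = variable))

# Default position = "stack": the second area sits on top of the first
stacked <- layer_data(base + geom_area())
# position = "identity": each area starts at zero; alpha makes overlaps visible
overlap <- layer_data(base + geom_area(position = "identity", alpha = 0.5))

max(stacked$ymax)  # 14, i.e. 12 + 2 in December
max(overlap$ymax)  # 12
```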

Related

How to control the order of ggplot2::geom_pointrange elements by colour, shape and linetype

data=data.frame("X"=c(22,5,8,17,7,22),
"XMIN"=c(17.6,4,6.4,13.6,5.6,17.6),
"XMAX"=c(26.4,6,9.6,20.4,8.4,26.4),
"VAR"=c('A','B','C','A','B','C'),
"L1"=c(1,2,3,1,2,3),
"L2"=c(1,1,1,2,2,2))
ggplot(data) +
geom_pointrange(aes(
ymin = XMIN,
ymax = XMAX,
y = X,
x = reorder(VAR, -X),
colour = factor(L1),
shape = factor(L1),
linetype = factor(L2)))
I wish to add space between the lines for each variable A, B, C. Also, within each variable (A, B, C), I wish to sort the lines from lowest to highest X value.
This seems to do the trick:
Updated in response to comments so that variables L1 and L2 control the colour, shape and linetype aesthetics.
The really tricky problem was overcoming the conflict between the order imposed by factor(L2) and the order wanted from the combination of VAR and X.
Where the x values are distinct, the axis order trumps the linetype order.
So I created a continuous variable x_loc to locate the observations on the x axis, and then relabelled the axis with the required values from VAR.
library(ggplot2)
library(dplyr)
data=data.frame("X"=c(22,5,8,17,7,22),
"XMIN"=c(17.6,4,6.4,13.6,5.6,17.6),
"XMAX"=c(26.4,6,9.6,20.4,8.4,26.4),
"VAR"=c('A','B','C','A','B','C'),
"L1"=c(1,2,3,1,2,3),
"L2"=c(1,1,1,2,2,2))
# reorder the data to be in the right plotting order: grouping by VAR with X in ascending order, then everything follows quite nicely.
data1 <-
data %>%
arrange(VAR, X) %>%
mutate(x_loc = c(0.8, 1.1) + rep(0:2, each = 2))
data1
ggplot(data1) +
labs(x = "VAR") +
scale_x_continuous(breaks = 1:3, labels = unique(data1$VAR))+ # one label per VAR; data1$VAR[1:3] would give A, A, B
theme_minimal()+
theme(legend.position = "top")+
geom_pointrange(aes(
ymin = XMIN,
ymax = XMAX,
y = X,
x = x_loc,
linetype = factor(L2),
colour = factor(L1),
shape = factor(L1)))
Note: for some reason I do not fully understand, adding further ggplot layers after the geom_pointrange() call resulted in the list elements of the 'ggplot' layer being printed. Something to follow up another time.
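The hard-coded offsets c(0.8, 1.1) assume exactly two observations per VAR. A variant, sketched below in base R as an assumption rather than the answer's own method, derives each offset from the row's rank within its VAR (after sorting by X), centres the ranges around the integer axis positions, and takes the axis labels from unique() so each label appears once. The spacing of 0.3 is an arbitrary choice.

```r
library(ggplot2)

data <- data.frame(X    = c(22, 5, 8, 17, 7, 22),
                   XMIN = c(17.6, 4, 6.4, 13.6, 5.6, 17.6),
                   XMAX = c(26.4, 6, 9.6, 20.4, 8.4, 26.4),
                   VAR  = c("A", "B", "C", "A", "B", "C"),
                   L1   = c(1, 2, 3, 1, 2, 3),
                   L2   = c(1, 1, 1, 2, 2, 2),
                   stringsAsFactors = FALSE)

# Sort by VAR, then by X within VAR (same order as dplyr::arrange(VAR, X))
data1 <- data[order(data$VAR, data$X), ]

# Rank of each row inside its VAR group and the size of that group
rank_in_var <- ave(data1$X, data1$VAR, FUN = seq_along)
n_in_var    <- ave(data1$X, data1$VAR, FUN = length)
spacing     <- 0.3  # horizontal gap between neighbouring ranges (arbitrary)

# Centre each VAR's ranges around its integer axis position 1, 2, 3, ...
data1$x_loc <- as.integer(factor(data1$VAR)) +
  (rank_in_var - (n_in_var + 1) / 2) * spacing
axis_labels <- unique(data1$VAR)

p <- ggplot(data1) +
  geom_pointrange(aes(x = x_loc, y = X, ymin = XMIN, ymax = XMAX,
                      colour = factor(L1), shape = factor(L1),
                      linetype = factor(L2))) +
  scale_x_continuous(breaks = seq_along(axis_labels), labels = axis_labels) +
  labs(x = "VAR")
```

Because the offsets come from the within-group rank, a VAR with three or more observations is spread out symmetrically without editing any constants.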

plotting stacked points using ggplot

I have a data frame and I would like to stack the points that have overlaps exactly on top of each other.
here is my example data:
library(ggplot2)
value <- c(1.080251e-04, 1.708859e-01, 1.232473e-05, 4.519876e-03,2.914256e-01, 5.869711e-03, 2.196347e-01,4.124873e-01, 5.914052e-03, 2.305623e-03, 1.439013e-01, 5.407597e-03, 7.530298e-02, 7.746897e-03)
names <- letters[1:7]
group <- c(rep("AA", 7) , rep("BB", 7)) # must exist before data.frame() uses it
data <- data.frame(names = rep(names, 2), group = group, value = value, stringsAsFactors = TRUE)
I am using the following command:
p <- ggplot(data, aes(x = names, y = "", color = group)) +
geom_point(aes(size = -log(value)), position = "stack")
plot(p)
But the stacked circles are drawn outside the grid. I want each one close to, or exactly next to, the circle below it. Do you have any idea how I can fix this?
The y-axis has no numeric value, so use the group instead. And we don't need the color legend now since the group labels are shown on the y-axis.
ggplot(data, aes(x = names, y = group, color = group)) +
geom_point(aes(size = -log(value))) +
guides(color = "none")
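To see why the original position = "stack" misbehaved, one can inspect the built plot. In this sketch (same values as the question), the constant y = "" is mapped to position 1, and stacking adds the y positions of the two points at each x, so the second point lands at 1 + 1 = 2 while the y scale still has only one category. Mapping y = group instead gives the scale two categories, one per row of points.

```r
library(ggplot2)

value <- c(1.080251e-04, 1.708859e-01, 1.232473e-05, 4.519876e-03, 2.914256e-01,
           5.869711e-03, 2.196347e-01, 4.124873e-01, 5.914052e-03, 2.305623e-03,
           1.439013e-01, 5.407597e-03, 7.530298e-02, 7.746897e-03)
data <- data.frame(names = rep(letters[1:7], 2),
                   group = rep(c("AA", "BB"), each = 7),
                   value = value)

# Original idea: a constant y, stacked within each x
p_stacked <- ggplot(data, aes(x = names, y = "", colour = group)) +
  geom_point(aes(size = -log(value)), position = "stack")
# Fix from the answer: one y category per group, no stacking
p_fixed <- ggplot(data, aes(x = names, y = group, colour = group)) +
  geom_point(aes(size = -log(value)))

stacked_y <- sort(unique(as.numeric(layer_data(p_stacked)$y)))  # 1 2
# Number of categories on each y scale: the stacked plot has just ""
n_levels_stacked <- length(layer_scales(p_stacked)$y$get_limits())  # 1
n_levels_fixed   <- length(layer_scales(p_fixed)$y$get_limits())    # 2
```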

Why does ggplot2 allow grouping by invalid aesthetic mappings?

I'm having trouble understanding how grouping works in ggplot. Suppose I have a dataframe with date, value, categorical, and a dichotomous variable like so:
library(ggplot2)
set.seed(42)
data <- data.frame(date = sample(seq(as.Date('2015/01/01'), as.Date('2020/01/01'), by="day"), 20),
values = rnorm(20, 0, 1),
categories = sample(c("a", "b", "c"), 20, replace = TRUE))
data$pre_post <- ifelse(data$date <= '2018/01/01', "pre", "post")
If I group by the 'pre_post' variable (dichotomous) using aes(group =), I get:
ggplot(data, aes(x = date, y = values, color = categories, group = pre_post)) +
geom_line()
Using aes(by =), the results change to:
ggplot(data, aes(x = date, y = values, color = categories, by = pre_post)) +
geom_line()
In my current use case, 'by=' gives me the desired results, but I don't know how to explain what exactly it is doing and why, especially since, as @markus points out, 'by=' isn't even a valid aesthetic, and replacing it with 'foo=' would do the same thing:
ggplot(data, aes(x = date, y = values, color = categories, foo = pre_post)) +
geom_line()
If I understand the examples and explanations in Aesthetics: grouping of the ggplot2 documentation correctly, the group aesthetic maps a different line for each subject.
group = pre_post seems to take precedence over the grouping by the color aesthetic, forcing ggplot2 to draw a single line connecting all "pre" data points and another line connecting all "post" data points.
In order to get six different lines, one for each combination of categories and pre_post, you may follow Axeman's suggestion to use interaction():
ggplot(data, aes(x = date, y = values, color = categories, group = interaction(categories, pre_post))) +
geom_line()
or you may use a different aesthetic like linetype:
ggplot(data, aes(x = date, y = values, color = categories, linetype = pre_post)) +
geom_line()
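Using an undefined aesthetic like by or foo groups the data as well, because ggplot2 builds its default grouping from the interaction of all discrete variables mapped in aes(), whether or not they are valid aesthetics; the extra variable simply has no effect on the visual appearance.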
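The grouping rules can be checked by counting the groups in the built line layer for each mapping. This sketch reuses the question's data and seed; n_groups() is a hypothetical helper written for illustration, not part of ggplot2.

```r
library(ggplot2)

set.seed(42)
data <- data.frame(date = sample(seq(as.Date("2015/01/01"), as.Date("2020/01/01"), by = "day"), 20),
                   values = rnorm(20, 0, 1),
                   categories = sample(c("a", "b", "c"), 20, replace = TRUE))
data$pre_post <- ifelse(data$date <= as.Date("2018/01/01"), "pre", "post")

# Hypothetical helper: number of distinct groups ggplot2 assigns to a geom_line() layer
n_groups <- function(mapping) {
  length(unique(layer_data(ggplot(data, mapping) + geom_line())$group))
}

n_colour <- n_groups(aes(x = date, y = values, colour = categories))
n_group  <- n_groups(aes(x = date, y = values, colour = categories, group = pre_post))
n_foo    <- n_groups(aes(x = date, y = values, colour = categories, foo = pre_post))
n_inter  <- n_groups(aes(x = date, y = values, colour = categories,
                         group = interaction(categories, pre_post)))

# colour alone: one group per category; group = pre_post overrides colour;
# foo and interaction() both split by every category/pre_post combination
c(colour = n_colour, group = n_group, foo = n_foo, interaction = n_inter)
```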
However, grouping a line chart may lead to a loss of information, as a comparison with the ungrouped plot shows:
ggplot(data, aes(x = date, y = values, color = categories)) +
geom_line()
Without grouping by pre_post, the blue line extends into 2018, which was not visible in the previous plots.
To avoid the loss of information I prefer to plot the data points as well as the lines in order to mark the end points of the line segments or to show a single data point:
ggplot(data, aes(x = date, y = values, color = categories, linetype = pre_post, shape = pre_post)) +
geom_line() +
geom_point() +
geom_label(aes(label = date), data = data[data$categories == "c" & data$pre_post == "post", ],
hjust = -0.1, vjust = -0.1, show.legend = FALSE)
geom_label() is optional and only used in this post to highlight the single data point.

ggplot2: how to add sample numbers to density plot?

I am trying to generate a (grouped) density plot labelled with sample sizes.
Sample data:
set.seed(100)
df <- data.frame(ab.class = c(rep("A", 200), rep("B", 200)),
val = c(rnorm(200, 0, 1), rnorm(200, 1, 1)))
The unlabelled density plot is generated and looks as follows:
ggplot(df, aes(x = val, group = ab.class)) +
geom_density(aes(fill = ab.class), alpha = 0.4)
What I want to do is add text labels somewhere near the peak of each density, showing the number of samples in each group. However, I cannot find the right combination of options to summarise the data in this way.
I tried to adapt the code suggested in this answer to a similar question on boxplots: https://stackoverflow.com/a/15720769/1836013
n_fun <- function(x){
return(data.frame(y = max(x), label = paste0("n = ",length(x))))
}
ggplot(df, aes(x = val, group = ab.class)) +
geom_density(aes(fill = ab.class), alpha = 0.4) +
stat_summary(geom = "text", fun.data = n_fun)
However, this fails with Error: stat_summary requires the following missing aesthetics: y.
I also tried adding y = ..density.. within aes() for each of the geom_density() and stat_summary() layers, and in the ggplot() object itself... none of which solved the problem.
I know this could be achieved by manually adding labels for each group, but I was hoping for a solution that generalises, and e.g. allows the label colour to be set via aes() to match the densities.
Where am I going wrong?
The y returned by fun.data is not an aesthetic. stat_summary() complains because it cannot find a y aesthetic, which must be specified either globally, as in ggplot(df, aes(x = val, group = ab.class, y = ...)), or in the layer, as in stat_summary(aes(y = ...)), when no global y is available. fun.data then computes where to display the point/text/... at each x, based on the y values supplied through aes().
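Even if you specify y through aes(), you won't get the desired result, because stat_summary() computes one summary y at each distinct x value, and with a continuous x such as val that means one label per observation rather than one per group.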
However, you can add text at the desired positions with geom_text() or annotate():
library(ggplot2)
# save the plot as p
p <- ggplot(df, aes(x = val, group = ab.class)) +
geom_density(aes(fill = ab.class), alpha = 0.4)
# build the data displayed on the plot.
p.data <- ggplot_build(p)$data[[1]]
# Note that column 'y' holds the plotted density ('scaled' is the same curve
# rescaled to a maximum of 1), so we extract the max density row for each group
p.text <- lapply(split(p.data, f = p.data$group), function(df){
df[which.max(df$y), ]
})
p.text <- do.call(rbind, p.text) # we can also get p.text with dplyr.
# now add the text layer to the plot
p + annotate('text', x = p.text$x, y = p.text$y,
label = sprintf('n = %d', p.text$n), vjust = 0)
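If the label colour should come from the data via aes(), as asked in the question, the same idea can be written as a geom_text() layer with its own data frame. A minimal sketch, assuming that the group numbers in the built data follow the sorted order of ab.class (which is how ggplot2 numbers groups formed from a discrete variable), and counting samples with table() instead of reading the n column of the built data.

```r
library(ggplot2)

set.seed(100)
df <- data.frame(ab.class = c(rep("A", 200), rep("B", 200)),
                 val = c(rnorm(200, 0, 1), rnorm(200, 1, 1)))

p <- ggplot(df, aes(x = val, group = ab.class)) +
  geom_density(aes(fill = ab.class), alpha = 0.4)

# Built density layer: one row per evaluation point, with x, y (= density), group
built <- layer_data(p)

# Peak (x, y) of each group's density curve
peaks <- do.call(rbind, lapply(split(built, built$group), function(d) {
  d[which.max(d$y), c("group", "x", "y")]
}))

# Map group numbers back to class labels (sorted order), then attach sample sizes
classes <- sort(unique(df$ab.class))
peaks$ab.class <- classes[peaks$group]
counts <- table(df$ab.class)
peaks$label <- sprintf("n = %d", as.integer(counts[peaks$ab.class]))

# inherit.aes = FALSE: peaks has no 'val' column for the global x mapping
labelled <- p + geom_text(data = peaks,
                          aes(x = x, y = y, label = label, colour = ab.class),
                          vjust = 0, inherit.aes = FALSE)
```

print(labelled) then shows each label above its density peak, coloured by ab.class.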

ggplot2- geom_linerange with stat_smooth

Oh wise ones: I've got a question about the use of geom_linerange(), attached is what I hope is a workable example to illustrate my problem.
library(reshape2)
library(ggplot2)
b=c(100,110,90,100,120,130,170,150,150,120,140,150,120,90,90,100,40,50,40,40,20,60,30)
test<-data.frame(a=c(2,2,2,4,4,4,4,6,6,6,6,6,6,8,8,8,10,10,10,10,10,10,10),
b=b,c=c(b-15))
testMelt <- melt(
test,
id.vars = c("a"),
measure.vars = c("b", "c")
)
p <- ggplot(
aes(
x = factor(a),
y = value,
fill= variable
),
data = testMelt) +
geom_boxplot() +
stat_smooth(aes(group=variable,x=factor(a),y=value,fill=factor(variable)),data=testMelt)
My actual dataset is much larger, and the boxplots are a bit overwhelming. I think what I want is to use geom_linerange() somehow to show the range of the data, at "b" and "c", at each value of "a".
The best I've come up with is:
p<- p+ geom_linerange(aes(as.factor(a),ymin=min(value),ymax=value,color=variable))
I can assume the "c" values are always equal to or less than "b", but if the range is smaller, this "covers it up". Can I jitter the lines somehow? Is there a better solution?
In your geom_linerange call, add an additional argument position=position_dodge(width=0.3). You can adjust the absolute width to change the separation between the vertical lines.
My understanding of the question is that you want each line range to reflect the range for each combination of a and variable (b or c).
geom_linerange(aes(as.factor(a),ymin=min(value),ymax=value,color=variable)) sets the minimum to the minimum of the whole dataset (hence all the lines appear with the same minimum value), because min(value) inside aes() is evaluated over all of the layer's data, not per group.
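This can be confirmed by building the attempted layer and looking at its ymin column. In this sketch the long data is rebuilt with base R (equivalent to the melt() call above) so the block runs on its own; every row ends up with the same ymin, the overall minimum of value.

```r
library(ggplot2)

b <- c(100, 110, 90, 100, 120, 130, 170, 150, 150, 120, 140, 150,
       120, 90, 90, 100, 40, 50, 40, 40, 20, 60, 30)
test <- data.frame(a = c(2, 2, 2, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, 8,
                         10, 10, 10, 10, 10, 10, 10),
                   b = b, c = b - 15)
# Long format, equivalent to melt(test, id.vars = "a")
testMelt <- data.frame(a = rep(test$a, 2),
                       variable = factor(rep(c("b", "c"), each = nrow(test))),
                       value = c(test$b, test$c))

p <- ggplot(testMelt) +
  geom_linerange(aes(x = factor(a), ymin = min(value), ymax = value,
                     colour = variable))

# min(value) saw the whole column, so every range starts at the global minimum
unique(layer_data(p)$ymin)  # 5
```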
A couple of solutions.
Calculate the minima and maxima yourself
library(plyr)
test_range <- ddply(testMelt, .(a,variable), summarize,
val_min = min(value), val_max = max(value))
then run
ggplot(data = testMelt) +
geom_boxplot(aes(x = factor(a), y = value, fill = variable)) +
stat_smooth(aes(group = variable, x = factor(a), y = value,
fill = factor(variable))) +
geom_linerange(data = test_range, aes(x = as.factor(a), ymin = val_min,
ymax = val_max, color = variable),
position = position_dodge(width = 0.3))
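The same per-cell minima and maxima can also be computed without plyr. This sketch swaps in base R aggregate() and merge() for ddply() (a substitution, not the answer's method), with the long data rebuilt in base R so the block runs on its own; merge() renames the two value columns to value_min and value_max via its suffixes argument.

```r
library(ggplot2)

b <- c(100, 110, 90, 100, 120, 130, 170, 150, 150, 120, 140, 150,
       120, 90, 90, 100, 40, 50, 40, 40, 20, 60, 30)
test <- data.frame(a = c(2, 2, 2, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, 8, 8,
                         10, 10, 10, 10, 10, 10, 10),
                   b = b, c = b - 15)
testMelt <- data.frame(a = rep(test$a, 2),
                       variable = factor(rep(c("b", "c"), each = nrow(test))),
                       value = c(test$b, test$c))

# One row per (a, variable) cell with its minimum and maximum value
test_range <- merge(
  aggregate(value ~ a + variable, data = testMelt, FUN = min),
  aggregate(value ~ a + variable, data = testMelt, FUN = max),
  by = c("a", "variable"), suffixes = c("_min", "_max")
)

# Dodge the b and c ranges so a shorter range is not hidden behind a longer one
p <- ggplot(test_range) +
  geom_linerange(aes(x = factor(a), ymin = value_min, ymax = value_max,
                     colour = variable),
                 position = position_dodge(width = 0.3))
```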
Alternatively, instead of boxplots or line ranges, use a violin plot.
ggplot(data = testMelt) +
geom_violin(aes(x = factor(a), y = value, fill = variable)) +
stat_smooth(aes(group = variable, x = factor(a), y = value,
fill = factor(variable)))
