geom_area ggplot fill above threshold with data subset - r

I read several posts on how to fill above geom_area plots using the geom_ribbon function, but none have also dealt with subsets of data.
Consider the following data and plot. I simply want to fill above a threshold, 25 in this example (y-axis), but also fill only within a subset of days within each month, in this case between days 2 and 12. In sum, both criteria must be met in order to fill, and I'm trying to get a smooth fill.
I can improve upon my graph below by using the approx function to interpolate a lot of points on my line, but it still does not handle my subset and connects fill lines between months.
library(ggplot2)
y = sample(1:50)
x = seq(as.Date("2011-12-30"), as.Date("2012-02-17"), by="days", origin="1970-01-01")
z = format(as.Date(x), "%d")
z=as.numeric(z)
df <- data.frame(x,y,z)
plot<-ggplot(df, aes(x=x, y=y)) +
geom_area(fill="transparent") +
geom_ribbon(data=subset(df, z>=2 & z<=12), aes(ymin=25, ymax=ifelse(y< 25,20, y)), fill = "red", alpha=0.5) +
geom_line() +
geom_hline(yintercept = 25, linetype="dashed") +
labs(y="My data") +
theme_bw(base_size = 22)
plot
Figure

The data=subset(df, z>=2 & z<=12) removes lines from the dataframe, so the data is 'lost' for geom_ribbon.
Instead of subsetting an additional condition for the y-value may get you closer to what you want to achieve:
plot<-ggplot(df, aes(x=x, y=y)) +
geom_area(fill="transparent") +
geom_ribbon(data=df, aes(ymin=25, ymax=ifelse((z>=2 & z<=12), ifelse(y < 25, 20, y), 25)), fill = "red", alpha=0.5) +
geom_line() +
geom_hline(yintercept = 25, linetype="dashed") +
labs(y="My data") +
theme_bw(base_size = 22)

Related

Creating a legend with shapes using ggplot2

I have created the following code for a graph in which four fitted lines and corresponding points are plotted. I have problems with the legend. For some reason I cannot find a way to assign the different shapes of the points to a variable name. Also, the colours do not line up with the actual colours in the graph.
y1 <- c(1400,1200,1100,1000,900,800)
y2 <- c(1300,1130,1020,970,830,820)
y3 <- c(1340,1230,1120,1070,940,850)
y4 <- c(1290,1150,1040,920,810,800)
df <- data.frame(x,y1,y2,y3,y4)
g <- ggplot(df, aes(x=x), shape="shape") +
geom_smooth(aes(y=y1), colour="red", method="auto", se=FALSE) + geom_point(aes(y=y1),shape=14) +
geom_smooth(aes(y=y2), colour="blue", method="auto", se=FALSE) + geom_point(aes(y=y2),shape=8) +
geom_smooth(aes(y=y3), colour="green", method="auto", se=FALSE) + geom_point(aes(y=y3),shape=6) +
geom_smooth(aes(y=y4), colour="yellow", method="auto", se=FALSE) + geom_point(aes(y=y4),shape=2) +
ylab("x") + xlab("y") + labs(title="overview")
geom_line(aes(y=1000), linetype = "dashed")
theme_light() +
theme(plot.title = element_text(color="black", size=12, face="italic", hjust = 0.5)) +
scale_shape_binned(name="Value g", values=c(y1="14",y2="8",y3="6",y4="2"))
print(g)
I am wondering why the colours don't match up and how I can construct such a legend that it is clear which shape corresponds to which variable name.
While you can add the legend manually via scale_shape_manual, perhaps the adequate solution would be to reshape your data (try using tidyr::pivot_longer() on y1:y4 variables), and then assigning the resulting variable to the shape aesthetic (you can then manually set the colors to your liking). You would then need to use a single geom_point() and geom_smooth() instead of four of each.
Also, you're missing a reproducible example (what are the values of x?) and your code emits some warnings while trying to perform loess smoothing (because there's fewer data points than need to perform it).
Update (2021-12-12)
Here's a reproducible example in which we reshape the original data and feed it to ggplot using its aes() function to automatically plot different geom_point and geom_smooth for each "y group". I made up the values for the x variable.
library(ggplot2)
library(tidyr)
x <- 1:6
y1 <- c(1400,1200,1100,1000,900,800)
y2 <- c(1300,1130,1020,970,830,820)
y3 <- c(1340,1230,1120,1070,940,850)
y4 <- c(1290,1150,1040,920,810,800)
df <- data.frame(x,y1,y2,y3,y4)
data2 <- df %>%
pivot_longer(y1:y4, names_to = "group", values_to = "y")
ggplot(data2, aes(x, y, color = group, shape = group)) +
geom_point(size = 3) + # increased size for increased visibility
geom_smooth(method = "auto", se = FALSE)
Run the code line by line in RStudio and use it to inspect data2. I think it'll make more sense here's the resulting output:
Another update
Freek19, in your second example you'll need to specify both the shape and color scales manually, so that ggplot2 considers them to be the same, like so:
library(ggplot2)
data <- ... # from your previous example
ggplot(data, aes(x, y, shape = group, color = group)) +
geom_smooth() +
geom_point(size = 3) +
scale_shape_manual("Program type", values=c(1, 2, 3,4,5)) +
scale_color_manual("Program type", values=c(1, 2, 3,4,5))
Hope this helps.
I managed to get close to what I want, using:
library(ggplot2)
data <- data.frame(x = c(0,0.02,0.04,0.06,0.08,0.1),
y = c(1400,1200,1100,1000,910,850, #y1
1300,1130,1010,970,890,840, #y2
1200,1080,980,950,880,820, #y3
1100,1050,960,930,830,810, #y4
1050,1000,950,920,810,800), #y5
group = rep(c("5%","6%","7%","8%","9%"), each = 6))
data
Values <- ggplot(data, aes(x, y, shape = group, color = group)) + # Create line plot with default colors
geom_smooth(aes(color=group)) + geom_point(aes(shape=group),size=3) +
scale_shape_manual(values=c(1, 2, 3,4,5))+
geom_line(aes(y=1000), linetype = "dashed") +
ylab("V(c)") + xlab("c") + labs(title="Valuation")+
theme_light() +
theme(plot.title = element_text(color="black", size=12, face="italic", hjust = 0.5))+
labs(group="Program Type")
Values
I am only stuck with 2 legends. I want to change both name, because otherwise they overlap. However I am not sure how to do this.

R ggplot annotate line plot with max value

I have a line plot created with this code:
# Create data
year <- c(2006,2007,2008,2009,2010,2011,2012,2013,2014)
sales <- c(4176,8560,6473,10465,14977,15421,14805,11183,10012)
df <- data.frame(year,sales)
# Plot
ggplot(data = df,aes(year, sales),group = 1) + geom_point() + geom_line()
I would like to annotate it with a line that "shows" the maximum value like the example below:
Is this possible with ggplot?
Yes. For your current example, try this:
ggplot(data = df,aes(year, sales),group = 1) + geom_point() + geom_line() +
geom_segment(aes(x = 2011, y = 0, xend = 2011, yend = 15421),linetype="dashed", color = "red")
Of course, for more general plotting needs, you can improve the codes instead of manually inputting the values here.
This is close, just needs the arrow:
ggplot(data = df,aes(year, sales),group = 1) + geom_point() + geom_line() + theme_bw() +
geom_linerange(aes(ymax=sales, ymin=min(df$sales)),
data=df[which.max(df$sales),],
col="red", lty=2) +
geom_text(aes(label=sales),data=df[which.max(df$sales),], hjust=1.2, vjust=3)
It works by adding geom_linerange and geom_text geoms but setting the data for each to be the row of the original dataset corresponding to the maximum of the 'sales' column.

Plotting a bar chart with years grouped together

I am using the fivethirtyeight bechdel dataset, located here https://github.com/rudeboybert/fivethirtyeight, and am attempting to recreate the first plot shown in the article here https://fivethirtyeight.com/features/the-dollar-and-cents-case-against-hollywoods-exclusion-of-women/. I am having trouble getting the years to group together similarly to how they did in the article.
This is the current code I have:
ggplot(data = bechdel, aes(year)) +
geom_histogram(aes(fill = clean_test), binwidth = 5, position = "fill") +
scale_fill_manual(breaks = c("ok", "dubious", "men", "notalk", "nowomen"),
values=c("red", "salmon", "lightpink", "dodgerblue",
"blue")) +
theme_fivethirtyeight()
I see where you were going with using the histogram geom but this really looks more like a categorical bar chart. Once you take that approach it's easier, after a bit of ugly code to get the correct labels on the year columns.
The bars are stacked in the wrong order on this one, and there needs to be some formatting applied to look like the 538 chart, but I'll leave that for you.
library(fivethirtyeight)
library(tidyverse)
library(ggthemes)
library(scales)
# Create date range column
bechdel_summary <- bechdel %>%
mutate(date.range = ((year %/% 10)* 10) + ((year %% 10) %/% 5 * 5)) %>%
mutate(date.range = paste0(date.range," - '",substr(date.range + 5,3,5)))
ggplot(data = bechdel_summary, aes(x = date.range, fill = clean_test)) +
geom_bar(position = "fill", width = 0.95) +
scale_y_continuous(labels = percent) +
theme_fivethirtyeight()
ggplot

ggplot gantt chart - consistent space between lines

I have two data frames, one larger (10 people) and one smaller (two people). I have generated a gantt chart for each data frame. How do I get it so the distance between lines is the same for each plot (i.e. not scaled based on number of entries).
# Generate vectors:
name <- paste("person", seq(10), sep = '_')
start <- sample(seq(5), size = 10, replace = T)
end <- sample(seq(6,10), size = 10, replace = T)
# Generate data frames:
big_chart <- data.frame(name = c(name,name), value = c(start,end))
small_chart <- big_chart[c(1:2,11:12),]
# big plot
library(ggplot)
ggplot(big_chart, aes(value, name)) +
geom_line()
# small plot
ggplot(small_chart, aes(value, name)) +
geom_line()
Below is my solution for you, hopefully it is what you were looking for. I made use of the coord_fixed function to control the overall scaling. In addition, I also fixed your x-axis range using the xlim function.
library(ggplot2)
ggplot(big_chart, aes(value, name)) +
geom_line() +
xlim(0, 10) + #optional
coord_fixed(ratio = 0.5)
ggplot(small_chart, aes(value, name)) +
geom_line() +
xlim(0, 10) + #optional
coord_fixed(ratio = 0.5)

How to apply separate coord_cartesian() to "zoom in" into individual panels of a facet_grid()?

Inspired by the Q Finding the elbow/knee in a curve I started to play around with smooth.spline().
In particular, I want to visualize how the parameter df (degree of freedom) influences the approximation and the first and second derivative. Note that this Q is not about approximation but about a specific problem (or edge case) in visualisation with ggplot2.
First attempt: simple facet_grid()
library(ggplot2)
ggplot(ap, aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw()
dp is a data.table containing the data points for which an approximation is sought and ap is a data.table with the approximated data plus the derivatives (data are given below).
For each row, facet_grid() with scales = "free_y" has choosen a scale which displays all data. Unfortunately, one panel has kind of "outliers" which make it difficult to see details in the other panels. So, I want to "zoom in".
"Zoom in" using coord_cartesian()
ggplot(ap, aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw() +
coord_cartesian(ylim = c(-200, 50))
With the manually selected range, more details in the panels of row 3 have been made visible. But, the limit has been applied to all panels of the grid. So, in row 1 details hardly can been distinguished.
What I'm looking for is a way to apply coord_cartesian() with specific parameters separately to each individual panel (or group of panels, e.g., rowwise) of the grid. For instance, is it possible to manipulate the ggplot object afterwards?
Workaround: Combine individual plots with cowplot
As a workaround, we can create three separate plots and combine them afterwards using the cowplot package:
g0 <- ggplot(ap[deriv == 0], aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw()
g1 <- ggplot(ap[deriv == 1], aes(x, y)) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw() +
coord_cartesian(ylim = c(-50, 50))
g2 <- ggplot(ap[deriv == 2], aes(x, y)) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw() +
coord_cartesian(ylim = c(-200, 100))
cowplot::plot_grid(g0, g1, g2, ncol = 1, align = "v")
Unfortunately, this solution
requires to write code to create three separate plots,
duplicates strips and axes and adds whitespace which isn't available for display of the data.
Is facet_wrap() an alternative?
We can use facet_wrap() instead of facet_grid():
ggplot(ap, aes(x, y)) +
# geom_point(data = dp, alpha = 0.2) + # this line causes error message
geom_line() +
facet_wrap(~ deriv + df, scales = "free_y", labeller = label_both, nrow = 3) +
theme_bw()
Now, the y-axes of every panel are scaled individually exhibiting details of some of the panels. Unfortunately, we still can't "zoom in" into the bottom right panel because using coord_cartesian() would affect all panels.
In addition, the line
geom_point(data = dp, alpha = 0.2)
strangely causes
Error in gList(list(x = 0.5, y = 0.5, width = 1, height = 1, just = "centre", :
only 'grobs' allowed in "gList"
I had to comment this line out, so the the data points which are to be approximated are not displayed.
Data
library(data.table)
# data points
dp <- data.table(
x = c(6.6260, 6.6234, 6.6206, 6.6008, 6.5568, 6.4953, 6.4441, 6.2186,
6.0942, 5.8833, 5.7020, 5.4361, 5.0501, 4.7440, 4.1598, 3.9318,
3.4479, 3.3462, 3.1080, 2.8468, 2.3365, 2.1574, 1.8990, 1.5644,
1.3072, 1.1579, 0.95783, 0.82376, 0.67734, 0.34578, 0.27116, 0.058285),
y = 1:32,
deriv = 0)
# approximated data points and derivatives
ap <- rbindlist(
lapply(seq(2, length(dp$x), length.out = 4),
function(df) {
rbindlist(
lapply(0:2,
function(deriv) {
result <- as.data.table(
predict(smooth.spline(dp$x, dp$y, df = df), deriv = deriv))
result[, c("df", "deriv") := list(df, deriv)]
})
)
})
)
Late answer, but the following hack just occurred to me. Would it work for your use case?
Step 1. Create an alternative version of the intended plot, limiting the range of y values such that scales = "free_y" gives a desired scale range for each facet row. Also create the intended facet plot with the full data range:
library(ggplot2)
library(dplyr)
# alternate plot version with truncated data range
p.alt <- ap %>%
group_by(deriv) %>%
mutate(upper = quantile(y, 0.75),
lower = quantile(y, 0.25),
IQR.multiplier = (upper - lower) * 10) %>%
ungroup() %>%
mutate(is.outlier = y < lower - IQR.multiplier | y > upper + IQR.multiplier) %>%
mutate(y = ifelse(is.outlier, NA, y)) %>%
ggplot(aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw()
# intended plot version with full data range
p <- p.alt %+% ap
Step 2. Use ggplot_build() to generate plot data for both ggplot objects. Apply the panel parameters of the alt version onto the intended version:
p <- ggplot_build(p)
p.alt <- ggplot_build(p.alt)
p$layout$panel_params <- p.alt$layout$panel_params
rm(p.alt)
Step 3. Build the intended plot from the modified plot data, & plot the result:
p <- ggplot_gtable(p)
grid::grid.draw(p)
Note: in this example, I truncated the data range by setting all values more than 10*IQR away from the upper / lower quartile in each facet row as NA. This can be replaced by any other logic for defining outliers.

Resources