How to change line properties in ggplot2 halfway in a time series? - r

Take the following straightforward plot of two time series from the economics dataset in ggplot2:
require(dplyr)
require(ggplot2)
require(lubridate)
require(tidyr)
economics %>%
gather(indicator, percentage, c(4:5), -c(1:3, 6)) %>%
mutate(Y2K = year(date) >= 2000) %>%
group_by(indicator, Y2K) %>%
ggplot(aes(date, percentage, group = indicator, colour = indicator)) + geom_line(size=1)
I would like to change the linetype from "solid" to "dashed" (and possibly also the line size) for all points in the 21st century, i.e. for those observations for which Y2K equals TRUE.
I did a group_by(indicator, Y2K), but inside the ggplot command it appears I cannot use group = on multiple levels, so the line properties now differ only by indicator.
Question: How can I achieve this segmented line appearance?
UPDATE: my preferred solution is a slight tweak of the one by @sahoang:
economics %>%
gather(indicator, percentage, c(4:5), -c(1:3, 6)) %>%
ggplot(aes(date, percentage, colour = indicator)) +
geom_line(size=1, aes(linetype = year(date) >= 2000)) +
scale_linetype(guide = "none")
This eliminates the group_by, as @Roland commented, and the filter steps make sure that the time series will be connected at the Y2K point (if the data were yearly, there could otherwise be a visual discontinuity).
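Mapping linetype to a period variable splits each series into two groups, so nothing joins the last pre-2000 point to the first 2000 point. A minimal sketch of one way to keep the segments connected, assuming the monthly economics data and using hypothetical names (econ_long, bridge, period): copy the first observation from 2000 onwards into the earlier period, so both segments share that point.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(lubridate)

# Long format: one row per date and indicator (psavert, uempmed)
econ_long <- economics %>%
  pivot_longer(c(psavert, uempmed),
               names_to = "indicator", values_to = "percentage") %>%
  mutate(period = if_else(year(date) >= 2000, "2000 onwards", "before 2000"))

# Hypothetical helper rows: the first observation from 2000 onwards for each
# indicator, relabelled "before 2000" so the earlier segment ends on that point
bridge <- econ_long %>%
  filter(period == "2000 onwards") %>%
  group_by(indicator) %>%
  slice_min(date, n = 1) %>%
  ungroup() %>%
  mutate(period = "before 2000")

econ_joined <- bind_rows(econ_long, bridge)

# One line per indicator x period; the shared point joins the two segments
p <- ggplot(econ_joined,
            aes(date, percentage, colour = indicator, linetype = period,
                group = interaction(indicator, period))) +
  geom_line() +
  scale_linetype_manual(values = c("before 2000" = "solid",
                                   "2000 onwards" = "dashed"),
                        guide = "none")
# print(p) draws the plot
```

The design choice here is to duplicate one row per indicator rather than filter two overlapping ranges; either way, the solid and dashed parts meet at January 2000.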

Even easier than @Roland's suggestion:
economics %>%
gather(indicator, percentage, c(4:5), -c(1:3, 6)) %>%
mutate(Y2K = year(date) >= 2000) %>%
group_by(indicator, Y2K) -> econ
ggplot(econ, aes(date, percentage, group = indicator, colour = indicator)) +
geom_line(data = filter(econ, !Y2K), size=1, linetype = "solid") +
geom_line(data = filter(econ, Y2K), size=1, linetype = "dashed")
P.S. Alter plot width to remove spike artifacts (red line).

require(dplyr)
require(ggplot2)
require(lubridate)
require(tidyr)
economics %>%
gather(indicator, percentage, c(4:5), -c(1:3, 6)) %>%
mutate(Y2K = year(date) >= 2000) %>%
ggplot(aes(date, percentage, colour = indicator)) +
geom_line(size=1, aes(linetype = Y2K))
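On the question of grouping by more than one variable: ggplot2 ignores dplyr's group_by(), but the group aesthetic can be mapped to interaction() of several variables. ggplot2's default grouping is already the interaction of all discrete mapped variables, which is why mapping linetype = Y2K as above splits the lines the same way. A minimal check, with econ_long and n_groups as hypothetical names:

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(lubridate)

econ_long <- economics %>%
  pivot_longer(c(psavert, uempmed),
               names_to = "indicator", values_to = "percentage") %>%
  mutate(Y2K = year(date) >= 2000)

# interaction() builds one factor level per indicator x Y2K combination
p <- ggplot(econ_long, aes(date, percentage, colour = indicator,
                           linetype = Y2K,
                           group = interaction(indicator, Y2K))) +
  geom_line()

# ggplot_build() exposes the computed layer data, including the group ids
built <- ggplot_build(p)
n_groups <- length(unique(built$data[[1]]$group))
n_groups  # 4: two indicators x two periods
```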

Related

How to stack partially matched time periods with geom_area (ggplot2)?

With the following example, I get a plot where the areas are not stacked. I would like to stack them. This should be a partial stack: intensity starts at 0.5, reaches 0.8 where the two periods overlap, then ends at 0.3.
I assume that the position argument does not work because the start and end dates of the two series are not the same.
Am I missing an argument that could solve this issue? Or maybe another geom?
Do I have to split the data into days to get the desired output? If so, how can I achieve that?
# Library
library(tidyverse)
library(lubridate)
# Data
df <- tibble(date_debut = as_date(c("2022-09-28", "2022-10-05")),
intensity = c(0.5, 0.3),
duration = days(c(14, 10)),
type = (c("a", "b")))
# Adjustment
df <- df %>%
mutate(date_fin = date_debut + duration) %>%
pivot_longer(cols = c(date_debut, date_fin),
names_to = "date_type",
values_to = "date")
# Plot
df %>%
ggplot(aes(x = date, y = intensity, fill = type))+
geom_area(position = "stack")
This is a tough data wrangling problem. Area plots only stack where the points in the two series have the same x values. The following achieves that by building a per-minute time grid from the original df (before the pivot_longer() adjustment), though it's quite a profligate approach.
df %>%
mutate(interval = interval(date_debut, date_debut + duration)) %>%
group_by(type) %>%
summarize(time = seq(as.POSIXct(min(df$date_debut)),
as.POSIXct(max(df$date_debut + df$duration)), by = 'min'),
intensity = ifelse(time %within% interval, intensity, 0)) %>%
ggplot(aes(x = time, y = intensity, fill = type)) +
geom_area(position = position_stack())
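The same idea (give every series the same x values) also works at daily rather than per-minute resolution. A minimal sketch of the question's two periods, assuming the durations are given as plain day counts rather than lubridate periods and that the end date is inclusive; periods, days_grid and daily are hypothetical names:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# The question's two periods, durations as plain day counts (an assumption)
periods <- tibble(type = c("a", "b"),
                  intensity = c(0.5, 0.3),
                  date_debut = as.Date(c("2022-09-28", "2022-10-05")),
                  duration_days = c(14, 10)) %>%
  mutate(date_fin = date_debut + duration_days)  # Date + number of days

# One shared daily grid covering every period
days_grid <- seq(min(periods$date_debut), max(periods$date_fin), by = "day")

# Every type gets a row for every day: its intensity inside its period, 0 outside
daily <- expand_grid(periods, date = days_grid) %>%
  mutate(value = if_else(date >= date_debut & date <= date_fin, intensity, 0))

p <- ggplot(daily, aes(x = date, y = value, fill = type)) +
  geom_area(position = "stack")
```

Compared with a per-minute sequence, this gives 18 rows per type instead of tens of thousands; the trade-off is that the stack can only change on whole days.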
Allan Cameron's answer inspired me to look further into complete().
The proposed answer solved my question, so I accepted it. However, it is indeed more complex than needed.
I solved it this way:
# Adjustment (start again from the original df, i.e. before the pivot_longer() step above)
df <- df %>%
mutate(date_fin = date_debut + duration) %>%
group_by(type) %>%
complete(date_debut = seq(min(date_debut), max(date_fin), by = "1 day")) %>%
fill(intensity) %>%
select(date_debut, intensity, type)
ggplot(df, aes(x = date_debut, y = intensity, fill = type)) +
geom_area()+
scale_x_date(date_labels = "%d",
date_breaks = "1 day")
To avoid the weird empty space, it is fine for me to use geom_col (the question was about geom_area, so no worries).
ggplot(df, aes(x = date_debut, y = intensity, fill = type, colour = type)) +
geom_col(width = 0.95)+
scale_x_date(date_labels = "%d",
date_breaks = "1 day")

shading under geom_step with discrete x-axis, respecting the factor order

I'd like to shade the area under a geom_step() curve on a plot with a discrete and ordered x-axis, e.g. to show the cumulative distribution for some frequency-ordered categories.
The basic geom_step() curve could be created like this:
library(dplyr)
library(ggplot2)
library(forcats)
diamonds %>%
group_by(color) %>%
summarize(count=n()) %>%
arrange(desc(count)) %>%
mutate(frac_of_tot = count/sum(count),
ecdf=cumsum(frac_of_tot),
color=fct_reorder(color, ecdf)) %>%
ggplot(aes(x=color, y=ecdf, group=0)) +
geom_step() +
expand_limits(y=0) +
labs(title="a pareto-style cumulative distribution chart",
subtitle="with x-axis ordered by decreasing frequency",
y="cumulative fraction of total") +
theme_minimal()
but adding the shaded area using geom_rect() as taught by this answer seems to re-order the x-axis, resulting in a nonsensical plot:
diamonds %>%
group_by(color) %>%
summarize(count=n()) %>%
arrange(desc(count)) %>%
mutate(frac_of_tot = count/sum(count),
ecdf=cumsum(frac_of_tot),
color=fct_reorder(color, ecdf)) %>%
ggplot(aes(x=color, y=ecdf, group=0)) +
geom_step() +
geom_rect(aes(xmin=color, xmax=lead(color), ymin=0, ymax=ecdf), alpha=0.3) +
expand_limits(y=0) +
labs(title="A sudden mess after adding geom_rect",
subtitle="with x-axis surprisingly back in alpha order",
y="cumulative fraction of total") +
theme_minimal()
Why is the geom_rect() layer causing the x-axis to be re-ordered?
How can I produce a plot that looks just like the first one, but with the area under the curve shaded?
It seems to me that doing this with geom_rect is doing it the hard way. With some minor data reshaping, you can simply use geom_area:
library(dplyr)
library(ggplot2)
library(forcats)
library(tidyr)
diamonds %>%
group_by(color) %>%
summarize(count = n()) %>%
arrange(desc(count)) %>%
mutate(frac_of_tot = count/sum(count),
ecdf = cumsum(frac_of_tot),
ecd = lag(ecdf),
color = fct_reorder(color, ecdf)) %>%
pivot_longer(starts_with("ecd")) %>%
arrange(color, name) %>%
ggplot(aes(x = color, y = value, group = 0)) +
geom_area(position = "identity", color = "black", alpha = 0.5) +
expand_limits(y = 0) +
labs(title = "a pareto-style cumulative distribution chart",
subtitle = "with x-axis ordered by decreasing frequency",
y = "cumulative fraction of total") +
theme_minimal()
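To see why the reshaping produces steps, here is the same pipeline on a hypothetical three-category toy table (counts 5, 3 and 2; factor() with explicit levels stands in for fct_reorder()). Each category gets two rows: ecd, the cumulative fraction before it, and ecdf, the cumulative fraction including it. geom_area() therefore rises vertically at each category and then runs flat to the next one.

```r
library(dplyr)
library(tidyr)

toy <- tibble(color = c("G", "E", "F"), count = c(5, 3, 2)) %>%
  mutate(frac_of_tot = count / sum(count),
         ecdf = cumsum(frac_of_tot),   # cumulative fraction including this category
         ecd = lag(ecdf),              # cumulative fraction before this category
         color = factor(color, levels = color)) %>%
  pivot_longer(starts_with("ecd")) %>%
  arrange(color, name)                 # "ecd" sorts before "ecdf" within a category

select(toy, color, name, value)
```

The first category has no previous value, so its ecd row is NA.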

x-axis starting value for diverging plot

How can I change the "x-axis starting value" of the diverging bar chart below (extracted from here) so that the vertical axis is set at 25 instead of 0, and the bars are therefore drawn from 25 rather than 0?
For instance, I want this chart:
To look like this:
EDIT
It is not the label I want to change; it is how the data is plotted. My apologies if I wasn't clear. See the example below:
Another example to make it clear:
You can provide computed labels to an (x-)scale via scale_x_continuous(labels = function (x) x + 25).
If you also want to change the data, you’ll first need to offset the x-values by the equivalent amount (in the opposite direction):
Example:
library(tidyverse)
df = tibble(Color = c('red', 'green', 'blue'), Divergence = c(5, 10, -5))
offset = 2
df %>%
mutate(Divergence = Divergence - offset) %>%
ggplot() +
aes(x = Divergence, y = Color) +
geom_col() +
scale_x_continuous(labels = function (x) x + offset)
I'm still not 100% clear on your intended outcome, but you can "shift" your data by adding/subtracting 25 from each value, e.g.:
Original plot:
library(tidyverse)
library(gapminder)
set.seed(123)
gapminder_subset <- gapminder %>%
pivot_longer(-c(country, continent, year)) %>%
filter(year == "1997" | year == "2007") %>%
select(-continent) %>%
filter(name == "gdpPercap") %>%
pivot_wider(names_from = year) %>%
select(-name) %>%
mutate(gdp_change = ((`2007` - `1997`) / `1997`) * 100) %>%
sample_n(15)
ggplot(data = gapminder_subset,
aes(x = country, y = gdp_change)) +
geom_bar(stat = "identity") +
coord_flip()
subtract 25 (reusing gapminder_subset from above):
ggplot(data = gapminder_subset,
aes(x = country, y = gdp_change - 25)) +
geom_bar(stat = "identity") +
coord_flip()
If you combine that with my original relabelling, I think that's the solution:
ggplot(data = gapminder_subset,
aes(x = country, y = gdp_change - 25)) +
geom_bar(stat = "identity") +
coord_flip() +
scale_y_continuous(breaks = c(-25, 0, 25, 50),
labels = c(0, 25, 50, 75))
The answers that existed at the time of writing suggest changing either the data or the labels. Here, I'm proposing to change neither, and instead to change only where each bar starts.
First, for reproducibility, I took @jared_mamrot's approach for the data subset.
library(gapminder)
library(tidyverse)
set.seed(123)
gapminder_subset <- gapminder %>%
pivot_longer(-c(country, continent, year)) %>%
filter(year == "1997" | year == "2007") %>%
select(-continent) %>%
filter(name == "gdpPercap") %>%
pivot_wider(names_from = year) %>%
select(-name) %>%
mutate(gdp_change = ((`2007` - `1997`) / `1997`) * 100) %>%
sample_n(15)
Then, you can set xmin = after_scale(25). You'll get a warning that xmin doesn't exist, but it does exist after the bars are reparameterised to rectangles in the ggplot2 internals (which happens after the x-scale has seen the data to determine its limits). This effectively changes the position where the bars start.
ggplot(gapminder_subset,
aes(gdp_change, country)) +
geom_col(aes(xmin = after_scale(25)))
#> Warning: Ignoring unknown aesthetics: xmin
Created on 2021-06-28 by the reprex package (v1.0.0)
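A different technique from after_scale(), named plainly here: draw each bar as a thick segment that starts at the baseline, so again neither the data nor the labels change. A minimal sketch with hypothetical data (df, baseline); the linewidth parameter assumes ggplot2 3.4.0 or later (older versions used size). layer_data() returns the computed data for a layer, which shows that every segment starts at 25.

```r
library(ggplot2)
library(tibble)

# Hypothetical data
df <- tibble(Color = c("red", "green", "blue"), Divergence = c(5, 40, -10))
baseline <- 25  # where the bars should start

p <- ggplot(df) +
  # each "bar" is a thick horizontal segment from the baseline to the value
  geom_segment(aes(x = baseline, xend = Divergence, y = Color, yend = Color),
               linewidth = 8) +
  geom_vline(xintercept = baseline)

seg <- layer_data(p, 1)
seg[, c("x", "xend")]
```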

Difference between geom_col() and geom_point() for same value

So, I'm trying to plot missing values here over time (longitudinal data).
I would prefer to use geom_col() so that I can later fill the bars with the colours of certain treatments. But for some reason, geom_col() gives me unexpected values, while geom_point() gives me the correct values using the same function. I'm trying to wrap my head around why this is happening. Take a look at the y-axis.
Disclaimer:
I know the missing values disappear on day 19-20. This is why I'm making the plot.
Sorry about the layout of the plot. Not polished yet.
For the geom_point:
gaussian_transformed %>%
group_by(factor(time)) %>%
mutate(missing = sum(is.na(Rose_width))) %>%
ggplot(aes(x = factor(time), y = missing)) +
geom_point()
Picture: geom_point
For the geom_col:
gaussian_transformed %>%
group_by(factor(time)) %>%
mutate(missing = sum(is.na(Rose_width))) %>%
ggplot(aes(x = factor(time), y = missing)) +
geom_col()
Picture: geom_col
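The problem is that you're using mutate(), which keeps every row, so each group's count of missing values is repeated on every row of that group. geom_col() stacks all of those rows, while geom_point() draws them on top of each other: you cannot see it, but there are plenty of overlapping points in your geom_point plot.
To fix it, use either summarise() or distinct().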
Compare
library(tidyverse)
msleep %>% group_by(order) %>%
mutate(missing = sum(is.na(sleep_cycle))) %>%
ggplot(aes(x = order, y = missing)) +
geom_point()
The points look ugly because there is a lot of overplotting.
msleep %>% group_by(order) %>%
mutate(missing = sum(is.na(sleep_cycle))) %>%
distinct(order, .keep_all = TRUE) %>%
ggplot(aes(x = order, y = missing)) +
geom_col()
Without distinct(), geom_col() stacks the repeated counts:
msleep %>% group_by(order) %>%
mutate(missing = sum(is.na(sleep_cycle))) %>%
ggplot(aes(x = order, y = missing)) +
geom_col()
Created on 2021-06-02 by the reprex package (v2.0.0)
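The difference can be made explicit by comparing row counts. A minimal sketch on msleep (shipped with ggplot2); per_row, per_order and stacked are hypothetical names. stacked reproduces the bar heights geom_col() draws from the mutate() version, because geom_col() stacks every row at the same x:

```r
library(dplyr)
library(ggplot2)  # provides the msleep dataset

# mutate(): one row per animal, with the order's count repeated on each row
per_row <- msleep %>%
  group_by(order) %>%
  mutate(missing = sum(is.na(sleep_cycle))) %>%
  ungroup()

# summarise(): exactly one row per order
per_order <- msleep %>%
  group_by(order) %>%
  summarise(missing = sum(is.na(sleep_cycle)))

# Bar heights geom_col() produces from per_row: the repeated counts added up
stacked <- per_row %>%
  group_by(order) %>%
  summarise(height = sum(missing))

p <- ggplot(per_order, aes(x = order, y = missing)) +
  geom_col()
```

Each stacked height equals the true count multiplied by the number of animals in that order, which is where the inflated y values come from.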
So after some digging:
What happened was that geom_col() adds up the missing values of all rows, while geom_point() does not, hence the large y values. Why this happens, I do not know. However, the following worked fine for me:
gaussian_transformed$time <- as.factor(gaussian_transformed$time)
gaussian_transformed %>% group_by(time) %>% summarise(missing = sum(is.na(Rose_width))) -> gaussian_transformed
gaussian_transformed %>%
ggplot(aes(x = time, y = missing)) +
geom_col(fill = "blue", alpha = 0.5) +
theme_minimal() +
labs(title = "Missing values in Gaussian Outcome over the days",
x = "Time (in days)",
y = "Amount of missing values") +
scale_y_continuous(breaks = seq(0, 10, 1))
With the plot: GaussianMissing

How to adjust distances between years on x-axis and adjust line of geom_line(), in ggplot, R-studio?

I would like to create a line plot using ggplot's geom_line() in which the years are equally spaced regardless of their actual values, and in which the geom_point() dots are connected only if they are at most two years apart, not if the gap is larger.
Example:
my.data<-data.frame(
year=c(2001,2003,2005,NA,NA,NA,NA,NA,NA,2019),
value=c(runif(10)))
As for the plot I have tried two different things, both of which are not ideal:
Plotting year as a continuous variable with breaks at the observed years and no minor breaks. Obviously, the distances between the first three observations are much smaller than the distance between 2005 and 2019, and, unfortunately, all dots are connected:
library(ggplot2)
library(dplyr)
my.data %>%
ggplot(aes(x=year,y=value)) +
geom_line() +
geom_point() +
scale_x_continuous(breaks=c(2001,2003,2005,2019), minor_breaks=NULL) +
theme_minimal()
Removing NAs and plotting year as a factor, which yields equal spacing between the years but, obviously, removes the lines between data points:
my.data %>%
filter(!is.na(year)) %>%
ggplot(aes(x=factor(year),y=value)) +
geom_line() +
geom_point() +
theme_minimal()
Are there any solutions to these issues? What am I overlooking?
First attempt:
Second attempt:
What I need (but ideally without the help of Paint):
my.data %>%
ggplot(aes(x=year)) +
geom_line(aes(y = ifelse(year <= 2005,value,NA))) +
geom_point(aes(y = value)) +
scale_x_continuous(breaks=c(2001,2003,2005,2019), minor_breaks=NULL) +
theme_minimal()
Maybe something like this would work.
I came up with a somewhat convoluted and not super clean solution, but it might get the job done. I use lead() to check whether each year should be connected to the next one, and "remove" the unwanted connections by drawing them in white. The dummy column puts all years in one line rather than two.
library(tidyr)  # for fill()
my.data = data.frame(year=c(2001,2003,2005,2008,2009,2012,2015,2016,NA,2019),
value=c(runif(10))) %>%
filter(!is.na(year)) %>%
mutate(grouped = if_else(lead(year) - year <= 2, "yes", "no")) %>%
fill(grouped, .direction = "down") %>%
mutate(dummy = "all")
my.data %>%
ggplot(aes(x = factor(year),y = value)) +
geom_line(aes(y = value, group = dummy, color = grouped), show.legend = FALSE) +
geom_point() +
scale_color_manual(values = c("yes" = "black", "no" = "white")) +
theme_classic()
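Instead of drawing unwanted connections in white (which can paint over other plot elements), one alternative is to give each run of close years its own group. With a factor x-axis every year is otherwise its own group, so geom_line() needs an explicit group. A minimal sketch, assuming "connected" means consecutive observed years at most 2 apart (the same threshold as lead(year) - year <= 2 above); segment is a hypothetical column name:

```r
library(dplyr)
library(ggplot2)

set.seed(1)
# Hypothetical data in the shape of the question (NA years already dropped)
my.data <- data.frame(year = c(2001, 2003, 2005, 2008, 2009, 2012, 2015, 2016, 2019),
                      value = runif(9))

my.data <- my.data %>%
  arrange(year) %>%
  # start a new segment whenever the gap to the previous year exceeds 2
  mutate(segment = cumsum(c(TRUE, diff(year) > 2)))

p <- ggplot(my.data, aes(x = factor(year), y = value)) +
  geom_line(aes(group = segment)) +  # one line per run of close years
  geom_point() +
  theme_minimal()
```

Because the x-axis is a factor, the years stay equally spaced, and geom_line() only joins points within the same segment; single-year segments (2012 and 2019 here) appear as isolated points.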
