I would like to plot a time series of my data. However, there are gaps in the dates, as in the example below. The following code produces the plot but ignores the missing dates. How can I show the missing dates, i.e. show a gap between 2021-01-02 and 2021-01-04, and similarly between 2021-01-06 and 2021-01-08?
library(tidyverse)
fake.data <- data.frame(
varA = c(0.6,0.5,0.2,0.3,0.7),
varB = c(0.1,0.2,0.4,0.6,0.2),
varC = c(0.3,0.3,0.4,0.1,0.1),
start_date = as.Date(c('2021-01-01','2021-01-02','2021-01-04','2021-01-06','2021-01-08')),
stringsAsFactors = FALSE
)
fake.data %>%
gather(variable, value,varA:varC) %>%
ggplot(aes(x = start_date, y = value, fill = variable)) +
geom_area()
I guess the easiest would be to fake the gaps, e.g., with geom_rect.
Consider that "gaps in data" are inherent to most uses of line / area graphs. Some purists might even object to drawing lines / areas for non-continuous measurements at all, because doing so suggests continuous measurement. Since everything between the observations is interpolated anyway, you could argue that there is no need to show those gaps.
library(tidyverse)
fake.data <- data.frame(
varA = c(0.6,0.5,0.2,0.3,0.7),
varB = c(0.1,0.2,0.4,0.6,0.2),
varC = c(0.3,0.3,0.4,0.1,0.1),
start_date = as.Date(c('2021-01-01','2021-01-02','2021-01-04','2021-01-06','2021-01-08'))
) %>% pivot_longer(cols = matches("^var"), names_to = "variable", values_to = "value" )
ls_data <- setNames(fake.data %>%
complete(start_date = full_seq(start_date, 1)) %>%
split(., is.na(.$variable)), c("vals", "missing"))
ggplot(ls_data$vals, aes(x = start_date, y = value, fill = variable)) +
geom_area() +
geom_rect(data = ls_data$missing, aes(xmin = start_date-.5, xmax = start_date+.5,
ymin = 0, ymax = Inf), fill = "white") +
theme_classic()
Created on 2021-04-21 by the reprex package (v2.0.0)
Considering the above, I'd possibly favour not showing the gaps explicitly, and instead showing the measurements more explicitly, e.g. with geom_point.
fake.data %>%
ggplot(aes(x = start_date, y = value, fill = variable)) +
geom_area() +
geom_point(position = "stack") +
geom_line(position = "stack")
Is this close to what you want?
todateseq<-fake.data %>%
select(start_date) %>%
pull
first <- min(todateseq)
last <- max(todateseq)
date_seq <- seq.Date(first,last,by='day')
fake.data2 <- data.frame(start_date=date_seq) %>%
left_join(fake.data,by='start_date')
fake.data2 %>%
gather(variable, value,varA:varC) %>%
mutate(value=ifelse(is.na(value),0,value)) %>%
ggplot(aes(x = start_date, y = value, fill = variable)) +
geom_area(na.rm = FALSE, position = position_stack())
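To see what the padding step above produces before the NAs are replaced, here is a minimal sketch that uses tidyr's complete() and full_seq() instead of the explicit seq.Date() plus left_join(). The names wide and padded are made up for illustration, and only one value column is kept to keep the output short.

```r
library(dplyr)
library(tidyr)

# Hypothetical cut-down version of the question's data (one value column)
wide <- data.frame(
  varA = c(0.6, 0.5, 0.2, 0.3, 0.7),
  start_date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-04",
                         "2021-01-06", "2021-01-08"))
)

# Add one row for every day between the first and last date;
# days without a measurement get NA in the value column
padded <- wide %>%
  complete(start_date = full_seq(start_date, period = 1))

padded
```

Whether those NAs should then become 0, as in the answer above, or stay missing depends on what a missing day means in your data.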
With the following example, I get a plot where the areas are not stacked. I would like to stack them. This should be a partial stack, intensity starting at 0.5, then reaching 0.8 where stacked, then reaching 0.3 at the end.
I assume that the position argument does not work as the start and end date are not the same.
Am I missing an argument that could solve this issue? Or maybe another geom?
Do I have to subset the data into days to get the desired output? If so, how can I achieve that?
Thanks in advance,
# Library
library(tidyverse)
library(lubridate)
# Data
df <- tibble(date_debut = as_date(c("2022-09-28", "2022-10-05")),
intensity = c(0.5, 0.3),
duration = days(c(14, 10)),
type = (c("a", "b")))
# Adjustment
df <- df %>%
mutate(date_fin = date_debut + duration) %>%
pivot_longer(cols = c(date_debut, date_fin),
names_to = "date_type",
values_to = "date")
# Plot
df %>%
ggplot(aes(x = date, y = intensity, fill = type))+
geom_area(position = "stack")
This is a tough data wrangling problem. The area plots only stack where the points in the two series have the same x values. The following will achieve that, though it's quite a profligate approach.
df %>%
mutate(interval = interval(date_debut, date_debut + duration)) %>%
group_by(type) %>%
summarize(time = seq(as.POSIXct(min(df$date_debut)),
as.POSIXct(max(df$date_debut + df$duration)), by = 'min'),
intensity = ifelse(time %within% interval, intensity, 0)) %>%
ggplot(aes(x = time, y = intensity, fill = type)) +
geom_area(position = position_stack())
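The requirement that both series share the same x values can be checked on a smaller scale. This is only a sketch on made-up data (the names toy and aligned are hypothetical) and it works at day rather than minute resolution: complete() expands both series to one shared daily grid and fills the combinations that were not observed with 0.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: series "a" and "b" are observed on different dates
toy <- tibble(
  date = as.Date(c("2022-09-28", "2022-10-12", "2022-10-05", "2022-10-15")),
  type = c("a", "a", "b", "b"),
  intensity = c(0.5, 0.5, 0.3, 0.3)
)

# Expand to every date/type combination on a shared daily grid;
# combinations that were not observed get intensity 0
aligned <- toy %>%
  complete(date = seq(min(date), max(date), by = "day"), type,
           fill = list(intensity = 0))

# Both series now have one row per day
count(aligned, type)
```

Filling with 0 only demonstrates the alignment. For interval data like the question's, the value would have to be carried across each interval instead.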
Allan Cameron's answer inspired me to look further into complete.
The proposed answer solved my problem, so I accepted it. However, it is indeed more complex than needed.
I solved it this way:
# Adjustment
df <- df %>%
mutate(date_fin = date_debut + duration) %>%
group_by(type) %>%
complete(date_debut = seq(min(date_debut), max(date_fin), by = "1 day")) %>%
fill(intensity) %>%
select(date_debut, intensity, type)
ggplot(df, aes(x = date_debut, y = intensity, fill = type)) +
geom_area()+
scale_x_date(date_labels = "%d",
date_breaks = "1 day")
To avoid the weird empty space, it is fine for me to use geom_col (the question was about geom_area, so no worries).
ggplot(df, aes(x = date_debut, y = intensity, fill = type, colour = type)) +
geom_col(width = 0.95)+
scale_x_date(date_labels = "%d",
date_breaks = "1 day")
How can I change the starting value of the x-axis in the diverging bar chart below (extracted from here), so that the vertical axis sits at 25 instead of 0 and the bars are therefore drawn from 25 rather than 0?
For instance, I want this chart:
To look like this:
EDIT
It is not the label I want to change, it is how the data is plotted. My apologies if I wasn't clear. See the example below:
Another example to make it clear:
You can provide computed labels to an (x-)scale via scale_x_continuous(labels = function (x) x + 25).
If you also want to change the data, you’ll first need to offset the x-values by the equivalent amount (in the opposite direction):
Example:
df = tibble(Color = c('red', 'green', 'blue'), Divergence = c(5, 10, -5))
offset = 2
df %>%
mutate(Divergence = Divergence - offset) %>%
ggplot() +
aes(x = Divergence, y = Color) +
geom_col() +
scale_x_continuous(labels = function (x) x + offset)
I'm still not 100% clear on your intended outcome but you can "shift" your data by adding/subtracting 25 from each value, e.g.
Original plot:
library(tidyverse)
library(gapminder)
set.seed(123)
gapminder_subset <- gapminder %>%
pivot_longer(-c(country, continent, year)) %>%
filter(year == "1997" | year == "2007") %>%
select(-continent) %>%
filter(name == "gdpPercap") %>%
pivot_wider(names_from = year) %>%
select(-name) %>%
mutate(gdp_change = ((`2007` - `1997`) / `1997`) * 100) %>%
sample_n(15)
ggplot(data = gapminder_subset,
aes(x = country, y = gdp_change)) +
geom_bar(stat = "identity") +
coord_flip()
Subtract 25:
library(tidyverse)
library(gapminder)
set.seed(123)
gapminder_subset <- gapminder %>%
pivot_longer(-c(country, continent, year)) %>%
filter(year == "1997" | year == "2007") %>%
select(-continent) %>%
filter(name == "gdpPercap") %>%
pivot_wider(names_from = year) %>%
select(-name) %>%
mutate(gdp_change = ((`2007` - `1997`) / `1997`) * 100) %>%
sample_n(15)
ggplot(data = gapminder_subset,
aes(x = country, y = gdp_change - 25)) +
geom_bar(stat = "identity") +
coord_flip()
If you combine that with my original relabelling I think that's the solution:
ggplot(data = gapminder_subset,
aes(x = country, y = gdp_change - 25)) +
geom_bar(stat = "identity") +
coord_flip() +
scale_y_continuous(breaks = c(-25, 0, 25, 50),
labels = c(0, 25, 50, 75))
The answers that existed at the time that I'm writing this are suggesting to change the data or to change the label. Here, I'm proposing to change neither the data nor the labels, and instead just change where the starting position of a bar is.
First, for reproducibility, I took @jared_mamrot's approach for the data subset.
library(gapminder)
library(tidyverse)
set.seed(123)
gapminder_subset <- gapminder %>%
pivot_longer(-c(country, continent, year)) %>%
filter(year == "1997" | year == "2007") %>%
select(-continent) %>%
filter(name == "gdpPercap") %>%
pivot_wider(names_from = year) %>%
select(-name) %>%
mutate(gdp_change = ((`2007` - `1997`) / `1997`) * 100) %>%
sample_n(15)
Then, you can set xmin = after_scale(25). You'll get a warning that xmin doesn't exist, but it does exist after the bars are reparameterised to rectangles in the ggplot2 internals (which happens after the x-scale has seen the data to determine limits). This effectively changes the position where the bars start.
ggplot(gapminder_subset,
aes(gdp_change, country)) +
geom_col(aes(xmin = after_scale(25)))
#> Warning: Ignoring unknown aesthetics: xmin
Created on 2021-06-28 by the reprex package (v1.0.0)
I have some time series data plotted using ggplot. I'd like the legend, which appears to the right of the plot, to be in the same order as the line on the most recent date/value on the plot's x-axis. I tried using the case_when function, but I'm obviously using it wrong. Here is an example.
df <- tibble(
x = runif(100),
y = runif(100),
z = runif(100),
year = sample(seq(1900, 2010, 10), 100, T)
) %>%
gather(variable, value,-year) %>%
group_by(year, variable) %>%
summarise(mean = mean(value))
df %>%
ggplot(aes(year, mean, color = variable)) +
geom_line()
## does not work
df %>%
mutate(variable = fct_reorder(variable, case_when(mean ~ year == 2010)))
ggplot(aes(year, mean, color = variable)) +
geom_line()
We may add one extra line
ungroup() %>% mutate(variable = fct_reorder(variable, mean, tail, n = 1, .desc = TRUE))
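before plotting, so that the full pipeline becomes
df %>%
# df is still grouped by year after summarise(), so drop the grouping first
ungroup() %>%
mutate(variable = fct_reorder(variable, mean, tail, n = 1, .desc = TRUE)) %>%
ggplot(aes(year, mean, color = variable)) +
geom_line()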
In this way we look at the last values of mean and reorder variable accordingly.
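To make the reordering concrete, here is a small sketch on made-up data (toy and reordered are hypothetical names). It assumes the rows are sorted by year within each variable, so that tail(n = 1) picks the most recent value.

```r
library(forcats)

# Hypothetical toy data, already sorted by year within each variable
toy <- data.frame(
  year = rep(c(2000, 2010), times = 3),
  variable = rep(c("x", "y", "z"), each = 2),
  mean = c(0.9, 0.2,   # x ends lowest
           0.1, 0.8,   # y ends highest
           0.5, 0.5)   # z ends in the middle
)

# For each level, summarise `mean` with tail(n = 1), i.e. its last value,
# then sort the levels by that summary in descending order
reordered <- fct_reorder(toy$variable, toy$mean, tail, n = 1, .desc = TRUE)
levels(reordered)
#> [1] "y" "z" "x"
```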
There's another way without adding a new column using fct_reorder2():
library(tidyverse)
df %>%
ggplot(aes(year, mean, color = fct_reorder2(variable, year, mean))) +
geom_line() +
labs(color = "variable")
Although it isn't advisable in your case, you can order the legend by the first (earliest) values in your plot by setting
df %>%
ggplot(aes(year, mean, color = fct_reorder2(variable, year, mean, .fun = first2))) +
geom_line() +
labs(color = "variable")
The default is .fun = last2 (see also https://forcats.tidyverse.org/reference/fct_reorder.html)
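The difference between the two helper functions can be seen on a small made-up example (toy is a hypothetical name): last2 orders the levels by the y value at the largest x, first2 by the y value at the smallest x, and fct_reorder2() sorts in descending order by default.

```r
library(forcats)

# Hypothetical toy data: two years, three variables
toy <- data.frame(
  year = rep(c(2000, 2010), times = 3),
  variable = rep(c("x", "y", "z"), each = 2),
  mean = c(0.9, 0.2,   # x: starts highest, ends lowest
           0.1, 0.8,   # y: starts lowest, ends highest
           0.5, 0.5)   # z: stays in the middle
)

# Default (.fun = last2): order by the value at the latest year
by_last <- fct_reorder2(toy$variable, toy$year, toy$mean)
levels(by_last)
#> [1] "y" "z" "x"

# .fun = first2: order by the value at the earliest year
by_first <- fct_reorder2(toy$variable, toy$year, toy$mean, .fun = first2)
levels(by_first)
#> [1] "x" "z" "y"
```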
I have the following data:
dataf <- read.table(text = "index,group,taxa1,taxa2,taxa3,total
s1,g1,2,5,3,10
s2,g1,3,4,3,10
s3,g2,1,2,7,10
s4,g2,0,4,6,10", header = T, sep = ",")
I'm trying to make a stacked bar plot of the frequencies in the data, so that it counts across each row (not down a column) for each index (s1,s2,s3,s4) and then for each group (g1,g2) of each taxa. I've only been able to figure out how to graph the species of one taxa, not all three stacked on top of each other.
Here are some examples of what I'm trying to make:
These were made in Google Sheets, so they don't look like ggplot output, but it would be easier to make them in R with ggplot2 because the real data set is larger.
You would need to reshape the data.
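As a rough illustration of what reshaping means here, the sketch below turns the three taxa columns into one name column and one value column using tidyr's pivot_longer(), which is the current replacement for gather(). The object name long is made up for this example.

```r
library(dplyr)
library(tidyr)

# The data from the question
dataf <- read.table(text = "index,group,taxa1,taxa2,taxa3,total
s1,g1,2,5,3,10
s2,g1,3,4,3,10
s3,g2,1,2,7,10
s4,g2,0,4,6,10", header = TRUE, sep = ",")

# One row per index/taxa combination instead of one column per taxa
long <- dataf %>%
  pivot_longer(cols = starts_with("taxa"),
               names_to = "taxa", values_to = "value")

long
```

In this long form a single fill = taxa mapping is enough to stack all three taxa in one bar.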
Here is my solution (broken down by plot)
For first plot
library(tidyverse)
##For first plot
prepare_data_1 <- dataf %>% select(index, taxa1:taxa3) %>%
gather(taxa,value, -index) %>%
mutate(index = str_trim(index)) %>%
group_by(index) %>% mutate(prop = value/sum(value))
##Plot 1
prepare_data_1 %>%
ggplot(aes(x = index, y = prop, fill = fct_rev(taxa))) + geom_col()
For second plot
##For second plot
prepare_data_2 <- dataf %>% select(group, taxa1:taxa3) %>%
gather(taxa,value, -group) %>%
mutate(group = str_trim(group)) %>%
group_by(group) %>% mutate(prop = value/sum(value))
##Plot 2
prepare_data_2 %>%
ggplot(aes(x = group, y = prop, fill = fct_rev(taxa))) + geom_col()
## You need to reshape the data before plotting; melt() comes from reshape2.
library(reshape2)
library(ggplot2)
dfm = melt(dataf, id.vars=c("index","group"),
measure.vars=c("taxa1","taxa2","taxa3"),
variable.name="variable", value.name="values")
ggplot(dfm, aes(x = index, y = values, group = variable)) +
geom_col(aes(fill=variable)) +
geom_text(aes(label = values), position = position_stack(vjust = .5), size = 3) +
# apply the complete theme first so it does not reset the axis-text tweak
theme_gray() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.25))
I am using the built-in economics (from the ggplot2 package) dataset in R, and have plotted a time-series for each variable in the same graph using the following code :
library(reshape2)
library(ggplot2)
me <- melt(economics, id = c("date"))
ggplot(data = me) +
geom_line(aes(x = date, y = value)) +
facet_wrap(~variable, ncol = 1, scales = 'free_y')
Now I want to refine the graph further: for each series, I want to display a red point at the smallest and the largest value.
So I thought if I could find the co-ordinates of the min and max of each time-series, I could find a way to plot a red dot at beginning and ending of each time series. For this I used the following code :
which(pce == min(economics$pce), arr.ind = TRUE)
which(pca == max(pca), arr.ind = TRUE)
This doesn't really lead me anywhere.
Thank you:)
Method 1: Using Joins
This can be nice when you want to save the filtered subsets
library(reshape2)
library(ggplot2)
library(dplyr)
me <- melt(economics, id=c("date"))
me %>%
group_by(variable) %>%
summarise(min = min(value),
max = max(value)) -> me.2
left_join(me, me.2) %>%
mutate(color = value == min | value == max) %>%
filter(color == TRUE) -> me.3
ggplot(data=me, aes(x = date, y = value)) +
geom_line() +
geom_point(data=me.3, aes(x = date, y = value), color = "red") +
facet_wrap(~variable, ncol=1, scales='free_y')
Method 2: Simplified without Joins
Thanks @Gregor
me.2 <- me %>%
group_by(variable) %>%
mutate(color = (min(value) == value | max(value) == value))
ggplot(data=me.2, aes(x = date, y = value)) +
geom_line() +
geom_point(aes(color = color)) +
facet_wrap(~variable, ncol=1, scales="free_y") +
scale_color_manual(values = c(NA, "red"))