Manipulate y-axis scale using ggforce facet_zoom - r

I am plotting a time series and I want to zoom in on a few observations. This can be done using facet_zoom() from the ggforce package.
library(dplyr)
library(ggplot2)
library(ggforce)
library(stringr)
airquality %>%
mutate(month_day = seq(as.Date("2000/1/1"),
by = "month",
length.out = n())) %>%
ggplot(aes(x = month_day, y = Temp)) +
geom_line() +
facet_zoom(x = month_day > "2010/1/1" & month_day < "2010/9/1")
Resulting plot:
However, I would like to change the y-axis scale of the lower (zoomed) panel so that it covers a smaller range. Is there a way to do this?

Use xy instead of x so that the y-axis of the zoomed panel is also fitted to the selected data, and set horizontal to FALSE to keep the zoomed panel below the full plot:
airquality %>%
mutate(month_day = seq(as.Date("2000/1/1"),
by = "month",
length.out = n())) %>%
ggplot(aes(x = month_day, y = Temp)) +
geom_line() +
facet_zoom(xy = month_day > "2010/1/1" & month_day < "2010/9/1", horizontal = FALSE)
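As a rough check of what the xy zoom fits to, the sketch below computes the y-range of the selected rows before plotting. The names aq, in_zoom and zoom_range are made up for this illustration, and zoom.size = 1 is only an optional setting that makes the zoom panel the same size as the full panel (the default is 2).

```r
library(dplyr)
library(ggplot2)
library(ggforce)

# Same data as in the question: one made-up monthly date per row
aq <- airquality %>%
  mutate(month_day = seq(as.Date("2000-01-01"), by = "month", length.out = n()))

# Rows that fall inside the zoom window (illustrative helper vector)
in_zoom <- aq$month_day > as.Date("2010-01-01") &
  aq$month_day < as.Date("2010-09-01")

# The y-range that the zoomed panel needs to cover
zoom_range <- range(aq$Temp[in_zoom])

p <- ggplot(aq, aes(x = month_day, y = Temp)) +
  geom_line() +
  facet_zoom(
    xy = month_day > as.Date("2010-01-01") & month_day < as.Date("2010-09-01"),
    horizontal = FALSE,  # zoomed panel below the full panel
    zoom.size = 1        # optional: zoom panel as large as the full panel
  )
```

Printing p draws the plot; zoom_range can be compared with the y-axis of the lower panel.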

Related

Fill by group is not showing up in ggplot

I am trying to make a stacked 100% area chart showing the distribution of two rider types (casual vs member) from hours between 0 and 24. However, my plot does not show up with separate fills for my group.
My table is the following:
start_hour_dist <- clean_trips %>%
group_by(start_hour, member_casual) %>%
summarise(n = n()) %>%
mutate(percentage = n / sum(n))
start_hour_dist table
my code for the plot is the following:
ggplot(start_hour_dist, mapping = aes(x=start_hour, y=percentage, fill=member_casual)) +
geom_area()
However, when I run the plot, my chart does not have the fill and looks like this:
plot
What can I do to make the plot show up something like this?
image from r-graph-gallery
Thanks!
Ben
Your problem is likely the start_hour column being passed as a character vector. Change to an integer first. For example:
library(tidyverse)
df <- tibble(start_hour = sprintf("%02d", rep(0:23, each = 2)),
member_casual = rep(c("member", "casual"), times = 24),
percentage = runif(48))
df |>
ggplot(mapping = aes(
x = start_hour,
y = percentage,
fill = member_casual
)) +
geom_area()
This re-creates your blank graph:
Changing the column type first:
df |>
mutate(start_hour = as.integer(start_hour)) |>
ggplot(mapping = aes(
x = start_hour,
y = percentage,
fill = member_casual
)) +
geom_area(position = "fill")
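A quick way to confirm the diagnosis is to check the column type before plotting. The sketch below uses made-up data in the same shape as the question's table (hours stored as zero-padded strings); hours_chr and hours_int are names invented for this illustration.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
hours_chr <- tibble(
  start_hour    = sprintf("%02d", rep(0:23, each = 2)),  # "00", "01", ... stored as text
  member_casual = rep(c("member", "casual"), times = 24),
  percentage    = runif(48)
)

# A character column makes ggplot2 treat the x-axis as discrete
class(hours_chr$start_hour)

# Converting to integer gives a continuous x-axis that geom_area can connect
hours_int <- hours_chr %>%
  mutate(start_hour = as.integer(start_hour))

p <- ggplot(hours_int, aes(x = start_hour, y = percentage, fill = member_casual)) +
  geom_area(position = "fill")
```

If class() already reports integer or numeric for the real table, the cause is probably elsewhere.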

How to stack partially matched time periods with geom_area (ggplot2)?

With the following example, I get a plot where the areas are not stacked. I would like to stack them. This should be a partial stack, intensity starting at 0.5, then reaching 0.8 where stacked, then reaching 0.3 at the end.
I assume that the position argument does not work as the start and end date are not the same.
Am I missing an argument that could solve this issue? Or maybe another geom?
Do I have to subset the data into days to get the desired output? If so, how can I achieve that?
Thanks in advance,
# Library
library(tidyverse)
library(lubridate)
# Data
df <- tibble(date_debut = as_date(c("2022-09-28", "2022-10-05")),
intensity = c(0.5, 0.3),
duration = days(c(14, 10)),
type = (c("a", "b")))
# Adjustment
df <- df %>%
mutate(date_fin = date_debut + duration) %>%
pivot_longer(cols = c(date_debut, date_fin),
names_to = "date_type",
values_to = "date")
# Plot
df %>%
ggplot(aes(x = date, y = intensity, fill = type))+
geom_area(position = "stack")
This is a tough data wrangling problem. The area plots only stack where the points in the two series have the same x values. The following will achieve that, though it's quite a profligate approach.
df %>%
mutate(interval = interval(date_debut, date_debut + duration)) %>%
group_by(type) %>%
summarize(time = seq(as.POSIXct(min(df$date_debut)),
as.POSIXct(max(df$date_debut + df$duration)), by = 'min'),
intensity = ifelse(time %within% interval, intensity, 0)) %>%
ggplot(aes(x = time, y = intensity, fill = type)) +
geom_area(position = position_stack())
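A lighter variant of the same idea, assuming daily resolution is enough: put both series on one shared grid of days and set the intensity to 0 outside each period. This is only a sketch; periods, daily and totals are names made up here, and the dates and intensities are the ones from the question.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

periods <- tibble(
  type      = c("a", "b"),
  start     = as.Date(c("2022-09-28", "2022-10-05")),
  end       = start + c(14, 10),   # durations in days, as in the question
  intensity = c(0.5, 0.3)
)

# One row per day and type, so both series share exactly the same x values
daily <- expand_grid(
  date = seq(min(periods$start), max(periods$end), by = "day"),
  type = periods$type
) %>%
  left_join(periods, by = "type") %>%
  mutate(intensity = if_else(date >= start & date <= end, intensity, 0)) %>%
  select(date, type, intensity)

# Stacked height per day, useful for checking the 0.5 / 0.8 / 0.3 pattern
totals <- daily %>%
  group_by(date) %>%
  summarise(total = sum(intensity))

p <- ggplot(daily, aes(x = date, y = intensity, fill = type)) +
  geom_area(position = "stack")
```

With a daily grid the edges of each area are slanted over one day rather than vertical; a finer grid makes them steeper at the cost of more rows.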
Allan Cameron's answer inspired me to look further into complete().
The proposed answer solved my question, so I accepted it. However, it is indeed more complex than needed.
I solved it this way:
# Adjustment
df <- df %>%
mutate(date_fin = date_debut + duration) %>%
group_by(type) %>%
complete(date_debut = seq(min(date_debut), max(date_fin), by = "1 day")) %>%
fill(intensity) %>%
select(date_debut, intensity, type)
ggplot(df, aes(x = date_debut, y = intensity, fill = type)) +
geom_area()+
scale_x_date(date_labels = "%d",
date_breaks = "1 day")
To avoid the weird empty space, it is fine for me to use geom_col (the question was about geom_area, so no worries).
ggplot(df, aes(x = date_debut, y = intensity, fill = type, colour = type)) +
geom_col(width = 0.95)+
scale_x_date(date_labels = "%d",
date_breaks = "1 day")

x-axis starting value for diverging plot

How can I change the "x-axis starting value" of the diverging bar chart below (extracted from here), so that the vertical axis is set at 25 instead of 0 and the bars are therefore drawn from 25 rather than from 0?
For instance, I want this chart:
To look like this:
EDIT
It is not the label I want to change, it is how the data is plotted. My apologies if I wasn't clear. See example below:
Another example to make it clear:
You can provide computed labels to an (x-)scale via scale_x_continuous(labels = function (x) x + 25).
If you also want to change the data, you’ll first need to offset the x-values by the equivalent amount (in the opposite direction):
Example:
library(tidyverse)
df = tibble(Color = c('red', 'green', 'blue'), Divergence = c(5, 10, -5))
offset = 2
df %>%
mutate(Divergence = Divergence - offset) %>%
ggplot() +
aes(x = Divergence, y = Color) +
geom_col() +
scale_x_continuous(labels = function (x) x + offset)
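Applied to the value 25 from the question, the same idea might look like the sketch below. The data frame bars and the helper relabel are hypothetical; only the offset of 25 comes from the question.

```r
library(dplyr)
library(ggplot2)

offset <- 25

# Made-up values standing in for the real data
bars <- tibble(
  country = c("A", "B", "C"),
  change  = c(10, 40, 75)
)

# Shift the data so that 25 lands on 0, which is where geom_col() starts its bars
shifted <- bars %>%
  mutate(change = change - offset)

# Undo the shift in the axis labels only
relabel <- function(x) x + offset

p <- ggplot(shifted, aes(x = change, y = country)) +
  geom_col() +
  scale_x_continuous(labels = relabel)
```

Bars for values below 25 point left and bars above 25 point right, while the axis still shows the original numbers.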
I'm still not 100% clear on your intended outcome but you can "shift" your data by adding/subtracting 25 from each value, e.g.
Original plot:
library(tidyverse)
library(gapminder)
set.seed(123)
gapminder_subset <- gapminder %>%
pivot_longer(-c(country, continent, year)) %>%
filter(year == "1997" | year == "2007") %>%
select(-continent) %>%
filter(name == "gdpPercap") %>%
pivot_wider(names_from = year) %>%
select(-name) %>%
mutate(gdp_change = ((`2007` - `1997`) / `1997`) * 100) %>%
sample_n(15)
ggplot(data = gapminder_subset,
aes(x = country, y = gdp_change)) +
geom_bar(stat = "identity") +
coord_flip()
subtract 25:
library(tidyverse)
library(gapminder)
set.seed(123)
gapminder_subset <- gapminder %>%
pivot_longer(-c(country, continent, year)) %>%
filter(year == "1997" | year == "2007") %>%
select(-continent) %>%
filter(name == "gdpPercap") %>%
pivot_wider(names_from = year) %>%
select(-name) %>%
mutate(gdp_change = ((`2007` - `1997`) / `1997`) * 100) %>%
sample_n(15)
ggplot(data = gapminder_subset,
aes(x = country, y = gdp_change - 25)) +
geom_bar(stat = "identity") +
coord_flip()
If you combine that with my original relabelling I think that's the solution:
ggplot(data = gapminder_subset,
aes(x = country, y = gdp_change - 25)) +
geom_bar(stat = "identity") +
coord_flip() +
scale_y_continuous(breaks = c(-25, 0, 25, 50),
labels = c(0, 25, 50, 75))
The answers that existed at the time that I'm writing this are suggesting to change the data or to change the label. Here, I'm proposing to change neither the data nor the labels, and instead just change where the starting position of a bar is.
First, for reproducibility, I took @jared_mamrot's approach for the data subset.
library(gapminder)
library(tidyverse)
set.seed(123)
gapminder_subset <- gapminder %>%
pivot_longer(-c(country, continent, year)) %>%
filter(year == "1997" | year == "2007") %>%
select(-continent) %>%
filter(name == "gdpPercap") %>%
pivot_wider(names_from = year) %>%
select(-name) %>%
mutate(gdp_change = ((`2007` - `1997`) / `1997`) * 100) %>%
sample_n(15)
Then, you can set xmin = after_scale(25). You'll get a warning that xmin doesn't exist, but it does exist after the bars are reparameterised to rectangles in the ggplot2 internals (which is after the x-scale has seen the data to determine limits). This effectively changes the position where bars start.
ggplot(gapminder_subset,
aes(gdp_change, country)) +
geom_col(aes(xmin = after_scale(25)))
#> Warning: Ignoring unknown aesthetics: xmin
Created on 2021-06-28 by the reprex package (v1.0.0)

bar chart of row freq ggplot2

I have the following data:
dataf <- read.table(text = "index,group,taxa1,taxa2,taxa3,total
s1,g1,2,5,3,10
s2,g1,3,4,3,10
s3,g2,1,2,7,10
s4,g2,0,4,6,10", header = T, sep = ",")
I'm trying to make a stacked bar plot of the frequencies of the data so that it counts across the row (not down a column) for each index (s1,s2,s3,s4) and then for each group (g1,g2) of each taxa. I'm only able to figure out how to graph the species of one taxa but not all three stacked on each other.
Here are some examples of what I'm trying to make:
These were made on google sheets so they don't look like ggplot but it would be easier to make in r with ggplot2 because the real data set is larger.
You would need to reshape the data.
Here is my solution (broken down by plot)
For first plot
library(tidyverse)
##For first plot
prepare_data_1 <- dataf %>% select(index, taxa1:taxa3) %>%
gather(taxa,value, -index) %>%
mutate(index = str_trim(index)) %>%
group_by(index) %>% mutate(prop = value/sum(value))
##Plot 1
prepare_data_1 %>%
ggplot(aes(x = index, y = prop, fill = fct_rev(taxa))) + geom_col()
For second plot
##For second plot
prepare_data_2 <- dataf %>% select(group, taxa1:taxa3) %>%
gather(taxa,value, -group) %>%
mutate(group = str_trim(group)) %>%
group_by(group) %>% mutate(prop = value/sum(value))
##Plot 2
prepare_data_2 %>%
ggplot(aes(x = group, y = prop, fill = fct_rev(taxa))) + geom_col()
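gather() still works but is superseded in tidyr. A sketch of the same reshaping for the first plot with pivot_longer() is shown here, assuming tidyr 1.0 or later; taxa_long is a name made up for this illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

dataf <- read.table(text = "index,group,taxa1,taxa2,taxa3,total
s1,g1,2,5,3,10
s2,g1,3,4,3,10
s3,g2,1,2,7,10
s4,g2,0,4,6,10", header = TRUE, sep = ",")

# One row per index and taxa, with the share of each taxa within its index
taxa_long <- dataf %>%
  pivot_longer(taxa1:taxa3, names_to = "taxa", values_to = "value") %>%
  group_by(index) %>%
  mutate(prop = value / sum(value)) %>%
  ungroup()

p <- ggplot(taxa_long, aes(x = index, y = prop, fill = taxa)) +
  geom_col()
```

Grouping by group instead of index before computing prop, and mapping x = group, gives the second plot.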
##You need to reshape data before doing that.
library(reshape2)
library(ggplot2)
dfm = melt(dataf, id.vars=c("index","group"),
measure.vars=c("taxa1","taxa2","taxa3"),
variable.name="variable", value.name="values")
ggplot(dfm, aes(x = index, y = values, group = variable)) +
geom_col(aes(fill=variable)) +
geom_text(aes(label = values), position = position_stack(vjust = .5), size = 3) +
# Apply the complete theme first so it does not reset the axis text settings
theme_gray() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.25))

How to mark minimum point from ggplot line plot [duplicate]

I am using the built-in economics (from the ggplot2 package) dataset in R, and have plotted a time-series for each variable in the same graph using the following code :
library(reshape2)
library(ggplot2)
me <- melt(economics, id = c("date"))
ggplot(data = me) +
geom_line(aes(x = date, y = value)) +
facet_wrap(~variable, ncol = 1, scales = 'free_y')
Now I want to refine my graph further: for each series, I want to display a red point for the smallest and the largest value.
So I thought that if I could find the coordinates of the min and max of each time series, I could find a way to plot a red dot at those points. For this I used the following code:
which(pce == min(economics$pce), arr.ind = TRUE)
which(pca == max(pca), arr.ind = TRUE)
This doesn't really lead me anywhere.
Thank you:)
Method 1: Using Joins
This can be nice when you want to save the filtered subsets
library(reshape2)
library(ggplot2)
library(dplyr)
me <- melt(economics, id=c("date"))
me %>%
group_by(variable) %>%
summarise(min = min(value),
max = max(value)) -> me.2
left_join(me, me.2) %>%
mutate(color = value == min | value == max) %>%
filter(color == TRUE) -> me.3
ggplot(data=me, aes(x = date, y = value)) +
geom_line() +
geom_point(data=me.3, aes(x = date, y = value), color = "red") +
facet_wrap(~variable, ncol=1, scales='free_y')
Method 2: Simplified without Joins
Thanks @Gregor
me.2 <- me %>%
group_by(variable) %>%
mutate(color = (min(value) == value | max(value) == value))
ggplot(data=me.2, aes(x = date, y = value)) +
geom_line() +
geom_point(aes(color = color)) +
facet_wrap(~variable, ncol=1, scales="free_y") +
scale_color_manual(values = c(NA, "red"))
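For readers who prefer to avoid reshape2, the same result can be sketched with tidyr and a grouped filter. This is one possible variant, not part of the answers above; econ_long and extremes are names made up here.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Long format: one row per date and series
econ_long <- economics %>%
  pivot_longer(-date, names_to = "variable", values_to = "value")

# Keep only the rows holding each series' minimum or maximum (ties are kept)
extremes <- econ_long %>%
  group_by(variable) %>%
  filter(value == min(value) | value == max(value)) %>%
  ungroup()

p <- ggplot(econ_long, aes(x = date, y = value)) +
  geom_line() +
  geom_point(data = extremes, colour = "red") +
  facet_wrap(~variable, ncol = 1, scales = "free_y")
```

Because only the extreme rows are passed to geom_point(), no manual colour scale is needed.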
