Animated sorted bar chart with bars overtaking each other - r

Edit: keyword is 'bar chart race'
How would you go about reproducing this chart from Jaime Albella in R?
See the animation on visualcapitalist.com or on twitter (giving several references in case one breaks).
I'm tagging this as ggplot2 and gganimate but anything that can be produced from R is relevant.
Data (thanks to https://github.com/datasets/gdp ):
gdp <- read.csv("https://raw.github.com/datasets/gdp/master/data/gdp.csv")
# remove irrelevant aggregated values
words <- scan(
text="world income only total dividend asia euro america africa oecd",
what= character())
pattern <- paste0("(",words,")",collapse="|")
gdp <- subset(gdp, !grepl(pattern, Country.Name , ignore.case = TRUE))
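To see what this filter does, here is a minimal sketch on a made-up vector of names (the sample names are illustrative assumptions, not copied from the dataset). Note that the match is on substrings, so a real country such as "South Africa" would also be dropped because it contains "africa".

```r
# Build the same alternation pattern as above.
words <- scan(
  text = "world income only total dividend asia euro america africa oecd",
  what = character(), quiet = TRUE)
pattern <- paste0("(", words, ")", collapse = "|")

# Hypothetical sample of Country.Name values, for illustration only.
country_names <- c("World", "High income", "France", "Euro area",
                   "Japan", "OECD members", "Brazil")

# Case-insensitive match flags the aggregate rows; negate to keep countries.
is_aggregate <- grepl(pattern, country_names, ignore.case = TRUE)
kept <- country_names[!is_aggregate]
print(kept)
```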
Edit:
Another cool example from John Murdoch :
Most populous cities from 1500 to 2018

Edit: added spline interpolation for smoother transitions, without making rank changes happen too fast. Code at bottom.
I've adapted an answer of mine to a related question. I like to use geom_tile for animated bars, since it allows you to slide positions.
I worked on this prior to your addition of data, but as it happens, the gapminder data I used is closely related.
library(tidyverse)
library(gganimate)
library(gapminder)
theme_set(theme_classic())
gap <- gapminder %>%
filter(continent == "Asia") %>%
group_by(year) %>%
# The * 1 makes it possible to have non-integer ranks while sliding
mutate(rank = min_rank(-gdpPercap) * 1) %>%
ungroup()
p <- ggplot(gap, aes(rank, group = country,
fill = as.factor(country), color = as.factor(country))) +
geom_tile(aes(y = gdpPercap/2,
height = gdpPercap,
width = 0.9), alpha = 0.8, color = NA) +
# text in x-axis (requires clip = "off" in coord_*)
# paste(country, " ") is a hack to make pretty spacing, since hjust > 1
# leads to weird artifacts in text spacing.
geom_text(aes(y = 0, label = paste(country, " ")), vjust = 0.2, hjust = 1) +
coord_flip(clip = "off", expand = FALSE) +
scale_y_continuous(labels = scales::comma) +
scale_x_reverse() +
guides(color = "none", fill = "none") +
labs(title='{closest_state}', x = "", y = "GDP per capita") +
theme(plot.title = element_text(hjust = 0, size = 22),
axis.ticks.y = element_blank(), # These relate to the axes post-flip
axis.text.y = element_blank(), # These relate to the axes post-flip
plot.margin = margin(1,1,1,4, "cm")) +
transition_states(year, transition_length = 4, state_length = 1) +
ease_aes('cubic-in-out')
animate(p, fps = 25, duration = 20, width = 800, height = 600)
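The per-year ranking step can be checked without any packages. The sketch below uses made-up values and a base R call that should behave roughly like group_by(year) followed by min_rank(-value); the toy data frame and its column names are assumptions for illustration.

```r
# Toy data: three hypothetical countries over two years (values are made up).
toy <- data.frame(
  year    = c(2000, 2000, 2000, 2001, 2001, 2001),
  country = c("A", "B", "C", "A", "B", "C"),
  value   = c(10, 30, 20, 40, 30, 20)
)

# Rank within each year: the largest value gets rank 1,
# and ties share the lowest rank (ties.method = "min").
toy$rank <- ave(-toy$value, toy$year,
                FUN = function(v) rank(v, ties.method = "min"))
print(toy)
```

Country A moves from rank 3 to rank 1 between the two years, which is the kind of change the sliding tiles animate.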
For the smoother version at the top, we can add a step to interpolate the data further before the plotting step. It can be useful to interpolate twice, once at rough granularity to determine the ranking, and another time for finer detail. If the ranking is calculated too finely, the bars will swap position too quickly.
gap_smoother <- gapminder %>%
filter(continent == "Asia") %>%
group_by(country) %>%
# Do somewhat rough interpolation for ranking
# (Otherwise the ranking shifts unpleasantly fast.)
complete(year = full_seq(year, 1)) %>%
mutate(gdpPercap = spline(x = year, y = gdpPercap, xout = year)$y) %>%
group_by(year) %>%
mutate(rank = min_rank(-gdpPercap) * 1) %>%
ungroup() %>%
# Then interpolate further to half years for fast number ticking.
# Interpolate the ranks calculated earlier.
group_by(country) %>%
complete(year = full_seq(year, .5)) %>%
mutate(gdpPercap = spline(x = year, y = gdpPercap, xout = year)$y) %>%
# "approx" below for linear interpolation. "spline" has a bouncy effect.
mutate(rank = approx(x = year, y = rank, xout = year)$y) %>%
ungroup() %>%
arrange(country,year)
Then the plot uses a few modified lines, otherwise the same:
p <- ggplot(gap_smoother, ...
# This line for the numbers that tick up
geom_text(aes(y = gdpPercap,
label = scales::comma(gdpPercap)), hjust = 0, nudge_y = 300 ) +
...
labs(title='{closest_state %>% as.numeric %>% floor}',
x = "", y = "GDP per capita") +
...
transition_states(year, transition_length = 1, state_length = 0) +
enter_grow() +
exit_shrink() +
ease_aes('linear')
animate(p, fps = 20, duration = 5, width = 400, height = 600, end_pause = 10)
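The comment about approx versus spline for the ranks can be explored on a toy series. This is only a sketch with made-up ranks: linear interpolation stays between the observed ranks, while a cubic spline may overshoot them, which is one plausible source of the bouncy effect.

```r
# Hypothetical yearly ranks for one country (made-up values).
year <- c(2000, 2001, 2002, 2003)
rank_obs <- c(3, 3, 1, 1)

# Finer grid at half-year steps, as in the pipeline above.
year_fine <- seq(min(year), max(year), by = 0.5)

# Linear interpolation stays within the range of the observed ranks.
rank_linear <- approx(x = year, y = rank_obs, xout = year_fine)$y

# Cubic spline interpolation passes through the same points
# but may swing outside that range in between.
rank_spline <- spline(x = year, y = rank_obs, xout = year_fine)$y

print(round(rank_linear, 2))
print(round(rank_spline, 2))
```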

This is what I came up with, so far, based in good part on @Jon's answer.
p <- gdp %>%
# build rank, labels and relative values
group_by(Year) %>%
mutate(Rank = rank(-Value),
Value_rel = Value/Value[Rank==1],
Value_lbl = paste0(" ",round(Value/1e9))) %>%
group_by(Country.Name) %>%
# keep top 10
filter(Rank <= 10) %>%
# plot
ggplot(aes(-Rank,Value_rel, fill = Country.Name)) +
geom_col(width = 0.8, position="identity") +
coord_flip() +
geom_text(aes(-Rank,y=0,label = Country.Name,hjust=0)) + #country label
geom_text(aes(-Rank,y=Value_rel,label = Value_lbl, hjust=0)) + # value label
theme_minimal() +
theme(legend.position = "none",axis.title = element_blank()) +
# animate along Year
transition_states(Year,4,1)
animate(p, 100, fps = 25, duration = 20, width = 800, height = 600)
I might come back to improve it.
The moving grid could be simulated by removing the actual grid and having geom_segment lines moving and fading out thanks to an alpha parameter changing when it approaches 100 billion.
To have labels change value between years (which gives a nice feeling of urgency in the original chart), I think we have no choice but to multiply the rows and interpolate the labels. We'll need to interpolate Rank too.
Then with a few minor cosmetic changes we should be pretty close.
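The row-multiplying idea mentioned above can be sketched in base R for a single country. The data frame, the quarter-year step and the values are assumptions for illustration; with real data the same step would be repeated per country.

```r
# Hypothetical yearly data for one country (made-up values).
yearly <- data.frame(Year  = c(2000, 2001, 2002),
                     Value = c(100, 140, 120),
                     Rank  = c(2, 1, 1))

# Four steps per year: each yearly row becomes several interpolated rows.
steps <- seq(min(yearly$Year), max(yearly$Year), by = 0.25)

fine <- data.frame(
  Year  = steps,
  Value = approx(yearly$Year, yearly$Value, xout = steps)$y,
  Rank  = approx(yearly$Year, yearly$Rank,  xout = steps)$y
)

# Rebuild the label from the interpolated value so it ticks between years.
fine$Value_lbl <- paste0(" ", round(fine$Value))
print(fine)
```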

This is what I came up with. I just used Jon's and Moody's code as a template and made a few changes.
library(tidyverse)
library(gganimate)
library(gapminder)
theme_set(theme_classic())
gdp <- read.csv("https://raw.github.com/datasets/gdp/master/data/gdp.csv")
words <- scan(
text="world income only total dividend asia euro america africa oecd",
what= character())
pattern <- paste0("(",words,")",collapse="|")
gdp <- subset(gdp, !grepl(pattern, Country.Name , ignore.case = TRUE))
colnames(gdp) <- gsub("Country.Name", "country", colnames(gdp))
colnames(gdp) <- gsub("Country.Code", "code", colnames(gdp))
colnames(gdp) <- gsub("Value", "value", colnames(gdp))
colnames(gdp) <- gsub("Year", "year", colnames(gdp))
gdp$value <- round(gdp$value/1e9)
gap <- gdp %>%
group_by(year) %>%
# The * 1 makes it possible to have non-integer ranks while sliding
mutate(rank = min_rank(-value) * 1,
Value_rel = value/value[rank==1],
Value_lbl = paste0(" ",value)) %>%
filter(rank <=10) %>%
ungroup()
p <- ggplot(gap, aes(rank, group = country,
fill = as.factor(country), color = as.factor(country))) +
geom_tile(aes(y = value/2,
height = value,
width = 0.9), alpha = 0.8, color = NA) +
geom_text(aes(y = 0, label = paste(country, " ")), vjust = 0.2, hjust = 1) +
geom_text(aes(y=value,label = Value_lbl, hjust=0)) +
coord_flip(clip = "off", expand = FALSE) +
scale_y_continuous(labels = scales::comma) +
scale_x_reverse() +
guides(color = "none", fill = "none") +
labs(title='{closest_state}', x = "", y = "GDP in billion USD",
caption = "Sources: World Bank | Plot generated by Nitish K. Mishra #nitishimtech") +
theme(plot.title = element_text(hjust = 0, size = 22),
axis.ticks.y = element_blank(), # These relate to the axes post-flip
axis.text.y = element_blank(), # These relate to the axes post-flip
plot.margin = margin(1,1,1,4, "cm")) +
transition_states(year, transition_length = 4, state_length = 1) +
ease_aes('cubic-in-out')
animate(p, 200, fps = 10, duration = 40, width = 800, height = 600, renderer = gifski_renderer("gganim.gif"))
Here I am using a duration of 40 seconds, which is slow. You can change the duration to make it faster or slower as needed.

Related

How to make A racing Bar Chart in R

I have a dataset that has a column of years from 1965 to 2020 and Teams that have won the championship in the respective years.
I am trying to create a racing bar chart, and so far I have been struggling to create the dataset required for the animated GIF.
df1 <- df %>%
group_by(Team) %>%
mutate(cups = 1:n()) %>%
ungroup() %>%
group_by(Year) %>% spread(Year, cups) %>%
replace(is.na(.),0)
which brings a result of the following format.
Kindly assist with how I should go about completing this racing bar chart. I have browsed through several resources but still can't seem to crack it.
Check if this works. As Jon mentioned, you need to pivot your data using pivot_longer:
df1 <- pivot_longer(df, -1, names_to = 'Year') %>%
rename(Team= ï..Team) %>%
mutate(Year = as.numeric(substr(Year, 2, 5)))
Then this should create the racing bar chart:
df1 <- df1 %>%
group_by(Year) %>%
# The * 1 makes it possible to have non-integer ranks while sliding
mutate(rank = min_rank(-value) * 1,
Value_rel = value/value[rank==1],
Value_lbl = paste0(" ",value)) %>%
filter(rank <=10) %>% # This would show the top 10 teams
ungroup()
p <- ggplot(df1, aes(rank, group = Team,
fill = as.factor(Team), color = as.factor(Team))) +
geom_tile(aes(y = value/2,
height = value,
width = 0.9), alpha = 0.8, color = NA) +
geom_text(aes(y = 0, label = paste(Team, " ")), vjust = 0.2, hjust = 2) +
geom_text(aes(y=value,label = Value_lbl, hjust=0)) +
coord_flip(clip = "off", expand = FALSE) +
scale_y_continuous(labels = scales::comma) +
scale_x_reverse() +
guides(color = "none", fill = "none") +
labs(title='{closest_state}', x = "", y = "Your Title",
caption = "Your Caption") +
theme(plot.title = element_text(hjust = 0, size = 22),
axis.ticks.y = element_blank(), # These relate to the axes post-flip
axis.text.y = element_blank(), # These relate to the axes post-flip
plot.margin = margin(1,1,1,4, "cm")) +
transition_states(Year, transition_length = 4, state_length = 1) +
ease_aes('cubic-in-out')
animate(p, 200, fps = 10, duration = 40, width = 800, height = 600, renderer = gifski_renderer("gganim.gif"))
anim_save("YourPath//Name.gif")
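If the raw data has one row per year with that year's champion, the long table the plot needs (one row per team per year with a running total in value) can also be built directly. The sketch below is base R on made-up data; the Year and Team column names come from the question, while the team names and the helper column won are assumptions.

```r
# Hypothetical input: one row per year with that year's champion (made up).
df <- data.frame(Year = 2001:2005,
                 Team = c("Lions", "Tigers", "Lions", "Lions", "Tigers"),
                 stringsAsFactors = FALSE)

# Every Team x Year combination, so each team has a bar in every frame.
grid <- expand.grid(Year = sort(unique(df$Year)),
                    Team = sort(unique(df$Team)),
                    stringsAsFactors = FALSE)

# 1 if the team won that year, 0 otherwise.
grid$won <- as.integer(paste(grid$Year, grid$Team) %in%
                         paste(df$Year, df$Team))

# Running total of cups per team, in year order.
grid <- grid[order(grid$Team, grid$Year), ]
grid$value <- ave(grid$won, grid$Team, FUN = cumsum)
print(grid)
```

The resulting Year, Team and value columns match the names used in the ranking and plotting code above.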

Text color with geom_label_repel

Not specific to any particular piece of code, is there a relatively straightforward way to change the color of the text in a geom_label_repel box?
Specifically, I have code that produces the below chart
The percentage in the label box is the percent change in 7-day moving average for the most recent week over the week prior. I'd simply like to color the text red when the value is positive and green when it is negative.
The dataframe for this chart can be copied from here.
The plot code is
#endpoint layer
BaseEndpoints <- smDailyBaseData %>% filter(Base %in% AFMCbases) %>%
group_by(Base) %>%
filter(DaysSince == max(DaysSince)) %>%
select(Base, abbv, DaysSince, newRate,label) %>%
ungroup()
ZoomEndpoints <- BaseEndpoints %>% filter(Base != 'Edwards') %>%
mutate(zoom = TRUE)
CAEndPoint <- BaseEndpoints %>% filter(Base == 'Edwards') %>%
mutate(zoom = FALSE)
ZoomEndpoints <- rbind(ZoomEndpoints, CAEndPoint)
BasePlot <- smDailyBaseData %>% filter(Base %in% AFMCbases) %>%
ggplot(mapping = aes(x = as.numeric(DaysSince), y = newRate)) +
geom_line(aes(color=abbv),show.legend = FALSE) +
scale_color_ucscgb() +
geom_point(data = BaseEndpoints,size = 1.5,shape = 21,
aes(color = abbv,fill = abbv), show.legend = FALSE) +
geom_label_repel(data=ZoomEndpoints, aes(label=label), show.legend = FALSE,
vjust = 0, xlim=c(105,200), size=3, direction='y') +
labs(x = "Days Since First Confirmed Case",
y = "% Local Population Infected Daily") +
theme(plot.title = element_text(size = rel(1), face = "bold"),
plot.subtitle = element_text(size = rel(0.7)),
plot.caption = element_text(size = rel(1))) +
facet_zoom(xlim = c(50,120), ylim=c(0,0.011),zoom.data=zoom)
print(BasePlot)
Yes, it's as simple as this:
library(ggplot2)
df <- data.frame(x = c(-1, -1, 1, 1), y = c(-1, 1, 1, -1), value = c(-2, -1, 1, 2))
ggplot(df, aes(x, y)) +
geom_point(size = 3) +
ggrepel::geom_label_repel(aes(label = value, colour = factor(sign(value)))) +
lims(x = c(-100, 100), y = c(-100, 100)) +
# red for positive values, green for negative ones, as asked in the question
scale_colour_manual(values = c("-1" = "forestgreen", "1" = "red"))
EDIT
Now we have a more concrete example, I can see the problem more clearly. There are workarounds such as using ggnewscale or a hand-crafted solution such as Ian Campbell's thorough example. Personally, I would just note that you haven't used the fill scale yet, and this looks pretty good to my eye:
Here's a bit of a hacky solution since you can't have two scale_color_*'s at the same time:
The approach centers on manually assigning the color outside of aes in the geom_label_repel call. Adding one to the grepl result that searches for the minus sign in the label allows you to subset the two colors. You need two colors for each label, I assume for the box and for the text, so I used rep.
smDailyBaseData %>%
ggplot(mapping = aes(x = as.numeric(DaysSince), y = newRate)) +
geom_line(aes(color=abbv),show.legend = FALSE) +
scale_color_ucscgb() +
geom_point(data = BaseEndpoints,size = 1.5,shape = 21,
aes(color = abbv,fill = abbv), show.legend = FALSE) +
geom_label_repel(data=ZoomEndpoints, aes(label=label),
color = rep(c("red","green")[1+grepl("\\-\\d",as.factor(ZoomEndpoints$label))],times = 2),
show.legend = FALSE, vjust = 0, xlim=c(105,200),
size=3, direction='y') +
labs(x = "Days Since First Confirmed Case",
y = "% Local Population Infected Daily") +
theme(plot.title = element_text(size = rel(1), face = "bold"),
plot.subtitle = element_text(size = rel(0.7)),
plot.caption = element_text(size = rel(1))) +
facet_zoom(xlim = c(50,120), ylim=c(0,0.011),zoom.data=zoom)
Data Setup
#source("https://pastebin.com/raw/Vn2abQ4a")
BaseEndpoints <- smDailyBaseData %>%
group_by(Base) %>%
dplyr::filter(DaysSince == max(DaysSince)) %>%
dplyr::select(Base, abbv, DaysSince, newRate,label) %>%
ungroup()
ZoomEndpoints <- BaseEndpoints %>% filter(Base != 'Edwards') %>%
mutate(zoom = TRUE)
CAEndPoint <- BaseEndpoints %>% filter(Base == 'Edwards') %>%
mutate(zoom = FALSE)
ZoomEndpoints <- rbind(ZoomEndpoints, CAEndPoint)
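The indexing trick behind the manual colour vector can be tried on its own. This sketch uses made-up label strings (the real labels may be formatted differently) and maps increases to red and decreases to green, as requested in the question.

```r
# Hypothetical labels in the style described in the question (made up).
labels <- c("Base A +12%", "Base B -5%", "Base C +3%", "Base D -20%")

# TRUE where a minus sign is directly followed by a digit.
is_negative <- grepl("-\\d", labels)

# 1 + FALSE = 1 picks the first colour, 1 + TRUE = 2 picks the second.
text_colour <- c("red", "green")[1 + is_negative]
print(text_colour)
```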

R label with commas but no decimals

My goal is to produce labels with commas, but no decimals. Let's say I have a ggplot with the following section:
geom_text(aes(y = var,
label = scales::comma(round(var))), hjust = 0, nudge_y = 300 )
This is almost what I need. It gives me the commas, but has a decimal. I have seen here (axis labels with comma but no decimals ggplot) that comma_format() could be good, but I think the label in my case needs a data argument, which comma_format() does not take. What can I do?
Update:
As an example of when this problem occurs, see the following, which uses gganimate and has a lot more going on. Code derived from Jon Spring's answer at Animated sorted bar chart with bars overtaking each other
library(gapminder)
library(gganimate)
library(tidyverse)
gap_smoother <- gapminder %>%
filter(continent == "Asia") %>%
group_by(country) %>%
complete(year = full_seq(year, 1)) %>%
mutate(gdpPercap = spline(x = year, y = gdpPercap, xout = year)$y) %>%
group_by(year) %>%
mutate(rank = min_rank(-gdpPercap) * 1) %>%
ungroup() %>%
group_by(country) %>%
complete(year = full_seq(year, .5)) %>%
mutate(gdpPercap = spline(x = year, y = gdpPercap, xout = year)$y) %>%
mutate(rank = approx(x = year, y = rank, xout = year)$y) %>%
ungroup() %>%
arrange(country,year)
gap_smoother2 <- gap_smoother %>% filter(year<=2007 & year>=1999)
gap_smoother3 <- gap_smoother2 %>% filter(rank<=8)
p <- ggplot(gap_smoother3, aes(rank, group = country,
fill = as.factor(country), color = as.factor(country))) +
geom_tile(aes(y = gdpPercap/2,
height = gdpPercap,
width = 0.9), alpha = 0.8, color = NA) +
geom_text(aes(y = 0, label = paste(country, " ")), vjust = 0.2, hjust = 1) +
geom_text(aes(y = gdpPercap,
label = scales::comma(round(gdpPercap))), hjust = 0, nudge_y = 300 ) +
coord_flip(clip = "off", expand = FALSE) +
scale_x_reverse() +
guides(color = "none", fill = "none") +
labs(title='{closest_state %>% as.numeric %>% floor}',
x = "", y = "GDP per capita") +
theme(plot.title = element_text(hjust = 0, size = 22),
axis.ticks.y = element_blank(), # These relate to the axes post-flip
axis.text.y = element_blank(), # These relate to the axes post-flip
plot.margin = margin(1,1,1,4, "cm")) +
transition_states(year, transition_length = 1, state_length = 0) +
enter_grow() +
exit_shrink() +
ease_aes('linear')
animate(p, fps = 2, duration = 5, width = 600, height = 500)
In addition to the solution provided by @drf, you need to add scale_y_continuous(labels = scales::comma) to your ggplot commands. Put it before the coord_flip() call.
p <- ggplot(gap_smoother3, aes(rank, group = country,
fill = as.factor(country), color = as.factor(country))) +
geom_tile(aes(y = gdpPercap/2,
height = gdpPercap,
width = 0.9), alpha = 0.8, color = NA) +
geom_text(aes(y = gdpPercap,
label = scales::comma(round(gdpPercap), accuracy=1)),
hjust = 0, nudge_y = 300 ) +
scale_y_continuous(labels = scales::comma) +
... etc.
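For a quick check outside ggplot, the same kind of label (thousands separator, no decimals) can be produced in base R. This is a package-free sketch on made-up numbers, not a replacement for the scales::comma(..., accuracy = 1) call shown above.

```r
# Made-up values standing in for an interpolated gdpPercap column.
x <- c(1234.56, 987654.3, 12.4)

# Round first, then format with a thousands separator and no padding.
labels <- format(round(x), big.mark = ",", trim = TRUE)
print(labels)
```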

ggplot monthly date scale on x axis uses days as units

When plotting a bar chart with monthly data, ggplot shortens the distance between February and March, making the chart look inconsistent:
require(dplyr)
require(ggplot2)
require(lubridate)
## simulating sample data
set.seed(.1073)
my_df <- data.frame(my_dates = sample(seq(as.Date('2010-01-01'), as.Date('2016-12-31'), 1), 1000, replace = TRUE))
### aggregating + visualizing counts per month
my_df %>%
mutate(my_dates = round_date(my_dates, 'month')) %>%
group_by(my_dates) %>%
summarise(n_row = n()) %>%
ggplot(aes(x = my_dates, y = n_row))+
geom_bar(stat = 'identity', color = 'black',fill = 'slateblue', alpha = .5)+
scale_x_date(date_breaks = 'months', date_labels = '%y-%b') +
theme(axis.text.x = element_text(angle = 60, hjust = 1))
I would keep the dates as dates rather than converting them to factors. Factors do keep the bars uniform in size, but you then have to join in any missing months so that blank months aren't skipped, and factor levels are easy to get out of order. Instead, I would recommend adjusting your aesthetics to reduce the effect that the black outline has on the gap between February and March.
Here are two examples:
Adjust the outline color to be white. This reduces the contrast and makes the gap less noticeable.
Set the width to 20 (days).
As an aside, you don't need to summarize the data, you can use floor_date() or round_date() in an earlier step and go straight into geom_bar().
dates <- seq(as.Date("2010-01-01"), as.Date("2016-12-31"), 1)
set.seed(.1073)
my_df <-
tibble(
my_dates = sample(dates, 1000, replace = TRUE),
floor_dates = floor_date(my_dates, "month")
)
ggplot(my_df, aes(x = floor_dates)) +
geom_bar(color = "white", fill = "slateblue", alpha = .5)
ggplot(my_df, aes(x = floor_dates)) +
geom_bar(color = "black", fill = "slateblue", alpha = .5, width = 20)
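The uneven spacing itself comes from the date axis measuring distance in days. A small base R sketch on the month starts of a non-leap year shows the differences between consecutive bar positions:

```r
# First day of four consecutive months in a non-leap year.
month_starts <- seq(as.Date("2015-01-01"), by = "month", length.out = 4)

# Distance in days between consecutive bar positions on a date axis.
gaps <- as.numeric(diff(month_starts))
print(gaps)  # Jan->Feb, Feb->Mar, Mar->Apr
```

Because all bars share one width in days, the white space left between them varies with these distances, and it is smallest after February.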
Using some parts from IceCream's answer, you can try this.
Note that geom_col is now recommended over geom_bar(stat = 'identity') in this case.
my_df %>%
mutate(my_dates = factor(round_date(my_dates, 'month'))) %>%
group_by(my_dates) %>%
summarise(n_row = n()) %>%
ungroup() %>%
mutate(my_dates_x = as.numeric(my_dates)) %>%
mutate(my_dates_label = paste(month(my_dates,label = T), year(my_dates))) %>%
{ggplot(.,aes(x = my_dates_x, y = n_row))+
geom_col(color = 'black',width = 0.8, fill = 'slateblue', alpha = .5) +
scale_x_continuous(breaks = .$my_dates_x, labels = .$my_dates_label) +
theme(axis.text.x = element_text(angle = 60, hjust = 1))}
You can convert it to a factor variable to use as the axis, and fix the formatting with a label argument to scale_x_discrete.
library(dplyr)
library(ggplot2)
library(lubridate) # for round_date()
my_df %>%
mutate(my_dates = factor(round_date(my_dates, 'month'))) %>%
group_by(my_dates) %>%
summarise(n_row = n()) %>%
ggplot(aes(x = my_dates, y = n_row))+
geom_bar(stat = 'identity', color = 'black',fill = 'slateblue', alpha = .5)+
scale_x_discrete(labels = function(x) format(as.Date(x), '%Y-%b'))+
theme(axis.text.x = element_text(angle = 60, hjust = 1))
Edit: Alternate method to account for possibly missing months which should be represented as blank spaces in the plot.
library(dplyr)
library(ggplot2)
library(lubridate)
to_plot <-
my_df %>%
mutate(my_dates = round_date(my_dates, 'month'),
my_dates_ticks = interval(min(my_dates), my_dates) %/% months(1))
to_plot %>%
group_by(my_dates_ticks) %>%
summarise(n_row = n()) %>%
ggplot(aes(x = my_dates_ticks, y = n_row))+
geom_bar(stat = 'identity', color = 'black',fill = 'slateblue', alpha = .5)+
scale_x_continuous(
breaks = unique(to_plot$my_dates_ticks),
labels = function(x) format(min(to_plot$my_dates) + months(x), '%y-%b'))+
theme(axis.text.x = element_text(angle = 60, hjust = 1))
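The month-index idea can be checked in base R as well. The sketch below is a stand-in for the interval(...) %/% months(1) step, assuming dates already rounded to the first of the month; the sample dates are made up.

```r
# Made-up dates, already rounded to the first of the month.
d <- as.Date(c("2010-01-01", "2010-02-01", "2010-03-01", "2011-01-01"))

# Whole months elapsed since the earliest date (0, 1, 2, ...).
lt  <- as.POSIXlt(d)
lt0 <- as.POSIXlt(min(d))
month_index <- 12 * (lt$year - lt0$year) + (lt$mon - lt0$mon)
print(month_index)
```

Consecutive months are one unit apart regardless of their length, and the jump from 2 to 12 leaves room for the missing months as blank space.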

How can I adjust the axis-x dynamically only if the values overflow a preset value or the chart in R?

I tried to use view_follow() with no success. I want to display the chart at a smaller scale and have the x-axis change with the data.
How can I adjust the x-axis dynamically, only when the values overflow a preset value or the chart area?
library(tidyverse)
library(gganimate)
library(gapminder)
theme_set(theme_classic())
gdp <- read.csv("https://raw.github.com/datasets/gdp/master/data/gdp.csv")
words <- scan(
text="world income only total dividend asia euro america africa oecd",
what= character())
pattern <- paste0("(",words,")",collapse="|")
gdp <- subset(gdp, !grepl(pattern, Country.Name , ignore.case = TRUE))
colnames(gdp) <- gsub("Country.Name", "country", colnames(gdp))
colnames(gdp) <- gsub("Country.Code", "code", colnames(gdp))
colnames(gdp) <- gsub("Value", "value", colnames(gdp))
colnames(gdp) <- gsub("Year", "year", colnames(gdp))
gdp$value <- round(gdp$value/1e9)
gap <- gdp %>%
group_by(year) %>%
# The * 1 makes it possible to have non-integer ranks while sliding
mutate(rank = min_rank(-value) * 1,
Value_rel = value/value[rank==1],
Value_lbl = paste0(" ",value)) %>%
filter(rank <=10) %>%
ungroup()
p <- ggplot(gap, aes(rank, group = country, fill = as.factor(country), color = as.factor(country))) +
geom_tile(aes(y = value/2, height = value, width = 0.9), alpha = 0.8, color = NA) +
geom_text(aes(y = 0, label = paste(country, " ")), vjust = 0.2, hjust = 1) +
geom_text(aes(y=value,label = Value_lbl, hjust=0)) +
coord_flip(clip = "off", expand = FALSE) +
scale_y_continuous(labels = scales::comma) +
scale_x_reverse() +
guides(color = "none", fill = "none") +
labs(title='{closest_state}', x = "", y = "GDP in billion USD",
caption = "Sources: World Bank | Plot generated by Nitish K. Mishra #nitishimtech") +
theme(plot.title = element_text(hjust = 0, size = 32),
axis.ticks.y = element_blank(), # These relate to the axes post-flip
axis.text.y = element_blank(), # These relate to the axes post-flip
plot.margin = margin(1,1,1,4, "cm")) +
transition_states(year, transition_length = 4, state_length = 1) +
ease_aes('cubic-in-out')
animate(p, 200, fps = 10, duration = 10, width = 600, height = 400, renderer = gifski_renderer("plot.gif"))
