How to make a cumulative layer plot in ggplot2 - r

I have data similar to the example below:
library(dplyr)
nycflights13::flights %>%
mutate(date = as.Date(paste(day, month, year, sep = '-'), format = '%d-%m-%Y')) %>%
select(date, carrier, distance)
Now I need to build a plot with stacked sums of distance in each day, where subsequent layers would refer to different carriers. I mean something similar to
ggplot(diamonds, aes(x = price, fill = cut)) + geom_area(stat = "bin")
but with sum as a stat.
I have tried with
nycflights13::flights %>%
mutate(date = as.Date(paste(day, month, year, sep = '-'), format = '%d-%m-%Y')) %>%
select(date, carrier, distance) %>%
ggplot() +
geom_area(aes(date, distance, fill = carrier, group = carrier), stat = 'sum')
but it didn't do the trick, resulting in
Error in f(...) : Aesthetics can not vary with a ribbon
It's pretty easy with geom_bar, but any ideas how to make a stacked geom_area plot?

library(dplyr)
library(ggplot2)
nycflights13::flights %>%
mutate(date = as.Date(paste(day, month, year, sep = '-'),
format = '%d-%m-%Y')) %>%
select(date, carrier, distance) %>%
group_by(date, carrier) %>%
summarise(distance = sum(distance)) %>%
ggplot() +
geom_area(aes(date, distance, fill = carrier,
group = carrier), stat = 'identity')
should do the trick.
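The same idea (aggregate first, then let geom_area stack the pre-computed sums) can be tried on a small table. This is only a minimal sketch: the toy data frame and its values are made up for illustration and stand in for the flights data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for the flights table
toy <- data.frame(
  date     = as.Date("2013-01-01") + rep(0:2, each = 4),
  carrier  = rep(c("AA", "AA", "UA", "UA"), times = 3),
  distance = c(100, 200, 300, 400, 150, 250, 350, 450, 120, 220, 320, 420)
)

# One row per date and carrier, holding the summed distance
daily <- toy %>%
  group_by(date, carrier) %>%
  summarise(distance = sum(distance), .groups = "drop")

# geom_area stacks the carriers on top of each other by default
p <- ggplot(daily, aes(date, distance, fill = carrier)) +
  geom_area()
```

Printing p draws the chart. With real data, a carrier that has no flights on some day leaves a gap in its series, so it may be worth checking that every date/carrier combination is present before plotting.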

Related

Gathering the Averages and Combining multiple Line Graphs

I am new to R and I would love some assistance on this. I am using this dataset: https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-01-29/clean_cheese.csv
I am trying to first find the Average of the following Cheeses: Cheddar, American, Mozzarella, Italian, Swiss, Muenster, and Blue. Then I would like to place them into a line graph but show them all at once. I would like to show the average consumption of these cheeses.
The following is my code and what I have so far. I am new at this, so it might look horrible to some.
line_3 <- clean_cheese %>%
select(c(Year, Cheddar, Mozzarella, `American Other`, `Italian other`, Swiss, Muenster, Blue)) %>%
group_by(Year) %>%
summarise(avg_cheddar_cheese = mean(Cheddar), avg_mozz_cheese = mean(Mozzarella), avg_american_other = mean(`American Other`), avg_italin_other = mean(`Italian other`), avg_swiss_cheese = mean(Swiss), avg_muenster = mean(Muenster), avg_blue = mean(Blue)) %>%
pivot_longer(-c(Year)) +
ggplot(aes(x = Year, y = value, color=name,group=name)) +
geom_line() +
facet_wrap(.~name,scales = 'free_y')
ggplotly(line_3)
You can try the following. Note that pivot_longer() has to be piped into ggplot() with %>%, not added with +:
library(tidyverse)
clean_cheese <- read.csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-01-29/clean_cheese.csv')
line_3 <- clean_cheese %>%
group_by(Year) %>%
summarise(across(Cheddar:Blue, mean)) %>%
pivot_longer(cols = -Year) %>%
ggplot(aes(x = Year, y = value, color=name,group=name)) +
geom_line() +
facet_wrap(.~name,scales = 'free_y')
line_3
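To see what the summarise(across(...)) and pivot_longer() steps produce before anything is plotted, here is a small sketch on made-up numbers. The toy data frame is hypothetical and only has two cheese columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with two rows per year
toy <- data.frame(
  Year    = c(2000, 2000, 2001, 2001),
  Cheddar = c(1, 3, 5, 7),
  Blue    = c(2, 4, 6, 8)
)

long <- toy %>%
  group_by(Year) %>%
  summarise(across(Cheddar:Blue, mean)) %>%  # one mean per column per year
  pivot_longer(cols = -Year)                 # columns: Year, name, value

long
```

The long table has one row per year and cheese, which is the shape ggplot needs to map the cheese name to color and facets.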

Summing/grouping unique rows in a table

Edited: As suggested by @Ben, I have changed the code but I am getting an error.
I need to bring it in to format like:
Date Confirmed_cum
25/01/2020 4
26/01/2020 4
Can anyone help?
covid <- read.csv(file = 'covid_au_state.csv')
dput(covid)
library(lubridate)
library(dplyr)
library(ggplot2)
covid %>%
mutate(date = dmy(date)) %>%
group_by(date) %>%
summarize(confirmed_cum = sum(confirmed_cum)) %>%
ggplot(aes(x =confirmed_cum , y = date)) +
geom_point(aes(color = confirmed)) +
labs(x = 'Confirmed cases', y = 'date',
title = 'Number of new confirmed cases daily throughout Australia')
console output
covid <- read.csv(file = 'covid_au_state.csv')
dput(covid)
library(lubridate)
library(ggplot2)
covid %>%
mutate(date = dmy(date)) %>%
group_by(date) %>%
summarize(confirmed_cum = sum(confirmed_cum)) %>%
ggplot(aes(x =confirmed_cum , y = date)) + geom_point(aes(color = confirmed)) +
labs(x = 'Confirmed cases', y = 'date', title = 'Number of new confirmed cases
daily throughout Australia')
`summarise()` ungrouping output (override with `.groups` argument)
Error in FUN(X[[i]], ...) : object 'confirmed' not found
It sounds like you want to calculate the sum of confirmed_cum for each date and then plot that. Without your data, it is hard to know for sure this will work, but here is something that might. It requires the lubridate and dplyr packages.
library(lubridate)
library(dplyr)
covid %>%
mutate(date = dmy(date)) %>% # makes dates both pretty and functional
group_by(date) %>% # groups data by each date
summarize(confirmed_cum = sum(confirmed_cum)) # sum this column by date
This code returns a new data.frame with one row per date and the total of confirmed_cum for that date. To plot it with ggplot:
library(ggplot2)
covid %>%
mutate(date = dmy(date)) %>%
group_by(date) %>%
summarize(confirmed_cum = sum(confirmed_cum)) %>%
ggplot(aes(x =confirmed_cum , y = date)) +
geom_point(aes(color = confirmed_cum)) +
labs(x = 'Confirmed cases', y = 'date',
title = 'Number of new confirmed cases daily throughout Australia')
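Since the original data was not posted, here is a minimal sketch of the per-date sum on a made-up table. The state column and all the numbers are assumptions, chosen so the result matches the format asked for in the question.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for covid_au_state.csv
toy <- data.frame(
  date          = c("25/01/2020", "25/01/2020", "26/01/2020", "26/01/2020"),
  state         = c("NSW", "VIC", "NSW", "VIC"),
  confirmed_cum = c(3, 1, 3, 1)
)

totals <- toy %>%
  mutate(date = dmy(date)) %>%   # parse day/month/year strings into Date
  group_by(date) %>%
  summarise(confirmed_cum = sum(confirmed_cum), .groups = "drop")

totals
```

After summarise() only date and confirmed_cum remain, which is why mapping color to a column named confirmed fails with "object 'confirmed' not found".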

GGplot order legend using last values on x-axis

I have some time series data plotted using ggplot. I'd like the legend, which appears to the right of the plot, to be in the same order as the line on the most recent date/value on the plot's x-axis. I tried using the case_when function, but I'm obviously using it wrong. Here is an example.
df <- tibble(
x = runif(100),
y = runif(100),
z = runif(100),
year = sample(seq(1900, 2010, 10), 100, T)
) %>%
gather(variable, value,-year) %>%
group_by(year, variable) %>%
summarise(mean = mean(value))
df %>%
ggplot(aes(year, mean, color = variable)) +
geom_line()
## does not work
df %>%
mutate(variable = fct_reorder(variable, case_when(mean ~ year == 2010)))
ggplot(aes(year, mean, color = variable)) +
geom_line()
We may add one extra line
ungroup() %>% mutate(variable = fct_reorder(variable, mean, tail, n = 1, .desc = TRUE))
before plotting, or use
df %>%
ungroup() %>%
mutate(variable = fct_reorder(variable, mean, tail, n = 1, .desc = TRUE)) %>%
ggplot(aes(year, mean, color = variable)) +
geom_line()
In this way we look at the last values of mean and reorder variable accordingly.
There's another way without adding a new column using fct_reorder2():
library(tidyverse)
df %>%
ggplot(aes(year, mean, color = fct_reorder2(variable, year, mean))) +
geom_line() +
labs(color = "variable")
Although it's not advisable in your case, to order the legend based on the first (earliest) values in your plot you can set .fun = first2:
df %>%
ggplot(aes(year, mean, color = fct_reorder2(variable, year, mean, .fun = first2))) +
geom_line() +
labs(color = "variable")
The default is .fun = last2 (see also https://forcats.tidyverse.org/reference/fct_reorder.html)
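To check how last2 and first2 change the level order without drawing anything, the factor levels can be inspected directly. A small sketch on made-up values (the toy data frame is hypothetical):

```r
library(forcats)

# Hypothetical toy data: three series, two years each
toy <- data.frame(
  year     = rep(c(2000, 2010), times = 3),
  variable = rep(c("x", "y", "z"), each = 2),
  mean     = c(0.9, 0.1, 0.5, 0.8, 0.2, 0.4)
)

# Default .fun = last2: order by the mean at the largest year, descending
by_last <- levels(fct_reorder2(toy$variable, toy$year, toy$mean))

# .fun = first2: order by the mean at the smallest year, descending
by_first <- levels(fct_reorder2(toy$variable, toy$year, toy$mean, .fun = first2))

by_last
by_first
```

In this toy data y has the highest value in 2010 and x the highest in 2000, so the two orderings differ, which is the difference you would see in the legend.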

Compare year to year revenue

I am trying to create a plot to compare year to year revenue, but I can't get it to work and don't understand why.
Consider my df:
df <- data.frame(date = seq(as.Date("2016-01-01"), as.Date("2017-10-01"), by = "month"),
rev = rnorm(22, 150, sd = 20))
df %>%
separate(date, c("Year", "Month", "Date")) %>%
filter(Month <= max(Month[Year == "2017"])) %>%
group_by(Year, Month) %>%
ggplot(aes(x = Month, y = rev, fill = Year)) +
geom_line()
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
I don't really understand why this isn't working. What I want is two lines that go from January to October.
This should work for you:
library(tidyverse)
df <- data.frame(date = seq(as.Date("2016-01-01"), as.Date("2017-10-01"), by = "month"),
rev = rnorm(22, 150, sd = 20))
df %>%
separate(date, c("Year", "Month", "Date")) %>%
filter(Month <= max(Month[Year == "2017"])) %>%
ggplot(aes(x = Month, y = rev, color = Year, group = Year)) +
geom_line()
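It was just the grouping that went wrong because of the variable types: after separate(), Month is a character column, so ggplot treats each month as its own group unless you set group = Year. It might be useful to use lubridate for the dates (also a tidyverse package):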
library(lubridate)
df %>%
mutate(Year = as.factor(year(date)), Month = month(date)) %>%
filter(Month <= max(Month[Year == "2017"])) %>%
ggplot(aes(x = Month, y = rev, color = Year)) +
geom_line()
I think ggplot2 is confused because it doesn't recognise the format of your Month column, which is a character in this case. Try converting it to numeric:
... %>%
ggplot(aes(x = as.numeric(Month), y = rev, colour = Year)) +
....
Note that I replaced the word fill with colour, which I believe makes more sense for this chart.
Btw, I'm not sure the group_by statement is adding anything. I get the same chart with or without it.
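To see the grouping problem directly, you can count the groups ggplot builds with and without the group aesthetic. This is a rough sketch on made-up revenue numbers, using ggplot_build() only to inspect the computed layer data.

```r
library(ggplot2)

# Hypothetical toy data with Month stored as character, as after separate()
toy <- data.frame(
  Year  = rep(c("2016", "2017"), each = 3),
  Month = rep(c("01", "02", "03"), times = 2),
  rev   = c(150, 160, 155, 148, 170, 165)
)

# Discrete x and discrete color: every Month/Year combination is its own group
p_bad <- ggplot(toy, aes(x = Month, y = rev, color = Year)) + geom_line()

# Explicit group: one line per year
p_ok <- ggplot(toy, aes(x = Month, y = rev, color = Year, group = Year)) + geom_line()

n_bad <- length(unique(ggplot_build(p_bad)$data[[1]]$group))
n_ok  <- length(unique(ggplot_build(p_ok)$data[[1]]$group))
```

Here n_bad is the number of Month/Year combinations, one observation each, which is what triggers the geom_path message, while n_ok is the number of years.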

R dplyr group, ungroup, top_n and ggplot

I have an object with several values including cities, states, year and number of murders. I use dplyr to group it by city and calculate the total murders over all years for the top 10 cities like this:
MurderNb_reshaped2 %>%
select(city, state, Year, Murders) %>%
group_by(city) %>%
summarise(total = sum(Murders)) %>%
top_n(10, total) %>%
ggplot(aes(x = Year, y = Murders, fill = "red")) +
geom_histogram(stat = "identity") +
facet_wrap(~city)
I would like to plot this for only the top ten cities, but 'x = year' is not found because it has been grouped by city. Can anyone explain how I can accomplish this?
EDIT: this the original source data https://interactive.guim.co.uk/2017/feb/09/gva-data/UCR-1985-2015.csv
And here is my code:
Deaths <- read.csv("UCR-1985-2015.csv", stringsAsFactors = F)
MurderRate <- Deaths[, -c(5:35)]
MurderNb <- Deaths[, -c(36:66)]
colnames(MurderNb) <- gsub("X", "", colnames(MurderNb))
colnames(MurderNb) <- gsub("_raw_murder_num", "", colnames(MurderNb))
MurderNb_reshaped <- melt(MurderNb, id = c("city", "Agency", "state", "state_short"))
colnames(MurderNb_reshaped) <- c("city", "Agency", "state", "state_short", "Year", "Murders")
MurderNb_reshaped2 <- MurderNb_reshaped
MurderNb_reshaped2 %>%
select(city, state, Year, Murders) %>%
group_by(city) %>%
summarise(total = sum(Murders)) %>%
top_n(10, total) %>%
ggplot(aes(x = Year, y = Murders, fill = "red")) +
geom_bar(stat = "identity") +
facet_wrap(~city)
OK, there were a couple of minor issues. The summarised table no longer contains Year or Murders, so it cannot be plotted directly. This should do the trick:
#this gives you the top cities
topCities <- MurderNb_reshaped2 %>%
select(city, state, Year, Murders) %>%
group_by(city) %>%
summarise(total = sum(Murders)) %>%
top_n(10, total)
#you then need to filter your original data to be only the data for the top cities
MurderNb_reshaped2 <- filter(MurderNb_reshaped2, city %in% topCities$city)
ggplot(data = MurderNb_reshaped2, aes(x = Year, y = Murders, fill = "red")) +
geom_bar(stat = "identity") +
facet_wrap(~city)
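The two-step pattern (find the top groups, then filter the original rows) can be checked on a small made-up table. The city names and counts below are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: three cities, two years each
toy <- data.frame(
  city    = rep(c("A", "B", "C"), each = 2),
  Year    = rep(c(2014, 2015), times = 3),
  Murders = c(10, 20, 1, 2, 30, 40)
)

# Step 1: totals per city, keeping the two largest
top_cities <- toy %>%
  group_by(city) %>%
  summarise(total = sum(Murders)) %>%
  top_n(2, total)

# Step 2: keep the original yearly rows for those cities only
top_rows <- toy %>%
  filter(city %in% top_cities$city)

top_rows
```

top_rows still has the Year and Murders columns, so it can go straight into ggplot with facet_wrap(~city).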
