Edit: As suggested by @Ben, I have changed the code, but I'm getting an error.
I need to get it into a format like:
Date Confirmed_cum
25/01/2020 4
26/01/2020 4
Can anyone help?
covid <- read.csv(file = 'covid_au_state.csv')
dput(covid)
library(lubridate)
library(dplyr)
library(ggplot2)
covid %>%
mutate(date = dmy(date)) %>%
group_by(date) %>%
summarize(confirmed_cum = sum(confirmed_cum)) %>%
ggplot(aes(x =confirmed_cum , y = date)) +
geom_point(aes(color = confirmed)) +
labs(x = 'Confirmed cases', y = 'date',
title = 'Number of new confirmed cases daily throughout Australia')
Console output:
covid <- read.csv(file = 'covid_au_state.csv')
dput(covid)
library(lubridate)
library(ggplot2)
covid %>%
mutate(date = dmy(date)) %>%
group_by(date) %>%
summarize(confirmed_cum = sum(confirmed_cum)) %>%
ggplot(aes(x =confirmed_cum , y = date)) + geom_point(aes(color = confirmed)) +
labs(x = 'Confirmed cases', y = 'date', title = 'Number of new confirmed cases
daily throughout Australia')
`summarise()` ungrouping output (override with `.groups` argument)
Error in FUN(X[[i]], ...) : object 'confirmed' not found
It sounds like you want to calculate the sum of confirmed_cum for each date and then plot that. Without your data it is hard to know for sure that this will work, but here is something that might. It requires the lubridate and dplyr packages.
library(lubridate)
library(dplyr)
covid %>%
mutate(date = dmy(date)) %>% # makes dates both pretty and functional
group_by(date) %>% # groups data by each date
summarize(confirmed_cum = sum(confirmed_cum)) # sum this column by date
This code returns a new data.frame with one row per date and the total of confirmed_cum for that date. To plot it with ggplot:
library(ggplot2)
covid %>%
mutate(date = dmy(date)) %>%
group_by(date) %>%
summarize(confirmed_cum = sum(confirmed_cum)) %>%
ggplot(aes(x = confirmed_cum, y = date)) +
geom_point(aes(color = confirmed_cum)) +
labs(x = 'Confirmed cases', y = 'date',
title = 'Number of new confirmed cases daily throughout Australia')
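To check the aggregation step on its own before plotting, here is a minimal sketch with a small made-up data frame. The `covid_toy` name, the `state` column and all values are hypothetical stand-ins for the real CSV. It shows `group_by()` plus `summarize()` collapsing several state rows into one row per date, which is the Date / Confirmed_cum shape asked for in the question.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for covid_au_state.csv: one row per state per day
covid_toy <- data.frame(
  date = c("25/01/2020", "25/01/2020", "26/01/2020", "26/01/2020"),
  state = c("NSW", "VIC", "NSW", "VIC"),
  confirmed_cum = c(3, 1, 3, 1)
)

daily_totals <- covid_toy %>%
  mutate(date = dmy(date)) %>%   # parse day/month/year strings into Date objects
  group_by(date) %>%             # one group per calendar day
  summarize(confirmed_cum = sum(confirmed_cum), .groups = "drop")  # total across states

print(daily_totals)
```

Passing `.groups = "drop"` also silences the "`summarise()` ungrouping output" message seen in the console output above.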
Related
I have a simple two-column time-series dataset that looks like this:
Date Signups
22-Feb-18 601
23-Feb-18 500
24-Feb-18 6000
...
27-Apr-22 999
28-Apr-22 998
29-Apr-22 123
30-Apr-22 321
And I'm trying to make a simple line chart that shows the monthly total over time and then a point at the most recent month. But the filter within the geom_point is giving me a hard time. Here's what I have:
library(tidyverse)
library(scales)
library(lubridate)
signups %>%
mutate(Date = dmy(Date)) %>%
group_by(month(Date), year(Date)) %>%
mutate(month = paste0(month(Date),"-",year(Date))) %>%
mutate(month = my(month)) %>%
mutate(monthly_total = sum(signups)) %>%
ungroup() %>%
dplyr::filter(month >= "2018-03-01") %>%
ggplot(aes(month, monthly_total)) +
geom_line() +
geom_point(data = signups %>% dplyr::filter(month == "2022-03-01")) +
expand_limits(y = 0, x = as.Date(c("2018-03-01", "2024-03-01"))) +
scale_y_continuous(labels = comma)
If I comment out the geom_point it gives me the line chart that I'm looking for. But when the geom_point is included here it throws this error:
Error in dplyr::filter(., month == "2022-03-01") :
Caused by error in `month == "2022-03-01"`:
! comparison (1) is possible only for atomic and list types
I've tried using subset instead of filter and it didn't help. Let me know if you have any suggestions. Thanks!
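The comment from Limey got us there. The problem was that the data argument of geom_point() used the original signups data frame, which has no month column, so inside filter() the name month fell back to lubridate's month() function and the comparison with "2022-03-01" failed. Here's what I needed to do: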
signups <- signups %>%
mutate(Date = dmy(Date)) %>%
mutate(just_month = paste0(month(Date),"-",year(Date))) %>%
mutate(just_month = my(just_month)) %>%
group_by(month(Date), year(Date)) %>%
mutate(monthly_total = sum(signups)) %>%
ungroup()
signups %>%
dplyr::filter(just_month >= "2018-03-01") %>%
ggplot(aes(just_month, monthly_total)) +
geom_line(aes(just_month, monthly_total)) +
geom_point(data = dplyr::filter(signups, just_month == "2022-04-01")) +
expand_limits(y = 0, x = as.Date(c("2018-03-01", "2024-03-01"))) +
scale_y_continuous(labels = comma)
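A minimal sketch of the same idea that avoids the name clash altogether, assuming a made-up `signups_toy` data frame with the `Date` and `Signups` columns shown in the question. It swaps in `lubridate::floor_date()` to get the first day of each month and `summarise()` to keep one row per month (instead of the `paste0()`/`my()` round trip and a per-row `mutate()`). The point layer is then built from that same summarised data, so the column used in `filter()` always exists.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Hypothetical daily data in the question's day-Mon-yy format
signups_toy <- data.frame(
  Date = c("27-Mar-22", "28-Mar-22", "27-Apr-22", "28-Apr-22", "29-Apr-22"),
  Signups = c(10, 20, 999, 998, 123)
)

monthly <- signups_toy %>%
  mutate(Date = dmy(Date),
         month_start = floor_date(Date, unit = "month")) %>%  # first day of each month
  group_by(month_start) %>%
  summarise(monthly_total = sum(Signups), .groups = "drop")   # one row per month

latest <- monthly %>% filter(month_start == max(month_start)) # most recent month only

p <- ggplot(monthly, aes(month_start, monthly_total)) +
  geom_line() +
  geom_point(data = latest) +   # the point layer reuses the summarised columns
  expand_limits(y = 0)
```

Using `max(month_start)` rather than a hard-coded date means the highlighted point follows the data as new months arrive.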
I'm looking at an R Tidy Tuesday dataset (European Energy). I have wrangled the Imports and Exports as proportions and want to order the bars in the ggplot by ascending Imports value. I'm just trying to make it look tidy, but I can't seem to control the order so that each subsequent country has the next-biggest import value.
I have left a couple of attempts in the code, commented out. Thanks in advance.
library(tidyverse)
country_totals <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-08-04/country_totals.csv')
country_totals %>%
filter(!is.na(country_name)) %>%
filter(type %in% c("Imports","Exports")) %>%
group_by(country_name) %>%
mutate(country_type_ttl = sum(`2018`)) %>%
mutate(country_type_pct = `2018`/country_type_ttl) %>%
ungroup() %>%
mutate(type_hold = type) %>%
pivot_wider(names_from = type_hold, values_from = `2018`) %>%
# ggplot(aes(country_name, country_type_pct, fill = type)) +
# ggplot(aes(reorder(country_name, Imports), country_type_pct, fill = type)) +
ggplot(aes(fct_reorder(country_name, Imports), country_type_pct, fill = type)) +
geom_bar(stat = "identity") +
coord_flip()
This could be achieved by adding a column with the value by which you want to reorder, i.e. the percentage share of imports in 2018, using e.g. imports_2018 = country_type_pct[type == "Imports"]. Then reorder the countries according to this column:
library(tidyverse)
country_totals <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-08-04/country_totals.csv')
country_totals %>%
filter(!is.na(country_name)) %>%
filter(type %in% c("Imports","Exports")) %>%
group_by(country_name) %>%
mutate(country_type_ttl = sum(`2018`)) %>%
mutate(country_type_pct = `2018`/country_type_ttl,
imports_2018 = country_type_pct[type == "Imports"]) %>%
ungroup() %>%
mutate(type_hold = type) %>%
ggplot(aes(fct_reorder(country_name, imports_2018), country_type_pct, fill = type)) +
geom_bar(stat = "identity") +
coord_flip()
#> Warning: Removed 2 rows containing missing values (position_stack).
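The same technique on a tiny made-up data set (the `toy` name, country names and shares below are hypothetical), which makes it easy to confirm the resulting factor level order without downloading the CSV. Because `imports_share` is repeated on both rows of each country, `fct_reorder()` (which summarises each level with the median by default) simply sorts countries by their import share, in ascending order.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical shares: one Imports row and one Exports row per country
toy <- data.frame(
  country_name = rep(c("A", "B", "C"), each = 2),
  type = rep(c("Imports", "Exports"), times = 3),
  country_type_pct = c(0.7, 0.3, 0.2, 0.8, 0.5, 0.5)
)

ordered_toy <- toy %>%
  group_by(country_name) %>%
  # copy each country's Imports share onto both of its rows
  mutate(imports_share = country_type_pct[type == "Imports"]) %>%
  ungroup() %>%
  mutate(country_name = fct_reorder(country_name, imports_share))  # ascending by default

p <- ggplot(ordered_toy, aes(country_name, country_type_pct, fill = type)) +
  geom_col() +      # geom_col() is shorthand for geom_bar(stat = "identity")
  coord_flip()

print(levels(ordered_toy$country_name))
```

Passing `.desc = TRUE` to `fct_reorder()` reverses the order if the largest importer should come first.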
Although my query shows me values in descending order, ggplot then displays them alphabetically instead of ascending order.
The known solutions to this problem haven't seemed to work. They suggest using reorder() or factor() on the values, which didn't work in this case.
This is my code:
boxoffice %>%
group_by(studio) %>%
summarise(movies_made = n()) %>%
arrange(desc(movies_made)) %>%
top_n(10) %>%
arrange(desc(movies_made)) %>%
ggplot(aes(x = studio, y = movies_made, fill = studio, label = as.character(movies_made))) +
geom_bar(stat = 'identity') +
geom_label(label.size = 1, size = 5, color = "white") +
theme(legend.position = "none") +
ylab("Movies Made") +
xlab("Studio")
For those wanting a more complete example, here's where I got:
library(dplyr)
library(ggplot2)
# get some dummy data
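boxoffice <- boxoffice::boxoffice(dates = as.Date("2017-1-1"))
df <- boxoffice %>%
group_by(distributor) %>%
summarise(movies_made = n()) %>%
top_n(10, movies_made) %>% # rank explicitly by movies_made
mutate(studio = reorder(distributor, -movies_made)) # factor levels sorted by descending count
ggplot(df, aes(x = studio, y = movies_made)) + geom_col() # plot the reordered factor, not distributor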
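You'll need to convert boxoffice$studio to a factor whose levels are in the order you want. ggplot orders a discrete axis by factor levels, so if the levels follow the row order after arrange(), the bars will follow it too, rather than being alphabetized. Your dplyr chain will look like this: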
boxoffice %>%
group_by(studio) %>%
summarise(movies_made = n()) %>%
arrange(desc(movies_made)) %>%
ungroup() %>% # ungroup
mutate(studio = factor(studio, studio, ordered = TRUE)) %>% # convert variable; levels follow the row order
top_n(10, movies_made) %>% # rank by movies_made, not by the last column
arrange(desc(movies_made)) %>%
ggplot(aes(x = studio, y... (rest of plotting code)
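A minimal sketch of why this works, using a made-up `studios` data frame in place of `boxoffice` (names and counts are hypothetical): after `arrange()`, building the factor with `levels = studio` fixes the level order to the row order, and the discrete x scale follows the factor levels. A plain (unordered) factor is enough for this; `ordered = TRUE` is not required.

```r
library(dplyr)
library(ggplot2)

# Hypothetical counts standing in for the summarised boxoffice data
studios <- data.frame(
  studio = c("Alpha", "Beta", "Gamma", "Delta"),
  movies_made = c(5, 12, 8, 1)
)

plot_data <- studios %>%
  arrange(desc(movies_made)) %>%                     # rows: Beta, Gamma, Alpha, Delta
  mutate(studio = factor(studio, levels = studio))  # levels follow the row order

p <- ggplot(plot_data, aes(x = studio, y = movies_made)) +
  geom_col()

print(levels(plot_data$studio))
```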
I've got data similar to the example below:
library(dplyr)
nycflights13::flights %>%
mutate(date = as.Date(paste(day, month, year, sep = '-'), format = '%d-%m-%Y')) %>%
select(date, carrier, distance)
Now I need to build a plot with stacked sums of distance for each day, where each layer corresponds to a different carrier. I mean something similar to
ggplot(diamonds, aes(x = price, fill = cut)) + geom_area(stat = "bin")
but with sum as a stat.
I have tried:
nycflights13::flights %>%
mutate(date = as.Date(paste(day, month, year, sep = '-'), format = '%d-%m-%Y')) %>%
select(date, carrier, distance) %>%
ggplot() +
geom_area(aes(date, distance, fill = carrier, group = carrier), stat = 'sum')
but it didn't do the trick, resulting in
Error in f(...) : Aesthetics can not vary with a ribbon
It's pretty easy with geom_bar, but any ideas how to make a stacked geom_area plot?
library(dplyr)
library(ggplot2)
nycflights13::flights %>%
mutate(date = as.Date(paste(day, month, year, sep = '-'),
format = '%d-%m-%Y')) %>%
select(date, carrier, distance) %>%
group_by(date, carrier) %>%
summarise(distance = sum(distance)) %>%
ggplot() +
geom_area(aes(date, distance, fill = carrier,
group = carrier), stat = 'identity')
should do the trick.
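One detail worth checking with this approach: if a carrier has no flights on some day, its row is simply missing after `summarise()`, so that carrier's area is typically drawn straight across the gap instead of dropping to zero. A sketch on made-up data (the `flights_toy` name, dates, carriers and distances are hypothetical) that swaps in `tidyr::complete()` to add the missing date/carrier combinations with a distance of 0 before plotting:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical flights: carrier "BB" has no flight on 2013-01-02
flights_toy <- data.frame(
  date = as.Date(c("2013-01-01", "2013-01-01", "2013-01-02", "2013-01-03", "2013-01-03")),
  carrier = c("AA", "BB", "AA", "AA", "BB"),
  distance = c(100, 200, 150, 120, 80)
)

daily <- flights_toy %>%
  group_by(date, carrier) %>%
  summarise(distance = sum(distance), .groups = "drop") %>%  # one row per date and carrier
  complete(date, carrier, fill = list(distance = 0))         # add missing combinations as 0

p <- ggplot(daily, aes(date, distance, fill = carrier)) +
  geom_area(position = "stack")
```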
I am trying to create a plot to compare year to year revenue, but I can't get it to work and don't understand why.
Consider my df:
df <- data.frame(date = seq(as.Date("2016-01-01"), as.Date("2017-10-01"), by = "month"),
rev = rnorm(22, 150, sd = 20))
df %>%
separate(date, c("Year", "Month", "Date")) %>%
filter(Month <= max(Month[Year == "2017"])) %>%
group_by(Year, Month) %>%
ggplot(aes(x = Month, y = rev, fill = Year)) +
geom_line()
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
I don't really understand why this isn't working. What I want is two lines that go from January to October.
This should work for you:
library(tidyverse)
df <- data.frame(date = seq(as.Date("2016-01-01"), as.Date("2017-10-01"), by = "month"),
rev = rnorm(22, 150, sd = 20))
df %>%
separate(date, c("Year", "Month", "Date")) %>%
filter(Month <= max(Month[Year == "2017"])) %>%
ggplot(aes(x = Month, y = rev, color = Year, group = Year)) +
geom_line()
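It was just the grouping that went wrong, because of the variable types: Month and Year are character columns, so ggplot treats them as discrete and by default makes every Month/Year combination its own group, each with a single observation. Adding group = Year gives one group per year. It might also be useful to use lubridate for the dates (also a tidyverse package):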
library(lubridate)
df %>%
mutate(Year = as.factor(year(date)), Month = month(date)) %>%
filter(Month <= max(Month[Year == "2017"])) %>%
ggplot(aes(x = Month, y = rev, color = Year)) +
geom_line()
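To see the grouping issue directly, one option is to inspect the built plot with `ggplot_build()`, whose layer data carries a `group` column. This sketch reuses the question's data with a fixed seed (the seed and the `df_split`/`n_groups` names are additions for illustration) and compares the question's mapping, without `group`, to one that adds `group = Year`:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)  # made-up revenue, as in the question
df <- data.frame(date = seq(as.Date("2016-01-01"), as.Date("2017-10-01"), by = "month"),
                 rev = rnorm(22, 150, sd = 20))

df_split <- df %>%
  separate(date, c("Year", "Month", "Date")) %>%   # Year and Month become character columns
  filter(Month <= max(Month[Year == "2017"]))      # keep January to October of each year

# Without group: every (Month, Year) pair is its own group, so no line can be drawn
p_default <- ggplot(df_split, aes(x = Month, y = rev, colour = Year)) + geom_line()
# With group = Year: one group, and therefore one line, per year
p_grouped <- ggplot(df_split, aes(x = Month, y = rev, colour = Year, group = Year)) + geom_line()

n_groups <- function(p) length(unique(ggplot_build(p)$data[[1]]$group))
print(c(default = n_groups(p_default), grouped = n_groups(p_grouped)))
```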
I think ggplot2 is confused because it doesn't recognise the format of your Month column, which is a character in this case. Try converting it to numeric:
... %>%
ggplot(aes(x = as.numeric(Month), y = rev, colour = Year)) +
....
Note that I replaced fill with colour, which I believe makes more sense for this chart, since geom_line() uses the colour aesthetic rather than fill.
Btw, I'm not sure the group_by statement is adding anything. I get the same chart with or without it.