Summing/grouping unique rows in a table

Edited: As suggested by @Ben, I have changed the code, but I am now getting an error.
I need to bring the data into a format like:
Date Confirmed_cum
25/01/2020 4
26/01/2020 4
Can anyone help?
covid <- read.csv(file = 'covid_au_state.csv')
dput(covid)
library(lubridate)
library(dplyr)
library(ggplot2)
covid %>%
mutate(date = dmy(date)) %>%
group_by(date) %>%
summarize(confirmed_cum = sum(confirmed_cum)) %>%
ggplot(aes(x =confirmed_cum , y = date)) +
geom_point(aes(color = confirmed)) +
labs(x = 'Confirmed cases', y = 'date',
title = 'Number of new confirmed cases daily throughout Australia')
Console output:
covid <- read.csv(file = 'covid_au_state.csv')
dput(covid)
library(lubridate)
library(ggplot2)
covid %>%
mutate(date = dmy(date)) %>%
group_by(date) %>%
summarize(confirmed_cum = sum(confirmed_cum)) %>%
ggplot(aes(x =confirmed_cum , y = date)) + geom_point(aes(color = confirmed)) +
labs(x = 'Confirmed cases', y = 'date', title = 'Number of new confirmed cases
daily throughout Australia')
`summarise()` ungrouping output (override with `.groups` argument)
Error in FUN(X[[i]], ...) : object 'confirmed' not found

It sounds like you want to calculate the sum of confirmed_cum for each date and then plot it. Without your data, I can't be sure this will work, but here is something to try. It requires the lubridate and dplyr packages.
library(lubridate)
library(dplyr)
covid %>%
mutate(date = dmy(date)) %>% # parses day/month/year text into Date objects
group_by(date) %>% # groups data by each date
summarize(confirmed_cum = sum(confirmed_cum)) # sum this column by date
This code returns a new data.frame with one row per date and the total of confirmed_cum for that date. To plot it with ggplot:
library(ggplot2)
covid %>%
mutate(date = dmy(date)) %>%
group_by(date) %>%
summarize(confirmed_cum = sum(confirmed_cum)) %>%
ggplot(aes(x =confirmed_cum , y = date)) +
geom_point(aes(color = confirmed_cum)) +
labs(x = 'Confirmed cases', y = 'date',
title = 'Number of new confirmed cases daily throughout Australia')
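The `object 'confirmed' not found` error in the question comes from `summarize()`: it keeps only the grouping column and the columns it creates, so `confirmed` no longer exists by the time `ggplot()` runs. A minimal sketch, assuming a made-up four-row data frame in place of covid_au_state.csv (the `state` column and all values are hypothetical):

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for covid_au_state.csv: two states, two dates
covid <- data.frame(
  date = c("25/01/2020", "25/01/2020", "26/01/2020", "26/01/2020"),
  state = c("NSW", "VIC", "NSW", "VIC"),
  confirmed = c(3, 1, 0, 0),
  confirmed_cum = c(3, 1, 3, 1)
)

daily <- covid %>%
  mutate(date = dmy(date)) %>%                 # parse "dd/mm/yyyy" text into Date
  group_by(date) %>%                           # one group per date
  summarize(confirmed_cum = sum(confirmed_cum), .groups = "drop")

# Only the grouping column and the summarised column remain
names(daily)
```

Any aesthetic mapped after this step has to use `date` or `confirmed_cum`, which is why the answer maps `color = confirmed_cum`.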

Related

Issue with filter inside of geom in ggplot. "comparison (1) is possible only for atomic and list types"

I have a simple two-column time-series dataset that looks like this:
Date Signups
22-Feb-18 601
23-Feb-18 500
24-Feb-18 6000
...
27-Apr-22 999
28-Apr-22 998
29-Apr-22 123
30-Apr-22 321
And I'm trying to make a simple line chart that shows the monthly total over time and then a point at the most recent month. But the filter within the geom_point is giving me a hard time. Here's what I have:
library(tidyverse)
library(scales)
library(lubridate)
signups %>%
mutate(Date = dmy(Date)) %>%
group_by(month(Date), year(Date)) %>%
mutate(month = paste0(month(Date),"-",year(Date))) %>%
mutate(month = my(month)) %>%
mutate(monthly_total = sum(signups)) %>%
ungroup() %>%
dplyr::filter(month >= "2018-03-01") %>%
ggplot(aes(month, monthly_total)) +
geom_line() +
geom_point(data = signups %>% dplyr::filter(month == "2022-03-01")) +
expand_limits(y = 0, x = as.Date(c("2018-03-01", "2024-03-01"))) +
scale_y_continuous(labels = comma)
If I comment out the geom_point it gives me the line chart that I'm looking for. But when the geom_point is included here it throws this error:
Error in dplyr::filter(., month == "2022-03-01") :
Caused by error in `month == "2022-03-01"`:
! comparison (1) is possible only for atomic and list types
I've tried using subset instead of filter and it didn't help. Let me know if you have any suggestions. Thanks!

The comment from Limey got us there. Here's what I needed to do:
signups <- signups %>%
mutate(Date = dmy(Date)) %>%
mutate(just_month = paste0(month(Date),"-",year(Date))) %>%
mutate(just_month = my(just_month)) %>%
group_by(month(Date), year(Date)) %>%
mutate(monthly_total = sum(signups)) %>%
ungroup()
signups %>%
dplyr::filter(just_month >= "2018-03-01") %>%
ggplot(aes(just_month, monthly_total)) +
geom_line(aes(just_month, monthly_total)) +
geom_point(data = dplyr::filter(signups, just_month == "2022-04-01")) +
expand_limits(y = 0, x = as.Date(c("2018-03-01", "2024-03-01"))) +
scale_y_continuous(labels = comma)
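The likely reason for the original error: inside `geom_point()`, `signups` was still the raw data frame, which has no `month` column, so `month` resolved to lubridate's `month()` function, and R cannot compare a function with a string. A small sketch of just that comparison (the exact wording of the message may differ between R versions):

```r
library(lubridate)

# With no data column called `month` in scope, the name refers to lubridate's function
is.function(month)

# Comparing a function with a string fails; capture the message instead of stopping
msg <- tryCatch(
  month == "2022-03-01",
  error = function(e) conditionMessage(e)
)
msg
```

Storing the month column in the data frame under a different name (`just_month` in the fix above) avoids the clash.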

How to Arrange Stacked geom_bar by Ascending Proportion

I am looking at an R Tidy Tuesday dataset (European Energy). I have wrangled the Imports and Exports as proportions and want to arrange the ggplot in ascending order of the Imports values. I'm just trying to make it look tidy, but I can't seem to control the order so that each subsequent country has the next biggest import value.
I have left a couple of attempts in the code, commented out. Thanks in advance.
library(tidyverse)
country_totals <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-08-04/country_totals.csv')
country_totals %>%
filter(!is.na(country_name)) %>%
filter(type %in% c("Imports","Exports")) %>%
group_by(country_name) %>%
mutate(country_type_ttl = sum(`2018`)) %>%
mutate(country_type_pct = `2018`/country_type_ttl) %>%
ungroup() %>%
mutate(type_hold = type) %>%
pivot_wider(names_from = type_hold, values_from = `2018`) %>%
# ggplot(aes(country_name, country_type_pct, fill = type)) +
# ggplot(aes(reorder(country_name, Imports), country_type_pct, fill = type)) +
ggplot(aes(fct_reorder(country_name, Imports), country_type_pct, fill = type)) +
geom_bar(stat = "identity") +
coord_flip()

This could be achieved by adding a column with the value by which you want to reorder, i.e. the percentage share of imports in 2018, using e.g. imports_2018 = country_type_pct[type == "Imports"]. Then reorder the countries according to this column:
library(tidyverse)
country_totals <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-08-04/country_totals.csv')
country_totals %>%
filter(!is.na(country_name)) %>%
filter(type %in% c("Imports","Exports")) %>%
group_by(country_name) %>%
mutate(country_type_ttl = sum(`2018`)) %>%
mutate(country_type_pct = `2018`/country_type_ttl,
imports_2018 = country_type_pct[type == "Imports"]) %>%
ungroup() %>%
mutate(type_hold = type) %>%
ggplot(aes(fct_reorder(country_name, imports_2018), country_type_pct, fill = type)) +
geom_bar(stat = "identity") +
coord_flip()
#> Warning: Removed 2 rows containing missing values (position_stack).
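The reordering step can be tried on a small made-up table. In this sketch the `toy` data frame, its `value` column, and the `imports_share` name are hypothetical stand-ins for the Tidy Tuesday columns:

```r
library(dplyr)
library(forcats)

# Hypothetical import/export values for three countries
toy <- data.frame(
  country_name = rep(c("A", "B", "C"), each = 2),
  type = rep(c("Imports", "Exports"), times = 3),
  value = c(30, 70, 10, 90, 60, 40)
)

toy_ordered <- toy %>%
  group_by(country_name) %>%
  mutate(pct = value / sum(value),                  # share within each country
         imports_share = pct[type == "Imports"]) %>% # copy the import share to both rows
  ungroup() %>%
  mutate(country_name = fct_reorder(country_name, imports_share))

# Levels now run from the smallest to the largest import share
levels(toy_ordered$country_name)
```

Because both rows of a country carry the same `imports_share`, `fct_reorder()` has one unambiguous value per country to sort on.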

make geom_bar show values in the ascending order

Although my query returns the values in descending order, ggplot then displays them alphabetically instead of in ascending order.
Known solutions to this problem haven't seemed to work. They suggest using reorder or factor on the values, which didn't work in this case.
This is my code:
boxoffice %>%
group_by(studio) %>%
summarise(movies_made = n()) %>%
arrange(desc(movies_made)) %>%
top_n(10) %>%
arrange(desc(movies_made)) %>%
ggplot(aes(x = studio, y = movies_made, fill = studio, label = as.character(movies_made))) +
geom_bar(stat = 'identity') +
geom_label(label.size = 1, size = 5, color = "white") +
theme(legend.position = "none") +
ylab("Movies Made") +
xlab("Studio")

For those wanting a more complete example, here's where I got:
library(dplyr)
library(ggplot2)
# get some dummy data
boxoffice = boxoffice::boxoffice(dates=as.Date("2017-1-1"))
df <- (
boxoffice %>%
group_by(distributor) %>%
summarise(movies_made = n()) %>%
mutate(studio=reorder(distributor, -movies_made)) %>%
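top_n(10, movies_made)) # rank by count explicitly; studio is now the last column
ggplot(df, aes(x=studio, y=movies_made)) + geom_col() # plot the reordered factor, not the raw names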

You'll need to convert boxoffice$studio to an ordered factor. ggplot will then follow the order of the factor levels (set here from the sorted rows), rather than alphabetizing. Your dplyr chain will look like this:
boxoffice %>%
group_by(studio) %>%
summarise(movies_made = n()) %>%
arrange(desc(movies_made)) %>%
ungroup() %>% # ungroup
mutate(studio = factor(studio, studio, ordered = TRUE)) %>% # convert variable
top_n(10) %>%
arrange(desc(movies_made)) %>%
ggplot(aes(x = studio, y... (rest of plotting code)
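A minimal sketch of the factor conversion on made-up counts (the studio names and numbers are hypothetical): once the rows are sorted, `factor(studio, levels = studio)` fixes the level order that ggplot uses for the axis.

```r
library(dplyr)

# Hypothetical counts per studio
counts <- data.frame(
  studio = c("Alpha", "Beta", "Gamma"),
  movies_made = c(2, 9, 5)
)

ranked <- counts %>%
  arrange(desc(movies_made)) %>%                    # sort rows, largest first
  mutate(studio = factor(studio, levels = studio))  # levels follow the sorted row order

# Levels are no longer alphabetical
levels(ranked$studio)
```

Sorting with `arrange(movies_made)` instead would give the ascending order the question title asks for.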

how to make a cumulative layer plot in ggplot2

I've got data similar to the example below:
library(dplyr)
nycflights13::flights %>%
mutate(date = as.Date(paste(day, month, year, sep = '-'), format = '%d-%m-%Y')) %>%
select(date, carrier, distance)
Now I need to build a plot with stacked sums of distance in each day, where subsequent layers would refer to different carriers. I mean something similar to
ggplot(diamonds, aes(x = price, fill = cut)) + geom_area(stat = "bin")
but with sum as a stat.
I have tried with
nycflights13::flights %>%
mutate(date = as.Date(paste(day, month, year, sep = '-'), format = '%d-%m-%Y')) %>%
select(date, carrier, distance) %>%
ggplot() +
geom_area(aes(date, distance, fill = carrier, group = carrier), stat = 'sum')
but it didn't do the trick, resulting in
Error in f(...) : Aesthetics can not vary with a ribbon
It's pretty easy with geom_bar, but any ideas how to make a stacked geom_area plot?

library(dplyr)
nycflights13::flights %>%
mutate(date = as.Date(paste(day, month, year, sep = '-'),
format = '%d-%m-%Y')) %>%
select(date, carrier, distance) %>%
group_by(date, carrier) %>%
summarise(distance = sum(distance)) %>%
ggplot() +
geom_area(aes(date, distance, fill = carrier,
group = carrier), stat = 'identity')
This should do the trick: the distances are summed per date and carrier first, so geom_area only has to stack them.
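A minimal sketch of the same pattern on made-up data, in case nycflights13 is not installed (the `toy` data frame and its values are hypothetical): aggregate to one row per date and carrier, then pass the result to `geom_area()` with `stat = "identity"`.

```r
library(dplyr)
library(ggplot2)

# Hypothetical flights: several rows per date and carrier
toy <- data.frame(
  date = as.Date(c("2013-01-01", "2013-01-01", "2013-01-01",
                   "2013-01-02", "2013-01-02", "2013-01-02")),
  carrier = c("AA", "AA", "UA", "AA", "UA", "UA"),
  distance = c(100, 200, 300, 400, 500, 600)
)

# One row per date and carrier, holding the summed distance
daily <- toy %>%
  group_by(date, carrier) %>%
  summarise(distance = sum(distance), .groups = "drop")

# Stack the precomputed sums, one layer per carrier
p <- ggplot(daily) +
  geom_area(aes(date, distance, fill = carrier, group = carrier),
            stat = "identity")
```

Printing `p` draws the stacked area chart; `daily` can be inspected first to confirm there is exactly one value per date and carrier.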

Compare year to year revenue

I am trying to create a plot to compare year to year revenue, but I can't get it to work and don't understand why.
Consider my df:
df <- data.frame(date = seq(as.Date("2016-01-01"), as.Date("2017-10-01"), by = "month"),
rev = rnorm(22, 150, sd = 20))
df %>%
separate(date, c("Year", "Month", "Date")) %>%
filter(Month <= max(Month[Year == "2017"])) %>%
group_by(Year, Month) %>%
ggplot(aes(x = Month, y = rev, fill = Year)) +
geom_line()
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
I don't really understand why this isn't working. What I want is two lines that go from January to October.

This should work for you:
library(tidyverse)
df <- data.frame(date = seq(as.Date("2016-01-01"), as.Date("2017-10-01"), by = "month"),
rev = rnorm(22, 150, sd = 20))
df %>%
separate(date, c("Year", "Month", "Date")) %>%
filter(Month <= max(Month[Year == "2017"])) %>%
ggplot(aes(x = Month, y = rev, color = Year, group = Year)) +
geom_line()
It was just the grouping that went wrong because of the variable types. It might be useful to use lubridate for the dates (it is also a tidyverse package):
library(lubridate)
df %>%
mutate(Year = as.factor(year(date)), Month = month(date)) %>%
filter(Month <= max(Month[Year == "2017"])) %>%
ggplot(aes(x = Month, y = rev, color = Year)) +
geom_line()

I think ggplot2 is confused because it doesn't recognise the format of your Month column, which is a character in this case. Try converting it to numeric:
... %>%
ggplot(aes(x = as.numeric(Month), y = rev, colour = Year)) +
....
Note that I replaced fill with colour, which I believe makes more sense for a line chart.
Btw, I'm not sure the group_by statement is adding anything. I get the same chart with or without it.
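The message can be traced to how ggplot2 forms groups: by default the group is the interaction of all discrete variables in the plot, so with a character `Month` on the x axis each Month/Year combination becomes its own group unless `group = Year` is set. A minimal sketch on made-up revenue values, using `layer_data()` to count the groups in each version:

```r
library(ggplot2)

# Hypothetical revenue: three months in each of two years, stored as text
toy <- data.frame(
  Year = rep(c("2016", "2017"), each = 3),
  Month = rep(c("01", "02", "03"), times = 2),
  rev = c(150, 140, 160, 155, 145, 170)
)

# Default grouping: every Month/Year pair is a separate one-point group
p_default <- ggplot(toy, aes(x = Month, y = rev, colour = Year)) +
  geom_line()

# Explicit grouping: one line per year
p_grouped <- ggplot(toy, aes(x = Month, y = rev, colour = Year, group = Year)) +
  geom_line()

n_default <- length(unique(layer_data(p_default)$group))
n_grouped <- length(unique(layer_data(p_grouped)$group))
c(n_default, n_grouped)
```

Converting `Month` to a number, as suggested above, has the same effect because a continuous x no longer takes part in the default grouping.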
