R dplyr group, ungroup, top_n and ggplot

I have a data frame with several columns, including city, state, year and number of murders. I use dplyr to group it by city and calculate the total murders over all years for the top 10 cities, like this:
MurderNb_reshaped2 %>%
select(city, state, Year, Murders) %>%
group_by(city) %>%
summarise(total = sum(Murders)) %>%
top_n(10, total) %>%
ggplot(aes(x = Year, y = Murders, fill = "red")) +
geom_histogram(stat = "identity") +
facet_wrap(~city)
I would like to plot this for only the top ten cities, but `Year` is not found, because grouping by city and summarising leaves only the city and total columns. Can anyone explain how I can accomplish this?
EDIT: this is the original source data: https://interactive.guim.co.uk/2017/feb/09/gva-data/UCR-1985-2015.csv
And here is my code:
library(reshape2)  # provides melt() used below
library(dplyr)
library(ggplot2)
Deaths <- read.csv("UCR-1985-2015.csv", stringsAsFactors = FALSE)
MurderRate <- Deaths[, -c(5:35)]
MurderNb <- Deaths[, -c(36:66)]
colnames(MurderNb) <- gsub("X", "", colnames(MurderNb))
colnames(MurderNb) <- gsub("_raw_murder_num", "", colnames(MurderNb))
MurderNb_reshaped <- melt(MurderNb, id = c("city", "Agency", "state", "state_short"))
colnames(MurderNb_reshaped) <- c("city", "Agency", "state", "state_short", "Year", "Murders")
MurderNb_reshaped2 <- MurderNb_reshaped
MurderNb_reshaped2 %>%
select(city, state, Year, Murders) %>%
group_by(city) %>%
summarise(total = sum(Murders)) %>%
top_n(10, total) %>%
ggplot(aes(x = Year, y = Murders, fill = "red")) +
geom_bar(stat = "identity") +
facet_wrap(~city)
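Why `Year` goes missing can be seen on a tiny hypothetical data frame (`toy`, invented values): `summarise()` returns one row per group containing only the grouping column(s) and the new summary columns, so `Year` and `Murders` no longer exist by the time the result reaches `ggplot()`.

```r
library(dplyr)

# Hypothetical toy data in the same long format: one row per city and year
toy <- data.frame(
  city    = c("A", "A", "B"),
  Year    = c("2014", "2015", "2014"),
  Murders = c(1, 2, 3),
  stringsAsFactors = FALSE
)

totals <- toy %>%
  group_by(city) %>%
  summarise(total = sum(Murders))

# Only the grouping column and the new summary column survive
print(names(totals))
```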

OK, there were a couple of minor issues. This should do the trick:
# this gives you the top 10 cities by total murders
topCities <- MurderNb_reshaped2 %>%
  select(city, state, Year, Murders) %>%
  group_by(city) %>%
  summarise(total = sum(Murders, na.rm = TRUE)) %>%
  top_n(10, total)
# you then need to filter your original data to keep only the rows for the top cities
MurderNb_top <- filter(MurderNb_reshaped2, city %in% topCities$city)
# a constant fill colour belongs outside aes(); geom_col() is geom_bar(stat = "identity")
ggplot(data = MurderNb_top, aes(x = Year, y = Murders)) +
  geom_col(fill = "red") +
  facet_wrap(~city)
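An alternative sketch does the same in a single pipeline, assuming the same long format (one row per city and year). It uses `group_by()` + `mutate()` to attach each city's total to every yearly row, then `ungroup()` and `dense_rank()` to keep the rows of the top cities. `murders_long` and `n_top` are hypothetical stand-ins; note that, unlike `top_n()`, `dense_rank()` can keep more than `n_top` cities when totals tie.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for MurderNb_reshaped2: one row per city and year
murders_long <- data.frame(
  city    = rep(c("A", "B", "C", "D"), each = 3),
  Year    = factor(rep(c("2013", "2014", "2015"), times = 4)),
  Murders = c(5, 6, 7, 1, 1, 2, 10, 12, 9, 3, NA, 4),
  stringsAsFactors = FALSE
)

n_top <- 2  # 10 for the real data

top_long <- murders_long %>%
  group_by(city) %>%
  mutate(total = sum(Murders, na.rm = TRUE)) %>%  # city total repeated on each yearly row
  ungroup() %>%
  filter(dense_rank(desc(total)) <= n_top)        # keep rows of the n_top largest totals

p <- ggplot(top_long, aes(x = Year, y = Murders)) +
  geom_col(fill = "red") +   # constant fill goes outside aes()
  facet_wrap(~city)
```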

Related

Remove some of the X axis labels in ggplot bar chart

I have the following code for a stacked bar graph covering 1970-2020, with the years on the X axis.
The graph is generated from a data frame with 3 columns and 51 rows, one per year:
Year Active New
1970 1 1
......
2020 268 60
df %>%
mutate(Old = Active - New) %>%
select(-Active) %>%
pivot_longer(-Year, names_to = "Type", values_to = "Count") %>%
ggplot() +
geom_col(aes(x = Year, y = Count, fill = forcats::fct_rev(Type)))+
ggtitle("example graph")+
scale_fill_discrete(name="Cases",
breaks=c("Old", "New"),
labels=c("Ongoing", "New"))
Because the X-axis labels are unreadable, I would like to remove most of the years from the X axis and keep, e.g., 1970, 1980, 1990, 2000, 2020 at their corresponding positions. Can scale_x_discrete do this?
The obvious answer in this specific case is to convert Year to a numeric variable, which will make the breaks pretty by default.
This sample data allows us to run your code and reproduce your issue:
library(dplyr)
library(tidyr)
library(ggplot2)
set.seed(1)
df <- data.frame(Year = factor(1970:2020),
Active = cumsum(rnorm(51, 4, 2)),
New = cumsum(rnorm(51, 1, 1)))
Using your exact plotting code produces a similar plot with unreadable axis labels:
df %>%
mutate(Old = Active - New) %>%
select(-Active) %>%
pivot_longer(-Year, names_to = "Type", values_to = "Count") %>%
ggplot() +
geom_col(aes(x = Year, y = Count, fill = forcats::fct_rev(Type)))+
ggtitle("example graph")+
scale_fill_discrete(name="Cases",
breaks=c("Old", "New"),
labels=c("Ongoing", "New"))
But if we simply convert Year to numeric values, we get the same plot with pretty breaks:
df %>%
mutate(Old = Active - New) %>%
select(-Active) %>%
pivot_longer(-Year, names_to = "Type", values_to = "Count") %>%
ggplot() +
geom_col(aes(x = as.numeric(as.character(Year)), y = Count,
fill = forcats::fct_rev(Type)))+
ggtitle("example graph")+
xlab("Year") +
scale_fill_discrete(name="Cases",
breaks=c("Old", "New"),
labels=c("Ongoing", "New"))
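If you want exactly the decade years rather than whatever pretty breaks ggplot2 picks, a sketch under the same numeric-Year approach passes them to `scale_x_continuous(breaks = ...)`. It also shows why the answer goes through `as.character()`: `as.numeric()` on a factor returns the level codes, not the years. `df_num` is a hypothetical simplified data frame with a single `Count` column.

```r
library(ggplot2)

set.seed(1)
yr_factor <- factor(1970:2020)   # Year stored as a factor, as in the sample data

codes <- as.numeric(yr_factor)                 # level codes: 1, 2, ..., 51
years <- as.numeric(as.character(yr_factor))   # the actual years: 1970, ..., 2020

# Hypothetical simplified data with a numeric Year
df_num <- data.frame(Year = years, Count = cumsum(rnorm(51, 4, 2)))

p <- ggplot(df_num, aes(x = Year, y = Count)) +
  geom_col() +
  scale_x_continuous(breaks = seq(1970, 2020, by = 10))  # ticks only at these years
```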
If for whatever reason it needs to be a factor, you can leave specific years blank using a labeling function.
df %>%
mutate(Old = Active - New) %>%
select(-Active) %>%
pivot_longer(-Year, names_to = "Type", values_to = "Count") %>%
ggplot() +
geom_col(aes(x = Year, y = Count, fill = forcats::fct_rev(Type)))+
ggtitle("example graph")+
scale_fill_discrete(name="Cases",
breaks=c("Old", "New"),
labels=c("Ongoing", "New")) +
scale_x_discrete(labels = function(x) ifelse(as.numeric(x) %% 10, "", x)) +
theme(axis.ticks.length.x = unit(0, "mm"))
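To answer the scale_x_discrete question directly: with Year kept as a factor, a discrete scale's `breaks` argument accepts the levels to show, and only those get a tick and a label. A sketch on hypothetical data (`df_fac`), together with the answer's labelling function pulled out as `label_decades()` (a name chosen here) so its behaviour can be checked on its own:

```r
library(ggplot2)

set.seed(1)
# Hypothetical data with Year kept as a factor
df_fac <- data.frame(Year = factor(1970:2020), Count = cumsum(rnorm(51, 4, 2)))

# The answer's labelling function: blank label unless the year is a multiple of 10
label_decades <- function(x) ifelse(as.numeric(x) %% 10, "", x)

# Alternative: give the discrete scale only the levels to show as breaks
keep_years <- as.character(seq(1970, 2020, by = 10))

p <- ggplot(df_fac, aes(x = Year, y = Count)) +
  geom_col() +
  scale_x_discrete(breaks = keep_years)
```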
Created on 2022-08-19 with reprex v2.0.2

Gathering the Averages and Combining multiple Line Graphs

I am new to R and I would love some assistance on this. I am using this dataset: https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-01-29/clean_cheese.csv
I am trying to first find the average of the following cheeses: Cheddar, American, Mozzarella, Italian, Swiss, Muenster, and Blue. Then I would like to plot them as lines on a single graph, showing the average consumption of each cheese.
The following is my code so far. I am new at this, so it might look horrible to some.
line_3 <- clean_cheese %>%
select(c(Year, Cheddar, Mozzarella, `American Other`, `Italian other`, Swiss, Muenster, Blue)) %>%
group_by(Year) %>%
summarise(avg_cheddar_cheese = mean(Cheddar), avg_mozz_cheese = mean(Mozzarella), avg_american_other = mean(`American Other`), avg_italin_other = mean(`Italian other`), avg_swiss_cheese = mean(Swiss), avg_muenster = mean(Muenster), avg_blue = mean(Blue)) %>%
pivot_longer(-c(Year)) +
ggplot(aes(x = Year, y = value, color=name,group=name)) +
geom_line() +
facet_wrap(.~name,scales = 'free_y')
ggplotly(line_3)
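You can try the following. Note that in your code the `+` after pivot_longer() should be `%>%`, because the data still has to be piped into ggplot():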
library(tidyverse)
clean_cheese <- read.csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-01-29/clean_cheese.csv')
line_3 <- clean_cheese %>%
  group_by(Year) %>%
  # Cheddar:Blue selects every column from Cheddar through Blue, so check
  # names(clean_cheese) if you only want the seven cheeses listed above
  summarise(across(Cheddar:Blue, mean)) %>%
  pivot_longer(cols = -Year) %>%
  ggplot(aes(x = Year, y = value, color = name, group = name)) +
  geom_line() +
  facet_wrap(. ~ name, scales = 'free_y')
line_3
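If you want only the seven cheeses and all lines in one panel, a sketch lists the columns explicitly with `all_of()` and drops `facet_wrap()`. `cheese` and `wanted` are hypothetical (invented numbers); in the real file read with `read.csv()`, names containing spaces become e.g. `American.Other`, so check `names(clean_cheese)` before building the vector.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical miniature of clean_cheese with invented numbers (two rows for 2000)
cheese <- data.frame(
  Year    = c(2000, 2000, 2001),
  Cheddar = c(9, 11, 10),
  Brick   = c(0.1, 0.1, 0.1),
  Blue    = c(0.2, 0.4, 0.3)
)

wanted <- c("Cheddar", "Blue")  # list exactly the cheeses to average

avg_long <- cheese %>%
  group_by(Year) %>%
  summarise(across(all_of(wanted), mean), .groups = "drop") %>%
  pivot_longer(cols = -Year, names_to = "name", values_to = "value")

# No facet_wrap(): every cheese is a line in the same panel
p <- ggplot(avg_long, aes(x = Year, y = value, color = name)) +
  geom_line()
```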

displaying data as a line in charts

df <- read.csv('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv')
df$countryName = as.character(df$countryName)
I processed the dataset.
Can we show the patient and population figures of each continent as separate lines on the same chart?
Desired output (example):
date        region  confirmed
2020/01/03  europa  850
The values in this example output are made up; they are not real data.
Here's an approach with dplyr, tidyr and ggplot:
library(dplyr)
library(tidyr)
library(ggplot2)
# assumes df has one row per country and day, with columns
# region, day, confirmed, recovered and death
df %>%
  group_by(region, day) %>%
  dplyr::summarize(confirmed = sum(confirmed),
                   recovered = sum(recovered),
                   death = sum(death),
                   .groups = "drop") %>%
  # one row per region, day and condition
  pivot_longer(cols = c("confirmed", "recovered", "death"), names_to = "condition") %>%
  ggplot(aes(x = as.Date(day), y = value, group = region, color = region)) +
  geom_line() +
  facet_grid(rows = vars(condition), scales = "free_y") +
  labs(x = "Date", y = "Number of Individuals")
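Since the answer relies on `df` having the columns `region`, `day`, `confirmed`, `recovered` and `death`, a self-contained sketch of the same pipeline on hypothetical rows (`df_toy`, invented numbers) shows the reshaping step by step: the two Europe rows for 2020/01/03 sum to the 850 shown in the question's example output.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical rows shaped like the processed data (one row per country and day)
df_toy <- data.frame(
  region    = c("Europe", "Europe", "Asia", "Europe", "Europe", "Asia"),
  day       = c("2020/01/03", "2020/01/03", "2020/01/03",
                "2020/01/04", "2020/01/04", "2020/01/04"),
  confirmed = c(500, 350, 20, 600, 400, 30),
  recovered = c(10, 5, 0, 20, 10, 1),
  death     = c(1, 0, 0, 2, 1, 0),
  stringsAsFactors = FALSE
)

long <- df_toy %>%
  group_by(region, day) %>%
  summarise(confirmed = sum(confirmed), recovered = sum(recovered),
            death = sum(death), .groups = "drop") %>%
  pivot_longer(cols = c(confirmed, recovered, death),
               names_to = "condition", values_to = "value") %>%
  mutate(day = as.Date(day))   # as.Date() also tries the %Y/%m/%d format by default

p <- ggplot(long, aes(x = day, y = value, colour = region)) +
  geom_line() +
  facet_grid(rows = vars(condition), scales = "free_y")
```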

How to label only once when plotting multiple longitudinal trajectories in R?

I have done a plot with multiple trajectories like the one in the image https://i0.wp.com/svbtleusercontent.com/xcexi7wk8xsj1w_small.png?w=456&ssl=1
Let's use it as a reproducible example:
library(ourworldindata)
id <- financing_healthcare %>%
filter(continent %in% c("Oceania", "Europe") & between(year, 2001, 2005)) %>%
select(continent, country, year, health_exp_total) %>%
na.omit()
ggplot(id, aes(x = year, y = health_exp_total, group = country, color = continent)) +
geom_line()
To add the country labels to the plot, I use
ggplot(id, aes(x = year, y = health_exp_total, group = country, color = continent, label= country)) +
geom_line()+geom_text()
But then the labels are repeated for every year and overlap one another. Is it possible to show each label only once (for a single year) and avoid the overlaps?
Thanks a lot!
#devtools::install_github('drsimonj/ourworldindata')
library(ourworldindata)
library(dplyr)
library(ggplot2)
library(ggrepel)
id <- financing_healthcare %>%
  filter(continent %in% c("Oceania", "Europe") & between(year, 2001, 2005)) %>%
  select(continent, country, year, health_exp_total) %>%
  na.omit()
# use only the 2005 rows as label positions, so each country is labelled once
idl <- id %>% filter(year == 2005)
ggplot(id, aes(x = year, y = health_exp_total, group = country, color = continent)) +
  geom_line() +
  # geom_text_repel() pushes labels apart so they do not overlap
  geom_text_repel(data = idl, aes(label = country), size = 2.5)
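The answer takes its labels from `year == 2005`, which silently skips any country whose series ends earlier (for instance after `na.omit()`). A sketch that instead labels each trajectory at its own last year uses plain `geom_text()` with extra room on the right instead of ggrepel; `traj` and `last_pts` are hypothetical, and this does not repel overlapping labels the way `geom_text_repel()` does.

```r
library(dplyr)
library(ggplot2)

# Hypothetical trajectories with invented values; Fiji's series stops in 2004
traj <- data.frame(
  country   = c(rep("France", 3), rep("Spain", 3), rep("Fiji", 2)),
  continent = c(rep("Europe", 6), rep("Oceania", 2)),
  year      = c(2003:2005, 2003:2005, 2003:2004),
  value     = c(10, 11, 12, 8, 9, 9.5, 2, 2.5),
  stringsAsFactors = FALSE
)

# One label row per country, at that country's own last year
last_pts <- traj %>%
  group_by(country) %>%
  filter(year == max(year)) %>%
  ungroup()

p <- ggplot(traj, aes(x = year, y = value, group = country, color = continent)) +
  geom_line() +
  geom_text(data = last_pts, aes(label = country),
            hjust = -0.1, size = 2.5, show.legend = FALSE) +
  # extra space on the right so labels past the last year are not clipped
  scale_x_continuous(expand = expansion(mult = c(0.05, 0.2)))
```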

how to make a cumulative layer plot in ggplot2

I have data similar to the example below:
library(dplyr)
nycflights13::flights %>%
mutate(date = as.Date(paste(day, month, year, sep = '-'), format = '%d-%m-%Y')) %>%
select(date, carrier, distance)
Now I need to build a plot with stacked sums of distance for each day, where each stacked layer corresponds to a different carrier. I mean something similar to
ggplot(diamonds, aes(x = price, fill = cut)) + geom_area(stat = "bin")
but with sum as a stat.
I have tried
nycflights13::flights %>%
mutate(date = as.Date(paste(day, month, year, sep = '-'), format = '%d-%m-%Y')) %>%
select(date, carrier, distance) %>%
ggplot() +
geom_area(aes(date, distance, fill = carrier, group = carrier), stat = 'sum')
but it didn't do the trick, resulting in
Error in f(...) : Aesthetics can not vary with a ribbon
It's pretty easy with geom_bar, but any ideas on how to make a stacked geom_area plot?
library(dplyr)
library(ggplot2)
nycflights13::flights %>%
  mutate(date = as.Date(paste(day, month, year, sep = '-'),
                        format = '%d-%m-%Y')) %>%
  select(date, carrier, distance) %>%
  # sum first, so each carrier has one value per date
  group_by(date, carrier) %>%
  summarise(distance = sum(distance), .groups = "drop") %>%
  ggplot() +
  geom_area(aes(date, distance, fill = carrier,
                group = carrier), stat = 'identity')
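should do the trick. Summing per date and carrier first and then using stat = 'identity' avoids stat = 'sum', which counts overlapping observations instead of adding up y and maps point size to that count; a size that varies along the area is what triggers the ribbon error.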
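One hedged caveat: with `stat = 'identity'`, a carrier that has no flights on some date simply has no row there, so its area is drawn straight across that gap and the stacked layers can overlap or jump. A sketch that fills those gaps with zero using `tidyr::complete()` before plotting; `daily` is hypothetical data standing in for the summarised flights.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical daily totals per carrier; carrier "B" has no row for 2013-01-02
daily <- data.frame(
  date     = as.Date(c("2013-01-01", "2013-01-01", "2013-01-02",
                       "2013-01-03", "2013-01-03")),
  carrier  = c("A", "B", "A", "A", "B"),
  distance = c(1000, 500, 1200, 900, 700),
  stringsAsFactors = FALSE
)

# Add every missing date/carrier combination with distance 0
daily_full <- daily %>%
  complete(date, carrier, fill = list(distance = 0))

p <- ggplot(daily_full, aes(x = date, y = distance, fill = carrier)) +
  geom_area(stat = "identity", position = "stack")
```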
