Sample comparison with ggplot2 - r

In this example, I want to compare the population of the USA versus China over time.
library(tidyverse)
population %>%
  filter(country %in% c("United States of America", "China")) %>%
  pivot_wider(names_from = country, values_from = population) %>%
  rename(USA = `United States of America`) %>%
  ggplot(aes(x = China, y = USA)) +
  geom_point(aes(col = year))
Is there a better way to do this (without using pivot_wider)? Or is this the right way?
Thanks in advance

Not sure if this is still what you want, but I'd expect the comparison between two population sizes over time to look something like this:
library(ggplot2)
library(dplyr)
population %>%
  filter(country %in% c("United States of America", "China")) %>%
  ggplot(aes(x = year, y = population, color = country)) +
  geom_line() +
  expand_limits(y = 0)

It is the right way if you want to show how the population of the USA varies as a function of the population of China... but what is the sense of that? What you probably want is to compare how the two populations change as a function of time, in which case I think you'd be better off with a bar chart:
library(tidyverse)
population %>%
  filter(country %in% c("United States of America", "China")) %>%
  ggplot(aes(x = year, y = population, fill = country)) +
  geom_col(position = "dodge")
Created on 2021-09-28 by the reprex package (v2.0.1)

Related

Plot based on descending value of a variable

I want to create a plot that shows the relationship between countries (categorical), their government type (4 categories, including NA), and the proportion of covid deaths to population. I want to show the 30 countries with the highest death proportion and if there is a relationship with the government type.
Right now the countries are plotted in alphabetical order, but I would like to plot the death proportion in descending order. I can't seem to figure out how to do this. Thanks!
library(tidyverse)
library(lubridate)
library(readr)
# Columns: Governmental System, Country, Proportion of Deaths to Population
covid_data <- read_csv(here::here("data/covid_data.csv"))
covid_data <- covid_data %>%
  mutate(death_proportion = total_deaths / population)
covid_data[with(covid_data, order(-death_proportion)), ] %>%
  head(30) %>%
  ggplot(aes(x = death_proportion,
             y = country,
             color = government)) +
  geom_point()
I think you just need to use forcats::fct_reorder to set the order of your countries by the plotting variable.
Check this example:
library(tidyverse)
mtcars %>%
  rownames_to_column(var = "car_name") %>%
  mutate(car_name = fct_reorder(car_name, desc(mpg))) %>%
  ggplot(aes(x = mpg,
             y = car_name,
             color = factor(cyl))) +
  geom_point()
Created on 2021-03-16 by the reprex package (v1.0.0)
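Applied to the question's columns, the same idea might look like the sketch below. The toy tibble only stands in for covid_data (its values are invented for illustration); the column names are the ones used in the question. slice_max() keeps the rows with the highest death_proportion, and fct_reorder() orders the country levels by that value so the largest is drawn at the top of the y-axis. Wrap the ordering variable in desc() to flip the order.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Toy stand-in for covid_data; the numbers are invented for illustration only
covid_toy <- tibble(
  country      = c("A", "B", "C", "D"),
  government   = c("Republic", "Monarchy", NA, "Republic"),
  total_deaths = c(50, 400, 10, 120),
  population   = c(1000, 2000, 1000, 1000)
)

top_countries <- covid_toy %>%
  mutate(death_proportion = total_deaths / population) %>%
  slice_max(death_proportion, n = 3) %>%                     # use n = 30 for the real data
  mutate(country = fct_reorder(country, death_proportion))   # levels sorted by the plotted value

p <- ggplot(top_countries,
            aes(x = death_proportion, y = country, color = government)) +
  geom_point()
```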

Making the X_axis more visible?

This is the code I used. The goal is to visualize the evolution of COVID in North Africa.
library(readr)
library(ggplot2)
library(dplyr)
covid <- read.csv("owid-covid-data.csv")
covid
covid %>%
  filter(location %in% c("Tunisia", "Morocco", "Libya")) %>%
  ggplot(aes(x = date, y = new_cases, color = location, group = location)) +
  geom_line()
This is the dataset I used.
As you can see, the dates are day-to-day, so the x-axis is very dense.
And this is the plot.
You can't read anything on the x-axis. I want to be able to discern the dates, maybe by scaling the axis in weeks or months instead of days.
I converted the string date column to the Date type, as the comments suggested, and it all worked out:
library(readr)
library(ggplot2)
library(dplyr)
covid <- read.csv("owid-covid-data.csv")
covid
covid %>%
  filter(location %in% c("Tunisia", "Morocco", "Libya")) %>%
  mutate(date = as.Date(date)) %>%
  ggplot(aes(x = date, y = new_cases, color = location, group = location)) +
  geom_line()
this is the plot after modification.
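Once the column is a Date, the tick spacing can also be set explicitly, which is what the question asked about (weeks or months instead of days). A minimal sketch with ggplot2's scale_x_date(); the toy data frame is invented and only stands in for the filtered covid data:

```r
library(ggplot2)

# Toy daily series standing in for the filtered covid data (values invented)
covid_toy <- data.frame(
  date      = seq(as.Date("2021-01-01"), as.Date("2021-06-30"), by = "day"),
  new_cases = 100
)
covid_toy$location <- "Tunisia"

p <- ggplot(covid_toy, aes(x = date, y = new_cases, color = location)) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +  # one tick per month
  theme(axis.text.x = element_text(angle = 45, hjust = 1))        # tilt labels to avoid overlap
```

Using date_breaks = "1 week" or "2 weeks" gives a finer grid if monthly ticks are too coarse.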

Plotting in ggplot using cumsum

I am trying to use ggplot2 to plot a date column vs. a numeric column.
I have a data frame in which I want to label each row's country as either "China" or "Not China". I successfully created the data frame linked below with:
is_china <- confirmed_cases_worldwide %>%
  filter(country == "China", type == 'confirmed') %>%
  group_by(country) %>%
  mutate(cumu_cases = cumsum(cases))
is_not_china <- confirmed_cases_worldwide %>%
  filter(country != "China", type == 'confirmed') %>%
  mutate(cumu_cases = cumsum(cases))
is_not_china$country <- "Not China"
china_vs_world <- rbind(is_china, is_not_china)
Now essentially I am trying to plot a line graph with cumu_cases and date between "china" and "not china"
I am trying to execute this code:
plt_china_vs_world <- ggplot(china_vs_world) +
  geom_line(aes(x = date, y = cumu_cases, group = country, color = country)) +
  ylab("Cumulative confirmed cases")
But I keep getting a graph that looks like this:
I don't understand why this is happening. I have tried converting data types and other methods.
Any help is appreciated. I linked both CSV files below:
https://github.com/king-sules/Covid
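The dates for the other countries are repeated because their 'country' value has been changed to 'Not China', so each date now has many rows in that group. This can be fixed either in the OP's 'is_not_china' step or afterwards in 'china_vs_world', by summing the cases per country and date before taking the cumulative sum: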
library(ggplot2)
library(dplyr)
china_vs_world %>%
  group_by(country, date) %>%
  summarise(cumu_cases = sum(cases)) %>%        # daily totals; result is still grouped by country
  mutate(cumu_cases = cumsum(cumu_cases)) %>%   # running total within each country
  ungroup() %>%
  ggplot() +
  geom_line(aes(x = date, y = cumu_cases, group = country, color = country)) +
  ylab("Cumulative confirmed cases")
NOTE: It is the scale that makes the China numbers look small.
As @Edward mentioned, a log scale would make it easier to read:
china_vs_world %>%
  group_by(country, date) %>%
  summarise(cumu_cases = sum(cases)) %>%        # daily totals; result is still grouped by country
  mutate(cumu_cases = cumsum(cumu_cases)) %>%   # running total within each country
  ungroup() %>%
  ggplot() +
  geom_line(aes(x = date, y = cumu_cases, group = country, color = country)) +
  ylab("Cumulative confirmed cases") +
  scale_y_continuous(trans = 'log')
Or with a facet_wrap
china_vs_world %>%
  group_by(country, date) %>%
  summarise(cumu_cases = sum(cases)) %>%        # daily totals; result is still grouped by country
  mutate(cumu_cases = cumsum(cumu_cases)) %>%   # running total within each country
  ungroup() %>%
  ggplot() +
  geom_line(aes(x = date, y = cumu_cases, group = country, color = country)) +
  ylab("Cumulative confirmed cases") +
  facet_wrap(~ country, scales = 'free_y')
data
china_vs_world <- read.csv("https://raw.githubusercontent.com/king-sules/Covid/master/china_vs_world.csv", stringsAsFactors = FALSE)
china_vs_world$date <- as.Date(china_vs_world$date)
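Whether cumsum() runs inside or outside a group_by() changes the result: without grouping, the running total carries over from one country to the next. A small sketch on invented toy data (the column names follow the thread, the values do not come from the real CSV):

```r
library(dplyr)

# Toy data: two countries, three dates each (values invented for illustration)
toy <- tibble(
  country = rep(c("China", "Not China"), each = 3),
  date    = rep(as.Date("2020-01-22") + 0:2, times = 2),
  cases   = c(1, 2, 3, 10, 20, 30)
)

# Ungrouped: one running total across all rows
ungrouped <- toy %>%
  mutate(cumu_cases = cumsum(cases))

# Grouped: the running total restarts for each country
grouped <- toy %>%
  group_by(country) %>%
  mutate(cumu_cases = cumsum(cases)) %>%
  ungroup()
```

In ungrouped, the first "Not China" row already includes China's total of 6; in grouped, it starts from its own first value.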

How to label only once when plotting multiple longitudinal trajectories in R?

I have done a plot with multiple trajectories like the one in the image https://i0.wp.com/svbtleusercontent.com/xcexi7wk8xsj1w_small.png?w=456&ssl=1
Let's use it as a reproducible example:
library(ourworldindata)
library(dplyr)    # filter(), select(), between(), %>%
library(ggplot2)
id <- financing_healthcare %>%
  filter(continent %in% c("Oceania", "Europe") & between(year, 2001, 2005)) %>%
  select(continent, country, year, health_exp_total) %>%
  na.omit()
ggplot(id, aes(x = year, y = health_exp_total, group = country, color = continent)) +
  geom_line()
If I want to add the country labels to the plot, I use:
ggplot(id, aes(x = year, y = health_exp_total, group = country, color = continent, label = country)) +
  geom_line() +
  geom_text()
But this way the labels are repeated for every year and overlap each other. Is it possible to make each label appear for only one year and avoid the overlap?
Thanks a lot!
#devtools::install_github('drsimonj/ourworldindata')
library(ourworldindata)
library(dplyr)
library(ggplot2)
library(ggrepel)
id <- financing_healthcare %>%
  filter(continent %in% c("Oceania", "Europe") & between(year, 2001, 2005)) %>%
  select(continent, country, year, health_exp_total) %>%
  na.omit()
idl = id %>% filter(year == 2005)
ggplot(id, aes(x = year, y = health_exp_total, group = country, color = continent)) +
  geom_line() +
  geom_text_repel(data = idl, aes(label = country), size = 2.5)
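Filtering on year == 2005 leaves a country unlabeled if it has no 2005 row after na.omit(). A possible variant, sketched here on invented toy data, keeps each country's most recent row instead; the names id_toy and idl_toy are hypothetical stand-ins for id and idl.

```r
library(dplyr)

# Toy trajectories (values invented); country "B" has no 2005 observation
id_toy <- tibble(
  country          = c("A", "A", "A", "B", "B"),
  year             = c(2003, 2004, 2005, 2003, 2004),
  health_exp_total = c(10, 11, 12, 20, 21)
)

# Keep one row per country: its most recent year
idl_toy <- id_toy %>%
  group_by(country) %>%
  slice_max(year, n = 1, with_ties = FALSE) %>%
  ungroup()
```

The resulting one-row-per-country data frame can be passed as the data argument of geom_text_repel() in the same way as idl.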

Countries moving around when using gganimate

I'm animating a map with percentage of deaths in Africa caused by HIV/AIDS. For some years the animation works well, but for other years the countries are sort of jumping around. The data can be found here. My code is shown below
library(sf)
library(rworldmap)
library(transformr)
library(gganimate)
library(tidyverse)
mortality <- read_csv("path_to_file")
africa_map <- getMap(resolution = "low") %>%
  st_as_sf() %>%
  filter(continent == "Africa")
mortality %>%
  filter(region == "Africa", disease == "HIV/AIDS") %>%
  mutate(year = as.integer(year(year))) %>%
  drop_na() %>%
  left_join(africa_map, by = c("country_code" = "SOV_A3")) %>%
  ggplot() +
  geom_sf(aes(fill = percent)) +
  transition_time(year) +
  labs(title = "Year: {frame_time}")
Any idea how to fix this?
