Making the x-axis more visible? - r

This is the code I used; the goal is to visualize the evolution of COVID-19 in North Africa:
library(readr)
library(ggplot2)
library(dplyr)
covid <- read.csv("owid-covid-data.csv")
covid
covid %>%
  filter(location %in% c("Tunisia", "Morocco", "Libya")) %>%
  ggplot(aes(x = date, y = new_cases, color = location, group = location)) +
  geom_line()
This is the dataset I used.
As you can see, the x-axis is day-to-day, so the data is quite dense.
And this is the plot.
You can't read anything on the x-axis. I want to be able to discern the dates, perhaps by scaling the axis in weeks or months instead of days.

I converted the string column to Date type, as the comments suggested, and it all worked out:
library(readr)
library(ggplot2)
library(dplyr)
covid <- read.csv("owid-covid-data.csv")
covid
covid %>%
  filter(location %in% c("Tunisia", "Morocco", "Libya")) %>%
  mutate(date = as.Date(date)) %>%
  ggplot(aes(x = date, y = new_cases, color = location, group = location)) +
  geom_line()
This is the plot after the modification.
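Once the column is a Date, the break spacing can also be set explicitly, which is what the question asked for (weeks or months instead of days). A minimal sketch using ggplot2's scale_x_date, assuming a made-up data frame named toy in place of the OWID file; the monthly spacing and label format are only one possible choice:

```r
library(ggplot2)

# Hypothetical stand-in for the OWID data: one row per country per day.
set.seed(1)
dates <- seq(as.Date("2020-03-01"), as.Date("2021-02-28"), by = "day")
toy <- data.frame(
  date = rep(dates, times = 3),
  location = rep(c("Tunisia", "Morocco", "Libya"), each = length(dates)),
  new_cases = rpois(3 * length(dates), lambda = 200)
)

p <- ggplot(toy, aes(x = date, y = new_cases, color = location)) +
  geom_line() +
  # One labelled break per month, formatted like "Mar 2020"
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  # Rotate the labels so neighbouring months do not overlap
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

Using date_breaks = "1 week" instead would give weekly ticks, at the cost of a more crowded axis.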

Related

How to input data in excel/csv to make multiple chart in R studio

I have some data here (my data).
I would like to make a graph like this example multi-line chart.
I have tried to run the script below.
However, I don't understand how to get my Excel data into R so that I can run this script.
Can anyone help me? I have been thinking about this for 3 days and the deadline is very soon. Thank you for your help.
# Libraries
library(ggplot2)
library(babynames) # provide the dataset: a dataframe called babynames
library(dplyr)
library(hrbrthemes)
library(viridis)
# Keep only 3 names
don <- babynames %>%
  filter(name %in% c("Ashley", "Patricia", "Helen")) %>%
  filter(sex == "F")
# Plot
don %>%
  ggplot(aes(x = year, y = n, group = name, color = name)) +
  geom_line() +
  scale_color_viridis(discrete = TRUE) +
  ggtitle("Popularity of American names in the previous 30 years") +
  theme_ipsum() +
  ylab("Number of babies born")
You may read the data using readxl::read_excel, get it in long format and plot using ggplot.
library(tidyverse)
data <- readxl::read_excel('example data.xlsx')
data %>%
  mutate(row = row_number()) %>%
  pivot_longer(cols = -row, values_drop_na = TRUE) %>%
  ggplot() + aes(row, value, color = name) +
  geom_line()
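Since the question title also mentions CSV, the same reshaping should work after read.csv. A small sketch, assuming a hypothetical wide table (columns series_a and series_b are made up) written to a temporary file so the example is self-contained:

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Hypothetical wide table: one column per series, one row per observation.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(series_a = c(1, 3, 2), series_b = c(2, NA, 5)),
  csv_path,
  row.names = FALSE
)

long <- read.csv(csv_path) %>%
  mutate(row = row_number()) %>%
  # Stack every column except `row`; rows whose value is NA are dropped
  pivot_longer(cols = -row, values_drop_na = TRUE)

# One line per original column, coloured by the column name
p <- ggplot(long, aes(row, value, color = name)) +
  geom_line()
```

After pivot_longer the data has three columns (row, name, value), which is the shape ggplot needs to draw one line per series.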

Plot based on descending value of a variable

I want to create a plot that shows the relationship between countries (categorical), their government type (4 categories, including NA), and the proportion of covid deaths to population. I want to show the 30 countries with the highest death proportion and if there is a relationship with the government type.
Right now the countries are plotted in alphabetical order, but I would like to plot the death proportion in descending order. I can't seem to figure out how to do this. Thanks!
library(tidyverse)
library(lubridate)
library(readr)
# Governmental System, Country, Proportion of Deaths to Population
covid_data <- read_csv(here::here("data/covid_data.csv"))
covid_data <- covid_data %>%
  mutate(death_proportion = total_deaths / population)
covid_data[with(covid_data, order(-death_proportion)), ] %>%
  head(30) %>%
  ggplot(aes(x = death_proportion,
             y = country,
             color = government)) +
  geom_point()
I think you just need to use forcats::fct_reorder to set the order of your countries by the plotting variable.
Check this example:
library(tidyverse)
mtcars %>%
  rownames_to_column(var = "car_name") %>%
  mutate(car_name = fct_reorder(car_name, desc(mpg))) %>%
  ggplot(aes(x = mpg,
             y = car_name,
             color = factor(cyl))) +
  geom_point()
Created on 2021-03-16 by the reprex package (v1.0.0)
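Applied to the question's columns, the same idea might look like the sketch below. The data frame toy and its values are invented for illustration; only the column names country, government and death_proportion come from the question.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical rows standing in for covid_data (values are made up).
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  government = c("Republic", "Monarchy", NA, "Republic"),
  death_proportion = c(0.0012, 0.0020, 0.0008, 0.0015)
)

top <- toy %>%
  # Keep the rows with the highest death proportion (30 in the question, 3 here)
  slice_max(death_proportion, n = 3) %>%
  # Factor levels run from lowest to highest value, so the highest
  # value is drawn at the top of the y axis
  mutate(country = fct_reorder(country, death_proportion))

p <- ggplot(top, aes(x = death_proportion, y = country, color = government)) +
  geom_point()
```

Wrapping the ordering variable in desc(), as the mtcars example does, flips the direction so the highest value sits at the bottom instead.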

gganimate - basing transition sequence on quarter

library(lubridate)
library(gganimate)
library(dplyr)
library(ggplot2)
data("crime", package = "ggmap") # assumption: crime is the dataset shipped with ggmap
#creating test data and getting quarter
TestData <- crime %>%
  mutate(Quarter_year = floor_date(time, unit = 'quarter'),
         Quarter_year = as.Date(Quarter_year)) %>%
  group_by(Quarter_year) %>%
  tally()
#Creating a simple bar graph
Graph <- TestData %>%
  ggplot(aes(x = Quarter_year, y = n)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme_minimal()
Animated_Graph <- Graph +
  transition_time(Quarter_year) +
  ggtitle("Test: {frame_time}")
animate(Animated_Graph)
Using the great package gganimate, I want to set my frame time based on each date's quarter.
However, when I pass it through transition_time, the animation creates a frame for each day between quarters, even though those days are not in the dataset:
transition_time(Quarter_year)+
ggtitle("Test: {frame_time}")
Is it possible to keep transition using only dates that appear in the dataset?
Thanks.
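One option that may fit here is gganimate's transition_manual, which shows one frame per distinct value and does no interpolation between them. This is a sketch rather than a confirmed answer from the thread; the data frame toy and its counts are made up to stand in for TestData.

```r
library(gganimate)
library(ggplot2)

# Hypothetical quarterly counts standing in for TestData.
toy <- data.frame(
  Quarter_year = as.Date(c("2010-01-01", "2010-04-01", "2010-07-01", "2010-10-01")),
  n = c(120, 95, 140, 110)
)

anim <- ggplot(toy, aes(x = Quarter_year, y = n)) +
  geom_col() +
  coord_flip() +
  theme_minimal() +
  # One frame per distinct Quarter_year value; no in-between dates are generated
  transition_manual(Quarter_year) +
  # transition_manual exposes the current value as {current_frame}
  ggtitle("Test: {current_frame}")

# animate(anim) would render the animation (requires an installed renderer).
```

The trade-off is that transition_manual does not tween, so bars jump from one quarter to the next instead of moving smoothly.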

Getting Cumulative Sum Over Time

I have data with goals scored for each player each season:
playerID <- c(1,2,3,1,2,3,1,2,3,1,2,3)
year <- c(2002,2000,2000,2003,2001,2001,2000,2002,2002,2001,2003,2003)
goals <- c(25,21,27,31,39,34,42,44,46,59,55,53)
my_data <- data.frame(playerID, year, goals)
I would like to plot each player's cumulative number of goals over time:
ggplot(my_data, aes(x=year, y=cumsum_goals, group=playerID)) + geom_line()
I have tried using mutate with cumsum from dplyr, but this only works if the data is already sorted by year (see player 1):
new_data <- my_data %>%
  group_by(playerID) %>%
  mutate(cumsum_goals = cumsum(goals))
Is there a way to make this code robust to data where years are not in chronological order?
We can arrange by playerID and year, take cumsum and then plot
library(dplyr)
library(ggplot2)
my_data %>%
  arrange(playerID, year) %>%
  group_by(playerID) %>%
  mutate(cumsum_goals = cumsum(goals)) %>%
  ggplot() + aes(x = year, y = cumsum_goals, color = factor(playerID)) + geom_line()
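As a quick check of why the arrange step matters, the sketch below computes player 1's running total with and without sorting, using the data from the question. The names unsorted, sorted and player1 are introduced here for illustration only.

```r
library(dplyr)

playerID <- c(1,2,3,1,2,3,1,2,3,1,2,3)
year <- c(2002,2000,2000,2003,2001,2001,2000,2002,2002,2001,2003,2003)
goals <- c(25,21,27,31,39,34,42,44,46,59,55,53)
my_data <- data.frame(playerID, year, goals)

# Without sorting, cumsum follows row order, not chronological order
unsorted <- my_data %>%
  group_by(playerID) %>%
  mutate(cumsum_goals = cumsum(goals)) %>%
  ungroup()

# Sorting by player and year first makes the running total chronological
sorted <- my_data %>%
  arrange(playerID, year) %>%
  group_by(playerID) %>%
  mutate(cumsum_goals = cumsum(goals)) %>%
  ungroup()

player1 <- filter(sorted, playerID == 1)
print(player1[, c("year", "cumsum_goals")])
# years 2000-2003 give running totals 42, 101, 126, 157
```

In the unsorted version, player 1's total for 2002 comes out as 25 because that row happens to be first, which is the problem described in the question.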

Multi Line Graph in R

I have a .csv file of the average life expectancy by country for the last 50 years. I am trying to create a graph of life expectancy by country, with the years 1960-2011 on the x axis, and the average life expectancy on the y axis. I only want to plot the top ten countries, each with their own line.
I have researched every possible way to plot a multi line graph of the data I have and it seems to me that it is impossible with the way the data is formatted. My questions are:
Is it possible to create the desired graph with this data, given the way it is organized?
If the data has to be restructured, how should that be done? Is there a function in R to better organize data?
I was able to create the desired graph in Excel which is exactly what I'd like to do in R.
Here is a link to the lexp.csv file.
https://drive.google.com/file/d/0BwsBIUlCf0Z3QVgtVGt4ampVcmM/view?usp=sharing
You are correct that the data would benefit from reorganization. This is a "wide to long" problem, i.e., it would be better to have 3 columns: Country, Year and Age (the life expectancy value).
You can reformat the data using the tidyr package, process it using the dplyr package and plot using ggplot2. So, assuming that you have read the CSV into R and have a data frame named lexp, you could try something like this:
library(dplyr)
library(tidyr)
library(ggplot2)
lexp %>%
  # reformat from wide to long
  gather(Year, Age, -Country, convert = TRUE) %>%
  # select most recent year
  filter(Year == max(Year)) %>%
  # sort by decreasing age
  arrange(desc(Age)) %>%
  # take the top 10 countries
  slice(1:10) %>%
  select(Country) %>%
  # join back to the original data
  inner_join(lexp) %>%
  # reformat again from wide to long
  gather(Year, Age, -Country, convert = TRUE) %>%
  # and plot the graph
  ggplot(aes(Year, Age)) + geom_line(aes(color = Country, group = Country)) +
  theme_dark() + theme(axis.text.x = element_text(angle = 90)) +
  labs(title = "Life Expectancy") +
  scale_color_brewer(palette = "Set3")
Result: (plot not shown)
An alternative using reshape2::melt, assuming the wide data frame is named df:
library("reshape2")
library("ggplot2")
test_data_long <- melt(df, id="Country") # convert to long format
testdata<-test_data_long[complete.cases(test_data_long),]
ggplot(data = testdata,
       aes(x = variable, y = value)) +
  geom_line(aes(color = Country, group = Country))
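In current tidyr, gather is superseded by pivot_longer, so the same wide-to-long step can be sketched as follows. The data frame lexp_toy, its country names and its values are invented; the real lexp.csv may use different column names, so treat the cols and names_to arguments as assumptions.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Hypothetical wide table: one row per country, one column per year.
lexp_toy <- data.frame(
  Country = c("A", "B", "C"),
  `1960` = c(60, 70, 40),
  `2011` = c(80, 82, 50),
  check.names = FALSE
)

long <- lexp_toy %>%
  # Every column except Country becomes a (Year, Age) pair
  pivot_longer(cols = -Country, names_to = "Year", values_to = "Age",
               names_transform = list(Year = as.integer))

# Countries with the highest value in the most recent year (10 in the question, 2 here)
top <- long %>%
  filter(Year == max(Year)) %>%
  slice_max(Age, n = 2) %>%
  pull(Country)

p <- long %>%
  filter(Country %in% top) %>%
  ggplot(aes(Year, Age, color = Country, group = Country)) +
  geom_line()
```

Filtering the long data by the selected countries avoids the join back to the wide table and the second reshape.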
