I have a .csv file of the average life expectancy by country for the last 50 years. I am trying to create a graph of life expectancy by country, with the years 1960-2011 on the x axis, and the average life expectancy on the y axis. I only want to plot the top ten countries, each with their own line.
I have researched every possible way to plot a multi line graph of the data I have and it seems to me that it is impossible with the way the data is formatted. My questions are:
Is it possible to create the desired graph with this data, given the way it is organized?
If the data has to be restructured, how should that be done? Is there a function in R to better organize data?
I was able to create the desired graph in Excel, which is exactly what I'd like to do in R.
Here is a link to the lexp.csv file.
https://drive.google.com/file/d/0BwsBIUlCf0Z3QVgtVGt4ampVcmM/view?usp=sharing
You are correct that the data would benefit from reorganization. This is a "wide to long" problem, i.e. it would be better to have 3 columns: Country, Year and Age.
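To make the wide/long distinction concrete, here is a minimal sketch on a tiny made-up data frame (`wide`, "Country A"/"Country B" and the values are hypothetical). It uses `tidyr::pivot_longer()`, the current replacement for the superseded `gather()`, and assumes the year columns are named like "1960", as `readr::read_csv()` would name them.

```r
library(tidyr)

# Hypothetical wide data: one row per country, one column per year
wide <- data.frame(
  Country = c("Country A", "Country B"),
  `1960` = c(60.1, 55.2),
  `1961` = c(60.5, 55.9),
  check.names = FALSE  # keep "1960" instead of turning it into "X1960"
)

# Wide to long: one row per Country/Year pair, with Year stored as an integer
long <- pivot_longer(
  wide,
  cols = -Country,
  names_to = "Year",
  values_to = "Age",
  names_transform = list(Year = as.integer)
)
print(long)
```

Once the data look like `long`, each country is just a group of rows, which is exactly what `geom_line(aes(group = Country))` needs.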
You can reformat the data using the tidyr package, process it using the dplyr package and plot using ggplot2. So, assuming that you have read the CSV into R and have a data frame named lexp, you could try something like this:
library(dplyr)
library(tidyr)
library(ggplot2)
lexp %>%
# reformat from wide to long
gather(Year, Age, -Country, convert = TRUE) %>%
# select most recent year
filter(Year == max(Year)) %>%
# sort by decreasing age
arrange(desc(Age)) %>%
# take the top 10 countries
slice(1:10) %>%
select(Country) %>%
# join back to the original data
inner_join(lexp, by = "Country") %>%
# reformat again from wide to long
gather(Year, Age, -Country, convert = TRUE) %>%
# and plot the graph
ggplot(aes(Year, Age)) + geom_line(aes(color = Country, group = Country)) +
theme_dark() + theme(axis.text.x = element_text(angle = 90)) +
labs(title = "Life Expectancy") +
scale_color_brewer(palette = "Set3")
Result: (plot image not included)
Alternatively, with reshape2 (assuming the data frame is named df):
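library("reshape2")
library("ggplot2")
# convert to long format: one row per Country/year pair
test_data_long <- melt(df, id.vars = "Country")
# drop rows with missing values
testdata <- test_data_long[complete.cases(test_data_long), ]
# note: this plots every country, not only the top ten
ggplot(data = testdata, aes(x = variable, y = value)) +
  geom_line(aes(color = Country, group = Country))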
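If the file was read with read.csv(), numeric headers such as 1960 become syntactic names like X1960 (check.names = TRUE is the default), so `variable` is a factor of labels and the x-axis is discrete. A minimal sketch of converting it back to a numeric year, assuming that naming scheme; the two-country `df` below is hypothetical.

```r
library(reshape2)
library(ggplot2)

# Hypothetical data shaped like read.csv() output: years become X1960, X1961, ...
df <- data.frame(
  Country = c("Country A", "Country B"),
  X1960 = c(60.1, 55.2),
  X1961 = c(60.5, NA)
)

test_data_long <- melt(df, id.vars = "Country")
testdata <- test_data_long[complete.cases(test_data_long), ]

# Strip the "X" prefix and turn the factor of column names into a number
testdata$Year <- as.numeric(sub("^X", "", as.character(testdata$variable)))

# With a numeric Year, the x-axis becomes continuous
p <- ggplot(testdata, aes(x = Year, y = value)) +
  geom_line(aes(color = Country, group = Country))
```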
I'm relatively new to R and could really use some help with some pretty basic ggplot2 work.
I'm trying to visualize the total number of submissions on a graph, showing the overall total as a line graph and the daily total as a histogram (or bar graph) on top of it. I'm not sure how to add breaks or bins to the histogram so that it takes the submission datetime column and makes each bar the daily total.
I tried adding a column that converts the datetime into just date and plots based on that, but I'd really like the line graph to include the time.
Here's what I have so far:
df <- df %>%
mutate(datetime = lubridate::mdy_hm(datetime))%>%
mutate(date = lubridate::as_date(datetime))
#sort by datetime
df <- df %>%
arrange(datetime)
#add total number of submissions
df <- df %>%
mutate(total = row_number())
#ggplot
line_plus_histo <- df%>%
ggplot() +
geom_histogram(data = df, aes(x=datetime)) +
geom_line(data = df, aes(x=datetime, y=total), col = "red") +
stat_bin(data = df, aes(x=date), geom = "bar") +
labs(
title="Submissions by Day",
x="Date",
y="Submissions",
legend=NULL)
line_plus_histo
As you can see, I'm also calculating the total number of submissions by sorting by time and then adding a column with the row number, so if you can suggest a better method I'd really appreciate it.
Below is the resulting line-plus-histogram plot of submissions over time (image not included).
Here's the pastebin link with my data
You can extend your data manipulation like this:
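library(dplyr)
library(lubridate)
df <- df |>
  mutate(datetime = mdy_hm(datetime)) |>
  arrange(datetime) |>
  # noon of each day, so each daily bar sits in the middle of its date
  mutate(midday = as_datetime(floor_date(as_date(datetime), unit = "day") + 0.5)) |>
  # running total: row number after sorting by time
  mutate(totals = row_number()) |>
  group_by(midday) |>
  # daily count (not used below, since geom_bar counts rows itself)
  mutate(N = n()) |>
  ungroup()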
Then use midday for the bars and datetime for the line:
library(ggplot2)
ggplot(df) +
  geom_bar(aes(x = midday)) +
  geom_line(aes(x = datetime, y = totals), col = "red") +
  labs(
    title = "Submissions by Day",
    x = "Date",
    y = "Submissions")
PS. Sorry for the Polish locale on the x-axis.
PS2. It looks much better with geom_bar.
Created on 2022-02-03 by the reprex package (v2.0.1)
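On the side question of computing the running total: row_number() after arrange() is fine for the per-submission line. For the daily bars, a compact alternative is to precompute one row per day with count() and take a cumulative sum. A minimal sketch with dplyr and lubridate on a few hypothetical timestamps in the question's m/d/y h:m format (the native `|>` pipe needs R 4.1 or later):

```r
library(dplyr)
library(lubridate)

# Hypothetical submissions in the question's "m/d/y h:m" format
df <- data.frame(datetime = c("02/01/2022 09:15", "02/01/2022 17:40",
                              "02/02/2022 08:05", "02/03/2022 12:30",
                              "02/03/2022 13:00"))

df <- df |>
  mutate(datetime = mdy_hm(datetime)) |>
  arrange(datetime) |>
  mutate(total = row_number())   # running total at each submission time

# One row per day: daily count n and the cumulative total at the end of that day
daily <- df |>
  count(day = as_date(datetime)) |>
  mutate(cumulative = cumsum(n))
print(daily)
```

`daily` can then feed `geom_col(aes(x = day, y = n))` if you prefer precomputed bar heights; keep in mind that `day` is a Date while `datetime` is a date-time, so both layers need the same x type before they can share an axis.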
I have some data here: my data.
I would like to make a graph like this example multi-line chart.
I have tried to run the script below.
However, I don't understand how to use my data from Excel with this script.
Can anyone help me? I have been working on this for 3 days and the deadline is very soon. Thank you for your help.
# Libraries
library(ggplot2)
library(babynames) # provide the dataset: a dataframe called babynames
library(dplyr)
library(hrbrthemes)
library(viridis)
# Keep only 3 names
don <- babynames %>%
filter(name %in% c("Ashley", "Patricia", "Helen")) %>%
filter(sex=="F")
# Plot
don %>%
ggplot( aes(x=year, y=n, group=name, color=name)) +
geom_line() +
scale_color_viridis(discrete = TRUE) +
ggtitle("Popularity of American names in the previous 30 years") +
theme_ipsum() +
ylab("Number of babies born")
You can read the data with readxl::read_excel, reshape it to long format with pivot_longer, and plot it with ggplot.
library(tidyverse)
data <- readxl::read_excel('example data.xlsx')
data %>%
mutate(row = row_number()) %>%
pivot_longer(cols = -row, values_drop_na = TRUE) %>%
ggplot() + aes(row, value, color = name) +
geom_line()
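To reuse the babynames-style template from the question instead, the long data only needs the column names that template expects (year, name, n). A minimal sketch, assuming the spreadsheet has one column per series and one row per time step; the small data frame below is a hypothetical stand-in for the read_excel() result.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for readxl::read_excel('example data.xlsx')
data <- data.frame(
  seriesA = c(1, 3, 2),
  seriesB = c(2, NA, 4)
)

don <- data %>%
  mutate(year = row_number()) %>%        # row index used as the x variable
  pivot_longer(cols = -year, names_to = "name", values_to = "n",
               values_drop_na = TRUE)    # drop empty cells

# Same mapping as the template's plot (extra theme/palette packages omitted)
p <- ggplot(don, aes(x = year, y = n, group = name, color = name)) +
  geom_line()
```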
This is the code I used; the goal is to visualize the evolution of COVID-19 in North Africa.
library(readr)
library(ggplot2)
library(dplyr)
covid <- read.csv("owid-covid-data.csv")
covid
covid %>%
filter(location %in% c("Tunisia", "Morocco", "Libya")) %>%
ggplot(aes(x = date, y= new_cases,color = location, group = location)) +
geom_line()
This is the dataset I used. As you can see, the x-axis is day by day, so the data are quite dense.
And this is the plot (image not included): you can't read anything on the x-axis. I want to be able to discern the dates, perhaps by scaling the axis by weeks or months instead of days.
I converted the date column from string to the Date type, as the comments suggested, and it all worked out:
library(readr)
library(ggplot2)
library(dplyr)
covid <- read.csv("owid-covid-data.csv")
covid
covid %>%
filter(location %in% c("Tunisia", "Morocco", "Libya")) %>%
mutate(date = as.Date(date))%>%
ggplot(aes(x = date, y= new_cases,color = location, group = location)) +
geom_line()
This is the plot after the modification (image not included).
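Once `date` is a real Date, the axis density can also be controlled directly with scale_x_date(), for example one break per month. A sketch with made-up daily values standing in for the OWID file (the series, the monthly breaks and the label format are arbitrary choices for illustration):

```r
library(dplyr)
library(ggplot2)

# Hypothetical daily series standing in for the filtered covid data
days <- seq(as.Date("2020-03-01"), as.Date("2020-08-31"), by = "day")
covid <- data.frame(
  date = rep(as.character(days), 2),   # stored as text, as read.csv() leaves it
  location = rep(c("Tunisia", "Morocco"), each = length(days)),
  new_cases = rep(seq_along(days), 2)
)

p <- covid %>%
  mutate(date = as.Date(date)) %>%
  ggplot(aes(x = date, y = new_cases, color = location, group = location)) +
  geom_line() +
  # one labelled tick per month, shown as e.g. "Mar 2020"
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

For weekly ticks, `date_breaks = "1 week"` works the same way, though over a long period the labels get crowded again.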
library(lubridate)
library(gganimate)
library(dplyr)
library(ggplot2)
data("crime")
#creating test data and getting quarter
TestData <- crime %>%
mutate(Quarter_year = floor_date(time, unit = 'quarter'),
Quarter_year = as.Date(Quarter_year)) %>%
group_by(Quarter_year) %>%
tally()
#Creating a simple bar graph
Graph <- TestData %>%
ggplot(aes(x = Quarter_year, y = n))+
geom_bar(stat = "identity") +
coord_flip()+
theme_minimal()
Animated_Graph <- Graph+
transition_time(Quarter_year)+
ggtitle("Test: {frame_time}")
animate(Animated_Graph)
Using the great gganimate package, I want to set my frame time based on a date's quarter.
However, when I pass it to transition_time(), the animation creates a frame for each day between quarters, even though those days are not in the dataset:
transition_time(Quarter_year)+
ggtitle("Test: {frame_time}")
Is it possible to transition using only the dates that appear in the dataset?
Thanks.
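transition_time() treats its variable as continuous and interpolates between values, which is why in-between days appear. One way to get exactly one state per quarter is a discrete transition such as transition_manual() (transition_states() is another option). A sketch, assuming a small hypothetical table of events in place of the crime data:

```r
library(dplyr)
library(lubridate)
library(ggplot2)
library(gganimate)

# Hypothetical events, floored to their quarter and counted
events <- data.frame(time = as.Date(c("2021-01-15", "2021-02-20", "2021-05-03",
                                      "2021-08-11", "2021-08-30", "2021-11-02")))
TestData <- events %>%
  mutate(Quarter_year = floor_date(time, unit = "quarter")) %>%
  count(Quarter_year)

# transition_manual() makes one state per distinct Quarter_year value, no interpolation
Animated_Graph <- ggplot(TestData, aes(x = Quarter_year, y = n)) +
  geom_col() +
  coord_flip() +
  theme_minimal() +
  transition_manual(Quarter_year) +
  ggtitle("Test: {current_frame}")
```

Render it with animate(Animated_Graph). With transition_states(Quarter_year) instead, the title would use {closest_state}.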
I can't quite figure this out. I have a CSV of 200+ rows assigned to data, like so:
gid,bh,p1_id,p1_x,p1_y
90467,R,543333,80.184,98.824
90467,L,408045,74.086,90.923
90467,R,543333,57.629,103.797
90467,L,408045,58.589,95.937
I'm trying to group by p1_id and plot the mean values of p1_x and p1_y:
grp <- data %>% group_by(p1_id)
Then I'm trying to plot geom_point objects like so:
geom_point(aes(mean(grp$p1_x), mean(grp$p1_y), color=grp$p1_id))
But that doesn't show a unique point for each distinct p1_id value.
What's the missing step here?
Why not calculate the mean first?
library(dplyr)
grp <- data %>%
group_by(p1_id) %>%
summarise(mean_p1x = mean(p1_x),
mean_p1y = mean(p1_y))
Then plot:
library(ggplot2)
ggplot(grp, aes(x = mean_p1x, y = mean_p1y)) +
geom_point(aes(color = as.factor(p1_id)))
Edit: As per @eipi10, you can also pipe directly into ggplot:
data %>%
group_by(p1_id) %>%
summarise(mean_p1x = mean(p1_x),
mean_p1y = mean(p1_y)) %>%
ggplot(aes(x = mean_p1x, y = mean_p1y)) +
geom_point(aes(color = as.factor(p1_id)))
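The original attempt fails because mean(grp$p1_x) is evaluated on the whole column: group_by() only affects dplyr verbs such as summarise(), not expressions inside aes(), so every point lands on the same overall mean. If you'd rather not use dplyr, base R's aggregate() produces the same per-group means. A sketch using the four sample rows from the question:

```r
library(ggplot2)

# The four sample rows from the question
data <- read.csv(text = "gid,bh,p1_id,p1_x,p1_y
90467,R,543333,80.184,98.824
90467,L,408045,74.086,90.923
90467,R,543333,57.629,103.797
90467,L,408045,58.589,95.937")

# One row per p1_id with the mean x and y (groups come back sorted by p1_id)
grp <- aggregate(cbind(p1_x, p1_y) ~ p1_id, data = data, FUN = mean)
print(grp)

p <- ggplot(grp, aes(x = p1_x, y = p1_y, color = factor(p1_id))) +
  geom_point()
```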