library(lubridate)
library(gganimate)
library(dplyr)
library(ggplot2)
library(ggmap) # the crime dataset ships with ggmap
data("crime")
#creating test data and getting quarter
TestData <- crime %>%
mutate(Quarter_year = floor_date(time, unit = 'quarter'),
Quarter_year = as.Date(Quarter_year)) %>%
group_by(Quarter_year) %>%
tally()
#Creating a simple bar graph
Graph <- TestData %>%
ggplot(aes(x = Quarter_year, y = n))+
geom_bar(stat = "identity") +
coord_flip()+
theme_minimal()
Animated_Graph <- Graph+
transition_time(Quarter_year)+
ggtitle("Test: {frame_time}")
animate(Animated_Graph)
Using the great package gganimate, I want to set my frame time based on a date's quarter.
However, when I use transition_time(), the animation creates a frame for each day between quarters, even though those days are not in the dataset:
transition_time(Quarter_year)+
ggtitle("Test: {frame_time}")
Is it possible to transition using only the dates that appear in the dataset?
Thanks.
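One possible approach, offered as a sketch rather than a confirmed fix: transition_time() treats the variable as continuous and interpolates between values, whereas transition_manual() shows one frame per distinct value of the variable. The `quarterly` data frame below is hypothetical stand-in data, not the tallied crime dataset.

```r
library(ggplot2)
library(gganimate)

# Hypothetical quarterly counts standing in for the tallied crime data
quarterly <- data.frame(
  Quarter_year = as.Date(c("2010-01-01", "2010-04-01", "2010-07-01", "2010-10-01")),
  n = c(120, 95, 143, 110)
)

anim <- ggplot(quarterly, aes(x = Quarter_year, y = n)) +
  geom_col() +                        # same as geom_bar(stat = "identity")
  coord_flip() +
  theme_minimal() +
  transition_manual(Quarter_year) +   # one frame per distinct date, no interpolation
  ggtitle("Test: {current_frame}")    # label variable provided by transition_manual()

# animate(anim) would render the animation
```

Note that the title placeholder changes from `{frame_time}` to `{current_frame}`, because each transition provides its own label variables.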
Hi everyone!
How can I arrange weekdays, starting on Sunday, in R? I got the weekdays using base R's weekdays() function, but the days appear in a seemingly random order (image attached) and I can't find a way to sort them. I tried the arrange function, but I guess it only works with numeric values. A bar chart looks very odd starting on Friday. This is what the code looks like:
my_dataset <- my_dataset %>%
mutate(weekDay = weekdays(Date))
my_dataset %>%
group_by(weekDay) %>%
summarise(mean_steps = mean(TotalSteps)) %>%
ggplot(aes(x = weekDay, y = mean_steps))+
geom_bar(stat = "identity")
Thanks!
I tried the arrange function, but I guess it only works with numeric values.
Your weekDay vector is probably of class character, which ggplot arranges in alphabetical order. The solution is to convert this character vector into a factor.
There are several ways to get the x-axis in the order you want. All of them involve converting weekDay into a factor with explicitly ordered levels.
To stay close to your example, I first created a data frame with weekdays and some data. Because both are generated randomly, a seed is set to make the code reproducible.
One method is to create the data frame of summaries and then define weekDay in it as a factor with the desired levels.
This can also be done within the ggplot call when creating the aesthetics.
library(tidyverse)
set.seed(111)
myData <- data.frame(
weekDay = sample(weekdays(Sys.Date() + 0:6), 100, replace = TRUE),
TotalSteps = sample(1000:8000, 100)
)
myData %>%
group_by(weekDay) %>%
summarise(mean_steps = mean(TotalSteps)) -> DF # new data.frame
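# the following defines weekDay as a factor and also sets
# the sequence of factor levels. This sequence is then taken
# by ggplot to construct the x-axis.
# Note: the level names must match the output of weekdays() in your
# locale. These are German; in an English locale use "Sunday", "Monday", ...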
DF$weekDay <- factor(DF$weekDay, levels = c(
"Sonntag", "Montag",
"Dienstag", "Mittwoch",
"Donnerstag", "Freitag",
"Samstag"
))
ggplot(DF, aes(x = weekDay, y = mean_steps)) +
geom_bar(stat = "identity") +
labs(x="")
# the factor can also be defined within the ggplot-call
myData %>%
group_by(weekDay) %>%
summarise(mean_steps = mean(TotalSteps)) %>%
ggplot(aes(x = factor(weekDay, levels = c(
"Sonntag", "Montag",
"Dienstag", "Mittwoch",
"Donnerstag", "Freitag",
"Samstag"
)), y = mean_steps)) +
geom_bar(stat = "identity") +
labs(x="")
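The level names have to match whatever weekdays() returns in the current locale. As a sketch that avoids typing the names by hand, the levels can be generated from a week that is known to start on a Sunday (1 January 2023 was a Sunday). The `dates` vector below is made-up example data, and `week_levels` is just an illustrative name.

```r
# Reference week starting on a known Sunday (2023-01-01 was a Sunday),
# so the names come out in the current locale's language
week_levels <- weekdays(as.Date("2023-01-01") + 0:6)

# Hypothetical example dates covering two full weeks
dates <- as.Date("2023-03-01") + 0:13

# Factor whose level order is Sunday, Monday, ..., Saturday
week_day <- factor(weekdays(dates), levels = week_levels)

# table() follows the factor level order, so Sunday comes first
table(week_day)
```

Because ggplot uses the factor level order for a discrete axis, mapping `week_day` to x puts Sunday first regardless of the language the names are in.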
I'm relatively new to R and could really use some help with some pretty basic ggplot2 work.
I'm trying to visualize total number of submissions on a graph, showing the overall total in a line graph and the daily total in a histogram (or bar graph) on top of it. I'm not sure how to add breaks or bins to the histogram so that it takes the submission datetime column and makes each bar the daily total.
I tried adding a column that converts the datetime into just date and plots based on that, but I'd really like the line graph to include the time.
Here's what I have so far:
df <- df %>%
mutate(datetime = lubridate::mdy_hm(datetime))%>%
mutate(date = lubridate::as_date(datetime))
#sort by datetime
df <- df %>%
arrange(datetime)
#add total number of submissions
df <- df %>%
mutate(total = row_number())
#ggplot
line_plus_histo <- df%>%
ggplot() +
geom_histogram(data = df, aes(x=datetime)) +
geom_line(data = df, aes(x=datetime, y=total), col = "red") +
stat_bin(data = df, aes(x=date), geom = "bar") +
labs(
title="Submissions by Day",
x="Date",
y="Submissions",
legend=NULL)
line_plus_histo
As you can see, I'm also calculating the total number of submissions by sorting by time and then adding a column with the row number. So if you can help me use a better method I'd really appreciate it.
Below is the line-plus-histogram plot of time vs. submissions:
Here's the pastebin link with my data
You can extend your data manipulation as follows:
library(dplyr)
library(lubridate)
df <- df |>
mutate(datetime = lubridate::mdy_hm(datetime)) |>
arrange(datetime) |>
mutate(midday = as_datetime(floor_date(as_date(datetime), unit = "day") + 0.5)) |>
mutate(totals = row_number()) |>
group_by(midday) |>
mutate(N = n())|>
ungroup()
Then use midday for the bars and datetime for the line:
df%>%
ggplot() +
geom_bar(data = df, aes(x = midday)) +
geom_line(data = df, aes(x=datetime, y=totals), col = "red") +
labs(
title="Submissions by Day",
x="Date",
y="Submissions",
legend=NULL)
PS. Sorry for the Polish locale on the x-axis.
PS2. With geom_bar it looks much better.
Created on 2022-02-03 by the reprex package (v2.0.1)
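On the side question about a better way to get the running total: a minimal sketch, assuming one row per submission, is to count submissions per day and take the cumulative sum. The timestamps below are invented for illustration, and the sketch uses base R only. Note that this gives the running total at the end of each day, whereas row_number() on the sorted data gives it after every single submission.

```r
# Hypothetical submission timestamps (one per submission)
datetime <- as.POSIXct(
  c("2022-02-01 09:15", "2022-02-01 17:40", "2022-02-02 08:05",
    "2022-02-03 12:00", "2022-02-03 12:30", "2022-02-03 23:10"),
  tz = "UTC"
)

# Count submissions per calendar day
daily <- as.data.frame(
  table(date = as.Date(datetime)),
  responseName = "n",
  stringsAsFactors = FALSE
)
daily$date <- as.Date(daily$date)

# Running total across days
daily$total <- cumsum(daily$n)
daily
```

The `n` column can feed the bars and `total` the line if a per-day resolution is enough for the line as well.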
This is the code I used; the goal is to visualize the evolution of COVID in North Africa.
library(readr)
library(ggplot2)
library(dplyr)
covid <- read.csv("owid-covid-data.csv")
covid
covid %>%
filter(location %in% c("Tunisia", "Morocco", "Libya")) %>%
ggplot(aes(x = date, y= new_cases,color = location, group = location)) +
geom_line()
This is the dataset I used.
As you can see, the data are day-to-day, so the x-axis is quite dense.
And this is the plot.
You can't read anything on the x-axis. I want to be able to discern the dates, perhaps by scaling the axis in weeks or months instead of days.
I converted the string column to the Date type, as the comments suggested, and it all worked out:
library(readr)
library(ggplot2)
library(dplyr)
covid <- read.csv("owid-covid-data.csv")
covid
covid %>%
filter(location %in% c("Tunisia", "Morocco", "Libya")) %>%
mutate(date = as.Date(date))%>%
ggplot(aes(x = date, y= new_cases,color = location, group = location)) +
geom_line()
This is the plot after the modification.
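Once the column is a Date, the tick spacing can also be set explicitly, which addresses the wish to scale the axis by weeks or months. The following is a sketch on made-up data (`covid_demo` is a hypothetical stand-in for the filtered OWID data); scale_x_date() with `date_breaks` and `date_labels` is the ggplot2 scale for Date axes.

```r
library(ggplot2)

# Hypothetical stand-in for the filtered OWID data
covid_demo <- data.frame(
  date = as.Date("2021-01-01") + 0:179,
  new_cases = round(500 + 300 * sin(seq(0, 6, length.out = 180))),
  location = "Tunisia"
)

p <- ggplot(covid_demo, aes(x = date, y = new_cases, color = location)) +
  geom_line() +
  # one tick per month, labelled with abbreviated month and year
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  # rotate the labels so they do not overlap
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p
```

Changing `date_breaks` to something like "2 weeks" gives a weekly scale instead.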
I'm trying to produce an animation for a simulation model, and I want to show how the distribution of results changes as the simulation runs.
I've seen gganimate used for scatter plots but not for boxplots (or ideally violin plots). Here I've provided a reprex.
When I use sim_category (which is a bucket for a certain number of simulation runs) I want the result to be cumulative of all previous runs to show the total distribution.
In this example (and my actual code), cumulative = TRUE does not do this. Why is this?
library(gganimate)
library(animation)
library(ggplot2)
df = as.data.frame(structure(list(ID = c(1,1,2,2,1,1,2,2,1,1,2,2),
value = c(10,15,5,10,7,17,4,12,9,20,6,17),
sim_category = c(1,1,1,1,2,2,2,2,3,3,3,3))))
df$ID <- factor(df$ID, levels = (unique(df$ID)))
df$sim_category <- factor(df$sim_category, levels = (unique(df$sim_category)))
ani.options(convert = shQuote('C:/Program Files/ImageMagick-7.0.5-Q16/magick.exe'))
p <- ggplot(df, aes(ID, value, frame= sim_category, cumulative = TRUE)) + geom_boxplot(position = "identity")
gganimate(p)
gganimate's cumulative doesn't accumulate the data; it just keeps what was drawn in earlier frames visible in subsequent frames. To achieve what you want, you have to do the accumulation before building the plot, something along the following lines:
library(tidyverse)
library(gganimate)
df <- tibble(  # data_frame() is deprecated in favour of tibble()
ID = factor(c(1,1,2,2,1,1,2,2,1,1,2,2), levels = 1:2),
value = c(10,15,5,10,7,17,4,12,9,20,6,17),
sim_category = factor(c(1,1,1,1,2,2,2,2,3,3,3,3), levels = 1:3)
)
p <- df %>%
pull(sim_category) %>%
levels() %>%
as.integer() %>%
map_df(~ df %>% filter(sim_category %in% 1:.x) %>% mutate(sim_category = .x)) %>%
ggplot(aes(ID, value, frame = factor(sim_category))) +
geom_boxplot(position = "identity")
gganimate(p)
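The accumulation step itself does not depend on gganimate. As a minimal base R sketch of the same idea, each frame k receives every row whose sim_category is at most k; `frame_id` and `accumulated` are illustrative names, not part of any package.

```r
# Same toy data as in the question
df <- data.frame(
  ID = factor(c(1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2)),
  value = c(10, 15, 5, 10, 7, 17, 4, 12, 9, 20, 6, 17),
  sim_category = rep(1:3, each = 4)
)

# For each frame k, keep every row with sim_category <= k and tag it with k
accumulated <- do.call(rbind, lapply(sort(unique(df$sim_category)), function(k) {
  part <- df[df$sim_category <= k, ]
  part$frame_id <- k
  part
}))

# Rows per frame grow cumulatively
table(accumulated$frame_id)
```

Mapping `frame_id` to the animation frame then gives each frame the full distribution so far. In gganimate 1.0 and later, the `frame` aesthetic and the `gganimate()` function used above are no longer available; a transition such as transition_manual(frame_id) plays that role there.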
I have a .csv file of the average life expectancy by country for the last 50 years. I am trying to create a graph of life expectancy by country, with the years 1960-2011 on the x axis, and the average life expectancy on the y axis. I only want to plot the top ten countries, each with their own line.
I have researched every possible way to plot a multi line graph of the data I have and it seems to me that it is impossible with the way the data is formatted. My questions are:
Is it possible to create the desired graph with this data, given the way it is organized?
If the data has to be restructured, how should that be done? Is there a function in R to better organize data?
I was able to create the desired graph in Excel which is exactly what I'd like to do in R.
Here is a link to the lexp.csv file.
https://drive.google.com/file/d/0BwsBIUlCf0Z3QVgtVGt4ampVcmM/view?usp=sharing
You are correct that the data would benefit from reorganization. This is a "wide to long" problem, i.e., it would be better to have three columns: Country, Year and Age.
You can reformat the data using the tidyr package, process it using the dplyr package and plot using ggplot2. So, assuming that you have read the CSV into R and have a data frame named lexp, you could try something like this:
library(dplyr)
library(tidyr)
library(ggplot2)
lexp %>%
# reformat from wide to long
gather(Year, Age, -Country, convert = TRUE) %>%
# select most recent year
filter(Year == max(Year)) %>%
# sort by decreasing age
arrange(desc(Age)) %>%
# take the top 10 countries
slice(1:10) %>%
select(Country) %>%
# join back to the original data
inner_join(lexp) %>%
# reformat again from wide to long
gather(Year, Age, -Country, convert = TRUE) %>%
# and plot the graph
ggplot(aes(Year, Age)) + geom_line(aes(color = Country, group = Country)) +
theme_dark() + theme(axis.text.x = element_text(angle = 90)) +
labs(title = "Life Expectancy") +
scale_color_brewer(palette = "Set3")
Result:
library("reshape2")
library("ggplot2")
test_data_long <- melt(df, id="Country") # convert to long format
testdata<-test_data_long[complete.cases(test_data_long),]
ggplot(data=testdata,
aes(x=variable, y=value)) +
geom_line(aes(color = Country, group = Country))
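Both answers reshape the data from wide to long. In current tidyr, gather() is superseded by pivot_longer(), so the same step could be sketched as follows. `lexp_demo` is a hypothetical miniature of the real file, and the real column names may differ; for example, read.csv() may prefix the year columns with "X", in which case the `names_prefix` argument can strip it.

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature of lexp: one row per country, one column per year
lexp_demo <- tibble(
  Country = c("A", "B", "C"),
  `1960` = c(60.1, 55.3, 70.2),
  `2011` = c(78.4, 81.0, 74.9)
)

# Wide to long: one row per country and year, with Year as an integer
long <- lexp_demo %>%
  pivot_longer(-Country, names_to = "Year", values_to = "Age",
               names_transform = list(Year = as.integer))

# Countries with the highest value in the latest year (top 2 here)
top <- long %>%
  filter(Year == max(Year)) %>%
  slice_max(Age, n = 2) %>%
  pull(Country)

# Long data restricted to those countries, ready for geom_line()
long_top <- filter(long, Country %in% top)
```

Filtering the long data by the selected countries avoids joining back to the wide table and reshaping a second time.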