gganimate: handling of missing values - r

I'm trying the gganimate package for the first time and have run into a problem with how it handles missing values (NA). I apologize if my question is trivial, but I couldn't find a solution.
Here is a reproducible example of what I'm trying to do:
# Load libraries:
library(ggplot2)
library(gganimate)
library(dplyr)
library(tidyr)
# Create some data
## Monthly sales are in 100:1000
## Expected sales are 400/month, increasing by 5% every year
set.seed(123)
df <- tibble(Year = rep(2015:2018, each=12), # tibble() replaces the deprecated data_frame()
Month = rep(1:12, 4),
Sales = unlist(lapply(1:4,
function(x){cumsum(sample(100:1000, 12))})),
Expected = unlist(lapply(1:4,
function(x){cumsum(rep(400*1.05^(x-1),12))})))
# gganimate works fine here:
df %>%
tidyr::gather("Type", "value", Sales:Expected) %>%
ggplot(aes(Month, value, col=Type)) +
geom_point() +
geom_line() +
gganimate::transition_time(Year)
# Now data for the end of Year 2018 are missing:
df[df$Year==2018 & df$Month %in% 9:12,"Sales"] = NA
# Plotting with ggplot2 works (and gives a warning about missing values):
df %>%
tidyr::gather("Type", "value", Sales:Expected) %>%
dplyr::filter(Year == 2018) %>%
ggplot(aes(Month, value, col=Type)) +
geom_point() +
geom_line()
# But gganimate fails
df %>%
tidyr::gather("Type", "value", Sales:Expected) %>%
ggplot(aes(Month, value, col=Type)) +
geom_point() +
geom_line() +
gganimate::transition_time(Year)
# I get the following error:
## Error in rep(seq_len(nrow(polygon)), splits + 1) : incorrect 'times' argument
I tried to play with the enter_() / exit_() functions of gganimate but without success.
Thank you for your help.
EDIT: (using the suggestion of MattL)
This works:
df %>%
# filter(!is.na(Sales)) %>% ##Proposed by Matt L but removes Expected values too
gather("Type", "value",Sales:Expected) %>%
filter(!is.na(value)) %>% ## Remove NA values
ggplot(aes(Month, value, col=Type)) +
geom_point() +
geom_line() +
labs(title = "Year: {frame_time}") + ## Add title
gganimate::transition_time(Year) +
gganimate::exit_disappear(early=TRUE) ## Removes 2017 points appearing in Year 2018
I still have the feeling that gganimate should be able to handle these NA values like ggplot does though.
Thanks!

Filter out the missing values before "piping" to the ggplot function:
df %>%
filter(!is.na(Sales)) %>%
tidyr::gather("Type", "value", Sales:Expected) %>%
ggplot(aes(Month, value, col=Type)) +
geom_point() +
geom_line() +
gganimate::transition_time(Year)
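One caveat with filtering on Sales before reshaping: it drops whole rows, so the Expected values for those months are lost as well. Below is a minimal sketch of the difference. It rebuilds the question's data and assumes tidyr 1.0.0 or later for pivot_longer(); the names wide_filtered and long_filtered are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Rebuild the data from the question
set.seed(123)
df <- tibble(
  Year = rep(2015:2018, each = 12),
  Month = rep(1:12, 4),
  Sales = unlist(lapply(1:4, function(x) cumsum(sample(100:1000, 12)))),
  Expected = unlist(lapply(1:4, function(x) cumsum(rep(400 * 1.05^(x - 1), 12))))
)
df$Sales[df$Year == 2018 & df$Month %in% 9:12] <- NA

# Filtering in wide format removes the whole row, Expected included
wide_filtered <- df %>%
  filter(!is.na(Sales)) %>%
  pivot_longer(Sales:Expected, names_to = "Type", values_to = "value")

# Dropping NAs while reshaping removes only the missing Sales cells
long_filtered <- df %>%
  pivot_longer(Sales:Expected, names_to = "Type", values_to = "value",
               values_drop_na = TRUE)

nrow(wide_filtered)  # 88: 44 complete rows x 2 types
nrow(long_filtered)  # 92: 96 cells minus the 4 missing Sales values
```

Either result can be piped into the same ggplot() + transition_time(Year) call; only the second keeps the Expected line for September to December 2018.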

Related

How to input data in excel/csv to make multiple chart in R studio

I have some data here (my data).
I would like to make a graph like this multi-line chart example.
I have tried to run the script below.
However, I don't understand how to get my data from Excel into R so that I can run this script.
Can anyone help me? I have been stuck on this for 3 days and the deadline is very soon. Thank you for your help.
# Libraries
library(ggplot2)
library(babynames) # provide the dataset: a dataframe called babynames
library(dplyr)
library(hrbrthemes)
library(viridis)
# Keep only 3 names
don <- babynames %>%
filter(name %in% c("Ashley", "Patricia", "Helen")) %>%
filter(sex=="F")
# Plot
don %>%
ggplot( aes(x=year, y=n, group=name, color=name)) +
geom_line() +
scale_color_viridis(discrete = TRUE) +
ggtitle("Popularity of American names in the previous 30 years") +
theme_ipsum() +
ylab("Number of babies born")
You may read the data using readxl::read_excel, get it in long format and plot using ggplot.
library(tidyverse)
data <- readxl::read_excel('example data.xlsx')
data %>%
mutate(row = row_number()) %>%
pivot_longer(cols = -row, values_drop_na = TRUE) %>%
ggplot() + aes(row, value, color = name) +
geom_line()
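To see what the reshaping step does without the Excel file, here is a small sketch that uses an in-memory tibble as a stand-in for the sheet. The column names A and B and their values are invented; the real column names come from the spreadsheet's header row.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for readxl::read_excel('example data.xlsx'):
# one column per series, one row per observation
wide <- tibble(A = c(1, 2, 3), B = c(4, NA, 6))

long <- wide %>%
  mutate(row = row_number()) %>%                       # x position of each observation
  pivot_longer(cols = -row, values_drop_na = TRUE)     # default output columns: name, value

long
```

Each row of long is one point on the chart: row goes on the x axis, value on the y axis, and name picks the line colour. The empty cell in B is dropped rather than plotted.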

Why can't I get the right horizontal axis labels on my ggplot2 chart?

I am trying to do a faceted plot of a grouped dataframe with ggplot2, using geom_line(). My dataframe has a Date column and I would like to have dates on the horizontal axis. If I just use Date in aes(x=Date, ...) I get nice labels on the horizontal axis. However, the line has an almost horizontal section where the date jumps from the end of one group to the beginning of the next group. This code and chart shows that:
library(dplyr)
library(ggplot2)
library(lubridate) # for month()
dts <- seq.Date(as.Date("2020-01-01"), as.Date("2021-12-31"), by="day")
mos <- sapply(dts, month)
df <- data.frame(Date=dts, Month=mos)
nr <- nrow(df)
df$X <- rep(1, nr)
df %>%
group_by(Month) -> dfgrp
dfgrp %>%
group_by(Month) %>%
mutate(Time = Date[1:n()],
Z = cumsum(X)) %>%
ggplot(aes(x=Date, y=Z)) +
geom_line(color="darkgreen", size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
theme(axis.text.x = element_text(angle=45, size=7))
I would not like my chart to have those almost-horizontal lines when the date changes by a large amount. I was able to generate a chart without those lines using integers on aes() as follows:
dfgrp %>%
mutate(Time = 1:n() %>% as.integer(),
Z = cumsum(X)) %>%
ggplot(aes(x=Time, y=Z)) +
geom_line(color="darkgreen", size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
scale_x_continuous(breaks = seq(from=1, to=nr, by=10) %>% as.integer(),
labels = function(x) as.character(dfgrp$Date[x])) +
theme(axis.text.x = element_text(angle=45, size=7))
The line on the chart looks like I want it but the dates on the horizontal axis are not correct: they end in February 2020 in every facet while the dates in the dataframe end in December 2021 and the dates in the first chart begin and end on different months in different facets.
I tried many things but nothing worked. Any suggestions on how to have a chart with dates like in the first chart above and lines like in the second chart above?
Help will be much appreciated.
You may want to adjust the dates to be in the same year, but noting the original year as a variable:
library(lubridate)
dfgrp %>%
group_by(Month) %>%
mutate(year = year(Date),
adj_date = ymd(paste(2020, month(Date), day(Date)))) %>%
# 2020 was leap year so 2/29 won't be lost
mutate(Time = Date[1:n()],
Z = cumsum(X)) %>%
ggplot(aes(x=adj_date, y=Z, color = year, group = year)) +
geom_line(size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
theme(axis.text.x = element_text(angle=45, size=7))
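As a small illustration of the date adjustment on its own, the sketch below moves a few sample dates into 2020 while keeping the original year as a separate value. It uses lubridate::make_date() as an alternative to ymd(paste(...)); the sample dates and variable names are made up.

```r
library(lubridate)

dates <- as.Date(c("2020-02-29", "2021-02-28", "2021-12-31"))

# Keep the original year so it can be mapped to colour/group later
orig_year <- year(dates)

# Rebuild each date in the leap year 2020 so that Feb 29 stays valid
adj_date <- make_date(2020, month(dates), day(dates))

data.frame(dates, orig_year, adj_date)
```

Because every adjusted date falls in the same year, each facet covers a single month on the x axis and the long connecting segment between years disappears.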

R: gganimate with geom_density

I am trying to create an animated graph with gganimate. My default static graph looks something like this:
But instead of 3 grouping variables I have 12 (year 0, year 1, year 2, etc.). Rather than plotting all 12 variables together, I would like to animate the plot, to avoid this:
Those kernel density plots are made as follows:
data_decidious %>% tidyr::gather("YEAR", "NDVI", colsPostNDVI) %>%
mutate(YEAR = str_remove(YEAR, 'meanNDVIyear')) %>% mutate(YEAR = str_remove(YEAR, 'meanprefire_NDVI')) %>% mutate(YEAR = as.factor(YEAR)) %>%
ggplot(aes(NDVI,fill=YEAR)) + geom_density(alpha=.2) + xlim(0.3, 0.7) + ylim(0,46) +
xlab("Mean NDVI") + ylab("Kernel density") + guides(fill=guide_legend(title="Comparative"))
I have found that this geom_density() only works when I add mutate(YEAR = as.factor(YEAR)). That means when I add:
transition_time(YEAR) + ease_aes('linear')
I get the error:
Error: time data must either be integer, numeric, POSIXct, Date, difftime, or hms
In addition: Warning message:
In min(cl[cl != 0]) : no non-missing arguments to min; returning Inf
Any idea to animate my graph?
Converting YEAR to a factor is not necessary. Instead simply map factor(YEAR) on fill. This way you can use YEAR in transition time and everything is fine.
Using the gapminder::gapminder dataset as example data the following code plots and animates the density of worldwide life-expectancy over time.
(BTW: Instead of using a categorical color scale you can map YEAR directly on fill to get a continuous color scale. However, in this case you have to map YEAR also on the group aesthetic):
library(ggplot2)
library(dplyr)
library(gganimate)
p <- gapminder::gapminder %>%
ggplot(aes(lifeExp, fill = factor(year))) +
geom_density(alpha=.2) +
xlab("Life Expectancy") +
ylab("Kernel density") +
guides(fill = guide_legend(title = "Year"))
p +
transition_time(year) +
ease_aes('linear')
Created on 2020-04-17 by the reprex package (v0.3.0)
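The continuous-colour variant mentioned in the aside could look roughly like the sketch below. It uses a small simulated data frame (toy, with made-up values) instead of gapminder so that it runs without extra data packages. The key point is that year is mapped to both fill and group.

```r
library(ggplot2)
library(gganimate)

# Simulated stand-in data: three years, 50 values each
set.seed(1)
toy <- data.frame(
  year  = rep(c(2000L, 2005L, 2010L), each = 50),
  value = rnorm(150, mean = rep(c(0, 1, 2), each = 50))
)

# A numeric fill gives a continuous colour scale; group = year is needed
# so that one density is drawn per year rather than one overall
p_cont <- ggplot(toy, aes(value, fill = year, group = year)) +
  geom_density(alpha = 0.2) +
  labs(x = "Value", y = "Kernel density", fill = "Year")

anim <- p_cont +
  transition_time(year) +
  ease_aes("linear")

# animate(anim) renders the frames; it is left out here because it needs a renderer
```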
Edit:
As far as I can tell without having seen your dataset, you have to adapt your code like so. From inspecting your code I guess that YEAR is a character, so you have to convert it to an integer:
data_long <- data_decidious %>%
tidyr::gather("YEAR", "NDVI", colsPostNDVI) %>%
mutate(YEAR = str_remove(YEAR, 'meanNDVIyear')) %>%
mutate(YEAR = str_remove(YEAR, 'meanprefire_NDVI')) %>%
# Convert YEAR to integer
mutate(YEAR = as.integer(YEAR))
p <- data_long %>%
ggplot(aes(NDVI,fill=factor(YEAR))) +
geom_density(alpha=.2) +
xlim(0.3, 0.7) +
ylim(0,46) +
xlab("Mean NDVI") +
ylab("Kernel density") +
guides(fill=guide_legend(title="Comparative"))
p +
transition_time(YEAR) +
ease_aes('linear')
anim_save("test.gif")
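One thing to check with the integer conversion: if a column name is removed entirely by str_remove(), what is left is an empty string, and as.integer("") is NA. The sketch below uses hypothetical column names (the real ones in colsPostNDVI are not shown in the question) and assumes the pre-fire column should count as year 0.

```r
library(stringr)

# Hypothetical column names, guessed from the patterns in the question
cols <- c("meanprefire_NDVI", "meanNDVIyear1", "meanNDVIyear2")

stripped <- str_remove(str_remove(cols, "meanNDVIyear"), "meanprefire_NDVI")
stripped              # "" "1" "2"

naive <- as.integer(stripped)
naive                 # NA 1 2: the pre-fire column has no year

# Assumption: treat the pre-fire column as year 0
years <- ifelse(stripped == "", 0L, naive)
years                 # 0 1 2
```

Rows with an NA time value cannot be placed on the animation timeline, so it is worth inspecting unique(data_long$YEAR) before calling transition_time().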

Time series label in R

I have a dataframe in R where:
Date MeanVal
2002-01 37.70722
2002-02 43.50683
2002-03 45.31268
2002-04 14.96000
2002-05 29.95932
2002-09 52.95333
2002-10 12.15917
2002-12 53.55144
2003-03 41.15083
2003-04 21.26365
2003-05 33.14714
2003-07 66.55667
.
.
2011-12 40.00518
And when I plot a time series using ggplot with:
ggplot(mean_data, aes(Date, MeanVal, group =1)) + geom_line()+xlab("") +
ylab("Mean Value")
I am getting:
but as you can see, the x axis scale is not very neat at all. Is there any way I could just scale it by year (2002,2003,2004..2011)?
Let's use lubridate's parse_date_time() to convert your Date to a date class:
library(tidyverse)
library(lubridate)
mean_data %>%
mutate(Date = parse_date_time(as.character(Date), "Y-m")) %>%
ggplot(aes(Date, MeanVal)) +
geom_line()
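Once the column is a real date, the yearly breaks asked for in the question can be requested explicitly. The sketch below takes four rows from the question's data, converts them with base R's as.Date() (an alternative to parse_date_time() that yields a Date rather than a date-time), and uses scale_x_date() to label one tick per year.

```r
library(ggplot2)

# Four rows taken from the question's table
mean_data <- data.frame(
  Date    = c("2002-01", "2002-09", "2003-03", "2003-07"),
  MeanVal = c(37.70722, 52.95333, 41.15083, 66.55667)
)

# Append a day so that "YYYY-MM" becomes a complete date
mean_data$Date <- as.Date(paste0(mean_data$Date, "-01"))

p_year <- ggplot(mean_data, aes(Date, MeanVal)) +
  geom_line() +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  xlab("") +
  ylab("Mean Value")
```

If the column is kept as a date-time (as parse_date_time() returns), scale_x_datetime() takes the same date_breaks and date_labels arguments.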
Similarly, we can convert to an xts and use autoplot():
library(timetk)
mean_data %>%
mutate(Date = parse_date_time(as.character(Date), "Y-m")) %>%
tk_xts(silent = TRUE) %>%
autoplot()
This achieves the plot above as well.
library(dplyr)
mean_data %>%
mutate(Date = as.integer(gsub('-.*', '', Date))) %>%
# use dplyr's mutate() to strip the month and cast the
# remaining year value to an integer
ggplot(aes(Date, MeanVal, group = 1)) + geom_line() + xlab("") +
ylab("Mean Value")

Date format using scale_x_date giving Error

Hello, I need my ggplot to use the date format shown in this picture on the x axis:
But my date values have a time attached.
sentiment_bing1 <- tidy_trump_tweets %>%
inner_join(get_sentiments("bing")) %>%
count(word, created_at, sentiment) %>%
ungroup()
p <- sentiment_bing1 %>% filter(sentiment == "positive") %>% ggplot(aes(x=created_at, y = n)) +
geom_line(stat="identity", position = "identity", color = "Blue") + scale_x_date(date_breaks ='3 months', date_labels = '%b-%Y') + stat_smooth() + theme_gdocs() +
xlab("Date") + ylab("Normalized Frequency of Positive Words in Trump's Tweets")
1 abound 11/30/17 13:05 positive 0.0
2 abuse 1/11/18 12:33 negative 0.0
3 abuse 10/27/17 1:18 negative 0.0
4 abuse 2/18/18 17:10 negative 0.0
This is what I have done so far. How do I get a result like the picture? Converting to a date doesn't help, because several tweets can fall on the same day at different times, and that messes up the graph.
Welcome to SO!
It's hard to answer your question without seeing the data you are using and the error that your code is generating. Next time, try to include a reproducible example. This will make it easier for someone to identify where your problem lies.
Based on the code and data you've provided I've created a sample data set with a (broadly) similar structure to that from the chart...
library(dplyr) # for tibble(), mutate(), case_when(), filter(), tally()
library(lubridate)
library(ggplot2)
library(ggthemes)
set.seed(100)
start_date <- mdy_hm("03-01-2017-12:00")
end_date <- mdy_hm("03-01-2018-12:00")
number_hours <- interval(start_date, end_date)/hours(1)
created_at <- start_date + hours(6:number_hours)
length(created_at)
word <- sample(c("abound", "abuse"), size = length(created_at), replace = TRUE,
prob=c(0.25, 0.75))
Your plotting code looks good. I could be wrong here, but from what I can tell your problem could lie in the way you are summarising the frequencies. In the code below, I've used the lubridate package to group your data by date (day), allowing for a daily frequency count.
test_plot <- tibble(created_at, word) %>%
mutate(sentiment =
case_when(
word == "abound" ~ "positive",
word == "abuse" ~ "negative")) %>%
filter(sentiment == "positive") %>%
mutate(created_at = date(round_date(created_at, unit = "day"))) %>% # created_at is already a date-time, so no re-parsing is needed
group_by(created_at) %>%
tally() %>%
ggplot() +
aes(x = created_at, y = n) +
geom_line(stat="identity", position = "identity", color = "Blue") +
geom_smooth() +
scale_x_date(date_breaks ='3 months', date_labels = '%b-%Y') +
theme_gdocs() +
xlab("Date") +
ylab("Frequency of Positive Words in Trump's Tweets")
Which gives you this...
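If the timestamps are kept as date-times rather than rounded to days, scale_x_date() does not apply, because it expects a Date column. A minimal sketch of the alternative, assuming created_at has been parsed to POSIXct: scale_x_datetime() accepts the same date_breaks and date_labels arguments. The timestamps are the four from the question's sample; the n values are invented for illustration.

```r
library(ggplot2)

tweets <- data.frame(
  created_at = as.POSIXct(
    c("2017-10-27 01:18", "2017-11-30 13:05",
      "2018-01-11 12:33", "2018-02-18 17:10"),
    tz = "UTC"
  ),
  n = c(1, 2, 1, 3)  # made-up counts
)

p_dt <- ggplot(tweets, aes(created_at, n)) +
  geom_line(color = "blue") +
  scale_x_datetime(date_breaks = "1 month", date_labels = "%b-%Y") +
  xlab("Date")
```

This keeps several tweets on the same day as separate points while still labelling the axis by month and year.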
sentiment_bing1 <- tidy_trump_tweets %>%
inner_join(get_sentiments("bing")) %>%
count(created_at, sentiment) %>%
spread(sentiment, n, fill=0) %>%
mutate(N = (negative - min(negative)) / (max(negative) - min(negative))) %>% # use the columns created by spread(); sentiment_bing1 is not defined yet at this point
mutate(P = (positive - min(positive)) / (max(positive) - min(positive))) %>%
ungroup()
sentiment_bing1$created_at <- as.Date(sentiment_bing1$created_at, "%m/%d/%y")
The use of spread helped in separating the positive and negative counts, which could then be normalized to get the result I was looking for!
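The normalization applied to each column is min-max scaling. As a small sketch, it can be pulled out into a helper (min_max is a made-up name, not a function from any package):

```r
# Rescale a numeric vector to the range [0, 1]
min_max <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

counts <- c(2, 4, 6)
min_max(counts)  # 0.0 0.5 1.0
```

Note that if every value in the column is the same, max(x) - min(x) is zero and the result is NaN, so a column with constant counts needs separate handling.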
