Make multiple geoms animated in ggplot - r

I am trying to develop an animated plot showing how the rates of three-point attempts and assists have changed for NBA teams over time. The points in my plot are transitioning correctly, but the vertical and horizontal mean lines I added stay constant at the overall averages rather than shifting year by year.
p<-ggplot(dataBREFPerPossTeams, aes(astPerPossTeam,fg3aPerPossTeam,col=ptsPerPossTeam))+
geom_point()+
scale_color_gradient(low='yellow',high='red')+
theme_classic()+
xlab("Assists Per 100 Possessions")+
ylab("Threes Attempted Per 100 Possessions")+labs(color="Points Per 100 Possessions")+
geom_hline(aes(yintercept = mean(fg3aPerPossTeam)), color='blue',linetype='dashed')+
geom_vline(aes(xintercept = mean(astPerPossTeam)), color='blue',linetype='dashed')
anim<-p+transition_time(as.integer(yearSeason))+labs(title='Year: {frame_time}')
animate(anim, nframes=300)
Ideally, the two dashed lines would shift as the years progress, however, right now they are staying constant. Any ideas on how to fix this?

I am using datasets::airquality since you have not shared your data. Calling mean() inside aes() is evaluated over the whole dataset, so it gives a single overall value rather than one value per year. Instead, the values for your other geoms (here, the means) need to be a variable in your dataset, so gganimate can connect each value to its frame (i.e. the transition_time variable).
So I grouped by the frame variable (here Month; for you it will be yearSeason) and added a column with the average of each variable of interest. In the geoms, I then used those new columns instead of computing the mean inside the geom. See below:
library(datasets) #datasets::airquality
library(ggplot2)
library(gganimate)
library(dplyr)
g <- airquality %>%
group_by(Month) %>%
mutate(mean_wind=mean(Wind),
mean_temp=mean(Temp)) %>%
ggplot()+
geom_point(aes(Wind,Temp, col= Solar.R))+
geom_hline(aes(yintercept = mean_temp), color='blue',linetype='dashed')+
geom_vline(aes(xintercept = mean_wind), color='green',linetype='dashed')+
scale_color_gradient(low='yellow',high='red')+
theme_classic()+
xlab("Wind")+
ylab("Temp")+labs(color="Solar.R")
animated_g <- g + transition_time(as.integer(Month))+labs(title='Month: {frame_time}')
animate(animated_g, nframes=18)
Created on 2019-06-09 by the reprex package (v0.3.0)
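The same per-frame columns can also be built without dplyr. A minimal base-R sketch, again on airquality with Month standing in for yearSeason: ave() repeats each month's mean on every row of that month, and the resulting mean_wind and mean_temp columns would then be mapped in geom_vline()/geom_hline() as in the answer above.

```r
# Per-frame means in base R: ave() applies FUN within each Month group and
# returns a vector as long as the input, so every row carries its own month's mean.
aq <- datasets::airquality
aq$mean_wind <- ave(aq$Wind, aq$Month, FUN = mean)
aq$mean_temp <- ave(aq$Temp, aq$Month, FUN = mean)

# One distinct pair of means per month, i.e. one line position per animation frame
per_month <- unique(aq[, c("Month", "mean_wind", "mean_temp")])
print(per_month)
```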

Related

box plots with individual observations

very beginner question here:
I have a dataset of 4 columns of values and I need to create a graph with 4 boxplots showing average and standard deviation, and I wanted to know how to also show the individual observations as points (with ggplot2).
Thank you for your help!!!!
This is relatively simple, as you can add multiple geom_* layers to one ggplot.
Here is a small example that showcases the geom_boxplot in combination with geom_jitter.
If you also want outliers to stand out in the box plot, you can give them a different color or point shape, e.g. geom_boxplot(outlier.colour = "red"). Note that geom_jitter() draws every observation, outliers included, so outliers appear twice; use geom_boxplot(outlier.shape = NA) if that is a problem.
library(tidyverse)
iris %>%
ggplot(aes(x = Species, y = Sepal.Length)) +
geom_boxplot(outlier.colour = "red") + # Add the boxplot geom
geom_jitter(width = 0.1) # Add the points with a random jitter on the X-axis
Created on 2022-08-11 by the reprex package (v2.0.0)
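A box plot shows the median and quartiles, not the mean and standard deviation the question mentions, and four separate columns first need to be stacked into one value column plus one group column. A minimal sketch under those assumptions: the data frame `wide` is a hypothetical stand-in for the asker's four columns, base R's stack() does the reshaping, and geom_pointrange() draws the mean ± 1 SD on top of the jittered observations. The ggplot2 part only runs if the package is installed.

```r
# Hypothetical stand-in for the asker's data: four numeric columns
set.seed(42)
wide <- data.frame(x1 = rnorm(20, 10), x2 = rnorm(20, 12),
                   x3 = rnorm(20, 9),  x4 = rnorm(20, 11))

# Wide -> long: 'values' holds the numbers, 'ind' is a factor naming the column
long <- stack(wide)

# Mean and standard deviation per original column
grp <- levels(long$ind)
stats <- data.frame(
  ind  = factor(grp, levels = grp),
  mean = as.numeric(tapply(long$values, long$ind, mean)),
  sd   = as.numeric(tapply(long$values, long$ind, sd))
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = ind, y = values)) +
    geom_jitter(width = 0.1, alpha = 0.5) +   # individual observations
    geom_pointrange(data = stats,              # mean +/- 1 SD per column
                    aes(x = ind, y = mean, ymin = mean - sd, ymax = mean + sd),
                    colour = "red")
  print(p)
}
```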

ggplot2: Can I fix the absolute distance between two values on an axis?

My problem seems quite basic, but I couldn't find any relevant answer. I want to create line plots with the date on the x axis. The y axis will be Covid statistics (deaths, hospitalizations, you name it). I want to create a separate plot for the different waves of the pandemic which means that my charts cover different times. My problem is that R fixes the plot to the same size and thus the lines for the shorter time period are skewed in comparison to those of the longer time period. Ideally, I would want 1 month on the x axis to be fixed to a certain number of px or mm. But I can't find out how. My best idea so far is to assign both plots a different total width, but that doesn't give me an optimal result either.
Here's a reproducible example with a built-in dataset to explain:
library(ggplot2)
library(dplyr)
economics_1967 <- economics %>%
filter(date<"1968-01-01")
economics_1968 <- economics %>%
filter(date<"1969-01-01"&date>"1967-12-31")
#data is only available for six months in 1967, but for 12 in 1968
exampleplot1 <- ggplot(economics_1967)+
geom_line(aes(date, unemploy))+
scale_x_date(date_breaks="1 month", date_labels="%b")
#possible: ggsave("exampleplot1.png", width=2, height=1)
exampleplot2 <- ggplot(economics_1968)+
geom_line(aes(date, unemploy))+
scale_x_date(date_breaks="1 month", date_labels="%b")
ggsave("exampleplot2.png", width=4, height=1)
Thank you!
EDIT: Thanks for the suggestions! Facet wrap would be a good idea but in the end I decided to just plot the whole time in one case. The background is that I classified countries differently for their policies in different times, so that's why I wanted to have a clear break in the visualization, but I just put a vertical line in there.
facet_grid is one approach, if you don't mind showing the two charts together.
library(dplyr); library(ggplot2)
bind_rows(e1967 = economics_1967,
e1968 = economics_1968, .id="source") %>%
ggplot(aes(date, unemploy)) +
geom_line() +
scale_x_date(date_breaks="1 month", date_labels="%b") +
facet_grid(~source, scales = "free_x", space = "free_x")
I like Jon Spring's solution a lot. I want to present it a little differently, to show that facet_grid() usually operates on a single dataset that already contains the variable used to facet.
econ_subset <-
economics %>%
dplyr::filter(dplyr::between(date, as.Date("1967-09-01"), as.Date("1968-12-31"))) %>%
dplyr::mutate(
year = lubridate::year(date) # Used below to facet
)
ggplot(econ_subset, aes(date, unemploy)) +
geom_line() +
scale_x_date(date_breaks="1 month", date_labels="%b") +
facet_grid(~year, scales = "free_x", space = "free_x")
(In Jon's solution, bind_rows() is used to stack the two separate datasets back together.)
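If the waves really must be saved as separate files, the asker's own idea of giving each file a different width can be made systematic: make the width a fixed margin plus a constant per month shown. A rough sketch, where the helpers n_months() and plot_width() are hypothetical and inches_per_month and margin_in are assumed values to tune by eye. Axis labels and plot margins also take up room, so the panel widths will only be approximately proportional; for exact panel sizes, the egg package's set_panel_size() may be worth a look.

```r
# Hypothetical helper: number of distinct calendar months in a Date vector
n_months <- function(dates) length(unique(format(dates, "%Y-%m")))

# Hypothetical helper: file width in inches = fixed margin + constant per month.
# inches_per_month and margin_in are assumptions to adjust for your own plots.
plot_width <- function(dates, inches_per_month = 0.4, margin_in = 0.6) {
  margin_in + inches_per_month * n_months(dates)
}

# Monthly dates matching the two example periods (Jul-Dec 1967, Jan-Dec 1968)
d1967 <- seq(as.Date("1967-07-01"), as.Date("1967-12-01"), by = "month")
d1968 <- seq(as.Date("1968-01-01"), as.Date("1968-12-01"), by = "month")
w1967 <- plot_width(d1967)  # margin + 6 months
w1968 <- plot_width(d1968)  # margin + 12 months

# With the plots from the question this could become, e.g.:
# ggsave("exampleplot1.png", exampleplot1, width = plot_width(economics_1967$date), height = 1)
```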

R, ggplot, How do I keep related points together when using jitter?

One of the variables in my data frame is a factor denoting whether an amount was gained or spent. Every event has a "gain" value; there may or may not be a corresponding "spend" amount. Here is the plot with the observations overplotted (the first plot in the code below).
Adding some random jitter helps visually; however, the "spend" amounts are then divorced from their corresponding gain events (the second plot in the code below).
I'd like to see the blue circles "bullseyed" in their gain circles (where the "id" are equal), and jittered as a pair. Here are some sample data (three days) and code:
library(ggplot2)
ccode<-c(Gain="darkseagreen",Spend="darkblue")
ef<-data.frame(
date=as.Date(c("2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-01","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-02","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03","2021-03-03")),
site=c("Castle","Temple","Temple","Temple","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Castle","Castle","Temple","Temple","Palace","Castle","Temple","Temple","Temple","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Temple","Temple","Palace","Palace","Castle","Castle","Castle","Castle","Castle","Temple","Temple","Palace"),
id=c("C123","T101","T93","T94","T95","T96","P102","P96","C126","C127","C128","T100","T98","P100","P98","C129","C130","C131","C132","C133","T104","T99","P99","C123","T101","T93","T94","T95","T96","P102","P96","C126","C127","C128","T100","T98","P100","P98","C129","C130","C131","C132","C133","T104","T99","P99"),
gainspend=c("Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Gain","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend","Spend"),
amount=c(6,14,34,31,3,10,6,14,2,16,16,14,1,1,15,11,8,7,2,10,15,4,3,NA,NA,4,5,NA,NA,NA,NA,NA,NA,2,NA,1,NA,3,NA,NA,2,NA,NA,2,NA,3))
#▼ 3 day, points centered
ggplot(ef,aes(date,site)) +
geom_point(aes(size=amount,color=gainspend),alpha=0.5) +
scale_color_manual(values=ccode) +
scale_size_continuous(range=c(1,15),breaks=c(5,10,20))
#▼ 3 day, jittered
ggplot(ef,aes(date,site)) +
geom_point(aes(size=amount,color=gainspend),alpha=0.5,position=position_jitter(width=0,height=0.2)) +
scale_color_manual(values=ccode) +
scale_size_continuous(range=c(1,15),breaks=c(5,10,20))
My main idea is the old "add jitter manually" approach. I'm wondering if a nicer approach could be something like plotting little pie charts as points a la package scatterpie.
In this case you could draw one random jitter amount per ID, so that all points sharing an ID are moved by the same amount. This takes doing work outside of ggplot2.
First, draw the jitter to add for each ID. Since adjacent categories on a discrete axis are 1 unit apart, I choose numbers between -0.3 and 0.3. I use dplyr for this work and set the seed so you will get the same results.
library(dplyr)
set.seed(16)
ef2 = ef %>%
group_by(id) %>%
mutate(jitter = runif(1, min = -.3, max = .3)) %>%
ungroup()
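Then the plot. I use a geom_blank() layer so that the categorical site axis is drawn before I add the jitter. I convert site to numeric via factor() and add the jitter on. This lines up with the axis because ggplot2 places the categories of a discrete axis at positions 1, 2, 3, … in factor-level order, and factor() on a character column uses alphabetical levels, the same order ggplot2 uses for a character axis.
Now paired IDs move together.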
ggplot(ef2, aes(x = date, y = site)) +
geom_blank() +
geom_point(aes(size = amount, color = gainspend,
y = as.numeric(factor(site)) + jitter),
alpha=0.5) +
scale_color_manual(values = ccode) +
scale_size_continuous(range = c(1, 15), breaks = c(5, 10, 20))
#> Warning: Removed 15 rows containing missing values (geom_point).
Created on 2021-09-23 by the reprex package (v2.0.0)
You can add some jitter by id outside the ggplot() call. Draw one value per unique id, so that the Gain and Spend rows of an id share the same offset:
jj <- data.frame(id = unique(ef$id), jtr = runif(length(unique(ef$id)), -0.3, 0.3))
ef <- merge(ef, jj, by = 'id')
ef$sitej <- as.numeric(factor(ef$site)) + ef$jtr
But you need to make site numeric to do this, so when it comes to making the plot you need to add the axis labels manually with scale_y_continuous(). (Update: the geom_blank() trick from aosmith above is a better solution!)
ggplot(ef,aes(date,sitej)) +
geom_point(aes(size=amount,color=gainspend),alpha=0.5) +
scale_color_manual(values=ccode) +
scale_size_continuous(range=c(1,15),breaks=c(5,10,20)) +
scale_y_continuous(breaks = 1:3, labels= sort(unique(ef$site)))
With one offset per id, each Spend circle sits on its Gain circle. Gain circles without a blue partner are events whose Spend amount is NA; ggplot2 drops those rows with a warning.
Perhaps someone else has a better approach!
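Both answers rely on the same step: draw exactly one random offset per id, then look it up for every row carrying that id. A minimal base-R sketch of that step on a small hypothetical data frame (toy, with made-up ids), using match() to map each row to its id's offset; the resulting y column could then be plotted with the geom_blank() approach above.

```r
set.seed(16)
# Hypothetical toy data: each id has one Gain row and one Spend row
toy <- data.frame(
  id        = c("C1", "T1", "T2", "C1", "T1", "T2"),
  site      = c("Castle", "Temple", "Temple", "Castle", "Temple", "Temple"),
  gainspend = rep(c("Gain", "Spend"), each = 3)
)

ids     <- unique(toy$id)
offsets <- runif(length(ids), min = -0.3, max = 0.3)  # one draw per id
toy$jitter <- offsets[match(toy$id, ids)]            # same offset for every row of an id

# Numeric y position: factor code of the site (alphabetical) plus the shared offset
toy$y <- as.numeric(factor(toy$site)) + toy$jitter
print(toy)
```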

Zig Zag when using geom_line with ggplot in R

I would really appreciate some insight on the zagging when using the following code in R:
tbi_military %>%
ggplot(aes(x = year, y = diagnosed, color = service)) +
geom_line() +
facet_wrap(vars(severity))
The dataset consists of 5 variables (3 character, 2 numeric). Any insight would be so appreciated.
This is just an illustration with a standard dataset. Let's say we're interested in plotting the weight of chicks over time depending on a diet. We would attempt to plot this like so:
library(ggplot2)
ggplot(ChickWeight, aes(Time, weight, colour = factor(Diet))) +
geom_line()
You can see the zigzag pattern appear because there are multiple observations per diet and time point. geom_line() connects the points of each group in order of the x variable, so the several y values sharing one x value are joined by a vertical segment spanning their range.
The data has an additional variable called 'Chick' that separates out individual chicks. Including that in the grouping resolves the zigzag pattern and every line is the weight over time per individual chick.
ggplot(ChickWeight, aes(Time, weight, colour = factor(Diet))) +
geom_line(aes(group = interaction(Chick, Diet)))
If you don't have an extra variable that separates out individual trends, you could instead choose to summarise the data per timepoint by, for example, taking the mean at every timepoint.
ggplot(ChickWeight, aes(Time, weight, colour = factor(Diet))) +
geom_line(stat = "summary", fun = mean)
Created on 2021-08-30 by the reprex package (v1.0.0)
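The summary can also be computed before plotting, which makes it easy to check that only one row per line and x value remains. A base-R sketch with aggregate() on the same ChickWeight data; with one mean per Time/Diet combination, geom_line() has a single y value at each x and draws no vertical segments. The plotting part only runs if ggplot2 is installed.

```r
# One mean weight per Time x Diet combination
cw_means <- aggregate(weight ~ Time + Diet, data = ChickWeight, FUN = mean)

# Each (Time, Diet) pair now occurs once, so the lines cannot zigzag
stopifnot(!anyDuplicated(cw_means[, c("Time", "Diet")]))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(cw_means, aes(Time, weight, colour = Diet)) +
    geom_line()
  print(p)
}
```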

Aggregate data by year interval inside bar plot

I want to aggregate data by year interval inside a bar plot. Based on this answer, I wrote the following code:
years <- seq(as.Date('1970/01/01'), Sys.Date(), by="year")
set.seed(111)
effect <- sample(1:100,length(years),replace=T)
data <- data.frame(year=years, effect=effect)
ggplot(data, aes(year, effect)) + geom_bar(stat="identity", aes(group=cut(year, "5 years")))
However, only the tick marks are affected; the data are not summed by interval. Can I get ggplot2 to sum the data without preprocessing it, while keeping the tick marks and labels as they are?
EDIT: Sorry I wasn't clear. I'd like to keep the tick marks and labels as they are, i.e. tick marks positioned at the left hand edge of each bar (which now covers 5 years) and year only in the labels. This is based on the appearance of the linked answer above.
Slightly hacky way of doing what you want:
ggplot(data, aes(cut(year, "5 years"), effect)) +
geom_col() +
xlab("year")
What it actually does: it plots one bar per year, each with height equal to effect, stacked on top of each other within its 5-year interval. In other words, the plot contains one same-coloured bar per year (48 when this was written), piled up into one column per interval.
Try this:
library(tidyverse)
data %>%
mutate(index = ceiling(row_number() / 5)) %>% # 1,1,1,1,1,2,...: 5-year bins
group_by(index) %>%
summarise(year = min(year), sum_effect = sum(effect)) %>% # one row per bin, dated at its first year
ggplot(aes(year, sum_effect)) +
geom_col()
I prefer transforming the dataset so that I don't have to do anything fancy with ggplot2.
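Another way to preprocess stays close to the cut(year, "5 years") idea from the question: assign each row to its 5-year interval with cut(), sum within intervals with aggregate(), and turn the interval labels back into Dates so each bar is placed at its interval's start year. A base-R sketch; the end date is fixed at 2017-01-01 (instead of Sys.Date()) only so the result is reproducible.

```r
# Same data as the question, but with a fixed end date for reproducibility
years <- seq(as.Date("1970-01-01"), as.Date("2017-01-01"), by = "year")
set.seed(111)
effect <- sample(1:100, length(years), replace = TRUE)
data <- data.frame(year = years, effect = effect)

# cut() on Dates labels each 5-year interval by its start date, e.g. "1970-01-01"
data$interval <- cut(data$year, "5 years")
sums <- aggregate(effect ~ interval, data = data, FUN = sum)
sums$start <- as.Date(as.character(sums$interval))  # interval start as a Date
print(sums)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(sums, aes(start, effect)) + geom_col()
  print(p)
}
```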
