I have a data.frame containing timestamped events of different kinds, geolocated. I know how to plot an animation of each event as a point, hour by hour, with gganimate (*). It would be something like:
df = data.frame("id" = runif(500, 1e6, 1e7),
'lat' = runif(500, 45, 45.1),
'long'= runif(500, 45, 45.1),
'datetime'= seq.POSIXt(from=Sys.time(), to=Sys.time()+1e5, length.out=500),
'hour'=format(seq.POSIXt(from=Sys.time(), to=Sys.time()+1e5, length.out=500), "%H"),
'event'=paste0("type", rpois(500, 1)))
ggplot(data=df) +
aes(x=long, y=lat, color=factor(event)) +
geom_point() +
transition_states(hour, state_length = 1, transition_length = 0)
Now I would like to make points "stay" longer on screen, for instance if an event is at 5:00pm, i want it to be displayed on the animation from 2pm until 8pm (3 frames before and after his position, and if possible fade in and out). I don't know how to do that with gganimate, I tried to use transition_length but it's making the points "move" and that makes no sense for me!
Thanks,
(*) Edit: I thought of adding 6 duplicated rows for each row, and modifying the hour by -1 to +3, but it's a lot heavier and can't deal with fade in/out
library(magrittr)
df %<>% mutate(hour = hour + 2) %>% bind_rows(df)
df %<>% mutate(hour = hour + 1) %>% bind_rows(df)
df %<>% mutate(hour = hour - 4) %>% bind_rows(df)
df %<>% mutate(hour = hour %% 24 )
You can use transition_components and specify 3 hours as the enter / exit length for each point.
Data:
set.seed(123)
n <- 50 # 500 in question
df = data.frame(
id = runif(n, 1e6, 1e7),
lat = runif(n, 45, 45.1),
long = runif(n, 45, 45.1),
datetime = seq.POSIXt(from=Sys.time(), to=Sys.time()+1e5, length.out=n),
hour = format(seq.POSIXt(from=Sys.time(), to=Sys.time()+1e5, length.out=n), "%H"),
event = paste0("type", rpois(n, 1)))
Code:
df %>%
mutate(hour = as.numeric(as.character(hour))) %>%
ggplot() +
aes(x=long, y=lat, group = id, color=factor(event)) +
# I'm using geom_label to indicate the time associated with each
# point & highlight the transition times before / afterwards.
# replace with geom_point() as needed
geom_label(aes(label = as.character(hour))) +
# geom_point() +
transition_components(hour,
enter_length = 3,
exit_length = 3) +
enter_fade() +
exit_fade() +
ggtitle("Time: {round(frame_time)}")
This approach works with a datetime variable as well:
df %>%
ggplot() +
aes(x = long, y = lat, group = id, color = factor(event)) +
geom_label(aes(label = format(datetime, "%H:%M"))) +
transition_components(datetime,
enter_length = as_datetime(hm("3:0")),
exit_length = as_datetime(hm("3:0"))) +
enter_fade() +
exit_fade() +
ggtitle("Time: {format(frame_time, '%H:%M')}")
gganimate does not appear to be set up to handle leaving points on the plot. I think that you are going to have to go the manual route.
Here is a (slightly kludgy) approach to duplicate the rows including setting the times at which they should display and the offset (to be used for alpha to control fade):
df_withRange <-
df %>%
mutate(hour = parse_number(hour)) %>%
split(1:nrow(.)) %>%
lapply(function(x){
lapply(-3:3, function(this_time){
x %>%
mutate(frame_time = hour + this_time
, offset = this_time
, abs_offset = abs(this_time))
}) %>%
bind_rows()
}) %>%
bind_rows() %>%
mutate(
frame_time = ifelse(frame_time > 23, frame_time - 24, frame_time)
, frame_time = ifelse(frame_time < 0, frame_time + 24, frame_time)
)
Then, this code set up the plot:
ggplot(data=df_withRange
, aes(x=long
, y=lat
, color=factor(event)
, alpha = abs_offset
)) +
geom_point() +
transition_states(frame_time) +
labs(title = 'Hour: {closest_state}') +
scale_alpha_continuous(range = c(1,0.2))
The plot:
There is still a lot of clean up to do (e.g., the fade levels, etc.), but that should be a start at least
Related
I am trying to order the time and date axes on my scatter plot into epochs/ time periods. For example, times between 12pm-:7:59pm and 9pm-11:59pm. I want to do something similar for the dates.
I am fairly new to R so I am just looking for suggestions/ to be told if this is even possible and maybe some alternatives:)
This is my code so far:
accident <- read.csv("accidents.csv",header = TRUE)
accident <- accident %>%
ggplot(data=accident)+
geom_point(mapping=aes(x=Time, y=Date, alpha=0.5))
Thank you!
Welcome to R! Here is one set of options.
library(tidyverse)
library(lubridate)
First, simulate dataset
accident <-
rnorm(n = 1000, mean = 1500000000, sd = 1000000) %>%
tibble(date_time = .) %>%
mutate(date_time = as.POSIXct(date_time, origin = "1970-01-01")) %>%
separate(date_time, into = c("date", "time"), sep = " ", remove = F)
Original plot:
accident %>%
ggplot()+
geom_point(aes(x=time, y=date), alpha=0.5)
Step 1: Collapse the x axis into smaller number of groups
accidents_per_trihour <-
accident %>%
mutate(hour = floor_date(date_time, unit = "hour"),
hour = as.numeric(str_sub(hour, 12,13)),
tri_hour = cut(hour, c(0, 3, 6, 9, 12, 15, 18, 21, 24), include.lowest = T)) %>%
group_by(date, tri_hour) %>%
count()
Then scale dot size by number of accidents
accidents_per_trihour %>%
ggplot()+
geom_point(aes(x=tri_hour, y=date, size = n), alpha=0.5) +
labs(x = "\nTime (in three-hour groups)", y = "Day\n", size = "Accidents count")
Still not great because the y axis is too expansive. So:
Step 2: Collapse the y axis into smaller number of groups
(For your data you may need to group into months for things to start to look reasonable)
accidents_per_trihour_per_week <-
accident %>%
mutate(hour = floor_date(date_time, unit = "hour"),
hour = as.numeric(str_sub(hour, 12,13)),
tri_hour = cut(hour, c(0, 3, 6, 9, 12, 15, 18, 21, 24), include.lowest = T)) %>%
mutate(week_start = floor_date(as.Date(date), unit = "weeks"),
week = format.Date(week_start, "%Y, week %W")) %>%
group_by(week, tri_hour) %>%
count()
Should be much more readable now
We’ll improve the theme as well, just because.
if (!require(ggthemr)) devtools::install_github('cttobin/ggthemr')
ggthemr::ggthemr("flat") ## helps with pretty theming
accidents_per_trihour_per_week %>%
ggplot()+
geom_point(aes(x=tri_hour, y=week, size = n), alpha = 0.9) +
labs(x = "\nTime (in three-hour groups)", y = "Week\n", size = "Accidents count")
Could also do a tile plot
accidents_per_trihour_per_week %>%
ggplot() +
geom_tile(aes(x = tri_hour, y = week, fill = n)) +
geom_label(aes(x = tri_hour, y = week, label = n), alpha = 0.4, size = 2.5, fontface = "bold") +
labs(x = "\nTime (in three-hour groups)", y = "Week\n", fill = "Accidents count")
Created on 2021-11-24 by the reprex package (v2.0.1)
So, I have a data frame in R with node number and Lat, Long of different points.
Sample data for it:
library(tidyverse)
set.seed(4)
node <- seq(1, 10)
lat <- runif(10, 77, 78)
long <- runif(10, 22, 23)
df <- data.frame(node, lat, long)
ggplot()+
geom_point(aes(x=long, y=lat))+
geom_text(aes(x=long, y=lat, label=node), size=5)
[
Now I have a list with the routes like this:
route <- c(1,6,2,3,1,10,4,1,5,8,1,3,7,1)
Based on the order of the route, I want to draw arrows on the previous plot, so that it will look something like this:
How can it be done? I could use geom_path but it will give the order based on the order of the data frame. Also if different colours can be added for the different routes like 1->6->2->3->1 in one colour, 1->5->8->1 in another then it will be better. But at the time the arrow can be sufficient.
One minor change made to route (includes 9):
route <- c(1,6,2,3,1,10,4,1,5,8,1,9,7,1)
Try creating a data.frame of segments - this will allow you customize in greater detail, including color. The data.frame should have start and end values for longitude, latitude.
df_seg <- data.frame(
lat1 = df$lat[route[-length(route)]],
long1 = df$long[route[-length(route)]],
lat2 = df$lat[route[2:length(route)]],
long2 = df$long[route[2:length(route)]],
color = c(rep("blue", 4), rep("green", 3), rep("red", 3), rep("purple", 3))
)
Then you can use geom_segment referencing this data.frame:
ggplot()+
geom_point(aes(x=long, y=lat))+
geom_text(aes(x=long, y=lat, label=node), size=5) +
geom_segment(data = df_seg,
aes(x = long1, y = lat1, xend = long2, yend = lat2),
arrow = arrow(angle = 12, type = "closed"),
linetype = 1,
color = df_seg$color)
Plot
I came up with a very similar answer to the one Ben provided, just with a more flexible way of defining the coloring groups whenever the start position is 1 (this part could probably be a one liner, but I couldn't figure it out quickly) and using joins to get the start and end segment positions.
Your code with the amended route:
library(tidyverse)
set.seed(4)
node <- seq(1, 10)
lat <- runif(10, 77, 78)
long <- runif(10, 22, 23)
df <- data.frame(node, lat, long)
route <- c(1,6,2,3,1,10,4,1,5,8,1,9,7,1)
Creating the segment dataframe:
df2 = tibble(start = route, end = route[c(2:length(route), 1)]) %>%
filter(start != end) %>%
left_join(df, by = c("start" = "node")) %>%
left_join(df, by = c("end" = "node"), suffix = c("_start", "_end")) %>%
mutate(temp_coloring = if_else(start == 1, 1, 0)) %>%
mutate(coloring = if_else(temp_coloring == 1, cumsum(temp_coloring), NA_real_)) %>%
fill(coloring) %>%
select(-temp_coloring) %>%
mutate(coloring = as_factor(coloring))
Plotting:
df %>%
ggplot()+
geom_point(aes(x=long, y=lat))+
geom_text(aes(x=long, y=lat, label=node), size=5) +
geom_segment(data = df2 , aes(x = long_start, y = lat_start, xend = long_end, yend = lat_end, color = coloring),
arrow = arrow(length = unit(0.1, "inches")))
I have many time series and want to find a way to identify the top 10 greatest rises and falls for each time series.
This is not as easy as it sounds because the most prominent features on a time series can sometimes be interrupted by movements in the opposite direction, if only for a very short time (e.g. one period). This means any algorithm that simply scans for the most consecutive periods movement in the same direction usually fails to find the most prominent features (e.g. that a human would identify).
Are there any standard methods that can be used 'out of the box'?
For example, in the following chart, if asked to identify the most prominent falls, a human would probably point to the circled areas. How can we get code to identify these falls (as a human would)?
Note: I guess a convolutional neural network could probably do this, but I'm after simpler solutions if possible (it doesn't have to be perfect)
library(tidyverse)
library(priceR)
au <- historical_exchange_rates("AUD", to = "USD",
start_date = "2010-01-01", end_date = "2020-06-30")
au %>%
tail(365 * 8) %>%
rename(aud_to_usd = one_AUD_equivalent_to_x_USD) %>%
mutate(date = as.Date(date)) %>%
ggplot(aes(x = date, y = aud_to_usd, group = 1)) +
geom_line() +
geom_smooth(method = 'loess', se = TRUE) +
theme(axis.title.x=element_blank(),
axis.ticks.x=element_blank()) +
scale_x_date(date_labels = "%Y", date_breaks = "1 year") +
ggtitle("AUD to USD over last 8 years")
Here is a function that you could use. It makes use of the run-length encoding of the timeseries into segments that rise or fall. It allows you to set a gap_width argument that instructs how long interruptions of stretches are allowed to be. It is in base R, it is not perfect, but seems to work decently for the case you presented above.
rise_and_falls <- function(value, time, gap_width = 5, top = 10, type = "fall") {
type <- match.arg(type, c("fall", "rise"))
if (type == "fall") {
rle <- rle(sign(diff(value)) == -1)
} else {
rle <- rle(sign(diff(value)) == 1)
}
rle$values <- !rle$values & rle$lengths <= gap_width | rle$values
rle <- rle(inverse.rle(rle)) # Clean up changed runs
df <- data.frame(
start = cumsum(rle$lengths) - rle$lengths + 1,
end = cumsum(rle$lengths),
len = rle$lengths,
drop = rle$values
)
df <- transform(
df,
start_value = value[start],
end_value = value[end],
start_time = time[start],
end_time = time[end]
)
df$diff <- df$start_value - df$end_value
df <- df[order(df$diff),]
if (type == "fall") {
tail(df, top)
} else {
head(df, top)
}
}
I recommend you use it as follows:
au %>%
tail(365 * 8) %>%
rename(aud_to_usd = one_AUD_equivalent_to_x_USD) %>%
mutate(date = as.Date(date)) -> au
df <- rise_and_falls(au$aud_to_usd, au$date, type = "fall")
ggplot(au, aes(x = date, y = aud_to_usd, group = 1)) +
geom_line() +
geom_smooth(method = 'loess', se = TRUE) +
theme(axis.title.x=element_blank(),
axis.ticks.x=element_blank()) +
scale_x_date(date_labels = "%Y", date_breaks = "1 year") +
ggtitle("AUD to USD over last 8 years") +
geom_segment(data = df, aes(x = start_time, y = start_value,
xend = end_time, yend = end_value),
size = 2, colour = "red")
If somebody wants to improve this, it probably makes sense to cut-off the stretches at the local extrema.
Another option would be to smooth the line with a Gaussian kernel first and then run the rise_and_falls() function with gap_width = 0.
I have a dataset of people arriving in a location, how long they stayed, and their home locations. I want to create an animated chart which 'flies' them to their destination, and returns them to their original point once their trip is over. But I'm not sure if this is possible with gganimate or not. At the moment I only seem to be able to do a "start" and "end" frame, though it's a little hard to tell whether it just doesn't have enough frames to do the intended action.
Here's something like what I have so far:
library(dplyr)
library(ggplot2)
library(ggmap)
library(gganimate)
#Coordinates
europecoords <- c(left = -23, bottom = 36, right = 27.87, top = 70.7)
londonareacoords <- c(left = -.7, bottom = 51, right = 0.2, top = 52)
londonpointcoords <- as.data.frame(list(lon = -.14, lat = 51.49))
#Get the map we'll use as the background
europe <- get_stamenmap(europecoords, zoom = 4, maptype = "toner-lite")
#Sample dataset configuration
numberofpoints <- 10
balance <- 0.1
#Set up an example dataset
ids <- seq(1:numberofpoints)
arrivalday <- sample(x = 30, size = numberofpoints, replace = TRUE)
staylength <- sample(x = 7, size = numberofpoints, replace = TRUE)
startlocationlondonarealon <- sample(x = seq(londonareacoords['left'] * 10, londonareacoords['right'] * 10), size = numberofpoints * balance, replace = TRUE) / 10
startlocationlondonarealat <- sample(x = seq(londonareacoords['bottom'] * 10, londonareacoords['top'] * 10), size = numberofpoints * balance, replace = TRUE) / 10
startlocationeuropelon <- sample(x = seq(europecoords['left'] * 10, europecoords['right'] * 10), size = (numberofpoints * (1 - balance)), replace = TRUE) / 10
startlocationeuropelat <- sample(x = seq(europecoords['bottom'] * 10, europecoords['top'] * 10), size = (numberofpoints * (1 - balance)), replace = TRUE) / 10
startlocationlon <- c(startlocationlondonarealon, startlocationeuropelon)
startlocationlat <- c(startlocationlondonarealat, startlocationeuropelat)
points <- as.data.frame(cbind(ID = ids, arrivalday, staylength, departureday = arrivalday + staylength, startlocationlon, startlocationlat))
#Map the sample dataset to check it looks reasonable
ggmap(europe) +
geom_point(data = points, aes(x = startlocationlon, y = startlocationlat), col = "blue", size = 2) +
geom_point(data = londonpointcoords, aes(x = lon, y = lat), col = "red")
#Separate the events out to rearrange, then glue them back together
event1 <- points %>%
mutate(Event = "Day Before Arrival", Date = arrivalday - 1) %>%
mutate(Lon = startlocationlon,
Lat = startlocationlat) %>%
select(ID, Event, Date, Lon, Lat)
event2 <- points %>%
mutate(Event = "Arrival Date", Date = arrivalday) %>%
mutate(Lon = londonpointcoords$lon[1],
Lat = londonpointcoords$lat[1]) %>%
select(ID, Event, Date, Lon, Lat)
event3 <- points %>%
mutate(Event = "Departure Date", Date = departureday) %>%
mutate(Lon = londonpointcoords$lon[1],
Lat = londonpointcoords$lat[1]) %>%
select(ID, Event, Date, Lon, Lat)
event4 <- points %>%
mutate(Event = "Day After Departure", Date = departureday + 1) %>%
mutate(Lon = startlocationlon,
Lat = startlocationlat) %>%
select(ID, Event, Date, Lon, Lat)
events <- rbind(event1, event2, event3, event4) %>%
mutate(Event = factor(Event, ordered = TRUE, levels = c("Day Before Arrival", "Arrival Date", "Departure Date", "Day After Departure"))) %>%
mutate(ID = factor(ID))
#Make an animation
ggmap(europe) +
geom_point(data = events, aes(x = Lon, y = Lat, group = ID, col = ID), size = 2) +
#geom_point(data = londonpointcoords, aes(x = lon, y = lat), col = "red") +
transition_manual(Date) +
labs(title = "Date: {frame}") +
NULL
But as I said, the points don't seem to be 'flying' as much as just appearing and disappearing. Should I be using a different data format? Transition type? Number of frames? (I'm having trouble finding documentation on any of the above, which is part of why I'm stuck...)
Final result
Code
library(ggplot2)
library(ggmap)
library(gganimate)
ggm <- ggmap(europe) +
geom_point(data = events,
aes(x = Lon, y = Lat,
colour = ID, group = ID, shape = Event),
size = 3, alpha = 0.8) +
transition_time(Date) +
labs(title = paste("Day", "{round(frame_time,0)}")) +
shadow_wake(wake_length = 0.1)
animate(ggm, fps = 24, duration = 16)
========================================================
Step-by-step
You have lots of moving parts there. Let's break it down a bit:
0. Load libraries
library(ggplot2)
library(ggmap)
library(gganimate)
library(ggrepel) # will be useful for data exploration in step 1
1. Data exploration
ggplot(data = events, aes(x = ID, y = Date, colour = Event)) +
geom_point()
We see, that the arrival and departure events are each quite close together for each plane. Also, there is always a gap of a couple of days inbetween. That seems reasonable.
Let's check the Date variable:
> length(unique(events$Date))
[1] 24
> min(events$Date)
[1] 2
> max(events$Date)
[1] 33
Okay, this means two things:
Our data points are unevenly spaced.
We don't have data for all Dates.
Both things will make the animation part quite challenging.
ggplot(data = unique(events[, 4:5]), aes(x = Lon, y = Lat)) +
geom_point()
Furthermore, we only have 11 unique locations (== airports). This will probaly lead to overlapping data. Let's plot it by day:
ggplot(data = unique(events[, 3:5]), aes(x = Lon, y = Lat, label = Date)) +
geom_point() +
geom_text_repel()
Yup, this will be fun... Lots of things happening at that airport in the middle.
2. Basic animation
gga <- ggplot(data = events, aes(x = Lon, y = Lat)) +
geom_point() +
transition_time(Date)
animate(gga)
We used transition_time() and not transition_states(), because the former is used for linear time variables (e.g., second, day, year) and automatic interpolation, while the latter gives more manual control to the user.
3. Let's add colour
gga <- ggplot(data = events, aes(x = Lon, y = Lat, colour = ID)) +
geom_point() +
transition_time(Date)
animate(gga)
It's starting to look like something!
4. Add title, transparency, increase size
gga <- ggplot(data = events, aes(x = Lon, y = Lat, col = ID)) +
geom_point(size = 3, alpha = 0.5) +
transition_time(Date) +
labs(title = paste("Day", "{round(frame_time, 0)}"))
Note the rounded {round(frame_time, 0)}. Try using {frame_time} and see what happens!
5. Add some pizzaz
gga <- ggplot(data = events, aes(x = Lon, y = Lat, col = ID, group = ID,
shape = Event)) +
geom_point(size = 3, alpha = 0.5) +
transition_time(Date) +
labs(title = paste("Day", "{round(frame_time, 0)}")) +
shadow_wake(wake_length = 0.05)
animate(gga)
Looks good, let's finish it up!
6. Add the map, make animation slower, tweak some details
ggm <- ggmap(europe) +
geom_point(data = events,
aes(x = Lon, y = Lat,
colour = ID, group = ID, shape = Event),
size = 3, alpha = 0.8) +
transition_time(Date) +
labs(title = paste("Day", "{round(frame_time,0)}")) +
shadow_wake(wake_length = 0.1)
animate(ggm, fps = 24, duration = 16)
Not too shabby, eh? As a side note: animate(ggm, nframes = 384) would have had the same effect on the animation as fps = 24 with duration = 16.
If you have any question please do not hesitate to shoot me a comment.
I will try my best to help or clarify things.
I have the following data set:
set.seed(10)
start_date <- as.Date('2000-01-01')
end_date <- as.Date('2000-01-10')
Data <- data.frame(
id = rep((1:1000),10),
group = rep(c("A","B"), 25),
x = sample(1:100),
y = sample(c("1", "0"), 10, replace = TRUE),
date = as.Date(
sample(as.numeric(start_date):
as.numeric(end_date), 1000,
replace = T), origin = '2000-01-01'))
With that, I create the following plot:
Data %>% mutate(treated = factor(group)) %>%
mutate(date = as.POSIXct(date)) %>% #convert date to date
group_by(treated, date) %>% #group
summarise(prop = sum(y=="1")/n()) %>% #calculate proportion
ggplot()+ theme_classic() +
geom_line(aes(x = date, y = prop, color = treated)) +
geom_point(aes(x = date, y = prop, color = treated)) +
geom_vline(xintercept = as.POSIXct("2000-01-05 12:00 GMT"), color = 'black', lwd = 1)
Unfortunately the plot is pretty 'jumpy' and I would like to smooth it. I tried geom_smooth() but can't get it to work. Other questions regarding smoothing didn't help me because they missed the grouping aspect and therefore had a different structure. However, the example data set is in reality part of a larger data set so I need to stick to that code.
[Edit: the geom_smooth() code I tried is geom_smooth(method = 'auto', formula = y ~ x)]
Can someone point me into the right direction?
Many thanks and all the best.
Is this what you want by a smoothed line? You call geom_smooth with aesthetics, not in combination with geom_line. You can choose different smoothing methods, though the default loess with low observations is usually what people want. As an aside, I don't think this is necessarily nicer to look at than the geom_line version, and in fact is slightly less readable. geom_smooth is best used when there are many y observations for every x which makes patterns hard to see, geom_line is good for 1-1.
EDIT: After looking at what you're doing more closely, I added a second plot that doesn't directly calculate the treatment-date means and just uses geom_smooth directly. That lets you get a more reasonable confidence interval instead of having to remove it as before.
set.seed(10)
start_date <- as.Date('2000-01-01')
end_date <- as.Date('2000-01-10')
Data <- data.frame(
id = rep((1:1000),10),
group = rep(c("A","B"), 25),
x = sample(1:100),
y = sample(c("1", "0"), 10, replace = TRUE),
date = as.Date(
sample(as.numeric(start_date):
as.numeric(end_date), 1000,
replace = T), origin = '2000-01-01'))
library(tidyverse)
Data %>%
mutate(treated = factor(group)) %>%
mutate(date = as.POSIXct(date)) %>% #convert date to date
group_by(treated, date) %>% #group
summarise(prop = sum(y=="1")/n()) %>% #calculate proportion
ggplot() +
theme_classic() +
geom_smooth(aes(x = date, y = prop, color = treated), se = F) +
geom_point(aes(x = date, y = prop, color = treated)) +
geom_vline(xintercept = as.POSIXct("2000-01-05 12:00 GMT"), color = 'black', lwd = 1)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Data %>%
mutate(treated = factor(group)) %>%
mutate(y = ifelse(y == "0", 0, 1)) %>%
mutate(date = as.POSIXct(date)) %>% #convert date to date
ggplot() +
theme_classic() +
geom_smooth(aes(x = date, y = y, color = treated), method = "loess") +
geom_vline(xintercept = as.POSIXct("2000-01-05 12:00 GMT"), color = 'black', lwd = 1)
Created on 2018-03-27 by the reprex package (v0.2.0).