How to create an animation of geospatial / temporal data - r

I have a set of data which contains around 150,000 observations of 800 subjects. Each observation has: subject ID, latitude, longitude, and the time that the subject was at those coordinates. The data covers a 24-hour period.
If I plot all the data at once I just get a blob. Is anyone able to give me some tips as to how I can animate this data so that I can observe the paths of the subjects as a function of time?
I've read the spacetime vignette but I'm not entirely sure it will do what I want. At this point I'm spending a whole lot of time googling but not really coming up with anything that meets my needs.
Any tips and pointers greatly appreciated!

Here is my first use of the animation package. It was easier than I anticipated, and saveHTML in particular is really amazing. Here is my scenario (though I think my R code will make it clearer):
I generate some data.
I plot a basic plot for all persons as a background plot.
I reshape the data to a wide format so that I can plot an arrow between the present and the next position for each person.
I loop over the hours to generate many plots, and I put the loop inside the powerful saveHTML function.
You get an HTML file with a nice animation; I show one intermediate plot here.
Here is my code:
library(animation)
library(ggplot2)
library(grid)

## creating some data of hours
N.hour <- 24
dat <- data.frame(person = rep(paste0('p', 1:3), N.hour),
                  lat    = sample(1:10, 3 * N.hour, rep = TRUE),
                  long   = sample(1:10, 3 * N.hour, rep = TRUE),
                  time   = rep(1:N.hour, each = 3))

# the base plot
base <- ggplot() +
  geom_point(data = dat, aes(x = lat, y = long, colour = person),
             size = 5) +
  theme(legend.position = "none")

## reshape data to lat and long formats
library(plyr)
dat.segs <- ddply(dat, .(person), function(x) {
  dd <- do.call(rbind,
                lapply(seq(N.hour - 1),
                       function(y) c(y, x[x$time %in% c(y, y + 1), ]$lat,
                                        x[x$time %in% c(y, y + 1), ]$long)))
  dd
})
colnames(dat.segs) <- c('person', 'path', 'x1', 'x2', 'y1', 'y2')

# create the animation
oopt <- ani.options(interval = 0.5)
saveHTML({
  print(base)
  interval = ani.options("interval")
  for (hour in seq(N.hour - 1)) {
    # a segment for each time
    tn <- geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2,
                           colour = person),
                       arrow = arrow(), inherit.aes = FALSE,
                       data = subset(dat.segs, path == hour))
    print(base <- base + tn)
    ani.pause()
  }
}, img.name = "plots", imgdir = "plots_dir",
   htmlfile = "random.html", autobrowse = FALSE,
   title = "Demo of animated lat/long for different persons",
   outdir = getwd())
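If you would rather end up with a single video file than an HTML page, the animation package's saveVideo() takes the same kind of expression. A sketch under the assumption that ffmpeg is installed and on your PATH (rebuild base first if you already ran the loop above, since it accumulates layers; the file name is arbitrary):
saveVideo({
  print(base)
  for (hour in seq(N.hour - 1)) {
    tn <- geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2,
                           colour = person),
                       arrow = arrow(), inherit.aes = FALSE,
                       data = subset(dat.segs, path == hour))
    print(base <- base + tn)
  }
}, video.name = "random.mp4", interval = 0.5)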

Your question is a bit vague, but I will share how I have done this kind of animation in the past.
Create a function that plots all the subject locations for one time slice:
plot_time = function(dataset, time_id) {
  # Make a plot with your favorite plotting package (e.g. ggplot2)
  # and save it as a file on disk (e.g. using ggsave), under a regular name:
  # frame001.png, frame002.png, ..., see sprintf('frame%03d.png', time_id)
}
Call this function on each of your time slices, e.g. using lapply:
lapply(start_time_id:stop_time_id, plot_time)
leading to a set of graphics files on the hard drive called frame001.png to framexxx.png.
Use a tool to render those frames into a movie, e.g. ffmpeg.
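Putting the three steps together, a minimal sketch (assuming a data frame dat with the question's columns person, lat, long, and an integer time; all names here are illustrative):
library(ggplot2)

plot_time <- function(dataset, time_id) {
  # plot one time slice and write it to a numbered frame on disk
  p <- ggplot(subset(dataset, time == time_id),
              aes(x = long, y = lat, colour = person)) +
    geom_point(size = 3) +
    theme(legend.position = "none")
  ggsave(sprintf("frame%03d.png", time_id), p, width = 6, height = 6)
}

lapply(sort(unique(dat$time)), function(t) plot_time(dat, t))

# then render the frames into a movie, e.g.:
# system("ffmpeg -framerate 5 -i frame%03d.png movie.mp4")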
This is a general workflow, which has already been implemented in the animation package (thanks for reminding me, @mdsummer). You can probably leverage that package to get your animation.

Related

Using multiple datasets for one graph

I have 2 csv data files. Each file has a "date_time" column and a "temp_c" column. I want to make the x-axis have the "date_time" from both files and then use 2 y-axes to display each "temp_c" with separate lines. I would like to use plot instead of ggplot2 if possible. I haven't been able to find any code help that works with my data and I'm not sure where to really begin. I know how to do 2 separate plots for these 2 datasets, just not combine them into one graph.
plot(grewl$temp_c ~ grewl$date_time)
and
plot(kbll$temp_c ~ kbll$date_time)
work separately but not together.
As others indicated, it is easy to add new data to a graph using points() or lines(). One thing to be careful about is how you format the axes as they will not be automatically adjusted to fit any new data you input using points() and the like.
I've included a small example below that you can copy, paste, run, and examine. Pay attention to why the first plot fails to produce what you want (axes are bad). Also note how I set this example up generally - by making fake data that showcase the same "problem" you are having. Doing this is often a better strategy than simply pasting in your data since it forces you to think about the core component of the problem you are facing.
# for the same result each time
set.seed(1234)

# make fake data
set1 <- data.frame("date1" = seq(1, 10),
                   "temp1" = rnorm(10))
set2 <- data.frame("date2" = seq(8, 17),
                   "temp2" = rnorm(10, 1, 1))

# first attempt fails
# plot one
plot(set1$date1, set1$temp1, type = "b")
# add points - oops, only three showed up because the axes are all wrong
lines(set2$date2, set2$temp2, type = "b")

# second attempt
# adjust axes to fit everything (set to min and max of either dataset)
plot(set1$date1, set1$temp1,
     xlim = c(min(set1$date1, set2$date2), max(set1$date1, set2$date2)),
     ylim = c(min(set1$temp1, set2$temp2), max(set1$temp1, set2$temp2)),
     type = "b")
# now add the other points
lines(set2$date2, set2$temp2, type = "b")

# we can even add regression lines
abline(reg = lm(set1$temp1 ~ set1$date1))
abline(reg = lm(set2$temp2 ~ set2$date2))
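Since the question also asks about using two separate y-axes, here is a minimal base-graphics sketch of that variant, reusing the fake data above (the margin and colour choices are arbitrary):
par(mar = c(5, 4, 4, 4) + 0.1)  # widen the right margin for the second axis
plot(set1$date1, set1$temp1, type = "b",
     xlim = range(set1$date1, set2$date2),
     xlab = "date_time", ylab = "temp1")
par(new = TRUE)  # overlay a second plot on the same region
plot(set2$date2, set2$temp2, type = "b", col = "red",
     xlim = range(set1$date1, set2$date2),
     axes = FALSE, xlab = "", ylab = "")
axis(4, col.axis = "red")  # draw the second y-axis on the right
mtext("temp2", side = 4, line = 3, col = "red")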

Trying to plot a shapefile with an attribute in tmap

I am trying to work with municipality data in Norway, and I'm totally new to QGIS, shapefiles and plotting this in R. I download the municipalities from here:
Administrative enheter kommuner / Administrative units municipalities
Reproducible files are here:
Joanna's github
I have downloaded QGIS, so I can open the GeoJSON file there and convert it to a shapefile. I am able to do this, and read the data into R:
library(sf)
test=st_read("C:/municipality_shape.shp")
head(test)
I have assigned the municipalities different values/ranks that I call faktor, and I have stored this classification in a data frame that I call df_new. I wish to merge this classification onto my test object above and plot the map with the classification attribute:
test33 = merge(test, df_new[, c("Kommunekode_str", "faktor")],
               by = c("Kommunekode_str"), all.x = TRUE)
This works, but when I am to plot this with tmap,
library(tmap)
tmap_mode("view")
tm_shape(test33) +
  tm_fill(col = "faktor", alpha = 0.6, n = 20, palette = c("wheat3", "red3")) +
  tm_borders(col = "#000000", lwd = 0.2)
it throws this error:
Error in object[-omit, , drop = FALSE] : incorrect number of dimensions
If I just use base plot,
plot(test33)
I get the picture:
You can see I get three plots. Does this have something to do with my error above?
I think the main issue here is that the shapes you are trying to plot are too complex so tmap is struggling to load all of this data. ggplot also fails to load the polygons.
You probably don't need so much accuracy in your polygons if you are making a choropleth map so I would suggest first simplifying your polygons. In my experience the best way to do this is using the package rmapshaper:
# keep = 0.02 will keep just 2% of the points in your polygons.
test_33_simple <- rmapshaper::ms_simplify(test33, keep = 0.02)
I can now use your code to produce the following:
tmap_mode("view")
tm_shape(test_33_simple) +
tm_fill(col="faktor", alpha=0.6, n=20, palette=c("wheat3","red3")) +
tm_borders(col="#000000", lwd=0.2)
This produces an interactive map, but the colour scheme is not ideal for telling municipalities apart.
static version
Since you say in the comments that you are not sure if you want an interactive map or a static one, I will give an example with a static map and some example colour schemes.
The below uses the classInt package to set up breaks for your map. A popular break scheme is 'fisher', which uses the Fisher-Jenks algorithm. Make sure you research the various options to pick one that suits your scenario:
library(ggplot2)
library(dplyr)
library(sf)
library(classInt)

breaks <- classIntervals(test_33_simple$faktor, n = 6, style = 'fisher')

# label the breaks
lab_vec <- vector(length = length(breaks$brks) - 1)
rounded_breaks <- round(breaks$brks, 2)
lab_vec[1] <- paste0('[', rounded_breaks[1], ' - ', rounded_breaks[2], ']')
for (i in 2:(length(breaks$brks) - 1)) {
  lab_vec[i] <- paste0('(', rounded_breaks[i], ' - ', rounded_breaks[i + 1], ']')
}

test_33_simple <- test_33_simple %>%
  mutate(faktor_class = factor(cut(faktor, breaks$brks, include.lowest = T),
                               labels = lab_vec))

# map
ggplot(test_33_simple) +
  geom_sf(aes(fill = faktor_class), size = 0.2) +
  scale_fill_viridis_d() +
  theme_minimal()
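If you would rather stay in tmap for the static version, the same layers work after switching modes. A sketch reusing the objects above (this assumes tmap v3 syntax, and the palette choice is arbitrary):
tmap_mode("plot")  # static mode instead of "view"
tm_shape(test_33_simple) +
  tm_fill(col = "faktor_class", palette = "viridis", title = "faktor") +
  tm_borders(col = "#000000", lwd = 0.2)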

Extracting the exact coordinates of a mouse click in an interactive plot

In short: I'm looking for a way to get the exact coordinates of a series of mouse positions (on-clicks) in an interactive x/y scatter plot rendered by ggplot2 and ggplotly.
I'm aware that plotly (and several other interactive plotting packages for R) can be combined with Shiny, where a box or lasso select can return a list of all data points within the selected subspace. In most of the datasets I'm analysing, however, this list will be HUGE, and I need to be able to do the analysis reproducibly in an R Markdown format (writing down a few point coordinates, mostly fewer than 5-6, is much more readable). Furthermore, I have to know the exact positions of the clicks to be able to extract points within the same polygon from a different dataset, so a list of the points within the selection in one dataset is not useful.
The grid.locator() function from the grid package does almost what I'm looking for (e.g. as wrapped in gglocator), but I hope there is a way to do the same within an interactive plot rendered by plotly (or maybe something else that I don't know of?), as the data sets are often HUGE (see the plot below) and being able to zoom in and out interactively is very much appreciated during several iterations of analysis.
Normally I have to rescale the axes several times to simulate zooming in and out, which is exhausting when doing it MANY times. As you can see in the plot above, there is a LOT of information in the plots to explore (the plot is about 300 MB in memory).
Below is a small reprex of how I'm currently doing it using grid.locator on a static plot:
library(ggplot2)
library(grid)
library(magrittr)  # for the pipe used below

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point()

locator <- function(p) {
  # Build ggplot object
  ggobj <- ggplot_build(p)
  # Extract coordinates
  xr <- ggobj$layout$panel_ranges[[1]]$x.range
  yr <- ggobj$layout$panel_ranges[[1]]$y.range
  # Variable for selected points
  selection <- data.frame(x = as.numeric(), y = as.numeric())
  colnames(selection) <- c(ggobj$plot$mapping$x, ggobj$plot$mapping$y)
  # Detect and move to plot area viewport
  suppressWarnings(print(ggobj$plot))
  panels <- unlist(current.vpTree()) %>%
    grep("panel", ., fixed = TRUE, value = TRUE)
  p_n <- length(panels)
  seekViewport(panels, recording = TRUE)
  pushViewport(viewport(width = 1, height = 1))
  # Select point, plot, store and repeat
  for (i in 1:10) {
    tmp <- grid.locator('native')
    if (is.null(tmp)) break
    grid.points(tmp$x, tmp$y, pch = 16, gp = gpar(cex = 0.5, col = "darkred"))
    selection[i, ] <- as.numeric(tmp)
  }
  grid.polygon(x = unit(selection[, 1], "native"),
               y = unit(selection[, 2], "native"), gp = gpar(fill = NA))
  # return a data frame with the coordinates of the selection
  return(selection)
}
locator(p)
and from here use the point.in.polygon function to subset the data based on the selection.
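For completeness, a minimal sketch of that last step with sp::point.in.polygon, using the data frame returned by locator() above:
library(sp)
sel <- locator(p)
# keep the rows of mtcars whose (wt, mpg) fall inside the clicked polygon
inside <- point.in.polygon(mtcars$wt, mtcars$mpg, sel[, 1], sel[, 2]) > 0
mtcars[inside, ]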
A possible solution could be to add, say 100x100, invisible points to the plot and then use the plotly_click feature of event_data() in a Shiny app, but this is not at all ideal.
Thanks in advance for your ideas or solutions, I hope my question was clear enough.
-- Kasper
I used ggplot2. Besides the materials at https://shiny.rstudio.com/articles/plot-interaction.html, I'd like to mention the following:
Firstly, when you create the plot, don't use print() within renderPlot(), or the coordinates will be wrong. For instance, if you have the following in the UI:
plotOutput("myplot", click = "myclick")
The following in the Server would work:
output$myplot <- renderPlot({
p = ggplot(data = mtcars, aes(x=mpg, y=hp)) + geom_point()
p
})
But the clicking coordinates would be wrong if you do:
output$myplot <- renderPlot({
p = ggplot(data = mtcars, aes(x=mpg, y=hp)) + geom_point()
print(p)
})
Then, you could store the coordinates by adding to the Server:
mydata = reactiveValues(x_values = c(), y_values = c())
observeEvent(input$myclick, {
  mydata$x_values = c(mydata$x_values, input$myclick$x)
  mydata$y_values = c(mydata$y_values, input$myclick$y)
})
In addition to the X-Y coordinates, when you use facets with ggplot2, you can refer to the clicked facet panel with:
input$myclick$panelvar1
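Putting the pieces above together, a minimal self-contained sketch (only shiny and ggplot2 are needed; the output names are illustrative):
library(shiny)
library(ggplot2)

ui <- fluidPage(
  plotOutput("myplot", click = "myclick"),
  verbatimTextOutput("clicks")
)

server <- function(input, output) {
  mydata <- reactiveValues(x_values = c(), y_values = c())
  output$myplot <- renderPlot({
    ggplot(mtcars, aes(x = mpg, y = hp)) + geom_point()  # no print()
  })
  observeEvent(input$myclick, {
    mydata$x_values <- c(mydata$x_values, input$myclick$x)
    mydata$y_values <- c(mydata$y_values, input$myclick$y)
  })
  output$clicks <- renderPrint(
    data.frame(x = mydata$x_values, y = mydata$y_values)
  )
}

shinyApp(ui, server)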

Advise a Chemist: Automate/Streamline his Voltammetry Data Graphing Code

I am a chemist dealing with a significant amount of voltammetry data recently. Let me be very clear and give some research information. I run scans from a starting voltage to an ending voltage on solid-state conductive films. These scans are saved as .txt files (naming scheme: run#.txt) in a single folder. I am looking at how conductance changes as temperature changes. A LINEST fit of current vs. voltage at a given temperature gives me a line with slope = conductance. Once I have the conductances (slopes) for each scan, I plot conductance vs. temperature to see the temperature-dependent conductance characteristics. I had been doing this in Excel, but have found quicker ways to get the job done using R. I am brand new to R (RStudio) and recognize that my coding is not the best. Without doubt, this process can be streamlined and sped up, which would help immensely. This is how I am performing the process currently:
# Set working directory with folder containing all .txt files for inspection
# Add all .txt files to the global environment
allruns<-list.files(pattern=".txt")
for(i in 1:length(allruns))assign(allruns[i],read.table(allruns[i]))
Since the voltage column (a 1x1000 matrix) is the same for all runs and is in column V1 of each .txt file, I assign x to be the voltage column from the first file:
x <- run1.txt$V1
All currents (these change as voltage changes) are found in the V2 column of all the .txt files, so I assign a y# to each. These are entered one at a time:
y1<-run1.txt$V2
y2<-run2.txt$V2
y3<-run3.txt$V2
# ...
yn<-runn.txt$V2
This is so that I can get the equation for each LINEST (one per scan, plotted with abline later). Again, these are entered one at a time:
run1<-lm(y1~x)
run2<-lm(y2~x)
run3<-lm(y3~x)
# ...
runn<-lm(yn~x)
To obtain a single graph with all LINEST (one for each scan ) on the same plot, without the data points showing up, I have been using this pattern of coding to first get all data points on a single plot in separate series:
plot(x,y1,col="transparent",main="LSV Solid Film", xlab = "potential(V)",ylab="current(A)", xlim=rev(range(x)),ylim=range(c(y3,yn)))
par(new=TRUE)
plot(x,y2,col="transparent",main="LSV Solid Film", xlab = "potential(V)",ylab="current(A)", xlim=rev(range(x)),ylim=range(c(y3,yn)))
par(new=TRUE)
plot(x,y3,col="transparent",main="LSV Solid Film", xlab = "potential(V)",ylab="current(A)", xlim=rev(range(x)),ylim=range(c(y1,yn)))
# ...
par(new=TRUE)
plot(x,yn,col="transparent",main="LSV Solid Film", xlab = "potential(V)",ylab="current(A)", xlim=rev(range(x)),ylim=range(c(y1,yn)))
# To obtain all LINEST lines (one for each scan, on the single graph):
abline(run1, col = "", lwd = 1)
abline(run2, col = "", lwd = 1)
abline(run3, col = "", lwd = 1)
# ...
abline(runn, col = "", lwd = 1)
# Then to get each LINEST equation:
summary(run1)
summary(run2)
summary(run3)
# ...
summary(runn)
Each time I use summary(), I copy the slope and paste it into an Excel sheet, along with the corresponding scan temperature, which I have recorded separately. I then graph the conductance vs. temperature points for the film as an X-Y scatter with smooth lines to give the temperature-dependent conductance curve. This gives me a single plot of LINEST lines in R and the conductance vs. temperature curve in Excel.
This technique is actually MUCH quicker than doing it all in Excel, but it could be done much more quickly and efficiently! Also, if I need to change something, this entire process needs to be re-executed with whatever change is necessary. This process takes me maybe 5 hours in Excel and 1.5 hours in R (maybe I am too slow). Nonetheless, any tips to help automate/streamline this further are greatly appreciated.
There are plenty of questions about operating on data in lists; storing a list of matrices or a list of data.frames is fast, and code that operates cleanly on one can be applied to the remaining n-1 very easily.
(Note: the way I'm showing it here is one technique: maintaining everything in well-compartmentalized lists. Others will suggest -- very justifiably -- that combining things into a single data.frame and adding a group variable (to identify from which file/experiment the data originated) will help with more advanced multi-experiment regression or combined plotting, such as with ggplot2. I'm not going to go into this latter technique here, not yet.)
The pattern for(...) assign(..., read.csv(...)) has long been decried; you have the important part done, so this is relatively easy:
allruns <- sapply(list.files(pattern = "*.txt"), read.table, simplify = FALSE)
(The use of sapply(..., simplify=FALSE) is similar to lapply(...), but it has a nice side-effect of naming the individual list-ified elements with, in this case, each filename. It may not be critical here but is quite handy elsewhere.)
Extracting your invariant and variable data is simple enough:
allLMs <- lapply(allruns, function(mdl) lm(V2 ~ V1, data = mdl))
I'm using each table's V1 here instead of a once-extracted x. Though you might wonder why, I argue for keeping it this way for two reasons: (1) just in case the V1 variable is ever even one row different, this will save you; and (2) it is very easy to construct the model like this.
At this point, each object within allLMs is an lm object, meaning we might do:
summary(allLMs[[1]])
Plotting: I think I understand why you are using par(new=TRUE), and I have to laugh ... I had been deep in R for a while before I started using that technique. What I think you need is actually much simpler:
xlim <- rev(range(allruns[[1]]$V1))
ylim <- range(sapply(allruns, `[`, "V2"))
# this next plot just sets the box and axes, no points
plot(NA, type = "n", xlim = xlim, ylim = ylim,
     xlab = "potential(V)", ylab = "current(A)")
# no need to plot points with "transparent" ...
ign <- sapply(allLMs, abline)  # add col=, lwd=, and other abline options ...
Copying all models into Excel, again, using lists:
out <- do.call(rbind, sapply(allLMs, function(m) summary(m)$coefficients[,1]))
This will now be a single data.frame with all coefficients in two columns. (Feel free to use similar techniques to extract the other model summary attributes, including std err, t.value, or Pr(>|t|) (in the $coefficients); or $r.squared, $adj.r.squared, etc.)
write.csv(out, file="clipboard", sep="\t")
and paste into Excel. (Or, better yet, save it to a CSV file and import that, since you might want to keep it around.)
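For instance, a sketch of pulling just the slope and R-squared into one table (the row and column names follow summary.lm, so "V1" is the slope term):
fits <- do.call(rbind, lapply(allLMs, function(m) {
  s <- summary(m)
  data.frame(slope = s$coefficients["V1", "Estimate"],
             r.squared = s$r.squared)
}))
fits  # one row per run; row names are the original file names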
One of the tricks to using lists for this is to persevere: keep things in lists as long as you can, so that you don't have to deal with models individually. One mantra is that if you do it once, you shouldn't have to type it again, just loop/apply/map/whatever. Don't extract too much from the lists before you have to.
Note: r2evans' answer provides good general advice and doesn't require heavy package dependencies. But it probably doesn't hurt to see alternative strategies.
The tidyverse can be quite handy for this sort of thing, here's a dummy example for illustration,
library(tidyverse)

# creating dummy data files, one per temperature
dummy <- function(T) {
  V <- seq(-5, 5, length = 20)
  I <- jitter(T * V + T, factor = 1)
  write.table(data.frame(V = V, I = I),
              file = paste0(T, ".txt"),
              row.names = FALSE)
}
purrr::walk(300:320, dummy)

# reading the files back, keeping the temperature from the filename
lf <- list.files(pattern = "\\.txt")
read_one <- function(f, ...) {
  cbind(T = as.numeric(gsub("\\.txt", "", f)), read.table(f, ...))
}
m <- purrr::map_df(lf, read_one, header = TRUE, .id = "id")
head(m)

ggplot(m, aes(V, I, group = T)) +
  facet_wrap(~T) +
  geom_point() +
  geom_smooth(se = FALSE)

# one linear model per temperature
models <- m %>%
  split(.$T) %>%
  map(~lm(I ~ V, data = .))

# tidy the coefficients and plot them against temperature
coefs <- models %>% map_df(broom::tidy, .id = "T")

ggplot(coefs, aes(as.numeric(T), estimate)) +
  geom_line() +
  facet_wrap(~term, scales = "free")

Stacked bar in R

I have a table exported in csv from PostgreSQL and I'd like to create a stacked bar graph in R. It's my first project in R.
Here's my data and what I want to do:
It shows the quality of the feeder bus service for a certain provider in the area. For each user of the train, we assign a service quality based on the synchronization between the bus and the train at the train stations, and we calculate the percentage of users that have an ideal or very good service, a correct service, a deficient service, or no service at all (linked to that question on gis.stackexchange).
So, I'd like to use my first column as my x-axis labels and my headers as my categories. The data is already normalized to 100% for each row.
In Excel, it's a couple of clicks, and I wouldn't mind typing a couple of lines of code since it's the final result of an already quite long plpgsql script... I'd prefer to continue to code instead of moving to Excel (I also have dozens of those to do).
So, I tried to create a stacked bar using the examples in Nathan Yau's "Visualize This" and the book "R in Action" and wasn't quite successful. Normally, their examples use data that they aggregate with R and use that. Mine is already aggregated.
So, I've finally come up with something that works in R:
but I had to transform my data quite a bit:
I had to transpose my table and remove my now-row (ex-column) identifier.
Here's my code:
# load libraries
library(ggplot2)
library(reshape2)

# load data
stl <- read.csv("D:/TEMP/rabat/_stl_rabattement_stats_mtl.csv",
                sep = ";", header = TRUE)

# reshape for plotting
stl_matrix <- as.matrix(stl)

# make a quick plot
barplot(stl_matrix, border = NA, space = 0.1, ylim = c(0, 100),
        xlab = "Trains", ylab = "%",
        main = "Qualité du rabattement, STL", las = 3)
Is there any way that I could use my original csv and have the same result?
I'm a little lost here...
Thanks!!!!
Try the ggplot2 and reshape2 libraries. You should be able to get the chart you want with:
stl$train_order <- as.numeric(rownames(stl))
stl.r <- melt(stl, id.vars = c("train_no", "train_order"))
stl.r$train_no <- factor(
  stl.r$train_no,
  levels = stl$train_no[order(stl$train_order)])

ggplot(stl.r, aes(x = factor(train_no), y = value, fill = variable)) +
  geom_bar(stat = 'identity')
It appears that you transposed the matrix manually. This can be done in R with the t() function.
Add the following line after the as.matrix(stl) line:
stl_matrix <- t(stl_matrix)
