Automating great-circle map production in R

I've taken some of the things I learned in a Flowing Data great circle mapping tutorial and combined them with code linked in the comments to prevent weird things from happening when R plots trans-equatorial great circles. That gives me this:
airports <- read.csv("/home/geoff/Desktop/DissertationData/airports.csv", header=TRUE)
flights <- read.csv("/home/geoff/Desktop/DissertationData/ATL.csv", header=TRUE, as.is=TRUE)
library(maps)
library(geosphere)
checkDateLine <- function(l){
n<-0
k<-length(l)
k<-k-1
for (j in 1:k){
n[j] <- l[j+1] - l[j]
}
n <- abs(n)
m<-max(n, na.rm=TRUE)
ifelse(m > 30, TRUE, FALSE)
}
clean.Inter <- function(p1, p2, n, addStartEnd){
inter <- gcIntermediate(p1, p2, n=n, addStartEnd=addStartEnd)
if (checkDateLine(inter[,1])){
m1 <- midPoint(p1, p2)
m1[,1] <- (m1[,1]+180)%%360 - 180
a1 <- antipode(m1)
l1 <- gcIntermediate(p1, a1, n=n, addStartEnd=addStartEnd)
l2 <- gcIntermediate(a1, p2, n=n, addStartEnd=addStartEnd)
l3 <- rbind(l1, l2)
l3
}
else{
inter
}
}
# Unique months
monthyear <- unique(flights$month)
# Color
pal <- colorRampPalette(c("#FFEA00", "#FF0043"))
colors <- pal(100)
for (i in 1:length(monthyear)) {
png(paste("monthyear", monthyear[i], ".png", sep=""), width=750, height=500)
map("world", col="#191919", fill=TRUE, bg="black", lwd=0.05)
fsub <- flights[flights$month == monthyear[i],]
fsub <- fsub[order(fsub$cnt),]
maxcnt <- max(fsub$cnt)
for (j in 1:length(fsub$month)) {
air1 <- airports[airports$iata == fsub[j,]$airport1,]
air2 <- airports[airports$iata == fsub[j,]$airport2,]
p1 <- c(air1[1,]$long, air1[1,]$lat)
p2 <- c(air2[1,]$long, air2[1,]$lat)
inter <- clean.Inter(p1,p2,n=100, addStartEnd=TRUE)
colindex <- round( (fsub[j,]$cnt / maxcnt) * length(colors) )
lines(inter, col=colors[colindex], lwd=1.0)
}
dev.off()
}
I'd like to automate the production of maps for a large dataset containing all scheduled commercial routes (dummy sample) shared between ATL and other airports in the global network (airports.csv is linked to in the Flowing Data post). Preferably, I'd produce one map per month that I would use as a frame in a short video depicting changes in the Atlanta airport network space.
The problem: I can't get the loop to produce any more than one PNG—from only the first unique month in each CSV—each time I run it. I'm fairly certain Aaron Hardin's code 'breaks' the automation as it is used in the Flowing Data tutorial. After three days of messing with it and chasing down any relevant R how-to's, I realize I simply lack the chops to reconcile one with the other. Can anybody help me automate the process?
There's a dissertation acknowledgement in it for you!

Too much information for a comment, so I post an answer instead. Here is what I think (and read to the end to see what could potentially be the problem):
I have tried to run your code on the original data in the Flowing Data tutorial. (Obviously you have to add a column for monthly data, so I simply added a line to randomise the month.)
airports <- read.csv("http://datasets.flowingdata.com/tuts/maparcs/airports.csv",
header=TRUE)
flights <- read.csv("http://datasets.flowingdata.com/tuts/maparcs/flights.csv",
header=TRUE, as.is=TRUE)
# Add column with random data for month
flights$month <- sample(month.abb[1:4], nrow(flights), replace=TRUE)
Whenever I have a loop that takes a long time to run, I generally stick a bit of code in there that gives me a progress check. Use what takes your fancy: print, cat, tcltk::tkProgressBar. I use message:
for (i in 1:length(monthyear)) {
message(i)
#
# your code here
#
}
Anyway, I then ran your code. Everything works exactly as it should. Since I sampled four months worth of data, I get:
The message with the current iteration of i prints four times
Four png plots, each with a dark world map and bright yellow lines. [Image: one of the four plots]
So, why does it work on my machine and not yours?
I can only guess, but my guess is that you haven't set the working directory. There is no setwd in your code, and the call to png just gives the filename. I suspect your plots are being written to whatever your working directory is on your system.
By default, on my installation, the working directory is:
getwd()
[1] "C:/Program Files/eclipse 3.7"
To solve this, do one of the following:
Use setwd() to set your working directory at the top of your script.
Or use the full path and file name in your call to png()
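The second option can be sketched as follows. This is only an illustration: out_dir and the months vector are assumptions standing in for your own output folder and unique(flights$month), and the plot() call is a placeholder for the map and great-circle lines.

```r
# Assumption: out_dir is a hypothetical output folder; tempdir() is used
# here only so the sketch runs on any machine.
out_dir <- file.path(tempdir(), "frames")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

months <- c("Jan", "Feb", "Mar")  # stand-in for unique(flights$month)

# Build one full path per month instead of relying on the working directory
paths <- file.path(out_dir, paste0("monthyear", months, ".png"))

for (i in seq_along(months)) {
  message("Writing ", paths[i])   # progress check that also shows the destination
  png(paths[i], width = 750, height = 500)
  plot(1:10, main = months[i])    # placeholder for map() and lines()
  dev.off()
}

file.exists(paths)
```

Printing the path with message() doubles as the progress check and tells you exactly where each frame ends up.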

Related

R (Geosphere Library) - "Error in if (antipodal(p1, p2)) { : missing value where TRUE/FALSE needed"

I have an issue that I have tried for seven hours to fix without any success. Basically, I am trying to visualize the airport and flight data from openflights.org. This was supposed to be a simple, straightforward visualization, but it has turned in to a real hair puller.
Everything works up until the for loop. When I try and run the for loop in order to depict the flight paths on my map, it runs for a little and some lines appear on the map, but then it quits with the error:
Error in if (antipodal(p1, p2)) { : missing value where TRUE/FALSE needed
What I tried to do to fix it: As you can see, I have gone through the data set and removed any bad entries. For example, there were some IDs that were "\\N" and therefore I removed those entries entirely. Then I tried to change the IDs to be numbers rather than strings, just to see what would happen.
It always errors out around the same point after I run the for loop. Is there a way I can view the line it got to when it returned the error? I am new to R.
I know a similar question has been asked before but none of the solutions they had there worked for me. Thank you for any help you can provide.
Code:
library('geosphere')
library('maps')
# Reading in the files from openflights.org
airports <- read.csv("airports.dat.txt", header=FALSE, col.names=c("Airport ID","Name","City","Country","IATA","ICAO","Latitude","Longitude",
"Altitude","Timezone","DST", "TZ Database","Type","Source"), as.is=TRUE)
flights <- read.csv("routes.dat.txt", header=FALSE, col.names=c("Airline","Airline ID","Source Airport","Source Airport ID","Destination Airport",
"Destination Airport ID","Codeshare","Stops","Equipment"), as.is=TRUE)
# Cleaning the data set, there are some instances of the value "\\N" in the flights data set.
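# fixed=TRUE makes grepl() look for the literal text \N rather than treating it as a regex
flights <- flights[!grepl("\\N", flights$Source.Airport.ID, fixed=TRUE),]
flights <- flights[!grepl("\\N", flights$Destination.Airport.ID, fixed=TRUE),]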
# Converting all of the IDs to numbers (I thought this might work but it did not)
flights$Source.Airport.ID <- as.numeric(flights$Source.Airport.ID)
flights$Destination.Airport.ID <- as.numeric(flights$Destination.Airport.ID)
airports$Airport.ID <- as.numeric(airports$Airport.ID)
# Creating a world background
map("world", col="white", border="gray10", fill=TRUE, bg="gray30")
# Adding all of the airports as points
points(x=airports$Longitude, y=airports$Latitude, pch=19,
cex=0.05, col="blue")
# Generating a color set to be used for mapping the flight paths
col.1 <- adjustcolor("limegreen", alpha=0.05)
col.2 <- adjustcolor("darkgreen", alpha=0.05)
edge.pal <- colorRampPalette(c(col.1, col.2), alpha = TRUE)
edge.col <- edge.pal(100)
# Now, generating the visualization of the flight paths.
# Here is where the error occurs, when I run this.
# It gets through some of the data but then errors out.
for(i in 1:nrow(flights)) {
node1 <- airports[airports$Airport.ID == flights[i,]$Source.Airport.ID,]
node2 <- airports[airports$Airport.ID == flights[i,]$Destination.Airport.ID,]
arc <- gcIntermediate( c(as.numeric(node1[1,]$Longitude), as.numeric(node1[1,]$Latitude)),
c(as.numeric(node2[1,]$Longitude), as.numeric(node2[1,]$Latitude)),
n=1000, addStartEnd=TRUE)
#edge.ind <- round(100*table(flights[i,]$Source.Airport.ID) / table(max(airports$Airport.ID)))
#lines(arc, col=edge.col[edge.ind], lwd=edge.ind/30)
lines(arc, col = "limegreen", lwd = 0.02)
}
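One way to find the offending rows, sketched under the assumption that the error comes from routes whose airport ID has no match in airports (the lookup then returns no rows, so the coordinates passed to gcIntermediate are NA). The toy tables below are hypothetical stand-ins for the openflights data. Separately, after a loop stops with an error at top level, the loop variable keeps its last value, so print(i) shows the row it had reached.

```r
# Toy stand-ins for the openflights tables (hypothetical values)
airports <- data.frame(Airport.ID = c(1, 2, 3),
                       Longitude  = c(-84.4, 2.5, 139.8),
                       Latitude   = c(33.6, 49.0, 35.5))
flights <- data.frame(Source.Airport.ID      = c(1, 2, 1, 3),
                      Destination.Airport.ID = c(2, 3, 99, 1))  # 99 has no airport row

# match() returns NA where an ID is absent from airports
src <- match(flights$Source.Airport.ID, airports$Airport.ID)
dst <- match(flights$Destination.Airport.ID, airports$Airport.ID)

unmatched <- is.na(src) | is.na(dst)
bad <- which(unmatched)
bad  # row numbers of routes that would yield NA coordinates

# Keep only routes whose two endpoints both exist in airports
flights_ok <- flights[!unmatched, ]
```

Looping over flights_ok instead of flights would skip the routes that cannot be drawn.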

How can I add directional arrows to lines drawn on a map in R?

I've got a map that I've built using the maps and geosphere packages that looks like an airline map. However, I'd like to add arrows to the lines to show the directions of the "routes" in my map. You can see my current working code below (based off of the fabulous tutorial from FlowingData). I've tried before to use the arrows function in lieu of the lines function, yet I'm not sure how to make the arrows go with the geosphere curve, or ensure that the arrows are spaced along the line so that they look like this:
-->-->-->
I'm incredibly new to R, so any and all assistance would be greatly appreciated. Thanks in advance.
library(maps)
library(geosphere)
read.csv("http://marsiccr.github.io/Data/airports.csv", header=TRUE, as.is=TRUE) -> airports
read.csv("http://marsiccr.github.io/Data/leaders.csv", header=TRUE, as.is=TRUE) -> flights
pal <- colorRampPalette(c("#f2f2f2", "blue"))
colors <- pal(100)
colleges<-NULL
colleges$name <- airports$insname
colleges$long <- airports$long
colleges$lat <- airports$lat
colleges
map("state")
map("state", col="#f2f2f2", fill=TRUE, bg="white", lwd=0.25)
fsub <- flights[flights$type == "aau",]
fsub <- fsub[order(fsub$cnt),]
maxcnt <- max(fsub$cnt)
for (j in 1:length(fsub$type)) {
air1 <- airports[airports$unitid == fsub[j,]$school1,]
air2 <- airports[airports$unitid == fsub[j,]$school2,]
inter <- gcIntermediate(c(air1[1,]$long, air1[1,]$lat), c(air2[1,]$long, air2[1,]$lat), n=100, addStartEnd=TRUE)
colindex <- round( (fsub[j,]$cnt / maxcnt) * length(colors) )
lines(inter, col=colors[colindex], lwd=0.8)
}
Slipping this code into the for-loop just after inter <- got me arrowheads (and a few warnings):
tinter <- tail(inter,2)
arrows(tinter[1,1], tinter[1,2], tinter[2,1], tinter[2,2])
Obviously there's some tweaking to be done. See ?arrows for the full range of options. You could also use the second to last (or the fifth to last?) points in the inter matrix. You might also want to only put in arrowheads for selected routes.
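For arrowheads spaced along the whole curve rather than only at the end, a rough sketch is to draw a short arrow between neighbouring points at regular intervals. The helper arrow_segments and its every argument are hypothetical names, and the inter matrix below is a synthetic stand-in with the same shape as gcIntermediate output (102 rows for n=100 with addStartEnd=TRUE).

```r
# Hypothetical helper: pick start/end points for short arrows spaced along a path.
# `path` is a two-column matrix of (long, lat), like the output of gcIntermediate().
arrow_segments <- function(path, every = 20) {
  starts <- seq(1, nrow(path) - 1, by = every)
  data.frame(x0 = path[starts, 1],     y0 = path[starts, 2],
             x1 = path[starts + 1, 1], y1 = path[starts + 1, 2])
}

# Synthetic stand-in for `inter`: 102 points along a curve
t <- seq(0, 1, length.out = 102)
inter <- cbind(lon = -120 + 40 * t, lat = 35 + 5 * sin(pi * t))

segs <- arrow_segments(inter, every = 20)
nrow(segs)

# Inside the plotting loop, after lines(inter, ...), one could then call:
# arrows(segs$x0, segs$y0, segs$x1, segs$y1, length = 0.05)
```

Because each arrow spans only two neighbouring points, the heads follow the local direction of the curve; the length argument of arrows() controls the head size in inches.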

Variable class different within function?

In order to streamline future data analysis, I'm trying to write a script that will identify the different self-report scales included in a data.frame and perform routine analyses on each scale's items. Currently, I want it to identify which scales are present, find the responses for each of the scale's items, and then calculate the Cronbach's Alphas for each scale.
Everything seems to be working except when I run my function that should produce a list of alpha() outputs for each scale I get the following error:
> Cronbach.Alphas(scales.data, scale.names)
Error in alpha(data[, responses[[i]]]) :
Data must either be a data frame or a matrix
Obviously I know that this is saying the information being given to the alpha() function is not a data.frame or matrix. The reason I'm so confused though is that when I do these calculations manually step-by-step outside of my Cronbach.Alphas() function, it clearly tells me that it is a data.frame and seems to work like a charm:
> class(scales.data[,responses[[1]]])
[1] "data.frame"
This is driving me crazy and I'll be extremely appreciative of any help with figuring this out. My full code is pasted below. (Note: I'm pretty new to programming functions in R so the way I'm doing things is probably not optimal. Any additional advice is welcome as well.)
Also, it might help to mention that my code is designed to identify scale names based on the presence of an underscore in a column name. That is, "rsq_12" indicates the scale as rsq and the column as responses to item 12 of the scale.
require(psych)
##### Function for identifying names of scales present in the data file #####
GetScales <- function(x) {
find.scale.names <- regexec("^(([^_]+)_)", colnames(x))
scales <- do.call(rbind, lapply(regmatches(colnames(x), find.scale.names), `[`, 3L))
colnames(scales) <- "scale"
na.find <- ifelse(is.na(scales[,1]), 0, 1)
scales <- cbind(scales, na.find)
output <- scales[scales[,2] == 1,]
output[,1]
}
##### Function for calculating cronbach's alpha for each scale #####
Cronbach.Alphas <- function(data, scales){
for(i in 1:length(scales)){
if(i == 1) {
responses <- list(grep(scales[i], colnames(data)))
alphas <- list(alpha(data[,responses[[i]]]))
} else {
responses <- append(responses, list(grep(scales[i], colnames(data))))
alphas <- append(alphas, list(alpha(data[,responses[[i]]])))
}
}
return(alphas)
}
### Import data from .csv file ###
scales.data <- data.frame(read.csv(file.choose()))
### Identify each item's scale ###
scale.items <- GetScales(scales.data)
### Reduce to names of scales ###
scale.names <- cbind(scale.items, !duplicated(scale.items))
scale.names <- scale.names[scale.names[,2] == TRUE, 1]
scale.names
### Calculate list of alphas ###
Cronbach.Alphas(scales.data, scale.names)
Thank you to anyone who has taken the time to look over my code. I appreciate your help. I was working off of the suggestions left here when I realized a simple mistake on my part...
One of the scales in the dataset that I've been using as a test while working on this script had only one item in it. Thus, data[,responses[[i]]] in my Cronbach.Alphas() function was passing a vector (rather than a data.frame or matrix) to the alpha() function at that point in the for loop. It is impossible to calculate Cronbach's alpha for a single-item scale because it is an index of inter-item reliability...
Sooooo, all my code needed was a way to identify scales with just one item:
Cronbach.Alphas <- function(data, scales){
# Start with empty lists so the function also works when the first scale is skipped
responses <- list()
alphas <- list()
for(i in seq_along(scales)){
responses[[i]] <- grep(scales[i], colnames(data))
# alpha() needs at least two items, so single-item scales are skipped
if(length(responses[[i]]) > 1){
alphas <- append(alphas, list(alpha(data[,responses[[i]]])))
}
}
return(alphas)
}
Sorry for wasting anyone's time with my mistake. On the plus side, by substituting this new Cronbach.Alphas() function into the script above, I've now posted a script that will automatically identify scales and produce a list of Cronbach's alphas (provided the columns are named with an underscore after the scale names) for anyone who might be interested. Thanks again!
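The underlying behaviour can be seen without psych at all: selecting a single column with [ , ] drops the result to a plain vector unless drop = FALSE is given. Below is a small sketch on made-up data (the column names and values are hypothetical). It assumes every column name contains an underscore; a column without one would be treated as its own scale.

```r
# Toy data: scale "rsq" has three items, scale "bdi" has only one (hypothetical names)
scales.data <- data.frame(rsq_1 = c(1, 2, 3), rsq_2 = c(2, 3, 4),
                          rsq_3 = c(3, 3, 5), bdi_1 = c(0, 1, 0))

# Scale name = everything before the first underscore
prefix <- sub("_.*$", "", colnames(scales.data))
items_by_scale <- split(colnames(scales.data), prefix)

# Selecting one column drops to a vector unless drop = FALSE is given
single_default <- class(scales.data[, items_by_scale$bdi])
single_kept    <- class(scales.data[, items_by_scale$bdi, drop = FALSE])

# Keep only scales with more than one item; each element stays a data.frame
multi_item   <- Filter(function(cols) length(cols) > 1, items_by_scale)
scale_frames <- lapply(multi_item, function(cols) scales.data[, cols, drop = FALSE])
names(scale_frames)
```

With psych loaded, lapply(scale_frames, alpha) should then give one result per multi-item scale, named by scale.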

Long vector-plot/Coverage plot in R

I really need your R skills here. I've been working on this plot for several days now. I'm an R newbie, so that might explain it.
I have sequence coverage data for chromosomes (basically a value for each position along the length of every chromosome, making the length of the vectors many millions). I want to make a nice coverage plot of my reads. This is what I got so far:
It looks alright, but I'm missing y-labels so I can tell which chromosome it is, and I've also been having trouble modifying the x-axis so that it ends where the coverage ends. Additionally, my own data is much, much bigger, which makes this plot in particular take an extremely long time. That is why I tried plotLongVector from HilbertVis. It works, but I can't figure out how to modify the x-axis or the labels, or how to make the y-axis logged, and the vectors all get the same length on the plot even though they are not equally long.
source("http://bioconductor.org/biocLite.R")
biocLite("HilbertVis")
library(HilbertVis)
chr1 <- abs(makeRandomTestData(len=1.3e+07))
chr2 <- abs(makeRandomTestData(len=1e+07))
par(mfcol=c(8, 1), mar=c(1, 1, 1, 1), ylog=T)
# 1st way of trying with some code I found on stackoverflow
# Chr1
plotCoverage <- function(chr1, start, end) { # Defines coverage plotting function.
plot.new()
plot.window(c(start, length(chr1)), c(0, 10))
axis(1, labels=F)
axis(4)
lines(start:end, log(chr1[start:end]), type="l")
}
plotCoverage(chr1, start=1, end=length(chr1)) # Plots coverage result.
# Chr2
plotCoverage <- function(chr2, start, end) { # Defines coverage plotting function.
plot.new()
plot.window(c(start, length(chr1)), c(0, 10))
axis(1, labels=F)
axis(4)
lines(start:end, log(chr2[start:end]), type="l")
}
plotCoverage(chr2, start=1, end=length(chr2)) # Plots coverage result.
# 2nd way of trying with plotLongVector
plotLongVector(chr1, bty="n", ylab="Chr1") # ylab doesn't work
plotLongVector(chr2, bty="n")
Then I have another vector called genes that are of special interest. They are about the same length as the chromosome-vectors but in my data they contain more zeroes than values.
genes_chr1 <- abs(makeRandomTestData(len=1.3e+07))
genes_chr2 <- abs(makeRandomTestData(len=1e+07))
These gene vectors I would like plotted as red dots under the chromosomes! Basically, if the vector has a value there (>0), it is presented as a dot (or line) under the long vector plot. I have no idea how to add this, but it seems fairly straightforward.
Please help me! Thank you so much.
DISCLAIMER: Please do not simply copy and paste this code to run off the entire positions of your chromosome. Please sample positions (for example, as @Gx1sptDTDa shows) and plot those. Otherwise you'd probably get a huge black filled rectangle after many many hours, if your computer survives the drain.
Using ggplot2, this is really easily achieved using geom_area. Here, I've generated some random data for three chromosomes with 100 positions each (300 rows in total), just to show an example. You can build on this, I hope.
# construct a test data with 3 chromosomes and 100 positions
# and random coverage between 0 and 500
set.seed(45)
chr <- rep(paste0("chr", 1:3), each=100)
pos <- rep(1:100, 3)
cov <- sample(0:500, 300)
df <- data.frame(chr, pos, cov)
require(ggplot2)
p <- ggplot(data = df, aes(x=pos, y=cov)) + geom_area(aes(fill=chr))
p + facet_wrap(~ chr, ncol=1)
You could use the ggplot2 package.
I'm not sure what exactly you want, but here's what I did:
This has 7000 random data points (about double the amount of genes on Chromosome 1 in reality). I used alpha to show dense areas (not many here, as it's random data).
library(ggplot2)
Chr1_cov <- sample(1.3e+07,7000)
Chr1 <- data.frame(Cov=Chr1_cov,fil=1)
pl <- qplot(Cov,fil,data=Chr1,geom="pointrange",ymin=0,ymax=1.1,xlab="Chromosome 1",ylab="-",alpha=I(1/50))
print(pl)
And that's it. This ran in less than a second. ggplot2 has a humongous amount of settings, so just try some out. Use facets to create multiple graphs.
The code beneath is for a sort of moving average, and then plotting the output of that. It is not a real moving average, as a real moving average would have (almost) the same number of data points as the original and would only make the data smoother. This code, however, takes an average for every n points. It will of course run quite a bit faster, but you will lose a lot of detailed information.
VeryLongVector <- sample(500,1e+07,replace=TRUE)
movAv <- function(vector,n){
chops <- as.integer(length(vector)/n)
count <- 0
pos <- 0
Cov <-0
pos[1:chops] <- 0
Cov[1:chops] <- 0
for(c in 1:chops){
tmpcount <- count + n
idx <- (count + 1):tmpcount # the n positions in this chunk; chunks do not overlap
tmppos <- median(idx)
tmpCov <- mean(vector[idx])
pos[c] <- tmppos
Cov[c] <- tmpCov
count <- count + n
}
result <- data.frame(pos=pos,cov=Cov)
return(result)
}
Chr1 <- movAv(VeryLongVector,10000)
qplot(pos,cov,data=Chr1,geom="line")
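The same chunk averaging can be written without a loop, and the gene vector can be reduced in the same way so that it lines up with the binned coverage. This is only a sketch: bin_means and bin_any are hypothetical helper names, and the toy vectors stand in for real coverage and gene data.

```r
# Hypothetical helper: average a long vector in non-overlapping chunks of n values
bin_means <- function(v, n) {
  chops <- length(v) %/% n                      # number of complete chunks
  m <- matrix(v[seq_len(chops * n)], nrow = n)  # one chunk per column
  data.frame(pos = (seq_len(chops) - 0.5) * n + 0.5,  # centre position of each chunk
             cov = colMeans(m))
}

# Hypothetical helper: TRUE for each chunk that contains at least one value > 0
bin_any <- function(v, n) {
  chops <- length(v) %/% n
  colSums(matrix(v[seq_len(chops * n)] > 0, nrow = n)) > 0
}

coverage <- c(1:10, rep(4, 10))          # 20 positions of toy coverage
genes    <- c(rep(0, 12), 7, rep(0, 7))  # a single gene hit at position 13

binned <- bin_means(coverage, 5)
binned$gene <- bin_any(genes, 5)
binned
```

The binned coverage can then be drawn as a line, and binned$pos[binned$gene] gives the x positions at which to place red points underneath it. Any values beyond the last complete chunk are ignored, as in the loop version.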

Plotting during a loop in RStudio

I am implementing a solution to the Traveling Salesman Problem (TSP) in R (simulated Annealing) and I want to output the current best path periodically. I have searched quite a bit for how to output plots during a for loop and have thus far failed.
I use RStudio, and want to see the graphs as they are generated. If you have ever watched TSP solvers do their thing, you will understand how cool it is to watch. Here is a sample of the graphics output I want to see http://www.staff.science.uu.nl/~beuke106/anneal/anneal.html
I don't think that the memory usage will be a problem (during about 500,000 iterations, I am only expecting 50-100 plots). Here is a sample function, where we would expect to see 10 different plots during the time the function runs:
Plotz <- function(iter = 1000000, interval = 100000) {
x <- 1:10
for(i in 1:iter){
y <- runif(10)
if(i %% interval == 0) {
plot(x, y)
}
}
return(c(x, y))
}
Plotz()
When I run this, all I see is the final plot produced (in RStudio). How can I see the plots as they're generated?
Also: I am on Ubuntu (whatever the newest stable release is). Don't know if that is relevant.
Thank you everyone in advance.
EDIT: Per Captain Murphy's suggestion, I tried running this in the Linux terminal, and the graphics appeared. I think the question of "How to do this in RStudio?" is still relevant, however. It's such a good program, so maybe someone has an idea of what could be done to get this to work?
EDIT2: As Thilo stated, this is a known bug in RStudio. If anyone has any other ideas to solve this without the software itself being fixed, then there is still something to discuss. Otherwise, consider this question solved.
Calling Sys.sleep(0) should cause the plot to draw. Unlike the X11 solution, this will work on server versions of RStudio as well.
(I was surprised that dev.flush() did not give the result you were hoping for, that might be a bug.)
One thing you can do is open an x11 window and plot in there:
x11()
Plotz()
That should work the same as running it in terminal.
Following up on @JoeCheng's answer and @RGuy's comment on that answer: as I worked out with the RStudio folks, the problem seems to primarily arise when there is too much plotting going on in too short a timespan. The solution is twofold:
Sys.sleep(0) helps force an update to the plotting window.
Plotting updates every Wth loop rather than every loop.
For instance, on my computer (i7, RStudio Server), the following code does not update until the loop completes:
N <- 1000
x <- rep(NA,N)
plot(c(0,1)~c(0,N), col=NA)
for(i in seq(N)) {
Sys.sleep(.01)
x[i] <- runif(1)
iseq <- seq(i-99,i)
points( x[i]~i )
Sys.sleep(0)
}
The following code updates in real-time, despite having the same number of points to be plotted:
N <- 1000
x <- rep(NA,N)
plot(c(0,1)~c(0,N), col=NA)
for(i in seq(N)) {
Sys.sleep(.01)
x[i] <- runif(1)
iseq <- seq(i-99,i)
if(i%%100==0) {
points( x[iseq]~iseq )
Sys.sleep(0)
}
}
In other words, it's the number of calls to plot that seems to matter, not the amount of data to be plotted.
If you want to save the plots as well you could just open a new device in the loop and close it afterwards.
Plotz <- function(iter = 1000, interval = 100) {
x <- 1:10
p <- 0 #plot number
for(i in 1:iter){
y <- runif(10)
if(i %% interval == 0) {
png(file=paste(i,"png",sep="."))
p <- p + 1; plot(x, y)
dev.off()
}
}
return(c(x, y))
}
Alternatively, pause after each plot and wait for the Enter key before continuing:
Plotz <- function(iter = 1000, interval = 100) {
x <- 1:10
p <- 0 #plot number
for(i in 1:iter){
y <- runif(10)
if(i %% interval == 0) {
p <- p + 1; plot(x, y)
readline("Please press the Enter key to see the next plot if there is one.")
}
}
return(c(x, y))
}
Plotz()
You can also use the back arrows on the plots tab of the lower left pane of the RStudio interface in order to view the plots.
You can use the animate package to layer your plots into a GIF.
