As written in the title, I want to calculate the distance from all gas stations in my hometown to the two motorway accesses around here, using the osrm package.
stations_ms contains latitude and longitude for the gas stations, and highway_ms.df contains the lat and long for the motorway accesses.
Calculating the distance for just one row of my dataset is no problem, but I am not able to create a loop/function that does it for every row.
Here is my code:
route4 <- osrmRoute(src = c(stations_ms$longitude[1], stations_ms$latitude[1]),
dst = highway_ms.df[1,],
overview = "FALSE")
for (i in 1:nrow(stations_ms)) {
route[i] <- osrmRoute(src = c(stations_ms$longitude[i], stations_ms$latitude[i]),
dst = highway_ms.df[1,],
overwiew = "FALSE")
}
Maybe someone can help me :)
Here is a workable example that might be helpful.
The overview in osrmRoute has the following options:
"full", "simplified" or FALSE. Use "full" to return the detailed
geometry, use "simplified" to return a simplified geometry, use FALSE
to return only time and distance.
If you only want time and distance, using FALSE should work fine. My comment was in regard to spelling (your code had a "w" instead of a "v" in overview).
I made up some example data:
my_points <- data.frame(
id = 1:3,
longitude = c(13.4, 13.5, 13.3),
latitude = c(52.4, 52.5, 52.3)
)
And wanted to find distances to a pharmacy in Berlin (using apotheke.df that comes with the osrm package). You could do:
library(osrm)
route <- list()
for (i in 1:nrow(my_points)) {
route[[i]] <- osrmRoute(src = c(my_points$longitude[i], my_points$latitude[i]),
dst = apotheke.df[1,],
overview = FALSE)
}
This starts with an empty list called route. Then, we fill in each list element with both duration and distance. The end result is the following list:
R> route
[[1]]
duration distance
20.56 11.77
[[2]]
duration distance
17.38 7.63
[[3]]
duration distance
33.12 27.45
Which can be converted to a matrix or data frame (in this case, I made a matrix):
R> do.call(rbind, route)
duration distance
[1,] 20.56 11.77
[2,] 17.38 7.63
[3,] 33.12 27.45
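If you would rather end up with a data frame than a matrix, a small sketch along the same lines should work:
routes_df <- as.data.frame(do.call(rbind, route))  # one row per point
routes_df$id <- my_points$id                       # carry the point ids along
routes_df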
I have a dataset which states for each UN resolution the country and the vote:
ResolutionID: 1,2,3,...
Country: US, CA, MX, ...
vote: yes, no, abstain
I want to create a variable calculating, for each country pair (e.g. US-CA, US-MX, MX-CA, ...), the correlation of their voting records, thus providing an index of each country pair's friendship or strategic alliance.
What R code do I have to use?
Citation of the dataset: Erik Voeten, "Data and Analyses of Voting in the UN General Assembly", in Routledge Handbook of International Organization, edited by Bob Reinalda (published May 27, 2013).
Using the algorithm for calculating the 'index of agreement' as proposed by Lijphart (1963), the below might be what you're after.
## set up sample data
set.seed(32446)
test = data.frame(rcid = rep(seq(3), each=10), country = rep(letters[seq(10)], 3), vote = sample(c("yes","no"),30, replace = T))
test$vote[sample(seq(30),3)] = "abstain" # add in a few abstentions
test = test[-sample(seq(30), 2), ] # remove some as missing
test
## set up the comparison df
allCountries = unique(test$country)
compdf = outer(allCountries, allCountries, paste)
compdf = data.frame(compdf[which(lower.tri(compdf))])
names(compdf) = "comp"
## cycle through the resolutions, scoring as per Lijphart (1963): 1 if the two countries agree, 0 if they vote opposite ways, and 0.5 if only one country abstains; the scores are averaged afterwards.
for(r in unique(test$rcid)){
tempVotes = data.frame(countries = allCountries,
test$vote[which(test$rcid==r)][match(allCountries, test$country[which(test$rcid==r)])])
tempVotes = outer(tempVotes[,2], tempVotes[,2],
FUN=function(x,y){
ifelse(is.na(x) | is.na(y), NA, #NA if one country didn't vote
ifelse(x==y,1, ## 1 if they agree
ifelse(x=="abstain" | y=="abstain", 0.5, # 0.5 if one side abstains
0)
) # zero otherwise
)
}
)
compdf = cbind(compdf, tempVotes[which(lower.tri(tempVotes) )] )
names(compdf)[ncol(compdf)] = paste0("resolution_",r)
}
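## quick sanity check of the scoring rule in isolation (an illustrative addition,
## using the same logic as the FUN passed to outer() above):
score <- function(x, y) {
  ifelse(is.na(x) | is.na(y), NA,                    # NA if one country didn't vote
  ifelse(x == y, 1,                                  # 1 if they agree
  ifelse(x == "abstain" | y == "abstain", 0.5, 0)))  # 0.5 if one abstains, 0 otherwise
}
score(c("yes", "yes", "no"), c("yes", "no", "abstain"))
## [1] 1.0 0.0 0.5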
## calculate the mean score across resolutions
result = data.frame(comp = compdf$comp,
result = rowMeans(compdf[,seq(2, ncol(compdf)) ], na.rm=T)
)
result$result = 1-result$result # make it into a distance score, rather than agreement score
## create distance matrix and plot Dendrogram
library(ggplot2)
library(ggdendro)
distance = matrix(NA, length(allCountries), length(allCountries))
distance[which(lower.tri(distance))] = result$result
rownames(distance) = allCountries; colnames(distance) = allCountries
distance
cluster = hclust(as.dist(distance))
ggdendrogram(cluster, rotate = FALSE, size = 2)
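As an aside, if you literally want a correlation of the voting records, as the question puts it, you could code the votes numerically and run cor() on a resolutions-by-countries table. A sketch using the sample data above; the yes = 1 / abstain = 0 / no = -1 coding is my assumption, not part of Lijphart's index:
test$num <- c(yes = 1, abstain = 0, no = -1)[as.character(test$vote)]
wide <- tapply(test$num, list(test$rcid, test$country), mean)  # resolutions x countries
cor(wide, use = "pairwise.complete.obs")  # pairwise Pearson correlations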
Lijphart, A. (1963). The Analysis of Bloc Voting in the General Assembly: A Critique and a Proposal. The American Political Science Review, 57(4), 902-917. https://www.jstor.org/stable/1952608
I am using terra to get "curvy" distances between points within a bounding polygon and comparing those to straight-line distances that ignore the polygon. The results I'm getting back don't make sense, and I am hoping you all could help me figure out what is going on.
We load the US Congressional map used in the 114th Congress for the state of Texas first:
texas = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/texascongressmaps")
ggplot() + geom_sf(data = texas$geometry)
We also make some storage objects:
longest.dist.district.straight = rep(NA, 36)
longest.dist.district.curved = rep(NA, 36)
Then, we go district by district (n = 36). For each, we take a sample of 100 random points within that district's polygon. Then, we ask "What is the longest straight-line distance between any two of our 100 points?" We then rasterize the polygon, mask it, and go point by point, asking "How far is this point from all others, assuming we cannot travel outside the polygon?" This means we'll have to bend around within the polygon to get between the points some of the time. We find the longest such distance between any two points. We then compare the straight-line and curvy-line approaches, with the assumption that the curvy-line approaches will always be longer by some amount...
for(c in 1:36) { #Texas had 36 districts.
if(c %% 3 == 0) {print(c)} # Progress bar
this.district = texas[c, ] #Get the current district
#We'll get a sample of 100 randomly placed points around the district.
rand.ptsDistrict = sf::st_sample(this.district,
size = 100,
type = 'random',
exact = TRUE)
#What's the max straight-line distance between any two points?
longest.dist.district.straight[c] = max(sf::st_distance(rand.ptsDistrict))
#Now, calculate our 'as the politician would walk' distances (aka curvy distances). We need to do this for each of our 100 points separately, with each as the target point in turn, and save the longest value we get...
current.raster = terra::ext(this.district) # Rasterizing
current.raster = terra::rast(current.raster,
nrow=100, ncol=100,
crs = crs(this.district),
vals = 1)
current.raster = terra::mask(current.raster, # Masking
terra::vect(this.district),
updatevalue = NA)
point.locs = terra::cellFromXY(current.raster, # Getting point locations in the new grid
sf::st_coordinates(rand.ptsDistrict))
longest.dists.i = rep(NA, 100) # Storage object
for(i in 1:100) {
point.i.loc = cellFromXY(current.raster, #Focal point this time.
st_coordinates(rand.ptsDistrict[i]))
point.noni.loc = cellFromXY(current.raster, #All other points
st_coordinates(rand.ptsDistrict[-i]))
terra::values(current.raster)[point.i.loc] = 2 # Make focal point the target value
all.dists = terra::gridDistance(current.raster, #Get all distances to the target value
target = 2, scale = 1)
longest.dists.i[i] = max(values(all.dists)[point.noni.loc], na.rm=TRUE) # Find the longest of these for this point and store it.
terra::values(current.raster)[point.i.loc] = 1
}
longest.dist.district.curved[c] = max(longest.dists.i) # Find the longest curved distance between any two points in the current district.
}
When I do this, I always get straight-line distances that are strictly longer than the curvy distances from the same district, which doesn't logically make sense--how could a straight line between two points ever be longer than a curvy line between them?
> (cbind(longest.dist.district.straight, longest.dist.district.curved))
longest.dist.district.straight longest.dist.district.curved
[1,] 239285.77 121703.64
[2,] 63249.88 48238.89
[3,] 49495.09 24823.91
[4,] 290542.38 147894.80
[5,] 213758.13 108663.63
[6,] 129261.83 68351.77
[7,] 36705.18 22081.22
[8,] 165759.58 87749.33
[9,] 38317.61 19903.54
[10,] 196211.38 100959.66
[11,] 505130.81 261479.58
[12,] 79502.87 45134.11
[13,] 604901.43 313317.24
[14,] 201724.57 115286.81
[15,] 414257.14 208204.75
[16,] 61867.34 32115.77
[17,] 193198.96 103829.75
[18,] 41693.26 26462.02
[19,] 433902.07 225041.00
[20,] 32201.45 17060.41
[21,] 212300.45 119597.54
[22,] 88143.49 46720.59
[23,] 777236.95 394663.54
[24,] 39692.06 21192.98
[25,] 299336.81 153871.46
[26,] 65901.64 35200.83
[27,] 272822.43 158724.70
[28,] 362477.84 205297.74
[29,] 40210.19 30094.43
[30,] 44693.37 23430.33
[31,] 93781.16 50340.85
[32,] 38941.81 21047.40
[33,] 52395.85 31169.46
[34,] 394586.71 206545.50
[35,] 138182.61 73556.10
[36,] 223351.15 112601.38
I can only guess I have either messed up the code somewhere or else have found a bug. Please help! Thanks!
Edit: I just noticed after posting this that it looks like if I were to multiply the curvy distances by 2, I'd get values that were believable (the curvy distances are always longer but by a variable amount)--but I don't see a coding reason to need to do this...can anyone else see one I'm missing?
You are comparing the shortest distance ("as the crow flies", for those who have not seen crows fly) with the grid distance (moving from the center of a cell to the center of a neighboring cell), only allowing movement through the grid cells that fall within a district.
When I run a condensed version of your code, I see that the distances are very similar, with the grid distance always longer, as it should be, except for district 14, as that district is not contiguous.
library(terra)
#terra 1.6.47
texas <- dget("https://raw.githubusercontent.com/BajczA475/random-data/main/texascongressmaps")
tex <- vect(texas)
# generate random points
set.seed(0)
b <- spatSample(tex[, "DISTRICT"], size = 100, method="random", strata=1:nrow(tex))
# max distance between any two random points by district.
pdist <- sapply(tex$DISTRICT, \(i) max( distance( b[b$DISTRICT == i, ])) )
# max grid distance between any two random points by district.
pgrid <- rep(NA, nrow(tex))
for (i in 1:nrow(tex)) {
r <- rast(tex[i,], nrow=100, ncol=100)
r <- rasterize(tex[i,], r)
xy <- crds(b[b$DISTRICT==i, ])
cells <- cellFromXY(r, xy)
maxdists <- rep(NA, 100)
for(j in 1:100) {
r[cells[j]] <- 2
dists <- gridDist(r, target=2)
# Find the longest of these for this point
maxdists[j] <- max( dists[ cells[-j] ], na.rm=TRUE)
r[cells[j]] <- 1
}
pgrid[i] <- max(maxdists)
}
The results look good:
head(cbind(pdist, pgrid))
# pdist pgrid
#1 217746.46 223906.22
#2 61707.87 99422.07
#3 50520.61 51479.98
#4 282744.13 293656.59
#5 196074.08 202014.45
#6 120913.60 126532.72
plot(pdist, pgrid)
abline(0, 1, col="red")
If your results are different, you are perhaps using an older version of "terra"? I assume you are, because you use gridDistance(), which still works, but with a warning, because it was renamed to gridDist() in the current version.
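A quick way to check which version you have (a sketch):
packageVersion("terra")
exists("gridDist", where = asNamespace("terra"))  # TRUE in recent versions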
You use different grid cell sizes for each district. I do not know what your goal is, but it might be more reasonable to use a single template raster for all of Texas. You could do something like
# outside the loop
rr <- rast(tex, res=1/60, vals=1)
# inside the loop
r <- crop(rr, tex[i,], mask=TRUE)
I have 2 dataframes that are simply matrices of 2 dimensions (lat/long). Both dataframes would look like the input below:
latitude longitude
27.78833 -82.28197
27.79667 -82.29294
Let's call them "dfref" and "dfnew". I would like to find the nearest point in dfnew for each point in dfref and the distance between the 2 points in meters.
The output would look like this:
dr.latitude dr.longitude dn.latitude dn.longitude dist
27.78833 -82.28197 27.54345 -82.33233 162.34
27.79667 -82.29294 27.56543 -82.12323 232.23
I have tried using the knn function in the class package and the SearchTrees package, but my script only found the nearest points in the dfref matrix, and I am not sure how to add the distance measurement.
knn1(train=cbind(dfref), test=cbind(dfnew), cl=seq_len(nrow(dfnew)))
Is there a function that does both efficiently and how can I get this into one script?
I am no expert on geo math, but it seems that you can start with something like this:
dfref <- read.table(text =
"latitude longitude
27.78833 -82.28197
27.79667 -82.29294", header = T)
library(data.table)
dtref <- data.table(dfref)
dfnew <- read.table(text =
"latitude longitude
27.54345 -82.33233", header = T)
dtnew <- data.table(dfnew)
# Make a Cartesian product of the two tables.
dtref$fake <- 1
dtnew$fake <- 1
dtall <- merge(dtref, dtnew, by = "fake", allow.cartesian = T)
# Calculate distance.
library(geosphere)
dtall[, distance := distVincentyEllipsoid(c(longitude.x, latitude.x), c(longitude.y, latitude.y)), by = 1:nrow(dtall)]
# Print results.
dtall[, .(latitude.x, longitude.x, latitude.y, longitude.y, distance)]
# latitude.x longitude.x latitude.y longitude.y distance
# 1: 27.78833 -82.28197 27.54345 -82.33233 27587.29
# 2: 27.79667 -82.29294 27.54345 -82.33233 28328.19
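To get the one-row-per-reference-point table shown in the question, you can then keep only the smallest distance per dfref point (a sketch continuing from the code above; note that the Cartesian join grows as nrow(dfref) * nrow(dfnew), so for large tables a spatial index such as sf::st_nearest_feature may scale better):
# for each dfref point, keep the closest dfnew point
nearest <- dtall[, .SD[which.min(distance)], by = .(latitude.x, longitude.x)]
nearest[, .(dr.latitude = latitude.x, dr.longitude = longitude.x,
            dn.latitude = latitude.y, dn.longitude = longitude.y,
            dist = distance)]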
I am trying to determine the distance between every point in one data set and every point in another data set in R. Each data set has an X and Y parameter. I have been converting the data sets into data frames and then finding the distances. However, my current code creates one large matrix containing both data sets as rows and columns, and I then have to pick out the specific part of the matrix I care about to get my answers. Is there a way to put just DSA as the columns and DSB as the rows? This would cut the matrix to a quarter of its size, and since my data sets contain thousands of points each, it would really cut down the algorithm's running time.
Here is the code I am using
tumor<-data.frame(DSA[,c ("X_Parameter","Y_Parameter")])
cells<-data.frame(DSB[,c ("X_Parameter","Y_Parameter")])
distances<-as.matrix(dist(rbind(tumor,cells)))
row.start<-nrow(tumor)+1
row.end<-nrow(tumor)+nrow(cells)
col.start<-1
col.end<-nrow(tumor)
distances[row.start:row.end, col.start:col.end]
d<- distances[row.start:row.end, col.start:col.end]
Try flexclust::dist2:
n_tumor = 2000
n_cells = 2000
tumor = matrix(runif(n_tumor * 2), n_tumor, 2)
cells = matrix(runif(n_cells * 2), n_cells, 2)
t_dist = system.time({
distances<-as.matrix(dist(rbind(tumor,cells)))
row.start<-nrow(tumor)+1
row.end<-nrow(tumor)+nrow(cells)
col.start<-1
col.end<-nrow(tumor)
d <- distances[row.start:row.end, col.start:col.end]
})[3]
require(flexclust)
t_dist2 = system.time({d2 = dist2(x = cells, y = tumor, method = "euclidean")})[3]
t_dist # 1.477
t_dist2 # 0.244
identical(unname(d), d2) # TRUE
EDIT:
Another alternative is proxy::dist.
This will compute only the portion of the matrix you need:
tumoridx <- rep(1:nrow(tumor), each=nrow(cells))
cellsidx <- rep(1:nrow(cells), nrow(tumor))
tcdist <- matrix(sqrt(rowSums((tumor[tumoridx, ] - cells[cellsidx, ])^2)),
nrow(cells), nrow(tumor))
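And if you want to try the proxy alternative mentioned in the edit above, a minimal sketch (assuming the cells and tumor matrices from the timing example; the values should agree with d up to attributes):
library(proxy)
d3 <- proxy::dist(cells, tumor, method = "Euclidean")  # cross-distances only
all.equal(as.matrix(d3), d, check.attributes = FALSE)  # expect TRUE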
I am trying to use the R package solaR to calculate irradiance on a tilted plane given measured irradiance on the horizontal plane. I can get the code to work, but the final output timestamp does not make sense.
Data for this code can be found here. It is one day's worth of measured irradiance (global horizontal -- ghz, direct normal -- dir, diffuse horizontal -- dhz, and outdoor temperature -- ta) for Austin, TX. The timestamp is local 'CST6CDT' time. The data is for a clear day, so the maximum value of global horizontal (ghz) should roughly correspond to solar noon (the time the sun crosses the local meridian).
My code is as follows:
library(solaR)
sol_data <- read.csv(file)
# The data must be named a certain way.
names(sol_data) <- c('time', 'G0', 'B', 'D0', 'Ta')
# The negatives are an artifact of the sensor and are set to 0.
sol_data$G0 <- ifelse(sol_data$G0 < 0, 0, sol_data$G0)
sol_data$B <- ifelse(sol_data$B < 0, 0, sol_data$B)
sol_data$D0 <- ifelse(sol_data$D0 < 0, 0, sol_data$D0)
# This calculates the beam incidence on the horizontal plane.
sol_data$B0 <- sol_data$G0 - sol_data$D0
sol_data$B0 <- ifelse(sol_data$B0 < 0, 0, sol_data$B0)
# This takes the data and assigns the timestamp to a certain format and timezone
idxLocal <- with(sol_data, as.POSIXct(time, format='%Y-%m-%d %H:%M:%S', tz = 'CST6CDT'))
# This converts the timestamp to solar time
idx <- local2Solar(idxLocal, lon = -97.7428)
# Creates a zoo object needed to make the Meteo file for input
z <- zoo(sol_data[,c('G0', 'D0', 'B0', 'Ta')], idx)
# local latitude
lat = 30.2669
# Creates a Meteo file
My_Meteo <- zoo2Meteo(z, lat=lat)
# Finds the start and end date of the input file
start <- idx[1]
end <- idx[length(idx)]
# Returns a base time for the calculations
BTd <- fBTd(mode = 'serie', year = '2013', start = start, end = end, format = '%Y-%m-%d %H:%M:%S')
# Computes the movement of the sun/earth
sol <- calcSol(lat = 30.2669, BTd, sample = 'min')
# Creates a G0 file for solar rad on horizontal surface
compI <- calcG0(30.2669, modeRad = 'bdI', dataRad = My_Meteo, corr = 'none')
# creates the angles for calculation of the rad on a tilted surface
angGen <- fTheta(sol = sol, beta = 0, alfa = 0)
# Calculates the irradiance on a tilted surface
irad_tilt <- fInclin(compI, angGen)
When I use beta = 0, alfa = 0 (a flat plane) I should get roughly the same output as my input. However, when I search for the max value of global horizontal irradiance:
x <- which.max(irad_tilt$G)
irad_tilt[x,]
I get a max at 2013-05-05 10:43:01, and I cannot figure out what this time is or why. It is not local time, which should be around 13:24. Local solar time should be around 12:00. UTC time should be around 18:24, and UTC solar time (if there is such a thing) should be 17:00...
I know this is obscure, but any thoughts?
I have tested the code and data on my computer with correct results. Let's reproduce the main steps with some graphical outputs:
library(solaR)
sol_data <- read.csv('/tmp/one_day_WSL_8.csv')
## The data must be named a certain way.
names(sol_data) <- c('time', 'G0', 'B', 'D0', 'Ta')
## The negatives are an artifact of the sensor and are set to 0.
sol_data$G0 <- ifelse(sol_data$G0 < 0, 0, sol_data$G0)
sol_data$B <- ifelse(sol_data$B < 0, 0, sol_data$B)
sol_data$D0 <- ifelse(sol_data$D0 < 0, 0, sol_data$D0)
## This calculates the beam incidence on the horizontal plane.
sol_data$B0 <- sol_data$G0 - sol_data$D0
sol_data$B0 <- ifelse(sol_data$B0 < 0, 0, sol_data$B0)
## This takes the data and assigns the timestamp to a certain format and timezone
idxLocal <- with(sol_data, as.POSIXct(time, format='%Y-%m-%d %H:%M:%S', tz = 'CST6CDT'))
The function local2Solar converts the time zone of a POSIXct object to mean solar time, setting its time zone to UTC as a synonym for mean solar time. It includes two corrections: the difference in longitude between the location and the time zone meridian, and daylight saving time.
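As a rough, back-of-envelope check of the size of those two corrections for this location (my arithmetic, assuming the CST6CDT reference meridian is -90 degrees):
lon_corr <- (-97.7428 - (-90)) * 4  # minutes; 4 minutes per degree of longitude
dst_corr <- -60                     # minutes, while daylight saving time is active
lon_corr + dst_corr                 # about -91 minutes in summer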
idx <- local2Solar(idxLocal, lon = -97.7428)
## Creates a zoo object needed to make the Meteo file for input
z <- zoo(sol_data[,c('G0', 'D0', 'B0', 'Ta')], idx)
Because your data is from a clear day and this time series uses mean solar time, the maximum should be located around noon.
xyplot(z, type=c('l', 'g'))
Now we compute the sun geometry with calcSol. Here I am using different code from yours.
## local latitude
lat = 30.2669
## Computes the movement of the sun/earth
sol <- calcSol(lat, BTi=idx)
xyplot(as.zooI(sol), type=c('l', 'g'))
Next we calculate radiation on the horizontal surface.
g0 <- calcG0(lat, modeRad = 'bdI', dataRad = z, corr = 'none')
xyplot(as.zooI(g0), type=c('l', 'g'))
Finally, with calcGef we obtain irradiance on a tilted surface:
gef <- calcGef(lat=lat, modeRad='bdI', dataRad=z)
xyplot(as.zooI(gef), type=c('l', 'g'))
I suspect that your problem is related with time zones defined in
your computer. Could you check these results?:
lonHH('America/Chicago')
## [1] -1.570796
lonHH('CST6CDT')
## [1] -1.570796
idxLocal1 <- as.POSIXct(sol_data$time, format='%Y-%m-%d %H:%M:%S', tz = 'CST6CDT')
idxLocal2 <- as.POSIXct(sol_data$time, format='%Y-%m-%d %H:%M:%S', tz = 'America/Chicago')
idxUTC1 <- as.POSIXct(format(idxLocal1, tz='UTC'), tz='UTC')
idxUTC2 <- as.POSIXct(format(idxLocal2, tz='UTC'), tz='UTC')
all.equal(idxUTC1, idxUTC2)
## [1] TRUE
Maybe these technical notes are useful for additional information on
this topic:
Ripley, B. D. and Hornik, K. (2001). Date-time classes. R News, 1(2), 8-11.
Grothendieck, G. and Petzoldt, T. (2004). Date and Time Classes in R. R News, 4(1), 29-32.
Besides, you should take a look at the information and examples in help(timezone).
Thank you greatly for responding directly, and for the great package. It turns out we had a wildly wrong interpretation of solar time. I am also seeing a different possible issue, which would not fit into the comments section.
When I run:
local2Solar(as.POSIXct("2013-07-07 13:36:00",tz="America/Chicago"),lon=-97.7428)
I get "2013-07-07 12:05:01 UTC". According to NOAA, "2013-07-07 13:36:00" is solar noon for that day.
Just to confuse matters, when I run:
local2Solar(as.POSIXct("2013-06-07 13:30:00",tz="America/Chicago"),lon=-97.7428)
I get "2013-06-07 11:59:01 UTC", so it appears to be very close. According to NOAA, "2013-06-07 13:30:00" is solar noon for that day.
If you were to run:
local2Solar(as.POSIXct("2013-01-07 12:37:27",tz="America/Chicago"),lon=-97.7428)
You would get "2013-01-07 12:06:28 UTC". According to NOAA, "2013-01-07 12:37:27" is solar noon for that day.
I ran G. Masters' equations separately from solaR and got "2013-06-07 13:29:30 CDT" (the highest precision is one minute in this version) as the time of maximum incident power for the "2013-06-07" case.
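For what it's worth, the leftover few-minute offsets above look consistent with the equation of time: local2Solar() returns mean solar time, while NOAA's solar noon is true solar time. A rough check with a common textbook approximation (my addition, not from solaR):
# approximate equation of time in minutes (true minus mean solar time); n = day of year
eot <- function(n) {
  b <- 2 * pi * (n - 81) / 364
  9.87 * sin(2 * b) - 7.53 * cos(b) - 1.5 * sin(b)
}
eot(c(7, 158, 188))
## about -6.2, +1.3 and -4.6 minutes, matching the ~6, ~1 and ~5 minute gaps
## seen above for 2013-01-07, 2013-06-07 and 2013-07-07 respectively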