I have a Geocode dataset with three columns: Latitude, Longitude and Cluster. I calculated the average center of each cluster and stored the results in two lists, Center_lat and Center_lon.
Now I want to calculate the distance from each observation (3000+) to each cluster center (30) with the Haversine formula, to get a 3000 by 30 matrix.
I tried to use a nested for loop, but I got the same distance for all clusters. Here's the code.
for (i in 1:n){
for (k in 1:c){
lat1=radians(Geocode[i,1])
lon1=radians(Geocode[i,2])
lat2=radians(Center_lat[k,2])
lon2=radians(Center_lon[k,2])
}
R <- 3958.756 # Earth mean radius [miles]
dist_mat[i,] <- acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(lon2-lon1)) * R
}
I'm also thinking of using lapply to replace the nested loop, but I'm not sure how to use the function. Any help is appreciated.
# Convert to radian
radians = function(theta=0){return(theta * pi / 180)}
# Calculates the geodesic distance from each property to the center of its current cluster using the
# Spherical Law of Cosines (slc)
get_dist <- function(lat1, lon1, lat2, lon2) {
R <- 3958.756 # Earth mean radius [miles]
d <- acos(sin(radians(lat1))*sin(radians(lat2)) +
cos(radians(lat1))*cos(radians(lat2)) * cos(radians(lon2)-radians(lon1))) * R
return(d) # Distance in miles
}
dist_mat<-lapply()
This is the type of computation you want to vectorize in R. Here we use outer to generate all the possible combinations of row indices from your Geocode data and Center_x data, and then apply the distance function in one fell swoop.
First, get data in easier to use form (one matrix for locations, another for centers, first column lats, second lons):
# See Data section below for actual data used
# G <- radians(Geocode)
# C <- radians(cbind(Center_lat[, 2], Center_lon[, 2]))
R <- 3958.756 # Earth mean radius [miles]
Define the function. Notice how we use the indices to look up the actual coordinates in G and C, and how the function is vectorized (i.e. we only need to call it once with all the data):
my_dist <- function(xind, yind)
acos(
sin(G[xind, 1]) * sin(C[yind, 1]) +
cos(G[xind, 1]) * cos(C[yind, 1]) * cos(C[yind, 2] - G[xind, 2])
) * R
And apply it with outer:
DISTS <- outer(seq.int(nrow(G)), seq.int(nrow(C)), my_dist)
str(DISTS)
# num [1:3000, 1:30] 4208 6500 8623 7303 3864 ...
quantile(DISTS) # to make sure stuff is reasonable:
# 0% 25% 50% 75% 100%
# 0.000 4107.574 6204.799 8333.155 12422.059
This runs in about 30ms on my system.
Data:
set.seed(1)
lats <- runif(10000, -60, 60) * pi / 180
lons <- runif(10000, -179, 180) * pi / 180
G.ind <- sample(10000, 3000)
C.ind <- sample(10000, 30)
G <- cbind(lats[G.ind], lons[G.ind])
C <- cbind(lats[C.ind], lons[C.ind])
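The question asks for the Haversine formula, while the code above uses the spherical law of cosines. The two agree closely, but the Haversine form tends to be better conditioned for very short distances. A minimal sketch of the same outer approach with Haversine follows; the helper name haversine and the small G and C matrices here are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper (an assumption, not from the original answer):
# haversine distance in miles, inputs in radians, vectorized over all arguments.
haversine <- function(lat1, lon1, lat2, lon2, R = 3958.756) {
  a <- sin((lat2 - lat1) / 2)^2 +
    cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2
  2 * R * asin(pmin(1, sqrt(a)))  # pmin guards against rounding slightly above 1
}

# Made-up example data: 3 locations and 2 centers (lat, lon), degrees -> radians
G <- cbind(c(40.7, 34.1, 41.9), c(-74.0, -118.2, -87.6)) * pi / 180
C <- cbind(c(39.0, 36.0), c(-77.0, -115.0)) * pi / 180

# Same index-based outer() pattern as above, one call for all pairs
hav_dist <- outer(
  seq_len(nrow(G)), seq_len(nrow(C)),
  function(i, k) haversine(G[i, 1], G[i, 2], C[k, 1], C[k, 2])
)
dim(hav_dist)  # rows are locations, columns are centers
```

With the real data, G and C would be built as in the commented lines of the answer (radians, latitude in column 1, longitude in column 2).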
It seems like you want to write to the matrix once per row and per column, so you'd want to change the matrix within both for loops, like this:
for (i in 1:n){
for (k in 1:c){
lat1=radians(Geocode[i,1])
lon1=radians(Geocode[i,2])
lat2=radians(Center_lat[k,2])
lon2=radians(Center_lon[k,2])
R <- 3958.756 # Earth mean radius [miles]
dist_mat[i,k] <- acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(lon2-lon1)) * R
}
}
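In the original loop the assignment sat outside the inner loop, so only the last center's lat2 and lon2 were used and the same value was written to the whole row. The sketch below shows the corrected loop on made-up data with dist_mat preallocated, next to an sapply version that relies on get_dist being vectorized over its first two arguments. The vectors center_lat and center_lon are assumptions standing in for the question's Center_lat and Center_lon objects, whose exact structure is not shown.

```r
radians <- function(theta) theta * pi / 180

# Spherical law of cosines, inputs in degrees, result in miles (as in the question)
get_dist <- function(lat1, lon1, lat2, lon2, R = 3958.756) {
  acos(sin(radians(lat1)) * sin(radians(lat2)) +
         cos(radians(lat1)) * cos(radians(lat2)) *
         cos(radians(lon2) - radians(lon1))) * R
}

# Made-up data in degrees (assumption: centers stored as plain numeric vectors)
Geocode <- data.frame(Latitude = c(40.7, 34.1, 41.9),
                      Longitude = c(-74.0, -118.2, -87.6))
center_lat <- c(39.0, 36.0)
center_lon <- c(-77.0, -115.0)

n <- nrow(Geocode)
n_centers <- length(center_lat)

# Preallocate, then fill one cell per (observation, center) pair
dist_mat <- matrix(NA_real_, nrow = n, ncol = n_centers)
for (i in seq_len(n)) {
  for (k in seq_len(n_centers)) {
    dist_mat[i, k] <- get_dist(Geocode[i, 1], Geocode[i, 2],
                               center_lat[k], center_lon[k])
  }
}

# sapply over centers: each call returns a full column of distances
dist_sapply <- sapply(seq_len(n_centers), function(k) {
  get_dist(Geocode$Latitude, Geocode$Longitude, center_lat[k], center_lon[k])
})

all.equal(dist_mat, dist_sapply)  # both approaches should agree
```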
I want to predict vegetation health using 2 remote sensing vegetation indices (VIs) for multiple tree-stands across multiple months. I previously approached this by using a for() loop to iterate through a list of multi-band rasters and calculate the two VIs for each raster (month) using a given equation. I then used raster::extract() to extract the pixels corresponding to each stand. However, I now would like to include some additional variables in my predictions of vegetation health, and am having trouble integrating them using the same method as they are simply columns in a dataframe and not rasters. I'm open to different ways to do this, I just can't think of any.
example:
#Part 1: Loading libraries and creating some sample data
library(sf)
library(raster)
library(terra)
#polygons to generate random points into
v <- vect(system.file("ex/lux.shp", package="terra"))
v <- v[c(1:12)]
v_sf <- st_as_sf(v) # Convert 'SpatVector' into 'sf' object
#5 rasters (months) with 5 bands each
r <- rast(system.file("ex/elev.tif", package="terra"))
r <- rep(r, 5) * 1:5
names(r) <- paste0("band", 1:5)
ras_list <- list(r,r,r,r,r)
#generating some points (10 forest stands)
pnts <- st_sample(v_sf, size = 10, type = "random")
pnts<- as_Spatial(pnts)
#Part 2: Loop to predict vegetation health using two VI variables
vis <- list() #empty list to store NDVI rasters
for (i in seq_along(ras_list)) {
b <- ras_list[[i]]
#vegetation health = 1.23 + (0.45 * VI1) - (0.67 * VI2)
vis[i] <- 1.23 + 0.45*((b[[4]] + b[[3]] - b[[1]]) / (b[[4]] + b[[3]])) - 0.67*(b[[1]] * b[[3]] - b[[4]])
}
#Part 3: Loop to extract pixel values for each forest stand
vi_vals <- list() #empty list to store extracted pixel values
for (i in 1:length(vis)) {
n <- raster(vis[[i]])
vi_vals[[i]] <- raster::extract(n, pnts, method = "bilinear")
}
This method works fine but as I mentioned above, I now need to repeat the same process using a new equation which incorporates variables that can't be calculated from a raster. These values are simply 3 columns in a dataframe that are identified by a stand ID.
Let's first simplify your example a bit
Example data
library(terra)
v <- vect(system.file("ex/lux.shp", package="terra"))
r <- rast(system.file("ex/elev.tif", package="terra"))
r <- rep(r, 5) * 1:5
names(r) <- paste0("b", 1:5)
ras_list <- list(r,r,r,r,r)
set.seed(1)
pnts <- spatSample(v, 10, "random")
values(pnts) = data.frame(id=1:10, a=5:14, b=3:12, d=6:15)
Compute VI and extract
vis <- list()
for (i in seq_along(ras_list)) {
b <- ras_list[[i]]
vis[[i]] <- 1.23 + 0.45*((b[[4]] + b[[3]] - b[[1]]) / (b[[4]] + b[[3]])) - 0.67*(b[[1]] * b[[3]] - b[[4]])
}
vis <- rast(vis)
names(vis) = paste0("set", 1:5)
vi_vals <- extract(vis, pnts, method = "bilinear")
And now you can do something with the tree parameters
out <- t(t(vi_vals[,-1])) * pnts$a + pnts$b / pnts$d
It would be more efficient to first extract the values and then apply the function
e <- list()
for (i in seq_along(ras_list)) {
x <- extract(ras_list[[i]], pnts, method="bilinear")[,-1]
e[[i]] = (1.23 + 0.45*((x$b4 + x$b3 - x$b1) / (x$b4 + x$b3)) - 0.67*(x$b1 * x$b3 - x$b4)) * pnts$a + pnts$b / pnts$d
}
e <- do.call(cbind, e)
The results are not exactly the same; I assume because of loss of decimal number precision in one or the other method.
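If the extra stand variables live in a separate data frame rather than on the points themselves, they can be joined to the extracted values by stand ID before applying the equation. The following is a minimal base R sketch with made-up numbers; the column names ID, set1, set2, a, b and d and the data frame stand_vars are assumptions chosen to mirror the example above.

```r
# Made-up extracted VI values: one row per stand, one column per month
vi_vals <- data.frame(ID = 1:3,
                      set1 = c(0.5, 0.6, 0.7),
                      set2 = c(0.4, 0.5, 0.6))

# Made-up stand-level variables keyed by the same ID (deliberately unordered)
stand_vars <- data.frame(ID = c(3, 1, 2),
                         a = c(7, 5, 6),
                         b = c(5, 3, 4),
                         d = c(8, 6, 7))

# merge() aligns the rows by ID, so the row order of the inputs does not matter
m <- merge(vi_vals, stand_vars, by = "ID")

# Apply the same illustrative equation to every monthly column at once;
# the per-stand vectors recycle down each column of the matrix
vi_cols <- c("set1", "set2")
out <- as.matrix(m[, vi_cols]) * m$a + m$b / m$d
out
```

Merging first avoids relying on the points and the data frame being in the same order.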
I am attempting to sample along multiple lines (roads) at regular intervals and am struggling to obtain exact perpendicular angles for each road segment. I have split each road into points giving the node at which each line changes orientation and what I have so far creates a point within a straight segment of each road and appears to be working fine.
This is the code I am using to produce perpendicular angles for each node segment.
# X and Y for 3 points along a line
road_node <- matrix(
c(
381103, 381112, 381117,
370373, 370301, 370290
),
ncol = 2,
)
road_node <- as.data.frame(road_node)
angle_inv <- c()
for (i in 2:nrow(road_node) - 1) {
n1 <- road_node[i, ]
n2 <- road_node[i + 1, ]
x <- as.numeric(n1[1] - n2[1])
y <- as.numeric(n1[2] - n2[2])
ang <- atan2(y, x) + 1 / 2 * pi
if (!is.na(ang) && ang < 0) {
ang <- 2 + ang
}
angle_inv <- rbind(angle_inv, ang)
}
Where road_node gives the coordinates of each node.
From this I take the mid points and the inverse angles to create two points either side of the mid points, to produce a line segment.
# X, Y and angles (angles for one segment are the same)
mids <- matrix(
c(
381374.5, 381351.0, 381320.5,
371590.5,371560.0, 371533.590,
2.3, 2.3, 2.3
),
nrow = 3,
)
mids <- as.data.frame(mids)
pts <- c()
for (i in 1:nrow(mids)) {
x1 <- mids[i, 1] + 10 * cos(mids[i, 3])
y1 <- mids[i, 2] + 10 * sin(mids[i, 3])
x2 <- mids[i, 1] - 10 * cos(mids[i, 3])
y2 <- mids[i, 2] - 10 * sin(mids[i, 3])
p1 <- cbind(x1, y1)
p2 <- cbind(x2, y2)
pair <- rbind(p1, p2)
pts <- rbind(pts, pair)
}
Some line segments appear to be correctly perpendicular to the node they are associated with, however some are not. Each appears to correctly share the same length.
I believe the problem lies with either how I am selecting my angles using atan2, or with how I am selecting my points either side of the node segment.
Firstly, there's no need to use trigonometry to solve this. Instead you can take the negative reciprocal of the slope from the slope-intercept form of the line segment equation, then calculate points on a perpendicular line passing through a given point.
See Equation from 2 points using Slope Intercept Form
Also your mid points appear incorrect and there are only 2 mid points as 3 points = 2 line segments.
This code appears to work fine
# Function to calculate mid points
mid_point <- function(p1,p2) {
return(c(p1[1] + (p2[1] - p1[1]) / 2,p1[2] + (p2[2] - p1[2]) / 2))
}
# Function to calculate slope of line between 2 points
slope <- function(p1,p2) {
return((p2[2] - p1[2]) / (p2[1] - p1[1]))
}
# Function to calculate intercept of line passing through given point with slope m
calc_intercept <- function(p,m) {
return(p[2] - m * p[1])
}
# Function to calculate y for a given x, slope m and intercept b
calc_y <- function(x,m,b) {
return(c(x, m * x + b))
}
# X and Y for 3 points along a line
road_node <- matrix(
c(
381103, 381112, 381117,
370373, 370301, 370290
),
ncol = 2,
)
road_node <- as.data.frame(road_node)
perp_segments <- c()
for (i in seq_len(nrow(road_node) - 1)) {
n1 <- road_node[i, ]
n2 <- road_node[i + 1, ]
# Calculate mid point
mp <- mid_point(n1,n2)
# Calculate slope
m <- slope(n1,n2)
# Calculate intercept substituting n1
b <- calc_intercept(n1,m)
# Calculate negative reciprocal of slope
new_m <- -1.0 / m
# Calculate intercept of perpendicular line through mid point
new_b <- calc_intercept(mp,new_m)
# Calculate points 10 units away in x direction at mid_point
p1 <- rbind(calc_y(as.numeric(mp[1])-10,new_m,new_b))
p2 <- rbind(calc_y(as.numeric(mp[1])+10,new_m,new_b))
# Add point pair to output vector
pair <- rbind(p1,p2)
perp_segments <- rbind(perp_segments,pair)
}
This is how it looks geometrically (image)
I hope this helps.
Edit 1:
I thought about this more and came up with this simplified function. If you think of the problem as a right isosceles triangle (45, 45, 90), then all you need to do is find the point which is the required distance from the reference point interpolated along the line segment, then swap its x and y distances from the reference point, then add and subtract these from the reference point.
Function calc_perp
Arguments:
p1, p2 - two point vectors defining the end points of the line segment
n - the distance from the line segment
interval - the interval along the line segment of the reference point from the start (default 0.5)
proportion - Boolean defining whether the interval is a proportion of the length or a constant (default TRUE)
# Function to calculate Euclidean distance between 2 points
euclidean_distance <-function(p1,p2) {
return(sqrt((p2[1] - p1[1])**2 + (p2[2] - p1[2])**2))
}
# Function to calculate 2 points on a line perpendicular to another defined by 2 points p1,p2
# For point at interval, which can be a proportion of the segment length, or a constant
# At distance n from the source line
calc_perp <-function(p1,p2,n,interval=0.5,proportion=TRUE) {
# Calculate x and y distances
x_len <- p2[1] - p1[1]
y_len <- p2[2] - p1[2]
# If proportion calculate reference point from tot_length
if (proportion) {
point <- c(p1[1]+x_len*interval,p1[2]+y_len*interval)
}
# Else use the constant value
else {
tot_len <- euclidean_distance(p1,p2)
point <- c(p1[1]+x_len/tot_len*interval,p1[2]+y_len/tot_len*interval)
}
# Calculate the x and y distances from reference point to point on line n distance away
ref_len <- euclidean_distance(point,p2)
xn_len <- (n / ref_len) * (p2[1] - point[1])
yn_len <- (n / ref_len) * (p2[2] - point[2])
# Swap the x and y lengths and add/subtract from the reference point
ref_points <- rbind(point,c(point[1] + yn_len,point[2] - xn_len),c(point[1] - yn_len,point[2] + xn_len))
# Return the reference points
return(ref_points)
}
Examples
> calc_perp(c(0,0),c(1,1),1)
[,1] [,2]
point 0.5000000 0.5000000
1.2071068 -0.2071068
-0.2071068 1.2071068
> calc_perp(c(0,0),c(1,1),sqrt(2)/2,0,proportion=FALSE)
[,1] [,2]
point 0.0 0.0
0.5 -0.5
-0.5 0.5
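The slope-based code earlier in this answer divides by zero when a road segment is exactly vertical or horizontal. One way around that is to work with the segment's unit normal vector instead of its slope. The function below is a sketch under that assumption; perp_points is a hypothetical name, it expects plain numeric vectors (a data frame row would need as.numeric first), and it is not the answer's calc_perp.

```r
# Hypothetical helper: two points at distance n either side of a segment p1 -> p2,
# measured perpendicular to the segment at fraction t along it (0.5 = mid point).
perp_points <- function(p1, p2, n, t = 0.5) {
  d <- p2 - p1                      # direction vector of the segment
  len <- sqrt(sum(d^2))             # segment length
  stopifnot(len > 0)                # a zero-length segment has no direction
  ref <- p1 + t * d                 # reference point on the segment
  normal <- c(-d[2], d[1]) / len    # direction rotated by 90 degrees, unit length
  rbind(ref = ref, left = ref + n * normal, right = ref - n * normal)
}

# First segment of the example road, offsets of 10 map units
segs <- perp_points(c(381103, 370373), c(381112, 370301), 10)
segs

# A vertical segment works too, with no special casing
perp_points(c(0, 0), c(0, 2), 1)
```

Because the normal has unit length, both offset points are exactly n units from the segment regardless of its orientation.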
This is how the revised function looks geometrically with your example and n = 10 for distance from line:
I am starting in R and trying to get this loop to execute. I am trying to get the loop to calculate consecutive distances between coordinates using a function (Vincenty's formula). 'Distfunc' is the file containing the function. The function is then called and its result stored in 'x' below. All I want is a data frame or a list of the distances between coordinates. Grateful for any help!
Distfunc <- source("F://Distfunc.R")
for (i in length(Radians)) {
LatRad1 <- Radians[i,1]
LongRad1 <- Radians[i,2]
LatRad2 <- Radians[i+1,1]
LongRad2 <- Radians[i+1,2]
x <- gcd.vif(LongRad1, LatRad1, LongRad2, LatRad2)
print(data.frame(x[i]))
}
Well, without a good description of the problem you are facing and a proper reproducible example it is very difficult to provide any good insight. To start off, see How to make a great R reproducible example?.
There are many things that are not clear in the way you are doing things. First of all, why assign the results of source(...) to the variable Distfunc?
Anyways, here is some code that I put together in trying to understand this; it runs without problems, but it is not clear that it accomplishes what you expect (since you don't provide much information). In particular, the code uses the implementation of the function gcd.vif by Mario Pineda-Krch (http://www.r-bloggers.com/great-circle-distance-calculations-in-r/). The code below is aimed at clarity, since you mention that you are starting in R.
# Calculates the geodesic distance between two points specified by radian latitude/longitude using
# Vincenty inverse formula for ellipsoids (vif)
# By Mario Pineda-Krch (http://www.r-bloggers.com/great-circle-distance-calculations-in-r/)
gcd.vif <- function(long1, lat1, long2, lat2) {
# WGS-84 ellipsoid parameters
a <- 6378137 # length of semi-major axis of the ellipsoid (radius at equator)
b <- 6356752.314245 # length of semi-minor axis of the ellipsoid (radius at the poles)
f <- 1/298.257223563 # flattening of the ellipsoid
L <- long2-long1 # difference in longitude
U1 <- atan((1-f) * tan(lat1)) # reduced latitude
U2 <- atan((1-f) * tan(lat2)) # reduced latitude
sinU1 <- sin(U1)
cosU1 <- cos(U1)
sinU2 <- sin(U2)
cosU2 <- cos(U2)
cosSqAlpha <- NULL
sinSigma <- NULL
cosSigma <- NULL
cos2SigmaM <- NULL
sigma <- NULL
lambda <- L
lambdaP <- 0
iterLimit <- 100
while (abs(lambda-lambdaP) > 1e-12 & iterLimit>0) {
sinLambda <- sin(lambda)
cosLambda <- cos(lambda)
sinSigma <- sqrt( (cosU2*sinLambda) * (cosU2*sinLambda) +
(cosU1*sinU2-sinU1*cosU2*cosLambda) * (cosU1*sinU2-sinU1*cosU2*cosLambda) )
if (sinSigma==0) return(0) # Co-incident points
cosSigma <- sinU1*sinU2 + cosU1*cosU2*cosLambda
sigma <- atan2(sinSigma, cosSigma)
sinAlpha <- cosU1 * cosU2 * sinLambda / sinSigma
cosSqAlpha <- 1 - sinAlpha*sinAlpha
cos2SigmaM <- cosSigma - 2*sinU1*sinU2/cosSqAlpha
if (is.na(cos2SigmaM)) cos2SigmaM <- 0 # Equatorial line: cosSqAlpha=0
C <- f/16*cosSqAlpha*(4+f*(4-3*cosSqAlpha))
lambdaP <- lambda
lambda <- L + (1-C) * f * sinAlpha *
(sigma + C*sinSigma*(cos2SigmaM+C*cosSigma*(-1+2*cos2SigmaM*cos2SigmaM)))
iterLimit <- iterLimit - 1
}
if (iterLimit==0) return(NA) # formula failed to converge
uSq <- cosSqAlpha * (a*a - b*b) / (b*b)
A <- 1 + uSq/16384*(4096+uSq*(-768+uSq*(320-175*uSq)))
B <- uSq/1024 * (256+uSq*(-128+uSq*(74-47*uSq)))
deltaSigma = B*sinSigma*(cos2SigmaM+B/4*(cosSigma*(-1+2*cos2SigmaM^2) -
B/6*cos2SigmaM*(-3+4*sinSigma^2)*(-3+4*cos2SigmaM^2)))
s <- b*A*(sigma-deltaSigma) / 1000
return(s) # Distance in km
}
# Initialize the variable 'Radians' with random data
Radians <- matrix(runif(20, min = 0, max = 2 * pi), ncol = 2)
lst <- list() # temporary list to store the results
for (i in seq(1, nrow(Radians) - 1)) { # loop through each row of the 'Radians' matrix
LatRad1 <- Radians[i, 1]
LongRad1 <- Radians[i, 2]
LatRad2 <- Radians[i + 1, 1]
LongRad2 <- Radians[i + 1, 2]
gcd_vif <- gcd.vif(LongRad1, LatRad1, LongRad2, LatRad2)
# Store the input data and the results
lst[[i]] <- c(
latitude_position_1 = LatRad1,
longitude_position_1 = LongRad1,
latitude_position_2 = LatRad2,
longitude_position_2 = LongRad2,
GCD = gcd_vif
)
}
Results <- as.data.frame(do.call(rbind, lst)) # store the input data and the results in a data frame
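The same consecutive-pair pattern can also be written without an explicit loop by pairing rows 1 to n-1 with rows 2 to n and calling the distance function through mapply. Since gcd.vif uses a while loop and early returns, it is not vectorized, which is why mapply is used rather than a single call. To keep this sketch self-contained, dist_fun below is a short spherical stand-in with the same argument order as gcd.vif; it is an assumption for illustration, and gcd.vif could be passed to mapply in its place.

```r
# Stand-in for gcd.vif (assumption): haversine distance in km, inputs in radians,
# same argument order (long1, lat1, long2, lat2)
dist_fun <- function(long1, lat1, long2, lat2, R = 6371) {
  a <- sin((lat2 - lat1) / 2)^2 +
    cos(lat1) * cos(lat2) * sin((long2 - long1) / 2)^2
  2 * R * asin(pmin(1, sqrt(a)))
}

# Made-up coordinates in radians: column 1 latitude, column 2 longitude
Radians <- cbind(lat = c(0.10, 0.20, 0.30, 0.40),
                 long = c(1.00, 1.10, 1.20, 1.30))

n <- nrow(Radians)
from <- Radians[-n, , drop = FALSE]  # rows 1 .. n-1
to <- Radians[-1, , drop = FALSE]    # rows 2 .. n

# One call per consecutive pair
GCD <- mapply(dist_fun, from[, "long"], from[, "lat"], to[, "long"], to[, "lat"])

Results <- data.frame(
  latitude_position_1 = from[, "lat"],
  longitude_position_1 = from[, "long"],
  latitude_position_2 = to[, "lat"],
  longitude_position_2 = to[, "long"],
  GCD = GCD
)
Results
```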
So I have a formula for calculating the distance between two locations; however, I want to have a second function that uses this one.
The second function would take a list of lat, long values with corresponding numbers 1, 2, 3, etc. Then, for each point on the list, it would run GPSDist against all the other lat, long values in the list, find the one with the smallest distance (d), and report which number that lat, long corresponds to.
GPSDist <- function(lon1,lat1,lon2,lat2,u){
if (u=="k"){R<-6373}
if (u=="m"){R<-6378137}
if (u=="M"){R<-3961}
if (u=="k"){print("unit = kilometers")}
if (u=="m"){print("unit = meters")}
if (u=="M"){print("unit = miles")}
radians <- 0.0174532925
lon1 <- lon1* radians
lat1 <- lat1* radians
lon2 <- lon2* radians
lat2 <- lat2* radians
dlon <- lon2-lon1
dlat <- lat2-lat1
a <- ((sin((dlat)/2))^2) + cos(lat1) * cos(lat2) * ((sin((dlon)/2))^2)
c <- 2 * atan2(sqrt(a), sqrt(1-a))
d = c*R
return(d)
}
For clarity, I would like to use that function (GPSDist) on each point. Say there is a data set of points 1 to 5:
use point 1 against points 2, 3, 4, 5, find the smallest distance and list it by its respective number;
use point 2 against points 1, 3, 4, 5, etc.
First, it always helps if you define an example data set. So, here is one for you:
point1 <- c(1,3)
point2 <- c(4,5)
point3 <- c(7,9)
point4 <- c(11,13)
point5 <- c(89,5)
pointList <- list(point1, point2, point3, point4, point5)
Next, this can probably be solved with nested for loops.
GPDistOthers <- function(inputList, u) {
output <- list()
for (i in 1:length(inputList)) {
currentList <- list()
for (j in 1:length(inputList)) {
if (i != j) {
currentList <- c(currentList, GPSDist(inputList[[i]][1], inputList[[i]][2], inputList[[j]][1], inputList[[j]][2], u))
}
}
output[[length(output)+1]] <- currentList
}
return(output)
}
If you are trying to run this many times, you will make your code run much faster by implementing this using lapply as described here: Access lapply index names inside FUN
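The function above returns every distance but stops short of picking the closest point. A minimal sketch of that last step follows: build the full distance matrix, mask the diagonal so a point cannot match itself, and take which.min per row. Here hav_km is a hypothetical quiet stand-in for GPSDist (the same haversine formula with R = 6373 km, without the print calls), and the points are the example data from this answer, with longitude first.

```r
# Stand-in for GPSDist with u = "k" (assumption): inputs in degrees, result in km
hav_km <- function(lon1, lat1, lon2, lat2, R = 6373) {
  rad <- pi / 180
  dlon <- (lon2 - lon1) * rad
  dlat <- (lat2 - lat1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * R * atan2(sqrt(a), sqrt(1 - a))
}

# Example points as rows: column 1 longitude, column 2 latitude
pts <- rbind(c(1, 3), c(4, 5), c(7, 9), c(11, 13), c(89, 5))
idx <- seq_len(nrow(pts))

# All pairwise distances in one vectorized call
D <- outer(idx, idx, function(i, j) {
  hav_km(pts[i, 1], pts[i, 2], pts[j, 1], pts[j, 2])
})

diag(D) <- Inf                     # exclude each point's distance to itself
nearest <- apply(D, 1, which.min)  # index of the closest other point, per row

data.frame(point = idx,
           nearest = nearest,
           distance_km = D[cbind(idx, nearest)])
```

For large data sets the full matrix grows with the square of the number of points, so this is only a sketch for modest sizes.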
I would like to calculate the density function of a distribution whose characteristic function is known. As a simple example take the normal distribution.
norm.char<-function(t,mu,sigma) exp((0+1i)*t*mu-0.5*sigma^2*t^2)
Then I would like to use R's fft function, but I don't get the multiplicative constants right and I have to reorder the result (take the second half and then the first half of the values). I tried something like
xmax = 5
xmin = -5
deltat = 2*pi/(xmax-xmin)
N=2^8
deltax = (xmax-xmin)/(N-1)
x = xmin + deltax*seq(0,N-1)
t = deltat*seq(0,N-1)
density = Re(fft(norm.char(t*2*pi,mu,sigma)))
density = c(density[(N/2+1):N],density[1:(N/2)])
But this is still not correct. Does anybody know a good reference on the fft in R in the context of density calculations? Obviously the problem is the mixture of the continuous FFT and the discrete one. Can anybody recommend a procedure?
Thanks
It is just cumbersome: take a pen and paper,
write the integral you want to compute
(the Fourier transform of the characteristic function),
discretize it, and rewrite the terms so that they look like
a discrete Fourier transform (the FFT assumes that the interval starts
at zero).
Note that fft is an unnormalized transform: there is no 1/N factor.
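For reference, the pen-and-paper step can be sketched as follows, assuming the convention φ(t) = E[exp(itX)] and using the same symbols as the function that follows (grid x_k = a + kΔx, frequencies t_j = c + jΔt, with Δt·Δx = 2π/n). This is a reconstruction of the derivation, not something stated in the original answer.

```latex
\begin{align*}
f(x) &= \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-\mathrm{i} t x}\, \varphi(t)\, dt
  \;\approx\; \frac{\Delta t}{2\pi} \sum_{j=0}^{n-1} e^{-\mathrm{i} t_j x}\, \varphi(t_j),
  \qquad t_j = c + j\,\Delta t, \\[4pt]
t_j x_k &= (c + j\,\Delta t)(a + k\,\Delta x)
  = c\,x_k + j\,\Delta t\,a + \frac{2\pi j k}{n},
  \qquad \Delta t\,\Delta x = \frac{2\pi}{n}, \\[4pt]
f(x_k) &\approx \frac{\Delta t}{2\pi}\, e^{-\mathrm{i} c x_k}
  \sum_{j=0}^{n-1} \Bigl[ e^{-\mathrm{i} j \Delta t\, a}\, \varphi(t_j) \Bigr]
  e^{-2\pi \mathrm{i}\, j k / n}.
\end{align*}
```

The bracketed term is the vector passed to fft, the sum is the unnormalized discrete Fourier transform that fft computes, and the prefactor is applied afterwards.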
characteristic_function_to_density <- function(
phi, # characteristic function; should be vectorized
n, # Number of points, ideally a power of 2
a, b # Evaluate the density on [a,b[
) {
i <- 0:(n-1) # Indices
dx <- (b-a)/n # Step size, for the density
x <- a + i * dx # Grid, for the density
dt <- 2*pi / ( n * dx ) # Step size, frequency space
c <- -n/2 * dt # Evaluate the characteristic function on [c,d]
d <- n/2 * dt # (center the interval on zero)
t <- c + i * dt # Grid, frequency space
phi_t <- phi(t)
X <- exp( -(0+1i) * i * dt * a ) * phi_t
Y <- fft(X)
density <- dt / (2*pi) * exp( - (0+1i) * c * x ) * Y
data.frame(
i = i,
t = t,
characteristic_function = phi_t,
x = x,
density = Re(density)
)
}
d <- characteristic_function_to_density(
function(t,mu=1,sigma=.5)
exp( (0+1i)*t*mu - sigma^2/2*t^2 ),
2^8,
-3, 3
)
plot(d$x, d$density, las=1)
curve(dnorm(x,1,.5), add=TRUE)