Calculate 'nearest neighbour' when using group_by() in R - r

I am trying to find the centroid location of the GPS coordinates within each trackline segment, and then find the indexes of the nearest neighbours between these centroid points and the GPS coordinates in the same trackline segment.
So far I have found the centroid of each segment and a method to get the indexes of the nearest neighbours of my GPS coordinates, but I am unable to make R find nearest neighbours only within the same segment.
# Calculate the average Latitude/Longitude for each 'Segment'
data_update <- data %>% as_tibble() %>%
group_by(Segment) %>% mutate(ave_lat = mean(Latitude), ave_lon = mean(Longitude))
# Find the nearest neighbour
install.packages('RANN')
library(RANN)
closest <- RANN::nn2(data_update[,2:3], data_update[,4:5], k = 1, searchtype = "radius", radius = 1)
closest <- sapply(closest, cbind) %>% as_tibble
# closest produces two columns, nn.idx and nn.dists - I need nn.idx only
# data_update[,2] = Longitude values (decimal degrees)
# data_update[,3] = Latitude values (decimal degrees)
# data_update[,4] = average longitude value for each Segment (decimal degrees)
# data_update[,5] = average latitude value for each Segment (decimal degrees)
I need to use nn2 to calculate the nn.idx within each 'Segment' rather than across the entire data frame as the code above is doing.
Does anyone know how to group the nn2 function to calculate the nearest neighbour by 'Segment'? I am open to non-tidyverse options also.
Example data can be found here: https://drive.google.com/file/d/16cZPo6kXIafU0ezAoy8EgB9CHVEDUTe-/view?usp=sharing
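A minimal sketch of one way the per-segment search could be done, assuming the columns are named Segment, Longitude and Latitude as in the code above. The toy data, the row_id helper column and the result name nearest_by_segment are made up for illustration. The idea is to call nn2 once per group via dplyr::group_modify(), so the search only ever sees the points of one segment. Note that nn2 measures plain Euclidean distance on the coordinates it is given, here decimal degrees.

```r
library(dplyr)
library(RANN)

# Toy stand-in for the real trackline data (values are invented)
data <- data.frame(
  Segment   = c("A", "A", "A", "B", "B", "B"),
  Longitude = c(0, 1, 5, 10, 11, 12.5),
  Latitude  = c(0, 0, 0, 20, 20, 20)
)

nearest_by_segment <- data %>%
  mutate(row_id = row_number()) %>%   # position of each fix in the full table
  group_by(Segment) %>%
  group_modify(function(seg, key) {
    # Centroid of this segment only
    centroid <- matrix(c(mean(seg$Longitude), mean(seg$Latitude)), nrow = 1)
    # Search only among the points of this segment
    nn <- RANN::nn2(data  = as.matrix(seg[, c("Longitude", "Latitude")]),
                    query = centroid, k = 1)
    idx <- nn$nn.idx[1, 1]
    tibble(ave_lon = centroid[1, 1],
           ave_lat = centroid[1, 2],
           nn.idx  = idx,               # index within the segment
           row_id  = seg$row_id[idx],   # index within the full table
           nn.dist = nn$nn.dists[1, 1])
  }) %>%
  ungroup()

nearest_by_segment
```

With this toy data the nearest fix to the centroid is the second point of each segment, i.e. rows 2 and 5 of the full table. A base R equivalent would be to split() the data by Segment and lapply() the same nn2 call over the pieces.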

Related

Nearest adjacent polygon, distance and closest point to other polygons

I have multiple polygons in a dataset and I would like to:
Identify the nearest polygon to each polygon and what the distance between them is
Calculate the coordinates of where the nearest parts of the two polygons are (so I can draw a line and visually check the distances)
If the distance is 800 metres or less, join the polygons together to make multipart polygons
This code does half of my first ask and I know st_distance can do the latter. I was hoping for a solution that wouldn't need a matrix of every distance between every polygon to be generated.
library(sf)
library(dplyr)
download.file("https://drive.google.com/uc?export=download&id=1-I4F2NYvFWkNqy7ASFNxnyrwr_wT0lGF" , destfile="ProximityAreas.zip")
unzip("ProximityAreas.zip")
Proximity_Areas <- st_read("Proximity_Areas.gpkg")
Nearest_UID <- st_nearest_feature(Proximity_Areas)
Proximity_Areas <- Proximity_Areas %>%
select(UID) %>%
mutate(NearUID = UID[Nearest_UID])
Is there a method of producing two outputs: 1) an appended Proximity_Areas file that includes the distance and the XY coordinates of the nearest points for the UID and Nearest_UID, and 2) a file that looks similar to the original Proximity_Areas file, just with merged polygons if the criterion is met?
Once you have created index of nearest neighbors you can calculate the connecting lines via a sf::st_nearest_points() call.
An interesting aspect is that if you make the call on geometries (not sf, but sfc objects) you do the calculation pairwise (i.e. not in a matrix way).
The call will return linestrings, which is very helpful since you can calculate their length and so meet two of your objectives (nearest points and distance) in a single call.
lines <- Proximity_Areas %>%
st_geometry() %>% # extract geometry
# create a line to nearest neighbour as geometry
st_nearest_points(st_geometry(Proximity_Areas)[Nearest_UID], pairwise = TRUE) %>%
# make sf again (so it can hold data)
st_as_sf() %>%
# add some data - start, finish, length
mutate(start = Proximity_Areas$UID,
end = Proximity_Areas$UID[Nearest_UID],
distance = st_length(.))
glimpse(lines)
# Rows: 39
# Columns: 4
# $ x <LINESTRING [m]> LINESTRING (273421.5 170677..., LINESTRING (265535.1 166136..., LINESTRING (265363.3 1…
# $ start <chr> "U001", "U002", "U003", "U004", "U005", "U006", "U007", "U008", "U009", "U010", "U011", "U012", "…
# $ end <chr> "U026", "U010", "U013", "U033", "U032", "U014", "U028", "U036", "U011", "U008", "U028", "U030", "…
# $ distance [m] 317.84698 [m], 579.85131 [m], 529.67907 [m], 559.96441 [m], 0.00000 [m], 80.54011 [m], 754.94311 [m…
mapview::mapview(lines)
The part about joining close objects together is a bit tricky, since you don't know how many polygons you will end up with - you can have a polygon A that is far from C, but will end up merged since both are close to B. This does not vectorize easily and you are likely to end up running a while loop. For a possible approach consider this related answer Dissolving polygons by distance - R
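As an alternative to a hand-written while loop, the chaining effect described above (A merges with C because both are close to B) can be treated as finding connected components of a "within 800 m" graph. The sketch below is only an illustration under assumptions: the four toy squares, the EPSG:27700 CRS, the cluster column and the igraph dependency are not part of the original question. It uses sf::st_is_within_distance(), which returns a sparse neighbour list rather than a full distance matrix.

```r
library(sf)
library(dplyr)

# Helper to build a toy square polygon (hypothetical data, metres)
square <- function(x0, y0, size = 100) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
                        c(x0, y0 + size), c(x0, y0))))
}
polys <- st_sf(
  UID = c("U1", "U2", "U3", "U4"),
  geometry = st_sfc(square(0, 0), square(500, 0), square(1000, 0),
                    square(5000, 0), crs = 27700)
)

# Sparse list: for each polygon, the polygons within 800 m of it
neighbours <- st_is_within_distance(polys, dist = 800)

# Drop self-matches and convert to a plain adjacency list
adj <- lapply(seq_along(neighbours), function(i) setdiff(neighbours[[i]], i))

# Connected components: polygons linked directly or through a chain of neighbours
g <- igraph::graph_from_adj_list(adj, mode = "all")
polys$cluster <- igraph::components(g)$membership

# summarise() on an sf object unions the geometries within each group
merged <- polys %>%
  group_by(cluster) %>%
  summarise(n_parts = n(), UIDs = paste(UID, collapse = ","))

merged
```

Here U1 and U3 are 900 m apart but both are within 800 m of U2, so the three end up in one multipart polygon while U4 stays on its own.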

Shortest distance in R

I would like to know how to calculate the shortest distance between two properties (points) for my code below. There are two shapefiles: a points shapefile and a roads shapefile.
For testing, both shapefiles can be downloaded from the following website: https://github.com/JovaniSouza/JovaniSouza5/blob/master/Example.zip
library(sf)
roads <- st_read('Roads/Roads.shp')
pts <- st_read('Points/Points.shp') %>%
st_transform(crs=st_crs(roads))
plot(st_geometry(roads))
plot(st_geometry(pts), add = T, col = 'red', pch = 20)
Example
You can use st_distance to get a distance matrix and find the minimum in each column. I wrote a function that does this and returns a new sf data.frame. It will contain attributes called nearest and distance, which are the index of the nearest point and the distance to that point, respectively. Note that the distances are in meters, reflecting your projection. Your data have repeated points, so some of the points show a distance of zero. If you don't want those points, you will have to remove the duplicates.
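getNearest <- function(shp){
dist <- as.data.frame(st_distance(shp))
for (i in seq_len(ncol(dist))){
# all rows except the point itself
rows <- seq_len(nrow(dist))
rows <- rows[i != rows]
# which.min() indexes into the subset, so map it back to the original row number
nearest <- rows[which.min(dist[rows, i])]
shp[i, 'nearest'] <- nearest
shp[i, 'distance'] <- dist[nearest, i]
}
return(shp)
}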
pts2 <- getNearest(pts)
From what I understand, you are trying to measure the distance along the road between each point and its closest point. Please see a similar workflow here:
https://community.rstudio.com/t/distance-between-points-along-network-path/49596/2

R: Heading/ Direction of Travel/ bearing from .gpx file - tmaptools

Using tmaptools package in R - How can I extract the 'Bearing' information from a .GPX track file. This appears in Garmin Basecamp but does not appear using tmaptools::read_GPX. Currently I use the below code. But surely there is a simpler way? Link to GPS Track: https://www.dropbox.com/s/02p3yyjkv9fmrni/Barron_Thomatis_2019_EOD.gpx?dl=0
library(tmaptools)
library(tmap)
library(sf)
library(tidyverse)
library(geosphere)
library(sp) # for SpatialPoints
GPSTrack <- read_GPX("Barron_Thomatis_2019_EOD.gpx", layers = "track_points", as.sf = TRUE)
#
#Adjust GPS Track Data
#
#Extract Lon & Lat from Track geometry (X = Lon, Y = Lat)
GPSTrack_Pts <- st_coordinates(GPSTrack)
#Add X, Y Columns to Track
GPSTrack2 <- cbind(GPSTrack, GPSTrack_Pts)
#Create a coordinate vector by combining X & Y
coords <- cbind(GPSTrack2$X,GPSTrack2$Y)
#Convert GPS Track into SpatialPoints format for calculating Bearing
GPSTrack_SpPts <- SpatialPoints(coords)
#Create GPS Point Bearing, GPS point distance & GPS Time interval columns
empty <- st_as_sfc("POINT(EMPTY)")
GPSTrack2 <- GPSTrack2 %>%
st_set_crs(4326) %>% # will use great circle distance
mutate(
Bearing = bearing(coords))
#Convert Bearing to Course and Add as column
GPSTrack2 <- GPSTrack2 %>%
mutate(course = (Bearing + 360) %% 360) # add full circle, i.e. +360, and determine modulo for 360
I suggest you use lwgeom::st_geod_azimuth() for this task - it makes for somewhat more concise code.
Note that there is a challenge when adding the vector of bearings back to the spatial data frame of points: by definition it has one element fewer than the number of rows (you need two points to define a bearing).
One way of handling that, if required, is to append a single NA value representing the bearing of the very last point. By definition it has no azimuth, as there is no following point.
The azimuth values are objects of class units, in radians. Should the class create a problem (as it does when concatenating with the NA), you can easily convert it to a plain number via units::drop_units().
library(sf)
library(dplyr)
library(lwgeom)
points <- st_read("Barron_Thomatis_2019_EOD.gpx",
layer = "track_points",
quiet = TRUE,
stringsAsFactors = FALSE)
points <- points %>%
mutate(bearing = c(lwgeom::st_geod_azimuth(.) %>% units::drop_units(), NA))
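Since the azimuths come back in radians while the question asks for a compass-style course, a conversion to degrees in the 0 to 360 range may be wanted. The following is a small sketch on three made-up points rather than the real GPX file. The object name track and the column name course are illustrative assumptions.

```r
library(sf)
library(lwgeom)

# Three invented track points (lon/lat, WGS84): due east first, then due south
track <- st_sf(
  id = 1:3,
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 0)), st_point(c(1, -1)),
                    crs = 4326)
)

# Azimuth between consecutive points, as plain numbers in radians
az_rad <- units::drop_units(lwgeom::st_geod_azimuth(track))

# Convert to degrees, wrap into [0, 360), and pad the last point with NA
track$course <- c((az_rad * 180 / pi) %% 360, NA)

track$course
```

For this toy track the course comes out at approximately 90 (east) and 180 (south), followed by NA for the final point.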

I would like to work out the distance of data points (lat/long) from the edges of a shape file in R and then apply a criterion to the data points?

I have data points of a species observed using camera traps and would like to measure the distance of each camera trap site (CameraStation) to the edge of a national park using R. I have a shapefile of the park (shp) and want to apply a criterion to CameraStation(s) which are <5km from the edge. My data frame (df) consists of multiple events/observations (EventID) per CameraStation. The aim is to analyse when events near the park edge are most frequent given other environmental factors such as Season, Moon Phase and DayNight (also columns in DF).
I found a package called distance in R but this is for distance sampling and not what I want to do. Which package is relevant in this situation?
I expect the following outcome:
EventID CameraStation Distance(km) Within 5km
0001 Station 1 4.3 Yes
0002 Station 1 4.3 Yes
0003 Station 2 16.2 No
0004 Station 3 0.5 Yes
...
Here's a general solution, adapted from Spacedman's answer to this question at gis.stackexchange. Note: This solution requires working in a projected coordinate system. You can transform to a projected CRS if needed using spTransform.
The gDistance function of the rgeos package calculates the distance between geometries, but for the case of points inside a polygon the distance is zero. The trick is to create a new "mask" polygon where the original polygon is a hole cut out from the mask. Then we can measure the distance between points in the hole and the mask, which is the distance to the edge of the original polygon that we really care about.
We'll use the shape file of the Yellowstone National Park Boundary found on this page.
library(sp) # for SpatialPoints and proj4string
library(rgdal) # to read shapefile with readOGR
library(rgeos) # for gDistance, gDifference, and gBuffer
# ab67 was the name of the shape file I downloaded.
yellowstone.shp <- readOGR("ab67")
# gBuffer enlarges the boundary of the polygon by the amount specified by `width`.
# The units of `width` (meters in this case) can be found in the proj4string
# for the polygon.
yellowstone_buffer <- gBuffer(yellowstone.shp, width = 5000)
# gDifference calculates the difference between the polygons, i.e. what's
# in one and not in the other. That's our mask.
mask <- gDifference(yellowstone_buffer, yellowstone.shp)
# Some points inside the park
pts <- list(x = c(536587.281264245, 507432.037861251, 542517.161278414,
477782.637790409, 517315.171218198),
y = c(85158.0056377799, 77251.498952222, 15976.0721391485,
40683.9055315169, -3790.19457474617))
# Sanity checking the mask and our points.
plot(mask)
points(pts)
# Put the points in a SpatialPointsDataFrame with camera id in a data field.
spts.df <- SpatialPointsDataFrame(pts, data = data.frame(Camera = ordered(1:length(pts$x))))
# Give our SpatialPointsDataFrame the same spatial reference as the polygon.
proj4string(spts.df) <- proj4string(yellowstone.shp)
# Calculate distances (km) from points to edge and put in a new column.
spts.df$km_to_edge <- apply(gDistance(spts.df, mask, byid=TRUE),2,min)/1000
# Determine which records are within 5 km of an edge and note in new column.
spts.df$edge <- ifelse(spts.df$km_to_edge < 5, TRUE, FALSE)
# Results
spts.df
# coordinates Camera km_to_edge edge
# 1 (536587.3, 85158.01) 1 1.855010 TRUE
# 2 (507432, 77251.5) 2 9.762755 FALSE
# 3 (542517.2, 15976.07) 3 11.668700 FALSE
# 4 (477782.6, 40683.91) 4 4.579638 TRUE
# 5 (517315.2, -3790.195) 5 8.211961 FALSE
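The rgdal and rgeos packages have since been retired from CRAN, so here is a hedged sketch of the same idea with sf. Instead of building a mask, it measures the distance from each point to the polygon's boundary line via st_boundary(). The square "park", the station coordinates and the UTM zone 12N CRS (EPSG:32612) are invented for illustration and are not the Yellowstone data.

```r
library(sf)

# Toy 40 km x 40 km park in a projected CRS (metres); invented geometry
park <- st_sfc(
  st_polygon(list(rbind(c(0, 0), c(40000, 0), c(40000, 40000),
                        c(0, 40000), c(0, 0)))),
  crs = 32612
)

# Toy camera stations inside the park
stations <- st_as_sf(
  data.frame(CameraStation = c("Station 1", "Station 2", "Station 3"),
             x = c(4300, 16200, 39500),
             y = 20000),
  coords = c("x", "y"), crs = 32612
)

# Distance to the park outline (a line), not to the polygon itself,
# because points inside a polygon are at distance zero from it
edge <- st_boundary(park)
stations$dist_km <- as.numeric(st_distance(stations, edge)[, 1]) / 1000
stations$within_5km <- stations$dist_km < 5

stations
```

The resulting table can then be joined back to the per-event data frame on CameraStation (for example with merge() or dplyr::left_join()) so that every EventID carries its station's distance and the 5 km flag.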
Here's a quick solution.
Simplify the outline of your shapefile into N points. Then calculate the minimum distance for each camera trap to every point in the outline of the national park.
library(geosphere)
library(sp) # for spsample() and coordinates()
n <- 500 ##The number of points summarizing the shapefile
NPs <- ##Your shapefile goes here
NP.pts <- spsample(as(NPs, "SpatialLines"), n = n, type = "regular") ##Sample along the outline rather than the interior
CP.pts <- ## Coordinates for a single trap
distances<-distm(coordinates(CP.pts), coordinates(NP.pts), fun = distHaversine)/1000
##Distance in Km between the trap to each point in the perimeter of the shapefile:
distances
Use distances to find the minimum distance between the shapefile and that given trap. This approach can easily be generalized using for loops or apply functions.
I had a problem with the points data frame and shape file being projected so instead I used the example in this link to answer my question
https://gis.stackexchange.com/questions/225102/calculate-distance-between-points-and-nearest-polygon-in-r
Basically, I used this code;
df # my data frame with points
shp # my shapefile (non-projected)
coordinates(df2)<-~Longitude+Latitude # Longitude and Latitude are columns in my df; set first so dist2Line receives points
dist.mat <- geosphere::dist2Line(p = df2, line = shp)
dmat<-data.frame(dist.mat) # turned it into a data frame
dmat$km5 <- ifelse(dmat$distance < 5000, TRUE, FALSE) # in meters (5000)
coordinates(dmat)<-~lon+lat
df2$distance <- dmat$distance # added new Distance column to my df

How to find point related to set of coordinates?

I have a set of about 5000 geographical (WGS84) coordinates. All of them are inside 40km square.
Is there any algorithm / R function to find point, inside square and not in the given set, farthest from any point from set?
I mean how to find point in the square where the distance to the nearest point from set is longest?
Now I do it by generating grid of coordinates equally spaced and finding distance from each grid point to the nearest set point. Is there any less numerical / not brute force method?
EDIT:
I made mistake in previous version of the question. Maybe this will help:
Set of points are coordinates of the 5000 shops in the city. I want to find place in the city where distance to the nearest shop is the longest.
I think that if the point you seek isn't on the edge of the box then it has to be at a vertex of the Voronoi tessellation of the points. If it is on the edge of the box then it has to be at the intersection of the box and an edge of the Voronoi tessellation.
So if you compute the Voronoi tessellation and then use rgeos to intersect it with the box, that gives you a set of possible points. You can then use the FNN package to compute the nearest-neighbour distances from those possible points to the data points, sort them, and find the possible point with the biggest nearest-neighbour distance.
That gives you an exact point without any of this gridding business. If it wasn't so close to bedtime I'd sort out some code to do it. You probably want the deldir package for Voronoi tessellations. It might even already do the box intersection...
Okay, not quite bedtime. Here's the solution:
findM <- function(pts,xmin,xmax,ymin,ymax){
require(deldir)
require(FNN)
d = deldir(pts[,1],pts[,2],rw=c(xmin,xmax,ymin,ymax))
vpts = rbind(as.matrix(d$dirsgs[,1:2]),as.matrix(d$dirsgs[,3:4]))
vpts = rbind(vpts,cbind(c(xmin,xmax,xmin,xmax),c(ymin,ymin,ymax,ymax)))
vpts = vpts[!duplicated(vpts),]
nn = get.knnx(pts,vpts,k=1)
ptmin = which(nn$nn.dist==max(nn$nn.dist))
list(point = vpts[ptmin,,drop=FALSE], dist = nn$nn.dist[ptmin])
}
Edited version now returns one point and adds the corner points as possibles.
Here's an example that uses several functions (distanceFromPoints(), maxValue(), Which(), and xyFromCell()) from the raster package to perform the key calculations:
# Load required libraries
library(sp)
library(rgdal)
library(raster)
# Create a SpatialPoints object with 10 points randomly sampled from
# the area lying between longitudes 0 and 1 and latitudes 0 and 1
bbox <- matrix(c(0,0,1,1), ncol=2, dimnames = list(NULL, c("min", "max")))
PRJ4 <- CRS("+proj=longlat +datum=WGS84 +ellps=WGS84")
S <- Spatial(bbox = bbox, proj4string = PRJ4)
SP <- spsample(S, 10, type="random")
# Create a raster object covering the same area
R <- raster(extent(bbox), nrow=100, ncol=100, crs=PRJ4)
# Find the coordinates of the cell that is farthest from all of the points
D <- distanceFromPoints(object = R, xy = SP)
IDmaxD <- Which(D == maxValue(D), cells=TRUE)
(XY <- xyFromCell(D, IDmaxD))
# x y
# [1,] 0.005 0.795
# Plot the results
plot(D, main = "Distance map, with most distant cell in red")
points(SP)
points(XY, col="red", pch=16, cex=2)

Resources