I have multiple polygons in a dataset and I would like to:
Identify the nearest polygon to each polygon and what the distance between them is
Calculate the coordinates of where the nearest parts of the two polygons are (so I can draw a line and visually check the distances)
If the distance is 800 metres or less, join the polygons together to make multipart polygons
This code does half of my first ask, and I know st_distance can do the other half. I was hoping for a solution that does not need a matrix of every distance between every pair of polygons to be generated.
library(sf)
library(dplyr)
download.file("https://drive.google.com/uc?export=download&id=1-I4F2NYvFWkNqy7ASFNxnyrwr_wT0lGF" , destfile="ProximityAreas.zip")
unzip("ProximityAreas.zip")
Proximity_Areas <- st_read("Proximity_Areas.gpkg")
Nearest_UID <- st_nearest_feature(Proximity_Areas)
Proximity_Areas <- Proximity_Areas %>%
select(UID) %>%
mutate(NearUID = UID[Nearest_UID])
Is there a method of producing two outputs: 1) an appended Proximity_Areas file that includes the distance and the XY coordinates of the nearest points for the UID and NearUID, and 2) a file that looks similar to the original Proximity_Areas file, just with polygons merged where the criterion is met?
Once you have created the index of nearest neighbours you can calculate the connecting lines via a sf::st_nearest_points() call.
An interesting aspect is that if you make the call on geometries (not sf, but sfc objects) you do the calculation pairwise (i.e. not in a matrix way).
The call will return linestrings, which is very helpful since you can calculate their length and have two of your objectives (nearest points & distance) at a single call...
lines <- Proximity_Areas %>%
st_geometry() %>% # extract geometry
# create a line to nearest neighbour as geometry
st_nearest_points(st_geometry(Proximity_Areas)[Nearest_UID], pairwise = TRUE) %>%
# make sf again (so it can hold data)
st_as_sf() %>%
# add some data - start, finish, length
mutate(start = Proximity_Areas$UID,
end = Proximity_Areas$UID[Nearest_UID],
distance = st_length(.))
glimpse(lines)
# Rows: 39
# Columns: 4
# $ x <LINESTRING [m]> LINESTRING (273421.5 170677..., LINESTRING (265535.1 166136..., LINESTRING (265363.3 1…
# $ start <chr> "U001", "U002", "U003", "U004", "U005", "U006", "U007", "U008", "U009", "U010", "U011", "U012", "…
# $ end <chr> "U026", "U010", "U013", "U033", "U032", "U014", "U028", "U036", "U011", "U008", "U028", "U030", "…
# $ distance [m] 317.84698 [m], 579.85131 [m], 529.67907 [m], 559.96441 [m], 0.00000 [m], 80.54011 [m], 754.94311 [m…
mapview::mapview(lines)
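If you also need the XY coordinates of the nearest points as plain columns, one option is to pull the two endpoints out of each connecting line with st_coordinates(). A minimal sketch on made-up squares; the helper sq(), the toy coordinates and the column names are assumptions for illustration, not part of the original data:

```r
library(sf)

# hypothetical helper: axis-aligned square with lower-left corner (x0, y0)
sq <- function(x0, y0, size = 100) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
                        c(x0, y0 + size), c(x0, y0))))
}

# toy polygons in a projected CRS (units: metres)
polys <- st_sfc(sq(0, 0), sq(300, 0), sq(0, 1000), crs = 32633)

# index of the nearest other polygon, then one connecting line per polygon
nearest <- st_nearest_feature(polys)
links <- st_nearest_points(polys, polys[nearest], pairwise = TRUE)

# each line has exactly two vertices: odd rows are the start, even rows the end
xy <- st_coordinates(links)
odd <- seq(1, nrow(xy), by = 2)
nearest_xy <- data.frame(
  from_x = xy[odd, "X"], from_y = xy[odd, "Y"],
  to_x = xy[odd + 1, "X"], to_y = xy[odd + 1, "Y"],
  distance = as.numeric(st_length(links))
)
nearest_xy
```

The resulting data frame can be bound to the original attributes with cbind(), since the rows are in the same order as the input polygons.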
The part about joining close objects together is a bit tricky, since you don't know how many polygons you will end up with - you can have a polygon A that is far from C, but will end up merged since both are close to B. This does not vectorize easily and you are likely to end up running a while loop. For a possible approach consider this related answer Dissolving polygons by distance - R
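A minimal sketch of such a loop, assuming toy squares in a projected CRS and the 800 m threshold from the question; the helper sq() and the UIDs are made up. It uses st_is_within_distance() to get a sparse neighbour list (so no dense distance matrix is built) and then pushes the smallest group label through the neighbours until nothing changes:

```r
library(sf)
library(dplyr)

# hypothetical helper: axis-aligned square with lower-left corner (x0, y0)
sq <- function(x0, y0, size = 1000) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
                        c(x0, y0 + size), c(x0, y0))))
}

# toy data: "a" and "b" are 500 m apart, "c" is far away from both
polys <- st_sf(UID = c("a", "b", "c"),
               geometry = st_sfc(sq(0, 0), sq(1500, 0), sq(10000, 0), crs = 32633))

# sparse list: for each polygon, the polygons within 800 m of it
nb <- st_is_within_distance(polys, dist = 800)

# label propagation: every polygon takes the smallest label among itself
# and its neighbours; repeat until the labels stop changing
group <- seq_len(nrow(polys))
repeat {
  new_group <- vapply(seq_along(nb),
                      function(i) min(group[c(i, nb[[i]])]),
                      integer(1))
  if (identical(new_group, group)) break
  group <- new_group
}

# dissolve by group; summarise() on an sf object unions the geometries
merged <- polys %>%
  mutate(group = group) %>%
  group_by(group) %>%
  summarise(UIDs = paste(UID, collapse = ", "))
merged
```

Because labels travel one neighbour per pass, a chain A-B-C ends up in one group even when A and C are more than 800 m apart, which is the behaviour described above.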
this is my reproducible example
########################################
library(sf)
# matrix of lon lat for the definition of the linestring
m<-rbind(
c(12.09136, 45.86471),
c(12.09120, 45.86495),
c(12.09136, 45.86531),
c(12.09137, 45.86540),
c(12.09188, 45.86585),
c(12.09200, 45.86592),
c(12.09264, 45.86622),
c(12.09329, 45.86624),
c(12.09393, 45.86597),
c(12.09410, 45.86585),
c(12.09423, 45.86540),
c(12.09411, 45.86495),
c(12.09393, 45.86471),
c(12.09383, 45.86451),
c(12.09329, 45.86414),
c(12.09264, 45.86413),
c(12.09200, 45.86425),
c(12.09151, 45.86451),
c(12.09136, 45.86471)
)
# define a linestring
ls<-st_linestring(m)
# create a simple feature with appropriate crs
ls<-st_sfc(ls, crs=4326)
# and now again going through the very same
# definition process for a point
# define a point
pt <- st_point(c(12.09286,45.86557))
# create simple feature with appropriate crs
pt<-st_sfc(pt, crs = 4326)
plot(ls)
plot(pt, add=TRUE)
# this is computing the minimum distance from the point to the line
st_distance(ls, pt)
###############
Given the toy dataset above, I need to find a proper method to calculate:
1 - the distance from each vertex of the line to the given point. This is probably easily accomplished by computing the distance between each pair of points (line vertex vs. point) with the Pythagorean theorem, but I have doubts because of the CRS in use (EPSG:4326, units in degrees), so I probably need to convert the whole dataset to another reference system (with metric units) first.
2 - the distance between the point and the line at fixed bearing angles (10°, 20°, 30°, ..., 360° from North). This is where I'm really lost.
Please give me some help on how to proceed with the calculation, ideally using the 'sf' package, which I'm trying to get familiar with.
thanks
thank you for pointing me in the right direction
I worked out my final solution that I'm posting here for the sake of completeness
# my reproducible example
library(sf)
# matrix of lon lat for the definition of the linestring
m<-rbind(
c(12.09136, 45.86471),
c(12.09120, 45.86495),
c(12.09136, 45.86531),
c(12.09137, 45.86540),
c(12.09188, 45.86585),
c(12.09200, 45.86592),
c(12.09264, 45.86622),
c(12.09329, 45.86624),
c(12.09393, 45.86597),
c(12.09410, 45.86585),
c(12.09423, 45.86540),
c(12.09411, 45.86495),
c(12.09393, 45.86471),
c(12.09383, 45.86451),
c(12.09329, 45.86414),
c(12.09264, 45.86413),
c(12.09200, 45.86425),
c(12.09151, 45.86451),
c(12.09136, 45.86471)
)
# define the linestring
ls<-st_linestring(m)
# create a simple feature linestring with appropriate crs
ls<-st_sfc(ls, crs=4326)
# and now again going through the very same
# definition process for a point
# define the origin point
pt <- st_point(c(12.09286,45.86557))
# create simple feature point with appropriate crs
pt<-st_sfc(pt, crs = 4326)
plot(ls)
plot(pt, add=TRUE)
# get minimum distance from the origin point to the line
dist_min<-st_distance(ls, pt)
# get coordinates of the origin point
pt_orig<-st_coordinates(pt)
# load library for later use of the function destPoint()
library(geosphere)
# create vector of bearing angles in 10 degree steps
b_angles<-seq(0, 350, 10)
# create empty container for final result as data frame
result<-data.frame(bearing=NULL, distance=NULL)
for(i in 1:length(b_angles)){
result[i,"bearing"]<-b_angles[i]
# calculate destination point coordinates with bearing angle i
# at fixed safe distance (i.e. 100 times the minimum distance)
# so that to avoid null intersection in next step calculation
pt_dest<-destPoint(p=pt_orig, b=b_angles[i],d=dist_min*100)
# define linestring from origin to destination
b_ls<-st_sfc(st_linestring(rbind(pt_orig, pt_dest)), crs=4326)
# get the intersection point between two features
pt_int<-st_intersection(ls, b_ls)
# get the distance
d<-st_distance(pt, pt_int)
result[i,"distance"]<-d
}
I stuck as much as possible to the "sf" approach, which gives the following warning inside the for loop when st_intersection() is executed: "although coordinates are longitude/latitude, st_intersection assumes that they are planar"
but considering the short distances I'm working with, that seems to me an acceptable approximation
By the way, as far as I understand, there is no function in the "sf" package that corresponds to geosphere::destPoint
thanks
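For the first part of the question (distance from each vertex to the point), a possible sf-only route is to cast the linestring to points and call st_distance() on them; on longitude/latitude data st_distance() returns geodesic distances in metres, so no manual Pythagoras is needed. A minimal sketch, assuming only three of the original vertices to keep it short:

```r
library(sf)

# a shortened version of the line from the example (lon, lat)
m <- rbind(
  c(12.09136, 45.86471),
  c(12.09264, 45.86622),
  c(12.09423, 45.86540)
)
ls <- st_sfc(st_linestring(m), crs = 4326)
pt <- st_sfc(st_point(c(12.09286, 45.86557)), crs = 4326)

# split the line into its vertices (one POINT per vertex)
vertices <- st_cast(ls, "POINT")

# one distance per vertex, in metres
vertex_dist <- data.frame(
  lon = m[, 1],
  lat = m[, 2],
  distance_m = as.numeric(st_distance(vertices, pt)[, 1])
)
vertex_dist
```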
I want to calculate the length of each polygon.
-Around each polygon I created points (st_sample),
-from each combination of points I created all possible polylines,
-for the polylines which are inside the polygon I calculated the length,
-the longest polyline is my result (max length of the polygon).
I wrote code which gets me results, but it is really slow. Do you have a solution for improving my code? I know that with two loops I cannot expect miracles in terms of speed, but I do not know how to get the results another way.
If nothing else, maybe at least an alternative way of creating all polylines from the combinations of points for one polygon in one step, without a loop? :)
thank you
library(sf)
library(data.table)
poly=st_read(system.file("shape/nc.shp", package="sf"))
poly=poly[1:10,]
poly=st_cast(poly,"POLYGON")
poly$max_length=0
##Combination of 10 points, without repetition
aa=CJ(1:10,1:10)
aa=aa[!duplicated(t(apply(aa[,.(V1, V2)], 1, sort))),][V1!=V2]
##for each polygon create sample of coordinates along line, from them I create polyline and calculated length for linestring which are inside polygon
for (ii in 1:nrow(poly)){
ncl=st_cast(poly[ii,],"LINESTRING")
##sample of point along line
ncp=st_cast(st_sample(ncl,10, type="regular", exact=T),"POINT")
##create empty sf
aaa=st_sf(st_sfc())
st_crs(aaa)="NAD27"
##for each combination of points create linestring and calculate length only for polylines which are inside polygon
for (i in 1:nrow(aa)){
aaa=rbind(aaa,st_sf(geometry=st_cast(st_union(ncp[t(aa[i])]),"LINESTRING")))
}
poly$max_length[ii]=as.numeric(max(st_length(aaa[unlist(st_contains(poly[ii,],aaa)),])))
}
Second attempt, running a function inside data.table. That is one loop less, but the problem is probably the second loop.
poly=st_read(system.file("shape/nc.shp", package="sf"))
poly=poly[1:10,]
poly=st_cast(poly,"POLYGON")
poly$max_length=0
##Combination of 10 points, without repetition
aa=CJ(1:10,1:10)
aa=aa[!duplicated(t(apply(aa[,.(V1, V2)], 1, sort))),][V1!=V2]
overFun <- function(x){
ncl=st_cast(x[,geometry],"LINESTRING")
##sample of point along line
ncp=st_cast(st_sample(ncl,40, type="regular", exact=T),"POINT")
##create empty sf
aaa=st_sf(st_sfc())
st_crs(aaa)="NAD27"
##for each combination of points create linestring and calculate length
for (i in 1:nrow(aa)){
aaa=rbind(aaa,st_sf(geometry=st_cast(st_union(ncp[t(aa[i])]),"LINESTRING")))
}
as.numeric(max(st_length(aaa[unlist(st_contains(x[,geometry],aaa)),])))}
setDT(poly)
##run function inside data.table
poly[,max_length:=overFun(poly), by=seq(nrow(poly))]
Edit: I found a solution for my problem which is fast enough for my needs.
It uses the parallel library inside data.table, with a function which also works on a data.table. There is still the question of why some polylines are excluded by st_contains (see the picture above). Maybe a problem with precision?
library(sf)
library(data.table)
poly=st_read(system.file("shape/nc.shp", package="sf"))
poly=st_cast(poly,"POLYGON")
setDT(poly)
##Combination of 10 points, without repetition
aa=CJ(1:10,1:10)
aa=aa[!duplicated(t(apply(aa[,.(V1, V2)], 1, sort))),][V1!=V2]
overFun <- function(x){
ncl=st_cast(poly[1,geometry],"LINESTRING")
##sample of point along line
ncp=st_cast(st_sample(ncl,10, type="regular", exact=T),"POINT")
df=data.table(ncp[aa[,V1]],ncp[aa[,V2]] )
df[,v3:=st_cast(st_union(st_as_sf(V1),st_as_sf(V2)),"LINESTRING"), by=seq(nrow(df))]
as.numeric(max(st_length(df[unlist(st_contains(poly[1,geometry], df$v3)),]$v3)))}
library(parallel)
cl <- makeCluster(detectCores() - 1)
clusterExport(cl, list("overFun","data.table","st_cast","CJ","poly","st_sample","st_sf","st_sfc","aa","st_length","st_union",
"st_as_sf","st_contains"))
system.time(poly[,c("max_length"):=.(clusterMap(cl, overFun, poly$geometry)),])
stopCluster(cl)
I encountered a similar problem and frankly have not found any ready-made solution.
I will use the same Ashe county from sf package.
library(sf)
library(dplyr)
shape <- st_read(system.file("shape/nc.shp", package="sf")) %>%
dplyr::filter(CNTY_ID == 1825) %>% # Keep only one polygon
st_transform(32617) # Reproject to WGS 84 / UTM zone 17N
Solution 1
With just dplyr, tidyr, and sf you can turn the polygon into points and calculate the distances between all pairs of points. The maximum of these distances is the length you are after; it corresponds to the green line in your example figure.
library(tidyr)
shape %>%
st_cast("POINT") %>% # turn polygon into points
distinct() %>% # remove duplicates
st_distance() %>% # calculate distance matrix
as.data.frame() %>%
gather(point_id, dist) %>% # convert to long format
pull(dist) %>% # keep only distance column
max()
#> Warning in st_cast.sf(., "POINT"): repeating attributes for all sub-geometries
#> for which they may not be constant
#> 45865.15 [m]
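The full distance matrix grows quadratically with the number of vertices. Since the two farthest-apart vertices of a shape always lie on its convex hull, one way to shrink the problem is to keep only the hull vertices before measuring. A minimal sketch with base R's chull() and dist(), assuming projected (planar) coordinates as in the UTM reprojection above; the object names are made up:

```r
library(sf)

shape <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
ashe <- st_transform(shape[shape$CNTY_ID == 1825, ], 32617)

# plain coordinate matrix of the polygon vertices, duplicates removed
xy <- unique(st_coordinates(ashe)[, c("X", "Y")])

# keep only the vertices on the convex hull
hull <- xy[chull(xy), ]

# largest pairwise Euclidean distance, in the units of the CRS (metres)
max_length <- max(dist(hull))
max_length
```

Like Solution 1, this measures the straight line between two boundary vertices and does not check whether that line stays inside the polygon.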
Solution 2
You can also use the Momocs package. It was created for 2D morphometric analysis. While it wasn't essential to reproject our shape to UTM in the first case (sf can handle geographic coordinates), your polygon should be projected in the case of the Momocs package.
library(Momocs)
shape %>%
st_cast("POINT") %>% # Polygon to points
distinct() %>% # remove duplicates
st_coordinates() %>% # get coordinates matrix
coo_calliper() # calculate max length
#> Warning in st_cast.sf(., "POINT"): repeating attributes for all sub-geometries
#> for which they may not be constant
#> [1] 45865.15
Comments
There are several other functions in the Momocs package. For example, you can calculate the length of a shape based on its inertia axis, i.e. its alignment to the x-axis. coo_length will return 44432.02 [m].
For example, one can apply several functions from the Momocs package to the coordinate matrix as follows:
point_matrix <- shape %>%
st_cast("POINT") %>%
distinct() %>%
st_coordinates()
#> Warning in st_cast.sf(., "POINT"): repeating attributes for all sub-geometries
#> for which they may not be constant
funs <- list("length" = coo_length,
"width" = coo_width,
"elongation" = coo_elongation)
sapply(funs, function(fun, x) fun(x), x = point_matrix)
#> length width elongation
#> 4.443202e+04 3.921162e+04 1.174917e-01
If you are after circumference of your polygons consider this code:
library(sf)
library(dplyr)
shape <- st_read(system.file("shape/nc.shp", package="sf")) # included with sf package
lengths <- shape %>%
mutate(circumference = st_length(.)) %>%
st_drop_geometry() %>%
select(NAME, circumference)
head(lengths)
NAME circumference
1 Ashe 141665.4 [m]
2 Alleghany 119929.0 [m]
3 Surry 160497.7 [m]
4 Currituck 301515.3 [m]
5 Northampton 211953.8 [m]
6 Hertford 160892.0 [m]
If you have some holes inside and do not want them included in the circumference consider removing them via nngeo::st_remove_holes().
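st_length() is documented for linear geometries, and depending on the installed sf version it may return zero for polygons, so it is worth checking the output on your machine. A variant that should not depend on that behaviour is to cast the polygon outlines to lines first. A minimal sketch, assuming the same nc dataset; the object name outline is made up:

```r
library(sf)

shape <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# turn polygon rings into lines, then measure the lines
outline <- st_cast(st_geometry(shape), "MULTILINESTRING")
shape$circumference <- st_length(outline)

head(st_drop_geometry(shape)[, c("NAME", "circumference")])
```

Note that the cast keeps the rings of any holes, so their length is included as well.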
I am trying to find the centroid of the GPS coordinates within each trackline segment, and then find the index of the nearest neighbour between each centroid and the GPS coordinates in the same trackline segment.
So far I have found the centroid of each segment and a method to get the indexes of the nearest neighbours among my GPS coordinates, but I am unable to make R look for nearest neighbours only within the same segment.
# Calculate the average Latitude/Longitude for each 'Segment'
data_update <- data %>% as_tibble() %>%
group_by(Segment) %>% mutate(ave_lat = mean(Latitude), ave_lon = mean(Longitude))
# Find the nearest neighbour
install.packages('RANN')
library(RANN)
closest <- RANN::nn2(data_update[,2:3], data_update[,4:5], k = 1, searchtype = "radius", radius = 1)
closest <- sapply(closest, cbind) %>% as_tibble
# closest produces two columns nn.idx and nn.dist - I need nn.idx only
# data_update[,2] = Longitude values (decimal degrees)
# data_update[,3] = Latitude values (decimal degrees)
# data_update[,4] = average longitude value for each Segment (decimal degrees)
# data_update[,5] = average latitude value for each Segment (decimal degrees)
I need to use nn2 to calculate the nn.idx within each 'Segment' rather than across the entire data frame as the code above is doing.
Does anyone know how to group the nn2 function to calculate the nearest neighbour by 'Segment'? I am open to non-tidyverse options also.
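One possible non-tidyverse approach is to split the data by Segment and run nn2() once per piece, so the search can never leave the segment. A minimal sketch on made-up data; the column names follow the question, while the helper nearest_in_segment() and the output columns idx_in_segment and nn_dist are assumptions:

```r
library(RANN)

# made-up track data: two segments with three GPS fixes each
data <- data.frame(
  Segment   = c(1, 1, 1, 2, 2, 2),
  Longitude = c(0, 1, 2, 10, 10, 10),
  Latitude  = c(0, 1, 2, 10, 12, 20)
)

# hypothetical helper: for one segment, find the fix closest to its centroid
nearest_in_segment <- function(seg) {
  pts <- as.matrix(seg[, c("Longitude", "Latitude")])
  centre <- matrix(colMeans(pts), nrow = 1)
  nn <- nn2(data = pts, query = centre, k = 1)
  cbind(seg[nn$nn.idx[1, 1], ],
        idx_in_segment = nn$nn.idx[1, 1],
        nn_dist = nn$nn.dists[1, 1])
}

nearest <- do.call(rbind, lapply(split(data, data$Segment), nearest_in_segment))
nearest
```

nn2() works with Euclidean distances on the raw numbers, so with decimal degrees nn_dist is in degrees, not metres.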
Example data can be found here: https://drive.google.com/file/d/16cZPo6kXIafU0ezAoy8EgB9CHVEDUTe-/view?usp=sharing
I have data points of a species observed using camera traps and would like to measure the distance of each camera trap site (CameraStation) to the edge of a national park using R. I have a shapefile of the park (shp) and want to apply a criterion to CameraStation(s) which are <5km from the edge. My data frame (df) consists of multiple events/observations (EventID) per CameraStation. The aim is to analyse when events near the park edge are most frequent given other environmental factors such as Season, Moon Phase and DayNight (also columns in DF).
I found a package called distance in R but this is for distance sampling and not what I want to do. Which package is relevant in this situation?
I expect the following outcome:
EventID  CameraStation  Distance (km)  Within 5 km
0001     Station 1      4.3            Yes
0002     Station 1      4.3            Yes
0003     Station 2      16.2           No
0004     Station 3      0.5            Yes
...
Here's a general solution, adapted from Spacedman's answer to this question at gis.stackexchange. Note: This solution requires working in a projected coordinate system. You can transform to a projected CRS if needed using spTransform.
The gDistance function of the rgeos package calculates the distance between geometries, but for the case of points inside a polygon the distance is zero. The trick is to create a new "mask" polygon where the original polygon is a hole cut out from the mask. Then we can measure the distance between points in the hole and the mask, which is the distance to the edge of the original polygon that we really care about.
We'll use the shape file of the Yellowstone National Park Boundary found on this page.
library(sp) # for SpatialPoints and proj4string
library(rgdal) # to read shapefile with readOGR
library(rgeos) # for gDistance, gDifference, and gBuffer
# ab67 was the name of the shape file I downloaded.
yellowstone.shp <- readOGR("ab67")
# gBuffer enlarges the boundary of the polygon by the amount specified by `width`.
# The units of `width` (meters in this case) can be found in the proj4string
# for the polygon.
yellowstone_buffer <- gBuffer(yellowstone.shp, width = 5000)
# gDifference calculates the difference between the polygons, i.e. what's
# in one and not in the other. That's our mask.
mask <- gDifference(yellowstone_buffer, yellowstone.shp)
# Some points inside the park
pts <- list(x = c(536587.281264245, 507432.037861251, 542517.161278414,
477782.637790409, 517315.171218198),
y = c(85158.0056377799, 77251.498952222, 15976.0721391485,
40683.9055315169, -3790.19457474617))
# Sanity checking the mask and our points.
plot(mask)
points(pts)
# Put the points in a SpatialPointsDataFrame with camera id in a data field.
spts.df <- SpatialPointsDataFrame(pts, data = data.frame(Camera = ordered(1:length(pts$x))))
# Give our SpatialPointsDataFrame the same spatial reference as the polygon.
proj4string(spts.df) <- proj4string(yellowstone.shp)
# Calculate distances (km) from points to edge and put in a new column.
spts.df$km_to_edge <- apply(gDistance(spts.df, mask, byid=TRUE),2,min)/1000
# Determine which records are within 5 km of an edge and note in new column.
spts.df$edge <- ifelse(spts.df$km_to_edge < 5, TRUE, FALSE)
# Results
spts.df
# coordinates Camera km_to_edge edge
# 1 (536587.3, 85158.01) 1 1.855010 TRUE
# 2 (507432, 77251.5) 2 9.762755 FALSE
# 3 (542517.2, 15976.07) 3 11.668700 FALSE
# 4 (477782.6, 40683.91) 4 4.579638 TRUE
# 5 (517315.2, -3790.195) 5 8.211961 FALSE
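The same idea can also be sketched with the sf package, where measuring against the polygon's boundary line (rather than the polygon itself) avoids the need for a mask. This is a minimal sketch on a made-up square "park" and made-up camera locations in a projected CRS; none of the coordinates come from the Yellowstone data:

```r
library(sf)

# toy park: a 20 km x 20 km square in a projected CRS (units: metres)
park <- st_sfc(st_polygon(list(rbind(c(0, 0), c(20000, 0), c(20000, 20000),
                                     c(0, 20000), c(0, 0)))), crs = 32612)

# toy camera stations inside the park
cams <- st_sf(
  CameraStation = c("Station 1", "Station 2", "Station 3"),
  geometry = st_sfc(st_point(c(4300, 10000)),
                    st_point(c(10000, 10000)),
                    st_point(c(19500, 5000)), crs = 32612)
)

# the boundary is a line, so points inside the park get a non-zero distance
edge <- st_boundary(park)
cams$km_to_edge <- as.numeric(st_distance(cams, edge)[, 1]) / 1000
cams$within_5km <- cams$km_to_edge < 5
cams
```

The per-station result can then be joined back to the event table by CameraStation.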
Here's a quick solution.
Simplify the outline of your shapefile into N points. Then calculate the minimum distance for each camera trap to every point in the outline of the national park.
library(sp) # for spsample and coordinates
library(geosphere) # for distm and distHaversine
n <- 500 ##The number of points summarizing the shapefile
NPs <- ##Your shapefile goes here
NP.pts <- spsample(NPs, n = n, type = "regular")
CP.pts <- ## Coordinates for a single trap
distances<-distm(coordinates(CP.pts), coordinates(NP.pts), fun = distHaversine)/1000
##Distance in Km between the trap to each point in the perimeter of the shapefile:
distances
Use distances to find the minimum distance between the shapefile and that given trap. This approach can easily be generalized using for loops or apply functions.
I had a problem with the points data frame and shape file being projected so instead I used the example in this link to answer my question
https://gis.stackexchange.com/questions/225102/calculate-distance-between-points-and-nearest-polygon-in-r
Basically, I used this code;
df # my data frame with points
shp # my shapefile (non-projected)
dist.mat <- geosphere::dist2Line(p = df2, line = shp)
coordinates(df2)<-~Longitude+Latitude # Longitude and Latitude are columns in my df
dmat<-data.frame(dist.mat) # turned it into a data frame
dmat$km5 <- ifelse(dmat$distance < 5000, TRUE, FALSE) # in meters (5000)
coordinates(dmat)<-~lon+lat
df2$distance <- dmat$distance # added new Distance column to my df
Very simple situation: a polygon defines a geographical area and I want to know whether a point, given by its GPS coordinates, lies within that polygon.
I went through many SO questions and have tried various functions and packages like sp, but cannot work out why it fails.
I tried with this very simple function:
https://www.rdocumentation.org/packages/SDMTools/versions/1.1-221/topics/pnt.in.poly
install.packages("SDMTools") # the question used version 1.1-221
library(SDMTools)
## Coordinates of the polygon corners
lat <- c(48.43119, 48.43119, 48.42647, 48.400031, 48.39775, 48.40624, 48.42060, 48.42544, 48.42943 )
lon <- c(-71.06970, -71.04180, -71.03889, -71.04944, -71.05991, -71.06764, -71.06223, -71.06987, -71.07004)
pol = cbind(lat=lat,lng=lon)
## Point to be tested
x <- data.frame(lng=-71.05609, lat=48.40909)
## Visualization, this point clearly stands in the middle of the polygon
plot(rbind(pol, x))
polygon(pol,col='#99999990')
## Is that point in the polygon?
out = pnt.in.poly(x,pol)
## Well, no (pip=0)
print(out)
The example given for this function works for me, but this simple case does not... why is that?
I have not used the method that you are using, but there is one in sp which works flawlessly on your point and polygon.
I cherry-picked your code, leaving lat and lon as vectors and the point coordinates as single values to suit the function's requirements.
But you could just as easily have made a data frame and used the columns explicitly as lat/lon values.
Here is the gist of it:
require(sp)
## Your polygon
lat <- c(48.43119, 48.43119, 48.42647, 48.400031, 48.39775, 48.40624, 48.42060, 48.42544, 48.42943 )
lon <- c(-71.06970, -71.04180, -71.03889, -71.04944, -71.05991, -71.06764, -71.06223, -71.06987, -71.07004)
## Your Point
lng=-71.05609
lt=48.40909
# sp function which tests for points in polygons
point.in.polygon(lt, lng, lat, lon, mode.checked=FALSE)
And here is the output:
[1] 1
The interpretation of this from the documentation:
integer array values are:
0 point is strictly exterior to polygon
1 point is strictly interior to polygon
2 point lies on the relative interior of an edge of polygon
3 point is a vertex of polygon
Since the result for your point is 1, it lies strictly inside the polygon, as your map shows. The key to getting good output with this kind of data is passing the variables in the right format.
You could just as easily have a data frame df with df$lat and df$lon as the two polygon variables, and a test frame test with test$lat and test$lon as a series of points. The point coordinates go first and the polygon coordinates second:
point.in.polygon(test$lat, test$lon, df$lat, df$lon, mode.checked=FALSE)
It would return a vector of 0s, 1s, 2s and 3s.
Just be sure you get it in the right format first!
Here is a link to the function page:
I can't see it explicitly stated in the documentation for ?pnt.in.poly, but it appears the ordering of the lng and lat columns matters. You need to swap the column ordering in your pol and it works.
pol = cbind(lat=lat, lng=lon)
pnt.in.poly(x, pol)
# lng lat pip
# 1 -71.05609 48.40909 0
pol = cbind(lng=lon, lat=lat)
pnt.in.poly(x, pol)
# lng lat pip
# 1 -71.05609 48.40909 1
In spatial geometry, lng is usually treated as the x-axis and lat as the y-axis, which you'll see is reversed in your plot()
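For comparison, the same test can be sketched with the sf package, which always expects coordinates in x, y (lng, lat) order. This is a minimal sketch using the polygon and point from the question; no CRS is set, so the coordinates are treated as planar, which is an assumption that mirrors what the functions above do:

```r
library(sf)

## Coordinates of the polygon corners
lat <- c(48.43119, 48.43119, 48.42647, 48.400031, 48.39775, 48.40624, 48.42060, 48.42544, 48.42943)
lon <- c(-71.06970, -71.04180, -71.03889, -71.04944, -71.05991, -71.06764, -71.06223, -71.06987, -71.07004)

# x = lng, y = lat; the ring must be closed by repeating the first vertex
ring <- cbind(lon, lat)
ring <- rbind(ring, ring[1, ])
pol <- st_sfc(st_polygon(list(ring)))

## Point to be tested, again as (lng, lat)
pt <- st_sfc(st_point(c(-71.05609, 48.40909)))

# TRUE when the point lies inside (or on the boundary of) the polygon
inside <- st_intersects(pt, pol, sparse = FALSE)
inside
```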