Snap points to line in order in R - r

I have a set of GPS points and a linestring (representing a bus line) where the GPS points should belong to (both are ordered). So I used a function to snap the points to the linestring:
library(dplyr)
library(sf)
library(readr)
# Function to snap points to the closest line
snap_points_to_line <- function(points, line) {
# alinhar as pradas gps com a linha
points_align <- st_nearest_points(points, line) %>%
st_cast("POINT")
# pegar so os pontos pares
points_new_geometry <- points_align[c(seq(2, length(points_align), by = 2))]
points_align_end <- points %>%
st_set_geometry(points_new_geometry)
}
# GPS Points
gps <- structure(list(id = 1:11,
lon = c(-38.477035, -38.477143, -38.478701,
-38.479795, -38.480923, -38.481078,
-38.481885, -38.484545, -38.486156,
-38.486813, -38.486506),
lat = c(-3.743078, -3.743019, -3.742566,
-3.742246, -3.741844, -3.741853,
-3.741596, -3.740711, -3.740076,
-3.739399, -3.73886)),
class = "data.frame",
row.names = c(NA,-11L))
gps
id lon lat
1 1 -38.47704 -3.743078
2 2 -38.47714 -3.743019
3 3 -38.47870 -3.742566
4 4 -38.47980 -3.742246
5 5 -38.48092 -3.741844
6 6 -38.48108 -3.741853
7 7 -38.48188 -3.741596
8 8 -38.48454 -3.740711
9 9 -38.48616 -3.740076
10 10 -38.48681 -3.739399
11 11 -38.48651 -3.738860
# Download line
line <- read_rds(gzcon(url("https://github.com/kauebraga/dissertacao/raw/master/junk/line_so.rds")))
# Make snap
gps_snap <- snap_points_to_line(gps %>% st_as_sf(coords = c("lon", "lat"), crs = 4326), line)
The snap works fine most of the time. But there are some cases where the bus line makes a U turn and some points are snapped to the wrong side of the road because GPS position may have an error. In the figure below, the three points on the south side of the road should be on the north side:
How can I guarantee that the GPS points are snapped to the correct side of the road? Both the GPS points and linestring are in the same order (if you st_cast(line, "POINT) it will give points that grow together with the GPS) , so I hope there should be a way to address that (I don't know how!).
Any help using sf or other spatial tools in R would be much appreciated. Thanks!

Set up the data
library(sf)
library(dplyr)
library(readr)
library(rgeos)
# GPS Points
gps <- structure(list(id = 1:11,
lon = c(-38.477035, -38.477143, -38.478701,
-38.479795, -38.480923, -38.481078,
-38.481885, -38.484545, -38.486156,
-38.486813, -38.486506),
lat = c(-3.743078, -3.743019, -3.742566,
-3.742246, -3.741844, -3.741853,
-3.741596, -3.740711, -3.740076,
-3.739399, -3.73886)),
class = "data.frame",
row.names = c(NA,-11L))
# convert to sf
gps <- gps %>% st_as_sf(coords = c("lon", "lat"), crs = 4326, remove =F)
line <- read_rds(gzcon(url("https://github.com/kauebraga/dissertacao/raw/master/junk/line_so.rds")))
Define Custom Snapping Function
This function works on the logic that the correct road segment to snap to is the one which requires the shortest distance to travel to along the linestring (network distance) from the previous point.
It does the following:
Each point is buffered by a given tolerance (in metres so we have converted to a metre CRS for your area)
The line is then intersected with our buffer. This will leave two sections of road where the traffic goes both ways, and one otherwise. This is illustrated below:
We now have two options to snap to in some cases, so we initially snap to both:
We chose one of the unambiguous points (only one snap option) as the reference point and calculate the distance along the network to the snap options for the next id.
For each point id, the one with the lowest network distance from the previous id will be the one we want.
Having found the correct point id, we then set this as the new reference point and repeat from step 4.
custom_snap <- function(line, points, tolerance, crs = 29194) {
points <- st_transform(points, crs)
line <- st_transform(line, crs)
# buffer the points by the tolerance
points_buf <- st_buffer(points, 15)
# intersect the line with the buffer
line_intersect <- st_intersection(line, points_buf)
# convert mutlinestrings (more than one road segment) into linestrings
line_intersect <- do.call(rbind,lapply(1:nrow(line_intersect),function(x){st_cast(line_intersect[x,],"LINESTRING")}))
# for each line intersection, calculate the nearest point on that line to our gps point
nearest_pt <- do.call(rbind,lapply(seq_along(points$id), function(i){
points[points$id==i,] %>% st_nearest_points(line_intersect[line_intersect$id==i,]) %>% st_sf %>%
st_cast('POINT') %>% mutate(id = i)
}))
nearest_pt<- nearest_pt[seq(2, nrow(nearest_pt), by = 2),] %>%
mutate(option = 1:nrow(.))
# find an unambiguous reference point with only one snap option
unambiguous_pt <- nearest_pt %>%
group_by(id) %>%
mutate(count = n()) %>%
ungroup() %>%
filter(count == 1) %>%
slice(1)
# calculate network distance along our line to each snapped point
dists <- rgeos::gProject(as(line,'Spatial'), as(nearest_pt,'Spatial'))
# join back to nearest points data
dists <- nearest_pt %>% cbind(dists)
# we want to recursively do the following:
# 1. calculate the network distance from our unambiguous reference point to the next id point in the data
# 2. keep the snapped point for that id that was closest *along the network* to the previous id
# 3. set the newly snapped point as our reference point
# 4. repeat
# get distances from our reference point to the next point id
for(i in unambiguous_pt$id:(max(dists$id)-1)){
next_dist <- which.min(abs(dists[dists$id== i +1,]$dists - dists[dists$id== unambiguous_pt$id,]$dists ))
next_option <- dists[dists$id== i +1,][next_dist,]$option
nearest_pt <- nearest_pt %>% filter(id != i+1 | option == next_option)
unambiguous_pt <- nearest_pt %>% filter(id ==i+1 & option == next_option)
dists <- nearest_pt %>% cbind(dists = rgeos::gProject(as(line,'Spatial'), as(nearest_pt,'Spatial')))
}
# and in the reverse direction
for(i in unambiguous_pt$id:(min(dists$id)+1)){
next_dist <- which.min(abs(dists[dists$id== i -1,]$dists - dists[dists$id== unambiguous_pt$id,]$dists ))
next_option <- dists[dists$id== i -1,][next_dist,]$option
nearest_pt <- nearest_pt %>% filter(id != i-1 | option == next_option)
unambiguous_pt <- nearest_pt %>% filter(id ==i-1 & option == next_option)
dists <- nearest_pt %>% cbind(dists = rgeos::gProject(as(line,'Spatial'), as(nearest_pt,'Spatial')))
}
# transform back into lat/lng
snapped_points <- nearest_pt %>%
st_transform(4326)
return(snapped_points)
}
Calculate which line to snap to
gps_snap <- custom_snap(line, gps, 15) %>%
cbind(st_coordinates(.))
Plot results in leaflet
library(leaflet)
# get line coords
line_coords <- line %>%
st_coordinates(.)
# plot in leaflet
leaflet() %>%
leaflet::setView(lng = -38.4798, lat = -3.741829, zoom = 18) %>%
addProviderTiles('CartoDB.Positron') %>%
addPolylines(lng = line_coords[,'X'], lat = line_coords[,'Y']) %>%
addCircles(lng = gps$lon, lat = gps$lat, radius = 3, color ='red') %>%
addCircles(lng = gps_snap$X, lat = gps_snap$Y, col ='green', radius = 4)
Green represents the snapped points, red represents the original GPS points. They are now snapped to the correct side of the road.

Related

redlistr::getAreaEOO from degree minute data input

I have been trying to calculate the EOO area for a species using the redlistr package. In the example, the authors used raster data. However, I have observation points of the species in the degree minute format.
I created a subset of data for reference:
dt <- data.frame(lon_x = c(168.36085, 151.228745, 144.984577, 144.984287, 144.984201),
lat_y = c(-46.59179, -34.005291, -37.926258, -37.919514, -37.923407),
species = "seahorse_spp1")
coords <- cbind(dt$lon_x, dt$lat_y)
dt_spdf <- SpatialPointsDataFrame(coords, dt)
# now add a coordinate reference system to the sp dataframe
prj4string <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +units=km +no_defs"
crs(dt_spdf) <- prj4string
# unit is in meter like required
# now create EOO polygon
dt.polygon <- redlistr::makeEOO(dt_spdf)
# now visually check the points and polygon to make sure they look correct
leaflet() %>%
addTiles() %>%
addCircles(data = dt_spdf, ~ lon_x, ~ lat_y, color = "red") %>%
addPolygons(data = dt.polygon)
# calculate EOO
redlistr::getAreaEOO(dt.polygon)
#> [1] 0.0003264353
And it keeps giving this very small, unrealistic value.
Does anyone have any idea where I did wrong?
Thank you!

Filter / subset data between two polygons in R - SF (concentric circle polygons)

Not sure if there's a function i'm missing, but i'm having trouble filtering / checking if point geometries fall between two polygons (concentric circles).
Can you create a mask between two concentric polygons, and then use this to filter out the point geometries that contain the feature data of interest?
I tried subsetting between two polygons using the sf_filter package in R. This did not work.
reproducible code below:
library(sf)
library(tidyverse)
library(sp)
#Create fake data
my.data <- data.frame(replicate(2,sample(-88: -14,100,rep=TRUE))) # Point data
d <- cbind(seq(-180,180,length.out=360),rep(-88,360))
e <- cbind(seq(-180,180,length.out=360),rep(-30,360))
#Project fake data
d = SpatialPoints(cbind(d[,1], d[,2]), proj4string=CRS("+proj=longlat"))
d <- spTransform(d, CRS("+init=epsg:3976"))
e = SpatialPoints(cbind(e[,1], e[,2]), proj4string=CRS("+proj=longlat"))
e <- spTransform(e, CRS("+init=epsg:3976"))
my.data = SpatialPoints(cbind(my.data[,1], my.data[,2]), proj4string=CRS("+proj=longlat"))
my.data <- spTransform(my.data, CRS("+init=epsg:3976"))
d <- sf::st_as_sf(d, coords = c("X1", "X2"),
remove = FALSE,
crs = st_crs("epsg:3976"))
e <- sf::st_as_sf(e, coords = c("X1", "X2"),
remove = FALSE,
crs = st_crs("epsg:3976"))
my.data <- sf::st_as_sf(my.data, coords = c("X1", "X2"),
remove = FALSE,
crs = st_crs("epsg:3976"))
# Create linestrings from circle
d <- d %>%
summarise(do_union = FALSE) %>%
st_cast("LINESTRING")
e <- e %>%
summarise(do_union = FALSE) %>%
st_cast("LINESTRING")
#Join geometries
nst <- rbind(d,e)
#Create polygon
nst <- nst %>%
st_cast("POLYGON")
#Filtering between polygons doesn't return anything
PFz <- st_filter(my.data,nst)
Consider this approach; it builds on three semi random cities in North Carolina (because I love the nc.shp that ships with {sf})
What it does is that it builds two buffers as sf objects, and then constructs two logical vectors - sf::st_contains() for the big circle, and small circle.
Then it is a simple logical operation of checking for points that:
are contained within the big circle, and at the same time
are not contained within the small circle
Should you want to get more fancy you could run sf::st_difference() on the two buffer objects, and get the mask directly & check for sf::st_contains() only once for the "rim" object.
library(sf)
library(dplyr)
# 3 semi rancom cities in NC (because I *deeply love* the nc.shp file)
cities <- data.frame(name = c("Raleigh", "Greensboro", "Wilmington"),
x = c(-78.633333, -79.819444, -77.912222),
y = c(35.766667, 36.08, 34.223333)) %>%
st_as_sf(coords = c("x", "y"), crs = 4326)
# small buffer - Greensboro will be in; Wilmington not
small_buffer <- cities %>%
filter(name == "Raleigh") %>%
st_geometry() %>%
st_buffer(units::as_units(100, "mile"))
# big buffer - both Greensboro & Wilmington are in
big_buffer <- cities %>%
filter(name == "Raleigh") %>%
st_geometry() %>%
st_buffer(units::as_units(150, "mile"))
# a visual overview
mapview::mapview(list(big_buffer, small_buffer, cities))
# vector of cities in big buffer
in_big_buffer <- st_contains(big_buffer,
cities,
sparse = F) %>%
t()
# vector of cities in small buffer
in_small_buffer <- st_contains(small_buffer,
cities,
sparse = F) %>%
t()
# cities in concentric circle = in big, and not in small
in_concentric_circle <- in_big_buffer & !in_small_buffer
# check - subset of cities by logical vector
cities %>%
filter(in_concentric_circle)
# Simple feature collection with 1 feature and 1 field
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: -77.91222 ymin: 34.22333 xmax: -77.91222 ymax: 34.22333
# Geodetic CRS: WGS 84
# name geometry
# 1 Wilmington POINT (-77.91222 34.22333)

In R Overlap spatial polygons dataframe (spdf) and summarise number of features of 1st spdf overlapped by 2nd spdf

I have tried several sources but no luck. Please see my codes below and I state the problem at the end of codes. I have created random hexagonal grids over large areas and wanted to summarize how many of them fall under features of 2nd spatial polygon data frame.
library (sf)
library(dplyr)
library(raster)
# load 2nd spdf
Read ibra polygons as sf object. To download paste 'Interim Biogeographic Regionalisation for Australia (IBRA)' in the search item then click on 'Interim Biogeographic Regionalisation for Australia (IBRA), Version 7 (Regions)'
ibra <- st_read("ibra7_subregions.shp")
ibra <- st_transform(ibra, crs = 4326)
# ibra has >2000 features (i.e., rows) for 89 regions of same name, group them together
ibraGrid <- ibra %>%
group_by(REG_NAME_7) %>%
st_sf() %>%
mutate(cellid = row_number()) %>%
summarise()
colnames(ibraGrid)[1] <- "id"
# crop ibra to specific boundary
box <- extent(112,155,-45,-10)
ibraGrid <- st_crop(ibraGrid, box)
# make dataframe of spatial grid (1st spdf)
ran.p <- st_sample(au, size = 1040)
Load shp of au from here then click on "nsaasr9nnd_02211a04es_geo___.zip".
au <- st_read("aust_cd66states.shp")
au <- st_transform(au, crs = 4326)
# create grid around multipoints
rand_sampl_Grid <- ran.p %>%
st_make_grid(cellsize = 0.1, square = F) %>%
st_intersection(au) %>%
st_cast("MULTIPOLYGON") %>%
st_sf() %>%
mutate(cellid = row_number())
# sampled grid per ibra region
density_per_ib_grid <- ibraGrid %>%
st_join(rand_sampl_Grid) %>%
mutate(overlap = ifelse(!is.na(id), 1, 0)) %>%
group_by(cellid) %>%
summarize(num_sGrid = sum(overlap))
Everything worked well. But, I expected that the length of View(density_per_ib_grid$num_sGrid) would be equal to the number of features in ibraGrid (i.e., 89). Currently, View(density_per_ib_grid$num_sGrid) has length of features equal to rand_sample_Grid (i.e., ~1040). In addition, I want to repeat the process for 100 times so that num_sGrid would be the mean of 100 iterations.
The above codes worked desirably using larger spdf (which is ibraGrid in this casae) created from coordinates. Any suggestions/feedback will be highly appreciated.
I have figured out the solution. The last codes section in the above question should be as:
richness_per_ib_grid <- st_intersection(ibraGrid, rand_sampl_Grid) %>%
group_by(id) %>%
count()
out <- as.data.frame(int.result)[,-3] # print output as data frame.
Therefore the complete answer for the question above should be:
library (sf)
library(dplyr)
library(raster)
# load 2nd spdf
Read ibra polygons as sf object. To download paste 'Interim Biogeographic Regionalisation for Australia (IBRA)' in the search item then click on 'Interim Biogeographic Regionalisation for Australia (IBRA), Version 7 (Regions)'
ibra <- st_read("ibra7_subregions.shp")
ibra <- st_transform(ibra, crs = 4326)
# ibra has >2000 features (i.e., rows) for 89 regions of same name, group them together
ibraGrid <- ibra %>%
group_by(REG_NAME_7) %>%
st_sf() %>%
summarise()
colnames(ibraGrid)[1] <- "id"
# crop ibra to specific boundary
box <- extent(112,155,-45,-10)
ibraGrid <- st_crop(ibraGrid, box)
# make dataframe of spatial grid (1st spdf)
ran.p <- st_sample(au, size = 1040)
Load shp of au from here then click on "nsaasr9nnd_02211a04es_geo___.zip".
au <- st_read("aust_cd66states.shp")
au <- st_transform(au, crs = 4326)
# create grid around multipoints
rand_sampl_Grid <- ran.p %>%
st_make_grid(cellsize = 0.1, square = F) %>%
st_intersection(au) %>%
st_cast("MULTIPOLYGON") %>%
st_sf()
# sampled grid per ibra region
density_per_ib_grid <- <- st_intersection(ibraGrid, rand_sampl_Grid) %>%
group_by(id) %>%
count()
out <- as.data.frame(int.result)[,-3] # print output as data frame.

mutate cannot find function

I'm working through eBird code from this webpage:
https://github.com/CornellLabofOrnithology/ebird-best-practices/blob/master/03_covariates.Rmd
with the exception of using my own data. I have a .gpkg from gadm.org of Australia, and my own ebird data selected for Australia. I have followed out the code exactly with the exception of not using "bcr" as my dataset has no bcr codes, along with removing st_buffer(dist = 10000) from the rgdal code because this prevented me from actually downloading the MODIS data for some reason.
EDIT:I have also used the provided data from the site and still received the same error
I got stuck at this code:
lc_extract <- ebird_buff %>%
mutate(pland = map2(year_lc, data, calculate_pland, lc = landcover)) %>%
select(pland) %>%
unnest(cols = pland)
It returns this error:
Error: Problem with `mutate()` input `pland`.
x error in evaluating the argument 'x' in selecting a method for function 'exact_extract': invalid layer names
i Input `pland` is `map2(year_lc, data, calculate_pland, lc = landcover)`.)`
I can not seem to figure out how to correct it, I'm rather new to dense geo-spatial code like this.
There is a free dataset in the link, but I haven't yet tried it out, so it may be that my data is incompatible with the code? however, I have had a look at the Gis-data.gpkg provided, and my data from gadm seems fine.
The previous two codes to the one above were:
neighborhood_radius <- 5 * ceiling(max(res(landcover))) / 2
ebird_buff <- red_knot %>%
distinct(year = format(observation_date, "%Y"),
locality_id, latitude, longitude) %>%
# for 2019 use 2018 landcover data
mutate(year_lc = if_else(as.integer(year) > max_lc_year,
as.character(max_lc_year), year),
year_lc = paste0("y", year_lc)) %>%
# convert to spatial features
st_as_sf(coords = c("longitude", "latitude"), crs = 4326) %>%
# transform to modis projection
st_transform(crs = projection(landcover)) %>%
# buffer to create neighborhood around each point
st_buffer(dist = neighborhood_radius) %>%
# nest by year
nest(data = c(year, locality_id, geometry))
calculate_pland <- function(yr, regions, lc) {
locs <- st_set_geometry(regions, NULL)
exact_extract(lc[[yr]], regions, progress = FALSE) %>%
map(~ count(., landcover = value)) %>%
tibble(locs, data = .) %>%
unnest(data)
}
This has been answered by the author of the webpage.
The solution was this code:
lc_extract <- NULL
for (yr in names(landcover)) {
# get the buffered checklists for a given year
regions <- ebird_buff$data[[which(yr == ebird_buff$year_lc)]]
# get landcover values within each buffered checklist area
ee <- exact_extract(landcover[[yr]], regions, progress = FALSE)
# count the number of each landcover class for each checklist buffer
ee_count <- map(ee, ~ count(., landcover = value))
# attach the year and locality id back to the checklists
ee_summ <- tibble(st_drop_geometry(regions), data = ee_count) %>%
unnest(data)
# bind to results
lc_extract <- bind_rows(lc_extract, ee_summ)
}
credits go to:
Matt Strimas-Mackey

st_intersection LINESTRING with borders of POLYGONs while preserving the order of points in r / sf

I need to intersect a LINESTRING with the borders POLYGON feature, while preserving the order of the resulting POINT features. The background is that I need to figure out the border crossings car took for entering/leaving a specific country, but the order of the border crossings matters.
I have implemented the following approach:
# setup test data
poly <-
list(matrix(c(0,0,10,0,10,10,0,10,0,0),ncol=2, byrow=TRUE)) %>%
sf::st_polygon() %>%
sf::st_sfc() %>%
sf::st_sf()
line1 <- matrix(c(-1, 10, 5, -1),ncol=2, byrow=TRUE) %>%
sf::st_linestring() %>%
sf::st_sfc() %>%
sf::st_sf()
# reverse of line 1
line2 <- matrix(c(5, -1, -1, 10), ncol=2, byrow=TRUE) %>%
sf::st_linestring() %>%
sf::st_sfc() %>%
sf::st_sf()
# preview
leaflet::leaflet() %>%
leaflet::addPolygons(data = poly) %>%
leaflet::addPolylines(data = line1) %>%
leaflet::addPolylines(data = line2) %>%
leaflet::addTiles()
# do the intersection
# cast to multilinestring because I just need the border crossing points
ml <- sf::st_cast(poly, "MULTILINESTRING")
sf::st_intersection(ml, line1)
sf::st_intersection(ml, line2)
However, this approach loses the order of the border crossings. Does anyone have a better idea?
I found an answer that involves rgeos::gProject(). This function calculates distances between points along a line, and I can use it to derive a sort order for my points:
# do the intersection
ml <- sf::st_cast(poly, "MULTILINESTRING")
points1 <- sf::st_intersection(ml, line1) %>%
sf::st_cast("POINT")
points2 <- sf::st_intersection(ml, line2) %>%
sf::st_cast("POINT")
# Calculate sort order for the points
points1$order <- rgeos::gProject(sf::as_Spatial(line1), sf::as_Spatial(points1))
points2$order <- rgeos::gProject(sf::as_Spatial(line2), sf::as_Spatial(points2))

Resources