Having trouble testing point/geometry intersections in sf - R

I am trying to figure out how to use st_intersects() to test whether or not my point data falls inside the geometries of some map data I have.
Data I'm working with: https://osfm.fire.ca.gov/media/5818/fhszs19sn.zip
Other data I'm working with: https://osfm.fire.ca.gov/media/7564/c19fhszl06_5.zip
For now, I'm just trying to see if this data falls in the polygons of the above shapefile:
la_test_points <- data.frame(y = runif(1000, 33.6, 34.8), x = runif(1000, -119, -117.6))
When I put my map data and point data together, this is what it looks like:
So far, so good. Now I attempt to test point/geometry intersections. As the figure suggests, I should be able to get quite a few.
# changing coordinate system of map created by shape file
la_fire_sra <- st_transform(st_as_sf(la_fire_sra), crs = 3857)
# merging test points with map data
la_test_points_merged <- st_as_sf(la_test_points, coords = c('y', 'x'), crs = st_crs(la_fire_sra))
# seeing if points fall within any of the geometries in the shapefile
la_test_points_merged <- la_test_points_merged %>%
  mutate(intersection = st_intersects(geometry, la_fire_sra))
That last bit is where it all goes wrong. RStudio doesn't throw an error, but when I print la_test_points_merged to see my results, this is what I see:
> la_test_points_merged
Simple feature collection with 1000 features and 1 field
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 33.60155 ymin: -118.9959 xmax: 34.79907 ymax: -117.6015
Projected CRS: WGS 84 / Pseudo-Mercator
First 10 features:
Error in xj[i, , drop = FALSE] : incorrect number of dimensions
The last line above is in red.
When I try using st_intersection() instead of st_intersects(), I get a different error:
> la_test_points_merged <- la_test_points_merged %>%
+ mutate(intersection = st_intersection(geometry, la_fire_sra))
Error in `stopifnot()`:
! Problem while computing `intersection = st_intersection(geometry, la_fire_sra)`.
x `intersection` must be size 1000 or 1, not 0.
Run `rlang::last_error()` to see where the error occurred.
I would like to end up with a result that tells me whether or not each of the points in la_test_points is contained by any of the geometries in la_fire_sra.
How can I fix this to make my code work? I have looked at lots of other similar questions, but I can't seem to find any answers that apply to my current situation.
Thanks in advance for any help.

You can join the points to the shapefile, and the result will show you the fire hazard for each point that falls within a polygon. The default join predicate for st_join() is st_intersects, but you can change it if you'd like.
Below I've used one of the shapefiles you linked. If you need to use both you can combine them for a single dataframe with all the polygons. Looks like they have different columns though, so some cleaning might be needed.
library(tidyverse)
library(sf)
set.seed(3) #to make la_test_points reproducible
a <- read_sf('fhszs06_3_19.shp')
# Create synthetic data, make it an sf object, and set the crs
la_test_points <- data.frame(y = runif(1000, 33.6, 34.8), x = runif(1000, -119, -117.6)) %>%
  st_as_sf(coords = c('x','y')) %>%
  st_set_crs(4326) %>%
  st_transform(st_crs(a))
# join the points with the fire hazard area
joined <- st_join(la_test_points, a)
# the sf dataframe, lots of NA's so they're removed for a look:
joined %>% filter(!is.na(HAZ_CODE)) %>% head()
#> Simple feature collection with 6 features and 5 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 125951 ymin: -433789.6 xmax: 177186.8 ymax: -369094
#> Projected CRS: NAD_1983_Albers
#> SRA HAZ_CODE HAZ_CLASS Shape_Leng Shape_Area geometry
#> 1 SRA 3 Very High 613618.0 686671532 POINT (163249.3 -395328.4)
#> 2 SRA 3 Very High 250826.8 233414399 POINT (127980.6 -433789.6)
#> 3 SRA 3 Very High 613618.0 686671532 POINT (167675.9 -386506.6)
#> 4 SRA 3 Very High 391522.6 297194108 POINT (143421.2 -369094)
#> 5 SRA 2 High 208122.8 211364977 POINT (177186.8 -388738.9)
#> 6 SRA 3 Very High 613618.0 686671532 POINT (125951 -399105.6)
# Plotting points, colored according to fire hazard code
ggplot() +
  geom_sf(data = a) +
  geom_sf(data = joined, aes(color = HAZ_CODE)) +
  scale_color_gradient(low = 'yellow', high = 'red')
Created on 2022-11-08 with reprex v2.0.2
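As an aside, if you only need a TRUE/FALSE flag per point rather than the joined hazard fields, a minimal sketch building on the objects above (in_hazard_zone is just an illustrative name):
# TRUE if a point falls in any polygon, FALSE otherwise.
# st_intersects() returns a sparse index list, so lengths() > 0
# converts it to a logical vector.
la_test_points <- la_test_points %>%
  mutate(in_hazard_zone = lengths(st_intersects(geometry, a)) > 0)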
Edit to address joining the example shapefiles:
# Keeping the columns that the example shapefiles have in common,
# and joining them together. (`b` stands for the second linked
# shapefile, read in with read_sf() the same way as `a` above.)
ax <- a %>% select(HAZ_CODE, HAZ_CLASS, Shape_Leng, Shape_Area)
bx <- b %>% select(HAZ_CODE, HAZ_CLASS, Shape_Leng, Shape_Area)
fires <- rbind(ax, bx)
head(fires)
Simple feature collection with 6 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 151597.9 ymin: -470591.9 xmax: 198216 ymax: -443900.4
Projected CRS: NAD83 / California Albers
# A tibble: 6 × 5
HAZ_CODE HAZ_CLASS Shape_Leng Shape_Area geometry
<int> <chr> <dbl> <dbl> <MULTIPOLYGON [m]>
1 3 Very High 5415. 1355567. (((152996.8 -469302.2, 152996.9 -469302.2, 152965.9 -469339.9, 152957.5 -…
2 3 Very High 2802. 423658. (((153701.7 -468506, 153703.9 -468590.6, 153708 -468758.1, 153707.6 -4687…
3 3 Very High 802. 32272. (((191491 -449977.1, 191494.3 -449973.2, 191517.3 -449946.5, 191521.5 -44…
4 3 Very High 1097. 40800. (((182453.8 -445649.1, 182216.3 -445706.6, 182215.4 -445655.7, 182170.4 -…
5 3 Very High 59226. 9379764. (((198201 -446611.2, 198199.9 -446580, 198199.1 -446551.3, 198200 -446580…
6 3 Very High 1255. 70800. (((186617.7 -444161.6, 186619 -444164.5, 186630.5 -444192.8, 186561.8 -44…


Calculate the average of a cloud of points based on a grouping variable with sf in R

I have a bunch of points and I want to calculate the average, summarized for each grouping variable:
x = st_sfc(st_polygon(list(rbind(c(0,0),c(90,0),c(90,90),c(0,90),c(0,0)))), crs = st_crs(4326))
plot(x, axes = TRUE, graticule = TRUE)
plot(p <- st_sample(x, 7), add = TRUE)
p=st_as_sf(p)
p$test=c("A","A","B","C","C","D","D")
When using dplyr, like this, I get an NA.
p %>%
  group_by(test) %>%
  summarize(geometry = mean(geometry))
I just want the average point as the geometry, not one of the original points, nor a MULTIPOINT.
I'm not sure I fully understand what you are looking for, but I am giving it a try!
So, please find one possible solution with a reprex below using the sf and dplyr libraries. I guess you were looking for the aggregate() function instead of group_by().
Reprex
Code
library(sf)
library(dplyr)
R1 <- p %>% aggregate(.,
                      by = list(.$test),
                      function(x) x = x[1]) %>%
  st_centroid() %>%
  select(-Group.1)
#> Warning in st_centroid.sf(.): st_centroid assumes attributes are constant over
#> geometries of x
Output 1 (sf object)
R1
#> Simple feature collection with 4 features and 1 field
#> Attribute-geometry relationship: 0 constant, 1 aggregate, 0 identity
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 2.7875 ymin: 12.91954 xmax: 59.60413 ymax: 51.81421
#> Geodetic CRS: WGS 84
#> test geometry
#> 1 A POINT (27.17167 12.91954)
#> 2 B POINT (2.7875 22.54184)
#> 3 C POINT (59.60413 46.90029)
#> 4 D POINT (56.34763 51.81421)
Complementary code and Output 2 (i.e. if you just need a dataframe)
R2 <- R1 %>%
  st_coordinates() %>%
  cbind(st_drop_geometry(R1), .)
R2
#> test X Y
#> 1 A 27.17167 12.91954
#> 2 B 2.78750 22.54184
#> 3 C 59.60413 46.90029
#> 4 D 56.34763 51.81421
Visualization
plot(x)
plot(p, add = TRUE)
plot(R1, pch = 15, add = TRUE)
Points are your data and small squares are centroids for each group
(FYI, I set the seed to 427 for reproducibility purposes.)
NB: The above uses spherical geometry. If you want to do planar computations, you just need to add sf_use_s2(FALSE) at the beginning of the script. To show you the difference, here is the result using sf_use_s2(FALSE) (in this case, you can see that, for each group, the centroid is located precisely on the line connecting the two points;
it is up to you to choose according to your needs).
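For completeness, with sf and dplyr loaded as above, a dplyr-only sketch that should give the same group centroids, combining each group into a MULTIPOINT and taking its centroid:
p %>%
  group_by(test) %>%
  summarise(geometry = st_centroid(st_combine(geometry)))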
Created on 2022-01-03 by the reprex package (v2.0.1)

How to build Voronoi Polygons for point coordinates in R?

I have various points (2000+) for observation stations in the Alps. I would like to use them to represent the closest geographic area, i.e. the area that is not closer to another observation station. I have done some research and think that using Voronoi polygons may be the best way to do this.
After attempting to build these in R, the Voronoi plot does not quite match the mapping of the points.
I have attached the sample data points I am experimenting with, as well as the two images that show the dissimilar plotting of the points.
What do I need to do differently to make sure that the points line up?
Points:
Longitude:
15.976667 12.846389 14.457222 13.795556 9.849167 16.055278 13.950833 15.666111 9.654722 15.596389 13.226667 15.106667 13.760000 12.226111 9.612222 17.025278 9.877500 15.368056 13.423056 12.571111 16.842222 13.711667 14.003056 12.308056 13.536389
Latitude:
48.40167 48.14889 47.56778 46.72750 47.45833 48.04472 47.82389 47.49472 47.35917 48.64917 48.25000 48.87139 47.87444 47.42806 47.20833 47.77556 47.40389 47.87583 47.53750 46.77694 47.74250 46.55000 48.37611 47.38333 47.91833
Pictures:
Map of the 25 sample points in Leaflet:
Voronoi plot:
Clearly these two are not the same images, so I must be doing something wrong. Here's the code I'm using to generate the Voronoi plot and the leaflet map.
meta25 %>%
  st_as_sf(coords = c("Longitude", "Latitude"),
           crs = sp::CRS("+proj=longlat +datum=WGS84")) %>%
  mapview()
m1 = matrix(meta25$Longitude,meta25$Latitude,ncol=2,nrow=25) %>% st_multipoint()
voronoi_grid <- st_voronoi(m1)
plot(voronoi_grid, col = NA)
plot(m1, add = TRUE, col = "blue", pch = 16)
I'm not sure exactly what went wrong (though passing meta25$Latitude as a positional argument to matrix() likely scrambles the coordinates), but the matrix is not necessary. Stick to sf objects and you should be fine.
library(tidyverse)
library(sf)
# create pts from the lat & lon data
# (Longitude and Latitude are the coordinate vectors from the question)
pts <- tibble(longitude = Longitude, latitude = Latitude) %>%
  st_as_sf(coords = c('longitude', 'latitude')) %>%
  st_set_crs(4326)
# voronoi of pts
vor <- st_voronoi(st_combine(pts))
head(vor)
#> Geometry set for 1 feature
#> Geometry type: GEOMETRYCOLLECTION
#> Dimension: XY
#> Bounding box: xmin: 2.199166 ymin: 39.13694 xmax: 24.43833 ymax: 56.28445
#> Geodetic CRS: WGS 84
#> GEOMETRYCOLLECTION (POLYGON ((2.199166 49.37841...
# st_voronoi returns a GEOMETRYCOLLECTION,
# some plotting methods can't use a GEOMETRYCOLLECTION.
# this returns polygons instead
vor_poly <- st_collection_extract(vor)
head(vor_poly)
#> Geometry set for 6 features
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: 2.199166 ymin: 39.13694 xmax: 18.32787 ymax: 56.28445
#> Geodetic CRS: WGS 84
#> First 5 geometries:
#> POLYGON ((2.199166 49.37841, 2.199166 56.28445,...
#> POLYGON ((9.946349 39.13694, 2.199166 39.13694,...
#> POLYGON ((18.32787 39.13694, 11.64381 39.13694,...
#> POLYGON ((9.794868 47.23828, 9.766296 47.38061,...
#> POLYGON ((5.225657 56.28445, 9.393793 56.28445,...
plot(pts, col = 'blue', pch = 16)
plot(vor_poly, add = T, fill = NA)
Created on 2021-04-05 by the reprex package (v0.3.0)
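If you also need each Voronoi cell matched back to the station that generated it (st_voronoi does not preserve input order), one possible sketch is a spatial join; pts carries only geometry in this reprex, but your real station points presumably have IDs and other attributes:
# re-attach point attributes to the cell containing each point
vor_sf <- st_as_sf(st_collection_extract(vor))
vor_sf <- st_join(vor_sf, pts)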
Thanks everyone for your help; I'm not sure it got quite to where I was looking for. I've since adapted the answer from here: Creating bordering polygons from spatial point data for plotting in leaflet

R: How could I reduce runtime of osmdata to an igraph conversion

Is it possible to reduce the run time of the following code?
My goal is to get a weighted igraph object from OpenStreetMap data for an area specified by a bounding box.
Currently I'm trying to use the Overpass API to offload the memory load, so I don't have to keep big OSM files in memory.
First I get the OSM data specified by a bbox (streets only) as an XML structure:
library(osmdata)
library(osmar)
install.packages("remotes")
remotes::install_github("hypertidy/scgraph")
library(scgraph)
dat <- opq(bbox = c(11.68771, 47.75233, 12.35058, 48.19743)) %>%
  add_osm_feature(key = 'highway',
                  value = c("trunk", "trunk_link", "primary", "primary_link",
                            "secondary", "secondary_link", "tertiary",
                            "tertiary_link", "residential", "unclassified")) %>%
  osmdata_xml()
Then I convert the resulting XML object dat to an osmar object dat_osmar and finally to an igraph object:
dat_osmar <- as_osmar(xmlParse(dat))
dat_graph <- as_igraph(dat_osmar)
How could I optimize these routines?
Maybe it is possible to split the dat (XML) object into chunks and parse them in parallel?
I go through several steps only to finally get a weighted undirected graph.
Currently the whole process takes 89.555 sec on my machine.
If I could cut the running time of these two steps:
dat_osmar <- as_osmar(xmlParse(dat))
dat_graph <- as_igraph(dat_osmar)
that would help already.
One of the approaches I tried is to use osmdata_sc() instead of osmdata_xml().
This provides a silicate object, and I can convert it with:
scgraph::sc_as_igraph(dat)
to an igraph.
It is decently fast, but sadly the weights get lost, so it's not a solution.
The reason is that the conversion from an osmar object to an igraph object with osmar::as_igraph() calculates the weight from the distance between consecutive nodes and adds it to the igraph:
edges <- lapply(dat, function(x) {
  n <- nrow(x)
  from <- 1:(n - 1)
  to <- 2:n
  weights <- distHaversine(x[from, c("lon", "lat")],
                           x[to, c("lon", "lat")])
  cbind(from_node_id = x[from, "ref"], to_node_id = x[to, "ref"],
        way_id = x[1, "id"], weights = weights)
})
This is missing from scgraph::sc_as_igraph(dat).
If this could be added to the silicate-to-igraph conversion, I could skip the dat_osmar <- as_osmar(xmlParse(dat)) step and take the overpass->silicate->igraph route, which is much faster than overpass->xml->osmar->igraph.
The osmdata package also provides an sf response with osmdata_sf(), so maybe the workflow overpass->sf->igraph is faster. But on that route I would also need the weights incorporated into the graph based on the edge distances, and I'm not good enough to do it currently, so I would really appreciate any help.
Furthermore, the connection between OpenStreetMap GPS points and their IDs should not be lost while using sf and the resulting igraph object, meaning I should be able to find the GPS position for an ID from the resulting igraph. A lookup table would be enough. If I go the overpass->silicate->igraph or overpass->xml->osmar->igraph route this is possible; I'm not sure whether it would still be possible with the overpass->sf->igraph route.
If you want to create a graph object starting from a network of roads in R, then I would use the following procedure.
First of all, I need to install sfnetworks from the github repo (since we recently fixed some bugs and the newest version is not on CRAN)
remotes::install_github("luukvdmeer/sfnetworks", quiet = TRUE)
Then load packages
library(sf)
#> Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(tidygraph)
#>
#> Attaching package: 'tidygraph'
#> The following object is masked from 'package:stats':
#>
#> filter
library(sfnetworks)
library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
Download data from Overpass API
my_osm_data <- opq(bbox = c(11.68771, 47.75233, 12.35058, 48.19743)) %>%
  add_osm_feature(
    key = 'highway',
    value = c("trunk", "trunk_link", "primary", "primary_link",
              "secondary", "secondary_link", "tertiary",
              "tertiary_link", "residential", "unclassified")
  ) %>%
  osmdata_sf(quiet = FALSE)
#> Issuing query to Overpass API ...
#> Rate limit: 2
#> Query complete!
#> converting OSM data to sf format
Now I extract the roads and build the sfnetwork object:
system.time({
  # extract the roads
  my_roads <- st_geometry(my_osm_data$osm_lines)
  # build the sfnetwork object
  my_sfn <- as_sfnetwork(my_roads, directed = FALSE, length_as_weight = TRUE)
})
#> user system elapsed
#> 3.03 0.16 3.28
As you can see, after downloading the OSM data, it takes just a couple of seconds to run that procedure.
At the moment I ignore all fields in my_osm_data$osm_lines, but if you need to add some of the columns in my_osm_data$osm_lines to my_roads, then you can modify the previous code as follows: my_roads <- my_osm_data$osm_lines[, "relevant columns"].
A few details regarding the construction of the sfnetwork object: the parameter "directed = FALSE" specifies that we want to build an undirected graph (see the docs, here and here for more details), while the parameter length_as_weight = TRUE says that the length of the edges will be stored in a column called "weight" and used by igraph and tidygraph algorithms.
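For example, here is a hedged sketch of how that weight column gets picked up by tidygraph verbs (betweenness centrality is just an illustration):
# tidygraph algorithms can reference the edge "weight" column
my_sfn %>%
  activate(nodes) %>%
  mutate(bc = centrality_betweenness(weights = weight))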
This is the printing of my_sfn object:
my_sfn
#> # A sfnetwork with 33179 nodes and 28439 edges
#> #
#> # CRS: EPSG:4326
#> #
#> # An undirected multigraph with 6312 components with spatially explicit edges
#> #
#> Registered S3 method overwritten by 'cli':
#> method from
#> print.boxx spatstat.geom
#> # Node Data: 33,179 x 1 (active)
#> # Geometry type: POINT
#> # Dimension: XY
#> # Bounding box: xmin: 11.6757 ymin: 47.74745 xmax: 12.39161 ymax: 48.22025
#> x
#> <POINT [°]>
#> 1 (11.68861 47.90971)
#> 2 (11.68454 47.90937)
#> 3 (11.75216 48.17638)
#> 4 (11.75358 48.17438)
#> 5 (11.7528 48.17351)
#> 6 (11.74822 48.17286)
#> # ... with 33,173 more rows
#> #
#> # Edge Data: 28,439 x 4
#> # Geometry type: LINESTRING
#> # Dimension: XY
#> # Bounding box: xmin: 11.6757 ymin: 47.74745 xmax: 12.39161 ymax: 48.22025
#> from to x weight
#> <int> <int> <LINESTRING [°]> <dbl>
#> 1 1 2 (11.68861 47.90971, 11.6878 47.90965, 11.68653 47.90954, 1~ 306.
#> 2 3 4 (11.75216 48.17638, 11.75224 48.17626, 11.75272 48.17556, ~ 246.
#> 3 5 6 (11.7528 48.17351, 11.75264 48.17344, 11.75227 48.17329, 1~ 382.
#> # ... with 28,436 more rows
my_sfn is an igraph object by definition:
class(my_sfn)
#> [1] "sfnetwork" "tbl_graph" "igraph"
but, if you want to be more explicit, then
as.igraph(my_sfn)
#> IGRAPH 101dcdf U-W- 33179 28439 --
#> + attr: x (v/x), x (e/x), weight (e/n)
#> + edges from 101dcdf:
#> [1] 1-- 2 3-- 4 5-- 6 7-- 8 9-- 10 11-- 12 13-- 14 15-- 16
#> [9] 17-- 18 16-- 19 20-- 21 21-- 22 23-- 24 25-- 26 27-- 28 29-- 30
#> [17] 31-- 32 33-- 34 35-- 36 37-- 38 39-- 40 41-- 42 43-- 44 45-- 46
#> [25] 14-- 47 48-- 49 50-- 51 52-- 53 54-- 55 56-- 57 36-- 58 58-- 59
#> [33] 60-- 61 62-- 63 64-- 65 66-- 67 68-- 69 70-- 71 72-- 73 74-- 75
#> [41] 76-- 77 78-- 79 80-- 81 82-- 83 84-- 85 86-- 87 88-- 89 90-- 91
#> [49] 92-- 93 94-- 95 96-- 97 98-- 99 100--101 102--103 104--105 106--107
#> [57] 108--109 110--111 112--113 80--114 115--116 117--118 119--120 121--122
#> + ... omitted several edges
You can see that the edges have a weight attribute that is equal to the length of each LINESTRING geometry:
all.equal(
  target = igraph::edge_attr(as.igraph(my_sfn), "weight"),
  current = as.numeric(st_length(my_roads))
)
#> [1] TRUE
Created on 2021-03-26 by the reprex package (v1.0.0)
If you want to read more details regarding sfnetworks, then you can check the website and the introductory vignettes. That being said, I don't understand what you mean by
connection between openstreetmap gps points and their IDs should not be lost
Can you add more details with a comment or an edit to the original question? Why do you need the OSM id? And what do you mean by OSM id? I think that I need more details to expand this answer.
EDIT
I just re-read @mrhellmann's answer, and I noticed that I forgot to convert POLYGON data to lines. Anyway, I would suggest applying osmdata::osm_poly2line() immediately after running the code that downloads OSM data via the Overpass API.
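A minimal sketch of that suggestion, assuming the objects from the code above:
# convert POLYGON roads (e.g. roundabouts) to LINESTRINGs and
# append them to osm_lines before extracting the road geometry
my_osm_data <- osmdata::osm_poly2line(my_osm_data)
my_roads <- st_geometry(my_osm_data$osm_lines)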
It seems like getting the XML data into another format is taking a long time. Instead of using XML, asking Overpass to return an sf object and using that might be quicker. The sf object can then be manipulated & used by the sfnetworks package to get a network, while retaining the spatial aspects of the network. Weights can be added by functions from the sfnetworks (or tidygraph) packages.
I think the code below focuses on taking care of the speed and the edge weight problems. Other problems, like finding the nearest nodes or edges, can be solved using the functions of the sf package, but are not addressed. Otherwise this becomes more than a one-off SO question.
The speed might be able to be increased, at the cost of spatial accuracy, by using st_simplify for the edges (see the sketch below). One problem with this approach is that sfnetworks places a node where each linestring meets another. The data returned often has a single roadway split into multiple linestrings. As an example, see two longer roads in yellow on the edges plot below. Probably a solvable problem, but may not be worthwhile in this case.
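For the st_simplify idea, a hedged one-liner that could be applied to the combined edge object dat_sf_bound (created below) before building the network; the dTolerance value is a placeholder to tune, and its units depend on the CRS and sf version:
# trade spatial accuracy for speed by simplifying edge geometry
dat_sf_bound <- st_simplify(dat_sf_bound, dTolerance = 0.001)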
library(osmdata)
#library(osmar)
library(tidyverse)
library(sf)
library(ggplot2)
library(sfnetworks)
library(tidygraph)
# get data as an sf object rather than xml
## This is the slowest part of the code.
dat_sf <- opq(bbox = c(11.68771, 47.75233, 12.35058, 48.19743)) %>%
  add_osm_feature(key = 'highway',
                  value = c("trunk", "trunk_link", "primary", "primary_link",
                            "secondary", "secondary_link", "tertiary",
                            "tertiary_link", "residential", "unclassified")) %>%
  osmdata_sf()
# Only keep lines & polygons. Points take up too much memory &
## all seem to be on lines anyway. Change polygons to LINESTRING,
## as most seem to be roundabouts or a few odd cases.
lines_sf <- dat_sf$osm_lines %>% select(osm_id) %>% st_sf()
polys_sf <- dat_sf$osm_polygons %>% select(osm_id) %>% st_sf() %>%
  st_cast('LINESTRING')
# Combine the two above sf objects into one
dat_sf_bound <- rbind(lines_sf, polys_sf)
# get an sfnetwork
dat_sf_net <- as_sfnetwork(dat_sf_bound)
# add edge weight as distance
dat_sf_net <- dat_sf_net %>%
  activate(edges) %>%
  mutate(weight = edge_length())
dat_sf_net should look like:
> dat_sf_net
# An sfnetwork with 33255 nodes and 28608 edges
#
# CRS: EPSG:4326
#
# A directed multigraph with 6391 components with spatially explicit edges
#
# Edge Data: 28,608 x 4 (active)
# Geometry type: LINESTRING
# Dimension: XY
# Bounding box: xmin: 11.6757 ymin: 47.74745 xmax: 12.39161 ymax: 48.22025
from to weight geometry
<int> <int> [m] <LINESTRING [°]>
1 1 2 306.3998 (11.68861 47.90971, 11.6878 47.90965, 11.68653 47.90954, 11.68597 47.909…
2 3 4 245.9225 (11.75216 48.17638, 11.75224 48.17626, 11.75272 48.17556, 11.7528 48.175…
3 5 6 382.2423 (11.7528 48.17351, 11.75264 48.17344, 11.75227 48.17329, 11.752 48.1732,…
4 7 8 131.1373 (11.70029 47.87861, 11.70046 47.87869, 11.70069 47.87879, 11.70138 47.87…
5 9 10 252.9170 (11.75733 48.17339, 11.75732 48.17343, 11.75726 48.17357, 11.75718 48.17…
6 11 12 131.6942 (11.75582 48.17036, 11.75551 48.1707, 11.75521 48.17106, 11.75507 48.171…
# … with 28,602 more rows
#
# Node Data: 33,255 x 1
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: 11.6757 ymin: 47.74745 xmax: 12.39161 ymax: 48.22025
geometry
<POINT [°]>
1 (11.68861 47.90971)
2 (11.68454 47.90937)
3 (11.75216 48.17638)
# … with 33,252 more rows
Plots of just edges, and of edges with nodes:
A few long roads skew the coloring, but they illustrate a single road split in two.
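As a quick sanity check that the distance-based weights are usable, a hedged sketch of a shortest-path query with sfnetworks, which should pick up the weight column by default (node indices 1 and 100 are arbitrary placeholders):
# shortest path between two arbitrary nodes, using the
# distance-based weight column added above
path <- st_network_paths(dat_sf_net, from = 1, to = 100)
path$node_paths[[1]]  # node indices along the shortest path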
Edit:
Addressing the comment about selecting the nearest edge (road) by latitude/longitude coordinates: nodes (intersections / red dots above) do not have OSM id numbers that I am aware of; nodes are created by sfnetworks.
Starting with an sf object of a lat/lon point as our made-up GPS coordinate.
# random point
gps <- sfheaders::sf_point(data.frame(x = 11.81854, y = 48.04514)) %>% st_set_crs(4326)
# nearest edge(road) to the point. dat_sf_net must have edges activated.
near_edge <- st_nearest_feature(gps, dat_sf_net %>% st_as_sf())
>near_edge
[1] 4359
> st_as_sf(dat_sf_net)[near_edge,]
Simple feature collection with 1 feature and 4 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: 11.81119 ymin: 48.02841 xmax: 11.82061 ymax: 48.06845
Geodetic CRS: WGS 84
# A tibble: 1 x 5
from to osm_id geometry weight
<int> <int> <chr> <LINESTRING [°]> [m]
1 7590 7591 24232418 (11.81289 48.02841, 11.81213 48.03014, 11.81202 48.03062, 11.81… 4576.273
p3 <- ggplot() +
  geom_sf(data = st_as_sf(dat_sf_net), color = 'black') +
  geom_sf(data = gps, color = 'red') +
  geom_sf(data = st_as_sf(dat_sf_net)[near_edge,], color = 'orange') +
  coord_sf(xlim = c(11.7, 11.9), ylim = c(48, 48.1))
It looks like @agila commented above. Since he's an author of sfnetworks, maybe he will have some insights.

Aggregate features by distance in sf objects

I have two sf objects: a polygon object county (note: this contains multiple polygons, i.e. many counties) and a point object monitor2.
county looks like below. The Chinese characters cannot be displayed properly, but it's not a big deal.
Simple feature collection with 6 features and 4 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 113.15 ymin: 20.58265 xmax: 124.5656 ymax: 40.10793
epsg (SRID): 4326
proj4string: +proj=longlat +datum=WGS84 +no_defs
City District Province Code geometry
1 <U+53F0><U+6E7E><U+7701> <U+53F0><U+6E7E><U+7701> <U+53F0><U+6E7E><U+7701> 710000 MULTIPOLYGON (((116.7346 20...
2 <U+5317><U+4EAC><U+5E02> <U+671D><U+9633><U+533A> <U+5317><U+4EAC><U+5E02> 110105 MULTIPOLYGON (((116.4834 40...
3 <U+4E0A><U+6D77><U+5E02> <U+666E><U+9640><U+533A> <U+4E0A><U+6D77><U+5E02> 310107 MULTIPOLYGON (((121.3562 31...
4 <U+4E0A><U+6D77><U+5E02> <U+5B9D><U+5C71><U+533A> <U+4E0A><U+6D77><U+5E02> 230506 MULTIPOLYGON (((121.4855 31...
5 <U+5E7F><U+5DDE><U+5E02> <U+767D><U+4E91><U+533A> <U+5E7F><U+4E1C><U+7701> 440111 MULTIPOLYGON (((113.4965 23...
6 <U+798F><U+5DDE><U+5E02> <U+9F13><U+697C><U+533A> <U+798F><U+5EFA><U+7701> 320106 MULTIPOLYGON (((119.2611 26...
The monitor2 looks like below.
Simple feature collection with 6 features and 5 fields
geometry type: POINT
dimension: XY
bbox: xmin: 116.17 ymin: 39.8673 xmax: 116.473 ymax: 40.2865
epsg (SRID): 4326
proj4string: +proj=longlat +datum=WGS84 +no_defs
# A tibble: 6 x 6
code name city ref value geometry
<chr> <chr> <chr> <chr> <dbl> <POINT [°]>
1 1001A 万寿西宫 北京 N 47.8 (116.366 39.8673)
2 1002A 定陵 北京 Y 45.9 (116.17 40.2865)
3 1003A 东四 北京 N 42.2 (116.434 39.9522)
4 1004A 天坛 北京 N 51.2 (116.434 39.8745)
5 1005A 农展馆 北京 N 46.9 (116.473 39.9716)
6 1006A 官园 北京 N 49.5 (116.361 39.9425)
The first task is to join the value feature in monitor2 to county. I did this with st_is_within_distance and st_join; see the code below. I set the distance to 50 km. Some counties in the new polygon may have values from multiple points within the 50 km buffer.
new = st_join(county, monitor2,
              join = st_is_within_distance, dist = 50)
Here comes the second task. I need to aggregate values from different points within that 50 km buffer by their distances to the centroid of the county. How do I achieve this task?
Any comments are welcome.
It's difficult to know exactly what you want without reproducible data, but here's an attempt to show how you can do this.
Get sample data. We reproject here from lat/long to something with metres so we can do distance-based spatial operations. We'll use 3 counties from the sample data, take the middle county as the main polygon we want to measure distances from, and add a random sample of points scattered across the three counties.
library(sf)
nc <- st_read(system.file("shape/nc.shp", package="sf"))
nc <- st_transform(nc, 32119) # NC state plane projection in metres
county = st_cast(nc[2,],"POLYGON")
p1 = st_as_sf(st_sample(nc[1:3, ], 200)) # random points
# Visualize
plot(st_geometry(nc)[1:3])
plot(county, col = "grey80", add = TRUE)
We want to focus only on points within some distance from our target county. Let's see what that looks like by adding a buffer using st_buffer.
plot(st_buffer(county, dist = 10000), col = NA, border = "red", lty = 3, add = TRUE)
We can subset the points within 10000m of the central county by using st_is_within_distance which would accomplish the same as doing an intersect with the st_buffer object.
p1_10 <- p1[st_is_within_distance(county,p1,dist = 10000, sparse = FALSE),]
Measuring the distance between the centroid and each element of this subset is straightforward. We can then assign the distance measurement as a variable in the subset spatial object.
p1_10$distance_to_centroid <- as.vector(st_distance(st_centroid(county), p1_10))
Here's what that looks like plotted altogether
plot(st_geometry(nc)[1:3])
plot(county, col = "grey80", add = TRUE)
plot(p1, add = TRUE, pch = 19)
plot(st_buffer(county, dist = 10000), col = NA, border = "red", lty = 3, add = TRUE)
plot(st_centroid(county), col = "red", pch = 15, cex = 1, axes = TRUE, add = TRUE)
plot(p1_10["distance_to_centroid"], add = TRUE, pch = 19)
This is what the p1_10 object looks like here:
> p1_10
Simple feature collection with 78 features and 1 field
geometry type: POINT
dimension: XY
bbox: xmin: 389967.6 ymin: 293489.4 xmax: 448197.1 ymax: 315140.7
CRS: EPSG:32119
First 10 features:
x distance_to_centroid
1 POINT (437228.1 294079.7) 21703.5425
2 POINT (425029.8 305656.7) 5868.4917
3 POINT (425131.4 309137.8) 6665.0253
4 POINT (409851.2 294971.7) 14549.0585
5 POINT (393070.6 303879.7) 26207.5651
6 POINT (436666.3 296282.2) 20070.5879
7 POINT (442623.8 295976.3) 25549.5662
8 POINT (400517.2 307897.4) 18746.6918
9 POINT (418763.7 306728) 724.6165
10 POINT (405001.4 294845.7) 18125.0738
So from here you can aggregate your features by distance using whatever method you want. In dplyr, it's pretty straightforward. Suppose for example here I wanted to aggregate in 5km intervals.
library(dplyr)
p1_10 %>%
  mutate(dist_group = ceiling(distance_to_centroid / 5000)) %>%
  group_by(dist_group) %>%
  tally() %>% # stop here if you want the result to retain geography
  as_tibble() %>%
  select(dist_group, n)
# A tibble: 7 x 2
dist_group n
<dbl> <int>
1 1 7
2 2 15
3 3 22
4 4 13
5 5 11
6 6 9
7 7 1
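And if the aim is to aggregate a measured value (like value in monitor2) rather than count points, a hedged sketch of an inverse-distance-weighted mean, using a made-up value column on p1_10 as a stand-in:
# placeholder values standing in for monitor2$value
p1_10$value <- runif(nrow(p1_10))
p1_10 %>%
  st_drop_geometry() %>%
  summarise(idw_value = sum(value / distance_to_centroid) /
              sum(1 / distance_to_centroid))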

Identify zip codes that fall within latitude and longitude coordinates

I have several data frames in R. The first data frame contains the computed convex hull of a set of lat and long coordinates by market (courtesy of chull in R). It looks like this:
MyGeo<- "Part of Chicago & Wisconsin"
Longitude <- c(-90.31914, -90.61911, -89.37842, -88.0988, -87.44875)
Latitude <- c(38.45781, 38.80097, 43.07961, 43.0624,41.49182)
dat <- data.frame(Longitude, Latitude, MyGeo)
The second has zip codes by their latitude and longitudinal coordinates (courtesy of the US census website). It looks like this:
CensuseZip <- c("SomeZipCode1","SomeZipCode2","SomeZipCode3","SomeZipCode4","SomeZipCode5","SomeZipCode6","SomeZipCode7")
Longitude2 <- c(-131.470425,-133.457924,-131.693453,-87.64957,-87.99734,-87.895,-88.0228)
Latitude2 <- c(55.138352,56.239062,56.370538,41.87485,42.0086,42.04957,41.81055)
cen <- data.frame(Longitude2, Latitude2, CensuseZip)
Now I believe the first data table provides me with a polygon, or a border, that I should be able to use to identify zip codes that fall within that border. Ideally, I would want to create a third data table that looks something like this:
Longitude2 Latitude2 CensusZip MyGeo
-131.470425 55.138352 SomeZipCode1
-133.457924 56.239062 SomeZipCode2
-131.693453 56.370538 SomeZipCode3
-87.64957 41.87485 SomeZipCode4 Part of Chicago & Wisconsin
-87.99734 42.0086 SomeZipCode5 Part of Chicago & Wisconsin
-87.895 42.04957 SomeZipCode6 Part of Chicago & Wisconsin
-88.0228 41.81055 SomeZipCode7 Part of Chicago & Wisconsin
In essence, I am looking to identify all the zip codes that fall within the border formed by the blue long and lat points (see the image below). While it is visualized there, I am actually looking for the table described above.
However... I am having trouble doing this... I have tried using the below packages and script:
library(rgeos)
library(sp)
library(rgdal)
coordinates(dat) <- ~ Longitude + Latitude
coordinates(cen) <- ~ Longitude2 + Latitude2
over(cen, dat)
but I receive all NAs.
I use library(sf) to solve this type of point-in-polygon problem (sf is the successor to sp).
The function sf::st_intersection() gives you the intersection of two sf objects. In your case you can construct separate POLYGON and POINT sf objects.
library(sf)
Longitude <- c(-90.31914, -90.61911, -89.37842, -88.0988, -87.44875)
Latitude <- c(38.45781, 38.80097, 43.07961, 43.0624,41.49182)
## closing the polygon
Longitude[length(Longitude) + 1] <- Longitude[1]
Latitude[length(Latitude) + 1] <- Latitude[1]
## construct sf POLYGON
sf_poly <- sf::st_sf( geometry = sf::st_sfc( sf::st_polygon( x = list(matrix(c(Longitude, Latitude), ncol = 2)))) )
## construct sf POINT
sf_points <- sf::st_as_sf( cen, coords = c("Longitude2", "Latitude2"))
sf::st_intersection(sf_points, sf_poly)
# Simple feature collection with 4 features and 1 field
# geometry type: POINT
# dimension: XY
# bbox: xmin: -88.0228 ymin: 41.81055 xmax: -87.64957 ymax: 42.04957
# epsg (SRID): NA
# proj4string: NA
# CensuseZip geometry
# 4 SomeZipCode4 POINT (-87.64957 41.87485)
# 5 SomeZipCode5 POINT (-87.99734 42.0086)
# 6 SomeZipCode6 POINT (-87.895 42.04957)
# 7 SomeZipCode7 POINT (-88.0228 41.81055)
# Warning message:
# attribute variables are assumed to be spatially constant throughout all geometries
The result is all the points which are inside the polygon.
You can also use sf::st_join(sf_poly, sf_points) to get the same result.
And the function sf::st_intersects(sf_points, sf_poly) will return a sparse list saying whether each given POINT is inside the polygon:
sf::st_intersects(sf_points, sf_poly)
# Sparse geometry binary predicate list of length 7, where the predicate was `intersects'
# 1: (empty)
# 2: (empty)
# 3: (empty)
# 4: 1
# 5: 1
# 6: 1
# 7: 1
You can use this as an index / identifier on the original sf_points object to add a new column:
is_in <- sf::st_intersects(sf_points, sf_poly)
sf_points$inside_polygon <- as.logical(is_in)
sf_points
# Simple feature collection with 7 features and 2 fields
# geometry type: POINT
# dimension: XY
# bbox: xmin: -133.4579 ymin: 41.81055 xmax: -87.64957 ymax: 56.37054
# epsg (SRID): NA
# proj4string: NA
# CensuseZip geometry inside_polygon
# 1 SomeZipCode1 POINT (-131.4704 55.13835) NA
# 2 SomeZipCode2 POINT (-133.4579 56.23906) NA
# 3 SomeZipCode3 POINT (-131.6935 56.37054) NA
# 4 SomeZipCode4 POINT (-87.64957 41.87485) TRUE
# 5 SomeZipCode5 POINT (-87.99734 42.0086) TRUE
# 6 SomeZipCode6 POINT (-87.895 42.04957) TRUE
# 7 SomeZipCode7 POINT (-88.0228 41.81055) TRUE
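One caveat: as.logical() on the sparse list gives NA (not FALSE) for points outside the polygon, as the printout shows. For an explicit TRUE/FALSE, lengths() works:
# lengths() gives FALSE instead of NA for points outside the polygon
sf_points$inside_polygon <- lengths(is_in) > 0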
