How can I reduce the runtime of an osmdata-to-igraph conversion in R?

Is it possible to reduce the run time of the following code?
My goal is to get a weighted igraph object from OpenStreetMap data for an area specified by a bounding box.
Currently I'm trying to use the Overpass API to offload memory, so I don't have to keep big OSM files in memory.
First I get the OSM data for a bbox (streets only) as an XML structure:
library(osmdata)
library(osmar)
# remotes is only needed once, to install scgraph from GitHub
install.packages("remotes")
remotes::install_github("hypertidy/scgraph")
library(scgraph)
dat <- opq(bbox = c(11.68771, 47.75233, 12.35058, 48.19743)) %>%
  add_osm_feature(key = 'highway',
                  value = c("trunk", "trunk_link", "primary", "primary_link",
                            "secondary", "secondary_link", "tertiary",
                            "tertiary_link", "residential", "unclassified")) %>%
  osmdata_xml()
Then I convert the resulting xml object dat to an osmar object dat_osmar and finally to an igraph object:
dat_osmar <- as_osmar(xmlParse(dat))
dat_graph <- as_igraph(dat_osmar)
How could I optimize these routines?
Maybe it is possible to split the dat (XML) object into chunks and parse them in parallel?
I go through several steps only to finally get a weighted undirected graph.
Currently the whole process takes 89.555 seconds on my machine.
If I could cut the running time of these two steps:
dat_osmar <- as_osmar(xmlParse(dat))
dat_graph <- as_igraph(dat_osmar)
that would help already.
One of the approaches I tried is to use osmdata_sc() instead of osmdata_xml().
This provides a silicate object, which I can convert to an igraph with:
scgraph::sc_as_igraph(dat)
It is decently fast, but sadly the weights get lost, so it's not a solution.
The reason is: if I use the conversion from an osmar object to an igraph object with osmar::as_igraph(), the weight is calculated from the distance between consecutive nodes of each way and added to the igraph:
edges <- lapply(dat, function(x) {
  n <- nrow(x)
  from <- 1:(n - 1)
  to <- 2:n
  weights <- distHaversine(x[from, c("lon", "lat")], x[to, c("lon", "lat")])
  cbind(from_node_id = x[from, "ref"], to_node_id = x[to, "ref"],
        way_id = x[1, "id"], weights = weights)
})
This weighting is missing from scgraph::sc_as_igraph(dat).
If it could be added to the silicate-to-igraph conversion,
I could skip the dat_osmar <- as_osmar(xmlParse(dat)) step
and take the overpass -> silicate -> igraph route, which is much faster than overpass -> xml -> osmar -> igraph.
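For reference, the per-edge weight in the osmar code above is just the haversine great-circle distance. A base-R sketch of the same formula (using the mean Earth radius of 6378137 m that geosphere::distHaversine defaults to; this is an illustration, not the package's exact implementation):

```r
# great-circle distance in metres between two lon/lat points (vectorised)
haversine <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(pmin(1, a)))
}

haversine(11.68771, 47.75233, 11.68771, 48.75233)  # one degree of latitude, ~111319.5 m
```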
The osmdata package also provides an sf response with osmdata_sf(), so maybe the workflow overpass -> sf -> igraph is faster. But on this route, too, I would need the weights incorporated into the graph based on edge distances, and I'm not good enough to do it currently, so I would really appreciate any help.
Furthermore, the connection between OpenStreetMap GPS points and their IDs should not be lost when using sf and the resulting igraph object, meaning I should be able to find the GPS position for an ID from the resulting igraph. A lookup table would be enough. With the overpass -> silicate -> igraph or overpass -> xml -> osmar -> igraph routes it is possible; I'm not sure whether it would still be possible with overpass -> sf -> igraph.
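To illustrate the kind of lookup table I mean, here is a toy stand-in for the osmar node table (dat_osmar$nodes$attrs), which pairs OSM node ids with lon/lat; the real table has more columns:

```r
# toy stand-in for dat_osmar$nodes$attrs
nodes <- data.frame(id  = c(101, 102, 103),
                    lon = c(11.69, 11.75, 12.01),
                    lat = c(47.91, 48.17, 48.02))

# GPS position for a node id carried over into the graph
pos <- nodes[match(102, nodes$id), c("lon", "lat")]
pos  # lon 11.75, lat 48.17
```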

If you want to create a graph object starting from a network of roads in R, then I would use the following procedure.
First of all, I need to install sfnetworks from the github repo (since we recently fixed some bugs and the newest version is not on CRAN)
remotes::install_github("luukvdmeer/sfnetworks", quiet = TRUE)
Then load packages
library(sf)
#> Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(tidygraph)
#>
#> Attaching package: 'tidygraph'
#> The following object is masked from 'package:stats':
#>
#> filter
library(sfnetworks)
library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
Download data from Overpass API
my_osm_data <- opq(bbox = c(11.68771, 47.75233, 12.35058, 48.19743)) %>%
  add_osm_feature(
    key = 'highway',
    value = c("trunk", "trunk_link", "primary", "primary_link", "secondary",
              "secondary_link", "tertiary", "tertiary_link", "residential",
              "unclassified")
  ) %>%
  osmdata_sf(quiet = FALSE)
#> Issuing query to Overpass API ...
#> Rate limit: 2
#> Query complete!
#> converting OSM data to sf format
Now I extract the roads and build the sfnetwork object:
system.time({
  # extract the roads
  my_roads <- st_geometry(my_osm_data$osm_lines)
  # build the sfnetwork object
  my_sfn <- as_sfnetwork(my_roads, directed = FALSE, length_as_weight = TRUE)
})
#> user system elapsed
#> 3.03 0.16 3.28
As you can see, after downloading the OSM data, it takes just a couple of seconds to run that procedure.
At the moment I ignore all fields in my_osm_data$osm_lines, but if you need to keep some of its columns in my_roads, then you can modify the previous code as follows: my_roads <- my_osm_data$osm_lines[, "relevant columns"].
A few details regarding the construction of the sfnetwork object: the parameter directed = FALSE specifies that we want to build an undirected graph (see the sfnetworks and tidygraph docs for more details), while length_as_weight = TRUE means that the length of each edge will be stored in a column called "weight" and used by igraph and tidygraph algorithms.
This is the printing of my_sfn object:
my_sfn
#> # A sfnetwork with 33179 nodes and 28439 edges
#> #
#> # CRS: EPSG:4326
#> #
#> # An undirected multigraph with 6312 components with spatially explicit edges
#> #
#> # Node Data: 33,179 x 1 (active)
#> # Geometry type: POINT
#> # Dimension: XY
#> # Bounding box: xmin: 11.6757 ymin: 47.74745 xmax: 12.39161 ymax: 48.22025
#> x
#> <POINT [°]>
#> 1 (11.68861 47.90971)
#> 2 (11.68454 47.90937)
#> 3 (11.75216 48.17638)
#> 4 (11.75358 48.17438)
#> 5 (11.7528 48.17351)
#> 6 (11.74822 48.17286)
#> # ... with 33,173 more rows
#> #
#> # Edge Data: 28,439 x 4
#> # Geometry type: LINESTRING
#> # Dimension: XY
#> # Bounding box: xmin: 11.6757 ymin: 47.74745 xmax: 12.39161 ymax: 48.22025
#> from to x weight
#> <int> <int> <LINESTRING [°]> <dbl>
#> 1 1 2 (11.68861 47.90971, 11.6878 47.90965, 11.68653 47.90954, 1~ 306.
#> 2 3 4 (11.75216 48.17638, 11.75224 48.17626, 11.75272 48.17556, ~ 246.
#> 3 5 6 (11.7528 48.17351, 11.75264 48.17344, 11.75227 48.17329, 1~ 382.
#> # ... with 28,436 more rows
my_sfn is an igraph object by definition:
class(my_sfn)
#> [1] "sfnetwork" "tbl_graph" "igraph"
but, if you want to be more explicit, then
as.igraph(my_sfn)
#> IGRAPH 101dcdf U-W- 33179 28439 --
#> + attr: x (v/x), x (e/x), weight (e/n)
#> + edges from 101dcdf:
#> [1] 1-- 2 3-- 4 5-- 6 7-- 8 9-- 10 11-- 12 13-- 14 15-- 16
#> [9] 17-- 18 16-- 19 20-- 21 21-- 22 23-- 24 25-- 26 27-- 28 29-- 30
#> [17] 31-- 32 33-- 34 35-- 36 37-- 38 39-- 40 41-- 42 43-- 44 45-- 46
#> [25] 14-- 47 48-- 49 50-- 51 52-- 53 54-- 55 56-- 57 36-- 58 58-- 59
#> [33] 60-- 61 62-- 63 64-- 65 66-- 67 68-- 69 70-- 71 72-- 73 74-- 75
#> [41] 76-- 77 78-- 79 80-- 81 82-- 83 84-- 85 86-- 87 88-- 89 90-- 91
#> [49] 92-- 93 94-- 95 96-- 97 98-- 99 100--101 102--103 104--105 106--107
#> [57] 108--109 110--111 112--113 80--114 115--116 117--118 119--120 121--122
#> + ... omitted several edges
You can see that the edges have a weight attribute that is equal to the length of each LINESTRING geometry:
all.equal(
target = igraph::edge_attr(as.igraph(my_sfn), "weight"),
current = as.numeric(st_length(my_roads))
)
#> [1] TRUE
Created on 2021-03-26 by the reprex package (v1.0.0)
If you want to read more details regarding sfnetworks, then you can check the website and the introductory vignettes. That being said, I don't understand what you mean by
connection between openstreetmap gps points and their IDs should not be lost
Can you add more details with a comment or an edit to the original question? Why do you need the OSM id? And what do you mean by OSM id? I think I need more details to expand this answer.
EDIT
I just re-read @mrhellmann's answer, and I noticed that I forgot to convert POLYGON data to lines. Anyway, I would suggest applying osmdata::osm_poly2line() immediately after running the code that downloads the OSM data via the Overpass API.
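In code, that suggestion amounts to the following (a sketch rather than tested output, since it needs a live Overpass query; the highway value list is abbreviated here):

```r
library(osmdata)
library(sfnetworks)

my_osm_data <- opq(bbox = c(11.68771, 47.75233, 12.35058, 48.19743)) %>%
  add_osm_feature(key = "highway",
                  value = c("trunk", "primary", "secondary", "tertiary",
                            "residential", "unclassified")) %>%
  osmdata_sf(quiet = FALSE) %>%
  osm_poly2line()  # convert roundabouts etc. stored as POLYGONs into lines

my_roads <- sf::st_geometry(my_osm_data$osm_lines)
my_sfn <- as_sfnetwork(my_roads, directed = FALSE, length_as_weight = TRUE)
```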

It seems like getting the xml data into another format is taking a long time. Instead of using xml, asking overpass to return an sf object and using that might be quicker. The sf object can then be manipulated & used by the sfnetworks package to get a network, while retaining the spatial aspects of the network. Weights can be added by functions from sfnetworks (or tidygraph) packages.
I think the code below focuses on taking care of the speed and the edge weight problems. Other problems, like finding the nearest nodes or edges, can be solved using the functions of the sf package, but are not addressed. Otherwise this becomes more than a one-off SO question.
The speed might be increased, at the cost of spatial accuracy, by using st_simplify on the edges. One problem with this approach is that sfnetworks places a node wherever one linestring meets another, and the returned data often has a single roadway split into multiple linestrings. As an example, see the two longer roads in yellow on the edges plot below. This is probably a solvable problem, but it may not be worthwhile in this case.
library(osmdata)
#library(osmar)
library(tidyverse)
library(sf)
library(ggplot2)
library(sfnetworks)
library(tidygraph)
# get data as an sf object rather than xml
## This is the slowest part of the code.
dat_sf <- opq(bbox = c(11.68771, 47.75233, 12.35058, 48.19743)) %>%
  add_osm_feature(key = 'highway',
                  value = c("trunk", "trunk_link", "primary", "primary_link",
                            "secondary", "secondary_link", "tertiary",
                            "tertiary_link", "residential", "unclassified")) %>%
  osmdata_sf()
# Only keep lines & polygons. Points take up too much memory &
## all seem to be on lines anyway. Change polygons to LINESTRING,
## as most seem to be roundabouts or a few odd cases.
lines_sf <- dat_sf$osm_lines %>% select(osm_id) %>% st_sf()
polys_sf <- dat_sf$osm_polygons %>% select(osm_id) %>% st_sf() %>%
st_cast('LINESTRING')
# Combine the two above sf objects into one
dat_sf_bound <- rbind(lines_sf, polys_sf)
# get an sfnetwork
dat_sf_net <- as_sfnetwork(dat_sf_bound)
# add edge weight as distance
dat_sf_net <- dat_sf_net %>%
activate(edges) %>%
mutate(weight = edge_length())
dat_sf_net should look like:
> dat_sf_net
# An sfnetwork with 33255 nodes and 28608 edges
#
# CRS: EPSG:4326
#
# A directed multigraph with 6391 components with spatially explicit edges
#
# Edge Data: 28,608 x 4 (active)
# Geometry type: LINESTRING
# Dimension: XY
# Bounding box: xmin: 11.6757 ymin: 47.74745 xmax: 12.39161 ymax: 48.22025
from to weight geometry
<int> <int> [m] <LINESTRING [°]>
1 1 2 306.3998 (11.68861 47.90971, 11.6878 47.90965, 11.68653 47.90954, 11.68597 47.909…
2 3 4 245.9225 (11.75216 48.17638, 11.75224 48.17626, 11.75272 48.17556, 11.7528 48.175…
3 5 6 382.2423 (11.7528 48.17351, 11.75264 48.17344, 11.75227 48.17329, 11.752 48.1732,…
4 7 8 131.1373 (11.70029 47.87861, 11.70046 47.87869, 11.70069 47.87879, 11.70138 47.87…
5 9 10 252.9170 (11.75733 48.17339, 11.75732 48.17343, 11.75726 48.17357, 11.75718 48.17…
6 11 12 131.6942 (11.75582 48.17036, 11.75551 48.1707, 11.75521 48.17106, 11.75507 48.171…
# … with 28,602 more rows
#
# Node Data: 33,255 x 1
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: 11.6757 ymin: 47.74745 xmax: 12.39161 ymax: 48.22025
geometry
<POINT [°]>
1 (11.68861 47.90971)
2 (11.68454 47.90937)
3 (11.75216 48.17638)
# … with 33,252 more rows
Plots of just edges, and of edges with nodes:
A few long roads skew the coloring, but they illustrate a single road split in two.
Edit:
Addressing the comment about selecting the nearest edge (road) by latitude/longitude coordinates. Nodes (intersections / red dots above) do not have OSM id numbers that I am aware of; the nodes are created by sfnetworks.
Starting with an sf object of a lat/lon point as our made-up GPS coordinate:
# random point
gps <- sfheaders::sf_point(data.frame(x = 11.81854, y = 48.04514)) %>% st_set_crs(4326)
# nearest edge(road) to the point. dat_sf_net must have edges activated.
near_edge <- st_nearest_feature(gps, dat_sf_net %>% st_as_sf())
>near_edge
[1] 4359
> st_as_sf(dat_sf_net)[near_edge,]
Simple feature collection with 1 feature and 4 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: 11.81119 ymin: 48.02841 xmax: 11.82061 ymax: 48.06845
Geodetic CRS: WGS 84
# A tibble: 1 x 5
from to osm_id geometry weight
<int> <int> <chr> <LINESTRING [°]> [m]
1 7590 7591 24232418 (11.81289 48.02841, 11.81213 48.03014, 11.81202 48.03062, 11.81… 4576.273
p3 <- ggplot() +
geom_sf(data = st_as_sf(dat_sf_net), color = 'black') +
geom_sf(data = gps, color = 'red') +
geom_sf(data = st_as_sf(dat_sf_net)[near_edge,], color = 'orange') +
coord_sf(xlim = c(11.7, 11.9), ylim = c(48, 48.1))
It looks like @agila commented above. Since he's an author of sfnetworks, maybe he will have some insights.


having trouble testing point/geometry intersections in sf

I am trying to figure out how to use st_intersects() to test whether or not point data that I have falls inside the geometries of some map data I have.
Data I'm working with: https://osfm.fire.ca.gov/media/5818/fhszs19sn.zip
Other data I'm working with too: https://osfm.fire.ca.gov/media/7564/c19fhszl06_5.zip
For now, I'm just trying to see if this data falls in the polygons of the above shapefile:
la_test_points <- data.frame(y = runif(1000, 33.6, 34.8), x = runif(1000, -119, -117.6))
When I put my map data and point data together, this is what it looks like:
So far, so good. Now I attempt to test point/geometry intersections. As the figure suggests, I should be able to get quite a few.
# changing coordinate system of map created by shape file
la_fire_sra <- st_transform(st_as_sf(la_fire_sra), crs = 3857)
# merging test points with map data
la_test_points_merged <- st_as_sf(la_test_points, coords = c('y', 'x'), crs = st_crs(la_fire_sra))
# seeing if points fall within any of the geometries in the shapefile
la_test_points_merged <- la_test_points_merged %>%
mutate(intersection = st_intersects(geometry, la_fire_sra))
That last bit is where it all goes wrong. RStudio doesn't throw an error, but when I print la_test_points_merged to see my results, this is what I see:
> la_test_points_merged
Simple feature collection with 1000 features and 1 field
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 33.60155 ymin: -118.9959 xmax: 34.79907 ymax: -117.6015
Projected CRS: WGS 84 / Pseudo-Mercator
First 10 features:
Error in xj[i, , drop = FALSE] : incorrect number of dimensions
The last line above is in red.
When I try using st_intersection() instead of st_intersects(), I get a different error:
> la_test_points_merged <- la_test_points_merged %>%
+ mutate(intersection = st_intersection(geometry, la_fire_sra))
Error in `stopifnot()`:
! Problem while computing `intersection = st_intersection(geometry, la_fire_sra)`.
x `intersection` must be size 1000 or 1, not 0.
Run `rlang::last_error()` to see where the error occurred.
I would like to end up with a result that tells me whether or not each of the points in la_test_points is contained by any of the geometry values in la_fire_sra.
How can I fix my code to make it work? I have looked at lots of other similar questions, but I can't seem to find any answers that apply to my current situation.
Thanks in advance for any help.
You can join the points to the shapefile, and the result will show you the fire hazard for each point that falls within a polygon. The default for an st_join is st_intersects, but you can change it if you'd like.
Below I've used one of the shapefiles you linked. If you need to use both you can combine them for a single dataframe with all the polygons. Looks like they have different columns though, so some cleaning might be needed.
library(tidyverse)
library(sf)
set.seed(3) #to make la_test_points reproducible
a <- read_sf('fhszs06_3_19.shp')
# Create synthetic data, make it an sf object, and set the crs
la_test_points <- data.frame(y = runif(1000, 33.6, 34.8), x = runif(1000, -119, -117.6)) %>%
st_as_sf(coords = c('x','y')) %>%
st_set_crs(4326) %>%
st_transform(st_crs(a))
# join the points with the fire hazard area
joined <- st_join(la_test_points, a)
# the sf dataframe, lots of NA's so they're removed for a look:
joined %>% filter(!is.na(HAZ_CODE)) %>% head()
#> Simple feature collection with 6 features and 5 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 125951 ymin: -433789.6 xmax: 177186.8 ymax: -369094
#> Projected CRS: NAD_1983_Albers
#> SRA HAZ_CODE HAZ_CLASS Shape_Leng Shape_Area geometry
#> 1 SRA 3 Very High 613618.0 686671532 POINT (163249.3 -395328.4)
#> 2 SRA 3 Very High 250826.8 233414399 POINT (127980.6 -433789.6)
#> 3 SRA 3 Very High 613618.0 686671532 POINT (167675.9 -386506.6)
#> 4 SRA 3 Very High 391522.6 297194108 POINT (143421.2 -369094)
#> 5 SRA 2 High 208122.8 211364977 POINT (177186.8 -388738.9)
#> 6 SRA 3 Very High 613618.0 686671532 POINT (125951 -399105.6)
# Plotting points, colored according to fire hazard code
ggplot() +
geom_sf(data = a) +
geom_sf(data = joined, aes(color = HAZ_CODE)) +
scale_color_gradient(low = 'yellow', high = 'red')
Created on 2022-11-08 with reprex v2.0.2
Edit to address joining the example shapefiles:
# Keeping the columns that the example shapefiles have in common
# and joining them together; b is the second shapefile, read with
# read_sf() the same way as a.
ax <- a %>% select(HAZ_CODE, HAZ_CLASS, Shape_Leng, Shape_Area)
bx <- b %>% select(HAZ_CODE, HAZ_CLASS, Shape_Leng, Shape_Area)
fires <- rbind(ax, bx)
head(fires)
Simple feature collection with 6 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 151597.9 ymin: -470591.9 xmax: 198216 ymax: -443900.4
Projected CRS: NAD83 / California Albers
# A tibble: 6 × 5
HAZ_CODE HAZ_CLASS Shape_Leng Shape_Area geometry
<int> <chr> <dbl> <dbl> <MULTIPOLYGON [m]>
1 3 Very High 5415. 1355567. (((152996.8 -469302.2, 152996.9 -469302.2, 152965.9 -469339.9, 152957.5 -…
2 3 Very High 2802. 423658. (((153701.7 -468506, 153703.9 -468590.6, 153708 -468758.1, 153707.6 -4687…
3 3 Very High 802. 32272. (((191491 -449977.1, 191494.3 -449973.2, 191517.3 -449946.5, 191521.5 -44…
4 3 Very High 1097. 40800. (((182453.8 -445649.1, 182216.3 -445706.6, 182215.4 -445655.7, 182170.4 -…
5 3 Very High 59226. 9379764. (((198201 -446611.2, 198199.9 -446580, 198199.1 -446551.3, 198200 -446580…
6 3 Very High 1255. 70800. (((186617.7 -444161.6, 186619 -444164.5, 186630.5 -444192.8, 186561.8 -44…

Plotting image files over states in a US Map in R

Essentially, what I want to do is utilize my work's sales data to see which of our brands is selling the most in each US state - and have the brand image file in each respective state.
For instance, if the best-selling brand of food we sell in Utah is Nestle - then I want the Nestle logo in that state.
My data set looks like this:
State Brand     Sales %TTL
AK    Nestle    $260  8%
AL    Mars      $480  10%
AZ    Coca Cola $319  12%
...
WY    Nestle    $200  25%
I have the image files from Google, but I have no idea how to make this work. I know there's the cartography package and I've been following a guide, but it isn't really 1:1. I can't even get the sample code to execute because it says it can't find the online address.
I don't want this done for me - but how do I start? I essentially want it to look like the map in the first image, but have it correspond with images of the brands we work with.
Tableau didn't really have an optimal solution and this was done in R originally, so I'm trying to replicate it, but it's been proving difficult.
Well, here is an adaptation:
First you need to get an sf object (a map) of the US states. I use USAboundaries here, but you can use whatever you prefer.
Note that I needed to fake your data; use your own.
Use rowwise() and switch() to add a column to your data with the URL of each png.
In the loop, for each brand: select the corresponding state, create the image overlay, and add the layer to the plot.
Here is a reproducible example:
library(USAboundaries)
#> The USAboundariesData package needs to be installed.
#> Please try installing the package using the following command:
#> install.packages("USAboundariesData", repos = "https://ropensci.r-universe.dev", type = "source")
library(sf)
#> Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1; sf_use_s2() is TRUE
library(tidyverse)
library(rasterpic)
library(tidyterra)
# Part 1: The map
states <- USAboundaries::states_contemporary_lores %>%
select(State = state_abbr) %>%
# Filter out AK, HI and PR
filter(!(State %in% c("AK", "HI", "PR"))) %>%
st_transform("ESRI:102003")
states
#> Simple feature collection with 49 features and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -2356114 ymin: -1338125 xmax: 2258154 ymax: 1558935
#> Projected CRS: USA_Contiguous_Albers_Equal_Area_Conic
#> First 10 features:
#> State geometry
#> 1 CA MULTIPOLYGON (((-2066285 -2...
#> 2 WI MULTIPOLYGON (((708320.1 91...
#> 3 ID MULTIPOLYGON (((-1673882 95...
#> 4 MN MULTIPOLYGON (((-91052.17 1...
#> 5 IA MULTIPOLYGON (((-50588.83 5...
#> 6 MO MULTIPOLYGON (((19670.04 34...
#> 7 MD MULTIPOLYGON (((1722285 240...
#> 8 OR MULTIPOLYGON (((-2285910 94...
#> 9 MI MULTIPOLYGON (((882371.5 99...
#> 10 MT MULTIPOLYGON (((-1474367 14...
# Base map
plot <- ggplot(states) +
geom_sf(fill = "grey90") +
theme_minimal() +
theme(panel.background = element_rect(fill = "lightblue"))
plot
# Part 2: your data (I have to fake it)
# Use here your own data
# Assign 3 random brands
brands <- data.frame(State = states$State)
set.seed(1234)
brands$Brand <- sample(c("Nestle", "Mars", "Coca Cola"), nrow(brands), replace = TRUE)
# Part 3: find your pngs
logos <- brands %>%
rowwise() %>%
mutate(png = switch(Brand,
"Nestle" = "https://1000marcas.net/wp-content/uploads/2020/01/Logo-Nestle.png",
"Mars" = "https://1000marcas.net/wp-content/uploads/2020/02/Logo-Mars.png",
"Coca Cola" = "https://w7.pngwing.com/pngs/873/613/png-transparent-world-of-coca-cola-fizzy-drinks-diet-coke-coca-cola-text-logo-cola.png",
"null"
))
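As a side note, the same brand-to-URL mapping can be written without rowwise() as a named-vector lookup; a base-R sketch (with a toy data frame standing in for the brands object above):

```r
png_urls <- c(
  "Nestle"    = "https://1000marcas.net/wp-content/uploads/2020/01/Logo-Nestle.png",
  "Mars"      = "https://1000marcas.net/wp-content/uploads/2020/02/Logo-Mars.png",
  "Coca Cola" = "https://w7.pngwing.com/pngs/873/613/png-transparent-world-of-coca-cola-fizzy-drinks-diet-coke-coca-cola-text-logo-cola.png"
)

# toy stand-in for the brands data frame
toy <- data.frame(State = c("UT", "WY"), Brand = c("Mars", "Nestle"))
toy$png <- unname(png_urls[toy$Brand])
```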
# Now loop
for (i in seq_len(nrow(logos))) {
logo <- logos[i, ]
shape <- states[states$State == logo$State, ]
img <- rasterpic_img(shape, logo$png, mask = TRUE)
plot <- plot + geom_spatraster_rgb(data = img)
}
plot
Created on 2022-06-03 by the reprex package (v2.0.1)

How to solve this problem with the st_distance function from sf giving an error when inspecting the data?

I have this data:
df <- data.frame(pc_home = c(1042, 2052, NA, 4021, 9423, NA, 1502, 5942),
                 pc_work = c(NA, 2105, NA, 4352, 8984, NA, 1495, 6050),
                 centroid_home = c(c(122239.347627534, 487236.185950724), c(121552.622967901, 487511.344167049), c(NA, NA), c(120168.155075649, 489952.753092173), c(119154.137476474, 489381.429089547), c(NA, NA), c(120723.216386427, 487950.166456445), c(120570.498333358, 487104.749088018)),
                 centroid_work = c(c(NA, NA), c(121337.696586159, 486235.561338213), c(NA, NA), c(123060.850070339, 486752.640463608), c(124354.37048732, 487473.329840357), c(NA, NA), c(123171.113425247, 488458.596501631), c(123952.971290978, 489249.568149519)))
The centroids were calculated using st_centroid() on a shapefile. The c(NA,NA) were the result of missing postal codes used to calculate centroids.
And I use this code:
library(sf)
library(dplyr)
df <- df %>%
  mutate(dist_hw = st_distance(centroid_home, centroid_work))
No errors, but inspecting the data, I get weird results. In the dataframe view I see no results, and when I try to sort (to see if there are any results), I get this error:
Error in `stop_subscript()`:
! Can't subset elements that don't exist.
x Locations 4324, 7679, 11034, 13428, 16783, etc. don't exist.
i There are only 3355 elements.
I wonder if the error is caused by the NAs or something else?
If it is caused by the NAs, how do I solve that?
All I want is to calculate distance between the points.
Hard to do with the sample data provided. It needs to be in an sf-style data frame and needs a CRS as well. Assuming you have those but had difficulty posting them, the solution below should work. Your sf object needs to have two geometry columns, and it looks like yours should.
st_distance() should work, with the by_element = TRUE argument. The examples below either use st_distance() directly, or inside dplyr::mutate() to add a distance column to the sf data frame.
library(sf)
library(tidyverse)
#### Making reproducible data
# get the nc data, make the geometry column a point with st_centroid
nc = st_read(system.file("shape/nc.shp", package="sf")) %>%
select(NAME) %>% st_centroid()
# jitter the centroid point and add (cbind) as a second geometry column
geo2 <- st_geometry(nc) %>% st_jitter()
nc <- cbind(nc, geo2)
####
# Find the distance between the points, row-by-row
st_distance(nc$geometry,nc$geometry.1, by_element = T) %>% head()
#> Units: [m]
#> [1] 965.8162 2030.5782 1833.3081 1909.5538 1408.7908 820.0569
# or use mutate to add a column to the sf df.
nc %>% mutate(dist = st_distance(geometry, geometry.1, by_element = T))
#> Simple feature collection with 100 features and 2 fields
#> Active geometry column: geometry
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -84.05986 ymin: 34.07671 xmax: -75.8095 ymax: 36.49111
#> Geodetic CRS: NAD27
#> First 10 features:
#> NAME geometry geometry.1
#> 1 Ashe POINT (-81.49823 36.4314) POINT (-81.49685 36.44001)
#> 2 Alleghany POINT (-81.12513 36.49111) POINT (-81.13681 36.47545)
#> 3 Surry POINT (-80.68573 36.41252) POINT (-80.69163 36.39673)
#> 4 Currituck POINT (-76.02719 36.40714) POINT (-76.02305 36.39029)
#> 5 Northampton POINT (-77.41046 36.42236) POINT (-77.40909 36.40974)
#> 6 Hertford POINT (-76.99472 36.36142) POINT (-76.98777 36.36623)
#> 7 Camden POINT (-76.23402 36.40122) POINT (-76.23969 36.4181)
#> 8 Gates POINT (-76.70446 36.44428) POINT (-76.70953 36.45603)
#> 9 Warren POINT (-78.11042 36.39693) POINT (-78.11619 36.38541)
#> 10 Stokes POINT (-80.23429 36.40042) POINT (-80.24365 36.39904)
#> dist
#> 1 965.8162 [m]
#> 2 2030.5782 [m]
#> 3 1833.3081 [m]
#> 4 1909.5538 [m]
#> 5 1408.7908 [m]
#> 6 820.0569 [m]
#> 7 1944.8192 [m]
#> 8 1382.9058 [m]
#> 9 1381.7946 [m]
#> 10 851.1106 [m]
Created on 2022-04-13 by the reprex package (v2.0.1)

Finding Overlapping Buffers in R

I have coordinates which I added a 75m buffer around. I want to filter the buffers by year and then determine if any buffers from observations within the same year are overlapping. I want to be able to identify which buffers are overlapping to either omit them if they are repeated observations or merge them if they are touching.
e_af_df <- na.omit(e_af_simp_21)
e_af_obs <- st_as_sf(e_af_df, coords=c(7, 6), crs = st_crs(4326))
e_af_t <- st_transform(e_af_obs, 5070)
e_buffers = st_buffer(e_af_t, dist=75)
e_buffers$Year <- format(as.Date(e_buffers$Date, format="%m/%d/%Y"), "%Y")
b.2016 <- subset(e_buffers, e_buffers$Year == "2016")
So far this works to draw my buffer (sf polygons) and filter by year, using 2016 as an example here. I then try to find the overlapping buffers using st_intersection.
o.2016 = st_intersection(b.2016)
summary(o.2016)
This tells me that 1718 buffers are overlapping, which I find unlikely since there are only 2768 in total for that year. I think there might be double-counting of buffer IDs. Regardless, I am unable to see which buffers are intersecting with each other, just that intersections are occurring.
For some other years, doing the same thing results in an error.
Error in CPL_nary_intersection(x) : GEOS exception
I read that changing the precision may fix this, so I used st_set_precision(). Yet this only fixed the error for certain years as well.
Does anyone know of a simple way to find overlapping buffers or a way to fix this code?
Let's go step by step. As I don't have access to your data, let's create some dummy polygons:
library(sf)
library(dplyr)
p <- st_point(x=c(10, 10))
area <- st_buffer(p, 10)
polygons <- st_buffer(st_sample(area, 25), 1)
Now, let's calculate intersections between them:
inter <- st_intersection(st_as_sf(polygons))
inter
#> Simple feature collection with 36 features and 2 fields
#> Geometry type: GEOMETRY
#> Dimension: XY
#> Bounding box: xmin: -0.264212 ymin: 1.4698 xmax: 19.96589 ymax: 20.05445
#> CRS: NA
#> First 10 features:
#> n.overlaps origins x
#> 1 1 1 POLYGON ((18.95204 14.79623...
#> 2 1 2 MULTIPOLYGON (((5.254102 3....
#> 3 1 3 POLYGON ((2.574736 14.99139...
#> 4 1 4 POLYGON ((1.735788 12.80523...
#> 5 1 5 POLYGON ((18.25878 13.02976...
#> 6 1 6 POLYGON ((2.64105 10.30672,...
#> 7 1 7 POLYGON ((8.714673 17.29225...
#> 2.1 2 2, 8 POLYGON ((5.154202 2.707236...
#> 8 1 8 POLYGON ((5.592733 3.483773...
#> 9 1 9 POLYGON ((3.852673 16.92456...
Note the columns n.overlaps and origins. Let's keep only the actual overlaps:
a <- inter |>
filter(n.overlaps > 1)
plot(polygons)
plot(a, col = "blue", add = TRUE)
Created on 2022-02-10 by the reprex package (v2.0.1)
Please note, the functions:
st_intersects(polygons)
st_overlaps(polygons)
st_touches(polygons)
are binary predicates, they can be used to check if geometries intersect, etc. Example:
st_intersects(polygons, sparse = TRUE)
#> Sparse geometry binary predicate list of length 25, where the
#> predicate was `intersects'
#> first 10 elements:
#> 1: 1, 15, 20, 21
#> 2: 2
#> 3: 3, 9, 14
#> [...]
Based on it you can write a simple function which will remove (or union) the geometries.
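For example, a small helper along those lines, keeping only the first buffer of each overlapping group (a sketch building on the polygons object above, not run here; swap the subsetting for st_union() on each group if merging is preferred):

```r
# 'geoms' is an sfc of polygons such as 'polygons' above
drop_overlapping <- function(geoms) {
  hits <- st_intersects(geoms)  # sparse list; element i always contains i itself
  keep <- vapply(seq_along(hits),
                 function(i) !any(hits[[i]] < i),  # drop if it touches an earlier buffer
                 logical(1))
  geoms[keep]
}

cleaned <- drop_overlapping(polygons)
```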
Regards,
Grzegorz

Calculate the average of a cloud of points based on a grouping variable with sf in R

I have a bunch of points where I want to calculate the average summarized for each grouping variable:
x = st_sfc(st_polygon(list(rbind(c(0,0),c(90,0),c(90,90),c(0,90),c(0,0)))), crs = st_crs(4326))
plot(x, axes = TRUE, graticule = TRUE)
plot(p <- st_sample(x, 7), add = TRUE)
p=st_as_sf(p)
p$test=c("A","A","B","C","C","D","D")
When using dplyr like this, I get an NA:
p %>%
group_by(test) %>%
summarize(geometry = mean(geometry))
I just want the average as the geometry: not just one of the points, and not a multipoint.
Not sure I fully understand what you are looking for, but I am giving it a try!
Please find one possible solution with a reprex below, using the sf and dplyr libraries. I guess you were looking for the aggregate() function instead of group_by().
Reprex
Code
library(sf)
library(dplyr)
R1 <- p %>% aggregate(.,
by = list(.$test),
function(x) x = x[1]) %>%
st_centroid() %>%
select(-Group.1)
#> Warning in st_centroid.sf(.): st_centroid assumes attributes are constant over
#> geometries of x
Output 1 (sf object)
R1
#> Simple feature collection with 4 features and 1 field
#> Attribute-geometry relationship: 0 constant, 1 aggregate, 0 identity
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 2.7875 ymin: 12.91954 xmax: 59.60413 ymax: 51.81421
#> Geodetic CRS: WGS 84
#> test geometry
#> 1 A POINT (27.17167 12.91954)
#> 2 B POINT (2.7875 22.54184)
#> 3 C POINT (59.60413 46.90029)
#> 4 D POINT (56.34763 51.81421)
Complementary code and Output 2 (i.e. if you just need a dataframe)
R2 <- R1 %>%
st_coordinates() %>%
cbind(st_drop_geometry(R1),.)
R2
#> test X Y
#> 1 A 27.17167 12.91954
#> 2 B 2.78750 22.54184
#> 3 C 59.60413 46.90029
#> 4 D 56.34763 51.81421
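If plain averaged coordinates are all that is needed, the planar group means can also be computed in base R on the coordinate matrix (a sketch with made-up coordinates standing in for st_coordinates(p); note this is a planar mean, unlike the spherical computation above):

```r
# toy stand-in for the coordinates of p plus the grouping column
pts <- data.frame(X    = c(0, 2, 1, 3),
                  Y    = c(0, 2, 5, 7),
                  test = c("A", "A", "B", "B"))

# mean X and Y per group
centers <- aggregate(cbind(X, Y) ~ test, data = pts, FUN = mean)
centers
#   test X Y
# 1    A 1 1
# 2    B 2 6
```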
Visualization
plot(x)
plot(p, add = TRUE)
plot(R1, pch = 15, add = TRUE)
Points are your data and small squares are centroids for each group
(FYI, I set the seed to 427 for reproducibility purpose)
NB: The above uses spherical geometry. If you want to do planar computations, just add sf_use_s2(FALSE) at the beginning of the script. To show you the difference, here is the result using sf_use_s2(FALSE); in this case you can see that, for each group, the centroid is located precisely on the line connecting the two points. It is up to you to choose according to your needs.
Created on 2022-01-03 by the reprex package (v2.0.1)
