How to efficiently create Linestrings from points? - r

I have geom POINTs in two separate data frames. What I want to do is to connect points with a line (later on a map) so that's why I want to create Linestring for each pair of points from those data frames. I made it like this:
coordsCust <- table %>%
st_as_sf(coords = c("lonCust","latCust"), crs = 4326)
coordsApp <- table %>%
st_as_sf(coords = c("lonApp","latApp"), crs = 4326) %>%
st_geometry()
and Linestring:
lines <- st_sfc(mapply(function(a,b){
st_cast(st_union(a,b),"LINESTRING")},
coordsCust$geometry, coordsApp$geometry, SIMPLIFY=FALSE))
This code works, I can create Linestrings for each pair of points, row by row:
LINESTRING (14.035 51.65182, 14.33418 53.53346)
LINESTRING (20.42767 49.98073, 16.62978 52.31037)
LINESTRING (20.18762 50.03337, 16.62978 52.31037)
LINESTRING (19.04625 49.79234, 16.62978 52.31037)
LINESTRING (21.35808 50.92382, 16.62978 52.31037)
The issue is that for 30 000 rows this solution works really slow - about 21 seconds. Is there any other way to create linestrings from points? Something that works much faster? I searched for some solutions on the web but in vain. I've read something about converting sf to matrix and using pmap but have no idea how to implement it here.
UPDATE: if I want to use sfheaders::sf_linestring function I need to join geometries from both datasets. I do it like this:
df <- cbind(coordsCust,coordsApp)
and the final data frame (I showed most important part of it) is shown below:
Unfortunately sf_linestring doesn't work properly on this dataframe. I need to create linestring between POINTs for each row separately as shown on the screen.

Without an exmaple data set it's hard to completly answer your question. But if you can get your data.frame into 'long' form, then sfheaders can do this in an instant
n <- 30000
df <- data.frame(
x = rnorm(n)
, y = rnorm(n)
)
df$id <- rep(1:(n/2), each = 2)
sfheaders::sf_linestring(
obj = df
, x = "x"
, y = "y"
, linestring_id = "id"
)
# Simple feature collection with 15000 features and 1 field
# geometry type: LINESTRING
# dimension: XY
# bbox: xmin: -4.297631 ymin: -4.118291 xmax: 3.782847 ymax: 4.053399
# CRS: NA
# First 10 features:
# id geometry
# 1 1 LINESTRING (0.2780517 0.243...
# 2 2 LINESTRING (0.4261505 2.503...
# 3 3 LINESTRING (0.8662821 -0.11...
# 4 4 LINESTRING (-0.5335952 -0.1...
# 5 5 LINESTRING (1.154309 -1.352...
# 6 6 LINESTRING (0.05512324 -0.4...
# 7 7 LINESTRING (1.945868 -0.744...
# 8 8 LINESTRING (0.0427066 -0.08...
# 9 9 LINESTRING (0.06738045 0.41...
# 10 10 LINESTRING (0.4128964 -0.04...

Related

having trouble testing point/geometry intersections in sf

i am trying to figure out how to use sf_intersects() to test whether or not point data that i have falls inside the geometries of some map data i have.
data i'm working with: https://osfm.fire.ca.gov/media/5818/fhszs19sn.zip
other data i'm working with too: https://osfm.fire.ca.gov/media/7564/c19fhszl06_5.zip
for now, i'm just trying to see if this data falls in the polygons of the above shapefile:
la_test_points <- data.frame(y = runif(1000, 33.6, 34.8), x = runif(1000, -119, -117.6))
when i put my map data and point data together, this is what it looks like:
so far, so good. now i attempt to test point/geometry intersections. as the figure suggests, i should be able to get quite a few.
# changing coordinate system of map created by shape file
la_fire_sra <- st_transform(st_as_sf(la_fire_sra), crs = 3857)
# merging test points with map data
la_test_points_merged <- st_as_sf(la_test_points, coords = c('y', 'x'), crs = st_crs(la_fire_sra))
# seeing if points fall within any of the geometries in the shapefile
la_test_points_merged <- la_test_points_merged %>%
mutate(intersection = st_intersects(geometry, la_fire_sra))
that last bit is where it all goes wrong. rstudio doesn't throw an error, but when i print la_test_points_merged to see my results, this is what i see:
> la_test_points_merged
Simple feature collection with 1000 features and 1 field
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 33.60155 ymin: -118.9959 xmax: 34.79907 ymax: -117.6015
Projected CRS: WGS 84 / Pseudo-Mercator
First 10 features:
Error in xj[i, , drop = FALSE] : incorrect number of dimensions
the last line above is in red.
when i try using st_intersection() instead of st_intersects(), i get a different error:
> la_test_points_merged <- la_test_points_merged %>%
+ mutate(intersection = st_intersection(geometry, la_fire_sra))
Error in `stopifnot()`:
! Problem while computing `intersection = st_intersection(geometry, la_fire_sra)`.
x `intersection` must be size 1000 or 1, not 0.
Run `rlang::last_error()` to see where the error occurred.
i would like to end up with a result like this that tells me whether or not each of the points in la_test_points is contained by any of the geometry values in la_fire_sa.
how can i fix this to make my code work? i have looked at lots of other similar questions, but i can't seem to find any answers that apply to my current situation.
thanks in advance for any help.
You can join the points to the shapefile, and the result will show you the fire hazard for each point that falls within a polygon. The default for an st_join is st_intersects, but you can change it if you'd like.
Below I've used one of the shapefiles you linked. If you need to use both you can combine them for a single dataframe with all the polygons. Looks like they have different columns though, so some cleaning might be needed.
library(tidyverse)
library(sf)
set.seed(3) #to make la_test_points reproducible
a <- read_sf('fhszs06_3_19.shp')
# Create synthetic data, make it an sf object, and set the crs
la_test_points <- data.frame(y = runif(1000, 33.6, 34.8), x = runif(1000, -119, -117.6)) %>%
st_as_sf(coords = c('x','y')) %>%
st_set_crs(4326) %>%
st_transform(st_crs(a))
# join the points with the fire hazard area
joined <- st_join(la_test_points, a)
# the sf dataframe, lots of NA's so they're removed for a look:
joined %>% filter(!is.na(HAZ_CODE)) %>% head()
#> Simple feature collection with 6 features and 5 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 125951 ymin: -433789.6 xmax: 177186.8 ymax: -369094
#> Projected CRS: NAD_1983_Albers
#> SRA HAZ_CODE HAZ_CLASS Shape_Leng Shape_Area geometry
#> 1 SRA 3 Very High 613618.0 686671532 POINT (163249.3 -395328.4)
#> 2 SRA 3 Very High 250826.8 233414399 POINT (127980.6 -433789.6)
#> 3 SRA 3 Very High 613618.0 686671532 POINT (167675.9 -386506.6)
#> 4 SRA 3 Very High 391522.6 297194108 POINT (143421.2 -369094)
#> 5 SRA 2 High 208122.8 211364977 POINT (177186.8 -388738.9)
#> 6 SRA 3 Very High 613618.0 686671532 POINT (125951 -399105.6)
# Plotting points, colored according to fire hazard code
ggplot() +
geom_sf(data = a) +
geom_sf(data = joined, aes(color = HAZ_CODE)) +
scale_color_gradient(low = 'yellow', high = 'red')
Created on 2022-11-08 with reprex v2.0.2
Edit to address joining the example shapefiles:
# Keeping the columns that the example shapefiles have in common,
# and joining them together.
ax <- a %>% select(HAZ_CODE, HAZ_CLASS, Shape_Leng, Shape_Area)
bx <- b %>% select(HAZ_CODE, HAZ_CLASS, Shape_Leng, Shape_Area)
fires <- rbind(ax, bx)
head(fires)
Simple feature collection with 6 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 151597.9 ymin: -470591.9 xmax: 198216 ymax: -443900.4
Projected CRS: NAD83 / California Albers
# A tibble: 6 × 5
HAZ_CODE HAZ_CLASS Shape_Leng Shape_Area geometry
<int> <chr> <dbl> <dbl> <MULTIPOLYGON [m]>
1 3 Very High 5415. 1355567. (((152996.8 -469302.2, 152996.9 -469302.2, 152965.9 -469339.9, 152957.5 -…
2 3 Very High 2802. 423658. (((153701.7 -468506, 153703.9 -468590.6, 153708 -468758.1, 153707.6 -4687…
3 3 Very High 802. 32272. (((191491 -449977.1, 191494.3 -449973.2, 191517.3 -449946.5, 191521.5 -44…
4 3 Very High 1097. 40800. (((182453.8 -445649.1, 182216.3 -445706.6, 182215.4 -445655.7, 182170.4 -…
5 3 Very High 59226. 9379764. (((198201 -446611.2, 198199.9 -446580, 198199.1 -446551.3, 198200 -446580…
6 3 Very High 1255. 70800. (((186617.7 -444161.6, 186619 -444164.5, 186630.5 -444192.8, 186561.8 -44…

How to measure length of a section sf linestring inserted between two features from sf multilinestring object in r?

I have a dataset in linestring format and an ocean bathymetry multilinestring (several lines with depth information). My aim is to measure the length section of these linestrings at each depth range (between two multilinestrings) and to have the sum lengths per interval.
EXPECTED OUTPUT EXAMPLE
interval length
1 -2250 and -2500 5200.56 [m]
2 -2500 and -2750 xxxxxxx [m]
3 -2750 and -3000 xxxxxxx [m]
4 -3000 and -3250 xxxxxxx [m]
But, when I use st_intersection() I can get what depth this stretch of line is, but besides the information that comes about the depth being for example -1500 m instead of an interval, (example: -1500 m to -1750 m) I can't measure the length as st_intersection() only counts intersection points. Are there way to make this in r?
DATASET EXAMPLE
library(sf)
library(dplyr)
#creating dataset
id <- c("A","A", "B","B","C","C")
lat <- c(-25.31157, -25.42952, -25.4253, -25.19177, -25.18697, -25.12748)
long <- c(-41.39523, -39.99665, -41.00311, -41.29756, -41.30314, -39.37707)
df <- dplyr::tibble(id = as.factor(id), lat, long)
#convert sf
df.sf = sf::st_as_sf(df,
coords = c("long","lat"),
crs = 4326)
#creating linestrings
lines <- df.sf %>%
dplyr::group_by(id) %>%
dplyr::summarise(do_union = FALSE) %>%
sf::st_cast("LINESTRING")
######attempt at something similar:
#intersect
intersection <- sf::st_intersection (bathy, lines) %>% sf::st_as_sf() %>%
dplyr::mutate(length=sf::st_length(.)) %>%
sf::st_drop_geometry()
#sum length per depth
output <- intersection %>%
group_by(depth) %>%
summarise(length = sum(length)) %>%
arrange(desc(length))
> output
depth length[m]
1 -2250 0 [m]
2 -2500 0 [m]
3 -2750 0 [m]
4 -3000 0 [m]
I unfortunately couldn't recreate a subset of the multilinestring object here, so I put in my github to download a subset in shapefile format if necessary, just click on "CODE" and then on "download ZIP" enter link description here. The object is like this:
> bathy
Simple feature collection with 15690 features and 1 field
Geometry type: MULTILINESTRING
Dimension: XY
Bounding box: xmin: -53.37185 ymin: -35.77762 xmax: -25 ymax: 7.081626
Geodetic CRS: WGS 84
First 10 features:
PROFUNDIDA geometry
1 -50 MULTILINESTRING ((-52.4267 ...
2 -50 MULTILINESTRING ((-52.77632...
3 -75 MULTILINESTRING ((-51.04274...
4 -75 MULTILINESTRING ((-52.38656...
5 -100 MULTILINESTRING ((-51.07005...
6 -100 MULTILINESTRING ((-52.18633...
7 -200 MULTILINESTRING ((-51.97665...
8 -300 MULTILINESTRING ((-51.95862...
9 -400 MULTILINESTRING ((-51.94465...
10 -500 MULTILINESTRING ((-51.93161...
Here's a rough answer that might help. It would be easier to provide a reprex if your data was smaller. It helps to just pick the absolute minimum amount of data to reproduce your problem.
In this case, all you would need is one linestring for a path, and just a couple bathymetry linestrings. I suggest subsetting your data a lot, and then using the datapasta package or reprex to make an example that can be posted easily.
Assuming intersection is the intersection of just 1 LINESTRING with the bathy data, this might work:
# st_intersects returns both POINT and MULTIPOINT geometries.
# It returns MULTIPOINT when the line x intersects the feature y
# multiple times
# We want just POINT geometries
# this is an unfortunately complex:
intersection %>%
group_by(geomtype = st_geometry_type(geometry)) %>%
group_modify(~ st_cast(.x, "POINT")) %>%
ungroup() %>%
select(-geomtype) %>%
st_as_sf() ->
intersection_points
# Once we have one POINT for each intesection,
# along with the associated information
# (Depth in this example) we can calculate the distance
# between adjacent points using dplyr::lead and sf::st_distance
intersection_points %>%
mutate(geometry2 = lead(geometry)) %>%
mutate(length = st_distance(geometry, geometry2, by_element = TRUE)) ->
intersection_points
# Add Depth ranges:
intersection_points %>%
mutate(interval = paste(as.character(PROFUNDIDA),
"and",
as.character(lead(PROFUNDIDA))))

How to write a function to matches values in two dataframes (make a faster version)

I have a dataframe with coordinates of regions of interest, and another dataframe with temperature readings (bio1) taken in research stations, and their coordinates.
I'd like to create a new column to match the region of interest with the temperature of the nearest research station.
I have managed to do this with the following code (here's a simplified fake dataframe pair)
df1 <- data.frame(latitude = c(10.5,6,2), longitude = c(18,9,4))
df2 <- data.frame(vy = c(10,5,3), vx = c(20,10,3), bio1 = c('a','b','c'))
for(i in 1:nrow(df1)){
df1$temperature[i] <- df2$bio1[which(abs(df2$vx - df1$longitude[i]) +
abs(df2$vy - df1$latitude[i]) ==
min(abs(df2$vx - df1$longitude[i]) +
abs(df2$vy - df1$latitude[i])))]
}
So, this code checks all the combinations and choses the one with the smallest distance between latitude and longitude at each row.
I checked and it seems to work, but it's very slow to use on large dataframes.
Can you solve this issue with a faster method?
Something like this might work
library(tidyverse)
library(sf)
# put some id's in df1
df1$id <- LETTERS[1:3]
# make df1 and df2 simple objects
sf1 <- df1 %>%
st_as_sf(coords = c("longitude", "latitude"), crs = 4326)
sf2 <- df2 %>%
st_as_sf(coords = c("vy", "vx"), crs = 4326)
# find nearest sf2 in sf1
sf1 %>%
mutate(nearest_bio = sf2$bio1[st_nearest_feature(sf2)])
# Simple feature collection with 3 features and 2 fields
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: 4 ymin: 2 xmax: 18 ymax: 10.5
# Geodetic CRS: WGS 84
# id geometry nearest_bio
# 1 A POINT (18 10.5) b
# 2 B POINT (9 6) c
# 3 C POINT (4 2) b

Create numerous lines in Simple Features from list of coordinates in R

I am trying to detect whether pairs of objects (trees) are separated by roads or lie on the same side of them. I have downloaded my road network and think I more or less understand how to use st_intersects. So all I am missing are line segments between the pairs of trees I am considering in order to test intersections with the roads..
However, I cannot seem to figure out how to create lines between my objects. I have a large number of pairs (300K+), so must be able to do this programmatically, whereas all the examples I am finding seem to be "hand coded".
Suppose the following two matrices, containing the coordinates of the "origin" and "destination" of each pair.
orig = matrix(runif(20),ncol=2)
dest = matrix(runif(20),ncol=2)
In this example, I need to create 10 lines: one between orig[1,] and dest[1,], another (distinct) one between orig[2,] and dest[2,], etc. My understanding is that I should be using st_multilinestring, but I cannot figure out how to formulate the call. Typically, I either end up with "XYZM" points, or with a multi-segment line starting at orig[1,] and terminating at dest[10,] after going through all other coordinates. And when it is not one of these outcomes, it is a whole host of errors.
Is st_multilinestring what I should be using and if so, how does one do this? Thanks!!
Here's a way to construct the sfc / sf object using library(sfheaders)
library(sf)
library(sfheaders)
## If you add a pseudo-id column
orig <- cbind( orig, 1:nrow( orig ) )
dest <- cbind( dest, 1:nrow( dest ) )
## you can bind these matrices together
m <- rbind( orig, dest )
## set the order by the 'id' column
m <- m[ order( m[,3] ), ]
## then use `sfheaders` to create your linestrings
sfc <- sfheaders::sfc_linestring(
obj = m
, linestring_id = 3 ## 3rd column
)
sfc
# Geometry set for 10 features
# geometry type: LINESTRING
# dimension: XY
# bbox: xmin: 0.01952919 ymin: 0.04603703 xmax: 0.9172785 ymax: 0.9516615
# epsg (SRID): NA
# proj4string: NA
# First 5 geometries:
# LINESTRING (0.7636528 0.2465392, 0.05899529 0.7...
# LINESTRING (0.6435893 0.9158161, 0.01952919 0.1...
# LINESTRING (0.05632407 0.3106372, 0.03306822 0....
# LINESTRING (0.1978259 0.07432209, 0.2907429 0.0...
# LINESTRING (0.1658199 0.6436758, 0.1407145 0.75...
Loop over rows of your origin and destination matrices using lapply and create a vector of LINESTRING objects:
> lines = do.call(st_sfc,
lapply(
1:nrow(orig),
function(i){
st_linestring(
matrix(
c(orig[i,],dest[i,]), ncol=2,byrow=TRUE)
)
}
)
)
This gives you this:
> lines
Geometry set for 10 features
geometry type: LINESTRING
dimension: XY
bbox: xmin: 0.06157865 ymin: 0.007712881 xmax: 0.967166 ymax: 0.9864812
epsg (SRID): NA
proj4string: NA
First 5 geometries:
LINESTRING (0.6646269 0.1545195, 0.8333102 0.40...
LINESTRING (0.5588124 0.5166538, 0.3213998 0.08...
LINESTRING (0.06157865 0.6138778, 0.06212246 0....
LINESTRING (0.202455 0.4883115, 0.5569435 0.986...
LINESTRING (0.3120373 0.8189916, 0.8499419 0.73...
Let's check we got all that the right way round. Where's the fourth line come from and going to?
> orig[4,]
[1] 0.2024550 0.4883115
> dest[4,]
[1] 0.5569435 0.9864812
which looks like the coordinates in the fourth LINESTRING output.
You can then st_intersects this with another set of features and see which of these cross them.
(You might also need to add the coordinate system to them...)

identify zip codes that fall within latitude and longitudinal coordinates

I have several data frames in R. The first data frame contains the computed convex hull of a set of lat and long coordinates by market (courtesy of chull in R). It looks like this:
MyGeo<- "Part of Chicago & Wisconsin"
Longitude <- c(-90.31914, -90.61911, -89.37842, -88.0988, -87.44875)
Latitude <- c(38.45781, 38.80097, 43.07961, 43.0624,41.49182)
dat <- data.frame(Longitude, Latitude, MyGeo)
The second has zip codes by their latitude and longitudinal coordinates (courtesy of the US census website). It looks like this:
CensuseZip <- c("SomeZipCode1","SomeZipCode2","SomeZipCode3","SomeZipCode4","SomeZipCode5","SomeZipCode6","SomeZipCode7")
Longitude2 <- c(-131.470425,-133.457924,-131.693453,-87.64957,-87.99734,-87.895,-88.0228)
Latitude2 <- c(55.138352,56.239062,56.370538,41.87485,42.0086,42.04957,41.81055)
cen <- data.frame(Longitude2, Latitude2, CensuseZip)
Now I believe the first data table provides me with a polygon, or a border, that I should be able to use to identify zip codes that fall within that border. Ideally, I would want to create a third data table that looks something like this:
Longitude2 Latitude2 CensusZip MyGeo
-131.470425 55.138352 SomeZipCode1
-133.457924 56.239062 SomeZipCode2
-131.693453 56.370538 SomeZipCode3
-87.64957 41.87485 SomeZipCode4 Part of Chicago & Wisconsin
-87.99734 42.0086 SomeZipCode5 Part of Chicago & Wisconsin
-87.895 42.04957 SomeZipCode6 Part of Chicago & Wisconsin
-88.0228 41.81055 SomeZipCode7 Part of Chicago & Wisconsin
In essence, I am looking to identify all the zip codes that fall between the blue (see clickable image below) long and lat points. While it is visualized below, I am actually looking for the table described above.
However... I am having trouble doing this... I have tried using the below packages and script:
library(rgeos)
library(sp)
library(rgdal)
coordinates(dat) <- ~ Longitude + Latitude
coordinates(cen) <- ~ Longitude2 + Latitude2
over(cen, dat)
but I receive all NAs.
I use library(sf) to solve this type of point-in-polygon problem (sf is the successor to sp).
The function sf::st_intersection() gives you the intersection of two sf objects. In your case you can construct separate POLYGON and POINT sf objects.
library(sf)
Longitude <- c(-90.31914, -90.61911, -89.37842, -88.0988, -87.44875)
Latitude <- c(38.45781, 38.80097, 43.07961, 43.0624,41.49182)
## closing the polygon
Longitude[length(Longitude) + 1] <- Longitude[1]
Latitude[length(Latitude) + 1] <- Latitude[1]
## construct sf POLYGON
sf_poly <- sf::st_sf( geometry = sf::st_sfc( sf::st_polygon( x = list(matrix(c(Longitude, Latitude), ncol = 2)))) )
## construct sf POINT
sf_points <- sf::st_as_sf( cen, coords = c("Longitude2", "Latitude2"))
sf::st_intersection(sf_points, sf_poly)
# Simple feature collection with 4 features and 1 field
# geometry type: POINT
# dimension: XY
# bbox: xmin: -88.0228 ymin: 41.81055 xmax: -87.64957 ymax: 42.04957
# epsg (SRID): NA
# proj4string: NA
# CensuseZip geometry
# 4 SomeZipCode4 POINT (-87.64957 41.87485)
# 5 SomeZipCode5 POINT (-87.99734 42.0086)
# 6 SomeZipCode6 POINT (-87.895 42.04957)
# 7 SomeZipCode7 POINT (-88.0228 41.81055)
# Warning message:
# attribute variables are assumed to be spatially constant throughout all geometries
The result is all the points which are inside the polygon
You can also use sf::st_join(sf_poly, sf_points) to give the same result
And, the function sf::st_intersects(sf_points, sf_poly) will return a list saying whether the given POINT is inside the polygon
sf::st_intersects(sf_points, sf_poly)
# Sparse geometry binary predicate list of length 7, where the predicate was `intersects'
# 1: (empty)
# 2: (empty)
# 3: (empty)
# 4: 1
# 5: 1
# 6: 1
# 7: 1
Which you can use as an index / identifier of the original sf_points object to add a new column on
is_in <- sf::st_intersects(sf_points, sf_poly)
sf_points$inside_polygon <- as.logical(is_in)
sf_points
# Simple feature collection with 7 features and 2 fields
# geometry type: POINT
# dimension: XY
# bbox: xmin: -133.4579 ymin: 41.81055 xmax: -87.64957 ymax: 56.37054
# epsg (SRID): NA
# proj4string: NA
# CensuseZip geometry inside_polygon
# 1 SomeZipCode1 POINT (-131.4704 55.13835) NA
# 2 SomeZipCode2 POINT (-133.4579 56.23906) NA
# 3 SomeZipCode3 POINT (-131.6935 56.37054) NA
# 4 SomeZipCode4 POINT (-87.64957 41.87485) TRUE
# 5 SomeZipCode5 POINT (-87.99734 42.0086) TRUE
# 6 SomeZipCode6 POINT (-87.895 42.04957) TRUE
# 7 SomeZipCode7 POINT (-88.0228 41.81055) TRUE

Resources