polygons from coordinates - r

I've got a data.frame with lats and lngs that define the boundaries of rectangular boxes, like so
geohash north_lat south_lat east_lng west_lng
1 gbsuv 48.69141 48.64746 -4.306641 -4.350586
2 gbsuy 48.69141 48.64746 -4.262695 -4.306641
What's the easiest way to convert this into an sf object that holds a column of POLYGONs?

The key to creating polygons is that the coordinates have to be in sequence to form a closed area (i.e., the last point is the same as the first point).
So your data will need a bit of manipulation to create the coordinates and put them in order. In my example I've done this with an lapply.
The rest can be taken from the sf examples:
library(sf)
lst <- lapply(seq_len(nrow(df)), function(x){
  ## create a matrix of coordinates that also 'close' the polygon
  ## sf expects x (longitude) first, then y (latitude)
  res <- matrix(c(df[x, 'west_lng'], df[x, 'north_lat'],
                  df[x, 'east_lng'], df[x, 'north_lat'],
                  df[x, 'east_lng'], df[x, 'south_lat'],
                  df[x, 'west_lng'], df[x, 'south_lat'],
                  df[x, 'west_lng'], df[x, 'north_lat']) ## need to close the polygon
                , ncol = 2, byrow = TRUE
  )
  ## create polygon objects
  st_polygon(list(res))
})
## st_sfc : creates simple features collection
## st_sf : creates simple feature object
sfdf <- st_sf(geohash = df[, 'geohash'], st_sfc(lst))
sfdf
# Simple feature collection with 2 features and 1 field
# geometry type: POLYGON
# dimension: XY
# bbox: xmin: -4.350586 ymin: 48.64746 xmax: -4.262695 ymax: 48.69141
# epsg (SRID): NA
# proj4string: NA
# geohash st_sfc.lst.
# 1 gbsuv POLYGON((-4.350586 48.69141...
# 2 gbsuy POLYGON((-4.306641 48.69141...
plot(sfdf)
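Since every box is an axis-aligned rectangle, another option is to let sf build each polygon from its bounding box. The following is only a sketch under that assumption: the data frame is retyped from the question, the name sfdf_bbox is made up for illustration, and the CRS (EPSG:4326) is a guess based on the values looking like plain lon/lat degrees.

```r
library(sf)

## example data retyped from the question
df <- data.frame(
  geohash   = c("gbsuv", "gbsuy"),
  north_lat = c(48.69141, 48.69141),
  south_lat = c(48.64746, 48.64746),
  east_lng  = c(-4.306641, -4.262695),
  west_lng  = c(-4.350586, -4.306641)
)

## one rectangle per row: x = longitude, y = latitude
boxes <- lapply(seq_len(nrow(df)), function(i) {
  bb <- st_bbox(c(xmin = df$west_lng[i], ymin = df$south_lat[i],
                  xmax = df$east_lng[i], ymax = df$north_lat[i]),
                crs = st_crs(4326))  ## assumed CRS
  st_as_sfc(bb)                      ## bbox -> closed POLYGON
})

## combine the single-polygon sfc objects into one geometry column
sfdf_bbox <- st_sf(geohash = df$geohash, geometry = do.call(c, boxes))
```

This avoids writing out the five corner points by hand, at the cost of only working for rectangles.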

Related

How to write a function that matches values in two dataframes (make a faster version)

I have a dataframe with coordinates of regions of interest, and another dataframe with temperature readings (bio1) taken in research stations, and their coordinates.
I'd like to create a new column to match the region of interest with the temperature of the nearest research station.
I have managed to do this with the following code (here's a simplified fake dataframe pair)
df1 <- data.frame(latitude = c(10.5,6,2), longitude = c(18,9,4))
df2 <- data.frame(vy = c(10,5,3), vx = c(20,10,3), bio1 = c('a','b','c'))
for(i in 1:nrow(df1)){
df1$temperature[i] <- df2$bio1[which(abs(df2$vx - df1$longitude[i]) +
abs(df2$vy - df1$latitude[i]) ==
min(abs(df2$vx - df1$longitude[i]) +
abs(df2$vy - df1$latitude[i])))]
}
So, for each row, this code checks all the combinations and chooses the station with the smallest sum of absolute differences in latitude and longitude.
I checked and it seems to work, but it's very slow to use on large dataframes.
Can you solve this issue with a faster method?
Something like this might work
library(tidyverse)
library(sf)
# put some id's in df1
df1$id <- LETTERS[1:3]
# make df1 and df2 simple feature (sf) objects; coords are given as (x = longitude, y = latitude)
sf1 <- df1 %>%
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326)
sf2 <- df2 %>%
  st_as_sf(coords = c("vx", "vy"), crs = 4326)
# for each feature in sf1, find the nearest feature in sf2
sf1 %>%
  mutate(nearest_bio = sf2$bio1[st_nearest_feature(sf1, sf2)])
# Simple feature collection with 3 features and 2 fields
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: 4 ymin: 2 xmax: 18 ymax: 10.5
# Geodetic CRS: WGS 84
# id geometry nearest_bio
# 1 A POINT (18 10.5) a
# 2 B POINT (9 6) b
# 3 C POINT (4 2) c
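If the original criterion from the question (smallest sum of absolute differences) is all that is needed, the loop can also be vectorised in base R. This is a rough sketch, not a benchmarked solution; the names d and nearest are made up for illustration. It builds a full nrow(df1) x nrow(df2) matrix, so memory may become the limit for very large inputs.

```r
df1 <- data.frame(latitude = c(10.5, 6, 2), longitude = c(18, 9, 4))
df2 <- data.frame(vy = c(10, 5, 3), vx = c(20, 10, 3), bio1 = c("a", "b", "c"))

## d[i, j] = |lon_i - vx_j| + |lat_i - vy_j|
d <- outer(df1$longitude, df2$vx, function(a, b) abs(a - b)) +
     outer(df1$latitude,  df2$vy, function(a, b) abs(a - b))

## index of the smallest distance in each row (first one wins on ties)
nearest <- max.col(-d, ties.method = "first")

df1$temperature <- as.character(df2$bio1)[nearest]
df1$temperature
# [1] "a" "b" "c"
```

Unlike st_nearest_feature(), this treats degrees as planar units, exactly as the loop in the question does.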

How to efficiently create Linestrings from points?

I have geom POINTs in two separate data frames. What I want to do is to connect points with a line (later on a map) so that's why I want to create Linestring for each pair of points from those data frames. I made it like this:
coordsCust <- table %>%
st_as_sf(coords = c("lonCust","latCust"), crs = 4326)
coordsApp <- table %>%
st_as_sf(coords = c("lonApp","latApp"), crs = 4326) %>%
st_geometry()
and Linestring:
lines <- st_sfc(mapply(function(a,b){
st_cast(st_union(a,b),"LINESTRING")},
coordsCust$geometry, coordsApp$geometry, SIMPLIFY=FALSE))
This code works, I can create Linestrings for each pair of points, row by row:
LINESTRING (14.035 51.65182, 14.33418 53.53346)
LINESTRING (20.42767 49.98073, 16.62978 52.31037)
LINESTRING (20.18762 50.03337, 16.62978 52.31037)
LINESTRING (19.04625 49.79234, 16.62978 52.31037)
LINESTRING (21.35808 50.92382, 16.62978 52.31037)
The issue is that for 30 000 rows this solution is really slow (about 21 seconds). Is there any other, much faster way to create linestrings from points? I searched the web for solutions, but in vain. I've read something about converting sf to a matrix and using pmap, but have no idea how to implement it here.
UPDATE: if I want to use sfheaders::sf_linestring function I need to join geometries from both datasets. I do it like this:
df <- cbind(coordsCust,coordsApp)
and the final data frame (I showed most important part of it) is shown below:
Unfortunately sf_linestring doesn't work properly on this dataframe. I need to create linestring between POINTs for each row separately as shown on the screen.
Without an example data set it's hard to completely answer your question. But if you can get your data.frame into 'long' form, then sfheaders can do this in an instant:
n <- 30000
df <- data.frame(
x = rnorm(n)
, y = rnorm(n)
)
df$id <- rep(1:(n/2), each = 2)
sfheaders::sf_linestring(
obj = df
, x = "x"
, y = "y"
, linestring_id = "id"
)
# Simple feature collection with 15000 features and 1 field
# geometry type: LINESTRING
# dimension: XY
# bbox: xmin: -4.297631 ymin: -4.118291 xmax: 3.782847 ymax: 4.053399
# CRS: NA
# First 10 features:
# id geometry
# 1 1 LINESTRING (0.2780517 0.243...
# 2 2 LINESTRING (0.4261505 2.503...
# 3 3 LINESTRING (0.8662821 -0.11...
# 4 4 LINESTRING (-0.5335952 -0.1...
# 5 5 LINESTRING (1.154309 -1.352...
# 6 6 LINESTRING (0.05512324 -0.4...
# 7 7 LINESTRING (1.945868 -0.744...
# 8 8 LINESTRING (0.0427066 -0.08...
# 9 9 LINESTRING (0.06738045 0.41...
# 10 10 LINESTRING (0.4128964 -0.04...
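To get from the wide table in the question to that 'long' form, one possible route is to stack the two coordinate pairs and sort by a row id. This is only a sketch: the column names lonCust, latCust, lonApp and latApp come from the question, while tbl, long and the two sample rows (copied from the LINESTRING output above) are stand-ins for the real data.

```r
library(sf)
library(sfheaders)

## stand-in for the asker's table: one row per customer/app pair
tbl <- data.frame(
  lonCust = c(14.035, 20.42767), latCust = c(51.65182, 49.98073),
  lonApp  = c(14.33418, 16.62978), latApp = c(53.53346, 52.31037)
)
tbl$id <- seq_len(nrow(tbl))

## stack start points on top of end points, sharing the row id
long <- rbind(
  data.frame(id = tbl$id, x = tbl$lonCust, y = tbl$latCust),
  data.frame(id = tbl$id, x = tbl$lonApp,  y = tbl$latApp)
)
## order() keeps ties in their original order, so each start precedes its end
long <- long[order(long$id), ]

lines <- sfheaders::sf_linestring(
  obj = long, x = "x", y = "y", linestring_id = "id"
)
sf::st_crs(lines) <- 4326  ## sfheaders does not set a CRS itself
```

Each id then yields exactly one two-point LINESTRING, row by row, which is what the question asks for.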

Create numerous lines in Simple Features from list of coordinates in R

I am trying to detect whether pairs of objects (trees) are separated by roads or lie on the same side of them. I have downloaded my road network and think I more or less understand how to use st_intersects. So all I am missing are line segments between the pairs of trees I am considering, in order to test intersections with the roads.
However, I cannot seem to figure out how to create lines between my objects. I have a large number of pairs (300K+), so must be able to do this programmatically, whereas all the examples I am finding seem to be "hand coded".
Suppose the following two matrices, containing the coordinates of the "origin" and "destination" of each pair.
orig = matrix(runif(20),ncol=2)
dest = matrix(runif(20),ncol=2)
In this example, I need to create 10 lines: one between orig[1,] and dest[1,], another (distinct) one between orig[2,] and dest[2,], etc. My understanding is that I should be using st_multilinestring, but I cannot figure out how to formulate the call. Typically, I either end up with "XYZM" points, or with a multi-segment line starting at orig[1,] and terminating at dest[10,] after going through all other coordinates. And when it is not one of these outcomes, it is a whole host of errors.
Is st_multilinestring what I should be using and if so, how does one do this? Thanks!!
Here's a way to construct the sfc / sf object using library(sfheaders)
library(sf)
library(sfheaders)
## If you add a pseudo-id column
orig <- cbind( orig, 1:nrow( orig ) )
dest <- cbind( dest, 1:nrow( dest ) )
## you can bind these matrices together
m <- rbind( orig, dest )
## set the order by the 'id' column
m <- m[ order( m[,3] ), ]
## then use `sfheaders` to create your linestrings
sfc <- sfheaders::sfc_linestring(
obj = m
, linestring_id = 3 ## 3rd column
)
sfc
# Geometry set for 10 features
# geometry type: LINESTRING
# dimension: XY
# bbox: xmin: 0.01952919 ymin: 0.04603703 xmax: 0.9172785 ymax: 0.9516615
# epsg (SRID): NA
# proj4string: NA
# First 5 geometries:
# LINESTRING (0.7636528 0.2465392, 0.05899529 0.7...
# LINESTRING (0.6435893 0.9158161, 0.01952919 0.1...
# LINESTRING (0.05632407 0.3106372, 0.03306822 0....
# LINESTRING (0.1978259 0.07432209, 0.2907429 0.0...
# LINESTRING (0.1658199 0.6436758, 0.1407145 0.75...
Loop over rows of your origin and destination matrices using lapply and create a vector of LINESTRING objects:
> lines = do.call(st_sfc,
lapply(
1:nrow(orig),
function(i){
st_linestring(
matrix(
c(orig[i,],dest[i,]), ncol=2,byrow=TRUE)
)
}
)
)
This gives you this:
> lines
Geometry set for 10 features
geometry type: LINESTRING
dimension: XY
bbox: xmin: 0.06157865 ymin: 0.007712881 xmax: 0.967166 ymax: 0.9864812
epsg (SRID): NA
proj4string: NA
First 5 geometries:
LINESTRING (0.6646269 0.1545195, 0.8333102 0.40...
LINESTRING (0.5588124 0.5166538, 0.3213998 0.08...
LINESTRING (0.06157865 0.6138778, 0.06212246 0....
LINESTRING (0.202455 0.4883115, 0.5569435 0.986...
LINESTRING (0.3120373 0.8189916, 0.8499419 0.73...
Let's check we got all that the right way round. Where does the fourth line start and end?
> orig[4,]
[1] 0.2024550 0.4883115
> dest[4,]
[1] 0.5569435 0.9864812
which looks like the coordinates in the fourth LINESTRING output.
You can then st_intersects this with another set of features and see which of these cross them.
(You might also need to add the coordinate system to them...)
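As a rough illustration of that last step, here is a toy case with two tree pairs and one made-up road. The coordinates and the names road and crosses_road are invented for the sketch; no CRS is set, so sf treats everything as planar.

```r
library(sf)

## two origin/destination pairs, one per row
orig <- matrix(c(0, 0,
                 0, 2), ncol = 2, byrow = TRUE)
dest <- matrix(c(2, 0,
                 0, 3), ncol = 2, byrow = TRUE)

lines <- do.call(st_sfc, lapply(seq_len(nrow(orig)), function(i) {
  st_linestring(rbind(orig[i, ], dest[i, ]))
}))

## a hypothetical road running vertically along x = 1
road <- st_sfc(st_linestring(rbind(c(1, -1), c(1, 1))))

## TRUE where the segment between a pair crosses at least one road
crosses_road <- lengths(st_intersects(lines, road)) > 0
crosses_road
# [1]  TRUE FALSE
```

With real data, the lines and the road network would need the same coordinate system before calling st_intersects().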

identify zip codes that fall within latitude and longitudinal coordinates

I have several data frames in R. The first data frame contains the computed convex hull of a set of lat and long coordinates by market (courtesy of chull in R). It looks like this:
MyGeo<- "Part of Chicago & Wisconsin"
Longitude <- c(-90.31914, -90.61911, -89.37842, -88.0988, -87.44875)
Latitude <- c(38.45781, 38.80097, 43.07961, 43.0624,41.49182)
dat <- data.frame(Longitude, Latitude, MyGeo)
The second has zip codes by their latitude and longitudinal coordinates (courtesy of the US census website). It looks like this:
CensuseZip <- c("SomeZipCode1","SomeZipCode2","SomeZipCode3","SomeZipCode4","SomeZipCode5","SomeZipCode6","SomeZipCode7")
Longitude2 <- c(-131.470425,-133.457924,-131.693453,-87.64957,-87.99734,-87.895,-88.0228)
Latitude2 <- c(55.138352,56.239062,56.370538,41.87485,42.0086,42.04957,41.81055)
cen <- data.frame(Longitude2, Latitude2, CensuseZip)
Now I believe the first data table provides me with a polygon, or a border, that I should be able to use to identify zip codes that fall within that border. Ideally, I would want to create a third data table that looks something like this:
Longitude2 Latitude2 CensusZip MyGeo
-131.470425 55.138352 SomeZipCode1
-133.457924 56.239062 SomeZipCode2
-131.693453 56.370538 SomeZipCode3
-87.64957 41.87485 SomeZipCode4 Part of Chicago & Wisconsin
-87.99734 42.0086 SomeZipCode5 Part of Chicago & Wisconsin
-87.895 42.04957 SomeZipCode6 Part of Chicago & Wisconsin
-88.0228 41.81055 SomeZipCode7 Part of Chicago & Wisconsin
In essence, I am looking to identify all the zip codes that fall within the boundary formed by the long and lat points of the first data frame. Rather than a plot, I am looking for the table described above.
However... I am having trouble doing this... I have tried using the below packages and script:
library(rgeos)
library(sp)
library(rgdal)
coordinates(dat) <- ~ Longitude + Latitude
coordinates(cen) <- ~ Longitude2 + Latitude2
over(cen, dat)
but I receive all NAs.
I use library(sf) to solve this type of point-in-polygon problem (sf is the successor to sp).
The function sf::st_intersection() gives you the intersection of two sf objects. In your case you can construct separate POLYGON and POINT sf objects.
library(sf)
Longitude <- c(-90.31914, -90.61911, -89.37842, -88.0988, -87.44875)
Latitude <- c(38.45781, 38.80097, 43.07961, 43.0624,41.49182)
## closing the polygon
Longitude[length(Longitude) + 1] <- Longitude[1]
Latitude[length(Latitude) + 1] <- Latitude[1]
## construct sf POLYGON
sf_poly <- sf::st_sf( geometry = sf::st_sfc( sf::st_polygon( x = list(matrix(c(Longitude, Latitude), ncol = 2)))) )
## construct sf POINT
sf_points <- sf::st_as_sf( cen, coords = c("Longitude2", "Latitude2"))
sf::st_intersection(sf_points, sf_poly)
# Simple feature collection with 4 features and 1 field
# geometry type: POINT
# dimension: XY
# bbox: xmin: -88.0228 ymin: 41.81055 xmax: -87.64957 ymax: 42.04957
# epsg (SRID): NA
# proj4string: NA
# CensuseZip geometry
# 4 SomeZipCode4 POINT (-87.64957 41.87485)
# 5 SomeZipCode5 POINT (-87.99734 42.0086)
# 6 SomeZipCode6 POINT (-87.895 42.04957)
# 7 SomeZipCode7 POINT (-88.0228 41.81055)
# Warning message:
# attribute variables are assumed to be spatially constant throughout all geometries
The result is all the points that lie inside the polygon.
sf::st_join(sf_poly, sf_points) finds the same four matches, although it returns them with the polygon geometry rather than the point geometry.
The function sf::st_intersects(sf_points, sf_poly) returns a list saying whether each POINT is inside the polygon:
sf::st_intersects(sf_points, sf_poly)
# Sparse geometry binary predicate list of length 7, where the predicate was `intersects'
# 1: (empty)
# 2: (empty)
# 3: (empty)
# 4: 1
# 5: 1
# 6: 1
# 7: 1
Which you can use as an index / identifier of the original sf_points object to add a new column on
is_in <- sf::st_intersects(sf_points, sf_poly)
sf_points$inside_polygon <- as.logical(is_in)
sf_points
# Simple feature collection with 7 features and 2 fields
# geometry type: POINT
# dimension: XY
# bbox: xmin: -133.4579 ymin: 41.81055 xmax: -87.64957 ymax: 56.37054
# epsg (SRID): NA
# proj4string: NA
# CensuseZip geometry inside_polygon
# 1 SomeZipCode1 POINT (-131.4704 55.13835) NA
# 2 SomeZipCode2 POINT (-133.4579 56.23906) NA
# 3 SomeZipCode3 POINT (-131.6935 56.37054) NA
# 4 SomeZipCode4 POINT (-87.64957 41.87485) TRUE
# 5 SomeZipCode5 POINT (-87.99734 42.0086) TRUE
# 6 SomeZipCode6 POINT (-87.895 42.04957) TRUE
# 7 SomeZipCode7 POINT (-88.0228 41.81055) TRUE
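To get closer to the table asked for in the question, the sparse result can be turned into a plain TRUE/FALSE vector with lengths() and used to fill a MyGeo column. This is a sketch under the same assumptions as above (no CRS set, so the test is planar); the names poly_xy and inside are made up for illustration.

```r
library(sf)

## polygon from the question, with the first vertex repeated to close it
poly_xy <- cbind(
  c(-90.31914, -90.61911, -89.37842, -88.0988, -87.44875, -90.31914),
  c(38.45781, 38.80097, 43.07961, 43.0624, 41.49182, 38.45781)
)
sf_poly <- st_sf(geometry = st_sfc(st_polygon(list(poly_xy))))

cen <- data.frame(
  Longitude2 = c(-131.470425, -133.457924, -131.693453,
                 -87.64957, -87.99734, -87.895, -88.0228),
  Latitude2  = c(55.138352, 56.239062, 56.370538,
                 41.87485, 42.0086, 42.04957, 41.81055),
  CensuseZip = paste0("SomeZipCode", 1:7)
)
sf_points <- st_as_sf(cen, coords = c("Longitude2", "Latitude2"))

## empty list entries have length 0, so this gives FALSE instead of NA
inside <- lengths(st_intersects(sf_points, sf_poly)) > 0

## fill the market name only for points inside the polygon
cen$MyGeo <- ifelse(inside, "Part of Chicago & Wisconsin", "")
cen
```

The result is an ordinary data frame with the original coordinates, the zip code, and MyGeo left blank for points outside the hull.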

How to create (x,y) coordinate in R

So I have 2 random variables X and Y
x <- runif(1000,min=0,max=10)
lambda=2*x+0.2*x*sin(x)
y <- rpois(1000,lambda)
And I want to create a vector J=(xi,yi) for i=1,...,1000
I'm not sure how to do this in the most efficient way.
Thanks !
You already have x and y.
Put them into a data frame and, using library sf (simple features), turn it into a spatial object. Here it comes out with no CRS, since this is an arbitrary set of data rather than something geographical; otherwise you would add st_set_crs() to the code below:
library(sf)
x <- runif(1000, min = 0, max = 10)
lambda <- 2*x + 0.2*x*sin(x)
y <- rpois(1000, lambda)
df <- data.frame(x = x, y = y) %>% st_as_sf(coords = c("x", "y"))
> df
Simple feature collection with 1000 features and 0 fields
geometry type: POINT
dimension: XY
bbox: xmin: 0.005045172 ymin: 0 xmax: 9.994533 ymax: 30
epsg (SRID): NA
proj4string: NA
First 10 features:
geometry
1 POINT (8.375505 20)
2 POINT (0.08116931 0)
3 POINT (3.786693 5)
4 POINT (7.68517 17)
5 POINT (9.363003 25)
6 POINT (5.114014 9)
7 POINT (5.70659 12)
8 POINT (9.936392 22)
9 POINT (9.164108 15)
10 POINT (7.524004 19)
plot(df)
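If no spatial class is needed and J is simply meant to hold the pairs (xi, yi), a two-column matrix may already be enough. This is a minimal sketch, assuming a matrix is an acceptable form for J:

```r
x <- runif(1000, min = 0, max = 10)
lambda <- 2*x + 0.2*x*sin(x)
y <- rpois(1000, lambda)

## row i of J is the pair (x_i, y_i)
J <- cbind(x, y)

J[1, ]   ## the first pair
dim(J)   ## 1000 rows, 2 columns
```

cbind() is vectorised, so this costs essentially nothing for 1000 points.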
