How to calculate geo-distance from a polygon? - r

I have a shapefile with 50+ different polygonal shapes (representing 50+ different regions) and 10,000+ data points that are supposed to be present in one of the regions. The thing is, the 10,000+ points are already coded with a region they are supposed to be in, and I want to figure out how far they are from this coded region in geo-spatial distance.
My current approach (code below), which involves converting shapefiles to owin objects from the sp library and using distfun gets me distances in lat,long euclidean space. But I would like to get geo-spatial distances (eventually to convert to km). Where should I go next?
#basically cribbed from http://cran.r-project.org/web/packages/spatstat/vignettes/shapefiles.pdf (page 9)
shp <- readShapeSpatial("myShapeFile.shp", proj4string=CRS("+proj=longlat +datum=WGS84"))
regions <- lapply(slot(shp, "polygons"), function(x) SpatialPolygons(list(x)))
windows <- lapply(regions, as.owin)
# need to convert this to geo distance
distance_from_region <- function(regionData, regionName) {
w <- windows[[regionName]]
regionData$dists <- distfun(w)(regionData$lat, regionData$long)
regionData
}

I'd project the data to a euclidean (or near euclidean) coordinate system - unless you are spanning a large chunk of the globe then this is feasible. Use spTransform from maptools or sp or rgdal (I forget which) and convert to a UTM zone near your data.
You also might do better with package rgeos and the gDistance function:
gDistance by default returns the cartesian minimum distance
between the two geometries in the units of the current projection.
If your data is over a large chunk of globe then... tricky... 42...
Barry

Related

Identifying points located near to a polygons boundary

I am trying to identify all points (postcodes in my case) that are located near to the coastline of the UK (i.e., a polygon). I am using R to process this.
I downloaded the geographical outline of United Kingdom from here as a shapefile. A list of all postcodes for the UK were accessed from the ONS here. Please note that the latter file is very large (211MB zipped).
To begin, I loaded in both files into R, and then convert them to the same coordinate reference system (OSGB1936; 27700). For the polygon of the UK, I convert this to lines that represent the boundary/coastline (note that while Northern Ireland shares a common boundary with Ireland, I will subset any postcodes erroneously matched as near the coastline by lat/long later). I then convert the points into spatial points.
# Load libraries
library(sf)
library(data.table)
# Load data
uk_shp <- read_sf("./GBR_adm/GBR_adm0.shp") # Load UK shapefile (ignore the download file says GBR, it is UK)
uk_shp <- st_transform(uk_shp, crs = 27700) # Convert to co-ordinate reference system (CRS) that allow buffers in correct units later (note: 4326 is World CRS)
uk_coast <- st_cast(uk_shp,"MULTILINESTRING") # Convert polygon to a line (i.e., coastline)
# Load in postcodes
pcd <- fread("./ONSPD_FEB_2022_UK/Data/ONSPD_FEB_2022_UK.csv") # Load all postcodes for Great Britain - this is a very large file so I also create a single
pcd <- pcd[, c(1:3, 43:44)] # Drop unnecessary information/columns to save memory
# Convert to spatial points data frame
pcd_sp <- pcd %>% # For object of postcodes
st_as_sf(coords = c("long", "lat")) %>% # Define as spatial object and identify which columns tell us the position of points
st_set_crs(27700) # Set CRS
I originally thought the most efficient approach to take would be to define what a coastal region is (here defined as within 5km of the coastline), create a buffer to represent that around the coastline, and then use a point-in-polygon function to select all points within the buffers. However, the code below had not finished running overnight which probably suggests that it was the incorrect approach and I an unsure why it is taking so long.
uk_coast <- st_buffer(uk_coast, 5000) # Create 5km buffer
pcd_coastal <- st_intersection(uk_buf, pcd_sp) # Point-in-polygon (i.e., keep only the postcodes that are located in the buffer region)
So I changed my approach to calculate the straight-line distance of each point to the nearest coastline. In running the code below, it gives incorrect distances. For example below, I select one postcode (AB12 4XP) which is located ~2.6km from the coastline, however the code below gives ~82km which is very wrong. I had tried st_nearest_feature() but could not get it to work (it may do, but was beyond my attempts).
test <- pcd_sp[pcd_sp$pcd == "AB124XP",] # Subset test postcode
dist <- st_distance(test, uk_coast, by_element = TRUE, which = "Euclidean") # Calculate distance
I am unsure how to proceed from here - I don't think it is the wrong CRS. It might be that the multilinestring conversion is causing problems. Does anyone have suggestions what to do?
sf has an st_is_within_distance function that can test if points are within a distance of a line. My test data is 10,000 random points in the bounding box of the UK shape, and the UK shape in OSGB grid coordinates.
> system.time({indist = st_is_within_distance(uk_coast, pts, dist=5000)})
user system elapsed
30.907 0.003 30.928
But this isn't building a spatial index. The docs say that it does build a spatial index if the coordinates are "geographic" and the flag for using spherical geometry is set. I don't understand why it can't build one for cartesian coordinates, but lets see how much faster it is...
Transform takes no time at all:
> ukLL = st_transform(uk_coast, 4326)
> ptsLL = st_transform(pts, 4326)
Then test...
system.time({indistLL = st_is_within_distance(ukLL, ptsLL, dist=5000)})
user system elapsed
1.405 0.000 1.404
Just over a second. Any difference between the two? Let's see:
> setdiff(indistLL[[1]], indist[[1]])
[1] 3123
> setdiff(indist[[1]], indistLL[[1]])
integer(0)
So point 3123 is in the set using lat-long, but not the set using OSGB. There's nothing in OSGB that isn't in the lat-long set.
Quick plot to show the selected points:
> plot(uk_coast$geometry)
> plot(pts$geometry[indistLL[[1]]], add=TRUE)

Create buffers in km units for global point dataset in R?

I'm new to using spatial data so this probably seems like a very simple question, but something I'm struggling to get my head around.
I have a global dataset of sample sites and corresponding coordinates. I am using st_buffer in the sf package to create buffers of different sizes around these points. However, I need these buffers to be in km, for example 2,10,50 km radius, rather than in the units of the CRS projection (currently in long/lat, WGS84). As I understand it, UTM is the only planar projection, but does this mean I have to split my global dataset into each of the UTM zones before converting to UTM, and then create the buffers for each of these separately?
Is it possible to then convert the buffers back to my previous CRS projection?
Thank you!
It is not necessary to step out of the comfort of WGS84 to do a metric buffer; most of the tools are ready to combine longlat CRS with a metric definition of distance (it is a very common use case).
When buffering in WGS84 I kind of prefer terra::buffer() to sf::st_buffer() as it is likely to produce a smoother shape - the S2 functions that work under the hood of unprojected {sf} do not work smooth enough for me and the outcome is somewhat grainy. But I digress...
Consider this piece of code, what it does is:
looks up the coordinates of a semi random landmark (Greenwich Observatory in London)
buffers it by 25 kilometers
displays the result
Note how the terra documentation states that the buffer is in meters for unprojected coordinates.
library(nominatimlite)
library(sf)
library(terra)
a_point <- geo_lite_sf("Royal Observatory, Greenwich")
st_crs(a_point) # WGS84
terra_buffer <- a_point |>
vect() |>
buffer(width = 25000) %>% # 25 kilometers
st_as_sf()
mapview::mapview(terra_buffer)

Create Grid in R for kriging in gstat

lat long
7.16 124.21
8.6 123.35
8.43 124.28
8.15 125.08
Consider these coordinates, these coordinates correspond to weather stations that measure rainfall data.
The intro to the gstat package in R uses the meuse dataset. At some point in this tutorial: https://rpubs.com/nabilabd/118172, the guys makes use of a "meuse.grid" in this line of code:
data("meuse.grid")
I do not have such a file and I do not know how to create it, can I create one using these coordinates? Or at least point me to material that discusses how to create a custom grid for a custom area (i.e not using administrative boundaries from GADM).
Probably wording this wrong, don't even know if this question makes sense to R savvy people. Still, would love to hear some direction, or at least tips. Thanks a lot!
Total noob at R and statistics.
EDIT: See the sample grid that the tutorial I posted looks like, that's the thing I want to make.
EDIT 2: Would this method be viable? https://rstudio-pubs-static.s3.amazonaws.com/46259_d328295794034414944deea60552a942.html
I am going to share my approach to create a grid for kriging. There are probably more efficient or elegant ways to achieve the same task, but I hope this will be a start to facilitate some discussions.
The original poster was thinking about 1 km for every 10 pixels, but that is probably too much. I am going to create a grid with cell size equals to 1 km * 1 km. In addition, the original poster did not specify an origin of the grid, so I will spend some time determining a good starting point. I also assume that the Spherical Mercator projection coordinate system is the appropriate choice for the projection. This is a common projection for Google Map or Open Street Maps.
1. Load Packages
I am going to use the following packages. sp, rgdal, and raster are packages provide many useful functions for spatial analysis. leaflet and mapview are packages for quick exploratory visualization of spatial data.
# Load packages
library(sp)
library(rgdal)
library(raster)
library(leaflet)
library(mapview)
2. Exploratory Visualization of the station locations
I created an interactive map to inspect the location of the four stations. Because the original poster provided the latitude and longitude of these four stations, I can create a SpatialPointsDataFrame with Latitude/Longitude projection. Notice the EPSG code for Latitude/Longitude projection is 4326. To learn more about EPSG code, please see this tutorial (https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/OverviewCoordinateReferenceSystems.pdf).
# Create a data frame showing the **Latitude/Longitude**
station <- data.frame(lat = c(7.16, 8.6, 8.43, 8.15),
long = c(124.21, 123.35, 124.28, 125.08),
station = 1:4)
# Convert to SpatialPointsDataFrame
coordinates(station) <- ~long + lat
# Set the projection. They were latitude and longitude, so use WGS84 long-lat projection
proj4string(station) <- CRS("+init=epsg:4326")
# View the station location using the mapview function
mapview(station)
The mapview function will create an interactive map. We can use this map to determine what could be a suitable for the origin of the grid.
3. Determine the origin
After inspecting the map, I decided that the origin could be around longitude 123 and latitude 7. This origin will be on the lower left of the grid. Now I need to find the coordinate representing the same point under Spherical Mercator projection.
# Set the origin
ori <- SpatialPoints(cbind(123, 7), proj4string = CRS("+init=epsg:4326"))
# Convert the projection of ori
# Use EPSG: 3857 (Spherical Mercator)
ori_t <- spTransform(ori, CRSobj = CRS("+init=epsg:3857"))
I first created a SpatialPoints object based on the latitude and longitude of the origin. After that I used the spTransform to perform project transformation. The object ori_t now is the origin with Spherical Mercator projection. Notice that the EPSG code for Spherical Mercator is 3857.
To see the value of coordinates, we can use the coordinates function as follows.
coordinates(ori_t)
coords.x1 coords.x2
[1,] 13692297 781182.2
4. Determine the extent of the grid
Now I need to decide the extent of the grid that can cover all the four points and the desired area for kriging, which depends on the cell size and the number of cells. The following code sets up the extent based on the information. I have decided that the cell size is 1 km * 1 km, but I need to experiment on what would be a good cell number for both x- and y-direction.
# The origin has been rounded to the nearest 100
x_ori <- round(coordinates(ori_t)[1, 1]/100) * 100
y_ori <- round(coordinates(ori_t)[1, 2]/100) * 100
# Define how many cells for x and y axis
x_cell <- 250
y_cell <- 200
# Define the resolution to be 1000 meters
cell_size <- 1000
# Create the extent
ext <- extent(x_ori, x_ori + (x_cell * cell_size), y_ori, y_ori + (y_cell * cell_size))
Based on the extent I created, I can create a raster layer with number all equal to 0. Then I can use the mapview function again to see if the raster and the four stations matches well.
# Initialize a raster layer
ras <- raster(ext)
# Set the resolution to be
res(ras) <- c(cell_size, cell_size)
ras[] <- 0
# Project the raster
projection(ras) <- CRS("+init=epsg:3857")
# Create interactive map
mapview(station) + mapview(ras)
I repeated this process several times. Finally I decided that the number of cells is 250 and 200 for x- and y-direction, respectively.
5. Create spatial grid
Now I have created a raster layer with proper extent. I can first save this raster as a GeoTiff for future use.
# Save the raster layer
writeRaster(ras, filename = "ras.tif", format="GTiff")
Finally, to use the kriging functions from the package gstat, I need to convert the raster to SpatialPixels.
# Convert to spatial pixel
st_grid <- rasterToPoints(ras, spatial = TRUE)
gridded(st_grid) <- TRUE
st_grid <- as(st_grid, "SpatialPixels")
The st_grid is a SpatialPixels that can be used in kriging.
This is an iterative process to determine a suitable grid. Throughout the process, users can change the projection, origin, cell size, or cell number depends on the needs of their analysis.
#yzw and #Edzer bring up good points for creating a regular rectangular grid, but sometimes, there is the need to create an irregular grid over a defined polygon, usually for kriging.
This is a sparsely documented topic. One good answer can be found here. I expand on it with code below:
Consider the the built in meuse dataset. meuse.grid is an irregularly shaped grid. How do we make an grid like meuse.grid for our unique study area?
library(sp)
data(meuse.grid)
ggplot(data = meuse.grid) + geom_point(aes(x, y))
Imagine an irregularly shaped SpatialPolygon or SpatialPolygonsDataFrame, called spdf. You first build a regular rectangular grid over it, then subset the points in that regular grid by the irregularly-shaped polygon.
# First, make a rectangular grid over your `SpatialPolygonsDataFrame`
grd <- makegrid(spdf, n = 100)
colnames(grd) <- c("x", "y")
# Next, convert the grid to `SpatialPoints` and subset these points by the polygon.
grd_pts <- SpatialPoints(
coords = grd,
proj4string = CRS(proj4string(spdf))
)
# subset all points in `grd_pts` that fall within `spdf`
grd_pts_in <- grd_pts[spdf, ]
# Then, visualize your clipped grid which can be used for kriging
ggplot(as.data.frame(coordinates(grd_pts_in))) +
geom_point(aes(x, y))
If you have your study area as a polygon, imported as a SpatialPolygons, you could either use package raster to rasterize it, or use sp::spsample to sample it using sampling type regular.
If you don't have such a polygon, you can create points regularly spread over a rectangular long/lat area using expand.grid, using seq to generate a sequence of long and lat values.

Creating a buffer around a geographic point and then checking whether a list of coordinates is inside that buffer?

I have the following puzzle regarding spatial data in R:
I have a dataset with street segments (with their respective starting and ending coordinates). I want to create a buffer of X meters around these points and then check whether a list of lat/lon points is within that buffer. Is there a way to do this in R?
I'm able to map the points and map the buffers using a combination of various packages: maptools, ggmap, rgdal, sp, and rgeos. But this procedure seems to only map the points and the buffers without allowing me to then check if the other coordinates I have are within the buffers. Ideally, I'd like to produce a vector of 1s and 0s describing whether the list of lat/lon points are within the buffer around the street segments.
Any ideas?
This is the code I've been using, but I get all missing values (and I know this shouldn't be the case). I've also tried using the gContains function from rgeos but it crashed R.
#Load shapefile in R and transform to appropriate CRS
shp <- readOGR(dsn="/Users/Maps/shapes", layer="shp")
shp_transf <- spTransform(shp_transf, CRS( "+init=epsg:21897" ))
#Create buffer around polygons
shp_buff <- gBuffer(shp_transf, width=40, byid=TRUE, quadsegs=10)
#Make my dataframe of lat/lon points into same projection as buffers
points <- SpatialPoints(points,proj4string=CRS(proj4string(shp_buff)))
#Use over function from SP pacakge
result <- as.integer(over(points, shp_buff)$OBJECTID)
Turns out, the over() function wasn't working before because the shapefile with the buffers and the list of GPS coordinates were in a different CRS. I fixed it this way:
shp <- spTransform(shp, CRS("+proj=longlat +datum=WGS84"))
proj4string(points) <- CRS("+proj=longlat +datum=WGS84")
Then, you run the over() function and it'll give you the number of times each point is in each buffer area:
x <- over(points, shp)
table(x$ID)
181 304
118 8
So a particular GPS pulse came from segment ID 181 a total of 118 times and from segment ID 304 a total of 8 times.

Assigning spatial coordinates to an array in R

I have downloaded a text file of data from the following link: http://radon.unibas.ch/files/us_rn_50km.zip
After unzipping I use the following lines of code to plot up the data:
# load libraries
library(fields)
# function to rotate a matrix (and transpose)
rotate <- function(x) t(apply(x, 2, rev))
# read data
data <- as.matrix(read.table("~/Downloads/us_rn_50km.txt", skip=6))
data[data<=0] <- NA
# rotate data
data <- rotate(data)
# plot data
mean.rn <- mean(data, na.rm=T)
image.plot(data, main=paste("Mean Rn emissions =", sprintf("%.3f", mean.rn)) )
This all looks OK, but I want to be able to plot the data on a lat-long grid. I think I need to convert this array into an sp class object but I don't know how. I know the following (from the web site): "The projection used to project the latitude and longitude coordinates is that used for the Decade of North American Geology (DNAG) maps. The projection type is Spherical Transverse Mercator with a base latitude of zero degrees and a reference longitude of 100 degrees W. The scale factor used is 0.926 with no false easting or northing. The longitude-latitude datum is NAD27 and the units of the xy-coordinates are in meters. The ellipsoid used is Clarke 1866. The resolution of the map is 50x50km". But don't know what to do with this data. I tried:
proj4string(data)=CRS("+init=epsg:4267")
data.sp <- SpatialPoints(data, CRS("+proj=longlat+ellps=clrk66+datum=NAD27") )
But had various problems (with NA's) and fundamentally I think that the data isn't in the right format.. I think that the SpatialPoints function wants a data on location (in 2-D) and a third array of values associated with those locations (x,y,z data - I guess my problem is working out the x and the y's from my data!)
Any help greatly appreciated!
Thanks,
Alex
The file in question is an ASCII raster grid. Coordinates are implicit in this format; a header describes the position of the (usually) lower left corner, as well as the grid dimensions and resolution. After this header section, values separated by white space describe how the variable varies across the grid, with values given in row-major order. Open it in a text editor if you're interested.
You can import such files to R with the fantastic raster package, as follows:
download.file('http://radon.unibas.ch/files/us_rn_50km.zip',
destfile={f <- tempfile()})
unzip(f, exdir=tempdir())
r <- raster(file.path(tempdir(), 'us_rn_50km.txt'))
You can plot it immediately, without assigning the projection:
If you didn't want to transform it to another CRS, you wouldn't necessarily need to assign the current coordinate system. But since you do want to transform it to WGS84 (geographic), you need to first assign the CRS:
proj4string(r) <- '+proj=tmerc +lon_0=-100 +lat_0=0 +k_0=0.926 +datum=NAD27 +ellps=clrk66 +a=6378137 +b=6378137 +units=m +no_defs'
Unfortunately I'm not entirely sure whether this proj4string correctly reflects the info given at the website that provided the data (it would be great if they actually provided the definition in a standard format).
After assigning the CRS, you can project the raster with projectRaster:
r.wgs84 <- projectRaster(r, crs='+init=epsg:4326')
And if you want, write it out to a raster format of your choice, e.g.:
writeRaster(r.wgs84, filename='whatever.tif')

Resources