R Matching coordinate points into geopolitical areas - polygon

I'm trying to match coordinate points to the municipalities of a given state. That is, for every coordinate point, I want the municipality that contains it. On the one hand, I have the shapefile of the state and its municipalities; on the other, a data frame with coordinate points corresponding to certain named locations.
This is the shapefile zip and the shp file is 22mun.shp
This is an example data frame of locations:
library(tibble)
locations <-
  tibble(name = c("A", "B"),
         Latitude = c(20.598670161163884, 20.741478412120905),
         Longitude = c(-100.3565447475217, -99.94098729516392))
In this example, I should obtain that point A is in the municipality named "Queretaro" and that point B is in the municipality "Ezequiel Montes".
How can I find this correspondence on a larger scale, where I might have multiple points within a given municipality? Any insights on how to do this efficiently, or other guidance, would be much appreciated.
Thank you
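A minimal sketch of one way to do this kind of matching with sf, using a spatial join. The two square "municipalities", the mun_name column, and the extra point C are made up for illustration; with the real data the polygons would come from reading 22mun.shp with read_sf(), and the name column would be whatever that file provides.

```r
library(sf)

# Helper that builds a closed rectangular polygon (hypothetical stand-in for a municipality)
square <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(c(xmin, ymin), c(xmax, ymin), c(xmax, ymax),
                        c(xmin, ymax), c(xmin, ymin))))
}

# Two made-up municipalities in lon/lat (EPSG:4326)
municipalities <- st_sf(
  mun_name = c("West", "East"),
  geometry = st_sfc(square(-101, 20, -100.2, 21),
                    square(-100.2, 20, -99.5, 21), crs = 4326)
)

locations <- data.frame(
  name = c("A", "B", "C"),
  Latitude = c(20.598670161163884, 20.741478412120905, 20.3),
  Longitude = c(-100.3565447475217, -99.94098729516392, -100.9)
)

# coords is given as x (longitude) first, then y (latitude)
pts <- st_as_sf(locations, coords = c("Longitude", "Latitude"), crs = 4326)

# Make sure both layers share a CRS before joining
pts <- st_transform(pts, st_crs(municipalities))

# Attach to each point the attributes of the polygon that contains it
matched <- st_join(pts, municipalities, join = st_within)
result <- st_drop_geometry(matched)
```

Several points falling in the same municipality simply get the same mun_name, so the result can be summarised afterwards with the usual grouping functions.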

Related

Identifying points located near to a polygons boundary

I am trying to identify all points (postcodes in my case) that are located near to the coastline of the UK (i.e., a polygon). I am using R to process this.
I downloaded the geographical outline of the United Kingdom from here as a shapefile. A list of all postcodes for the UK was accessed from the ONS here. Please note that the latter file is very large (211MB zipped).
To begin, I loaded both files into R and converted them to the same coordinate reference system (OSGB1936; 27700). I converted the polygon of the UK to lines that represent the boundary/coastline (note that while Northern Ireland shares a common boundary with Ireland, I will later subset any postcodes erroneously matched as near the coastline by lat/long). I then converted the points into spatial points.
# Load libraries
library(sf)
library(data.table)
library(dplyr) # Provides the %>% pipe used below
# Load data
uk_shp <- read_sf("./GBR_adm/GBR_adm0.shp") # Load UK shapefile (ignore the download file says GBR, it is UK)
uk_shp <- st_transform(uk_shp, crs = 27700) # Convert to co-ordinate reference system (CRS) that allow buffers in correct units later (note: 4326 is World CRS)
uk_coast <- st_cast(uk_shp,"MULTILINESTRING") # Convert polygon to a line (i.e., coastline)
# Load in postcodes
pcd <- fread("./ONSPD_FEB_2022_UK/Data/ONSPD_FEB_2022_UK.csv") # Load all postcodes for Great Britain - this is a very large file so I also create a single
pcd <- pcd[, c(1:3, 43:44)] # Drop unnecessary information/columns to save memory
# Convert to spatial points data frame
pcd_sp <- pcd %>% # For object of postcodes
st_as_sf(coords = c("long", "lat")) %>% # Define as spatial object and identify which columns tell us the position of points
st_set_crs(27700) # Set CRS
I originally thought the most efficient approach would be to define what a coastal region is (here defined as within 5km of the coastline), create a buffer to represent that around the coastline, and then use a point-in-polygon function to select all points within the buffer. However, the code below had not finished running overnight, which probably suggests that it was the incorrect approach, and I am unsure why it is taking so long.
uk_buf <- st_buffer(uk_coast, 5000) # Create 5km buffer
pcd_coastal <- st_intersection(uk_buf, pcd_sp) # Point-in-polygon (i.e., keep only the postcodes that are located in the buffer region)
So I changed my approach to calculate the straight-line distance from each point to the nearest coastline. The code below gives incorrect distances. In the example below, I select one postcode (AB12 4XP) which is located ~2.6km from the coastline, but the code gives ~82km, which is very wrong. I had tried st_nearest_feature() but could not get it to work (it may work, but it was beyond my attempts).
test <- pcd_sp[pcd_sp$pcd == "AB124XP",] # Subset test postcode
dist <- st_distance(test, uk_coast, by_element = TRUE, which = "Euclidean") # Calculate distance
I am unsure how to proceed from here - I don't think it is the wrong CRS. It might be that the multilinestring conversion is causing problems. Does anyone have suggestions what to do?
sf has an st_is_within_distance function that can test if points are within a distance of a line. My test data is 10,000 random points in the bounding box of the UK shape, and the UK shape in OSGB grid coordinates.
> system.time({indist = st_is_within_distance(uk_coast, pts, dist=5000)})
   user  system elapsed
 30.907   0.003  30.928
But this isn't building a spatial index. The docs say that it does build a spatial index if the coordinates are "geographic" and the flag for using spherical geometry is set. I don't understand why it can't build one for cartesian coordinates, but let's see how much faster it is...
Transform takes no time at all:
> ukLL = st_transform(uk_coast, 4326)
> ptsLL = st_transform(pts, 4326)
Then test...
> system.time({indistLL = st_is_within_distance(ukLL, ptsLL, dist=5000)})
   user  system elapsed
  1.405   0.000   1.404
Just over a second. Any difference between the two? Let's see:
> setdiff(indistLL[[1]], indist[[1]])
[1] 3123
> setdiff(indist[[1]], indistLL[[1]])
integer(0)
So point 3123 is in the set using lat-long, but not the set using OSGB. There's nothing in OSGB that isn't in the lat-long set.
Quick plot to show the selected points:
> plot(uk_coast$geometry)
> plot(pts$geometry[indistLL[[1]]], add=TRUE)
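For reference, a small self-contained sketch of the same test on toy data, together with the nearest-feature distance the question was after. The straight "coastline" and the two points are made up, and both layers are assumed to already hold metre coordinates in EPSG:27700. Points read from longitude/latitude columns in degrees would presumably need crs = 4326 in st_as_sf() followed by st_transform(27700) first, otherwise the distances come out wrong.

```r
library(sf)

# Hypothetical straight "coastline" along the x axis, in OSGB 1936 metres (EPSG:27700)
coast <- st_sfc(st_linestring(rbind(c(0, 0), c(100000, 0))), crs = 27700)

# Two made-up points: one 2.6 km from the line, one 82 km from it
pts <- st_as_sf(
  data.frame(id = c("near", "far"), x = c(50000, 50000), y = c(2600, 82000)),
  coords = c("x", "y"), crs = 27700
)

# One list element per point, holding the indices of coast features within 5 km
hits <- st_is_within_distance(pts, coast, dist = 5000)
pts$coastal <- lengths(hits) > 0

# Distance from each point to its nearest coast feature, in metres
nearest <- st_nearest_feature(pts, coast)
pts$dist_m <- as.numeric(st_distance(pts, coast[nearest], by_element = TRUE))
```

Passing the points as the first argument gives one result per point, which is convenient for flagging rows in the postcode table.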

Trying to convert Numerical Values of Lat/Long into Spatial Data

I am working with a dataset that features chemical analyses from different locations within a cave, with each analysis ordered by a site number and that site's latitude and longitude. This first image is what I had done originally simply using ggplot.
Concentrations mapped by color over map
But what I want to do is use the shapefile of the cave system from which the data is sourced and do something similar: plot the points over the system and then color them by concentration. Below is the shapefile that I uploaded.
Cave system shapefile
So basically I want to be able to map the chemical data from my dataset used to map the first figure, but on the map of the shapefile. Initially it kept on saying that it could not plot on top of it. So I figured I had to convert the latitude and longitude into spatial coordinates that could then be mapped on the shapefile.
Master_Cave_data <- Master_cave_data %>%
st_as_sf(MastMaster_cave_data, agr = "identity", coord = Lat_DD)
This is what I thought I could use to convert the numerical latitude coordinates into spatial data.
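A minimal sketch of how such a conversion is usually written with sf, for comparison. The sample values and the Long_DD and Concentration column names are assumptions (only Lat_DD appears above), and EPSG:32616 is just a placeholder for whatever CRS the cave shapefile actually uses.

```r
library(sf)

# Hypothetical sample of the cave data; Long_DD and Concentration are assumed column names
cave_df <- data.frame(
  Site = 1:3,
  Lat_DD = c(37.1801, 37.1805, 37.1810),
  Long_DD = c(-86.1002, -86.1007, -86.1011),
  Concentration = c(0.8, 1.4, 2.1)
)

# coords takes the x column (longitude) first, then the y column (latitude);
# crs = 4326 declares the values as WGS84 decimal degrees
cave_sf <- st_as_sf(cave_df, coords = c("Long_DD", "Lat_DD"), crs = 4326)

# Reproject to the CRS of the cave layer before overlaying (placeholder CRS here)
cave_proj <- st_transform(cave_sf, 32616)
```

With the real data, the target CRS would more likely be taken from the shapefile itself via st_crs(), and both layers could then be drawn with ggplot2's geom_sf(), mapping color to the concentration column.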

County average from latitude and longitude

I have a big data frame (832k rows) with latitude and longitude in a gridded format plus one variable. I would like to plot the average of this variable per county. The problem is that I do not have the identification of county or state by point, only the coordinates.
Sorry, I am not sure how to include a replicable example
Two approaches:
1) Calculate the average of all the lat/lon grid points. This approach skews the county centre towards areas with a higher density of grid points.
2) Calculate the bounds (min and max lat/lon) of the grid points and average the bounds. This approach places the county centre exactly at the centre of the grid span.
You will need to obtain the county (or state) data and then spatially join it with your dataframe. One possible source for such data is the TIGER shapefile published by the U.S. Census (see e.g. https://catalog.data.gov/dataset/tiger-line-shapefile-2016-nation-u-s-current-county-and-equivalent-national-shapefile).
You can then use the sf package to read the shapefile into R, join it with your data, and then use regular summary functions to summarize your data by county.
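library(sf)
filename <- 'https://www2.census.gov/geo/tiger/TIGER2016/COUNTY/tl_2016_us_county.zip'
tmpfile <- tempfile()
tmpdir <- tempdir()
# Download and unpack the county shapefile into the session's temporary directory
download.file(filename, tmpfile)
unzip(zipfile = tmpfile, exdir = tmpdir)
county_data <- st_read(file.path(tmpdir, 'tl_2016_us_county.shp'))
# Remove the downloaded archive; tempdir() itself is cleaned up when the R session ends
unlink(tmpfile)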
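The join and summary steps could then look roughly like the sketch below. It uses two made-up square counties and three grid points instead of the TIGER file, and the county, lon, lat, and value column names are assumptions. With the real county layer, the points would also need to be transformed to its CRS with st_transform() before joining.

```r
library(sf)

# Helper that builds a closed rectangular polygon (hypothetical stand-in for a county)
square <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(c(xmin, ymin), c(xmax, ymin), c(xmax, ymax),
                        c(xmin, ymax), c(xmin, ymin))))
}

# Two made-up counties in lon/lat
counties <- st_sf(
  county = c("A", "B"),
  geometry = st_sfc(square(0, 0, 1, 1), square(1, 0, 2, 1), crs = 4326)
)

# Gridded data: coordinates plus one variable
grid_df <- data.frame(
  lon = c(0.25, 0.75, 1.5),
  lat = c(0.5, 0.5, 0.5),
  value = c(1, 3, 10)
)
grid_sf <- st_as_sf(grid_df, coords = c("lon", "lat"), crs = 4326)

# Tag each grid point with the county that contains it
joined <- st_join(grid_sf, counties, join = st_within)

# Average the variable per county
avg <- aggregate(value ~ county, data = st_drop_geometry(joined), FUN = mean)
```

The resulting table can be merged back onto the county polygons by the county identifier to plot the averages as a map.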

Calculate minimum distance between multiple polygons with R

I'm still somewhat new to R and the sf package...
I have two sets of multipolygon data that I am trying to analyze. My first set of polygons (fires) contains hundreds of wildfire perimeters. The second set (towns) contains hundreds of urban areas boundaries.
For each fire, I would like to calculate the distance to the closest town (fire polygon edge to closest town polygon edge), and add that as a field to each fire.
So far I have mostly been using the sf package for spatial data. In my searches, I can only find minimum distance methods for polygons to points, points to points, lines to points, etc. but cannot seem to find polygon to polygon examples. Any help to send me in the right direction would be much appreciated! Thank you.
#TimSalabim Thank you for sending me in the right direction. I was able to accomplish what I was after. Maybe not the most elegant solution, but it worked.
library(sf)
library(dplyr) # for slice() and the %>% pipe
# create an index of the nearest feature
index <- st_nearest_feature(x = poly1, y = poly2)
# slice based on the index
poly2 <- poly2 %>% slice(index)
# calculate distance between polygons
poly_dist <- st_distance(x = poly1, y= poly2, by_element = TRUE)
# add the distance calculations to the fire polygons
poly1$distance <- poly_dist

Assigning covariate associated to spatial points to a bigger set of spatial points in R?

I have two data sets of spatial points (in .csv format): data1, with 220 spatial points with latitude and longitude, and data2, with 80 spatial points with latitude and longitude. For data2 I have one covariate indicating the genetic origin of each point. The spatial points in the two datasets are not exactly the same.
I would like to assign a genetic origin to the spatial points in data1. It seems that I need to define a square (or other shape) around each point in data2 to be able to associate a genetic origin with each point in data1.
I am using R and I think packages such as raster or sp may be useful.
Thanks for your help.
Best,
Marie.
You need to make up your mind about how you want to assign "genetic origin". One approach that you seem to be hinting at is assigning each point the origin of its nearest neighbor.
When asking a question you should always include some example data.
library(raster)
d1 <- data.frame(lon=c(1,5,55,31), lat=c(3,7,20,22))
d2 <- data.frame(lon=c(4,2,8,65,5,4), lat=c(50,-90,20,32,10,10), origin=LETTERS[1:6], stringsAsFactors=FALSE)
Here is how you can assign origin based on the nearest known origin
# make sure your data are (x,y) or (longitude,latitude), not the reverse
pd <- pointDistance(d1, d2[,1:2], lonlat=TRUE)
nd <- apply(pd, 1, which.min)
d1$origin <- d2$origin[nd]

Resources