Apply point-in-polygon to multiple polygons with identifier - r

I have a data frame with thousands of lat/long points along with other attributes:
> head(df)
created_on latitude longitude day Value Order Total.Value
1 55:40.8 13.01504 80.19199 7/9/2022 2 74 19
2 08:27.0 12.97431 80.19029 7/9/2022 19 49 14
3 39:59.5 12.95778 80.19588 7/9/2022 26 205 50
4 20:43.0 13.07842 80.18144 7/8/2022 1 178 7
5 34:06.2 12.92485 80.09914 7/10/2022 7 106 12
6 08:12.1 12.88727 80.23384 7/10/2022 10 167 17
I have a few polygons in WKT format
head(Poly)
WKT name
1 POLYGON ((80.156 13.058, 80.156 13.040, 80.182 13.052, 80.156 13.058)) Polygon-1
2 POLYGON ((80.2015739 13.050, 80.223 13.033, 80.223 13.049, 80.201 13.050)) Polygon-2
3 POLYGON ((80.185 13.021, 80.200 13.009, 80.202 13.020, 80.185 13.021)) Polygon-3
I want to apply point-in-polygon to the df and identify the lat/longs that lie in any of these polygons. The output should contain the original attributes of each point as well as the name of the polygon in which it lies.
I have been able to do it for one polygon at a time, but that becomes tedious when I have 10-20 polygons to run through. I used the rgeos package to translate the WKT into a data frame and the sp package to apply point-in-polygon. Below is the code I have used so far:
Poly <- read.csv("PIP0.csv")
str <- Poly[2,1]
test <- readWKT(str)
#convert wkt to data.frame of coordinates
coords <- as.data.frame(coordinates(test@polygons[[1]]@Polygons[[1]]))
Points <- read.csv("PIP2.csv")
#logical vector to determine whether points in the df-Point are present in Poly
A <- as.logical(point.in.polygon(Points$latitude, Points$longitude, coords$y, coords$x, mode.checked = FALSE ))
Points$flag[A] <- "This point is present in Polygon"
Thanks,
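For what it's worth, a hedged sketch of one way to handle all the polygons at once with sf instead of rgeos/sp (assuming the df and Poly objects shown above, and that the coordinates are WGS84 lon/lat): st_join tags every point with the name of the polygon it falls inside, or NA if it falls in none.
library(sf)
# points: keep all original attributes, build geometry from longitude/latitude
pts <- st_as_sf(df, coords = c("longitude", "latitude"), crs = 4326, remove = FALSE)
# polygons: parse the WKT column, keep the name column
polys <- st_as_sf(Poly, wkt = "WKT", crs = 4326)
# spatial join: adds a "name" column giving the polygon each point lies in (NA otherwise)
result <- st_join(pts, polys["name"], join = st_within)
# keep only the points that fall inside one of the polygons
result_inside <- result[!is.na(result$name), ]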

Related

Calculating measure of spatial segregation?

There are five polygons for five different cities (see the attached file in the link, called bound.shp). I also have a point file "points.csv" with longitude and latitude, where for each point I know the proportion of people belonging to group m and group h.
I am trying to calculate the spatial segregation measures proposed by Reardon and O'Sullivan, "Measures of Spatial Segregation".
There is a package called "seg" which should allow me to do this, but so far I have had no success.
Here is the link to the example file: LINK. After downloading and unzipping the "example" folder, this is what I do:
setwd("~/example")
library(seg)
library(sf)
bound <- st_read("bound.shp")
points <- st_read("points.csv", options=c("X_POSSIBLE_NAMES=x","Y_POSSIBLE_NAMES=y"))
#I apply the following formula
seg::spseg(bound, points[ ,c(group_m, group_h)] , smoothing = "kernel", sigma = bandwidth)
Error: 'x' must be a numeric matrix with two columns
Can someone help me solve this issue? Or is there an alternate method which I can use?
Thanks a lot.
I don't know exactly what the spseg function does, but looking at the spseg documentation in the seg package:
The first argument, x, should be a data.frame or an object of class Spatial.
The second argument, data, should be a matrix or a data.frame.
Working through the examples for spseg, note that data should have the same number of rows as there are ids in the Spatial object. In your sample, the ids are the cities, each with its own polygon.
First, let's examine the bound data:
setwd("~/example")
library(seg)
library(sf)
#For the fortify function
library(ggplot2)
bound <- st_read("bound.shp")
bound <- as_Spatial(bound)
class(bound)
"SpatialPolygonsDataFrame"
attr(,"package")
"sp"
tail(fortify(bound))
Regions defined for each Polygons
long lat order hole piece id group
5379 83.99410 27.17326 972 FALSE 1 5 5.1
5380 83.99583 27.17339 973 FALSE 1 5 5.1
5381 83.99705 27.17430 974 FALSE 1 5 5.1
5382 83.99792 27.17552 975 FALSE 1 5 5.1
5383 83.99810 27.17690 976 FALSE 1 5 5.1
5384 83.99812 27.17700 977 FALSE 1 5 5.1
So you have 5 ids in your SpatialPolygonsDataFrame. Now, let's read points.csv with the read.csv function, since the data must be supplied as a matrix (or data.frame) to the spseg function.
points <- read.csv("c://Users/cemozen/Downloads/example/points.csv")
tail(points)
group_m group_h x y
950 4.95 78.49000 84.32887 26.81203
951 5.30 86.22167 84.27448 26.76932
952 8.68 77.85333 84.33353 26.80942
953 7.75 82.34000 84.35270 26.82850
954 7.75 82.34000 84.35270 26.82850
955 7.75 82.34000 84.35270 26.82850
The documentation and its example state that the number of rows of the data (which has two attributes here, group_m and group_h) should equal the number of ids (the cities). So you may need to aggregate, for instance take the mean of each group (or any other statistic) per city, so that you end up with one value per polygon.
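As a rough sketch of that aggregation idea (my addition, untested on your data: it assumes each point has already been assigned a city id, for example via the spatial join shown in the next answer), something like this would give one row per polygon:
# mean of each group per city -> one value per polygon
pts_by_city <- aggregate(points[, c("group_m", "group_h")],
                         by = list(city = points$city),
                         FUN = mean)
# feed spseg one row per polygon, in the same order as the polygons in `bound`
spseg(bound, as.matrix(pts_by_city[, c("group_m", "group_h")]))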
On the other hand, I would just like to show that the function works properly once it is fed a matrix with 5 rows and 2 groups:
sample_spseg <- spseg(bound, as.matrix(points[1:5,c("group_m", "group_h")]))
print(sample_spseg)
Reardon and O'Sullivan's spatial segregation measures
Dissimilarity (D) : 0.0209283
Relative diversity (R): -0.008781
Information theory (H): -0.0066197
Exposure/Isolation (P):
group_m group_h
group_m 0.07577679 0.9242232
group_h 0.07516285 0.9248372
--
The exposure/isolation matrix should be read horizontally.
Read 'help(spseg)' for more details.
First: I do not have experience with the seg package and its functions.
What I read from your question is that you want to run spseg on the points within each area.
If so, here is a possible approach:
library(sf)
library(tidyverse)
library(seg)
library(mapview) # for quick viewing only
# read polygons, make valid to avoid problems later on
areas <- st_read("./temp/example/bound.shp") %>%
  sf::st_make_valid()
# read points and convert to sf object
points <- read.csv("./temp/example/points.csv") %>%
  sf::st_as_sf(coords = c("x", "y"), crs = 4326) %>%
  # spatial join to add the city attribute (st_join uses st_intersects by default)
  sf::st_join(areas)
# what do we have so far??
mapview::mapview(points, zcol = "city")
# get the coordinates back into a data.frame
mydata <- cbind(points, st_coordinates(points))
# drop the geometry, we do not need it anymore
st_geometry(mydata) <- NULL
# looks like...
head(mydata)
# group_m group_h city X Y
# 1 8.02 84.51 2 84.02780 27.31180
# 2 8.02 84.51 2 84.02780 27.31180
# 3 8.02 84.51 2 84.02780 27.31180
# 4 5.01 84.96 2 84.04308 27.27651
# 5 5.01 84.96 2 84.04622 27.27152
# 6 5.01 84.96 2 84.04622 27.27152
# Split to a list by city
L <- split(mydata, mydata$city)
# loop over the list and perform the spseg function for each city
final <- lapply(L, function(i) spseg(x = i[, 4:5], data = i[, 1:2]))
# test for the first city
final[[1]]
# Reardon and O'Sullivan's spatial segregation measures
#
# Dissimilarity (D) : 0.0063
# Relative diversity (R): -0.0088
# Information theory (H): -0.0067
# Exposure/Isolation (P):
# group_m group_h
# group_m 0.1160976 0.8839024
# group_h 0.1157357 0.8842643
# --
# The exposure/isolation matrix should be read horizontally.
# Read 'help(spseg)' for more details.
spplot(final[[1]], main = "Equal")

Is there an R function to generate a column of U.S county names based on latitude and longitude? [duplicate]

Is there a fast way to convert latitude and longitude coordinates to State codes in R? I've been using the zipcode package as a look up table but it's too slow when I'm querying lots of lat/long values
If not in R is there any way to do this using google geocoder or any other type of fast querying service?
Thanks!
Here are two options, one using sf and one using sp package functions. sf is the more modern (and, here in 2020, recommended) package for analyzing spatial data, but in case it's still useful, I am leaving my original 2012 answer showing how to do this with sp-related functions.
Method 1 (using sf):
library(sf)
library(spData)
## pointsDF: A data.frame whose first column contains longitudes and
## whose second column contains latitudes.
##
## states: An sf MULTIPOLYGON object with 50 states plus DC.
##
## name_col: Name of a column in `states` that supplies the states'
## names.
lonlat_to_state <- function(pointsDF,
                            states = spData::us_states,
                            name_col = "NAME") {
    ## Convert points data.frame to an sf POINTS object
    pts <- st_as_sf(pointsDF, coords = 1:2, crs = 4326)
    ## Transform spatial data to some planar coordinate system
    ## (e.g. Web Mercator) as required for geometric operations
    states <- st_transform(states, crs = 3857)
    pts <- st_transform(pts, crs = 3857)
    ## Find names of state (if any) intersected by each point
    state_names <- states[[name_col]]
    ii <- as.integer(st_intersects(pts, states))
    state_names[ii]
}
## Test the function with points in Wisconsin, Oregon, and France
testPoints <- data.frame(x = c(-90, -120, 0), y = c(44, 44, 44))
lonlat_to_state(testPoints)
## [1] "Wisconsin" "Oregon" NA
If you need higher resolution state boundaries, read in your own vector data as an sf object using sf::st_read() or by some other means. One nice option is to install the rnaturalearth package and use it to load a state vector layer from rnaturalearthhires. Then use the lonlat_to_state() function we just defined as shown here:
library(rnaturalearth)
us_states_ne <- ne_states(country = "United States of America",
returnclass = "sf")
lonlat_to_state(testPoints, states = us_states_ne, name_col = "name")
## [1] "Wisconsin" "Oregon" NA
For very accurate results, you can download a geopackage containing GADM-maintained administrative borders for the United States from this page. Then, load the state boundary data and use them like this:
USA_gadm <- st_read(dsn = "gadm36_USA.gpkg", layer = "gadm36_USA_1")
lonlat_to_state(testPoints, states = USA_gadm, name_col = "NAME_1")
## [1] "Wisconsin" "Oregon" NA
Method 2 (using sp):
Here is a function that takes a data.frame of lat-longs within the lower 48 states, and for each point, returns the state in which it is located.
Most of the function simply prepares the SpatialPoints and SpatialPolygons objects needed by the over() function in the sp package, which does the real heavy lifting of calculating the 'intersection' of points and polygons:
library(sp)
library(maps)
library(maptools)
# The single argument to this function, pointsDF, is a data.frame in which:
# - column 1 contains the longitude in degrees (negative in the US)
# - column 2 contains the latitude in degrees
lonlat_to_state_sp <- function(pointsDF) {
    # Prepare SpatialPolygons object with one SpatialPolygon
    # per state (plus DC, minus HI & AK)
    states <- map('state', fill = TRUE, col = "transparent", plot = FALSE)
    IDs <- sapply(strsplit(states$names, ":"), function(x) x[1])
    states_sp <- map2SpatialPolygons(states, IDs = IDs,
                                     proj4string = CRS("+proj=longlat +datum=WGS84"))
    # Convert pointsDF to a SpatialPoints object
    pointsSP <- SpatialPoints(pointsDF,
                              proj4string = CRS("+proj=longlat +datum=WGS84"))
    # Use 'over' to get _indices_ of the Polygons object containing each point
    indices <- over(pointsSP, states_sp)
    # Return the state names of the Polygons object containing each point
    stateNames <- sapply(states_sp@polygons, function(x) x@ID)
    stateNames[indices]
}
# Test the function using points in Wisconsin and Oregon.
testPoints <- data.frame(x = c(-90, -120), y = c(44, 44))
lonlat_to_state_sp(testPoints)
[1] "wisconsin" "oregon" # IT WORKS
You can do it in a few lines of R.
library(sp)
library(rgdal)
#lat and long
Lat <- 57.25
Lon <- -9.41
#make a data frame
coords <- as.data.frame(cbind(Lon,Lat))
#and into Spatial
points <- SpatialPoints(coords)
#SpatialPolygonsDataFrame - I'm using a shapefile of UK counties
counties <- readOGR(".", "uk_counties")
#assume same proj as shapefile!
proj4string(points) <- proj4string(counties)
#get county polygon point is in
result <- as.character(over(points, counties)$County_Name)
See ?over in the sp package.
You'll need to have the state boundaries as a SpatialPolygonsDataFrame.
Example data (polygons and points)
library(raster)
p <- shapefile(system.file("external/lux.shp", package="raster"))
xy <- coordinates(p)
Use raster::extract
extract(p, xy)
# point.ID poly.ID ID_1 NAME_1 ID_2 NAME_2 AREA
#1 1 1 1 Diekirch 1 Clervaux 312
#2 2 2 1 Diekirch 2 Diekirch 218
#3 3 3 1 Diekirch 3 Redange 259
#4 4 4 1 Diekirch 4 Vianden 76
#5 5 5 1 Diekirch 5 Wiltz 263
#6 6 6 2 Grevenmacher 6 Echternach 188
#7 7 7 2 Grevenmacher 7 Remich 129
#8 8 8 2 Grevenmacher 12 Grevenmacher 210
#9 9 9 3 Luxembourg 8 Capellen 185
#10 10 10 3 Luxembourg 9 Esch-sur-Alzette 251
#11 11 11 3 Luxembourg 10 Luxembourg 237
#12 12 12 3 Luxembourg 11 Mersch 233
It's very straightforward using sf:
library(maps)
library(sf)
## Get the states map, turn into sf object
US <- st_as_sf(map("state", plot = FALSE, fill = TRUE))
## Test the function using points in Wisconsin and Oregon
testPoints <- data.frame(x = c(-90, -120), y = c(44, 44))
# Make it a spatial dataframe, using the same coordinate system as the US spatial dataframe
testPoints <- st_as_sf(testPoints, coords = c("x", "y"), crs = st_crs(US))
#.. and perform a spatial join!
st_join(testPoints, US)
ID geometry
1 wisconsin POINT (-90 44)
2 oregon POINT (-120 44)
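If you only want the state names as a plain character vector (assuming the join above), you should be able to pull the joined column directly:
st_join(testPoints, US)$ID
## expected: [1] "wisconsin" "oregon"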

County area calculated from NLCD (Landcover data) rasters is too large

I'm trying to calculate landcover repartition for each US county.
I have obtained NLCD for the Apache county using the FedData package (devtools version) and I'm using county shapefiles from the Census bureau.
The problem is that I get an area that is much larger than the official one and the one indicated in my shapefile, namely about 51,000 km^2 instead of the official 29,000 km^2. There must be something I don't understand about the raster object, but I'm very confused after hours of web searching; any help appreciated.
The following describes the code used and the method used to calculate. The county data can be downloaded here:
https://www2.census.gov/geo/tiger/TIGER2016/COUNTY/
The following code assumes the county shapefile is saved and unzipped.
Get and read the data
#devtools::install_github("ropensci/FedData")
library(FedData)
library(rgdal)
library(dplyr)
#Get Apache polygon
counties <- readOGR('tl_2016_us_county/tl_2016_us_county.shp')
apache <- subset(counties, counties$GEOID == "04001")
# Get NLCD data
nlcd_data <- get_nlcd(apache,
                      year = 2011,
                      label = "Apache",
                      force.redo = TRUE)
nlcd_data #inspect the object, can see that number of cells is around 57 million
I have then extracted the values of the raster and put them into a frequency table, and from there I calculate the resulting area. Since the NLCD data is at 30 m resolution, I multiply the number of cells in each category by 900 and divide the result by 1 million to obtain the area in square kilometers.
The calculated total area is too large.
# Calculating the landcover repartition in County
landcover<-data.frame(x2011 = values(nlcd_data)) #Number of rows corresponds to number of cells
landcover_freq<-table(landcover)
df_landcover <- as.data.frame(landcover_freq)
res <- df_landcover %>%
  mutate(area_type_sqm = Freq * 900,
         area_type_km = area_type_sqm / 1e6,
         area_sqkm = sum(area_type_km)) %>%
  group_by(landcover) %>%
  mutate(pc_land = round(100 * area_type_km / area_sqkm, 1))
head(arrange(res,desc(pc_land)))
# A tibble: 6 x 6
# Groups: landcover [6]
landcover Freq area_type_sqm area_type_km area_sqkm pc_land
<fct> <int> <dbl> <dbl> <dbl> <dbl>
1 52 33455938 30110344200 30110. 51107. 58.9
2 42 16073820 14466438000 14466. 51107. 28.3
3 71 5999412 5399470800 5399. 51107. 10.6
4 31 488652 439786800 440. 51107. 0.9
5 21 362722 326449800 326. 51107. 0.6
6 22 95545 85990500 86.0 51107. 0.2
## Total area calculated from raster is 51,107 square km
apache_area <- as.data.frame(apache) %>%
  mutate(AREA = (as.numeric(ALAND) + as.numeric(AWATER)) / 1e6) %>%
  select(AREA)
apache_area$AREA
29055.47 #Official area of apache county (in square km)
Visual inspection of the shapefile and the raster (plotted below): the overlay looks fine, so the mismatch doesn't seem large enough to explain the difference in area.
apache <- spTransform(apache,proj4string(nlcd_data))
plot(nlcd_data)
plot(apache,add=TRUE)
The reason is that you get the data returned in the Mercator projection.
crs(nlcd_data)
CRS arguments:
+proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=#null +wktext
+no_defs
This coordinate reference system preserves shape and that is why it is used for web-mapping. It should not be used for most other purposes, as it is also notorious for distortion of area.
The take-home message is to never just trust the nominal resolution of a projected raster and assume that it is correct and/or constant. The reliable way to compute area is to use longitude/latitude coordinates, because these are, by definition, not distorted.
The reported spatial resolution is
res(nlcd_data)
[1] 30 30
So it is not surprising that you expected the cells to have an area of 30 x 30 = 900 m2. But the cell sizes are actually between 573 and 625 m2 for Apache county, as illustrated below.
First get the data
library(FedData)
counties <- raster::getData("GADM", country="USA", level=2)
apache <- subset(counties,counties$NAME_2=="Apache")
nlcd <- get_nlcd(apache, year = 2011, label = "nlcd_apache", force.redo = TRUE)
# move to terra
library(terra)
r <- rast(nlcd)
ap <- vect(apache)
# county boundaries to Mercator
apm <- project(ap, crs(r))
To compute the area of the grid cells I represent them as polygons. I first aggregate to get much larger cells to avoid getting too many small polygons (it would take very long, and perhaps crash R). I then transform the Mercator polygons to longitude/latitude, compute their true area, and transform back (just for consistent display purposes).
f <- 300
a <- aggregate(rast(r), f)
p <- as.polygons(a)
# compute area
g <- project(p, "+proj=longlat")
g$area <- round(expanse(g) / (f * f), 5)
# project back and plot
merc <- project(g, crs(r))
plot(merc, "area", border=NA)
lines(apm)
The map shows the approximate variation in the size of the original "900 m2" cells (between 573 and 625). This is not the case when I use the original data, as illustrated below.
library(terra)
# "filename" is the local file that has the nlcd data
usnlcd <- rast(filename)
crs(usnlcd, proj4=TRUE)
#[1] "+proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
res(usnlcd)
#[1] 30 30
Note that +proj=aea stands for the Albers Equal Area projection!
ap <- vect(apache)
apm <- project(ap, crs(usnlcd))
x <- crop(usnlcd, apm)
par(mfrow=c(1,2))
plot(x)
lines(apm)
# gg2 computed as above
plot(gg2, "area", border=NA)
As you can see, the cell area is indeed 900 m2, with only very little distortion, so small that it can be ignored.
You could transform the Mercator data back to the original +proj=aea, but then you would have degraded the quality of the data twice. So that is a really bad idea. You could also account for the true cell area of each cell, but that is rather convoluted.
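For what it's worth, here is a rough sketch (my addition, not part of the original answer) of that per-cell correction with terra, assuming r is the Mercator NLCD raster and apm the projected county boundary from above; cellSize computes the true area of each cell regardless of the CRS, and zonal then sums those areas per land cover class:
rmask <- mask(r, apm)                          # limit the raster to the county
csize <- cellSize(rmask, unit = "km")          # true area of every cell, in km^2
area_by_class <- zonal(csize, rmask, "sum")    # sum of cell areas per land cover class
area_by_class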
Finally, to get the area covered by each land cover class
m <- mask(x, apm)
f <- freq(m)
f <- data.frame(f)
f$area <- round(f$count * 900 / 1e6, 1)
# the next step is a bit tricky and will be done by `freq` in future versions
levs <- levels(m)[[1]]
f$class <- levs[f$value + 1]
Voila:
f[, c("class", "area")]
# class area
#1 Open Water 21.5
#2 Developed, Open Space 175.3
#3 Developed, Low Intensity 46.4
#4 Developed, Medium Intensity 9.7
#5 Developed, High Intensity 1.0
#6 Barren Land 232.5
#7 Deciduous Forest 44.6
#8 Evergreen Forest 7604.6
#9 Mixed Forest 2.0
#10 Shrub/Scrub 18262.8
#11 Herbaceuous 2514.4
#12 Hay/Pasture 1.9
#13 Cultivated Crops 0.0
#14 Woody Wetlands 38.8
#15 Emergent Herbaceuous Wetlands 53.6
And the total is as expected
sum(f$area)
#[1] 29009.1
PS. This problem has now been solved at the source --- but I hope this answer will remain useful for others using spatial data with a Mercator CRS.
Y'all, thanks so much for catching this, and thanks Robert for alerting me on Github! I hadn't realized that WCS was serving up transformed data. Luckily, the MRLC has made the data available in its native format as well, so I've gone ahead and pushed an update to FedData that provides those data (in CONUS Albers).
FWIW (no pictures, because this is my first SO post):
#devtools::install_github("ropensci/FedData")
library(FedData)
library(magrittr)
library(dplyr)
library(tigris)
library(raster)
#Get Apache polygon
apache <-
  tigris::counties(state = "AZ") %>%
  dplyr::filter(NAME == "Apache")
# Get NLCD data
nlcd_data <-
  FedData::get_nlcd(
    template = apache,
    year = 2011,
    label = "Apache",
    force.redo = TRUE
  )
# Transform Apache polygon to NLCD CRS
apache %<>%
  sf::st_transform(raster::crs(nlcd_data))
# Plot NLCD raster and transformed Apache polygon
raster::plot(nlcd_data)
apache %>%
  sf::st_geometry() %>%
  plot(border = "white", lwd = 2, add = TRUE)
https://i.imgur.com/5b09t7P.png
# Mask NLCD data by Apache County (and plot result)
nlcd_data %<>%
raster::mask(apache)
plot(nlcd_data)
https://i.imgur.com/UinQ0v3.png
# Table of areas in km^2
nlcd_data_table <-
  nlcd_data %>%
  values() %>%
  tibble::tibble(landcover = .) %>%
  na.omit() %>%
  dplyr::group_by(landcover) %>%
  count(name = "freq") %>%
  dplyr::mutate(area = (freq * 900) %>%
                  units::set_units("m^2") %>%
                  units::set_units("km^2"))
nlcd_data_table
#> # A tibble: 15 x 3
#> # Groups: landcover [15]
#> landcover freq area
#> <int> <int> [km^2]
#> 1 11 24134 21.7206
#> 2 21 194720 175.2480
#> 3 22 51321 46.1889
#> 4 23 10485 9.4365
#> 5 24 1069 0.9621
#> 6 31 259987 233.9883
#> 7 41 48906 44.0154
#> 8 42 8456196 7610.5764
#> 9 43 2179 1.9611
#> 10 52 19941185 17947.0665
#> 11 71 3190650 2871.5850
#> 12 81 2169 1.9521
#> 13 82 4 0.0036
#> 14 90 42628 38.3652
#> 15 95 59014 53.1126
# Total area NLCD raster:
sum(nlcd_data_table$area)
#> 29056.18 [km^2]
# Area of Apache county
apache %>%
sf::st_area() %>%
units::set_units("km^2")
#> 29056.19 [km^2]
Created on 2021-05-26 by the reprex package (v2.0.0)

How to extract the values of a cell in a raster (in this case landcover type) with data points (coordinates) in a data frame

I have a data frame (called "events") of camera trap data with coordinates, and I want to extract the habitat type for each location from a raster file and add it to my data frame. How do I extract this using the raster and the data frame coordinates, and how do I then add the result to the main data frame?
## Creating the raster file from a shapefile
myfile <- shapefile("dpky.lc5.shp")
myfile@data$VALUE <- as.numeric(myfile@data$VALUE) # VALUE gives the numeric code for habitat type.
sr <- "+init=EPSG:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
r <- raster(myfile, res=100, crs=sr)
myraster<-rasterize(myfile,r,field="VALUE")
myras_spdf <- as(myraster, "SpatialPixelsDataFrame")
myras_df <- as.data.frame(myras_spdf)
## Data frame with coordinates
events <- read.csv("DPKY.Clean.csv",h=T,sep=";")
events.sp<-SpatialPoints(events[,c("Longitude","Latitude")],proj4string = CRS("+init=EPSG:4326"))
events.sp
I have not been able to find code specific to this problem yet. I did manage with another .gri file, but that code does not work here.
It appears that you have points and polygons, and want to query the polygon values with the points. In other words, extract the values for the points from the polygons. If that is the case, it makes no sense to create a RasterLayer (and/or a SpatialPixels) object.
Always provide some example data (p has the polygons, d is a data.frame with coordinates)
library(raster)
p <- shapefile(system.file("external/lux.shp", package="raster"))
set.seed(10)
d <- coordinates(spsample(p, 4, "regular"))
colnames(d) <- c("lon", "lat")
d <- data.frame(id=1:nrow(d), d)
Solution
x <- extract(p, d[,c("lon", "lat")])
Now you can do
cbind(d, x[,c(4,6)])
# id lon lat NAME_1 NAME_2
#1 1 5.889636 49.53576 Luxembourg Esch-sur-Alzette
#2 2 6.176340 49.53576 Luxembourg Luxembourg
#3 3 5.889636 49.82246 Diekirch Redange
#4 4 6.176340 49.82246 Diekirch Diekirch
Or something like this
d$NAME_2 <- x$NAME_2
d
# id lon lat NAME_2
#1 1 5.889636 49.53576 Esch-sur-Alzette
#2 2 6.176340 49.53576 Luxembourg
#3 3 5.889636 49.82246 Redange
#4 4 6.176340 49.82246 Diekirch
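And if you do end up needing to query an actual raster at the camera-trap locations (as in the original question), a minimal sketch, assuming the myraster and events objects created there and that the raster and the points share the same CRS:
# columns must be in (longitude, latitude) order
events$habitat <- raster::extract(myraster, events[, c("Longitude", "Latitude")])
head(events)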
