R: applying custom function row by row with mutate()

I have created a function that uses st_join() from the sf package to extract the congressional district (a polygon) from a set of latitude and longitude coordinates, using a different shapefile to identify the congressional district depending on a "congress" argument that is specified. (This is necessary because districts are periodically redrawn, so the boundaries change over time.) The next step is to apply the function row by row to a data frame containing multiple rows of coordinates (and associated "congress" values) so that the congress value for a given row determines which shapefile to use, and then assign the extracted district to a new variable.
I'm running into trouble applying this function row by row. I first tried using the rowwise() and mutate() functions from dplyr, but got a "must be size 1" error. Based on the comments to this question, I put list() around the variable assigned inside the mutate() function, but this has resulted in the new variable being a list instead of a single character string.
I would greatly appreciate help figuring out a way to either (i) modify the function so that it can be applied row by row using rowwise() and mutate() or (ii) apply my function row-by-row in some other way.
Reproducible code is below; you just need to download two shapefiles from https://cdmaps.polisci.ucla.edu/ ("districts104.zip" and "districts111.zip"), unzip them, and put them in your working directory.
library(tidyverse)
library(sf)
districts_104 <- st_read("districts104.shp")
districts_111 <- st_read("districts111.shp")
congress <- c(104, 111)
latitude <- c(37.32935, 37.32935)
longitude <- c(-122.00954, -122.00954)
df_test <- data.frame(congress, latitude, longitude)
point_geo_test <- st_as_sf(df_test,
coords = c(x = "longitude", y = "latitude"),
crs = st_crs(districts_104)) # prep for st_join()
sf_use_s2(FALSE) # preempt evaluation error that would otherwise pop up when using the st_join function
extract_district <- function(points, cong) {
shapefile <- get(paste0("districts_", cong))
st_join_results <- st_join(points, shapefile, join = st_within)
paste(st_join_results$STATENAME, st_join_results$DISTRICT, sep = "-")
}
point_geo_test <- point_geo_test %>%
rowwise %>%
mutate(district = list(extract_district(points = point_geo_test, cong = congress)))

Edit 7 July:
From your comments I understand that you were looking for something different: the assumption I made about why your function was returning multiple values was wrong. Hence this new answer from scratch.
The custom function you've written doesn't lend itself to row-by-row application, because it already processes all rows at once:
Given the following input:
congress <- c(104, 111, 104, 111, 104, 111)
latitude <- c(37.32935, 37.32935, 41.1134016, 41.1134016, 42.1554948, 42.1554948)
longitude <- c(-122.00954, -122.00954, 73.720356, 73.720356, -87.868850502543, -87.868850502543)
point_geo_test contains these values:
> point_geo_test
[...]
congress geometry
1 104 POINT (-122.0095 37.32935)
2 111 POINT (-122.0095 37.32935)
3 104 POINT (73.72036 41.1134)
4 111 POINT (73.72036 41.1134)
5 104 POINT (-87.86885 42.15549)
6 111 POINT (-87.86885 42.15549)
and extract_district() returns this:
> extract_district(point_geo_test, 104)
[...]
[1] "California-14" "California-14" "NA-NA" "NA-NA" "Illinois-10" "Illinois-10"
This is already one result per row. The only problem is that, while these are correct results for the coordinates of each row, they are the names for those coordinates only during congress 104. Hence, these values are only valid for the rows in point_geo_test where congress == 104.
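The same mechanism explains the "must be size 1" error from the question. Below is a minimal sketch with plain dplyr and no spatial data; whole_table() is a hypothetical stand-in for extract_district(), assumed only to return one value per row of whatever table it is handed.

```r
library(dplyr)

df <- data.frame(id = 1:3)

# Hypothetical stand-in for extract_district(): it ignores the current row
# and returns one value per row of the whole table it receives
whole_table <- function(d) paste0("row-", d$id)

# Under rowwise(), mutate() expects exactly one value per row,
# so a function that returns three values raises an error
attempt <- tryCatch(
  df %>% rowwise() %>% mutate(label = whole_table(df)),
  error = function(e) e
)
inherits(attempt, "error")

# Wrapping the call in list() avoids the error, but each row then
# stores all three values instead of a single string
wrapped <- df %>% rowwise() %>% mutate(label = list(whole_table(df)))
lengths(wrapped$label)
```

In other words, list() only hides the size mismatch; the function still has to be told which rows it is responsible for.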
Extracting correct values for all rows
We will create a function that returns the correct data for all rows, i.e. the correct name for the coordinates during the associated congress.
I've simplified your code slightly: the df_test is not an intermediate data frame any more, but defined directly in the creation of point_geo_test. Any values I extract, I'll save into this data frame as well.
library(tidyverse)
library(sf)
sf_use_s2(FALSE)
districts_104 <- st_read("districts104.shp")
districts_111 <- st_read("districts111.shp")
congress <- c(104, 111, 104, 111, 104, 111)
latitude <- c(37.32935, 37.32935, 41.1134016, 41.1134016, 42.1554948, 42.1554948)
longitude <- c(-122.00954, -122.00954, 73.720356, 73.720356, -87.868850502543, -87.868850502543)
point_geo_test <- st_as_sf(data.frame(congress, latitude, longitude),
coords = c(x = "longitude", y = "latitude"),
crs = st_crs(districts_104))
To keep the code more flexible and organized, I'll create a generic function that can fetch any parameter for the given coordinates:
extract_values <- function(points, parameter) {
  # initialize return values, one for each row in `points`
  values <- rep(NA, nrow(points))
  # for each congress present in `points`, look up parameter and store it in the rows with matching congress
  for (cong in unique(points$congress)) {
    shapefile <- get(paste0("districts_", cong))
    st_join_results <- st_join(points, shapefile, join = st_within)
    values[points$congress == cong] <- st_join_results[[parameter]][points$congress == cong]
  }
  return(values)
}
Examples:
> extract_values(point_geo_test, 'STATENAME')
[1] "California" "California" NA NA "Illinois" "Illinois"
> extract_values(point_geo_test, 'DISTRICT')
[1] "14" "15" NA NA "10" "10"
Storing values
point_geo_test$state <- extract_values(point_geo_test, 'STATENAME')
point_geo_test$district <- extract_values(point_geo_test, 'DISTRICT')
point_geo_test$name <- paste(point_geo_test$state, point_geo_test$district, sep = "-")
Result:
> point_geo_test
Simple feature collection with 6 features and 4 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -122.0095 ymin: 37.32935 xmax: 73.72036 ymax: 42.15549
Geodetic CRS: GRS 1980(IUGG, 1980)
congress state district name geometry
1 104 California 14 California-14 POINT (-122.0095 37.32935)
2 111 California 15 California-15 POINT (-122.0095 37.32935)
3 104 <NA> <NA> NA-NA POINT (73.72036 41.1134)
4 111 <NA> <NA> NA-NA POINT (73.72036 41.1134)
5 104 Illinois 10 Illinois-10 POINT (-87.86885 42.15549)
6 111 Illinois 10 Illinois-10 POINT (-87.86885 42.15549)
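A variation on the same idea, as a minimal sketch rather than a drop-in replacement: it joins only the rows belonging to each congress, and looks the layers up in a named list instead of using get(). The square() helper, the toy polygons, and the district_name column are all assumptions made for illustration; the real shapefiles would take the place of the list elements.

```r
library(sf)

# Hypothetical helper: a square polygon from its corner coordinates
square <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(c(xmin, ymin), c(xmax, ymin), c(xmax, ymax),
                        c(xmin, ymax), c(xmin, ymin))))
}

# Toy stand-ins for two district layers; the boundary between
# districts "1" and "2" moves from x = 5 to x = 7
districts <- list(
  "104" = st_sf(DISTRICT = c("1", "2"),
                geometry = st_sfc(square(0, 0, 5, 10), square(5, 0, 10, 10))),
  "111" = st_sf(DISTRICT = c("1", "2"),
                geometry = st_sfc(square(0, 0, 7, 10), square(7, 0, 10, 10)))
)

pts <- st_as_sf(
  data.frame(congress = c(104, 111, 104), x = c(6, 6, 2), y = c(5, 5, 5)),
  coords = c("x", "y")
)

# Join each congress's points only against that congress's polygons
pts$district_name <- NA_character_
for (cong in unique(pts$congress)) {
  rows <- pts$congress == cong
  joined <- st_join(pts[rows, ], districts[[as.character(cong)]],
                    join = st_within)
  pts$district_name[rows] <- joined$DISTRICT
}
pts$district_name
```

The point at x = 6 falls in a different district depending on its congress value, which is the behaviour the question asks for. This sketch assumes each point lies in at most one polygon per layer, so the join returns exactly one row per point.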

Related

Is there an R function to generate a column of U.S county names based on latitude and longitude? [duplicate]

Is there a fast way to convert latitude and longitude coordinates to State codes in R? I've been using the zipcode package as a look up table but it's too slow when I'm querying lots of lat/long values
If not in R is there any way to do this using google geocoder or any other type of fast querying service?
Thanks!

Here are two options, one using sf and one using sp package functions. sf is the more modern (and, here in 2020, recommended) package for analyzing spatial data, but in case it's still useful, I am leaving my original 2012 answer showing how to do this with sp-related functions.
Method 1 (using sf):
library(sf)
library(spData)
## pointsDF: A data.frame whose first column contains longitudes and
## whose second column contains latitudes.
##
## states: An sf MULTIPOLYGON object with 50 states plus DC.
##
## name_col: Name of a column in `states` that supplies the states'
## names.
lonlat_to_state <- function(pointsDF,
states = spData::us_states,
name_col = "NAME") {
## Convert points data.frame to an sf POINTS object
pts <- st_as_sf(pointsDF, coords = 1:2, crs = 4326)
## Transform spatial data to some planar coordinate system
## (e.g. Web Mercator) as required for geometric operations
states <- st_transform(states, crs = 3857)
pts <- st_transform(pts, crs = 3857)
## Find names of state (if any) intersected by each point
state_names <- states[[name_col]]
ii <- as.integer(st_intersects(pts, states))
state_names[ii]
}
## Test the function with points in Wisconsin, Oregon, and France
testPoints <- data.frame(x = c(-90, -120, 0), y = c(44, 44, 44))
lonlat_to_state(testPoints)
## [1] "Wisconsin" "Oregon" NA
If you need higher resolution state boundaries, read in your own vector data as an sf object using sf::st_read() or by some other means. One nice option is to install the rnaturalearth package and use it to load a state vector layer from rnaturalearthhires. Then use the lonlat_to_state() function we just defined as shown here:
library(rnaturalearth)
us_states_ne <- ne_states(country = "United States of America",
returnclass = "sf")
lonlat_to_state(testPoints, states = us_states_ne, name_col = "name")
## [1] "Wisconsin" "Oregon" NA
For very accurate results, you can download a geopackage containing GADM-maintained administrative borders for the United States from this page. Then, load the state boundary data and use them like this:
USA_gadm <- st_read(dsn = "gadm36_USA.gpkg", layer = "gadm36_USA_1")
lonlat_to_state(testPoints, states = USA_gadm, name_col = "NAME_1")
## [1] "Wisconsin" "Oregon" NA
Method 2 (using sp):
Here is a function that takes a data.frame of lat-longs within the lower 48 states, and for each point, returns the state in which it is located.
Most of the function simply prepares the SpatialPoints and SpatialPolygons objects needed by the over() function in the sp package, which does the real heavy lifting of calculating the 'intersection' of points and polygons:
library(sp)
library(maps)
library(maptools)
# The single argument to this function, pointsDF, is a data.frame in which:
# - column 1 contains the longitude in degrees (negative in the US)
# - column 2 contains the latitude in degrees
lonlat_to_state_sp <- function(pointsDF) {
# Prepare SpatialPolygons object with one SpatialPolygon
# per state (plus DC, minus HI & AK)
states <- map('state', fill=TRUE, col="transparent", plot=FALSE)
IDs <- sapply(strsplit(states$names, ":"), function(x) x[1])
states_sp <- map2SpatialPolygons(states, IDs=IDs,
proj4string=CRS("+proj=longlat +datum=WGS84"))
# Convert pointsDF to a SpatialPoints object
pointsSP <- SpatialPoints(pointsDF,
proj4string=CRS("+proj=longlat +datum=WGS84"))
# Use 'over' to get _indices_ of the Polygons object containing each point
indices <- over(pointsSP, states_sp)
# Return the state names of the Polygons object containing each point
stateNames <- sapply(states_sp@polygons, function(x) x@ID)
stateNames[indices]
}
# Test the function using points in Wisconsin and Oregon.
testPoints <- data.frame(x = c(-90, -120), y = c(44, 44))
lonlat_to_state_sp(testPoints)
[1] "wisconsin" "oregon" # IT WORKS

You can do it in a few lines of R.
library(sp)
library(rgdal)
#lat and long
Lat <- 57.25
Lon <- -9.41
#make a data frame
coords <- as.data.frame(cbind(Lon,Lat))
#and into Spatial
points <- SpatialPoints(coords)
#SpatialPolygonDataFrame - I'm using a shapefile of UK counties
counties <- readOGR(".", "uk_counties")
#assume same proj as shapefile!
proj4string(points) <- proj4string(counties)
#get county polygon point is in
result <- as.character(over(points, counties)$County_Name)

See ?over in the sp package.
You'll need to have the state boundaries as a SpatialPolygonsDataFrame.

Example data (polygons and points)
library(raster)
pols <- shapefile(system.file("external/lux.shp", package="raster"))
xy <- coordinates(pols)
Use raster::extract
extract(pols, xy)
# point.ID poly.ID ID_1 NAME_1 ID_2 NAME_2 AREA
#1 1 1 1 Diekirch 1 Clervaux 312
#2 2 2 1 Diekirch 2 Diekirch 218
#3 3 3 1 Diekirch 3 Redange 259
#4 4 4 1 Diekirch 4 Vianden 76
#5 5 5 1 Diekirch 5 Wiltz 263
#6 6 6 2 Grevenmacher 6 Echternach 188
#7 7 7 2 Grevenmacher 7 Remich 129
#8 8 8 2 Grevenmacher 12 Grevenmacher 210
#9 9 9 3 Luxembourg 8 Capellen 185
#10 10 10 3 Luxembourg 9 Esch-sur-Alzette 251
#11 11 11 3 Luxembourg 10 Luxembourg 237
#12 12 12 3 Luxembourg 11 Mersch 233

It's very straightforward using sf:
library(maps)
library(sf)
## Get the states map, turn into sf object
US <- st_as_sf(map("state", plot = FALSE, fill = TRUE))
## Test the function using points in Wisconsin and Oregon
testPoints <- data.frame(x = c(-90, -120), y = c(44, 44))
# Make it a spatial dataframe, using the same coordinate system as the US spatial dataframe
testPoints <- st_as_sf(testPoints, coords = c("x", "y"), crs = st_crs(US))
#.. and perform a spatial join!
st_join(testPoints, US)
ID geometry
1 wisconsin POINT (-90 44)
2 oregon POINT (-120 44)

How to write a function to matches values in two dataframes (make a faster version)

I have a dataframe with coordinates of regions of interest, and another dataframe with temperature readings (bio1) taken in research stations, and their coordinates.
I'd like to create a new column to match the region of interest with the temperature of the nearest research station.
I have managed to do this with the following code (here's a simplified fake dataframe pair)
df1 <- data.frame(latitude = c(10.5,6,2), longitude = c(18,9,4))
df2 <- data.frame(vy = c(10,5,3), vx = c(20,10,3), bio1 = c('a','b','c'))
for(i in 1:nrow(df1)){
df1$temperature[i] <- df2$bio1[which(abs(df2$vx - df1$longitude[i]) +
abs(df2$vy - df1$latitude[i]) ==
min(abs(df2$vx - df1$longitude[i]) +
abs(df2$vy - df1$latitude[i])))]
}
So, this code checks all the combinations and chooses, for each row, the one with the smallest combined difference in latitude and longitude.
I checked and it seems to work, but it's very slow to use on large dataframes.
Can you solve this issue with a faster method?

Something like this might work
library(tidyverse)
library(sf)
# put some id's in df1
df1$id <- LETTERS[1:3]
# make df1 and df2 simple features objects
sf1 <- df1 %>%
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326)
# coords are given as (x, y), i.e. longitude (vx) first, then latitude (vy)
sf2 <- df2 %>%
  st_as_sf(coords = c("vx", "vy"), crs = 4326)
# for each point in sf1, find the nearest feature in sf2
sf1 %>%
  mutate(nearest_bio = sf2$bio1[st_nearest_feature(sf1, sf2)])
# Simple feature collection with 3 features and 2 fields
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: 4 ymin: 2 xmax: 18 ymax: 10.5
# Geodetic CRS: WGS 84
# id geometry nearest_bio
# 1 A POINT (18 10.5) a
# 2 B POINT (9 6) b
# 3 C POINT (4 2) c

Checking if a point falls within polygon Shapefile

I have a shapefile about NYC Yellow cab service zones: taxi_zones.shp. It can be download here: https://s3.amazonaws.com/nyc-tlc/misc/taxi_zones.zip
I want to check whether certain locations fall into any of the zones. Here is the R code I use:
library(sf)
tt <- read_sf('taxi_zones.shp')
pnts <- data.frame(
"x" = c(-73.97817,-74.00668,0,500),
"y" = c(40.75798, 40.73178,0,400))
pnts_sf <- do.call("st_sfc",c(lapply(1:nrow(pnts),
function(i) {st_point(as.numeric(pnts[i, ]))}), list("crs" = 4326)))
pnts_trans <- st_transform(pnts_sf, 2163)
tt_trans <- st_transform(tt, 2163)
zones <- apply(st_intersects(tt_trans, pnts_trans, sparse = FALSE), 2,
function(col) {
tt_trans[which(col), ]$LocationID
})
The first two points are within the zones defined by the shapefile. However, the third point is not. And the fourth point has incorrect coordinates. How should I modify the code so that for points outside the zones and points with incorrect coordinates, it returns 'NA'?

I have my own approach. Would that fulfill your requirements? I can't tell you what specifically is wrong with your code, but this one is also a bit cleaner:
library(sf)
library(dplyr) # provides %>% and mutate()
tt <- read_sf('./Downloads/taxi_zones/taxi_zones.shp')
pnts <- data.frame(
"x" = c(-73.97817, -74.00668, 0, 500),
"y" = c(40.75798, 40.73178, 0, 400)
)
pnts_sf <- st_as_sf(pnts, coords = c('x', 'y'), crs = st_crs(4326))
pnts_trans <- st_transform(pnts_sf, 2163)
tt_trans <- st_transform(tt, 2163)
pnts_trans <- pnts_sf %>% mutate(
intersection = as.integer(st_intersects( pnts_trans,tt_trans)))
The result would be
geometry intersection
1 POINT (-73.97817 40.75798) 161
2 POINT (-74.00668 40.73178) 158
3 POINT (0 0) NA
4 POINT (500 400) NA

I suggest you consider joining your spatial objects via sf::st_join(), as shown below; it combines the attributes of your polygon and point objects.
The default behaviour is "left" join = points lacking polygons will get NA. It can be tweaked by setting left = FALSE in join parameters, resulting in "inner" join behaviour = points not contained in polygons will be omitted from result.
library(sf)
tt <- read_sf('taxi_zones.shp')
pnts <- data.frame(
"x" = c(-73.97817,-74.00668,0,500),
"y" = c(40.75798, 40.73178,0,400))
pnts_sf <- sf::st_as_sf(pnts, coords = c("x", "y"), crs = 4326)
pnts_trans <- st_transform(pnts_sf, 2163)
tt_trans <- st_transform(tt, 2163)
res <- sf::st_join(pnts_trans, tt_trans)
print(res)
Simple feature collection with 4 features and 6 fields (with 1 geometry empty)
geometry type: POINT
dimension: XY
bbox: xmin: 2152087 ymin: -130624.1 xmax: 9480615 ymax: 1178046
projected CRS: NAD27 / US National Atlas Equal Area
OBJECTID Shape_Leng Shape_Area zone LocationID borough geometry
1 161 0.03580391 7.191307e-05 Midtown Center 161 Manhattan POINT (2153474 -127064.5)
2 158 0.05480999 1.855683e-04 Meatpacking/West Village West 158 Manhattan POINT (2152087 -130624.1)
3 NA NA NA <NA> NA <NA> POINT (9480615 1178046)
4 NA NA NA <NA> NA <NA> POINT EMPTY
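The difference between the two join behaviours can be checked without the taxi shapefile. This is a minimal sketch on assumed toy data: one square zone named "A" and two points, one inside and one outside.

```r
library(sf)

# Toy zone: a single 2 x 2 square (not the taxi zones from the question)
zone <- st_sf(
  zone = "A",
  geometry = st_sfc(st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2),
                                          c(0, 2), c(0, 0)))))
)

# One point inside the square, one outside
pts <- st_as_sf(data.frame(id = 1:2, x = c(1, 5), y = c(1, 5)),
                coords = c("x", "y"))

# Default left join: every point is kept, unmatched points get NA
left_res <- st_join(pts, zone)
left_res$zone

# Inner join: points outside every polygon are dropped
inner_res <- st_join(pts, zone, left = FALSE)
nrow(inner_res)
```

For the original question, the left join is the one to use, since the points outside the zones should appear in the result with NA.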

While loop inside a for loop to calculate geospatial distance between 2 datasets in R

I have a data.table with 957 geocodes. I want to match it with another dataset with 317 geocodes. The matching condition is geospatial proximity. I want to match each observation from the first dataset to an observation from the second one such that the distance between both observations is 5000 meters or less.
My data looks like this:
> muni[1:3]
mun Lat_Decimal Lon_Decimal
1: 1001 21.76672 -102.2818
2: 1002 22.16597 -102.0657
3: 1003 21.86138 -102.7248
> stations[1:3]
station_number station_lat station_long
1: 10003 25.100 -106.567
2: 10018 24.944 -106.259
3: 10031 24.523 -105.952
I am using the distm function from library(geosphere) to calculate the distance.
I figured the way to attack this problem is a while loop. The idea is to take the first observation from muni and measure the distance to the first observation in stations. If the distance is 5000 meters or less, then assign the station_number of the first observation in stations to the first observation in muni. If the distance is greater than 5000, then try the next observation in stations until the distance is 5000 meters or less.
Essentially, it's a loop that finds the first observation in stations that's 5000 meters or closer to an observation in muni.
This is a preliminary attempt at it:
for (i in 1:957) {
j = 1
while (distm(muni[i, .(Lon_Decimal, Lat_Decimal)],
stations[j, .(station_long, station_lat)]) > 5000 & j <= 317) {
muni[i, station_number := as.integer(stations[j, station_number])]
muni[i, distance := distm(muni[i, .(Lon_Decimal, Lat_Decimal)],
stations[j, .(station_long, station_lat)])]
j = j + 1
}
}
I can tell this is not working because none of the rows in muni appear to have been overwritten after running this loop for (i in 1:3). I suppose there is an error in my loop that causes the station_number := and distance := parts to be ignored.
I would expect this loop to overwrite muni so that every row has a station_number.

I've read your few sample points as data.frames and converted them to sf below for the answer. If you're attached to geosphere, forgive the pun, everything should still apply the same, given that geosphere::distm also returns a matrix of distances.
First we get your data into sf format:
library(sf)
stations_raw <- "station_number station_lat station_long
1: 10003 25.100 -106.567
2: 10018 24.944 -106.259
3: 10031 24.523 -105.952"
mun_raw <- "mun Lat_Decimal Lon_Decimal
1: 1001 21.76672 -102.2818
2: 1002 22.16597 -102.0657
3: 1003 21.86138 -102.7248"
mun_df <- read.table(text = mun_raw)
stations_df <- read.table(text = stations_raw)
mun_sf <- st_as_sf(mun_df, coords = c("Lon_Decimal", "Lat_Decimal"), crs = 4326)
stations_sf <- st_as_sf(stations_df,
coords = c("station_long", "station_lat"), crs = 4326)
Then, for each municipality, we find the station at the minimum distance:
closest <- list()
for(i in seq_len(nrow(mun_sf))){
closest[[i]] <- stations_sf[which.min(
st_distance(stations_sf, mun_sf[i,])),]
}
Finally, we extract the identifiers and attach them to the original df, removing the mun_id as you request:
mun_sf$closest_station <- purrr::map_chr(closest, "station_number")
mun_sf <- mun_sf[, c("closest_station", "geometry")]
mun_sf
#> Simple feature collection with 3 features and 1 field
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: -102.7248 ymin: 21.76672 xmax: -102.0657 ymax: 22.16597
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
#> closest_station geometry
#> 1: 10031 POINT (-102.2818 21.76672)
#> 2: 10031 POINT (-102.0657 22.16597)
#> 3: 10031 POINT (-102.7248 21.86138)
The plot below helps visually check that, in this toy example, we've got the right answer.
ggplot() +
geom_sf(data = mun_sf, colour = "red") +
geom_sf_text(data = mun_sf, aes(label = mun), nudge_x = 0.25) +
geom_sf(data = stations_sf, colour = "blue") +
geom_sf_text(data = stations_sf, aes(label = station_number), nudge_x = -0.25)
#> Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may
#> not give correct results for longitude/latitude data
#> Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may
#> not give correct results for longitude/latitude data
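The code above returns the closest station regardless of how far away it is. A minimal sketch of how the 5000 meter cutoff from the question might be added, using st_nearest_feature() and st_distance() from sf: the coordinates and the lon/lat column names below are made up for illustration and are not the question's data.

```r
library(sf)

# Toy data (assumed): the first municipality has a station about 1 km away,
# the second has no station anywhere near it
muni <- data.frame(mun = c(1001, 1002),
                   lon = c(-102.28, -100.00),
                   lat = c(21.77, 20.00))
stations <- data.frame(station_number = c(10003, 10018),
                       lon = c(-106.567, -102.29),
                       lat = c(25.100, 21.77))

muni_sf <- st_as_sf(muni, coords = c("lon", "lat"), crs = 4326)
stations_sf <- st_as_sf(stations, coords = c("lon", "lat"), crs = 4326)

# Index of the nearest station for every municipality
nearest <- st_nearest_feature(muni_sf, stations_sf)

# Distance in meters between each municipality and its nearest station
dist_m <- as.numeric(
  st_distance(muni_sf, stations_sf[nearest, ], by_element = TRUE)
)

# Keep the match only when it is within 5000 m, otherwise NA
within_cutoff <- dist_m <= 5000
muni_sf$station_number <- ifelse(within_cutoff,
                                 stations_sf$station_number[nearest], NA)
muni_sf$distance <- ifelse(within_cutoff, dist_m, NA)
muni_sf
```

Note that this picks the nearest station within the cutoff, not the first one encountered as in the question's loop; with a single cutoff the nearest station is usually the more useful match.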
