Is there a fast way to convert latitude and longitude coordinates to state codes in R? I've been using the zipcode package as a lookup table, but it's too slow when I'm querying lots of lat/long values.
If not in R, is there any way to do this using the Google geocoder or another fast querying service?
Thanks!
Here are two options, one using sf and one using sp package functions. sf is the more modern (and, here in 2020, recommended) package for analyzing spatial data, but in case it's still useful, I am leaving my original 2012 answer showing how to do this with sp-related functions.
Method 1 (using sf):
library(sf)
library(spData)
## pointsDF: A data.frame whose first column contains longitudes and
## whose second column contains latitudes.
##
## states: An sf MULTIPOLYGON object with 50 states plus DC.
##
## name_col: Name of a column in `states` that supplies the states'
## names.
lonlat_to_state <- function(pointsDF,
states = spData::us_states,
name_col = "NAME") {
## Convert points data.frame to an sf POINTS object
pts <- st_as_sf(pointsDF, coords = 1:2, crs = 4326)
## Transform spatial data to some planar coordinate system
## (e.g. Web Mercator) as required for geometric operations
states <- st_transform(states, crs = 3857)
pts <- st_transform(pts, crs = 3857)
## Find names of state (if any) intersected by each point
state_names <- states[[name_col]]
ii <- as.integer(st_intersects(pts, states))
state_names[ii]
}
## Test the function with points in Wisconsin, Oregon, and France
testPoints <- data.frame(x = c(-90, -120, 0), y = c(44, 44, 44))
lonlat_to_state(testPoints)
## [1] "Wisconsin" "Oregon" NA
If you need higher resolution state boundaries, read in your own vector data as an sf object using sf::st_read() or by some other means. One nice option is to install the rnaturalearth package and use it to load a state vector layer from rnaturalearthhires. Then use the lonlat_to_state() function we just defined as shown here:
library(rnaturalearth)
us_states_ne <- ne_states(country = "United States of America",
returnclass = "sf")
lonlat_to_state(testPoints, states = us_states_ne, name_col = "name")
## [1] "Wisconsin" "Oregon" NA
For very accurate results, you can download a geopackage containing GADM-maintained administrative borders for the United States from the GADM download page (gadm.org). Then, load the state boundary data and use them like this:
USA_gadm <- st_read(dsn = "gadm36_USA.gpkg", layer = "gadm36_USA_1")
lonlat_to_state(testPoints, states = USA_gadm, name_col = "NAME_1")
## [1] "Wisconsin" "Oregon" NA
Method 2 (using sp):
Here is a function that takes a data.frame of lat-longs within the lower 48 states, and for each point, returns the state in which it is located.
Most of the function simply prepares the SpatialPoints and SpatialPolygons objects needed by the over() function in the sp package, which does the real heavy lifting of calculating the 'intersection' of points and polygons:
library(sp)
library(maps)
library(maptools)
# The single argument to this function, pointsDF, is a data.frame in which:
# - column 1 contains the longitude in degrees (negative in the US)
# - column 2 contains the latitude in degrees
lonlat_to_state_sp <- function(pointsDF) {
    # Prepare SpatialPolygons object with one SpatialPolygon
    # per state (plus DC, minus HI & AK)
    states <- map('state', fill=TRUE, col="transparent", plot=FALSE)
    IDs <- sapply(strsplit(states$names, ":"), function(x) x[1])
    states_sp <- map2SpatialPolygons(states, IDs=IDs,
                                     proj4string=CRS("+proj=longlat +datum=WGS84"))
    # Convert pointsDF to a SpatialPoints object
    pointsSP <- SpatialPoints(pointsDF,
                              proj4string=CRS("+proj=longlat +datum=WGS84"))
    # Use 'over' to get _indices_ of the Polygons object containing each point
    indices <- over(pointsSP, states_sp)
    # Return the state names of the Polygons object containing each point
    stateNames <- sapply(states_sp@polygons, function(x) x@ID)
    stateNames[indices]
}
# Test the function using points in Wisconsin and Oregon.
testPoints <- data.frame(x = c(-90, -120), y = c(44, 44))
lonlat_to_state_sp(testPoints)
[1] "wisconsin" "oregon" # IT WORKS
You can do it in a few lines of R.
library(sp)
library(rgdal)
#lat and long
Lat <- 57.25
Lon <- -9.41
#make a data frame
coords <- as.data.frame(cbind(Lon,Lat))
#and into Spatial
points <- SpatialPoints(coords)
#SpatialPolygonDataFrame - I'm using a shapefile of UK counties
counties <- readOGR(".", "uk_counties")
#assume same proj as shapefile!
proj4string(points) <- proj4string(counties)
#get county polygon point is in
result <- as.character(over(points, counties)$County_Name)
See ?over in the sp package.
You'll need to have the state boundaries as a SpatialPolygonsDataFrame.
Example data (polygons and points)
library(raster)
p <- shapefile(system.file("external/lux.shp", package="raster"))
xy <- coordinates(p)
Use raster::extract
extract(p, xy)
# point.ID poly.ID ID_1 NAME_1 ID_2 NAME_2 AREA
#1 1 1 1 Diekirch 1 Clervaux 312
#2 2 2 1 Diekirch 2 Diekirch 218
#3 3 3 1 Diekirch 3 Redange 259
#4 4 4 1 Diekirch 4 Vianden 76
#5 5 5 1 Diekirch 5 Wiltz 263
#6 6 6 2 Grevenmacher 6 Echternach 188
#7 7 7 2 Grevenmacher 7 Remich 129
#8 8 8 2 Grevenmacher 12 Grevenmacher 210
#9 9 9 3 Luxembourg 8 Capellen 185
#10 10 10 3 Luxembourg 9 Esch-sur-Alzette 251
#11 11 11 3 Luxembourg 10 Luxembourg 237
#12 12 12 3 Luxembourg 11 Mersch 233
It's very straightforward using sf:
library(maps)
library(sf)
## Get the states map, turn into sf object
US <- st_as_sf(map("state", plot = FALSE, fill = TRUE))
## Test the function using points in Wisconsin and Oregon
testPoints <- data.frame(x = c(-90, -120), y = c(44, 44))
# Make it a spatial dataframe, using the same coordinate system as the US spatial dataframe
testPoints <- st_as_sf(testPoints, coords = c("x", "y"), crs = st_crs(US))
#.. and perform a spatial join!
st_join(testPoints, US)
ID geometry
1 wisconsin POINT (-90 44)
2 oregon POINT (-120 44)
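If you only need the matched names as a plain character vector (for example, to feed into a state-code lookup), just pull the joined column:
st_join(testPoints, US)$ID
## [1] "wisconsin" "oregon"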
Related
I'm trying to calculate the land cover repartition (breakdown) for each US county.
I have obtained NLCD data for Apache County using the FedData package (devtools version), and I'm using county shapefiles from the Census Bureau.
The problem is that I get an area that is much larger than the official one and the one indicated in my shapefile, namely 51,000 km^2 instead of 29,000 km^2 officially. There must be something I don't understand about the raster object, but I'm very confused after hours of web searching; any help appreciated.
The following describes the code used and the method used to calculate. The county data can be downloaded here:
https://www2.census.gov/geo/tiger/TIGER2016/COUNTY/
The following code assumes the county shapefile is saved and unzipped.
Get and read the data
#devtools::install_github("ropensci/FedData")
library(FedData)
library(rgdal)
library(dplyr)
#Get Apache polygon
counties <- readOGR('tl_2016_us_county/tl_2016_us_county.shp')
apache <- subset(counties, counties$GEOID=="04001")
# Get NLCD data
nlcd_data <- get_nlcd(apache,
                      year = 2011,
                      label = "Apache",
                      force.redo = TRUE)
nlcd_data # inspect the object; the number of cells is around 57 million
I then extracted the raster values and put them into a frequency table, from which I calculate the resulting area. Since the NLCD data has a 30 m resolution, I multiply the number of cells in each category by 900 and divide the result by 1 million to obtain the area in square kilometers.
The calculated total area is too large.
# Calculating the landcover repartition in County
landcover <- data.frame(x2011 = values(nlcd_data)) # Number of rows corresponds to number of cells
landcover_freq <- table(landcover)
df_landcover <- as.data.frame(landcover_freq)
res <- df_landcover %>%
  mutate(area_type_sqm = Freq * 900,
         area_type_km = area_type_sqm / 1e6,
         area_sqkm = sum(area_type_km)) %>%
  group_by(landcover) %>%
  mutate(pc_land = round(100 * area_type_km / area_sqkm, 1))
head(arrange(res, desc(pc_land)))
# A tibble: 6 x 6
# Groups: landcover [6]
landcover Freq area_type_sqm area_type_km area_sqkm pc_land
<fct> <int> <dbl> <dbl> <dbl> <dbl>
1 52 33455938 30110344200 30110. 51107. 58.9
2 42 16073820 14466438000 14466. 51107. 28.3
3 71 5999412 5399470800 5399. 51107. 10.6
4 31 488652 439786800 440. 51107. 0.9
5 21 362722 326449800 326. 51107. 0.6
6 22 95545 85990500 86.0 51107. 0.2
## Total area calculated from raster is 51,107 square km
apache_area <- as.data.frame(apache) %>%
mutate(AREA=(as.numeric(ALAND)+as.numeric(AWATER))/1e6) %>%
select(AREA)
apache_area$AREA
29055.47 #Official area of apache county (in square km)
Visual inspection of the shapefile overlaid on the raster:
The mismatch between the two doesn't seem large enough to explain the difference in area.
apache <- spTransform(apache,proj4string(nlcd_data))
plot(nlcd_data)
plot(apache,add=TRUE)
The reason is that you get the data returned in the Mercator projection.
crs(nlcd_data)
CRS arguments:
+proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext
+no_defs
This coordinate reference system preserves shape and that is why it is used for web-mapping. It should not be used for most other purposes, as it is also notorious for distortion of area.
The take-home message is to never just trust the nominal resolution of a projected raster and assume that it is correct and/or constant. The reliable way to compute area is to use longitude/latitude coordinates, because these are, by definition, not distorted.
The reported spatial resolution is
res(nlcd_data)
[1] 30 30
So it is not surprising that you expected the cells to have an area of 30 x 30 = 900 m2. But the cell sizes are actually between 573 and 625 m2 for Apache County, as illustrated below.
First get the data
library(FedData)
counties <- raster::getData("GADM", country="USA", level=2)
apache <- subset(counties,counties$NAME_2=="Apache")
nlcd <- get_nlcd(apache, year = 2011, label = "nlcd_apache", force.redo = TRUE)
# move to terra
library(terra)
r <- rast(nlcd)
ap <- vect(apache)
# county boundaries to Mercator
apm <- project(ap, crs(r))
To compute the area of the grid cells I represent them as polygons. I first aggregate to get much larger cells to avoid getting too many small polygons (it would take very long, and perhaps crash R). I then transform the Mercator polygons to longitude/latitude, compute their true area, and transform back (just for consistent display purposes).
f <- 300
a <- aggregate(rast(r), f)
p <- as.polygons(a)
# compute area
g <- project(p, "+proj=longlat")
g$area <- round(expanse(g) / (f * f), 5)
# project back and plot
merc <- project(g, crs(r))
plot(merc, "area", border=NA)
lines(apm)
The map shows the approximate variation in the size of the original "900 m2" cells (between 573 and 625). This is not the case when I use the original data, as illustrated below.
library(terra)
# "filename" is the local file that has the nlcd data
usnlcd <- rast(filename)
crs(usnlcd, proj4=TRUE)
#[1] "+proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
res(usnlcd)
#[1] 30 30
Note that +proj=aea stands for the Albers Equal Area projection!
ap <- vect(apache)
apm <- project(ap, crs(usnlcd))
x <- crop(usnlcd, apm)
par(mfrow=c(1,2))
plot(x)
lines(apm)
# gg2: cell-area polygons computed as above (like merc), but from the Albers raster x
plot(gg2, "area", border=NA)
As you can see, the cell area is indeed 900 m2, with only very little distortion, so small that it can be ignored.
You could transform the Mercator data back to the original +proj=aea, but then you would have degraded the quality of the data twice. So that is a really bad idea. You could also account for the true cell area of each cell, but that is rather convoluted.
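For completeness, here is a rough sketch (my addition, not from the original answer) of what accounting for each cell's true area could look like, using terra's cellSize() and zonal() on the Mercator objects r and apm defined earlier; the native Albers workflow below remains the better option:
# Sketch: weight every Mercator cell by its true area and sum per land-cover class
rmasked <- mask(r, apm)                       # clip the Mercator raster to the county
ca <- cellSize(rmasked, mask=TRUE, unit="m")  # true area of each cell in m^2
z <- zonal(ca, rmasked, fun="sum")            # summed true area per class
z$area_km2 <- z$area / 1e6                    # assumes the cellSize layer is named "area"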
Finally, to get the area covered by each land cover class
m <- mask(x, apm)
f <- freq(m)
f <- data.frame(f)
f$area <- round(f$count * 900 / 1e6, 1)
# the next step is a bit tricky and will be done by `freq` in future versions
levs <- levels(m)[[1]]
f$class <- levs[f$value + 1]
Voila:
f[, c("class", "area")]
# class area
#1 Open Water 21.5
#2 Developed, Open Space 175.3
#3 Developed, Low Intensity 46.4
#4 Developed, Medium Intensity 9.7
#5 Developed, High Intensity 1.0
#6 Barren Land 232.5
#7 Deciduous Forest 44.6
#8 Evergreen Forest 7604.6
#9 Mixed Forest 2.0
#10 Shrub/Scrub 18262.8
#11 Herbaceuous 2514.4
#12 Hay/Pasture 1.9
#13 Cultivated Crops 0.0
#14 Woody Wetlands 38.8
#15 Emergent Herbaceuous Wetlands 53.6
And the total is as expected
sum(f$area)
#[1] 29009.1
PS. This problem has now been solved at the source --- but I hope this answer will remain useful for others using spatial data with a Mercator CRS.
Y'all, thanks so much for catching this, and thanks Robert for alerting me on Github! I hadn't realized that WCS was serving up transformed data. Luckily, the MRLC has made the data available in its native format as well, so I've gone ahead and pushed an update to FedData that provides those data (in CONUS Albers).
FWIW (no pictures, because this is my first SO post):
#devtools::install_github("ropensci/FedData")
library(FedData)
library(magrittr)
library(dplyr)
library(tigris)
library(raster)
#Get Apache polygon
apache <-
tigris::counties(state = "AZ") %>%
dplyr::filter(NAME == "Apache")
# Get NLCD data
nlcd_data <-
FedData::get_nlcd(
template = apache,
year = 2011,
label = "Apache",
force.redo = TRUE
)
# Transform Apache polygon to NLCD CRS
apache %<>%
sf::st_transform(raster::crs(nlcd_data))
# Plot NLCD raster and transformed Apache polygon
raster::plot(nlcd_data)
apache %>%
sf::st_geometry() %>%
plot(border = "white", lwd = 2, add = TRUE)
https://i.imgur.com/5b09t7P.png
# Mask NLCD data by Apache County (and plot result)
nlcd_data %<>%
raster::mask(apache)
plot(nlcd_data)
https://i.imgur.com/UinQ0v3.png
# Table of areas in km^2
nlcd_data_table <-
nlcd_data %>%
values() %>%
tibble::tibble(landcover = .) %>%
na.omit() %>%
dplyr::group_by(landcover) %>%
count(name = "freq") %>%
dplyr::mutate(area = (freq * 900) %>%
units::set_units("m^2") %>%
units::set_units("km^2"))
nlcd_data_table
#> # A tibble: 15 x 3
#> # Groups: landcover [15]
#> landcover freq area
#> <int> <int> [km^2]
#> 1 11 24134 21.7206
#> 2 21 194720 175.2480
#> 3 22 51321 46.1889
#> 4 23 10485 9.4365
#> 5 24 1069 0.9621
#> 6 31 259987 233.9883
#> 7 41 48906 44.0154
#> 8 42 8456196 7610.5764
#> 9 43 2179 1.9611
#> 10 52 19941185 17947.0665
#> 11 71 3190650 2871.5850
#> 12 81 2169 1.9521
#> 13 82 4 0.0036
#> 14 90 42628 38.3652
#> 15 95 59014 53.1126
# Total area NLCD raster:
sum(nlcd_data_table$area)
#> 29056.18 [km^2]
# Area of Apache county
apache %>%
sf::st_area() %>%
units::set_units("km^2")
#> 29056.19 [km^2]
Created on 2021-05-26 by the reprex package (v2.0.0)
I have a shapefile about NYC Yellow cab service zones: taxi_zones.shp. It can be download here: https://s3.amazonaws.com/nyc-tlc/misc/taxi_zones.zip
I want to check whether certain locations fall into any of the zones. Here is the R code I use:
library(sf)
tt <- read_sf('taxi_zones.shp')
pnts <- data.frame(
"x" = c(-73.97817,-74.00668,0,500),
"y" = c(40.75798, 40.73178,0,400))
pnts_sf <- do.call("st_sfc",c(lapply(1:nrow(pnts),
function(i) {st_point(as.numeric(pnts[i, ]))}), list("crs" = 4326)))
pnts_trans <- st_transform(pnts_sf, 2163)
tt_trans <- st_transform(tt, 2163)
zones <- apply(st_intersects(tt_trans, pnts_trans, sparse = FALSE), 2,
function(col) {
tt_trans[which(col), ]$LocationID
})
The first two points are within the zones defined by the shapefile. However, the third point is not. And the fourth point has incorrect coordinates. How should I modify the code so that for points outside the zones and points with incorrect coordinates, it returns 'NA'?
I have my own approach. Would that fulfill your requirements? I can't tell you what specifically is wrong with your code, but this one is also a bit cleaner:
library(sf)
library(dplyr)
tt <- read_sf('./Downloads/taxi_zones/taxi_zones.shp')
pnts <- data.frame(
"x" = c(-73.97817, -74.00668, 0, 500),
"y" = c(40.75798, 40.73178, 0, 400)
)
pnts_sf <- st_as_sf(pnts, coords = c('x', 'y'), crs = st_crs(4326))
pnts_trans <- st_transform(pnts_sf, 2163)
tt_trans <- st_transform(tt, 2163)
pnts_trans <- pnts_sf %>% mutate(
  intersection = as.integer(st_intersects(pnts_trans, tt_trans)))
The result would be
geometry intersection
1 POINT (-73.97817 40.75798) 161
2 POINT (-74.00668 40.73178) 158
3 POINT (0 0) NA
4 POINT (500 400) NA
I suggest you consider joining your spatial objects via sf::st_join(), as shown below; it combines the attributes of your polygon and point objects.
The default behaviour is a "left" join: points that fall in no polygon get NA. It can be tweaked by setting left = FALSE in the join parameters, giving "inner" join behaviour: points not contained in any polygon are omitted from the result (see the short sketch after the output below).
library(sf)
tt <- read_sf('taxi_zones.shp')
pnts <- data.frame(
"x" = c(-73.97817,-74.00668,0,500),
"y" = c(40.75798, 40.73178,0,400))
pnts_sf <- sf::st_as_sf(pnts, coords = c("x", "y"), crs = 4326)
pnts_trans <- st_transform(pnts_sf, 2163)
tt_trans <- st_transform(tt, 2163)
res <- sf::st_join(pnts_trans, tt_trans)
print(res)
Simple feature collection with 4 features and 6 fields (with 1 geometry empty)
geometry type: POINT
dimension: XY
bbox: xmin: 2152087 ymin: -130624.1 xmax: 9480615 ymax: 1178046
projected CRS: NAD27 / US National Atlas Equal Area
OBJECTID Shape_Leng Shape_Area zone LocationID borough geometry
1 161 0.03580391 7.191307e-05 Midtown Center 161 Manhattan POINT (2153474 -127064.5)
2 158 0.05480999 1.855683e-04 Meatpacking/West Village West 158 Manhattan POINT (2152087 -130624.1)
3 NA NA NA <NA> NA <NA> POINT (9480615 1178046)
4 NA NA NA <NA> NA <NA> POINT EMPTY
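And the inner-join behaviour mentioned above is just a matter of setting left = FALSE, e.g.:
# Keep only the points that fall inside a taxi zone
res_inner <- sf::st_join(pnts_trans, tt_trans, left = FALSE)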
I have a data frame (called "events") of camera-trap records with coordinates, and I want to extract the habitat type at each location from a raster file and add it to my data frame. How do I extract this using the raster and the data frame coordinates, and how do I then add it to the main data frame?
## Creating the raster file from a shapefile
library(raster) # also attaches sp
myfile <- shapefile("dpky.lc5.shp")
myfile@data$VALUE <- as.numeric(myfile@data$VALUE) # VALUE gives the numeric code for habitat type.
sr <- "+init=EPSG:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
r <- raster(myfile, res=100, crs=sr)
myraster<-rasterize(myfile,r,field="VALUE")
myras_spdf <- as(myraster, "SpatialPixelsDataFrame")
myras_df <- as.data.frame(myras_spdf)
## Data frame with coordinates
events <- read.csv("DPKY.Clean.csv",h=T,sep=";")
events.sp<-SpatialPoints(events[,c("Longitude","Latitude")],proj4string = CRS("+init=EPSG:4326"))
events.sp
I have not been able to find code specific to this problem. I did manage it with another .gri file, but that code doesn't work here.
It appears that you have points and polygons, and want to query the polygons' values at the points. In other words, extract the values for the points from the polygons. If that is the case, it makes no sense to create a RasterLayer (and/or a SpatialPixels) object.
Always provide some example data (p has the polygons, d is a data.frame with coordinates)
library(raster)
p <- shapefile(system.file("external/lux.shp", package="raster"))
set.seed(10)
d <- coordinates(spsample(p, 4, "regular"))
colnames(d) <- c("lon", "lat")
d <- data.frame(id=1:nrow(d), d)
Solution
x <- extract(p, d[,c("lon", "lat")])
Now you can do
cbind(d, x[,c(4,6)])
# id lon lat NAME_1 NAME_2
#1 1 5.889636 49.53576 Luxembourg Esch-sur-Alzette
#2 2 6.176340 49.53576 Luxembourg Luxembourg
#3 3 5.889636 49.82246 Diekirch Redange
#4 4 6.176340 49.82246 Diekirch Diekirch
Or something like this
d$NAME_2 <- x$NAME_2
d
# id lon lat NAME_2
#1 1 5.889636 49.53576 Esch-sur-Alzette
#2 2 6.176340 49.53576 Luxembourg
#3 3 5.889636 49.82246 Redange
#4 4 6.176340 49.82246 Diekirch
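That said, if you do end up with a genuine RasterLayer of habitat codes (like myraster in the question), the same extract() idea works for a raster plus points; a minimal sketch using the question's object names:
# Extract the habitat code under each camera-trap location and bind it back
events$habitat <- raster::extract(myraster, events.sp)
head(events)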
I want to assign country Exclusive Economic Zones to point data from a raster where the points represent Aragonite saturation levels in the ocean.
The raster is a single layer that gives an Aragonite value for many latitude/longitude points in the ocean.
I want to assign each latitude/longitude point to an exclusive economic zone.
This site does it for single pairs of coordinates but I have 15,000 points so I am hoping it is possible to do in R.
The data look like this:
long lat Aragonite
1 20.89833 84.66917 1.542071
2 22.69496 84.66917 1.538187
3 24.49159 84.66917 1.537830
4 26.28822 84.66917 1.534834
5 28.08485 84.66917 1.534595
6 29.88148 84.66917 1.532505
Previously I have used the code below to assign countries to raster points, but this returns NA for many points in the ocean that are within national EEZs.
#convert the raster to points for assigning countries
r.pts <- rasterToPoints(r, spatial = TRUE)
#view new proj 4 string of spatialpointsdataframe
proj4string(r.pts)
##[1] "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"
###converting reclassified points to countries
# The single argument to this function, points, is a data.frame in which:
# - column 1 contains the longitude in degrees
# - column 2 contains the latitude in degrees
coords2country = function(r.pts)
{
countriesSP <- getMap(resolution='high')
#countriesSP <- getMap(resolution='high') #you could use high res map from rworldxtra if you were concerned about detail
#setting CRS directly to that from rworldmap
r.pts = SpatialPoints(r.pts, proj4string=CRS(proj4string(countriesSP)))
# use 'over' to get indices of the Polygons object containing each point
indices = over(r.pts, countriesSP)
# return the ADMIN names of each country
indices$ADMIN
#indices$ISO3 # returns the ISO3 code
#indices$continent # returns the continent (6 continent model)
#indices$REGION # returns the continent (7 continent model)
}
#get country names for each pair of points
rCountries <- coords2country(r.pts)
Is there any way to do a similar function to coords2countries but for EEZ's in the ocean?
EDIT: some data for reproducible example
dput(head(r.pts))
structure(list(layer = c(5, 5, 5, 5, 5, 5), x = c(-178.311660375408, -176.511660375408, -174.711660375408, -172.911660375408, -171.111660375408, -169.311660375408), y = c(73.1088933113454, 73.1088933113454, 73.1088933113454, 73.1088933113454, 73.1088933113454, 73.1088933113454)), .Names = c("layer", "x", "y"), row.names = c(NA, 6L), class = "data.frame")
You need a shapefile that includes the EEZs. Download the one that is here: World EEZ v9 (2016-10-21, 123 MB) http://www.marineregions.org/downloads.php#marbound
You can load the EEZ shapefile with the readOGR() function from the rgdal package. Unzip the EEZ shapefile zip to your working directory and run countriesSP <- rgdal::readOGR(dsn = ".", layer = "eez_boundaries") in place of countriesSP <- getMap(resolution='high')
None of the example data you provided falls in a country's EEZ, so I can't tell if this actually works, but it probably should...
library(sp)
library(rworldmap)
library(rgeos)
r <- read.table(header = TRUE, text = "
long lat Aragonite
1 20.89833 84.66917 1.542071
2 22.69496 84.66917 1.538187
3 24.49159 84.66917 1.537830
4 26.28822 84.66917 1.534834
5 28.08485 84.66917 1.534595
6 29.88148 84.66917 1.532505
")
# or
#r <- data.frame(long = c(-178.311660375408,-176.511660375408, -174.711660375408, -172.911660375408, -171.111660375408,-169.311660375408),
# lat = c(73.1088933113454, 73.1088933113454,73.1088933113454, 73.1088933113454, 73.1088933113454, 73.1088933113454))
r.pts <- sp::SpatialPoints(r[, c("long", "lat")]) # use only the coordinate columns
# download file from here: http://www.marineregions.org/download_file.php?fn=v9_20161021
# put the zip file in your working directory: getwd()
unzip('World_EEZ_v9_20161021.zip')
# countriesSP <- rworldmap::getMap(resolution = "high")
# or
countriesSP <- rgdal::readOGR(dsn = ".", layer = "eez_boundaries")
r.pts <- sp::SpatialPoints(r.pts, proj4string = sp::CRS(proj4string(countriesSP)))
indices <- over(r.pts, countriesSP)
indices$ADMIN
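Note that the EEZ shapefile's attribute columns will likely differ from rworldmap's (there may be no ADMIN field), so inspect names(countriesSP) and substitute the appropriate column. For reference, a hedged sf-based sketch of the same lookup (my addition; it assumes the same unzipped layer name used above):
library(sf)
eez <- st_read(dsn = ".", layer = "eez_boundaries")
names(eez)                                         # inspect available attribute columns
pts <- st_as_sf(r, coords = c("long", "lat"), crs = st_crs(eez))
res <- st_join(pts, eez)                           # EEZ attributes per point, NA if outside all EEZs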