County area calculated from NLCD (land cover) rasters is too large

I'm trying to calculate the land cover breakdown for each US county.
I have obtained NLCD data for Apache County using the FedData package (devtools version), and I'm using county shapefiles from the Census Bureau.
The problem is that I get an area that is much larger than the official one and the one indicated in my shapefile: roughly 51,000 km^2 instead of the official 29,000 km^2. There must be something I don't understand about the raster object, but I'm very confused after hours of web searching; any help is appreciated.
The code and the calculation method are described below. The county data can be downloaded here:
https://www2.census.gov/geo/tiger/TIGER2016/COUNTY/
The following code assumes the county shapefile is saved and unzipped.
Get and read the data
#devtools::install_github("ropensci/FedData")
library(FedData)
library(rgdal)
library(dplyr)
# Get Apache County polygon
counties <- readOGR('tl_2016_us_county/tl_2016_us_county.shp')
apache <- subset(counties, counties$GEOID == "04001")
# Get NLCD data
nlcd_data <- get_nlcd(apache,
                      year = 2011,
                      label = "Apache",
                      force.redo = TRUE)
nlcd_data #inspect the object, can see that number of cells is around 57 million
I then extracted the values of the raster and put them into a frequency table, from which I calculate the resulting area. Since the NLCD data has a 30 m resolution, I multiply the number of cells in each category by 900 and divide the result by 1 million to obtain the area in square kilometers.
The calculated total area is too large.
# Calculating the land cover breakdown in the county
landcover <- data.frame(x2011 = values(nlcd_data)) # number of rows corresponds to number of cells
landcover_freq <- table(landcover)
df_landcover <- as.data.frame(landcover_freq)
res <- df_landcover %>%
  mutate(area_type_sqm = Freq * 900,
         area_type_km = area_type_sqm / 1e6,
         area_sqkm = sum(area_type_km)) %>%
  group_by(landcover) %>%
  mutate(pc_land = round(100 * area_type_km / area_sqkm, 1))
head(arrange(res, desc(pc_land)))
# A tibble: 6 x 6
# Groups: landcover [6]
landcover Freq area_type_sqm area_type_km area_sqkm pc_land
<fct> <int> <dbl> <dbl> <dbl> <dbl>
1 52 33455938 30110344200 30110. 51107. 58.9
2 42 16073820 14466438000 14466. 51107. 28.3
3 71 5999412 5399470800 5399. 51107. 10.6
4 31 488652 439786800 440. 51107. 0.9
5 21 362722 326449800 326. 51107. 0.6
6 22 95545 85990500 86.0 51107. 0.2
## Total area calculated from raster is 51,107 square km
apache_area <- as.data.frame(apache) %>%
  mutate(AREA = (as.numeric(ALAND) + as.numeric(AWATER)) / 1e6) %>%
  select(AREA)
apache_area$AREA
29055.47 #Official area of apache county (in square km)
Visual inspection of the shapefile and the raster:
The visual mismatch doesn't seem large enough to explain the area discrepancy.
apache <- spTransform(apache,proj4string(nlcd_data))
plot(nlcd_data)
plot(apache,add=TRUE)

The reason is that you get the data returned in the Mercator projection.
crs(nlcd_data)
CRS arguments:
+proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=#null +wktext
+no_defs
This coordinate reference system preserves shape, which is why it is used for web mapping. It should not be used for most other purposes, because it is notorious for distorting area.
The take-home message: never just trust the nominal resolution of a projected raster and assume that it is correct and/or constant. The reliable way to compute area is from longitude/latitude coordinates, because these are, by definition, not distorted by a projection.
The reported spatial resolution is
res(nlcd_data)
[1] 30 30
So it is not surprising that you expected the cells to have an area of 30 x 30 = 900 m2. But the cell sizes are actually between 573 and 625 m2 for Apache County, as illustrated below.
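Before the illustration, a rough sanity check (my addition, not part of the original answer): for spherical Web Mercator, the linear scale factor at latitude phi is 1/cos(phi), so a nominal 30 m x 30 m cell actually covers about 900 * cos(phi)^2 m2 on the ground. A quick sketch for the latitude range of Apache County (roughly 33.5 to 37 degrees N):
# Sketch only: approximate true ground area of a nominal 900 m^2 Web Mercator cell
lat <- c(33.5, 37)
round(900 * cos(lat * pi / 180)^2)
# [1] 626 574   -- consistent with the 573-625 m^2 range reported above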
First get the data
library(FedData)
counties <- raster::getData("GADM", country="USA", level=2)
apache <- subset(counties,counties$NAME_2=="Apache")
nlcd <- get_nlcd(apache, year = 2011, label = "nlcd_apache", force.redo = TRUE)
# move to terra
library(terra)
r <- rast(nlcd)
ap <- vect(apache)
# county boundaries to Mercator
apm <- project(ap, crs(r))
To compute the area of the grid cells I represent them as polygons. I first aggregate to get much larger cells to avoid getting too many small polygons (it would take very long, and perhaps crash R). I then transform the Mercator polygons to longitude/latitude, compute their true area, and transform back (just for consistent display purposes).
f <- 300
a <- aggregate(rast(r), f)
p <- as.polygons(a)
# compute area
g <- project(p, "+proj=longlat")
g$area <- round(expanse(g) / (f * f), 5)
# project back and plot
merc <- project(g, crs(r))
plot(merc, "area", border=NA)
lines(apm)
The map shows the approximate variation in the size of the original "900 m2" cells (between 573 and 625 m2). There is no such variation when I use the original (Albers) NLCD data, as illustrated below.
library(terra)
# "filename" is the local file that has the nlcd data
usnlcd <- rast(filename)
crs(usnlcd, proj4=TRUE)
#[1] "+proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
res(usnlcd)
#[1] 30 30
Note that +proj=aea stands for the Albers Equal Area projection!
ap <- vect(apache)
apm <- project(ap, crs(usnlcd))
x <- crop(usnlcd, apm)
par(mfrow=c(1,2))
plot(x)
lines(apm)
# gg2 computed from x in the same way as `merc` was computed from r above
plot(gg2, "area", border=NA)
As you can see, the cell area is indeed 900 m2, with only very little distortion, so small that it can be ignored.
You could transform the Mercator data back to the original +proj=aea, but then you would have degraded the quality of the data twice. So that is a really bad idea. You could also account for the true cell area of each cell, but that is rather convoluted.
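For completeness, here is a minimal sketch of that convoluted route (my addition, not part of the original answer). It assumes the Mercator raster r and the projected county boundary apm from above, and that terra::cellSize() with transform = TRUE returns the true ground area of each cell:
# Sketch only: weight each Mercator cell by its true ground area instead of a nominal 900 m^2
m_merc <- mask(r, apm)                                # cells inside the county
cs <- cellSize(m_merc, unit = "m", transform = TRUE)  # true area of each cell (m^2)
by_class <- zonal(cs, m_merc, fun = "sum")            # total true area per land cover class
by_class[, 2] <- by_class[, 2] / 1e6                  # m^2 -> km^2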
Finally, to get the area covered by each land cover class
m <- mask(x, apm)
f <- freq(m)
f <- data.frame(f)
f$area <- round(f$count * 900 / 1e6, 1)
# the next step is a bit tricky and will be done by `freq` in future versions
levs <- levels(m)[[1]]
f$class <- levs[f$value + 1]
Voila:
f[, c("class", "area")]
# class area
#1 Open Water 21.5
#2 Developed, Open Space 175.3
#3 Developed, Low Intensity 46.4
#4 Developed, Medium Intensity 9.7
#5 Developed, High Intensity 1.0
#6 Barren Land 232.5
#7 Deciduous Forest 44.6
#8 Evergreen Forest 7604.6
#9 Mixed Forest 2.0
#10 Shrub/Scrub 18262.8
#11 Herbaceuous 2514.4
#12 Hay/Pasture 1.9
#13 Cultivated Crops 0.0
#14 Woody Wetlands 38.8
#15 Emergent Herbaceuous Wetlands 53.6
And the total is as expected
sum(f$area)
#[1] 29009.1
PS. This problem has now been solved at the source --- but I hope this answer will remain useful for others using spatial data with a Mercator CRS.

Y'all, thanks so much for catching this, and thanks Robert for alerting me on Github! I hadn't realized that WCS was serving up transformed data. Luckily, the MRLC has made the data available in its native format as well, so I've gone ahead and pushed an update to FedData that provides those data (in CONUS Albers).
FWIW (no pictures, because this is my first SO post):
#devtools::install_github("ropensci/FedData")
library(FedData)
library(magrittr)
library(dplyr)
library(tigris)
library(raster)
#Get Apache polygon
apache <-
  tigris::counties(state = "AZ") %>%
  dplyr::filter(NAME == "Apache")
# Get NLCD data
nlcd_data <-
  FedData::get_nlcd(
    template = apache,
    year = 2011,
    label = "Apache",
    force.redo = TRUE
  )
# Transform Apache polygon to NLCD CRS
apache %<>%
sf::st_transform(raster::crs(nlcd_data))
# Plot NLCD raster and transformed Apache polygon
raster::plot(nlcd_data)
apache %>%
sf::st_geometry() %>%
plot(border = "white", lwd = 2, add = TRUE)
https://i.imgur.com/5b09t7P.png
# Mask NLCD data by Apache County (and plot result)
nlcd_data %<>%
raster::mask(apache)
plot(nlcd_data)
https://i.imgur.com/UinQ0v3.png
# Table of areas in km^2
nlcd_data_table <-
  nlcd_data %>%
  values() %>%
  tibble::tibble(landcover = .) %>%
  na.omit() %>%
  dplyr::group_by(landcover) %>%
  count(name = "freq") %>%
  dplyr::mutate(area = (freq * 900) %>%
                  units::set_units("m^2") %>%
                  units::set_units("km^2"))
nlcd_data_table
#> # A tibble: 15 x 3
#> # Groups: landcover [15]
#> landcover freq area
#> <int> <int> [km^2]
#> 1 11 24134 21.7206
#> 2 21 194720 175.2480
#> 3 22 51321 46.1889
#> 4 23 10485 9.4365
#> 5 24 1069 0.9621
#> 6 31 259987 233.9883
#> 7 41 48906 44.0154
#> 8 42 8456196 7610.5764
#> 9 43 2179 1.9611
#> 10 52 19941185 17947.0665
#> 11 71 3190650 2871.5850
#> 12 81 2169 1.9521
#> 13 82 4 0.0036
#> 14 90 42628 38.3652
#> 15 95 59014 53.1126
# Total area NLCD raster:
sum(nlcd_data_table$area)
#> 29056.18 [km^2]
# Area of Apache county
apache %>%
sf::st_area() %>%
units::set_units("km^2")
#> 29056.19 [km^2]
Created on 2021-05-26 by the reprex package (v2.0.0)
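A possible follow-up (my addition, not part of the original answer): to get the percentage breakdown the question was originally after, one could append a pc_land column, for example:
# Hypothetical extra step: percentage of the county covered by each class
nlcd_data_table %>%
  dplyr::ungroup() %>%
  dplyr::mutate(pc_land = round(100 * as.numeric(area) / sum(as.numeric(area)), 1)) %>%
  dplyr::arrange(dplyr::desc(pc_land))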

Related

How to plot coordinates on a spherical shape file correctly?

So I'm trying to plot a bunch of coordinates on the earth and track how many coordinates are in each country. I have plotted the map and coordinates fine, but when I try to use the intersection to count how many coordinates fall within each country (polygon), it results in an error. I've tried using the st_make_valid function to fix the earth shapefile, but it messes up the geometry. I'm new to using R, so any help would be greatly appreciated.
I have used the following code to plot the earth shape file and the coordinates on top:
library(tidyverse)
library(sf)
library(rmapshaper)
library(rnaturalearth)
library(rnaturalearthdata)
library(sp)
library(raster)
###############
# Load Data
###############
# Read in data from .csv file
MeteoriteData <- read.csv("C:/Users/ChaseDickson_/Desktop/College/AERO 689/Semester Project/Meteorite Landings.csv")
# Convert these points to an SF object, specifying the X and Y
# column names, and supplying the CRS as 4326 (which is WGS84)
MeteoriteData.sf <- st_as_sf(MeteoriteData, coords=c('long', 'lat'), crs=4326)
world <- (ne_countries(scale = "medium", returnclass = "sf"))
MeteoriteMap <- ggplot(data = world) +
geom_sf() +
geom_sf(data = MeteoriteData.sf, size = 0.5, shape = 23, fill = "darkred") +
theme_bw()
MeteoriteMap
Which gives the following plot
However, to get the intersection I used this:
intersection <- st_intersection(x = world, y = MeteoriteData.sf)
But it gave the error
Error in wk_handle.wk_wkb(wkb, s2_geography_writer(oriented = oriented, :
Loop 96 is not valid: Edge 743 crosses edge 998
To fix this I changed the world sf by adding st_make_valid like so:
world <- st_make_valid(ne_countries(scale = "small", returnclass = "sf"))
Now this allows the intersection function to work as such:
intersection <- st_intersection(x = world, y = MeteoriteData.sf)
int_result <- intersection %>%
group_by(sovereignt) %>%
count()
And the output is shown below.
However, this messes up the countries (polygons) in the plot and will give inaccurate results, as the new earth shapefile shown below illustrates:
Any help figuring out how to maintain the first plot, but still get the intersection function and count to work after adding st_make_valid would be greatly appreciated!
The {rnaturalearth} package has had a long and productive history, but - kind of like the similar {maps} package - it belongs to a different, less demanding era. You should consider doing a Marie Kondo on it: thank it for its service, and let it go.
So instead of trying to repair its failings, look for a different world dataset; this is a very common and thus well-standardized use case.
Consider this piece of code, and note that it is not a single piece of invalid geometry, but 6 (out of 241). Correcting them one by one would be a fruitless task.
library(sf)
rnaturalearth::ne_countries(scale = "medium", returnclass = "sf")|>
st_is_valid() |>
table()
# FALSE TRUE
# 6 235
My preferred source of the world country data is the {giscoR} package, which interfaces GISCO spatial dataset, ultimately maintained by Eurostat.
It is very handy, known to be valid and actively maintained.
giscoR::gisco_get_countries(resolution = "20") |>
st_is_valid() |>
table()
# TRUE
# 257
The rest of your code - the intersection and plotting part - should work just fine once you get rid of invalid geometries.
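For instance, a minimal sketch of that re-run (my addition, assuming the MeteoriteData.sf object from the question is already in the session):
# Sketch: redo the count with the valid giscoR countries instead of rnaturalearth
library(sf)
library(dplyr)
world <- giscoR::gisco_get_countries(resolution = "20")
int_result <- st_intersection(x = world, y = MeteoriteData.sf) |>
  st_drop_geometry() |>
  count(NAME_ENGL, name = "n_points")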
Just a note regarding st_intersection() (and shamelessly building on top of Jindra's answer):
it happens to be an extremely expensive way to retrieve a hit count, especially if the resulting geometries are just a by-product that will be discarded (it's expensive for that purpose too; check st_join()).
For counting, you can save a lot (~250 ms vs ~16 s(!) for 100 random points on my machine) by opting for st_intersects():
library(sf)
library(ggplot2)
countries <- giscoR::gisco_get_countries(resolution = "20")
set.seed(1)
random_points <- data.frame(x = runif(100,-180,180), y = runif(100,-90,90)) |>
st_as_sf(coords = c("x","y"), crs = "WGS84")
#> Measure st_intersection():
system.time({
countries_isect <- st_intersection(countries, random_points)
})
#> user system elapsed
#> 15.97 0.47 16.50
#> Measure st_intersects():
system.time({
countries$hits <- lengths(st_intersects(countries, random_points))
})
#> user system elapsed
#> 0.22 0.00 0.22
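# (Added sketch, not part of the original answer) The st_join() route mentioned
# above also avoids materialising the intersection geometries; for example:
joined <- sf::st_join(random_points, countries["NAME_ENGL"])
head(sort(table(joined$NAME_ENGL), decreasing = TRUE))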
ggplot(countries) +
geom_sf(data = countries) +
geom_sf(data = random_points) +
theme_void()
Results, i.e. non-zero lengths(st_intersects(countries, random_points)):
countries[countries$hits > 0,c("NAME_ENGL","hits")] |> st_drop_geometry()
#> NAME_ENGL hits
#> 4 Antarctica 7
#> 13 Australia 1
#> 30 Brazil 4
#> 31 China 3
#> 48 Greenland 1
#> 55 Canada 3
#> 116 Kazakhstan 1
#> 117 Laos 1
#> 125 Cambodia 1
#> 141 Mauritania 1
#> 155 Oman 1
#> 162 Paraguay 1
#> 184 Mali 1
#> 185 Russian Federation 5
#> 221 United States 3
#> 223 Venezuela 1
#> 249 Thailand 1
Created on 2022-11-23 with reprex v2.0.2

Geosphere package distance from point to polygon

I'm trying to use the geosphere package in R to get the distance to a polygon from a set of points that lie outside of that polygon.
The polygon is a shapefile of the Antarctic coastline, found here: https://data.bas.ac.uk/items/e6cf8946-e493-4c36-b4f5-58f7a2ee2a74/ and the points are animal tracking data.
I have tried using the syntax specified in the geosphere documentation (https://www.rdocumentation.org/packages/geosphere/versions/1.5-14/topics/dist2Line)
which is as follows:
dist2Line(p, line, distfun=distGeo)
#my attempt so far:
#libraries
library(rgdal)
library(sf)
library(rgeos)
library(tidyverse)
library(geosphere)
#my points
points <-read.csv("Analyses/example_points.csv") #this is the table included below of 4 example locations.
|ID|LON |LAT |
|--|----------|----------|
|a |-2.515478 |-69.53887 |
|b |-2.601405 |-69.79783 |
|c |-0.153548 |-69.45126 |
|d |26.06987 |-69.55020 |
#my line
line <- readOGR('Environmental_Data/COAST/add_coastline_high_res_polygon_v7_5.shp') # this is the shapefile linked above
#convert points to spatial object
coordinates(points) <- ~LON+LAT
distance <- geosphere::dist2Line(p = points, line = line, distfun = distGeo)
However, I get an error: "Error in .spDistPoint2Line(p, line, distfun) :
Points are projected. They should be in degrees (longitude/latitude)".
The package documentation states that p can be:
"longitude/latitude of point(s). Can be a vector of two numbers, a matrix of 2 columns (first one is longitude, second is latitude) or a SpatialPoints object" - which is what I'm providing it with. I have seen the same issue raised in an (unanswered) Reddit post, but not on here.
My desired output is as below (distances under distance to coast are made up for now!). I have ~3000 locations I need to find the distance to the coastline for.
|ID|LON       |LAT       |Dist_to_coast (km)|
|--|----------|----------|------------------|
|a |-2.515478 |-69.53887 |40                |
|b |-2.601405 |-69.79783 |24                |
|c |-0.153548 |-69.45126 |74                |
|d |26.06987  |-69.55020 |23                |
Is there an alternative/better means of doing this?
Thank you.
You have loaded sf; any particular reason for not using sf::st_distance() for the task? You would still need to transform, though. Since there are only 4 sample points versus a ~140 MB shapefile with ~17,000 polygons, the points were transformed to the CRS of the shapefile rather than the other way around:
library(ggplot2)
library(dplyr)
library(sf)
coastline <- st_read("add_coastline_high_res_polygon_v7_6.shp/")
p <- readr::read_delim(
"ID|LON |LAT
a |-2.515478 |-69.53887
b |-2.601405 |-69.79783
c |-0.153548 |-69.45126
d |26.06987 |-69.55020" , delim = "|", trim_ws = T) %>%
st_as_sf(coords = c("LON", "LAT"), crs = "WGS84") %>%
# transform points to match crs of the shapefile
st_transform(st_crs(coastline))
# number of different surface polygons
table(coastline$surface)
#>
#> ice shelf ice tongue land rumple
#> 325 37 17233 64
# create a single multipolygon, can take a while;
# you may need to filter first to define any surface types you might want to
# include / exclude ("land" also includes islands)
system.time({
ucoastline <- st_union(coastline)
})
#> user system elapsed
#> 103.40 11.72 116.08
p$dist_ucoastline <- st_distance(p,ucoastline)
# or perhaps select land polygon with max area to
# ignore ice and all the islands:
land_max <- coastline %>%
slice_max(st_area(.))
p$land_max <- st_distance(p,land_max)
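# (Added note, not in the original answer) st_distance() returns metres here; to get
# the "Dist_to_coast (km)" column from the question, convert the units, for example:
p$dist_km <- units::set_units(p$dist_ucoastline, "km")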
ggplot() +
geom_sf(data = st_simplify(ucoastline,dTolerance = 1000), fill = "lightblue", color = NA) +
geom_sf(data = st_simplify(land_max,dTolerance = 1000), fill = "gray70") +
geom_sf(data = p, shape =4, color="red", size = 5) +
theme_bw()
Result:
# convert coordinates back to WGS84,
# geometries to coordinate columns
bind_cols(
st_transform(p, crs = "WGS84") %>% st_coordinates(),
st_drop_geometry(p)
)
#> # A tibble: 4 × 5
#> X Y ID dist_ucoastline[,1] land_max[,1]
#> <dbl> <dbl> <chr> [m] [m]
#> 1 -2.52 -69.5 a 40742. 180479.
#> 2 -2.60 -69.8 b 39750. 157043.
#> 3 -0.154 -69.5 c 6629. 186878.
#> 4 26.1 -69.6 d 45683. 121500.
Created on 2022-11-23 with reprex v2.0.2

Is there an R function to generate a column of U.S county names based on latitude and longitude? [duplicate]

Is there a fast way to convert latitude and longitude coordinates to State codes in R? I've been using the zipcode package as a look up table but it's too slow when I'm querying lots of lat/long values
If not in R is there any way to do this using google geocoder or any other type of fast querying service?
Thanks!
Here are two options, one using sf and one using sp package functions. sf is the more modern (and, here in 2020, recommended) package for analyzing spatial data, but in case it's still useful, I am leaving my original 2012 answer showing how to do this with sp-related functions.
Method 1 (using sf):
library(sf)
library(spData)
## pointsDF: A data.frame whose first column contains longitudes and
## whose second column contains latitudes.
##
## states: An sf MULTIPOLYGON object with 50 states plus DC.
##
## name_col: Name of a column in `states` that supplies the states'
## names.
lonlat_to_state <- function(pointsDF,
                            states = spData::us_states,
                            name_col = "NAME") {
    ## Convert points data.frame to an sf POINTS object
    pts <- st_as_sf(pointsDF, coords = 1:2, crs = 4326)
    ## Transform spatial data to some planar coordinate system
    ## (e.g. Web Mercator) as required for geometric operations
    states <- st_transform(states, crs = 3857)
    pts <- st_transform(pts, crs = 3857)
    ## Find names of state (if any) intersected by each point
    state_names <- states[[name_col]]
    ii <- as.integer(st_intersects(pts, states))
    state_names[ii]
}
## Test the function with points in Wisconsin, Oregon, and France
testPoints <- data.frame(x = c(-90, -120, 0), y = c(44, 44, 44))
lonlat_to_state(testPoints)
## [1] "Wisconsin" "Oregon" NA
If you need higher resolution state boundaries, read in your own vector data as an sf object using sf::st_read() or by some other means. One nice option is to install the rnaturalearth package and use it to load a state vector layer from rnaturalearthhires. Then use the lonlat_to_state() function we just defined as shown here:
library(rnaturalearth)
us_states_ne <- ne_states(country = "United States of America",
returnclass = "sf")
lonlat_to_state(testPoints, states = us_states_ne, name_col = "name")
## [1] "Wisconsin" "Oregon" NA
For very accurate results, you can download a geopackage containing GADM-maintained administrative borders for the United States from this page. Then, load the state boundary data and use them like this:
USA_gadm <- st_read(dsn = "gadm36_USA.gpkg", layer = "gadm36_USA_1")
lonlat_to_state(testPoints, states = USA_gadm, name_col = "NAME_1")
## [1] "Wisconsin" "Oregon" NA
Method 2 (using sp):
Here is a function that takes a data.frame of lat-longs within the lower 48 states, and for each point, returns the state in which it is located.
Most of the function simply prepares the SpatialPoints and SpatialPolygons objects needed by the over() function in the sp package, which does the real heavy lifting of calculating the 'intersection' of points and polygons:
library(sp)
library(maps)
library(maptools)
# The single argument to this function, pointsDF, is a data.frame in which:
# - column 1 contains the longitude in degrees (negative in the US)
# - column 2 contains the latitude in degrees
lonlat_to_state_sp <- function(pointsDF) {
    # Prepare SpatialPolygons object with one SpatialPolygon
    # per state (plus DC, minus HI & AK)
    states <- map('state', fill = TRUE, col = "transparent", plot = FALSE)
    IDs <- sapply(strsplit(states$names, ":"), function(x) x[1])
    states_sp <- map2SpatialPolygons(states, IDs = IDs,
                                     proj4string = CRS("+proj=longlat +datum=WGS84"))
    # Convert pointsDF to a SpatialPoints object
    pointsSP <- SpatialPoints(pointsDF,
                              proj4string = CRS("+proj=longlat +datum=WGS84"))
    # Use 'over' to get _indices_ of the Polygons object containing each point
    indices <- over(pointsSP, states_sp)
    # Return the state names of the Polygons object containing each point
    stateNames <- sapply(states_sp@polygons, function(x) x@ID)
    stateNames[indices]
}
# Test the function using points in Wisconsin and Oregon.
testPoints <- data.frame(x = c(-90, -120), y = c(44, 44))
lonlat_to_state_sp(testPoints)
[1] "wisconsin" "oregon" # IT WORKS
You can do it in a few lines of R.
library(sp)
library(rgdal)
#lat and long
Lat <- 57.25
Lon <- -9.41
#make a data frame
coords <- as.data.frame(cbind(Lon,Lat))
#and into Spatial
points <- SpatialPoints(coords)
#SpatialPolygonsDataFrame - I'm using a shapefile of UK counties
counties <- readOGR(".", "uk_counties")
#assume same proj as shapefile!
proj4string(points) <- proj4string(counties)
#get county polygon point is in
result <- as.character(over(points, counties)$County_Name)
See ?over in the sp package.
You'll need to have the state boundaries as a SpatialPolygonsDataFrame.
Example data (polygons and points)
library(raster)
pols <- shapefile(system.file("external/lux.shp", package="raster"))
xy <- coordinates(pols)
Use raster::extract
extract(pols, xy)
# point.ID poly.ID ID_1 NAME_1 ID_2 NAME_2 AREA
#1 1 1 1 Diekirch 1 Clervaux 312
#2 2 2 1 Diekirch 2 Diekirch 218
#3 3 3 1 Diekirch 3 Redange 259
#4 4 4 1 Diekirch 4 Vianden 76
#5 5 5 1 Diekirch 5 Wiltz 263
#6 6 6 2 Grevenmacher 6 Echternach 188
#7 7 7 2 Grevenmacher 7 Remich 129
#8 8 8 2 Grevenmacher 12 Grevenmacher 210
#9 9 9 3 Luxembourg 8 Capellen 185
#10 10 10 3 Luxembourg 9 Esch-sur-Alzette 251
#11 11 11 3 Luxembourg 10 Luxembourg 237
#12 12 12 3 Luxembourg 11 Mersch 233
It's very straightforward using sf:
library(maps)
library(sf)
## Get the states map, turn into sf object
US <- st_as_sf(map("state", plot = FALSE, fill = TRUE))
## Test the function using points in Wisconsin and Oregon
testPoints <- data.frame(x = c(-90, -120), y = c(44, 44))
# Make it a spatial dataframe, using the same coordinate system as the US spatial dataframe
testPoints <- st_as_sf(testPoints, coords = c("x", "y"), crs = st_crs(US))
#.. and perform a spatial join!
st_join(testPoints, US)
ID geometry
1 wisconsin POINT (-90 44)
2 oregon POINT (-120 44)

How to convert X and Y coordinates into Latitude and longitude?

Following is an example of the data frame I have that was obtained from a publicly available crime data set for St. Louis. The documentation related to the data states that the Xcoord and Ycoord are in
State Plane North American Datum 1983 (NAD83) format
CodedMonth Description XCoord YCoord
1: 2019-09 AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR 908297.3 1018623.0
2: 2019-09 ASSLT-AGGRAV-OTH-WPN-2ND-CHILD-DOMESTIC 903995.7 1014255.0
3: 2019-09 FORGERY-ISSUING FALSE INSTRUMENT OR CERTIFICAT 0.0 0.0
4: 2019-09 STLG BY DECEIT/IDENTITY THEFT REPORT 890704.7 1010659.0
5: 2019-09 STALKING (HARASSMENT ONLY, NO THREAT) 881105.8 1008297.0
6: 2019-09 LARCENY-MTR VEH PARTS UNDER $500 882929.6 992941.3
How do I convert these XCoord and YCoord columns into lon and lat format so that I can plot this using ggmap?
I have found a couple of answers (e.g. "Convert latitude/longitude to state plane coordinates"),
but I can't seem to get it to work for my data.
You can use the sf package to convert it to a simple features (sf) object.
In order to get this to work you need to know which coordinate system you are working with. Based on the description you provide (State Plane NAD83, near St. Louis), my first guess was EPSG 26996 (NAD83 / Missouri East USFT), but that plotted in the middle of Lake Huron, so I tried ESRI:102696. You can look up projections at spatialreference.org.
library(sf)
library(tidyverse)
library(ggmap)
my_df <- read_csv("C:/Users/Brian/Documents/temp.csv")
my_sf_df <- st_as_sf(my_df, coords = c("XCoord", "YCoord"), crs = 102696)
This sets the x and y to spatial coordinates. You need to re-project into a geographic system like WGS84 to convert to lat long. st_transform does this for us using crs = 4326, which is the WGS 84 coordinate system
my_latlon_df <- st_transform(my_sf_df, crs = 4326 )
my_latlon_df <- my_latlon_df %>%
  mutate(lat = st_coordinates(my_latlon_df)[, 2],  # Y = latitude
         lon = st_coordinates(my_latlon_df)[, 1])  # X = longitude
my_latlon_df
# Simple feature collection with 6 features and 5 fields
# geometry type: POINT
# dimension: XY
# bbox: xmin: -93.26566 ymin: 35.80151 xmax: -90.19163 ymax: 38.63065
# epsg (SRID): 4326
# proj4string: +proj=longlat +datum=WGS84 +no_defs
# # A tibble: 6 x 6
# X1 CodedMonth Description geometry lat lon
# * <chr> <chr> <chr> <POINT [°]> <dbl> <dbl>
# 1 1: 2019-09 AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR (-90.19163 38.63065) -82.2 44.7
# 2 2: 2019-09 ASSLT-AGGRAV-OTH-WPN-2ND-CHILD-DOMESTIC (-90.20674 38.6187) -82.3 44.7
# 3 3: 2019-09 FORGERY-ISSUING FALSE INSTRUMENT OR CERTIFICAT (-93.26566 35.80151) -93.3 35.8
# 4 4: 2019-09 STLG BY DECEIT/IDENTITY THEFT REPORT (-90.25329 38.60893) -82.4 44.6
# 5 5: 2019-09 STALKING (HARASSMENT ONLY, NO THREAT) (-90.2869 38.60251) -82.5 44.6
# 6 6: 2019-09 LARCENY-MTR VEH PARTS UNDER $500 (-90.28065 38.56034) -82.5 44.5
We now have geographic coordinates with lat and long as columns of our data frame. The no location information is going to cause problems, since it will plot at the origin of the state plane coordinate plane, which is down in Arkansas somewhere. Let's remove it so we can focus on the good points
# let's exclude point 3 for now
my_latlon_df <- my_latlon_df[-3,]
box <- st_bbox(my_latlon_df) # bounding box
names(box) <- NULL # removing non-complient labels
buffer = .2
box2 <- box + c(-buffer, -buffer, buffer, buffer) # buffering
base_map <- get_map(location = box2, source = "osm") # getting base map
# plotting
ggmap(base_map)+
geom_sf(data = my_latlon_df,
color = "red",
size = 2
)+
scale_x_continuous(limits = c(-90.35, -90.1))+
scale_y_continuous(limits = c(38.5, 38.7))
Unfortunately, if you don't know what coordinate system your x and y points are in, it can become a frustrating game of trial and error. The projected coordinate systems basically create a Cartesian plane on the surface of the globe, and the choice of origin, scale and other parameters are specific to each projection. There isn't nearly as much difference in geographic coordinate systems such as WGS84.
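One small aid in that trial-and-error process (my addition, not part of the original answer) is to print the candidate CRS definitions with sf::st_crs() and check that the linear unit and projection parameters match the data documentation:
# Inspect candidate coordinate reference systems before committing to one
library(sf)
st_crs("ESRI:102696")  # prints the full WKT, including the linear unit and projection parameters
st_crs(26996)          # compare with the other candidate mentioned above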
The correct coordinate reference system is "ESRI:102696", so the code should read:
my_sf_df <- st_as_sf(my_df, coords = c("XCoord", "YCoord"), crs = "ESRI:102696" )
