Calculate longitude/latitude for geographic centroid - r

I want to do some spatial statistical analysis with county-level crop yield data in Nebraska for a STAT class. For that I need the longitude and latitude of the geographic centroid of each county. Does anybody know how to do it in R? I know it can be done in ArcGIS, but I have no access to it now.

You didn't give any details about where you got your shapefile from, but I got one from here, and you can use gCentroid from rgeos thusly:
library(rgdal)  # readOGR()
library(sp)
library(rgeos)  # gCentroid()
nebraska <- readOGR("CountyBoundsUTM/", "CountyUTM")  # read the county shapefile
gCentroid(nebraska, byid=TRUE)  # one centroid per county
## SpatialPoints:
## x y
## 0 721768.5 4636738
## 1 430938.8 4524651
## 2 698036.4 4566570
## 3 370970.6 4641340
## ...
## 89 623301.6 4603228
## 90 618883.0 4486931
## 91 439295.3 4582756
## 92 493680.8 4522680
## Coordinate Reference System (CRS) arguments: +proj=utm +zone=14 +datum=NAD83
## +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0
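Since the question asks for longitude/latitude, you can transform the UTM centroids back to long-lat afterwards. A minimal sketch, reusing the objects above (rgdal must be loaded for spTransform):
cents <- gCentroid(nebraska, byid=TRUE)
cents_ll <- spTransform(cents, CRS("+proj=longlat +datum=WGS84"))
head(coordinates(cents_ll))  # longitude, latitude per county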

You can also extract centroids of SpatialPolygons* objects with coordinates(), though the centroids won't be returned as SpatialPoints as they are with rgeos::gCentroid.
For example:
library(rgdal)
f <- tempfile()
download.file('http://dds.cr.usgs.gov/pub/data/nationalatlas/countyp020_nt00009.tar.gz', f)  # ~ 4.5 Mb
untar(f, exdir=tempdir())
counties <- readOGR(tempdir(), 'countyp020')
xy <- coordinates(counties)
head(xy)
# [,1] [,2]
# 0 -153.3905 69.30193
# 1 -156.0582 71.33094
# 2 -155.6695 71.24763
# 3 -155.5164 71.23148
# 4 -155.1846 71.18189
# 5 -155.6126 71.00725
Note that, as pointed out by @Spacedman in the comments, the polygons should be projected to a planar coordinate system first.
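For example, a minimal sketch of projecting before taking coordinates (the Albers parameters for the lower 48 are an illustrative assumption; pick whatever planar CRS suits your data):
# project from long-lat to a planar (Albers equal-area) CRS before taking centroids
counties_aea <- spTransform(counties, CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +datum=NAD83"))
xy_planar <- coordinates(counties_aea)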

Recent versions of the sf package (since version 1.0.0) use the S2 library from Google for spherical geometry calculations. The advantage is that centroid calculations are no longer simply planar. The relevant function is st_centroid(). An example for a region with a significant spatial extent:
library(rnaturalearth)
# ne_countries() returns 'sp'-type data by default
nc <- ne_countries(continent = "Asia", returnclass = "sf")
library(sf)
# long-lat data in WGS84
st_crs(nc)
# use st_geometry() to plot only the polygons and not the associated data
plot(st_geometry(nc), axes = T)
plot(st_centroid(st_geometry(nc)), pch = "+", col = "red", add = T)
# 'sf' integrates nicely with 'ggplot2':
library(ggplot2)
ggplot(nc) + geom_sf() +
geom_sf(aes(geometry = st_centroid(st_geometry(nc))), colour = "red")

You can use the map_data() function from the ggplot2 package to extract the US county map data from the maps package into a data frame. Then you can calculate the midpoints of the ranges of the lat/lon columns by county (or whatever method you want to use to define the geographic center).
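A minimal sketch of that approach (the midpoint-of-range definition of "center" is the assumption here):
library(ggplot2)  # map_data()
library(dplyr)
ne_counties <- map_data("county", region = "nebraska")
centers <- ne_counties %>%
  group_by(subregion) %>%
  summarize(lon = mean(range(long)), lat = mean(range(lat)))
head(centers)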

Related

Convert SpatialPolygonsDataFrame to projected coordinates using spTransform

I'm trying to do a point pattern analysis. To do this I have to convert a SpatialPolygonsDataFrame so that it contains projected coordinates instead of curved coordinates. However, I keep getting the same error:
Error in as.owin.SpatialPolygons(Netherlands_flat) :
Only projected coordinates may be converted to spatstat class objects
this is the data I used for a border:
download.file("http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_1_states_provinces.zip",destfile="ne_10m_admin_1_states_provinces.zip")
unzip("ne_10m_admin_1_states_provinces.zip",exdir="NaturalEarth")
border <- shapefile("NaturalEarth/ne_10m_admin_1_states_provinces.shp")
#extract the border of the Netherlands
Netherlands <- border[paste(border$iso_a2)=="NL",]
I'm able to plot the Netherlands with the events.
#Plot
plot(babesia$Longitude, babesia$Latitude, pch="+",cex=0.5, xlim=c(3.360782, 7.227095), ylim = c(50.723492, 53.554584))
plot(Netherlands, add = T)
Netherlands with events
But upon using the spatstat package I keep running into the error above.
I tried this code to transform the coordinates:
coord_netherlands <- coordinates(Netherlands)
proj4string(Netherlands)
summary(Netherlands)
Netherlands_flat <- spTransform(coord_netherlands, CRS("+proj=longlat +datum=WGS84 +no_defs"))
Netherlands <- as.owin(Netherlands_flat)
Error in as.owin.SpatialPolygons(Netherlands_flat) :
Only projected coordinates may be converted to spatstat class objects
Does anyone know how to solve this? Thank you very much in advance!
You are almost there. You just need to project to another coordinate system when you call spTransform. You currently ask for geographic coordinates on a spheroid model of the earth (long, lat). Instead you should ask for a flat (x, y) coordinate system. This could be UTM coordinates in the appropriate zone for the Netherlands, or there might well be a better alternative. Your events also need to be transformed from (long, lat) to the same coordinate system. Maybe you can look at the shapefile vignette of the spatstat package for an example, or look under the spatstat tag on this site. I'm on my phone so I can't give detailed help.
Good luck.
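For the polygon itself, a minimal sketch (assuming the Netherlands object from your question; UTM zone 31N is a reasonable planar CRS for the Netherlands):
library(sp)
library(rgdal)      # spTransform() methods
library(spatstat)
library(maptools)   # supplies the as.owin() method for sp polygons
Netherlands_flat <- spTransform(Netherlands, CRS("+proj=utm +zone=31 +datum=WGS84"))
Netherlands_owin <- as.owin(Netherlands_flat)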
If your events are in a data.frame called xy you can project to UTM zone 31N like this:
xy <- data.frame(lon = 1:2, lat = 1:2)
coordinates(xy) <- ~lon+lat
proj4string(xy) <- CRS("+proj=longlat +datum=WGS84")
xy
# SpatialPoints:
# lon lat
# [1,] 1 1
# [2,] 2 2
# Coordinate Reference System (CRS) arguments: +proj=longlat +datum=WGS84
# +ellps=WGS84 +towgs84=0,0,0
events.utm <- spTransform(xy, CRS("+proj=utm +zone=31 +datum=WGS84"))
events.utm
# SpatialPoints:
# lon lat
# [1,] 277438.3 110598.0
# [2,] 388786.7 221094.9
# Coordinate Reference System (CRS) arguments: +proj=utm +zone=31
# +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
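To finish the point pattern setup, a sketch tying the transformed events to the window from the earlier snippet (the object names are assumptions carried over from above):
library(spatstat)
xy.utm <- coordinates(events.utm)
# points outside the window are dropped with a warning
X <- ppp(xy.utm[,1], xy.utm[,2], window = Netherlands_owin)
plot(X)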

create density raster and extract sum by polygon feature

I have a polygon (zones) and a set of coordinates (points). I'd like to create a spatial kernel density raster for the entire polygon and extract the sum of the density by zone. Points outside of the polygon should be discarded.
library(raster)
library(tidyverse)
library(sf)
library(spatstat)
library(maptools)
load(url("https://www.dropbox.com/s/iv1s5butsx2v01r/example.RData?dl=1"))
# alternatively, links to gists for each object
# https://gist.github.com/ericpgreen/d80665d22dfa1c05607e75b8d2163b84
# https://gist.github.com/ericpgreen/7f4d3cee3eb5efed5486f7f713306e96
ggplot() +
geom_sf(data = zones) +
geom_sf(data = points) +
theme_minimal()
I tried converting to ppp with {spatstat} and then using density(), but I'm confused by the units in the result. I believe the problem is related to the units of the map, but I'm not sure how to proceed.
Update
Here's the code to reproduce the density map I created:
zones_owin <- as.owin(as_Spatial(zones))
pts <- st_coordinates(points)
p <- ppp(pts[,1], pts[,2], window=zones_owin, unitname=c("metre","metres"))
ds <- density(p)
r <- raster(ds)
plot(r)
Units are difficult when you work directly with geographic coordinates (lon, lat). If possible you should convert to planar coordinates (which is a requirement for spatstat) and proceed from there. The planar coordinates would typically be in units of meters, but I guess it depends on the specific projection and underlying ellipsoid etc. You can see this answer for how to project to planar coordinates with sf and export to spatstat format using maptools. Note: You have to manually choose a sensible projection (you can use http://epsg.io to find one) and you have to project both the polygon and the points.
Once everything is in spatstat format you can use density.ppp to do kernel smoothing. The resulting grid values (object of class im) are intensities of points, i.e., number of points per square unit (e.g. square meter). If you want to aggregate over some region you can use integral.im(..., domain = ...) to get the expected number of points in this region for a point process model with the given intensity.
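A minimal sketch using the packages your question already loads (EPSG:32736, UTM zone 36S, is an assumption based on your data's bounding box near lon 34, lat -1; verify a suitable CRS on epsg.io):
zones_utm <- st_transform(zones, 32736)
points_utm <- st_transform(points, 32736)
zones_owin <- as.owin(as_Spatial(zones_utm))
pts <- st_coordinates(points_utm)
# points outside the window are dropped with a warning
p <- ppp(pts[,1], pts[,2], window = zones_owin, unitname = c("metre", "metres"))
ds <- density(p)  # intensity: expected points per square metre
# expected number of points within the first zone polygon
integral.im(ds, domain = as.owin(as_Spatial(zones_utm[1, ])))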
I'm not sure if this answers all of your question, but it should be a good start. Clarify in a comment or in your question should you need a different type of output.
The code below removes all points that are not inside one of the 'zone' polygons, counts them by zone, and plots the zones colored by the number of points that fall within them.
library(raster)
library(tidyverse)
library(sf)
#> Linking to GEOS 3.6.2, GDAL 2.2.3, PROJ 4.9.3
library(spatstat)
library(maptools)
#> Checking rgeos availability: TRUE
load(url("https://www.dropbox.com/s/iv1s5butsx2v01r/example.RData?dl=1"))
# alternatively, links to gists for each object
# https://gist.github.com/ericpgreen/d80665d22dfa1c05607e75b8d2163b84
# https://gist.github.com/ericpgreen/7f4d3cee3eb5efed5486f7f713306e96
p1 <- ggplot() +
geom_sf(data = zones) +
geom_sf(data = points) +
theme_minimal()
#Remove points outside of zones
points_inside <- st_intersection(points, zones)
#> although coordinates are longitude/latitude, st_intersection assumes that they are planar
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries
nrow(points)
#> [1] 308
nrow(points_inside)
#> [1] 201
p2 <- ggplot() +
geom_sf(data = zones) +
geom_sf(data = points_inside)
points_per_zone <- st_join(zones, points_inside) %>%
count(LocationID.x)
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar
p3 <- ggplot() +
geom_sf(data = points_per_zone,
aes(fill = n)) +
scale_fill_viridis_c(option = 'C')
points_per_zone
#> Simple feature collection with 4 features and 2 fields
#> geometry type: POLYGON
#> dimension: XY
#> bbox: xmin: 34.0401 ymin: -1.076718 xmax: 34.17818 ymax: -0.9755066
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +ellps=WGS84 +no_defs
#> # A tibble: 4 x 3
#> LocationID.x n geometry
#> * <dbl> <int> <POLYGON [°]>
#> 1 10 129 ((34.08018 -0.9755066, 34.0803 -0.9757393, 34.08046 -0.975…
#> 2 20 19 ((34.05622 -0.9959458, 34.05642 -0.9960835, 34.05665 -0.99…
#> 3 30 29 ((34.12994 -1.026372, 34.12994 -1.026512, 34.12988 -1.0266…
#> 4 40 24 ((34.11962 -1.001829, 34.11956 -1.002018, 34.11966 -1.0020…
cowplot::plot_grid(p1, p2, p3, nrow = 2, ncol = 2)
It seems I underestimated the difficulty of your problem. Is something like the plot below (& underlying data) what you're looking for?
It uses raster with a ~50x50 grid and raster::focal with a 9x9 window, using the mean to interpolate the data.
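A rough sketch of that interpolation (the grid size and window are the assumptions just stated; points_inside and zones come from the earlier snippet):
library(raster)
library(sf)
# empty raster over the zones' extent, then count points per cell
r <- raster(extent(as_Spatial(zones)), ncols = 50, nrows = 50)
counts <- rasterize(as_Spatial(points_inside), r, field = 1, fun = "count", background = 0)
# smooth with a 9x9 moving-average window
smooth <- focal(counts, w = matrix(1, 9, 9), fun = mean, pad = TRUE, padValue = 0)
plot(smooth)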

How can I get correct distances (in meters) from st_distance (sf package) in R using a projection?

I want to calculate the distance between two points. I know there are several ways to do it in R (see here for one example); I thought it would be best to use the st_distance function from the sf package, but when I use a projection different from WGS84 (crs = 4326), I get the distances in decimal degrees and not in meters.
However, when I set the projection to crs = 32718, I get the distance in decimal degrees. Is there a way to convert this to meters (or to get meters in the first place)? What I don't understand is why, when I set the projection to crs = 4326, I do get the distance in meters.
I included a reproducible example:
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
library(tidyverse)
library(maptools)
#> Loading required package: sp
#> Checking rgeos availability: TRUE
crs <- CRS("+init=epsg:32718")
df <- tibble::tribble(
~documento, ~cod_mod, ~nlat_ie, ~nlong_ie,
"00004612", 238840, -8.37661, -74.53749,
"00027439", 238758, -8.47195, -74.80497,
"00074909", 502518, -8.83271, -75.21418,
"00074909", 612663, -8.82781, -75.05055,
"00074909", 612812, -8.64173, -74.96442,
"00102408", 237255, -13.4924, -72.9337,
"00102408", 283341, -13.5317, -73.6769,
"00109023", 238717, -9.03639, -75.50947,
"00109023", 238840, -8.37661, -74.53749,
"00109023", 1122464, -8.37855, -74.57039,
"00124708", 238717, -9.03639, -75.50947,
"00124708", 238840, -8.37661, -74.53749,
"00124708", 1122464, -8.37855, -74.57039,
"00186987", 612663, -8.82781, -75.05055,
"00186987", 1121383, -8.36195, -74.57805,
"00237970", 327379, -3.55858, -80.45579,
"00238125", 1137678, -3.6532, -80.4266,
"00238125", 1143577, -3.50163, -80.27616,
"00239334", 1143577, -3.50163, -80.27616,
"00239334", 1372333, -3.6914, -80.2521
)
df_spatial <- df
coordinates(df_spatial) <- c("nlong_ie", "nlat_ie")
proj4string(df_spatial) <- crs
# Now we create a spatial dataframe with coordinates in the average location of each documento
df_mean_location <- df %>%
group_by(documento) %>%
summarize(
mean_long = mean(nlong_ie),
mean_lat = mean(nlat_ie)
)
df_mean_location_spatial <- df_mean_location
coordinates(df_mean_location_spatial) <- c("mean_long", "mean_lat")
proj4string(df_mean_location_spatial) <- crs
df_spatial_st <- st_as_sf(df_spatial)
df_mean_location_spatial_st <- st_as_sf(df_mean_location_spatial)
distancias1 <- st_distance(df_spatial_st, df_mean_location_spatial_st, by_element = TRUE)
distancias1
#> Units: [m]
#> [1] 0.00000000 0.00000000 0.15248325 4.99880005 0.10219044 5.26515886
#> [7] 5.06614947 7.38054767 7.53880558 7.43549151 1.17475732 0.28396349
#> [13] 0.63815871 4.99880005 0.37683694 7.52071866 7.47784143 0.18844161
#> [19] 0.10677741 0.09564457
When I change the crs <- CRS("+init=epsg:4326"), I do get the correct results (in meters):
[1] 0.00 0.00 16792.18 552085.93 11258.44 581428.01 560043.61 816269.42 834131.40 822686.13 129481.67 31286.98 70373.13 552085.93
[15] 41565.46 832000.85 827230.50 20928.56 11835.41 10577.04
EPSG:32718 is a Cartesian coordinate reference system in metres. By assigning that CRS to a data set, you are saying "these numbers are metres, and the origin is not at (0,0) degrees (where the equator meets the Greenwich meridian) but at the origin of zone 18 of the UTM system". So you get a distance in metres.
EPSG:4326 is a lat-long reference system on a particular ellipsoidal model of the earth. The coordinates are lat-long degrees. st_distance spots this and works out the great-circle distance between points based on the ellipsoid. If you want the distance in decimal degrees, assign an NA CRS and you'll get unitless distances, which are the Pythagorean distances in lat-long (and so very wrong in real terms near the poles, for example).
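For your data, the fix is to declare the coordinates as what they actually are (long-lat, EPSG:4326) and then transform to EPSG:32718 rather than merely assigning it. A sketch, reusing df from your example:
library(sf)
df_sf <- st_as_sf(df, coords = c("nlong_ie", "nlat_ie"), crs = 4326)
df_utm <- st_transform(df_sf, 32718)
st_distance(df_sf[1, ], df_sf[4, ])   # great-circle distance in metres
st_distance(df_utm[1, ], df_utm[4, ]) # planar distance in metres, which should agree closely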

how to merge a shapefile with a dataframe with latitude/longitude data

I am struggling with the following issue.
I have downloaded the PLUTO NYC Manhattan Shapefile for the NYC tax lots from https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page
I am able to read them into sf with a simple st_read:
> mydf
Simple feature collection with 42638 features and 90 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 971045.3 ymin: 188447.4 xmax: 1010027 ymax: 259571.5
epsg (SRID): NA
proj4string: +proj=lcc +lat_1=40.66666666666666 +lat_2=41.03333333333333 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000 +y_0=0 +datum=NAD83 +units=us-ft +no_defs
First 10 features:
Borough Block Lot CD CT2010 CB2010 SchoolDist Council ZipCode FireComp PolicePrct HealthCent HealthArea
1 MN 1545 52 108 138 4000 02 5 10028 E022 19 13 3700
My problem is the following: I have a dataframe as follows
> data_frame('lat' = c(40.785091,40.785091), 'lon' = c(-73.968285, -73.968285))
# A tibble: 2 x 2
lat lon
<dbl> <dbl>
1 40.785091 -73.968285
2 40.785091 -73.968285
I would like to merge this data to the mydf dataframe above, so that I can count how many latitude/longitude observations I have within each tax lot (remember, mydf is at the tax lot granularity), and plot the corresponding map of it. I need to do so using sf.
In essence something similar to
pol <- mydf %>% select(SchoolDist)
plot(pol)
but where the counts for each tax lot come from counting how many points in my latitude/longitude dataframe fall into them.
Of course, in my small example I just have 2 points in the same tax lot, so that would just highlight one single tax lot in the whole area. My real data contains a lot more points.
I think there is an easy way to do it, but I was not able to find it.
Thanks!
This is how I would do it with arbitrary polygon and point data. I wouldn't merge the two; instead, just use a geometry predicate to get the counts that you want. Here we:
Use the built-in nc dataset and transform to the 3857 CRS, which is projected rather than lat-long (this avoids a warning in st_contains).
Create 1000 random points within the bounding box of nc, using st_bbox and runif. Note that st_as_sf can turn a data.frame with lat-long columns into sf points.
Use lengths(st_contains(polygons, points)) to get the counts of points per polygon. sgbp objects created by a geometry predicate are basically "for each geometry in sf x, which indices of geometries in sf y satisfy the predicate", so lengths() effectively gives the number of points that satisfy the predicate for each geometry, in this case the number of points contained within each polygon.
Once the counts are in the sf object as a column, we can just select and plot them with the plot.sf method.
For your data, simply replace nc with mydf and leave out the call to tibble; instead use your data.frame with the right lat-long pairs.
library(tidyverse)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, proj.4 4.9.3
nc <- system.file("shape/nc.shp", package="sf") %>%
read_sf() %>%
st_transform(3857)
set.seed(1000)
points <- tibble(
x = runif(1000, min = st_bbox(nc)[1], max = st_bbox(nc)[3]),
y = runif(1000, min = st_bbox(nc)[2], max = st_bbox(nc)[4])
) %>%
st_as_sf(coords = c("x", "y"), crs = 3857)
plot(nc$geometry)
plot(points$geometry, add = TRUE)
nc %>%
mutate(pt_count = lengths(st_contains(nc, points))) %>%
select(pt_count) %>%
plot()
Created on 2018-05-02 by the reprex package (v0.2.0).
I tried this on your data, but the intersection is empty for both sets of points you provided. However, the code should work.
EDIT: Simplified group_by + mutate with add_count:
mydf = st_read("MN_Dcp_Mappinglot.shp")
xydf = data.frame(lat=c(40.758896,40.758896), lon=c(-73.985130, -73.985130))
xysf = st_as_sf(xydf, coords=c('lon', 'lat'), crs=st_crs(mydf))
## NB: make sure to st_transform both to common CRS, as Calum You suggests
xysf %>%
sf::st_intersection(mydf) %>%
dplyr::add_count(LOT)
Reproducible example:
nc = sf::st_read(system.file("shape/nc.shp", package="sf"))
ncxy = sf::st_as_sf(data.frame(lon=c(-80, -80.1, -82), lat=c(35.5, 35.5, 35.5)),
coords=c('lon', 'lat'), crs=st_crs(nc))
ncxy = ncxy %>%
sf::st_intersection(nc) %>%
dplyr::add_count(FIPS)
## a better approach
ncxy = ncxy %>%
sf::st_join(nc, join=st_intersects) %>%
dplyr::add_count(FIPS)
The new column n includes the total number of points per FIPS code.
ncxy %>% dplyr::group_by(FIPS) %>% dplyr::distinct(n)
> although coordinates are longitude/latitude, st_intersects assumes that they are planar
# A tibble: 2 x 2
# Groups: FIPS [2]
FIPS n
<fctr> <int>
1 37123 2
2 37161 1
I'm not sure why your data results in an empty intersection, but since the code works on the example above there must be a separate issue.
HT: st_join approach from this answer.

Clip spatial polygon by world map in R

This is my first time doing any sort of spatial data visualization in R, and I'm stuck on a particular issue. I would like to clip a spatial polygon (specified by a series of lat/long coordinates) according to a world map, such that any part of the polygon which overlaps with a map polygon is removed. Using what I have in the below code as an example, I want to clip the rectangular spatial polygon so that only oceanic portions of the polygon remain.
I've found examples of how to retain the intersection between two spatial polygons, but I want to do the opposite. Perhaps there is a way to define the intersection, then "subtract" that from the polygon I wish to clip?
This might be a really basic question, but any tips will be appreciated!
Specify lat/long data:
x_coord <- c(25, 25, 275, 275)
y_coord <- c(20, -50, -50, 20)
xy.mat <- cbind(x_coord, y_coord)
xy.mat
Convert to spatial polygons object:
library(sp)
poly = Polygon(xy.mat)
polys = Polygons(list(poly),1)
spatial.polys = SpatialPolygons(list(polys))
proj4string(spatial.polys) = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")
Convert to spatial polygons data frame and export as shapefile:
df = data.frame(f=99.9)
spatial.polys.df = SpatialPolygonsDataFrame(spatial.polys, df)
spatial.polys.df
library(GISTools)
library(rgdal)
writeOGR(obj=spatial.polys.df, dsn="tmp", layer="polygon",
driver="ESRI Shapefile")
Plot the world map and add the shapefile:
library(maps)
map("world", wrap=c(0,360), resolution=0, ylim=c(-60,60))
map.axes()
shp <- readOGR(dsn="tmp", layer="polygon")
plot(shp, add=TRUE, col="blue", border=FALSE)
Here is a solution that stays in sf the entire time (I don't know sp) and illustrates constructing an sf object from scratch. st_difference() creates exactly the geometry you want, and then plotting can be done with the base plot method or the development version of ggplot2, which has geom_sf. I used map data from maps and rnaturalearth for this; you can adapt to your particular situation. Wrapping around the dateline is a little finicky regardless, unfortunately.
library(tidyverse)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.0, proj.4 4.9.3
library(rnaturalearth)
library(maps)
#>
#> Attaching package: 'maps'
#> The following object is masked from 'package:purrr':
#>
#> map
x_coord <- c(25, 25, 275, 275)
y_coord <- c(20, -50, -50, 20)
polygon <- cbind(x_coord, y_coord) %>%
st_linestring() %>%
st_cast("POLYGON") %>%
st_sfc(crs = 4326, check_ring_dir = TRUE) %>%
st_sf() %>%
st_wrap_dateline(options = c("WRAPDATELINE=YES", "DATELINEOFFSET=180"))
land <- rnaturalearth::ne_countries(returnclass = "sf") %>%
st_union()
ocean <- st_difference(polygon, land)
#> although coordinates are longitude/latitude, st_difference assumes that they are planar
plot(st_geometry(land))
plot(st_geometry(polygon), add = TRUE)
plot(st_geometry(ocean), add = TRUE, col = "blue")
ggplot() +
theme_bw() +
borders("world") +
geom_sf(data = ocean)
Created on 2018-03-13 by the reprex package (v0.2.0).
If I understand correctly what you want, you can do it with the sf package using st_difference() and st_union().
Based on your code, here is what you can do:
# world data
data("wrld_simpl", package = 'maptools')
# load sf package
library('sf')
# coerce sp object to sf
world <- st_as_sf(wrld_simpl)
rectangle <- st_as_sf(spatial.polys)
# difference between world polygons and the rectangle
difference <- st_difference(rectangle, st_union(world))
# coerce back to sp
difference <- as(difference, 'Spatial')
# plot the result
plot(difference)
