The following is an example of the data frame I have, obtained from a publicly available crime data set for St. Louis. The documentation for the data states that XCoord and YCoord are in State Plane North American Datum 1983 (NAD83) format.
   CodedMonth Description                                    XCoord   YCoord
1: 2019-09    AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR        908297.3 1018623.0
2: 2019-09    ASSLT-AGGRAV-OTH-WPN-2ND-CHILD-DOMESTIC        903995.7 1014255.0
3: 2019-09    FORGERY-ISSUING FALSE INSTRUMENT OR CERTIFICAT 0.0      0.0
4: 2019-09    STLG BY DECEIT/IDENTITY THEFT REPORT           890704.7 1010659.0
5: 2019-09    STALKING (HARASSMENT ONLY, NO THREAT)          881105.8 1008297.0
6: 2019-09    LARCENY-MTR VEH PARTS UNDER $500               882929.6 992941.3
How do I convert the XCoord and YCoord columns into lon and lat format so that I can plot the points using ggmap?
I have found a couple of answers, such as "Convert latitude/longitude to state plane coordinates",
but I can't seem to get them to work for my data.
You can use the sf package to convert it to a simple features geography.
To get this to work you need to know which coordinate system you are working with. Based on the description you provide (State Plane NAD83, near St. Louis), my first guess was EPSG:26996 (NAD83 / Missouri East), but that plotted in the middle of Lake Huron. EPSG:26996 is defined in metres, and these coordinates appear to be in US survey feet, so I tried ESRI:102696, the US-feet definition of the same zone. You can look up projections at spatialreference.org.
library(sf)
library(tidyverse)
library(ggmap)
my_df <- read_csv("C:/Users/Brian/Documents/temp.csv")
# the "ESRI:" prefix is required; a bare 102696 would be treated as an EPSG code
my_sf_df <- st_as_sf(my_df, coords = c("XCoord", "YCoord"), crs = "ESRI:102696")
This declares the x and y values as spatial coordinates in that CRS. To get latitude and longitude you need to re-project into a geographic system such as WGS 84. st_transform() does this for us using crs = 4326, which is the WGS 84 coordinate system.
my_latlon_df <- st_transform(my_sf_df, crs = 4326 )
my_latlon_df <- my_latlon_df %>%
  mutate(lon = st_coordinates(my_latlon_df)[, 1],  # X is longitude
         lat = st_coordinates(my_latlon_df)[, 2])  # Y is latitude
my_latlon_df
# Simple feature collection with 6 features and 5 fields
# geometry type: POINT
# dimension: XY
# bbox: xmin: -93.26566 ymin: 35.80151 xmax: -90.19163 ymax: 38.63065
# epsg (SRID): 4326
# proj4string: +proj=longlat +datum=WGS84 +no_defs
# # A tibble: 6 x 6
#   X1    CodedMonth Description                                    geometry               lon   lat
# * <chr> <chr>      <chr>                                          <POINT [°]>          <dbl> <dbl>
# 1 1:    2019-09    AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR        (-90.19163 38.63065) -90.2  38.6
# 2 2:    2019-09    ASSLT-AGGRAV-OTH-WPN-2ND-CHILD-DOMESTIC        (-90.20674 38.6187)  -90.2  38.6
# 3 3:    2019-09    FORGERY-ISSUING FALSE INSTRUMENT OR CERTIFICAT (-93.26566 35.80151) -93.3  35.8
# 4 4:    2019-09    STLG BY DECEIT/IDENTITY THEFT REPORT           (-90.25329 38.60893) -90.3  38.6
# 5 5:    2019-09    STALKING (HARASSMENT ONLY, NO THREAT)          (-90.2869 38.60251)  -90.3  38.6
# 6 6:    2019-09    LARCENY-MTR VEH PARTS UNDER $500               (-90.28065 38.56034) -90.3  38.6
We now have geographic coordinates, with lon and lat as columns of our data frame. The record with no location information (row 3, with coordinates 0, 0) is going to cause problems, since it plots at the origin of the state plane coordinate system, which is down in Arkansas somewhere. Let's remove it so we can focus on the good points.
# let's exclude point 3 for now
my_latlon_df <- my_latlon_df[-3,]
box <- st_bbox(my_latlon_df) # bounding box
names(box) <- NULL # removing non-compliant labels
buffer <- 0.2 # padding around the points, in degrees
box2 <- box + c(-buffer, -buffer, buffer, buffer) # buffering
base_map <- get_map(location = box2, source = "osm") # getting base map
# plotting
ggmap(base_map)+
geom_sf(data = my_latlon_df,
color = "red",
size = 2
)+
scale_x_continuous(limits = c(-90.35, -90.1))+
scale_y_continuous(limits = c(38.5, 38.7))
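Dropping row 3 by position only works for this six-row sample. Here is a minimal sketch of filtering by value instead, before converting. It assumes that a missing location is always coded as 0, 0 (my assumption, not something the data documentation states); the three rows are copied from the sample above.

```r
library(sf)

# Three rows copied from the sample above; (0, 0) marks a missing location.
my_df <- data.frame(
  XCoord = c(908297.3, 0.0, 890704.7),
  YCoord = c(1018623.0, 0.0, 1010659.0)
)

# Keep only rows that carry a real location (assumption: 0/0 means "missing").
has_location <- my_df$XCoord != 0 & my_df$YCoord != 0
my_df_clean <- my_df[has_location, ]

# Declare the state plane CRS, then re-project to WGS 84.
my_sf <- st_as_sf(my_df_clean, coords = c("XCoord", "YCoord"), crs = "ESRI:102696")
my_ll <- st_transform(my_sf, 4326)

# st_coordinates() returns X (longitude) first, then Y (latitude).
xy <- st_coordinates(my_ll)
my_ll$lon <- xy[, "X"]
my_ll$lat <- xy[, "Y"]
my_ll
```

Referring to the coordinate columns by name ("X", "Y") rather than by position makes it harder to swap lon and lat by accident.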
Unfortunately, if you don't know what coordinate system your x and y points are in, finding it can become a frustrating game of trial and error. A projected coordinate system basically creates a Cartesian plane on the surface of the globe, and the choice of origin, scale, units and other parameters is specific to each projection. There isn't nearly as much difference between geographic coordinate systems such as WGS84.
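The trial and error can be made a little less painful by testing candidate CRSs against a place you know the data should land. This is only a sketch: the candidate list and the rough lon/lat box around St. Louis are my own assumptions, and `lands_in_box()` is a hypothetical helper, not an sf function.

```r
library(sf)

# One known point from the data set (row 1 of the sample above).
pt <- data.frame(x = 908297.3, y = 1018623.0)

# Candidate CRSs to try (assumed list for illustration).
candidates <- c("EPSG:26996", "ESRI:102696")

# Hypothetical helper: does the point land in a rough box around St. Louis
# once it is interpreted in `crs` and re-projected to WGS 84?
lands_in_box <- function(crs) {
  p <- st_transform(st_as_sf(pt, coords = c("x", "y"), crs = crs), 4326)
  xy <- unname(st_coordinates(p)[1, ])
  xy[1] > -91 && xy[1] < -90 && xy[2] > 38 && xy[2] < 39
}

results <- sapply(candidates, lands_in_box)
results
```

A candidate that returns FALSE here is not necessarily the wrong zone; it may be the right projection in the wrong units, which is what happens with the metre-based code above.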
The CRS needs its authority prefix: the correct code is "ESRI:102696" (a bare 102696 is treated as an EPSG code, and no such EPSG code exists), so the st_as_sf() call should read:
my_sf_df <- st_as_sf(my_df, coords = c("XCoord", "YCoord"), crs = "ESRI:102696" )
i am trying to figure out how to use st_intersects() to test whether or not point data that i have falls inside the geometries of some map data i have.
data i'm working with: https://osfm.fire.ca.gov/media/5818/fhszs19sn.zip
other data i'm working with too: https://osfm.fire.ca.gov/media/7564/c19fhszl06_5.zip
for now, i'm just trying to see if this data falls in the polygons of the above shapefile:
la_test_points <- data.frame(y = runif(1000, 33.6, 34.8), x = runif(1000, -119, -117.6))
when i put my map data and point data together, this is what it looks like:
so far, so good. now i attempt to test point/geometry intersections. as the figure suggests, i should be able to get quite a few.
# changing coordinate system of map created by shape file
la_fire_sra <- st_transform(st_as_sf(la_fire_sra), crs = 3857)
# merging test points with map data
la_test_points_merged <- st_as_sf(la_test_points, coords = c('y', 'x'), crs = st_crs(la_fire_sra))
# seeing if points fall within any of the geometries in the shapefile
la_test_points_merged <- la_test_points_merged %>%
mutate(intersection = st_intersects(geometry, la_fire_sra))
that last bit is where it all goes wrong. rstudio doesn't throw an error, but when i print la_test_points_merged to see my results, this is what i see:
> la_test_points_merged
Simple feature collection with 1000 features and 1 field
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 33.60155 ymin: -118.9959 xmax: 34.79907 ymax: -117.6015
Projected CRS: WGS 84 / Pseudo-Mercator
First 10 features:
Error in xj[i, , drop = FALSE] : incorrect number of dimensions
the last line above is in red.
when i try using st_intersection() instead of st_intersects(), i get a different error:
> la_test_points_merged <- la_test_points_merged %>%
+ mutate(intersection = st_intersection(geometry, la_fire_sra))
Error in `stopifnot()`:
! Problem while computing `intersection = st_intersection(geometry, la_fire_sra)`.
x `intersection` must be size 1000 or 1, not 0.
Run `rlang::last_error()` to see where the error occurred.
i would like to end up with a result that tells me whether or not each of the points in la_test_points is contained by any of the geometry values in la_fire_sra.
how can i fix this to make my code work? i have looked at lots of other similar questions, but i can't seem to find any answers that apply to my current situation.
thanks in advance for any help.
You can join the points to the shapefile, and the result will show you the fire hazard for each point that falls within a polygon. The default for an st_join is st_intersects, but you can change it if you'd like.
Below I've used one of the shapefiles you linked. If you need to use both you can combine them for a single dataframe with all the polygons. Looks like they have different columns though, so some cleaning might be needed.
library(tidyverse)
library(sf)
set.seed(3) #to make la_test_points reproducible
a <- read_sf('fhszs06_3_19.shp')
# Create synthetic data, make it an sf object, and set the crs
la_test_points <- data.frame(y = runif(1000, 33.6, 34.8), x = runif(1000, -119, -117.6)) %>%
st_as_sf(coords = c('x','y')) %>%
st_set_crs(4326) %>%
st_transform(st_crs(a))
# join the points with the fire hazard area
joined <- st_join(la_test_points, a)
# the sf dataframe, lots of NA's so they're removed for a look:
joined %>% filter(!is.na(HAZ_CODE)) %>% head()
#> Simple feature collection with 6 features and 5 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 125951 ymin: -433789.6 xmax: 177186.8 ymax: -369094
#> Projected CRS: NAD_1983_Albers
#> SRA HAZ_CODE HAZ_CLASS Shape_Leng Shape_Area geometry
#> 1 SRA 3 Very High 613618.0 686671532 POINT (163249.3 -395328.4)
#> 2 SRA 3 Very High 250826.8 233414399 POINT (127980.6 -433789.6)
#> 3 SRA 3 Very High 613618.0 686671532 POINT (167675.9 -386506.6)
#> 4 SRA 3 Very High 391522.6 297194108 POINT (143421.2 -369094)
#> 5 SRA 2 High 208122.8 211364977 POINT (177186.8 -388738.9)
#> 6 SRA 3 Very High 613618.0 686671532 POINT (125951 -399105.6)
# Plotting points, colored according to fire hazard code
ggplot() +
geom_sf(data = a) +
geom_sf(data = joined, aes(color = HAZ_CODE)) +
scale_color_gradient(low = 'yellow', high = 'red')
Created on 2022-11-08 with reprex v2.0.2
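If all you need is a TRUE/FALSE column rather than the joined attributes, st_intersects() can be used directly, outside of mutate(). The sketch below is an illustration under assumptions: it uses the nc.shp file that ships with sf as a stand-in for the fire hazard polygons, and random points in its bounding box as a stand-in for la_test_points.

```r
library(sf)

set.seed(1)

# Stand-in polygons (assumption: nc.shp instead of the fire hazard shapefile),
# re-projected to a planar CRS in metres.
polys <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
polys <- st_transform(polys, 32119)

# Random stand-in points inside the bounding box of the polygons.
bb <- st_bbox(polys)
pts <- data.frame(
  x = runif(200, bb[["xmin"]], bb[["xmax"]]),
  y = runif(200, bb[["ymin"]], bb[["ymax"]])
)
# coords are given as x first, then y, and the CRS matches the polygons.
pts <- st_as_sf(pts, coords = c("x", "y"), crs = st_crs(polys))

# st_intersects() returns, for each point, the indices of the polygons it hits;
# a point is "inside something" when that index vector is not empty.
hits <- st_intersects(pts, polys)
pts$in_polygon <- lengths(hits) > 0

table(pts$in_polygon)
```

Two things differ from the code in the question: the coordinates are passed as c("x", "y") rather than c("y", "x"), and the points are created in the CRS their numbers are actually expressed in before any comparison with the polygons.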
Edit to address joining the example shapefiles:
# `b` is assumed to be the second linked shapefile, read with read_sf() like `a`.
# Keeping the columns that the example shapefiles have in common,
# and joining them together.
ax <- a %>% select(HAZ_CODE, HAZ_CLASS, Shape_Leng, Shape_Area)
bx <- b %>% select(HAZ_CODE, HAZ_CLASS, Shape_Leng, Shape_Area)
fires <- rbind(ax, bx)
head(fires)
Simple feature collection with 6 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 151597.9 ymin: -470591.9 xmax: 198216 ymax: -443900.4
Projected CRS: NAD83 / California Albers
# A tibble: 6 × 5
HAZ_CODE HAZ_CLASS Shape_Leng Shape_Area geometry
<int> <chr> <dbl> <dbl> <MULTIPOLYGON [m]>
1 3 Very High 5415. 1355567. (((152996.8 -469302.2, 152996.9 -469302.2, 152965.9 -469339.9, 152957.5 -…
2 3 Very High 2802. 423658. (((153701.7 -468506, 153703.9 -468590.6, 153708 -468758.1, 153707.6 -4687…
3 3 Very High 802. 32272. (((191491 -449977.1, 191494.3 -449973.2, 191517.3 -449946.5, 191521.5 -44…
4 3 Very High 1097. 40800. (((182453.8 -445649.1, 182216.3 -445706.6, 182215.4 -445655.7, 182170.4 -…
5 3 Very High 59226. 9379764. (((198201 -446611.2, 198199.9 -446580, 198199.1 -446551.3, 198200 -446580…
6 3 Very High 1255. 70800. (((186617.7 -444161.6, 186619 -444164.5, 186630.5 -444192.8, 186561.8 -44…
Here are my R code, a link about the coordinates of the municipalità, and the dataset with the coordinates of the stops.
I have a problem with my R code: I get an error and I don't know how to fix it.
I have 2 datasets:
a network of bus stops in a city (I have the longitude and latitude of every stop), which I import with st_as_sf;
the townships of the city, imported from a shapefile with the sf library, so in the same format as the stops dataset.
The error I'm getting is:
Error in st_geos_binop("intersects", x, y, sparse = sparse, prepared = prepared, :
st_crs(x) == st_crs(y) is not TRUE
And here is my code:
# MUNICIPALITA --------------
library("rgdal")
library("raster")
library("sf")
# in this file I try to read the shapefiles with the sp library rather than sf
#f=system.file("C:/Users/CELESTE/Desktop/ambiti_amm/A_SCOM.shp",package = "sf")
municipalita=shapefile("C:/Users/CELESTE/Desktop/ambiti_amm/A_SCOM.shp")
municipalita_df=as.data.frame(municipalita)
municipalita_df=municipalita_df[,-c(3:5,8:20)]
point.in.polygon()
municipalita_sf=st_as_sf(municipalita, crs = 4326)
municipalita_sp=as_Spatial(municipalita_sf)
class(municipalita)
municipalita$SCOM_COD=as.numeric(municipalita$SCOM_COD)
municipalita$A_SCOM_TY=as.numeric(municipalita$A_SCOM_TY)
municipalita=subset(municipalita,municipalita[,1]==6)
m=as.data.frame(m)
municipalita=municipalita[,-1]
municipalita=municipalita[-c(2,3,5,7:9),]
m=st_as_sf(municipalita)
mappa_municipalita=mapview(m)
mappa_municipalita
colnames(m)[1]="Quartieri"
colnames(m)[2]="ID_quartiere"
colnames(m)[3]="ID_municipalita"
mapview(m, zcol = "Quartieri")
mapview(m,zcol="ID_quartiere")
mapview(m,zcol="ID_municipalita")
colnames(m)
class(valori_unici_sp)
##########################
install.packages("spatialEco")
library("spatialEco")
pip=point.in.poly(valori_unici_sp,municipalita_sp)
It is hard to be 100% certain without having access to your data, but it seems that you did not assign a CRS to your valori_unici_sp object (or if you did it was not EPSG:4326 like with your shapefile).
Note that:
when doing a point-in-polygon operation, both objects need to be in the same CRS (4326 in my case, but any will do as long as it is the same for both objects); this mismatch is the root cause of your issue
I suggest you use sf::st_join() to link the points to the polygons, setting left = FALSE so that the join filters (an inner join in SQL speak) instead of keeping every point whether or not it matched (a left join in SQL speak)
For a reproducible example addressing your use case consider this code; it uses the nc.shp shapefile (available in all {sf} installations) and three semi-random North Carolina cities. It then links the cities to the shapefile to get county data.
library(sf)
library(dplyr) # provides the %>% pipe used below
# included with sf package
shape <- st_read(system.file("shape/nc.shp", package="sf")) %>%
st_transform(4326)
# three semi random cities
cities <- data.frame(name = c("Raleigh", "Greensboro", "Wilmington"),
x = c(-78.633333, -79.819444, -77.912222),
y = c(35.766667, 36.08, 34.223333)) %>%
st_as_sf(coords = c("x", "y"), crs = 4326)
# the action is here!
result <- st_join(cities, shape, left = FALSE)
# check the structure of outcome
result
Simple feature collection with 3 features and 15 fields
geometry type: POINT
dimension: XY
bbox: xmin: -79.81944 ymin: 34.22333 xmax: -77.91222 ymax: 36.08
geographic CRS: WGS 84
name AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74 NWBIR74 BIR79 SID79
1 Raleigh 0.219 2.130 1938 1938 Wake 37183 37183 92 14484 16 4397 20857 31
2 Greensboro 0.170 1.680 1903 1903 Guilford 37081 37081 41 16184 23 5483 20543 38
3 Wilmington 0.042 0.999 2238 2238 New Hanover 37129 37129 65 5526 12 1633 6917 9
NWBIR79 geometry
1 6221 POINT (-78.63333 35.76667)
2 7089 POINT (-79.81944 36.08)
3 2100 POINT (-77.91222 34.22333)
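The check behind the `st_crs(x) == st_crs(y) is not TRUE` error can be run by hand before joining. This is a minimal sketch rather than a fix for your exact data: `polys` and `pts` are stand-ins I made up (nc.shp and a single city), and the point is simply to compare the two CRSs and re-project one object onto the other when they differ.

```r
library(sf)

# Stand-in polygons in a projected CRS (NC state plane, metres).
polys <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
polys <- st_transform(polys, 32119)

# Stand-in point given as longitude/latitude, so declared as EPSG:4326.
pts <- st_as_sf(
  data.frame(name = "Raleigh", x = -78.633333, y = 35.766667),
  coords = c("x", "y"), crs = 4326
)

# This is the comparison that fails inside the spatial predicates.
same_crs <- st_crs(pts) == st_crs(polys)
same_crs

# Re-project the points onto the CRS of the polygons, then join.
pts_aligned <- st_transform(pts, st_crs(polys))
joined <- st_join(pts_aligned, polys["NAME"], left = FALSE)
joined
```

Note the difference between the two steps: the `crs` argument of st_as_sf() (or st_set_crs()) only declares what the existing numbers mean, while st_transform() actually changes the numbers. Declaring a CRS that does not match the numbers makes the error go away but puts the points in the wrong place.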
I have two sf objects: polygon county (note: this is a multiple polygon, i.e. many counties) and points monitor2.
The county looks like below. Chinese characters cannot be displayed properly, but it's not a big deal.
Simple feature collection with 6 features and 4 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 113.15 ymin: 20.58265 xmax: 124.5656 ymax: 40.10793
epsg (SRID): 4326
proj4string: +proj=longlat +datum=WGS84 +no_defs
City District Province Code geometry
1 <U+53F0><U+6E7E><U+7701> <U+53F0><U+6E7E><U+7701> <U+53F0><U+6E7E><U+7701> 710000 MULTIPOLYGON (((116.7346 20...
2 <U+5317><U+4EAC><U+5E02> <U+671D><U+9633><U+533A> <U+5317><U+4EAC><U+5E02> 110105 MULTIPOLYGON (((116.4834 40...
3 <U+4E0A><U+6D77><U+5E02> <U+666E><U+9640><U+533A> <U+4E0A><U+6D77><U+5E02> 310107 MULTIPOLYGON (((121.3562 31...
4 <U+4E0A><U+6D77><U+5E02> <U+5B9D><U+5C71><U+533A> <U+4E0A><U+6D77><U+5E02> 230506 MULTIPOLYGON (((121.4855 31...
5 <U+5E7F><U+5DDE><U+5E02> <U+767D><U+4E91><U+533A> <U+5E7F><U+4E1C><U+7701> 440111 MULTIPOLYGON (((113.4965 23...
6 <U+798F><U+5DDE><U+5E02> <U+9F13><U+697C><U+533A> <U+798F><U+5EFA><U+7701> 320106 MULTIPOLYGON (((119.2611 26...
The monitor2 looks like below.
Simple feature collection with 6 features and 5 fields
geometry type: POINT
dimension: XY
bbox: xmin: 116.17 ymin: 39.8673 xmax: 116.473 ymax: 40.2865
epsg (SRID): 4326
proj4string: +proj=longlat +datum=WGS84 +no_defs
# A tibble: 6 x 6
code name city ref value geometry
<chr> <chr> <chr> <chr> <dbl> <POINT [°]>
1 1001A 万寿西宫 北京 N 47.8 (116.366 39.8673)
2 1002A 定陵 北京 Y 45.9 (116.17 40.2865)
3 1003A 东四 北京 N 42.2 (116.434 39.9522)
4 1004A 天坛 北京 N 51.2 (116.434 39.8745)
5 1005A 农展馆 北京 N 46.9 (116.473 39.9716)
6 1006A 官园 北京 N 49.5 (116.361 39.9425)
The first task is to join the value feature in monitor2 to county. I did this with st_is_within_distance and st_join. See the code below. I set distance to be 50 km. Some counties in the new polygon may have values from multiple points within the 50 km buffer.
new = st_join(county, monitor2,
join = st_is_within_distance, dist = 50)
Here comes the second task. I need to aggregate values from different points within that 50 km buffer by their distances to the centroid of the county. How do I achieve this task?
Any comments are welcome.
It's difficult to know exactly what you want without reproducible data, but here's an attempt to show how you can do this.
Get sample data. We reproject here from lat/long to something with metres so we can do distance based spatial operations. We'll use 3 counties from the sample data and use the middle county as the main polygon we want to measure distances from and add a random sample of points scattered across the three counties.
library(sf)
nc <- st_read(system.file("shape/nc.shp", package="sf"))
nc <- st_transform(nc, 32119) # NC state plane projection in metres
county = st_cast(nc[2,],"POLYGON")
p1 = st_as_sf(st_sample(nc[1:3, ], 200)) # random points
# Visualize
plot(st_geometry(nc)[1:3])
plot(county, col = "grey80", add = TRUE)
We want to focus only on points within some distance from our target county. Let's see what that looks like by adding a buffer using st_buffer.
plot(st_buffer(county, dist = 10000), col = NA, border = "red", lty = 3, add = TRUE)
We can subset the points within 10000m of the central county by using st_is_within_distance which would accomplish the same as doing an intersect with the st_buffer object.
p1_10 <- p1[st_is_within_distance(county,p1,dist = 10000, sparse = FALSE),]
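To illustrate the claim that the distance test and the buffer intersect select the same points, here is a small self-contained comparison. It is a sketch with its own seed and sample, so the counts will not match the output shown further down; the two selections can also differ for points lying very close to the buffer edge, because st_buffer() approximates the rounded corners with straight segments.

```r
library(sf)

set.seed(1)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc <- st_transform(nc, 32119) # NC state plane projection in metres
county <- nc[2, ]
p1 <- st_as_sf(st_sample(nc[1:3, ], 200)) # random points

# Indices of the points within 10 km of the county, computed two ways.
by_distance <- st_is_within_distance(county, p1, dist = 10000)[[1]]
by_buffer <- st_intersects(st_buffer(county, dist = 10000), p1)[[1]]

length(by_distance)
length(by_buffer)

# Points selected by one method but not by the other (usually none).
setdiff(by_distance, by_buffer)
```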
Measuring the distance between the centroid and each element of this subset is straightforward. We can then assign the distance measurement as a variable in the subset spatial object.
p1_10$distance_to_centroid <- as.vector(st_distance(st_centroid(county), p1_10))
Here's what that looks like plotted all together:
plot(st_geometry(nc)[1:3])
plot(county, col = "grey80", add = TRUE)
plot(p1, add = TRUE, pch = 19)
plot(st_buffer(county, dist = 10000), col = NA, border = "red", lty = 3, add = TRUE)
plot(st_centroid(county), col = "red", pch = 15, cex = 1, axes = TRUE, add = TRUE)
plot(p1_10["distance_to_centroid"], add = TRUE, pch = 19)
This is what the p1_10 obj looks like here:
> p1_10
Simple feature collection with 78 features and 1 field
geometry type: POINT
dimension: XY
bbox: xmin: 389967.6 ymin: 293489.4 xmax: 448197.1 ymax: 315140.7
CRS: EPSG:32119
First 10 features:
x distance_to_centroid
1 POINT (437228.1 294079.7) 21703.5425
2 POINT (425029.8 305656.7) 5868.4917
3 POINT (425131.4 309137.8) 6665.0253
4 POINT (409851.2 294971.7) 14549.0585
5 POINT (393070.6 303879.7) 26207.5651
6 POINT (436666.3 296282.2) 20070.5879
7 POINT (442623.8 295976.3) 25549.5662
8 POINT (400517.2 307897.4) 18746.6918
9 POINT (418763.7 306728) 724.6165
10 POINT (405001.4 294845.7) 18125.0738
So from here you can aggregate your features by distance using whatever method you want. In dplyr, it's pretty straightforward. Suppose for example here I wanted to aggregate in 5km intervals.
library(dplyr)
p1_10 %>%
mutate(dist_group = ceiling(distance_to_centroid/5000)) %>%
group_by(dist_group) %>%
tally() %>% # stop here if you want the result to retain geography
as_tibble() %>%
select(dist_group, n)
# A tibble: 7 x 2
dist_group n
<dbl> <int>
1 1 7
2 2 15
3 3 22
4 4 13
5 5 11
6 6 9
7 7 1
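If "aggregate by distance" means combining the monitor values so that nearer points count for more, one possible reading is an inverse-distance-weighted mean. That reading is my assumption (the question does not name a method), and the `value` column below is synthetic, standing in for the monitor readings.

```r
library(sf)

set.seed(42)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc <- st_transform(nc, 32119) # metres, so distances are meaningful
county <- nc[2, ]

# Random points with a synthetic measurement (stand-in for monitor values).
pts <- st_as_sf(st_sample(nc[1:3, ], 200))
pts$value <- runif(nrow(pts), 40, 55)

# Points within 10 km of the county, and their distance to its centroid.
near <- pts[st_is_within_distance(county, pts, dist = 10000)[[1]], ]
d <- as.vector(st_distance(st_centroid(st_geometry(county)), near))

# Inverse-distance weights; pmax() guards against division by zero
# for a point that sits exactly on the centroid.
w <- 1 / pmax(d, 1)
idw_mean <- sum(w * near$value) / sum(w)
idw_mean
```

Other weightings (for example 1 / d^2, or a plain mean within each 5 km band from the dist_group column above) drop in by changing the line that defines `w`.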
I have several data frames in R. The first data frame contains the computed convex hull of a set of lat and long coordinates by market (courtesy of chull in R). It looks like this:
MyGeo<- "Part of Chicago & Wisconsin"
Longitude <- c(-90.31914, -90.61911, -89.37842, -88.0988, -87.44875)
Latitude <- c(38.45781, 38.80097, 43.07961, 43.0624,41.49182)
dat <- data.frame(Longitude, Latitude, MyGeo)
The second has zip codes with their latitude and longitude coordinates (courtesy of the US census website). It looks like this:
CensuseZip <- c("SomeZipCode1","SomeZipCode2","SomeZipCode3","SomeZipCode4","SomeZipCode5","SomeZipCode6","SomeZipCode7")
Longitude2 <- c(-131.470425,-133.457924,-131.693453,-87.64957,-87.99734,-87.895,-88.0228)
Latitude2 <- c(55.138352,56.239062,56.370538,41.87485,42.0086,42.04957,41.81055)
cen <- data.frame(Longitude2, Latitude2, CensuseZip)
Now I believe the first data table provides me with a polygon, or a border, that I should be able to use to identify zip codes that fall within that border. Ideally, I would want to create a third data table that looks something like this:
Longitude2   Latitude2  CensuseZip    MyGeo
-131.470425  55.138352  SomeZipCode1
-133.457924  56.239062  SomeZipCode2
-131.693453  56.370538  SomeZipCode3
-87.64957    41.87485   SomeZipCode4  Part of Chicago & Wisconsin
-87.99734    42.0086    SomeZipCode5  Part of Chicago & Wisconsin
-87.895      42.04957   SomeZipCode6  Part of Chicago & Wisconsin
-88.0228     41.81055   SomeZipCode7  Part of Chicago & Wisconsin
In essence, I am looking to identify all the zip codes that fall inside the border formed by the long and lat points (drawn in blue in the original image). I don't need the visualization itself; I am looking for the table described above.
However... I am having trouble doing this... I have tried using the below packages and script:
library(rgeos)
library(sp)
library(rgdal)
coordinates(dat) <- ~ Longitude + Latitude
coordinates(cen) <- ~ Longitude2 + Latitude2
over(cen, dat)
but I receive all NAs.
I use library(sf) to solve this type of point-in-polygon problem (sf is the successor to sp).
The function sf::st_intersection() gives you the intersection of two sf objects. In your case you can construct separate POLYGON and POINT sf objects.
library(sf)
Longitude <- c(-90.31914, -90.61911, -89.37842, -88.0988, -87.44875)
Latitude <- c(38.45781, 38.80097, 43.07961, 43.0624,41.49182)
## closing the polygon
Longitude[length(Longitude) + 1] <- Longitude[1]
Latitude[length(Latitude) + 1] <- Latitude[1]
## construct sf POLYGON
sf_poly <- sf::st_sf( geometry = sf::st_sfc( sf::st_polygon( x = list(matrix(c(Longitude, Latitude), ncol = 2)))) )
## construct sf POINT
sf_points <- sf::st_as_sf( cen, coords = c("Longitude2", "Latitude2"))
sf::st_intersection(sf_points, sf_poly)
# Simple feature collection with 4 features and 1 field
# geometry type: POINT
# dimension: XY
# bbox: xmin: -88.0228 ymin: 41.81055 xmax: -87.64957 ymax: 42.04957
# epsg (SRID): NA
# proj4string: NA
# CensuseZip geometry
# 4 SomeZipCode4 POINT (-87.64957 41.87485)
# 5 SomeZipCode5 POINT (-87.99734 42.0086)
# 6 SomeZipCode6 POINT (-87.895 42.04957)
# 7 SomeZipCode7 POINT (-88.0228 41.81055)
# Warning message:
# attribute variables are assumed to be spatially constant throughout all geometries
The result is all the points that are inside the polygon.
You can also use sf::st_join(sf_poly, sf_points), which matches the same four points but returns them as rows carrying the polygon geometry rather than the point geometry.
And the function sf::st_intersects(sf_points, sf_poly) returns a list saying, for each POINT, which polygon (if any) it falls inside.
sf::st_intersects(sf_points, sf_poly)
# Sparse geometry binary predicate list of length 7, where the predicate was `intersects'
# 1: (empty)
# 2: (empty)
# 3: (empty)
# 4: 1
# 5: 1
# 6: 1
# 7: 1
Which you can use as an index / identifier of the original sf_points object to add a new column on
is_in <- sf::st_intersects(sf_points, sf_poly)
sf_points$inside_polygon <- as.logical(is_in)
sf_points
# Simple feature collection with 7 features and 2 fields
# geometry type: POINT
# dimension: XY
# bbox: xmin: -133.4579 ymin: 41.81055 xmax: -87.64957 ymax: 56.37054
# epsg (SRID): NA
# proj4string: NA
# CensuseZip geometry inside_polygon
# 1 SomeZipCode1 POINT (-131.4704 55.13835) NA
# 2 SomeZipCode2 POINT (-133.4579 56.23906) NA
# 3 SomeZipCode3 POINT (-131.6935 56.37054) NA
# 4 SomeZipCode4 POINT (-87.64957 41.87485) TRUE
# 5 SomeZipCode5 POINT (-87.99734 42.0086) TRUE
# 6 SomeZipCode6 POINT (-87.895 42.04957) TRUE
# 7 SomeZipCode7 POINT (-88.0228 41.81055) TRUE
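To get closer to the table asked for in the question (zip code plus a MyGeo label, blank where the point is outside), one option is to carry the label on the polygon and left-join it onto the points. This is a sketch built from the data in the question; attaching MyGeo as an attribute of the polygon is my own choice of layout.

```r
library(sf)

Longitude <- c(-90.31914, -90.61911, -89.37842, -88.0988, -87.44875)
Latitude <- c(38.45781, 38.80097, 43.07961, 43.0624, 41.49182)

# Close the ring by repeating the first vertex, then build a labelled polygon.
ring <- cbind(c(Longitude, Longitude[1]), c(Latitude, Latitude[1]))
sf_poly <- st_sf(
  MyGeo = "Part of Chicago & Wisconsin",
  geometry = st_sfc(st_polygon(list(ring)))
)

cen <- data.frame(
  Longitude2 = c(-131.470425, -133.457924, -131.693453, -87.64957, -87.99734, -87.895, -88.0228),
  Latitude2 = c(55.138352, 56.239062, 56.370538, 41.87485, 42.0086, 42.04957, 41.81055),
  CensuseZip = paste0("SomeZipCode", 1:7)
)

# remove = FALSE keeps the coordinate columns alongside the geometry.
sf_points <- st_as_sf(cen, coords = c("Longitude2", "Latitude2"), remove = FALSE)

# Left join: every point is kept; MyGeo is NA where the point is outside.
result <- st_drop_geometry(st_join(sf_points, sf_poly))
result
```

If a plain TRUE/FALSE flag is preferred over NA/TRUE, `lengths(sf::st_intersects(sf_points, sf_poly)) > 0` gives FALSE for the points outside the polygon and also copes with points that fall in more than one polygon.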
I have a relatively large number of coordinates for which I'd like to get the census tract (in addition to the FIPS code). I know that I can look up individual lat/lon pairs using call_geolocator_latlon, but this seems impractical for my purposes: the function issues a single call to the Census Bureau's API for each pair, and I imagine it would take a very long time to run on my ~200,000 pairs.
Is there a faster way to do this, perhaps by downloading shapefiles for each state using the block_groups function and mapping from lat/lon to census tract from there?
This doesn't use tigris, but utilizes sf::st_within() to check a data frame of points for overlapping tracts.
I'm using tidycensus here to get a map of California's tracts into R.
library(sf)
library(dplyr) # provides the %>% pipe used below
ca <- tidycensus::get_acs(state = "CA", geography = "tract",
variables = "B19013_001", geometry = TRUE)
Now to simulate some data:
bbox <- st_bbox(ca)
my_points <- data.frame(
x = runif(100, bbox[1], bbox[3]),
y = runif(100, bbox[2], bbox[4])
) %>%
# convert the points to same CRS
st_as_sf(coords = c("x", "y"),
crs = st_crs(ca))
I'm doing 100 points here to be able to ggplot() the results, but the overlap calculation for 1e6 is fast, only a few seconds on my laptop.
my_points$tract <- as.numeric(st_within(my_points, ca)) # this is fast for 1e6 points
The results:
head(my_points) # tract is the row-index for overlapping census tract record in 'ca'
# the plotting part below would take forever with 1e6 points
library(ggplot2)
ggplot(ca) +
geom_sf() +
geom_sf(data = my_points, aes(color = is.na(tract)))
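Since the index returned by st_within() is a row number rather than an ID, it still has to be turned into an identifier. The sketch below shows one way to do that lookup. To stay runnable without a Census API key it uses the nc.shp counties bundled with sf and their FIPS column as a stand-in for California tracts and their GEOID (an assumption on my part).

```r
library(sf)

set.seed(7)

# Stand-in polygons: NC counties instead of CA tracts (assumption).
polys <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
polys <- st_transform(polys, 32119)

# Random points inside the bounding box, in the same CRS as the polygons.
bb <- st_bbox(polys)
pts <- st_as_sf(
  data.frame(
    x = runif(500, bb[["xmin"]], bb[["xmax"]]),
    y = runif(500, bb[["ymin"]], bb[["ymax"]])
  ),
  coords = c("x", "y"), crs = st_crs(polys)
)

# For each point: index of the first containing polygon, or NA if none.
hits <- st_within(pts, polys)
idx <- vapply(hits, function(i) if (length(i) > 0) i[1] else NA_integer_, integer(1))

# Use the row index to look up the identifier column.
pts$fips <- polys$FIPS[idx]
head(pts)
```

Taking the first match explicitly avoids an error in the (unlikely) case that a point falls inside two overlapping polygons, which a bare as.numeric() on the result would not handle.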
Great answer above. To get Census tract IDs you could also use st_join(). NAs for the tract IDs are those points that are within California's bounding box but don't intersect the state itself.
library(tigris)
library(tidyverse)
library(sf)
ca_tracts <- tracts("CA", class = "sf") %>%
select(GEOID, TRACTCE)
bbox <- st_bbox(ca_tracts)
my_points <- data.frame(
x = runif(200000, bbox[1], bbox[3]),
y = runif(200000, bbox[2], bbox[4])
) %>%
# convert the points to same CRS
st_as_sf(coords = c("x", "y"),
crs = st_crs(ca_tracts))
my_points_tract <- st_join(my_points, ca_tracts)
> my_points_tract
Simple feature collection with 200000 features and 2 fields
geometry type: POINT
dimension: XY
bbox: xmin: -124.4819 ymin: 32.52888 xmax: -114.1312 ymax: 42.0095
epsg (SRID): 4269
proj4string: +proj=longlat +datum=NAD83 +no_defs
First 10 features:
GEOID TRACTCE geometry
1 06025012400 012400 POINT (-114.6916 33.42711)
2 <NA> <NA> POINT (-118.4255 41.81896)
3 06053990000 990000 POINT (-121.8154 36.22736)
4 06045010200 010200 POINT (-123.6909 39.70572)
5 <NA> <NA> POINT (-116.9055 37.93532)
6 06019006405 006405 POINT (-119.511 37.09383)
7 06049000300 000300 POINT (-120.7215 41.3392)
8 <NA> <NA> POINT (-115.8916 39.32392)
9 06023990100 990100 POINT (-124.2737 40.14106)
10 06071008901 008901 POINT (-117.319 35.62759)