Get Census Tract from Lat/Lon using tigris - r

I have a relatively large number of coordinates for which I'd like to get the census tract (in addition to the FIPS code). I know that I can look up individual lat/lon pairs using call_geolocator_latlon (as done here), but this seems impractical for my purposes as the function issues a single call to the census bureaus' API, and I imagine would take a very long time to run on my ~200,000 pairs.
Is there a faster way to do this, perhaps by downloading shapefiles for each state using the block_groups function and mapping from lat/lon to census tract from there?

This doesn't use tigris, but utilizes sf::st_within() to check a data frame of points for overlapping tracts.
I'm using tidycensus here to get a map of California's tracts into R.
library(sf)
ca <- tidycensus::get_acs(state = "CA", geography = "tract",
variables = "B19013_001", geometry = TRUE)
Now to sim some data:
bbox <- st_bbox(ca)
my_points <- data.frame(
x = runif(100, bbox[1], bbox[3]),
y = runif(100, bbox[2], bbox[4])
) %>%
# convert the points to same CRS
st_as_sf(coords = c("x", "y"),
crs = st_crs(ca))
I'm doing 100 points here to be able to ggplot() the results, but the overlap calculation for 1e6 is fast, only a few seconds on my laptop.
my_points$tract <- as.numeric(st_within(my_points, ca)) # this is fast for 1e6 points
The results:
head(my_points) # tract is the row-index for overlapping census tract record in 'ca'
# but part would take forever with 1e6 points
library(ggplot2)
ggplot(ca) +
geom_sf() +
geom_sf(data = my_points, aes(color = is.na(tract)))

Great answer above. To get Census tract IDs you could also use st_join(). NAs for the tract IDs are those points that are within California's bounding box but don't intersect the state itself.
library(tigris)
library(tidyverse)
library(sf)
ca_tracts <- tracts("CA", class = "sf") %>%
select(GEOID, TRACTCE)
bbox <- st_bbox(ca_tracts)
my_points <- data.frame(
x = runif(200000, bbox[1], bbox[3]),
y = runif(200000, bbox[2], bbox[4])
) %>%
# convert the points to same CRS
st_as_sf(coords = c("x", "y"),
crs = st_crs(ca_tracts))
my_points_tract <- st_join(my_points, ca_tracts)
> my_points_tract
Simple feature collection with 200000 features and 2 fields
geometry type: POINT
dimension: XY
bbox: xmin: -124.4819 ymin: 32.52888 xmax: -114.1312 ymax: 42.0095
epsg (SRID): 4269
proj4string: +proj=longlat +datum=NAD83 +no_defs
First 10 features:
GEOID TRACTCE geometry
1 06025012400 012400 POINT (-114.6916 33.42711)
2 <NA> <NA> POINT (-118.4255 41.81896)
3 06053990000 990000 POINT (-121.8154 36.22736)
4 06045010200 010200 POINT (-123.6909 39.70572)
5 <NA> <NA> POINT (-116.9055 37.93532)
6 06019006405 006405 POINT (-119.511 37.09383)
7 06049000300 000300 POINT (-120.7215 41.3392)
8 <NA> <NA> POINT (-115.8916 39.32392)
9 06023990100 990100 POINT (-124.2737 40.14106)
10 06071008901 008901 POINT (-117.319 35.62759)

Related

having trouble testing point/geometry intersections in sf

i am trying to figure out how to use sf_intersects() to test whether or not point data that i have falls inside the geometries of some map data i have.
data i'm working with: https://osfm.fire.ca.gov/media/5818/fhszs19sn.zip
other data i'm working with too: https://osfm.fire.ca.gov/media/7564/c19fhszl06_5.zip
for now, i'm just trying to see if this data falls in the polygons of the above shapefile:
la_test_points <- data.frame(y = runif(1000, 33.6, 34.8), x = runif(1000, -119, -117.6))
when i put my map data and point data together, this is what it looks like:
so far, so good. now i attempt to test point/geometry intersections. as the figure suggests, i should be able to get quite a few.
# changing coordinate system of map created by shape file
la_fire_sra <- st_transform(st_as_sf(la_fire_sra), crs = 3857)
# merging test points with map data
la_test_points_merged <- st_as_sf(la_test_points, coords = c('y', 'x'), crs = st_crs(la_fire_sra))
# seeing if points fall within any of the geometries in the shapefile
la_test_points_merged <- la_test_points_merged %>%
mutate(intersection = st_intersects(geometry, la_fire_sra))
that last bit is where it all goes wrong. rstudio doesn't throw an error, but when i print la_test_points_merged to see my results, this is what i see:
> la_test_points_merged
Simple feature collection with 1000 features and 1 field
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 33.60155 ymin: -118.9959 xmax: 34.79907 ymax: -117.6015
Projected CRS: WGS 84 / Pseudo-Mercator
First 10 features:
Error in xj[i, , drop = FALSE] : incorrect number of dimensions
the last line above is in red.
when i try using st_intersection() instead of st_intersects(), i get a different error:
> la_test_points_merged <- la_test_points_merged %>%
+ mutate(intersection = st_intersection(geometry, la_fire_sra))
Error in `stopifnot()`:
! Problem while computing `intersection = st_intersection(geometry, la_fire_sra)`.
x `intersection` must be size 1000 or 1, not 0.
Run `rlang::last_error()` to see where the error occurred.
i would like to end up with a result like this that tells me whether or not each of the points in la_test_points is contained by any of the geometry values in la_fire_sa.
how can i fix this to make my code work? i have looked at lots of other similar questions, but i can't seem to find any answers that apply to my current situation.
thanks in advance for any help.
You can join the points to the shapefile, and the result will show you the fire hazard for each point that falls within a polygon. The default for an st_join is st_intersects, but you can change it if you'd like.
Below I've used one of the shapefiles you linked. If you need to use both you can combine them for a single dataframe with all the polygons. Looks like they have different columns though, so some cleaning might be needed.
library(tidyverse)
library(sf)
set.seed(3) #to make la_test_points reproducible
a <- read_sf('fhszs06_3_19.shp')
# Create synthetic data, make it an sf object, and set the crs
la_test_points <- data.frame(y = runif(1000, 33.6, 34.8), x = runif(1000, -119, -117.6)) %>%
st_as_sf(coords = c('x','y')) %>%
st_set_crs(4326) %>%
st_transform(st_crs(a))
# join the points with the fire hazard area
joined <- st_join(la_test_points, a)
# the sf dataframe, lots of NA's so they're removed for a look:
joined %>% filter(!is.na(HAZ_CODE)) %>% head()
#> Simple feature collection with 6 features and 5 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 125951 ymin: -433789.6 xmax: 177186.8 ymax: -369094
#> Projected CRS: NAD_1983_Albers
#> SRA HAZ_CODE HAZ_CLASS Shape_Leng Shape_Area geometry
#> 1 SRA 3 Very High 613618.0 686671532 POINT (163249.3 -395328.4)
#> 2 SRA 3 Very High 250826.8 233414399 POINT (127980.6 -433789.6)
#> 3 SRA 3 Very High 613618.0 686671532 POINT (167675.9 -386506.6)
#> 4 SRA 3 Very High 391522.6 297194108 POINT (143421.2 -369094)
#> 5 SRA 2 High 208122.8 211364977 POINT (177186.8 -388738.9)
#> 6 SRA 3 Very High 613618.0 686671532 POINT (125951 -399105.6)
# Plotting points, colored according to fire hazard code
ggplot() +
geom_sf(data = a) +
geom_sf(data = joined, aes(color = HAZ_CODE)) +
scale_color_gradient(low = 'yellow', high = 'red')
Created on 2022-11-08 with reprex v2.0.2
Edit to address joining the example shapefiles:
# Keeping the columns that the example shapefiles have in common,
# and joining them together.
ax <- a %>% select(HAZ_CODE, HAZ_CLASS, Shape_Leng, Shape_Area)
bx <- b %>% select(HAZ_CODE, HAZ_CLASS, Shape_Leng, Shape_Area)
fires <- rbind(ax, bx)
head(fires)
Simple feature collection with 6 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 151597.9 ymin: -470591.9 xmax: 198216 ymax: -443900.4
Projected CRS: NAD83 / California Albers
# A tibble: 6 × 5
HAZ_CODE HAZ_CLASS Shape_Leng Shape_Area geometry
<int> <chr> <dbl> <dbl> <MULTIPOLYGON [m]>
1 3 Very High 5415. 1355567. (((152996.8 -469302.2, 152996.9 -469302.2, 152965.9 -469339.9, 152957.5 -…
2 3 Very High 2802. 423658. (((153701.7 -468506, 153703.9 -468590.6, 153708 -468758.1, 153707.6 -4687…
3 3 Very High 802. 32272. (((191491 -449977.1, 191494.3 -449973.2, 191517.3 -449946.5, 191521.5 -44…
4 3 Very High 1097. 40800. (((182453.8 -445649.1, 182216.3 -445706.6, 182215.4 -445655.7, 182170.4 -…
5 3 Very High 59226. 9379764. (((198201 -446611.2, 198199.9 -446580, 198199.1 -446551.3, 198200 -446580…
6 3 Very High 1255. 70800. (((186617.7 -444161.6, 186619 -444164.5, 186630.5 -444192.8, 186561.8 -44…

How to write a function to matches values in two dataframes (make a faster version)

I have a dataframe with coordinates of regions of interest, and another dataframe with temperature readings (bio1) taken in research stations, and their coordinates.
I'd like to create a new column to match the region of interest with the temperature of the nearest research station.
I have managed to do this with the following code (here's a simplified fake dataframe pair)
df1 <- data.frame(latitude = c(10.5,6,2), longitude = c(18,9,4))
df2 <- data.frame(vy = c(10,5,3), vx = c(20,10,3), bio1 = c('a','b','c'))
for(i in 1:nrow(df1)){
df1$temperature[i] <- df2$bio1[which(abs(df2$vx - df1$longitude[i]) +
abs(df2$vy - df1$latitude[i]) ==
min(abs(df2$vx - df1$longitude[i]) +
abs(df2$vy - df1$latitude[i])))]
}
So, this code checks all the combinations and choses the one with the smallest distance between latitude and longitude at each row.
I checked and it seems to work, but it's very slow to use on large dataframes.
Can you solve this issue with a faster method?
Something like this might work
library(tidyverse)
library(sf)
# put some id's in df1
df1$id <- LETTERS[1:3]
# make df1 and df2 simple objects
sf1 <- df1 %>%
st_as_sf(coords = c("longitude", "latitude"), crs = 4326)
sf2 <- df2 %>%
st_as_sf(coords = c("vy", "vx"), crs = 4326)
# find nearest sf2 in sf1
sf1 %>%
mutate(nearest_bio = sf2$bio1[st_nearest_feature(sf2)])
# Simple feature collection with 3 features and 2 fields
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: 4 ymin: 2 xmax: 18 ymax: 10.5
# Geodetic CRS: WGS 84
# id geometry nearest_bio
# 1 A POINT (18 10.5) b
# 2 B POINT (9 6) c
# 3 C POINT (4 2) b

Checking if a point falls within polygon Shapefile

I have a shapefile about NYC Yellow cab service zones: taxi_zones.shp. It can be download here: https://s3.amazonaws.com/nyc-tlc/misc/taxi_zones.zip
I want to check whether certain locations fall into any of the zones. Here is the R code I use:
library(sf)
tt <- read_sf('taxi_zones.shp')
pnts <- data.frame(
"x" = c(-73.97817,-74.00668,0,500),
"y" = c(40.75798, 40.73178,0,400))
pnts_sf <- do.call("st_sfc",c(lapply(1:nrow(pnts),
function(i) {st_point(as.numeric(pnts[i, ]))}), list("crs" = 4326)))
pnts_trans <- st_transform(pnts_sf, 2163)
tt_trans <- st_transform(tt, 2163)
zones <- apply(st_intersects(tt_trans, pnts_trans, sparse = FALSE), 2,
function(col) {
tt_trans[which(col), ]$LocationID
})
The first two points are within the zones defined by the shapefile. However, the third point is not. And the fourth point has incorrect coordinates. How should I modify the code so that for points outside the zones and points with incorrect coordinates, it returns 'NA'?
I have my own approach. Would that fulfill your requirements? I can't tell you what specifically is wrong with your code, but this one is also a bit cleaner:
library(sf)
tt <- read_sf('./Downloads/taxi_zones/taxi_zones.shp')
pnts <- data.frame(
"x" = c(-73.97817, -74.00668, 0, 500),
"y" = c(40.75798, 40.73178, 0, 400)
)
pnts_sf <- st_as_sf(pnts, coords = c('x', 'y'), crs = st_crs(4326))
pnts_trans <- st_transform(pnts_sf, 2163)
tt_trans <- st_transform(tt, 2163)
pnts_trans <- pnts_sf %>% mutate(
intersection = as.integer(st_intersects( pnts_trans,tt_trans)))
The result would be
geometry intersection
1 POINT (-73.97817 40.75798) 161
2 POINT (-74.00668 40.73178) 158
3 POINT (0 0) NA
4 POINT (500 400) NA
I suggest you consider joining your spatial objects via sf::st_join(), as shown bellow; what it does is that it combines the attributes of your polygon objects and points objects.
The default behaviour is "left" join = points lacking polygons will get NA. It can be tweaked by setting left = FALSE in join parameters, resulting in "inner" join behaviour = points not contained in polygons will be omitted from result.
library(sf)
tt <- read_sf('taxi_zones.shp')
pnts <- data.frame(
"x" = c(-73.97817,-74.00668,0,500),
"y" = c(40.75798, 40.73178,0,400))
pnts_sf <- sf::st_as_sf(pnts, coords = c("x", "y"), crs = 4326)
pnts_trans <- st_transform(pnts_sf, 2163)
tt_trans <- st_transform(tt, 2163)
res <- sf::st_join(pnts_trans, tt_trans)
print(res)
Simple feature collection with 4 features and 6 fields (with 1 geometry empty)
geometry type: POINT
dimension: XY
bbox: xmin: 2152087 ymin: -130624.1 xmax: 9480615 ymax: 1178046
projected CRS: NAD27 / US National Atlas Equal Area
OBJECTID Shape_Leng Shape_Area zone LocationID borough geometry
1 161 0.03580391 7.191307e-05 Midtown Center 161 Manhattan POINT (2153474 -127064.5)
2 158 0.05480999 1.855683e-04 Meatpacking/West Village West 158 Manhattan POINT (2152087 -130624.1)
3 NA NA NA <NA> NA <NA> POINT (9480615 1178046)
4 NA NA NA <NA> NA <NA> POINT EMPTY

How to convert X and Y coordinates into Latitude and longitude?

Following is an example of the data frame I have that was obtained from a publicly available crime data set for St. Louis. The documentation related to the data states that the Xcoord and Ycoord are in
State Plane North American Datum 1983 (NAD83) format
CodedMonth Description XCoord YCoord
1: 2019-09 AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR 908297.3 1018623.0
2: 2019-09 ASSLT-AGGRAV-OTH-WPN-2ND-CHILD-DOMESTIC 903995.7 1014255.0
3: 2019-09 FORGERY-ISSUING FALSE INSTRUMENT OR CERTIFICAT 0.0 0.0
4: 2019-09 STLG BY DECEIT/IDENTITY THEFT REPORT 890704.7 1010659.0
5: 2019-09 STALKING (HARASSMENT ONLY, NO THREAT) 881105.8 1008297.0
6: 2019-09 LARCENY-MTR VEH PARTS UNDER $500 882929.6 992941.3
how do I convert these into Xcoord and Ycoord columns into lon and lat format so that I can plot this using ggmap
I have found a couple of answers Convert latitude/longitude to state plane coordinates
But I cant seem to get it to work for my data
You can use the sf package to convert it to a simple features geography.
In order to get this to work you need to know what coordinate system you are working with, and based on the description you provide (State Plane NAD83 and are near St. Louis), My first guess was EPSG 26996 (NAD83 / Missouri East USFT), but that plotted in the middle of lake Huron, so I tried ESRI: 102696. You can look up projections at spatialreference.org.
library(sf)
library(tidyverse)
library(ggmap)
my_df <- read_csv("C:/Users/Brian/Documents/temp.csv")
my_sf_df <- st_as_sf(my_df, coords = c("XCoord", "YCoord"), crs = 102696)
This sets the x and y to spatial coordinates. You need to re-project into a geographic system like WGS84 to convert to lat long. st_transform does this for us using crs = 4326, which is the WGS 84 coordinate system
my_latlon_df <- st_transform(my_sf_df, crs = 4326 )
my_latlon_df <- my_latlon_df%>%
mutate( lat= st_coordinates(my_latlon_df)[,1],
lon = st_coordinates(my_latlon_df)[,2])
my_latlon_df
# Simple feature collection with 6 features and 5 fields
# geometry type: POINT
# dimension: XY
# bbox: xmin: -93.26566 ymin: 35.80151 xmax: -90.19163 ymax: 38.63065
# epsg (SRID): 4326
# proj4string: +proj=longlat +datum=WGS84 +no_defs
# # A tibble: 6 x 6
# X1 CodedMonth Description geometry lat lon
# * <chr> <chr> <chr> <POINT [°]> <dbl> <dbl>
# 1 1: 2019-09 AUTO THEFT-PERM RETNT/UNRECOV OVER 48HR (-90.19163 38.63065) -82.2 44.7
# 2 2: 2019-09 ASSLT-AGGRAV-OTH-WPN-2ND-CHILD-DOMESTIC (-90.20674 38.6187) -82.3 44.7
# 3 3: 2019-09 FORGERY-ISSUING FALSE INSTRUMENT OR CERTIFICAT (-93.26566 35.80151) -93.3 35.8
# 4 4: 2019-09 STLG BY DECEIT/IDENTITY THEFT REPORT (-90.25329 38.60893) -82.4 44.6
# 5 5: 2019-09 STALKING (HARASSMENT ONLY, NO THREAT) (-90.2869 38.60251) -82.5 44.6
# 6 6: 2019-09 LARCENY-MTR VEH PARTS UNDER $500 (-90.28065 38.56034) -82.5 44.5
We now have geographic coordinates with lat and long as columns of our data frame. The no location information is going to cause problems, since it will plot at the origin of the state plane coordinate plane, which is down in Arkansas somewhere. Let's remove it so we can focus on the good points
# let's exclude point 3 for now
my_latlon_df <- my_latlon_df[-3,]
box <- st_bbox(my_latlon_df) # bounding box
names(box) <- NULL # removing non-complient labels
buffer = .2
box2 <- box + c(-buffer, -buffer, buffer, buffer) # buffering
base_map <- get_map(location = box2, source = "osm") # getting base map
# plotting
ggmap(base_map)+
geom_sf(data = my_latlon_df,
color = "red",
size = 2
)+
scale_x_continuous(limits = c(-90.35, -90.1))+
scale_y_continuous(limits = c(38.5, 38.7))
Unfortunately, if you don't know what coordinate system your x and y points are in, it can become a frustrating game of trial and error. The projected coordinate systems basically create a Cartesian plane on the surface of the globe, and the choice of origin, scale and other parameters are specific to each projection. There isn't nearly as much difference in geographic coordinate systems such as WGS84.
The correct geographic system/projection is "ESRI:102696" so the code should read:
my_sf_df <- st_as_sf(my_df, coords = c("XCoord", "YCoord"), crs = "ESRI:102696" )

How can I extract bounding boxes in a row-wise manner using R?

I have an sf object containing multiple (square) polygons of a grid. What I like would be to get a column containing for each polygon the four values, which define the bounding box (bottom-left and top-right).
Here is an example for the the canton of Zurich in Switzerland:
library(raster)
library(sf)
library(dplyr)
ch <- getData('GADM', country = 'CH', level = 1)
ch_grid <- ch %>%
st_as_sf() %>%
filter(NAME_1 == "Zürich") %>%
st_make_grid(cellsize = 0.1, what = "polygons")
This gives me a 6x7 grid. Now I'm looking for a way to get for each of the square polygons the two coordinate pairs, which define the bounding box - preferably in a new column.
I hope it's clear what I mean. I would very much appreciate your help.
Given your ch_grid sfc object, this one-liner:
> ch_grid_df = cbind(st_sf(geometry=ch_grid),do.call(rbind,lapply(ch_grid, st_bbox)))
creates an sf data frame with four columns as required:
> ch_grid_df
Simple feature collection with 42 features and 4 fields
geometry type: POLYGON
dimension: XY
bbox: xmin: 8.358933 ymin: 47.16357 xmax: 9.058933 ymax: 47.76357
epsg (SRID): 4326
proj4string: +proj=longlat +datum=WGS84 +no_defs
First 10 features:
xmin ymin xmax ymax ch_grid
1 8.358933 47.16357 8.458933 47.26357 POLYGON ((8.358933 47.16357...
2 8.458933 47.16357 8.558933 47.26357 POLYGON ((8.458933 47.16357...
3 8.558933 47.16357 8.658933 47.26357 POLYGON ((8.558933 47.16357...
4 8.658933 47.16357 8.758933 47.26357 POLYGON ((8.658933 47.16357...
5 8.758933 47.16357 8.858933 47.26357 POLYGON ((8.758933 47.16357...
6 8.858933 47.16357 8.958933 47.26357 POLYGON ((8.858933 47.16357...
This only uses base R functions and so will be robust against the vagaries of time and fashion.
The newest tidyverse pattern for iterating over rows is using dplyr::mutate(new_col = purrr::map(existing_col, func)), which works well with spatial objects including the geometry column in sf objects.
So something like this to return a bounding box and a grid layout for each row:
library(purrr)
ch_grid <- ch %>%
st_as_sf() %>%
mutate(bbox = map(geometry, st_bbox),
grid = map(geometry, ~ st_make_grid(., cellsize = 0.1, what = "polygons")))
If don't want to use purrr::map(), lapply() can be subbed in:
ch_grid <- ch %>%
st_as_sf() %>%
mutate(bbox = lapply(geometry, st_bbox),
grid = lapply(geometry, st_make_grid, cellsize = 0.1, what = "polygons"))

Resources