I have a shapefile of NYC yellow cab service zones, taxi_zones.shp. It can be downloaded here: https://s3.amazonaws.com/nyc-tlc/misc/taxi_zones.zip
I want to check whether certain locations fall into any of the zones. Here is the R code I use:
library(sf)
tt <- read_sf('taxi_zones.shp')
pnts <- data.frame(
"x" = c(-73.97817,-74.00668,0,500),
"y" = c(40.75798, 40.73178,0,400))
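# build one POINT per row and combine them into an sfc in WGS84 (EPSG:4326)
pnts_sf <- do.call("st_sfc", c(lapply(1:nrow(pnts),
  function(i) {st_point(as.numeric(pnts[i, ]))}), list("crs" = 4326)))
# project points and zones to the same projected CRS (EPSG:2163)
pnts_trans <- st_transform(pnts_sf, 2163)
tt_trans <- st_transform(tt, 2163)
# zones x points logical matrix; for each point (column), collect the matching LocationIDs
zones <- apply(st_intersects(tt_trans, pnts_trans, sparse = FALSE), 2,
  function(col) {
    tt_trans[which(col), ]$LocationID
  })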
The first two points are within the zones defined by the shapefile. However, the third point is not. And the fourth point has incorrect coordinates. How should I modify the code so that for points outside the zones and points with incorrect coordinates, it returns 'NA'?
Here is a different approach. Would it meet your requirements? I can't tell you exactly what is wrong with your code, but this version is also a bit cleaner:
library(sf)
library(dplyr) # for %>% and mutate()
tt <- read_sf('taxi_zones.shp')
pnts <- data.frame(
"x" = c(-73.97817, -74.00668, 0, 500),
"y" = c(40.75798, 40.73178, 0, 400)
)
pnts_sf <- st_as_sf(pnts, coords = c('x', 'y'), crs = st_crs(4326))
pnts_trans <- st_transform(pnts_sf, 2163)
tt_trans <- st_transform(tt, 2163)
# intersect the projected layers; points in no zone (or with invalid coordinates) get NA
res <- pnts_sf %>% mutate(
  intersection = as.integer(st_intersects(pnts_trans, tt_trans)))
The result would be
geometry intersection
1 POINT (-73.97817 40.75798) 161
2 POINT (-74.00668 40.73178) 158
3 POINT (0 0) NA
4 POINT (500 400) NA
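The as.integer() coercion above relies on each point matching at most one zone. Here is a minimal sketch that makes that rule explicit by taking the first matching zone, or NA when there is none. It uses made-up geometries (two unit squares with hypothetical LocationID values), not the taxi zones:

```r
library(sf)

# Two toy unit squares standing in for zones (planar, no CRS); LocationIDs are hypothetical
sq <- function(x0) st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
zones <- st_sf(LocationID = c(10L, 20L), geometry = st_sfc(sq(0), sq(1)))

# Point 1 is inside the first zone, point 2 is outside both zones
pts <- st_sfc(st_point(c(0.5, 0.5)), st_point(c(5, 5)))

# Sparse list: element i holds the row indices of the zones that point i intersects
hits <- st_intersects(pts, zones)

# Keep the first matching zone per point, or NA when there is none
first_hit <- vapply(hits, function(i) if (length(i) > 0) i[1] else NA_integer_, integer(1))
zone_id <- zones$LocationID[first_hit]
zone_id
```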
I suggest joining your spatial objects with sf::st_join(), as shown below. It combines the attributes of your polygon objects with those of your point objects.
By default it performs a "left" join, so points that fall in no polygon get NA. Setting left = FALSE in st_join() gives "inner" join behaviour instead, and points not contained in any polygon are omitted from the result.
library(sf)
tt <- read_sf('taxi_zones.shp')
pnts <- data.frame(
"x" = c(-73.97817,-74.00668,0,500),
"y" = c(40.75798, 40.73178,0,400))
pnts_sf <- sf::st_as_sf(pnts, coords = c("x", "y"), crs = 4326)
pnts_trans <- st_transform(pnts_sf, 2163)
tt_trans <- st_transform(tt, 2163)
res <- sf::st_join(pnts_trans, tt_trans)
print(res)
Simple feature collection with 4 features and 6 fields (with 1 geometry empty)
geometry type: POINT
dimension: XY
bbox: xmin: 2152087 ymin: -130624.1 xmax: 9480615 ymax: 1178046
projected CRS: NAD27 / US National Atlas Equal Area
OBJECTID Shape_Leng Shape_Area zone LocationID borough geometry
1 161 0.03580391 7.191307e-05 Midtown Center 161 Manhattan POINT (2153474 -127064.5)
2 158 0.05480999 1.855683e-04 Meatpacking/West Village West 158 Manhattan POINT (2152087 -130624.1)
3 NA NA NA <NA> NA <NA> POINT (9480615 1178046)
4 NA NA NA <NA> NA <NA> POINT EMPTY
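Here is a tiny made-up example of the left/inner distinction. It uses one square polygon with a hypothetical zone attribute and two points, one inside the square and one outside it:

```r
library(sf)

# One toy polygon (planar, no CRS) with an attribute to join onto the points
poly <- st_sf(zone = "A",
              geometry = st_sfc(st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))))))

# Point 1 lies inside the polygon, point 2 lies outside it
pts <- st_sf(id = 1:2, geometry = st_sfc(st_point(c(1, 1)), st_point(c(3, 3))))

# Default left join: both points are kept, the unmatched one gets zone = NA
left_res <- st_join(pts, poly)

# Inner join: the unmatched point is dropped
inner_res <- st_join(pts, poly, left = FALSE)

nrow(left_res)   # 2
nrow(inner_res)  # 1
```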
I have created a function that uses st_join() from the sf package to extract the congressional district (a polygon) for a set of latitude and longitude coordinates. It picks a different shapefile depending on the value of a "congress" argument. (This is necessary because districts are periodically redrawn, so the boundaries change over time.) The next step is to apply the function row by row to a data frame containing multiple rows of coordinates (and associated "congress" values), so that the congress value for a given row determines which shapefile to use, and then to assign the extracted district to a new variable.
I'm running into trouble applying this function row by row. I first tried using the rowwise() and mutate() functions from dplyr, but got a "must be size 1" error. Based on the comments to this question, I put list() around the variable assigned inside the mutate() function, but the new variable is now a list instead of a single character string.
I would greatly appreciate help figuring out a way to either (i) modify the function so that it can be applied row by row using rowwise() and mutate() or (ii) apply my function row-by-row in some other way.
Reproducible code is below; you just need to download two shapefiles from https://cdmaps.polisci.ucla.edu/ ("districts104.zip" and "districts111.zip"), unzip them, and put them in your working directory.
library(tidyverse)
library(sf)
districts_104 <- st_read("districts104.shp")
districts_111 <- st_read("districts111.shp")
congress <- c(104, 111)
latitude <- c(37.32935, 37.32935)
longitude <- c(-122.00954, -122.00954)
df_test <- data.frame(congress, latitude, longitude)
point_geo_test <- st_as_sf(df_test,
                           coords = c(x = "longitude", y = "latitude"),
                           crs = st_crs(districts_104)) # prep for st_join()
sf_use_s2(FALSE) # preempt evaluation error that would otherwise pop up when using the st_join function
extract_district <- function(points, cong) {
  shapefile <- get(paste0("districts_", cong))
  st_join_results <- st_join(points, shapefile, join = st_within)
  paste(st_join_results$STATENAME, st_join_results$DISTRICT, sep = "-")
}
point_geo_test <- point_geo_test %>%
  rowwise %>%
  mutate(district = list(extract_district(points = point_geo_test, cong = congress)))
Edit 7 July:
From your comments I understand you were looking for something different. The assumption I made about why your function was returning multiple values was wrong, so here is a new answer from scratch:
The custom function you've written doesn't lend itself to row-by-row application, because it already processes all rows at once:
Given the following input:
congress <- c(104, 111, 104, 111, 104, 111)
latitude <- c(37.32935, 37.32935, 41.1134016, 41.1134016, 42.1554948, 42.1554948)
longitude <- c(-122.00954, -122.00954, 73.720356, 73.720356, -87.868850502543, -87.868850502543)
point_geo_test contains these values:
> point_geo_test
[...]
congress geometry
1 104 POINT (-122.0095 37.32935)
2 111 POINT (-122.0095 37.32935)
3 104 POINT (73.72036 41.1134)
4 111 POINT (73.72036 41.1134)
5 104 POINT (-87.86885 42.15549)
6 111 POINT (-87.86885 42.15549)
and extract_district() returns this:
> extract_district(point_geo_test, 104)
[...]
[1] "California-14" "California-14" "NA-NA" "NA-NA" "Illinois-10" "Illinois-10"
This is already a result for each row. The only problem is that, although these are the correct results for the coordinates in each row, they give the name for those coordinates only during the 104th Congress. Hence, these values are valid only for the rows in point_geo_test where congress == 104.
Extracting correct values for all rows
We will create a function that returns the correct data for all rows, i.e., the correct name for the coordinates during the associated congress.
I've simplified your code slightly: the df_test is not an intermediate data frame any more, but defined directly in the creation of point_geo_test. Any values I extract, I'll save into this data frame as well.
library(tidyverse)
library(sf)
sf_use_s2(FALSE)
districts_104 <- st_read("districts104.shp")
districts_111 <- st_read("districts111.shp")
congress <- c(104, 111, 104, 111, 104, 111)
latitude <- c(37.32935, 37.32935, 41.1134016, 41.1134016, 42.1554948, 42.1554948)
longitude <- c(-122.00954, -122.00954, 73.720356, 73.720356, -87.868850502543, -87.868850502543)
point_geo_test <- st_as_sf(data.frame(congress, latitude, longitude),
                           coords = c(x = "longitude", y = "latitude"),
                           crs = st_crs(districts_104))
To keep the code more flexible and organized, I'll create a generic function that can fetch any parameter for the given coordinates:
extract_values <- function(points, parameter) {
  # initialize return values, one for each row in `points`
  values <- rep(NA, nrow(points))
  # for each congress present in `points`, look up the parameter and store it in the rows with that congress
  for(cong in unique(points$congress)) {
    shapefile <- get(paste0("districts_", cong))
    st_join_results <- st_join(points, shapefile, join = st_within)
    values[points$congress == cong] <- st_join_results[[parameter]][points$congress == cong]
  }
  return(values)
}
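extract_values() finds each layer with get(paste0("districts_", cong)), which ties it to global variable names. The sketch below is a variant that takes the layers as a named list instead and joins only the rows for each congress. The toy squares and the list names "104" and "111" are hypothetical stand-ins for the real district shapefiles:

```r
library(sf)

# Toy stand-ins for two district layers (planar, no CRS):
# in "104" one square covers the area; in "111" the same area is split into two districts
sq <- function(x0, x1) st_polygon(list(rbind(c(x0, 0), c(x1, 0), c(x1, 1), c(x0, 1), c(x0, 0))))
layers <- list(
  "104" = st_sf(DISTRICT = "1", geometry = st_sfc(sq(0, 2))),
  "111" = st_sf(DISTRICT = c("1", "2"), geometry = st_sfc(sq(0, 1), sq(1, 2)))
)

points <- st_sf(congress = c(104, 111, 111),
                geometry = st_sfc(st_point(c(1.5, 0.5)), st_point(c(1.5, 0.5)), st_point(c(5, 5))))

extract_values_list <- function(points, layers, parameter) {
  values <- rep(NA_character_, nrow(points))
  for (cong in unique(points$congress)) {
    rows <- points$congress == cong
    # join only this congress's rows against its own layer, so results line up with `rows`
    joined <- st_join(points[rows, ], layers[[as.character(cong)]], join = st_within)
    values[rows] <- as.character(joined[[parameter]])
  }
  values
}

extract_values_list(points, layers, "DISTRICT")
```

The same point gets district "1" under congress 104 and "2" under congress 111. The point outside every toy district gets NA.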
Examples:
> extract_values(point_geo_test, 'STATENAME')
[1] "California" "California" NA NA "Illinois" "Illinois"
> extract_values(point_geo_test, 'DISTRICT')
[1] "14" "15" NA NA "10" "10"
Storing values
point_geo_test$state <- extract_values(point_geo_test, 'STATENAME')
point_geo_test$district <- extract_values(point_geo_test, 'DISTRICT')
point_geo_test$name <- paste(point_geo_test$state, point_geo_test$district, sep = "-")
Result:
> point_geo_test
Simple feature collection with 6 features and 4 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -122.0095 ymin: 37.32935 xmax: 73.72036 ymax: 42.15549
Geodetic CRS: GRS 1980(IUGG, 1980)
congress state district name geometry
1 104 California 14 California-14 POINT (-122.0095 37.32935)
2 111 California 15 California-15 POINT (-122.0095 37.32935)
3 104 <NA> <NA> NA-NA POINT (73.72036 41.1134)
4 111 <NA> <NA> NA-NA POINT (73.72036 41.1134)
5 104 Illinois 10 Illinois-10 POINT (-87.86885 42.15549)
6 111 Illinois 10 Illinois-10 POINT (-87.86885 42.15549)
I have a dataframe with coordinates of regions of interest, and another dataframe with temperature readings (bio1) taken in research stations, and their coordinates.
I'd like to create a new column to match the region of interest with the temperature of the nearest research station.
I have managed to do this with the following code (here's a simplified fake dataframe pair)
df1 <- data.frame(latitude = c(10.5,6,2), longitude = c(18,9,4))
df2 <- data.frame(vy = c(10,5,3), vx = c(20,10,3), bio1 = c('a','b','c'))
for(i in 1:nrow(df1)){
  df1$temperature[i] <- df2$bio1[which(abs(df2$vx - df1$longitude[i]) +
                                       abs(df2$vy - df1$latitude[i]) ==
                                       min(abs(df2$vx - df1$longitude[i]) +
                                           abs(df2$vy - df1$latitude[i])))]
}
So, this code checks all the combinations and, for each row, chooses the station with the smallest distance, measured as the sum of the absolute latitude and longitude differences.
I checked and it seems to work, but it's very slow to use on large dataframes.
Can you solve this issue with a faster method?
Something like this might work
library(tidyverse)
library(sf)
# put some id's in df1
df1$id <- LETTERS[1:3]
# make df1 and df2 simple feature objects (x = longitude, y = latitude)
sf1 <- df1 %>%
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326)
sf2 <- df2 %>%
  st_as_sf(coords = c("vx", "vy"), crs = 4326)
# for each feature in sf1, find the index of the nearest feature in sf2
sf1 %>%
  mutate(nearest_bio = sf2$bio1[st_nearest_feature(sf1, sf2)])
# Simple feature collection with 3 features and 2 fields
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: 4 ymin: 2 xmax: 18 ymax: 10.5
# Geodetic CRS: WGS 84
# id geometry nearest_bio
# 1 A POINT (18 10.5) a
# 2 B POINT (9 6) b
# 3 C POINT (4 2) c
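You may want to keep the question's original criterion (the smallest sum of absolute latitude and longitude differences, in degrees) rather than the geographic distance that st_nearest_feature() uses. In that case the loop can be vectorized in base R with outer(). This sketch uses the toy df1 and df2 from the question. It builds an nrow(df1) x nrow(df2) matrix, so for very large inputs memory, not speed, becomes the limit:

```r
df1 <- data.frame(latitude = c(10.5, 6, 2), longitude = c(18, 9, 4))
df2 <- data.frame(vy = c(10, 5, 3), vx = c(20, 10, 3), bio1 = c("a", "b", "c"),
                  stringsAsFactors = FALSE)

# Manhattan distance in degrees between every region (rows) and every station (columns)
d <- abs(outer(df1$longitude, df2$vx, "-")) + abs(outer(df1$latitude, df2$vy, "-"))

# Column of the closest station per region; ties resolve to the first station
nearest <- max.col(-d, ties.method = "first")
df1$temperature <- df2$bio1[nearest]
df1$temperature
```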
I have several data frames in R. The first data frame contains the computed convex hull of a set of lat and long coordinates by market (courtesy of chull in R). It looks like this:
MyGeo<- "Part of Chicago & Wisconsin"
Longitude <- c(-90.31914, -90.61911, -89.37842, -88.0988, -87.44875)
Latitude <- c(38.45781, 38.80097, 43.07961, 43.0624,41.49182)
dat <- data.frame(Longitude, Latitude, MyGeo)
The second has zip codes with their latitude and longitude coordinates (courtesy of the US Census website). It looks like this:
CensuseZip <- c("SomeZipCode1","SomeZipCode2","SomeZipCode3","SomeZipCode4","SomeZipCode5","SomeZipCode6","SomeZipCode7")
Longitude2 <- c(-131.470425,-133.457924,-131.693453,-87.64957,-87.99734,-87.895,-88.0228)
Latitude2 <- c(55.138352,56.239062,56.370538,41.87485,42.0086,42.04957,41.81055)
cen <- data.frame(Longitude2, Latitude2, CensuseZip)
Now I believe the first data table provides me with a polygon, or a border, that I should be able to use to identify zip codes that fall within that border. Ideally, I would want to create a third data table that looks something like this:
Longitude2 Latitude2 CensusZip MyGeo
-131.470425 55.138352 SomeZipCode1
-133.457924 56.239062 SomeZipCode2
-131.693453 56.370538 SomeZipCode3
-87.64957 41.87485 SomeZipCode4 Part of Chicago & Wisconsin
-87.99734 42.0086 SomeZipCode5 Part of Chicago & Wisconsin
-87.895 42.04957 SomeZipCode6 Part of Chicago & Wisconsin
-88.0228 41.81055 SomeZipCode7 Part of Chicago & Wisconsin
In essence, I am looking to identify all the zip codes that fall within the boundary formed by these long and lat points. Rather than a visualization, I am looking for the table described above.
However, I am having trouble doing this. I have tried the following packages and script:
library(rgeos)
library(sp)
library(rgdal)
coordinates(dat) <- ~ Longitude + Latitude
coordinates(cen) <- ~ Longitude2 + Latitude2
over(cen, dat)
but I receive all NAs.
I use library(sf) to solve this type of point-in-polygon problem (sf is the successor to sp).
The function sf::st_intersection() gives you the intersection of two sf objects. In your case you can construct separate POLYGON and POINT sf objects.
library(sf)
Longitude <- c(-90.31914, -90.61911, -89.37842, -88.0988, -87.44875)
Latitude <- c(38.45781, 38.80097, 43.07961, 43.0624,41.49182)
## closing the polygon
Longitude[length(Longitude) + 1] <- Longitude[1]
Latitude[length(Latitude) + 1] <- Latitude[1]
## construct sf POLYGON
sf_poly <- sf::st_sf( geometry = sf::st_sfc( sf::st_polygon( x = list(matrix(c(Longitude, Latitude), ncol = 2)))) )
## construct sf POINT
sf_points <- sf::st_as_sf( cen, coords = c("Longitude2", "Latitude2"))
sf::st_intersection(sf_points, sf_poly)
# Simple feature collection with 4 features and 1 field
# geometry type: POINT
# dimension: XY
# bbox: xmin: -88.0228 ymin: 41.81055 xmax: -87.64957 ymax: 42.04957
# epsg (SRID): NA
# proj4string: NA
# CensuseZip geometry
# 4 SomeZipCode4 POINT (-87.64957 41.87485)
# 5 SomeZipCode5 POINT (-87.99734 42.0086)
# 6 SomeZipCode6 POINT (-87.895 42.04957)
# 7 SomeZipCode7 POINT (-88.0228 41.81055)
# Warning message:
# attribute variables are assumed to be spatially constant throughout all geometries
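The result is all the points that are inside the polygon.
You can also use sf::st_join(sf_poly, sf_points) to find the same zip codes. Note that it returns one row per matching point carrying the polygon geometry, not the point geometry.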
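To get exactly the table asked for, with every zip code listed and MyGeo filled in only for those inside the polygon, the polygon can carry MyGeo as an attribute and the points can be left-joined to it. This sketch rebuilds the question's dat and cen so that it runs on its own:

```r
library(sf)

dat <- data.frame(Longitude = c(-90.31914, -90.61911, -89.37842, -88.0988, -87.44875),
                  Latitude  = c(38.45781, 38.80097, 43.07961, 43.0624, 41.49182),
                  MyGeo = "Part of Chicago & Wisconsin", stringsAsFactors = FALSE)
cen <- data.frame(Longitude2 = c(-131.470425, -133.457924, -131.693453, -87.64957, -87.99734, -87.895, -88.0228),
                  Latitude2  = c(55.138352, 56.239062, 56.370538, 41.87485, 42.0086, 42.04957, 41.81055),
                  CensuseZip = paste0("SomeZipCode", 1:7), stringsAsFactors = FALSE)

# Close the ring by repeating the first vertex, then build a polygon that carries MyGeo
ring <- unname(as.matrix(dat[c(seq_len(nrow(dat)), 1), c("Longitude", "Latitude")]))
sf_poly <- st_sf(MyGeo = dat$MyGeo[1], geometry = st_sfc(st_polygon(list(ring))))

# Keep the coordinate columns as attributes (remove = FALSE) so they appear in the table
sf_points <- st_as_sf(cen, coords = c("Longitude2", "Latitude2"), remove = FALSE)

# Left join: zip codes outside the polygon get MyGeo = NA; then drop geometry for a plain table
result <- st_drop_geometry(st_join(sf_points, sf_poly))
result
```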
And sf::st_intersects(sf_points, sf_poly) returns a sparse list that gives, for each POINT, the index of the polygon(s) it falls in (empty if none):
sf::st_intersects(sf_points, sf_poly)
# Sparse geometry binary predicate list of length 7, where the predicate was `intersects'
# 1: (empty)
# 2: (empty)
# 3: (empty)
# 4: 1
# 5: 1
# 6: 1
# 7: 1
You can use this result as an index on the original sf_points object to add a new column:
is_in <- sf::st_intersects(sf_points, sf_poly)
sf_points$inside_polygon <- as.logical(is_in)
sf_points
# Simple feature collection with 7 features and 2 fields
# geometry type: POINT
# dimension: XY
# bbox: xmin: -133.4579 ymin: 41.81055 xmax: -87.64957 ymax: 56.37054
# epsg (SRID): NA
# proj4string: NA
# CensuseZip geometry inside_polygon
# 1 SomeZipCode1 POINT (-131.4704 55.13835) NA
# 2 SomeZipCode2 POINT (-133.4579 56.23906) NA
# 3 SomeZipCode3 POINT (-131.6935 56.37054) NA
# 4 SomeZipCode4 POINT (-87.64957 41.87485) TRUE
# 5 SomeZipCode5 POINT (-87.99734 42.0086) TRUE
# 6 SomeZipCode6 POINT (-87.895 42.04957) TRUE
# 7 SomeZipCode7 POINT (-88.0228 41.81055) TRUE
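as.logical() on the sparse list gives TRUE or NA. If you would rather have TRUE/FALSE, one option is to count the matches with lengths(). Here is a sketch on a made-up unit square and three points:

```r
library(sf)

# Toy unit square (planar, no CRS) and three points, the second one outside
poly <- st_sfc(st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))))
pts <- st_sf(id = 1:3,
             geometry = st_sfc(st_point(c(0.5, 0.5)), st_point(c(2, 2)), st_point(c(0.2, 0.8))))

is_in <- st_intersects(pts, poly)

# lengths() counts the matches per point; > 0 turns that into TRUE/FALSE without NAs
pts$inside_polygon <- lengths(is_in) > 0
pts$inside_polygon
```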
I'm trying to place a grid over San Jose like this:
[Figure: Grid of San Jose]
You can make the grid visually using the following code:
library(sp) # proj4string(), CRS() and the Spatial* classes
ca_cities = tigris::places(state = "CA") #using tigris package to get shape file of all CA cities
sj = ca_cities[ca_cities$NAME == "San Jose",] #specifying to San Jose
UTM_ZONE = "10" #the UTM zone for San Jose, will be used to convert the proj4string of sj into UTM
main_sj = sj@polygons[[1]]@Polygons[[5]] #the portion of the shape file I focus on. This is the boundary of San Jose
#converting the main_sj polygon into a spatialpolygondataframe using the sp package
tst_ps = sp::Polygons(list(main_sj), 1)
tst_sps = sp::SpatialPolygons(list(tst_ps))
proj4string(tst_sps) = proj4string(sj)
df = data.frame(f = 99.9)
tst_spdf = sp::SpatialPolygonsDataFrame(tst_sps, data = df)
#transforming the proj4string and declaring the finished map as "map"
map = sp::spTransform(tst_sps, sp::CRS(paste0("+proj=utm +zone=", UTM_ZONE, " +ellps=WGS84")))
#designates the number of horizontal and vertical lines of the grid
NUM_LINES_VERT = 25
NUM_LINES_HORZ = 25
#getting bounding box of map
bbox = map@bbox
#Marking the x and y coordinates for each of the grid lines.
x_spots = seq(bbox[1,1], bbox[1,2], length.out = NUM_LINES_HORZ)
y_spots = seq(bbox[2,1], bbox[2,2], length.out = NUM_LINES_VERT)
#creating the coordinates for the lines. top and bottom connect to each other. left and right connect to each other
top_vert_line_coords = expand.grid(x = x_spots, y = y_spots[1])
bottom_vert_line_coords = expand.grid(x = x_spots, y = y_spots[length(y_spots)])
left_horz_line_coords = expand.grid(x = x_spots[1], y = y_spots)
right_horz_line_coords = expand.grid(x = x_spots[length(x_spots)], y = y_spots)
#creating vertical lines and adding them all to a list
vert_line_list = list()
for(n in 1 : nrow(top_vert_line_coords)){
  vert_line_list[[n]] = sp::Line(rbind(top_vert_line_coords[n,], bottom_vert_line_coords[n,]))
}
vert_lines = sp::Lines(vert_line_list, ID = "vert") #creating Lines object of the vertical lines
#creating horizontal lines and adding them all to a list
horz_line_list = list()
for(n in 1 : nrow(left_horz_line_coords)){
  horz_line_list[[n]] = sp::Line(rbind(left_horz_line_coords[n,], right_horz_line_coords[n,]))
}
horz_lines = sp::Lines(horz_line_list, ID = "horz") #creating Lines object of the horizontal lines
all_lines = sp::Lines(c(horz_line_list, vert_line_list), ID = 1) #combining horizontal and vertical lines into a single grid format
grid_lines = sp::SpatialLines(list(all_lines)) #converting the lines object into a Spatial Lines object
proj4string(grid_lines) = proj4string(map) #ensuring the projections are the same between the map and the grid lines.
trimmed_grid = intersect(grid_lines, map) #grid that shapes to the san jose map
plot(map) #plotting the map of San Jose
lines(trimmed_grid) #plotting the grid
However, I am struggling to turn each grid 'square' (some of the grid pieces are not squares, since they are clipped to the shape of the San Jose map) into a bin that I could put data into. Put another way, if each grid 'square' was numbered 1:n, then I could make a dataframe like this:
grid_id num_assaults num_thefts
1 1 100 89
2 2 55 456
3 3 12 1321
4 4 48 498
5 5 66 6
and fill each grid 'square' with data based on the point location of each crime occurrence, ideally using the over() function from the sp package.
I have tried solving this problem for weeks, and I can't figure it out. I have looked for an easy solution, but I can't seem to find it. Any help would be appreciated.
Additionally, here's an sf and tidyverse-based solution:
With sf, you can make a grid of squares with the st_make_grid() function. Here I'll make a 2km grid over San Jose's bounding box, then intersect it with the boundary of San Jose. Note that I'm projecting to UTM zone 10N so I can specify the grid size in meters.
library(tigris)
library(tidyverse)
library(sf)
options(tigris_class = "sf", tigris_use_cache = TRUE)
set.seed(1234)
sj <- places("CA", cb = TRUE) %>%
  filter(NAME == "San Jose") %>%
  st_transform(26910)
g <- sj %>%
  st_make_grid(cellsize = 2000) %>%
  st_intersection(sj) %>%
  st_cast("MULTIPOLYGON") %>%
  st_sf() %>%
  mutate(id = row_number())
Next, we can generate some random crime data with st_sample() and plot it to see what we are working with.
thefts <- st_sample(sj, size = 500) %>%
  st_sf()
assaults <- st_sample(sj, size = 200) %>%
  st_sf()
plot(g$geometry)
plot(thefts, add = TRUE, col = "red")
Crime data can then be joined to the grid spatially with st_join(). We can plot to check our results.
theft_grid <- g %>%
  st_join(thefts) %>%
  group_by(id) %>%
  summarize(num_thefts = n())
plot(theft_grid["num_thefts"])
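One caveat with counting rows this way: after a left st_join(), a grid cell with no points still appears once, with NA in the joined columns, so n() reports 1 rather than 0 for it. Here is a sketch on a made-up 2 x 2 grid that shows the difference and counts with lengths(st_intersects()) instead:

```r
library(sf)

# Toy 10 x 10 square (planar, no CRS) split into a 2 x 2 grid of 5-unit cells;
# st_make_grid() numbers cells row by row starting from the bottom-left
square <- st_sfc(st_polygon(list(rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0)))))
g <- st_sf(id = 1:4, geometry = st_make_grid(square, cellsize = 5))

# Three hypothetical thefts, all in the bottom-left cell (id 1)
thefts <- st_sf(type = rep("theft", 3),
                geometry = st_sfc(st_point(c(1, 1)), st_point(c(2, 3)), st_point(c(4, 4))))

# Counting rows after a left join: each empty cell keeps one NA row, so it counts as 1
naive <- as.vector(table(st_join(g, thefts)$id))

# lengths() of the sparse st_intersects() result gives true counts, including zeros
g$num_thefts <- lengths(st_intersects(g, thefts))
```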
We can then do the same with the assaults data, then join the two datasets together to get the desired result. If you had a lot of crime datasets, these could be modified to work within some variation of purrr::map().
assault_grid <- g %>%
  st_join(assaults) %>%
  group_by(id) %>%
  summarize(num_assaults = n())
st_geometry(assault_grid) <- NULL
crime_data <- left_join(theft_grid, assault_grid, by = "id")
crime_data
Simple feature collection with 190 features and 3 fields
geometry type: GEOMETRY
dimension: XY
bbox: xmin: 584412 ymin: 4109499 xmax: 625213.2 ymax: 4147443
epsg (SRID): 26910
proj4string: +proj=utm +zone=10 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
# A tibble: 190 x 4
id num_thefts num_assaults geometry
<int> <int> <int> <GEOMETRY [m]>
1 1 2 1 POLYGON ((607150.3 4111499, 608412 4111499, 608412 4109738,…
2 2 4 1 POLYGON ((608412 4109738, 608412 4111499, 609237.8 4111499,…
3 3 3 1 POLYGON ((608412 4113454, 608412 4111499, 607150.3 4111499,…
4 4 2 2 POLYGON ((609237.8 4111499, 608412 4111499, 608412 4113454,…
5 5 1 1 MULTIPOLYGON (((610412 4112522, 610412 4112804, 610597 4112…
6 6 1 1 POLYGON ((616205.4 4113499, 616412 4113499, 616412 4113309,…
7 7 1 1 MULTIPOLYGON (((617467.1 4113499, 618107.9 4113499, 617697.…
8 8 2 1 POLYGON ((605206.8 4115499, 606412 4115499, 606412 4114617,…
9 9 5 1 POLYGON ((606412 4114617, 606412 4115499, 608078.2 4115499,…
10 10 1 1 POLYGON ((609242.7 4115499, 610412 4115499, 610412 4113499,…
# ... with 180 more rows
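For the purrr::map() variation mentioned above, here is a sketch that uses a named list of point layers. The toy grid and the layer names thefts and assaults are made up. Base lapply() stands in for purrr::map(), and counts use lengths(st_intersects()), so empty cells get 0:

```r
library(sf)

# Toy 2 x 2 grid over a 10 x 10 square (planar, no CRS)
square <- st_sfc(st_polygon(list(rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0)))))
grid <- st_sf(id = 1:4, geometry = st_make_grid(square, cellsize = 5))

# Hypothetical crime layers: one point layer per crime type
crimes <- list(
  thefts   = st_sfc(st_point(c(1, 1)), st_point(c(6, 1)), st_point(c(7, 2))),
  assaults = st_sfc(st_point(c(1, 6)))
)

# One vector of per-cell counts per layer
counts <- lapply(crimes, function(layer) lengths(st_intersects(grid, layer)))

# Add them to the grid as num_<layer name> columns
for (nm in names(counts)) grid[[paste0("num_", nm)]] <- counts[[nm]]
grid
```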
With a Spatial* object, as in your data:
library(tigris)
ca_cities = tigris::places(state = "CA") #using tigris package to get shape file of all CA cities
sj = ca_cities[ca_cities$NAME == "San Jose",] #specifying to San Jose
sjutm = sp::spTransform(sj, sp::CRS("+proj=utm +zone=10 +datum=WGS84"))
You can make a grid of polygons like this
library(raster)
r <- raster(sjutm, ncol=25, nrow=25)
rp <- as(r, 'SpatialPolygons')
Show it
plot(sjutm, col='red')
lines(rp, col='blue')
To count the number of cases per grid cell (using some random points here), you do not want to use the polygons but rather the RasterLayer:
set.seed(0)
x <- runif(500, xmin(r), xmax(r))
y <- runif(500, ymin(r), ymax(r))
xy1 <- cbind(x, y)
x <- runif(500, xmin(r), xmax(r))
y <- runif(500, ymin(r), ymax(r))
xy2 <- cbind(x, y)
d1 <- rasterize(xy1, r, fun="count", background=0)
d2 <- rasterize(xy2, r, fun="count", background=0)
plot(d1)
plot(sjutm, add=TRUE)
Followed by
s <- stack(d1, d2)
names(s) = c("assault", "theft")
s <- mask(s, sjutm)
plot(s, addfun=function()lines(sjutm))
To get the table you are after
p <- rasterToPoints(s)
cell <- cellFromXY(s, p[,1:2])
res <- data.frame(grid_id=cell, p[,3:4])
head(res)
# grid_id assault theft
#1 1 1 1
#2 2 0 1
#3 3 0 3
#4 5 1 1
#5 6 1 0
#6 26 0 0
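rasterize(fun = "count") and cellFromXY() both bin points by cell number. The arithmetic behind that numbering can be sketched in base R. This sketch assumes the raster convention of numbering cells row by row, starting at 1 in the top-left corner. The extent is made up, and handling of points exactly on cell edges may differ from raster's:

```r
# Hypothetical 25 x 25 grid over a toy extent (not the San Jose extent)
xmin <- 0; xmax <- 100; ymin <- 0; ymax <- 50
ncols <- 25; nrows <- 25

# Cell number for each (x, y): columns counted from the left, rows from the top,
# numbered row by row starting at 1 in the top-left corner
cell_from_xy <- function(x, y) {
  col <- pmin(floor((x - xmin) / (xmax - xmin) * ncols), ncols - 1) + 1
  row <- pmin(floor((ymax - y) / (ymax - ymin) * nrows), nrows - 1) + 1
  (row - 1) * ncols + col
}

set.seed(0)
x <- runif(500, xmin, xmax)
y <- runif(500, ymin, ymax)

# Count points per cell, keeping empty cells as zeros
counts <- tabulate(cell_from_xy(x, y), nbins = ncols * nrows)
res <- data.frame(grid_id = seq_along(counts), n = counts)
head(res)
```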
You can also create a SpatialPolygonsDataFrame from the results
pp <- as(s, 'SpatialPolygonsDataFrame')
pp
#class : SpatialPolygonsDataFrame
#features : 190
#extent : 584411.5, 623584.9, 4109499, 4147443 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=utm +zone=10 +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
#variables : 2
#names : assault, theft
#min values : 0, 0
#max values : 4, 5
If your goal is only the visual, and you don't necessarily need all the grid-aggregation code and data, you can generate an interactive map and grid with library(mapdeck) (note that you'll need a Mapbox access token).
The first step, generating the data, is borrowed from @kwalkertcu's answer:
library(tigris)
library(sf)
library(dplyr) # for %>% and filter()
options(tigris_class = "sf", tigris_use_cache = TRUE)
set.seed(1234)
sj <- places("CA", cb = TRUE) %>%
  filter(NAME == "San Jose") %>%
  st_transform(26910)
thefts <- st_sample(sj, size = 500) %>%
  st_sf() %>%
  st_transform(crs = 4326)
## some random weight data
thefts$weight <- sample(1:100, size = nrow(thefts), replace = TRUE)
Then, given a sf object with a weight column you can plot it using add_screengrid()
library(mapdeck)
set_token("MAPBOX_TOKEN")
mapdeck(
  style = mapdeck_style("dark")
  , location = c(-121.8, 37.3)
  , zoom = 6
) %>%
  add_screengrid(
    data = thefts
    , cell_size = 15
    , weight = "weight"
  )
Notes:
I'm using the GitHub version of mapdeck, where the API has changed slightly, but the CRAN version should yield the same result.
I have a relatively large number of coordinates for which I'd like to get the census tract (in addition to the FIPS code). I know that I can look up individual lat/lon pairs using call_geolocator_latlon (as done here), but this seems impractical for my purposes: the function issues a separate call to the Census Bureau's API for each pair, and I imagine it would take a very long time to run on my ~200,000 pairs.
Is there a faster way to do this, perhaps by downloading shapefiles for each state using the block_groups function and mapping from lat/lon to census tract from there?
This doesn't use tigris, but utilizes sf::st_within() to check a data frame of points for overlapping tracts.
I'm using tidycensus here to get a map of California's tracts into R.
library(sf)
ca <- tidycensus::get_acs(state = "CA", geography = "tract",
                          variables = "B19013_001", geometry = TRUE)
Now to simulate some data:
bbox <- st_bbox(ca)
my_points <- data.frame(
  x = runif(100, bbox[1], bbox[3]),
  y = runif(100, bbox[2], bbox[4])
) %>%
  # convert the points to same CRS
  st_as_sf(coords = c("x", "y"),
           crs = st_crs(ca))
I'm using only 100 points here so that I can ggplot() the results, but the overlap calculation is fast even for 1e6 points: only a few seconds on my laptop.
my_points$tract <- as.numeric(st_within(my_points, ca)) # this is fast for 1e6 points
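Because tract is a row index into ca, it can be turned into the tract's ID by indexing, and NA indices stay NA. Here is a sketch on two made-up squares with hypothetical tract IDs. It keeps the index only when a point is within exactly one tract, which is an assumption about how you want ambiguous matches handled:

```r
library(sf)

# Two toy "tracts" (planar, no CRS) with hypothetical IDs
sq <- function(x0) st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
tracts <- st_sf(tract_id = c("tract_1", "tract_2"), geometry = st_sfc(sq(0), sq(1)))

# One point in each tract and one point outside both
pts <- st_sf(geometry = st_sfc(st_point(c(0.5, 0.5)), st_point(c(1.5, 0.5)), st_point(c(9, 9))))

# Row index of the containing tract, NA when the point is in no tract
idx <- vapply(st_within(pts, tracts), function(i) if (length(i) == 1) i else NA_integer_, integer(1))

# Indexing with NA yields NA, so points outside every tract get tract_id = NA
pts$tract_id <- tracts$tract_id[idx]
pts$tract_id
```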
The results:
head(my_points) # tract is the row index of the overlapping census tract record in 'ca'
# plotting works for 100 points, but this part would take forever with 1e6 points
library(ggplot2)
ggplot(ca) +
geom_sf() +
geom_sf(data = my_points, aes(color = is.na(tract)))
Great answer above. To get Census tract IDs you could also use st_join(). NAs for the tract IDs are those points that are within California's bounding box but don't intersect the state itself.
library(tigris)
library(tidyverse)
library(sf)
ca_tracts <- tracts("CA", class = "sf") %>%
  select(GEOID, TRACTCE)
bbox <- st_bbox(ca_tracts)
my_points <- data.frame(
  x = runif(200000, bbox[1], bbox[3]),
  y = runif(200000, bbox[2], bbox[4])
) %>%
  # convert the points to same CRS
  st_as_sf(coords = c("x", "y"),
           crs = st_crs(ca_tracts))
my_points_tract <- st_join(my_points, ca_tracts)
> my_points_tract
Simple feature collection with 200000 features and 2 fields
geometry type: POINT
dimension: XY
bbox: xmin: -124.4819 ymin: 32.52888 xmax: -114.1312 ymax: 42.0095
epsg (SRID): 4269
proj4string: +proj=longlat +datum=NAD83 +no_defs
First 10 features:
GEOID TRACTCE geometry
1 06025012400 012400 POINT (-114.6916 33.42711)
2 <NA> <NA> POINT (-118.4255 41.81896)
3 06053990000 990000 POINT (-121.8154 36.22736)
4 06045010200 010200 POINT (-123.6909 39.70572)
5 <NA> <NA> POINT (-116.9055 37.93532)
6 06019006405 006405 POINT (-119.511 37.09383)
7 06049000300 000300 POINT (-120.7215 41.3392)
8 <NA> <NA> POINT (-115.8916 39.32392)
9 06023990100 990100 POINT (-124.2737 40.14106)
10 06071008901 008901 POINT (-117.319 35.62759)
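With 200,000 points, keep in mind that st_join()'s default predicate is st_intersects, which counts polygon boundaries. A point lying exactly on the edge between two tracts matches both, and st_join() returns one row per match, so the result can have more rows than the input. The toy sketch below contrasts that with join = st_within; the squares and tract codes are made up:

```r
library(sf)

# Two toy adjacent tracts sharing the edge x = 1 (planar, no CRS); codes are hypothetical
sq <- function(x0) st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
tracts <- st_sf(TRACTCE = c("000100", "000200"), geometry = st_sfc(sq(0), sq(1)))

# One point exactly on the shared edge
pt <- st_sf(id = 1L, geometry = st_sfc(st_point(c(1, 0.5))))

# Default join (st_intersects): the boundary point matches both tracts, so its row is duplicated
by_intersects <- st_join(pt, tracts)
nrow(by_intersects)  # 2

# st_within requires the point to lie in a polygon's interior, so it matches neither tract
by_within <- st_join(pt, tracts, join = st_within)
nrow(by_within)      # 1, with TRACTCE = NA
```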