R googleway with ABS census data

I intend to analyse Australian census data using googleway to produce heat maps.
My approach has been to take the prepared melbourne data set from googleway, which contains the column SA2_NAME, and join it with the ESRI shapefile from the census data after reading it with rgdal (code below). The problem is that SA2_NAME is not a unique key for the polylines: some SA2 areas are made up of multiple 'sub' areas. So this does not seem to be a good approach.
A better approach would be to convert the ESRI shape data (sa2_shape below) so that it has polylines in the same format as the melbourne data. How is this done?
The code below produces a 'bridging' data frame for joining the melbourne data from googleway with the ABS data, which has SA2_MAIN as the key field. As stated above, the problem with this 'hack' is that the polylines are not unique by SA2_NAME.
library(tidyverse)
library(googleway)
library(rgdal)
shape_path <- "abs_data/sa2_esri_shapefile"
shape_file <- "SA2_2016_AUST"
sa2_shape <- readOGR(shape_path, shape_file)
sa2_df <- data.frame(sa2_shape$SA2_MAIN, sa2_shape$SA2_NAME)
names(sa2_df) <- c("SA2_MAIN", "SA2_NAME")
sa2_df <- sa2_df %>% semi_join(melbourne, by = "SA2_NAME")
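Before joining, it can help to check how many rows share each key. The following is only a minimal base R sketch on a made-up data frame: polys_demo, its placeholder polyline strings and the duplicated "Ballarat" row are hypothetical and are not taken from the real melbourne data.

```r
# Hypothetical stand-in for a polyline table keyed by SA2_NAME
polys_demo <- data.frame(
  SA2_NAME = c("Alfredton", "Ballarat", "Ballarat", "Buninyong"),
  polyline = c("p1", "p2a", "p2b", "p3"),
  stringsAsFactors = FALSE
)

# Count rows per key; any count above 1 means the key is not unique
key_counts <- table(polys_demo$SA2_NAME)
dup_names <- names(key_counts[key_counts > 1])
dup_names
```

Any name returned here would match more than one polyline row in a join on SA2_NAME.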

Update: following SymbolixAU's comment, I used sf to load the data. This works as long as no geometry is an empty list (see the code below).
library(tidyverse)
library(googleway)
library(sf)
shape_path <- "abs_data/sa2_esri_shapefile"
shape_file <- "SA2_2016_AUST"
shape_file_path <- paste0(shape_path, "/", shape_file, '.shp')
sa2_shape <- sf::st_read(shape_file_path)
sa2_shape <- sa2_shape %>%
  filter(STATE_NAME == "Victoria",
         AREA_SQKM > 0)  # This is important - otherwise google_map() will crash!
google_map() %>%
  googleway::add_polygons(data = sa2_shape,
                          polyline = "geometry",
                          fill_colour = "SA2_NAME")
> sa2_shape %>% head()
Simple feature collection with 6 features and 6 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 143.6849 ymin: -37.68153 xmax: 143.951 ymax: -37.46847
epsg (SRID): 4283
proj4string: +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
SA2_MAIN SA2_MAIN16 SA2_NAME STATE_CODE STATE_NAME AREA_SQKM geometry
1 201011001 201011001 Alfredton 2 Victoria 52.7111 MULTIPOLYGON (((143.7072 -3...
2 201011002 201011002 Ballarat 2 Victoria 12.3787 MULTIPOLYGON (((143.8675 -3...
3 201011003 201011003 Ballarat - North 2 Victoria 92.3577 MULTIPOLYGON (((143.853 -37...
4 201011004 201011004 Ballarat - South 2 Victoria 32.8541 MULTIPOLYGON (((143.8675 -3...
5 201011005 201011005 Buninyong 2 Victoria 51.5855 MULTIPOLYGON (((143.8533 -3...
6 201011006 201011006 Delacombe 2 Victoria 34.1608 MULTIPOLYGON (((143.7072 -3...

Related

Create new geometry on grouped column in R sf

I'd like to create a new shapefile or a new geometry variable that allows me to plot borders around regions in R. I'm using the sf and mapping with tmap. Basically, I'm adding a character vector to an sf object and would like to make the character vector the new/preferred mapping border.
Here is an example of my approach, which doesn't do what I'd like. I can't tell that it does anything.
library(tidyverse)
library(sf)
library(tmap)
## use North Carolina example
nc = st_read(system.file("shape/nc.shp", package="sf"))
nc_new.region <- nc %>% ## add new region variable
mutate(new.region = sample(c('A', 'B', 'C'), nrow(.),replace = T))
nc_union <- nc_new.region %>%
group_by(new.region) %>% # group by the new character vector
mutate(new_geometry = st_union(geometry)) # union on the geometry variable
# map with tmap package
tm_shape(nc_union)+
tm_borders()
This happens because mutate(new_geometry = st_union(geometry)) creates a "new" column within the original sf object, but plotting still uses the "original" geometry column. Indeed, if you have a look at your nc_union object, you'll see that it still contains 100 features (therefore, no "dissolving" was really done).
To do what you wish, you should instead create a "new" sf object using summarize over the groups:
library(tidyverse)
library(sf)
library(tmap)
## use North Carolina example
nc = st_read(system.file("shape/nc.shp", package="sf"))
#> Reading layer `nc' from data source `D:\Documents\R\win-library\3.5\sf\shape\nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID): 4267
#> proj4string: +proj=longlat +datum=NAD27 +no_defs
nc_new.region <- nc %>% ## add new region variable
mutate(new.region = sample(c('A', 'B', 'C'), nrow(.),replace = T))
nc_union <- nc_new.region %>%
group_by(new.region) %>%
summarize()
> nc_union
Simple feature collection with 3 features and 1 field
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
epsg (SRID): 4267
proj4string: +proj=longlat +datum=NAD27 +no_defs
# A tibble: 3 x 2
new.region geometry
<chr> <MULTIPOLYGON [°]>
1 A (((-78.65572 33.94867, -79.0745 34.30457, -79.04095 34.3193, -79.02947 34.34737, -7~
2 B (((-79.45597 34.63409, -79.6675 34.80066, -79.68596 34.80526, -79.66015 34.8179, -7~
3 C (((-78.65572 33.94867, -78.63472 33.97798, -78.63027 34.0102, -78.58778 34.03061, -~
tm_shape(nc_union)+
tm_borders()
You can see that nc_union now contains only 3 MULTIPOLYGONs, and the plot reflects the "aggregation".
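A quick way to confirm that the dissolve happened is to compare feature counts before and after. This is just a sketch, assuming sf and dplyr are installed; the set.seed() call and the names nc_demo and dissolved are my additions for reproducibility and are not part of the code above.

```r
library(sf)
library(dplyr)

# Same North Carolina example, read quietly
nc_demo <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Assign each county to a random region (seed fixed only for reproducibility)
set.seed(1)
nc_demo$new.region <- sample(c("A", "B", "C"), nrow(nc_demo), replace = TRUE)

# summarise() on a grouped sf object unions the geometries within each group
dissolved <- nc_demo %>%
  group_by(new.region) %>%
  summarise()

nrow(nc_demo)    # number of counties before dissolving
nrow(dissolved)  # one feature per region after dissolving
```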
See also: https://github.com/r-spatial/sf/issues/290
Created on 2019-08-23 by the reprex package (v0.3.0)

R, sf: intersect lines with the borders of multipolygons, extract coordinates of those intersections

I am new to sf and to Stack Overflow, so I hope my question is clear enough.
I was able to create a set of lines connecting one point to a set of points all over the US.
Then I can read the US counties in as multipolygons.
My goal is to find and geolocate all the points where the lines I created cross the county borders.
So far I have been able to create the lines from the points:
points_to_lines <- dt %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326) %>%
  group_by(lineid) %>%
  summarize(do_union = FALSE) %>%  # combine, rather than union, the points of each lineid
  st_cast("LINESTRING")
This is the head of the lines
Simple feature collection with 1628 features and 1 field
geometry type: LINESTRING
dimension: XY
bbox: xmin: 30.1127 ymin: -91.32484 xmax: 37.23671 ymax: -82.31262
epsg (SRID): 4326
proj4string: +proj=longlat +datum=WGS84 +no_defs
# A tibble: 1,628 x 2
lineid geometry
<int> <LINESTRING [°]>
1 1 (33.51859 -86.81036, 36.16266 -86.7816)
2 2 (33.51859 -86.81036, 34.61845 -82.47791)
This is the head of the county dataset.
Reading layer `US_county_1930_conflated' from data source `~/county_gis/1930' using driver `ESRI Shapefile'
Simple feature collection with 3110 features and 18 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: -7115608 ymin: -1337505 xmax: 2258244 ymax: 4591848
epsg (SRID): NA
proj4string: +proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs
Very naively, I tried to give them both the same coordinate system and then call st_intersects on them. The non-sparse matrix seems to say that every line intersects only one county.
gis1930_p <- st_set_crs(gis1930, 4326) %>% st_transform(4326)
st_intersects(points, gis1930_p, sparse=FALSE)
I also tried plotting the lines on top of the counties, but only the map of the US counties is drawn.
plot(gis1930_p[0], reset = FALSE)
plot(points[0], add = TRUE)
Any help would be greatly appreciated and please let me know if I can provide any additional details.
You didn't provide your data so I am going to use the dataset provided in: https://gis.stackexchange.com/a/236922/32531
The main thing you need is the st_intersection function:
library(sf)
line_1 <- st_as_sfc("LINESTRING(458.1 768.23, 455.3 413.29, 522.3 325.77, 664.8 282.01, 726.3 121.56)")
poly_1 <- st_as_sfc("MULTIPOLYGON(((402.2 893.03, 664.8 800.65, 611.7 666.13, 368.7 623.99, 215.1 692.06, 402.2 893.03)), ((703.9 366.29, 821.2 244.73, 796.1 25.93, 500.0 137.76, 703.9 366.29)))")
pnts <- st_intersection(line_1,
st_cast(poly_1, "MULTILINESTRING", group_or_split = FALSE))
plot(poly_1)
plot(line_1, add = TRUE)
plot(pnts, add = TRUE, col = "red", pch = 21)
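To get the coordinates of those crossing points as numbers, st_coordinates() can be applied to the result. A minimal sketch reusing the same toy geometries; the names borders and xy are mine.

```r
library(sf)

line_1 <- st_as_sfc("LINESTRING(458.1 768.23, 455.3 413.29, 522.3 325.77, 664.8 282.01, 726.3 121.56)")
poly_1 <- st_as_sfc("MULTIPOLYGON(((402.2 893.03, 664.8 800.65, 611.7 666.13, 368.7 623.99, 215.1 692.06, 402.2 893.03)), ((703.9 366.29, 821.2 244.73, 796.1 25.93, 500.0 137.76, 703.9 366.29)))")

# Intersect the line with the polygon borders, not the polygon interiors
borders <- st_cast(poly_1, "MULTILINESTRING", group_or_split = FALSE)
pnts <- st_intersection(line_1, borders)

# One row per crossing point, with the X and Y coordinates
xy <- st_coordinates(pnts)[, c("X", "Y"), drop = FALSE]
xy
```

With real data, both layers need to be in the same CRS first. Note that st_transform() reprojects the coordinates, whereas st_set_crs() only replaces the CRS label and leaves the numbers unchanged.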

Get Census Tract from Lat/Lon using tigris

I have a relatively large number of coordinates for which I'd like to get the census tract (in addition to the FIPS code). I know that I can look up individual lat/lon pairs using call_geolocator_latlon (as done here), but this seems impractical for my purposes, as each call to the function issues a single request to the Census Bureau's API, and I imagine it would take a very long time to run on my ~200,000 pairs.
Is there a faster way to do this, perhaps by downloading shapefiles for each state using the block_groups function and mapping from lat/lon to census tract from there?
This doesn't use tigris, but utilizes sf::st_within() to check a data frame of points for overlapping tracts.
I'm using tidycensus here to get a map of California's tracts into R.
library(sf)
library(dplyr)  # for the %>% pipe used below
ca <- tidycensus::get_acs(state = "CA", geography = "tract",
                          variables = "B19013_001", geometry = TRUE)
Now to simulate some data:
bbox <- st_bbox(ca)
my_points <- data.frame(
x = runif(100, bbox[1], bbox[3]),
y = runif(100, bbox[2], bbox[4])
) %>%
# convert the points to same CRS
st_as_sf(coords = c("x", "y"),
crs = st_crs(ca))
I'm doing 100 points here to be able to ggplot() the results, but the overlap calculation for 1e6 is fast, only a few seconds on my laptop.
my_points$tract <- as.numeric(st_within(my_points, ca)) # this is fast for 1e6 points
The results:
head(my_points) # tract is the row index of the overlapping census tract record in 'ca'
# but this part (plotting) would take forever with 1e6 points
library(ggplot2)
ggplot(ca) +
geom_sf() +
geom_sf(data = my_points, aes(color = is.na(tract)))
Great answer above. To get Census tract IDs you could also use st_join(). NAs for the tract IDs are those points that are within California's bounding box but don't intersect the state itself.
library(tigris)
library(tidyverse)
library(sf)
ca_tracts <- tracts("CA", class = "sf") %>%
select(GEOID, TRACTCE)
bbox <- st_bbox(ca_tracts)
my_points <- data.frame(
x = runif(200000, bbox[1], bbox[3]),
y = runif(200000, bbox[2], bbox[4])
) %>%
# convert the points to same CRS
st_as_sf(coords = c("x", "y"),
crs = st_crs(ca_tracts))
my_points_tract <- st_join(my_points, ca_tracts)
> my_points_tract
Simple feature collection with 200000 features and 2 fields
geometry type: POINT
dimension: XY
bbox: xmin: -124.4819 ymin: 32.52888 xmax: -114.1312 ymax: 42.0095
epsg (SRID): 4269
proj4string: +proj=longlat +datum=NAD83 +no_defs
First 10 features:
GEOID TRACTCE geometry
1 06025012400 012400 POINT (-114.6916 33.42711)
2 <NA> <NA> POINT (-118.4255 41.81896)
3 06053990000 990000 POINT (-121.8154 36.22736)
4 06045010200 010200 POINT (-123.6909 39.70572)
5 <NA> <NA> POINT (-116.9055 37.93532)
6 06019006405 006405 POINT (-119.511 37.09383)
7 06049000300 000300 POINT (-120.7215 41.3392)
8 <NA> <NA> POINT (-115.8916 39.32392)
9 06023990100 990100 POINT (-124.2737 40.14106)
10 06071008901 008901 POINT (-117.319 35.62759)
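The same join can be tried offline on toy data to see how matched and unmatched points come out. A small sketch with two made-up unit squares standing in for tracts; the helper sq() and the names tracts_demo, pts_demo and the IDs "T1"/"T2" are hypothetical.

```r
library(sf)

# Hypothetical helper: unit square with lower-left corner at (x0, y0)
sq <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# Two adjacent "tracts" and three points (the last one lies outside both)
tracts_demo <- st_sf(GEOID = c("T1", "T2"),
                     geometry = st_sfc(sq(0, 0), sq(1, 0)))
pts_demo <- st_sf(id = 1:3,
                  geometry = st_sfc(st_point(c(0.5, 0.5)),
                                    st_point(c(1.5, 0.5)),
                                    st_point(c(5, 5))))

# Left join by default: every point is kept, unmatched points get NA
joined <- st_join(pts_demo, tracts_demo, join = st_within)
joined$GEOID
```

Because st_join() keeps all rows of its first argument by default, the output has one row per input point here, and points outside every polygon show up with NA in GEOID.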

Subset class sfc_LINESTRING & sfc objects within a bbox

Example:
bbox <- c(-0.1178, 51.4232, -0.0185, 51.5147) # I know it needs to be sf df object
# we have
df
#> Geometry set for 300 features
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: -0.113894 ymin: 51.49739 xmax: -0.0764779 ymax: 51.59839
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
#> LINESTRING (-0.113894 51.50631, -0.1135137 51.5...
#> LINESTRING (-0.0767875 51.59837, -0.0764779 51....
#> ....
How can I do something like
df[bbox]
and keep the linestrings which are within the bbox. Thanks.
Here's an example using an sf object from tigris, just for reproducibility. I'm using towns in New Haven County, Connecticut, plotting it the way it comes in. Then I crop it to a bounding box I made up, using st_crop, which I believe was added fairly recently to sf. If I had the bbox as a shape, instead of a vector of coordinates, I could have used st_intersection.
I don't have a linestring object handy, but I'd assume it works the same way.
library(tidyverse)
library(sf)
# selecting just to limit the amount of data in my sf
ct_sf <- tigris::county_subdivisions(state = "09", county = "09", cb = T, class = "sf") %>%
select(NAME, geometry)
plot(ct_sf)
crop_bbox <- c(xmin = -73, ymin = 41.2, xmax = -72.7, ymax = 41.5)
ct_cropped <- st_crop(ct_sf, crop_bbox)
plot(ct_cropped)
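For linestrings specifically, here is a small sketch on made-up planar coordinates (no CRS set, so this is an assumption-laden toy rather than the London data from the question; lines_demo, bbox_vec, cropped and inside are my names). It also shows a difference worth knowing: st_crop() cuts lines at the box edge, while a st_within() test keeps only lines that lie entirely inside the box.

```r
library(sf)

# Three toy lines: fully inside, crossing the edge, fully outside the box
lines_demo <- st_sfc(
  st_linestring(rbind(c(0.2, 0.2), c(1, 1))),
  st_linestring(rbind(c(1.5, 1.5), c(3, 3))),
  st_linestring(rbind(c(5, 5), c(6, 6)))
)

bbox_vec <- c(xmin = 0, ymin = 0, xmax = 2, ymax = 2)

# Option 1: clip the geometries to the box (crossing lines are cut)
cropped <- st_crop(lines_demo, st_bbox(bbox_vec))

# Option 2: keep only the lines that lie completely within the box
box_poly <- st_as_sfc(st_bbox(bbox_vec))
inside <- lines_demo[lengths(st_within(lines_demo, box_poly)) > 0]

st_bbox(cropped)
length(inside)
```

Which of the two is wanted depends on whether partial lines at the edge should be trimmed or dropped.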

How to create zipcode boundaries in R

I am trying to create a map that shows the name of each 'community' and the boundaries of its zip codes. My data looks like the example below, where each variable is named after a community and holds the corresponding zip codes.
Tooele <- c('84074','84029')
NEUtahCo <- c('84003', '84004', '84042', '84062')
NWUtahCounty <- c('84005','84013','84043','84045')
I was able to make a map of the entire area I want using
ggmap(get_map(location = c(lon=-111.9, lat= 40.7), zoom = 9))
Attached is a picture of what I want.
You have a decent foundation for this already by having figured out the shapefile and how it matches the zips you want to show. Simple features (sf) make this pretty easy, as does the brand new ggplot2 v3.0.0 which has the geom_sf to plot sf objects.
I wasn't sure if the names of the different areas (counties?) that you have are important, so I just threw them all into little tibbles and bound that into one tibble, utah_zips. tigris also added sf support, so if you set class = "sf", you get an sf object. To keep it simple, I'm just pulling out the columns you need and simplifying one of the names.
library(tidyverse)
library(tigris)
library(ggmap)
Tooele <- c('84074','84029')
NEUtahCo <- c('84003', '84004', '84042', '84062')
NWUtahCounty <- c('84005','84013','84043','84045')
utah_zips <- bind_rows(
tibble(area = "Tooele", zip = Tooele),
tibble(area = "NEUtahCo", zip = NEUtahCo),
tibble(area = "NWUtahCounty", zip = NWUtahCounty)
)
zips_sf <- zctas(cb = T, starts_with = "84", class = "sf") %>%
select(zip = ZCTA5CE10, geometry)
head(zips_sf)
#> Simple feature collection with 6 features and 1 field
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: -114.0504 ymin: 37.60461 xmax: -109.0485 ymax: 41.79228
#> epsg (SRID): 4269
#> proj4string: +proj=longlat +datum=NAD83 +no_defs
#> zip geometry
#> 37 84023 MULTIPOLYGON (((-109.5799 4...
#> 270 84631 MULTIPOLYGON (((-112.5315 3...
#> 271 84334 MULTIPOLYGON (((-112.1608 4...
#> 272 84714 MULTIPOLYGON (((-113.93 37....
#> 705 84728 MULTIPOLYGON (((-114.0495 3...
#> 706 84083 MULTIPOLYGON (((-114.0437 4...
Then you can filter the sf for just the zips you need—since there's other information (the county names), you can use a join to get everything in one sf data frame:
utah_sf <- zips_sf %>%
inner_join(utah_zips, by = "zip")
head(utah_sf)
#> Simple feature collection with 6 features and 2 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: -113.1234 ymin: 40.21758 xmax: -111.5677 ymax: 40.87196
#> epsg (SRID): 4269
#> proj4string: +proj=longlat +datum=NAD83 +no_defs
#> zip area geometry
#> 1 84029 Tooele MULTIPOLYGON (((-112.6292 4...
#> 2 84003 NEUtahCo MULTIPOLYGON (((-111.8497 4...
#> 3 84074 Tooele MULTIPOLYGON (((-112.4191 4...
#> 4 84004 NEUtahCo MULTIPOLYGON (((-111.8223 4...
#> 5 84062 NEUtahCo MULTIPOLYGON (((-111.7734 4...
#> 6 84013 NWUtahCounty MULTIPOLYGON (((-112.1564 4...
You already have your basemap figured out, and since ggmap makes ggplot objects, you can just add on a geom_sf layer. The tricks are just to make sure you declare the data you're using, set it to not inherit the aes from ggmap, and turn off the graticules in coord_sf.
basemap <- get_map(location = c(lon=-111.9, lat= 40.7), zoom = 9)
ggmap(basemap) +
geom_sf(aes(fill = zip), data = utah_sf, inherit.aes = F, size = 0, alpha = 0.6) +
coord_sf(ndiscr = F) +
theme(legend.position = "none")
You might want to adjust the position of the basemap, since it cuts off one of the zips. One way is to use st_bbox to get the bounding box of utah_sf, then use that to get the basemap.
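A rough sketch of that last step, using a made-up rectangle in place of utah_sf (area_demo and bbox_location are hypothetical names). The get_map() call is left commented out because it needs ggmap and, depending on the tile source, an API key; its documentation describes location as accepting a left/bottom/right/top bounding box.

```r
library(sf)

# Hypothetical stand-in for utah_sf: one rectangle in NAD83 lon/lat
area_demo <- st_sfc(st_polygon(list(rbind(
  c(-113.1, 40.2), c(-111.5, 40.2), c(-111.5, 40.9),
  c(-113.1, 40.9), c(-113.1, 40.2)
))), crs = 4269)

# Bounding box of the layer, renamed to left/bottom/right/top
bb <- st_bbox(area_demo)
bbox_location <- c(left = bb[["xmin"]], bottom = bb[["ymin"]],
                   right = bb[["xmax"]], top = bb[["ymax"]])
bbox_location

# basemap <- ggmap::get_map(location = bbox_location)  # not run here
```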
