Mapping Zip Code vs. County shapefile in R - r

I am trying to map the polygons for various geographic areas (i.e. county/zip codes). Based on what I have found at this blog I can easily accomplish this for counties.
library(rgdal)
library(rgeos)
library(leaflet)
url<-"http://www2.census.gov/geo/tiger/TIGER2010DP1/County_2010Census_DP1.zip"
downloaddir<-getwd()
destname<-"tiger_county.zip"
download.file(url, destname)
unzip(destname, exdir=downloaddir, junkpaths=TRUE)
filename<-list.files(downloaddir, pattern=".shp", full.names=FALSE)
filename<-gsub(".shp", "", filename)
# ----- Read in shapefile (NAD83 coordinate system)
# ----- this is a fairly big shapefile and takes 1 minute to read
dat<-readOGR(downloaddir, "County_2010Census_DP1")
# ----- Create a subset of New York counties
subdat<-dat[substring(dat$GEOID10, 1, 2) == "36",]
# ----- Transform to EPSG 4326 - WGS84 (required)
subdat<-spTransform(subdat, CRS("+init=epsg:4326"))
# ----- save the data slot
subdat_data<-subdat@data[,c("GEOID10", "ALAND10")]
# ----- simplification yields a SpatialPolygons class
subdat<-gSimplify(subdat,tol=0.01, topologyPreserve=TRUE)
# ----- to write to geojson we need a SpatialPolygonsDataFrame
subdat<-SpatialPolygonsDataFrame(subdat, data=subdat_data)
leaflet() %>%
addTiles() %>%
addPolygons(data=subdat)
But if I run the exact same code with a different file for zip codes
url <- "http://www2.census.gov/geo/tiger/GENZ2014/shp/cb_2014_us_zcta510_500k.zip"
I get a completely different area of the country instead of New York.
Is anyone more familiar with these datasets and these functions able to explain why this difference happens?

Given that @hrbrmstr noticed the zip codes returned are in fact zip codes in Alabama, I second-guessed my previous assumption about the structure of the GEOID10 variable. I discovered this link, which says that in the ZCTA files the GEOID10 variable is just the zip code itself, so it is not possible to filter by state FIPS prefix the same way as with the county file.
I figured out another way to filter using the zip_codes dataset from the noncensus package. I then substituted the line
subdat<-dat[substring(dat$GEOID10, 1, 2) == "36",]
with
# get zip codes for New York
ny_zips <- zip_codes[zip_codes$state=="NY",]
subdat<-dat[dat$GEOID10 %in% ny_zips$zip,]
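Putting it together, the state filter for the ZCTA file looks like this (a sketch; it assumes the noncensus package is installed and that dat has been read with readOGR as above):

```r
# ----- Filter ZCTAs to New York using the noncensus zip_codes lookup
library(noncensus)   # provides the zip_codes data frame
data(zip_codes)      # columns include zip and state

ny_zips <- zip_codes[zip_codes$state == "NY", ]
# GEOID10 in the ZCTA file is the five-digit zip code itself,
# so we match against the zip column rather than a FIPS prefix
subdat <- dat[dat$GEOID10 %in% ny_zips$zip, ]
```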

Related

Use "st_transform()" to transform coordinates to another projection - when creating cartogram

I have a shapefile of population estimates at different administrative levels in Nigeria and I want to create a cartogram out of it.
I used the cartogram package and tried the following
library(cartogram)
admin_lvl2_cartogram <- cartogram(admin_level2_shape, "mean", itermax=5)
However this gives me an error stating "Error: Using an unprojected map. This function does not give correct centroids and distances for longitude/latitude data:
Use "st_transform()" to transform coordinates to another projection." I'm not sure how to resolve this
To recreate the initial data
Download the data using the wopr package
library(wopr)
catalogue <- getCatalogue()
# Select files from the catalogue by subsetting the data frame
selection <- subset(catalogue,
country == 'NGA' &
category == 'Population' &
version == 'v1.2')
# Download selected files
downloadData(selection)
Manually unzip the downloaded zip file (NGA_population_v1_2_admin.zip) and read in the data
library(rgdal)
library(here)
admin_level2_shape <- readOGR(here::here("wopr/NGA/population/v1.2/NGA_population_v1_2_admin/NGA_population_v1_2_admin_level2_boundaries.shp"))
The function spTransform in the sp package is probably easiest because the readOGR call returns a spatial polygon defined in that package.
Here's a full example that transforms to a suitable projection for Nigeria, "+init=epsg:26331". You'll probably have to Google to find the exact one for your needs.
#devtools::install_github('wpgp/wopr')
library(wopr)
library(cartogram)
library(rgdal)
library(sp)
library(here)
catalogue <- getCatalogue()
# Select files from the catalogue by subsetting the data frame
selection <- subset(catalogue, country == 'NGA' & category == 'Population' & version == 'v1.2')
# Download selected files
downloadData(selection)
unzip(here::here("wopr/NGA/population/v1.2/NGA_population_v1_2_admin.zip"),
overwrite = T,
exdir = here::here("wopr/NGA/population/v1.2"))
admin_level2_shape <- readOGR(here::here("wopr/NGA/population/v1.2/NGA_population_v1_2_admin/NGA_population_v1_2_admin_level2_boundaries.shp"))
transformed <- spTransform(admin_level2_shape, CRS("+init=epsg:26331"))
admin_lvl2_cartogram <- cartogram(transformed, "mean", itermax=5)
I confess I don't know anything about the specific packages so I don't know if what is produced is correct, but at least it transforms.
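Alternatively, you can follow the error message literally and use st_transform() by reading the data with sf instead of sp. A sketch, assuming the same shapefile path as above and a recent version of cartogram (which accepts sf objects and names the function cartogram_cont):

```r
library(sf)
library(cartogram)
library(here)

# Read the boundaries as an sf object instead of an sp object
admin_level2_shape <- st_read(here::here(
  "wopr/NGA/population/v1.2/NGA_population_v1_2_admin/NGA_population_v1_2_admin_level2_boundaries.shp"))

# Project to a CRS suitable for Nigeria before building the cartogram
transformed <- st_transform(admin_level2_shape, 26331)

admin_lvl2_cartogram <- cartogram_cont(transformed, "mean", itermax = 5)
```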

Flow mapping in R

I'm trying to plot trips between zipcodes in R. Specifically, I'm trying to create an interactive where you can click on each zipcode, and see the other zipcodes colored according to how many people traveled from the zip you clicked on to the other zipcodes. Sort of like this: https://www.forbes.com/special-report/2011/migration.html
But less fancy; just showing "out-migration" would be super.
I've been messing with this in R using the leaflet package, but I haven't managed to figure it out. Could someone with better R skills help me out? Any insight would be much appreciated.
I've downloaded a shapefile of zipcodes in LA county from here:
https://data.lacounty.gov/Geospatial/ZIP-Codes/65v5-jw9f
Then I used the code below to create some toy data.
You can find the zipcode shapefiles here:
https://drive.google.com/file/d/0B2a3BZ6nzEGJNk55dmdrdVI2MTQ/view?usp=sharing
And you can find the toy data here:
https://drive.google.com/open?id=0B2a3BZ6nzEGJR29EOFdjR1NPR3c
Here's the code I've got so far:
require(rgdal)
setwd("~/Downloads/ZIP Codes")
# Read SHAPEFILE.shp from the current working directory (".")
shape <- readOGR(dsn = ".", layer = "geo_export_89ff0f09-a580-4844-988a-c4808d510398")
plot(shape) #Should look like zip codes in LA county
#get a unique list of zipcodes
zips <- as.numeric(as.character(unique(shape@data$zipcode)))
#create a dataframe with all the possible combination of origin and destination zipcodes
zips.df <- data.frame(expand.grid(as.character(zips),as.character(zips)), rpois(96721,10))
#give the dataframe some helpful variable names
names(zips.df) <- c("origin_zip", "destination_zip","number_of_trips")
Like I said, any help would be much appreciated. Thanks!
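As a starting point, here is a sketch of how the toy data could drive a leaflet map colored by trips out of one origin zip. The origin "90001" is just a hypothetical example value, and the reprojection step may be unnecessary if the shapefile is already in WGS84; full click-driven interactivity would additionally need shiny, where the layerId set here feeds the map's click events:

```r
library(leaflet)
library(sp)
library(rgdal)

# Pick one origin zip (hypothetical; use any zip present in your data)
origin <- "90001"
trips  <- zips.df[zips.df$origin_zip == origin, ]

# Attach the trip counts for this origin to each destination polygon
shape@data$trips <- trips$number_of_trips[
  match(as.character(shape@data$zipcode), as.character(trips$destination_zip))]

# leaflet expects WGS84; reproject if the shapefile uses something else
shape <- spTransform(shape, CRS("+proj=longlat +datum=WGS84"))

pal <- colorNumeric("YlOrRd", domain = shape@data$trips)
leaflet(shape) %>%
  addTiles() %>%
  addPolygons(fillColor = ~pal(trips), fillOpacity = 0.7, weight = 1,
              label   = ~paste0(zipcode, ": ", trips, " trips"),
              layerId = ~zipcode)  # layerId is what shiny click events report
```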

R: Match Polygons from Shapefile 1 to Area Codes in shapefile 2

I was asked whether R can work with shapefiles - I never worked with shapefiles myself before, but I am sure, others must have come across this kind of question!
I have two shapefiles:
a) shapefile 1 (PolygonSamples.shp) contains a list of polygons which are distributed all over Germany (attached is a sample). A polygon might be smaller than, equal to, or larger than a postal code polygon.
b) shapefile 2 lists the german postal codes and can be downloaded from
https://blog.oraylis.de/2010/05/german-map-spatial-data-for-plz-postal-code-regions/
The question is now:
How to 'match' the two shapefiles to get a dataframe that lists which polygon in shapefile 1 matches which postal codes(s) of shapefile 2. The result ideally looks like
Polygon ID (shapefile 1) Postal Code (shapefile 2)
1 80995
2 80997
2 80999
3 81247
Nothing I found really matches my question.
For example From a shapefile with polygons/areas, and points (lat,lon), figure out which polygon/area each point belongs to? In R
seems close, but I don't manage to get the desired dataframe (or datatable) output.
library(maps)
library(maptools)
# Polygons
tmp_dir <- "C:/Users/.../"
polygons <- readShapeSpatial(sprintf('%s/polygons.shp', tmp_dir))
plot(polygons)
# Postal codes
dir <- "C:/Users/..../"
postcode <- readShapeSpatial(sprintf('%s/post_pl.shp', dir))
plot(postcode)
The missing code snippet would read something like
result_table <- match(polygons_ID, postcode,
data1= polygon, data2 = postcode,
by = "coordinates in the shapefile"
A sample of polygons in a shapefile (.shp), incl. the other spatial files (.dbf, .prj, .qpj, .shx), can be sent.
Any help is really VERY much appreciated!
PS: R version 3.2.3, 64 bit, RStudio on Windows 7
Unfortunately I did not find an answer in R, but I could figure out how to match the two independent shapefiles in QGIS.
The main problem: The custom shapefile uses in the .prj file the geocoding Google Mercator (EPSG = 900913), while the downloaded postal code shapefile uses EPSG 4326.
QGIS does not automatically recognize these .prj files as projection files. One has to set them by hand.
Most importantly: Google Mercator (EPSG = 900913) was changed to EPSG= 3857. So for the custom shapefile I had to set – by hand! – the CRS to WGS 84/Pseudo-Mercator EPSG = 3857.
Now I could right click on the custom shape layer -> save as …. And Change the CRS to EPSG 4326. Thus the new custom shapefile now has the same projection like the downloaded postal code shapefile, and they can be joined by location.
(PS: Although I have a solution to do the conversion by hand, I would love to do this in R, because I need the resulting file for analysis.)
Check out: https://gis.stackexchange.com/questions/140504/extracting-intersection-areas-in-r?newreg=033544fa0f5349bcb8167d78867c8073
It gives you which shapefiles in dataset B overlap with a shapefile in dataset A as well as how much area in each of B's shapefiles is present in the target shapefile.
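The same workflow done by hand in QGIS (reproject the custom layer, then join by location) can be sketched in R with the sf package. The file names and the attribute columns ID and PLZ below are placeholders for whatever your two shapefiles actually contain:

```r
library(sf)

# Read both layers; sf picks up each .prj automatically
polygons <- st_read("polygons.shp")   # custom layer, EPSG:3857 (Pseudo-Mercator)
postcode <- st_read("post_pl.shp")    # postal-code layer, EPSG:4326

# Reproject the custom layer to match the postal-code layer's CRS
polygons <- st_transform(polygons, st_crs(postcode))

# Spatial join: one row per (polygon, intersecting postal code) pair,
# which directly yields the desired long-format table
matched <- st_join(polygons, postcode, join = st_intersects)
result_table <- data.frame(polygon_id  = matched$ID,
                           postal_code = matched$PLZ)
```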

Choropleth Maps in R - TIGER Shapefile issue

Have a Question on Mapping with R, specifically around the choropleth maps in R.
I have a dataset of ZIP codes assigned to an area and some associated data (dataset is here).
My final data format is: Area ID, ZIP, Probability Value, Customer Count, Area Probability and Area Customer Total. I am attempting to present this data by plotting area probability and Area Customer Total on a Map. I have tried to do this by using the census TIGER Shapefiles but I guess R cannot handle the complete country.
I am comfortable with the Statistical capabilities and now I am moving all my Mapping from third party GIS focused applications to doing all my Mapping in R. Does anyone have any pointers to how to achieve this from within R?
To be a little more detailed, here's the point where R stops working -
shapes <- readShapeSpatial("tl_2013_us_zcta510.shp")
(where the shp file is the census/TIGER) shape file.
Edit - Providing further details. I am trying to first read the TIGER shapefiles, hoping to combine this spatial dataset with my data and eventually plot. I am having an issue at the very beginning when attempting to read the shape file. Below is the code with the output
require(maptools)
shapes<-readShapeSpatial("tl_2013_us_zcta510.shp")
Error: cannot allocate vector of size 317 Kb
There are several examples and tutorials on making maps using R, but most are very general and, unfortunately, most map projects have nuances that create inscrutable problems. Yours is a case in point.
The biggest issue I came across was that the US Census Bureau zip code tabulation area shapefile for the whole US is huge: ~800MB. When loaded using readOGR(...) the R SpatialPolygonsDataFrame object is about 913MB. Trying to process a file this size (e.g., converting to a data frame using fortify(...)), at least on my system, resulted in errors like the one you identified above. So the solution is to subset the file based on the zip codes that are actually in your data.
This map:
was made from your data using the following code.
library(rgdal)
library(ggplot2)
library(stringr)
library(RColorBrewer)
setwd("<directory containing shapefiles and sample data>")
data <- read.csv("Sample.csv",header=T) # your sample data, downloaded as csv
data$ZIP <- str_pad(data$ZIP,5,"left","0") # convert ZIP to char(5) w/leading zeros
zips <- readOGR(dsn=".","tl_2013_us_zcta510") # import zip code polygon shapefile
map <- zips[zips$ZCTA5CE10 %in% data$ZIP,] # extract only zips in your Sample.csv
map.df <- fortify(map) # convert to data frame suitable for plotting
# merge data from Samples.csv into map data frame
map.data <- data.frame(id=rownames(map@data),ZIP=map@data$ZCTA5CE10)
map.data <- merge(map.data,data,by="ZIP")
map.df <- merge(map.df,map.data,by="id")
# load state boundaries
states <- readOGR(dsn=".","gz_2010_us_040_00_5m")
states <- states[states$NAME %in% c("New York","New Jersey"),] # extract NY and NJ
states.df <- fortify(states) # convert to data frame suitable for plotting
ggMap <- ggplot(data = map.df, aes(long, lat, group = group))
ggMap <- ggMap + geom_polygon(aes(fill = Probability_1))
ggMap <- ggMap + geom_path(data=states.df, aes(x=long,y=lat,group=group))
ggMap <- ggMap + scale_fill_gradientn(name="Probability",colours=brewer.pal(9,"Reds"))
ggMap <- ggMap + coord_equal()
ggMap
Explanation:
The rgdal package facilitates the creation of R Spatial objects from ESRI shapefiles. In your case we are importing a polygon shapefile into a SpatialPolygonsDataFrame object in R. The latter has two main parts: a polygon section, which contains the latitude and longitude points that will be joined to create the polygons on the map, and a data section which contains information about the polygons (so, one row for each polygon). If, e.g., we call the Spatial object map, then the two sections can be referenced as map@polygons and map@data. The basic challenge in making choropleth maps is to associate data from your Sample.csv file with the relevant polygons (zip codes).
So the basic workflow is as follows:
1. Load polygon shapefiles into Spatial object ( => zips)
2. Subset if appropriate ( => map).
3. Convert to data frame suitable for plotting ( => map.df).
4. Merge data from Sample.csv into map.df.
5. Draw the map.
Step 4 is the one that causes all the problems. First we have to associate zip codes with each polygon. Then we have to associate Probability_1 with each zip code. This is a three step process.
Each polygon in the Spatial data file has a unique ID, but these ID's are not the zip codes. The polygon ID's are stored as row names in map@data. The zip codes are stored in map@data, in column ZCTA5CE10. So first we must create a data frame that associates the map@data row names (id) with map@data$ZCTA5CE10 (ZIP). Then we merge your Sample.csv with the result using the ZIP field in both data frames. Then we merge the result of that into map.df. This can be done in 3 lines of code.
Drawing the map involves telling ggplot what dataset to use (map.df), which columns to use for x and y (long and lat) and how to group the data by polygon (group=group). The columns long, lat, and group in map.df are all created by the call to fortify(...). The call to geom_polygon(...) tells ggplot to draw polygons and fill using the information in map.df$Probability_1. The call to geom_path(...) tells ggplot to create a layer with state boundaries. The call to scale_fill_gradientn(...) tells ggplot to use a color scheme based on the color brewer "Reds" palette. Finally, the call to coord_equal(...) tells ggplot to use the same scale for x and y so the map is not distorted.
NB: The state boundary layer, uses the US States TIGER file.
I would advise the following.
Use readOGR from the rgdal package rather than readShapeSpatial.
Consider using ggplot2 for good-looking maps - many of the examples use this.
Refer to one of the existing examples of creating a choropleth such as this one to get an overview.
Start with a simple choropleth and gradually add your own data; don't try and get it all right at once.
If you need more help, create a reproducible example with a SMALL fake dataset and with links to the shapefiles in question. The idea is that you make it easy to help us help you rather than discourage us by not supplying code and data in your question.

How can I batch geocode street addresses from a csv file in to R?

Edit; answer below.
Batch geocoding can be done like this using ggmap, file names are mine. Code was adapted from David Smith's Revolutions Blog Post
library(ggmap)
#Read in csv file
FDNYHouse = read.csv("Path to your csv file here.csv")
#Get column header names if you don't already have them
names(FDNYHouse)
#Create a file of just addresses that need to be geocoded
#If the state column is missing, you can hard-code it by replacing State below with "New York"
#Everything inside paste() is a column header from the csv file
FDNYAddresses = with(FDNYHouse, paste(FacilityAddress, Borough, State, sep = ","))
#Now we can geocode the addresses
FDNYLocations = geocode(FDNYAddresses)
#The FDNYLocations file will have a lon and lat column representing your geocoded data
#My next problem is getting the shape file projection to match my geocoded points
I have created a census tract map of NYC using ggplot2 and a shape file. Next, I'd like to create a data frame using street addresses of fire houses to lay over the top of the map using a csv file I downloaded here:
FDNY Firehouse Locations
The shape file for census tracts is locate here (it's the 2010 version in black):
NYC Shape File
My problems are that the data doesn't list city and state, and that I don't know how to write a function that can grab these addresses and geocode them with Google using something like ggmap.
Any advice or nudges in the right direction would be appreciated. I'm new to R and stackoverflow so go easy on me.
Edit: Did anyone who marked this as already asked either A) look at my actual data or B) realize that the question you think I repeated is 3 years old? Guess nothing new has happened in R in the last 3 years right? The world is flat, move along folks. /rant
I can use ggmap and the geocode() function to get lat and lon without creating a function to do it.
#As an example
install.packages("ggmap")
library(ggmap)
geocode("San Francisco")
The problem, again, is how to tell R to read my csv file, which is missing city and state data, so that it can create the 200+ lat and lon measurements I need without me having to geocode one address at a time.
The second issue is then taking this data, making a data frame and adding it to the NYC shape file I already have.
That answer from 3 years ago is complicated and confusing for someone without the experience most people who looked at this post have...I also believe it doesn't answer my question.
I recently solved a similar problem. Below are two pieces of code. The first function converts addresses to lat/lon (if you can't abide by Google's terms of use, look for the Data Science Toolkit as a good standalone alternative for geo-coding.) The second function looks at a given lat/lon pair and determines which polygon (Census tract) contains those coordinates. Very useful for doing choropleth maps.
library("RJSONIO") #Load Library
library("plyr")
library("RODBC")
library(maptools)
getGeoCode <- function(gcStr)
{ gcStr <- gsub(' ','%20',gcStr) #Encode URL Parameters
#Open Connection
connectStr <- paste('http://maps.googleapis.com/maps/api/geocode/json?address=',gcStr, sep="")
con <- url(connectStr)
data.json <- fromJSON(paste(readLines(con, warn = FALSE), collapse=""))
close(con)
#Flatten the received JSON
data.json <- unlist(data.json)
if (data.json["status"] == "OK" && data.json["results.geometry.location_type"] == "ROOFTOP") {
address <- data.json["results.formatted_address"]
lat <- data.json["results.geometry.location.lat"]
lon <- data.json["results.geometry.location.lng"]
gcodes <- data.frame("Address" = address, "Lon" = as.numeric(lon), "Lat" = as.numeric(lat))
return (gcodes)
} else return ()
}
# Testing...
geoCodes <- getGeoCode("Palo Alto,California")
geoCodes
# "-122.1430195" "37.4418834"
# Required for TractLookup
Washington <-readShapePoly("g:/USCensus/tl_2012_53_tract/tl_2012_53_tract")
# US Census tract files (includes shape and data files)
tractLookup <- function(x) {
# pt <- SpatialPoints(data.frame(x = -80.1, y = 26.3))
pt <- SpatialPoints(data.frame(x = x$Lon, y = x$Lat))
Mapping <- over(pt, Washington) # what index number does pt fall inside?
Mapping <- data.frame(
"GEOID" = as.character(Mapping$GEOID),
"State" = as.character(Mapping$STATEFP) ,
"County" = as.character(Mapping$COUNTYFP),
"Tract" = as.character(Mapping$TRACTCE),
"Tract_Name" = as.character(Mapping$NAME),
"INTPTLAT" = as.character(Mapping$INTPTLAT),
"INTPTLON" = as.character(Mapping$INTPTLON),
stringsAsFactors = FALSE)
Mapping[is.na(Mapping)] <- "NULL"
return(Mapping)
}
tractLookup(data.frame("Lon" = -122, "Lat" = 47.5))
# GEOID State County Tract Tract_Name INTPTLAT INTPTLON
# 1 53033032102 53 033 032102 321.02 +47.4851507 -121.9657839
Looking at the New York fire department shape file, you should be able to change the mapping statement to look for and return the appropriate fields in place of the GEOID and tract information from the standard US Census shape file in my example.
Try it this way.
# Geocoding a csv column of "addresses" in R
#load ggmap
library(ggmap)
# Select the file from the file chooser
fileToLoad <- file.choose(new = TRUE)
# Read in the CSV data and store it in a variable
origAddress <- read.csv(fileToLoad, stringsAsFactors = FALSE)
# Initialize the data frame
geocoded <- data.frame(stringsAsFactors = FALSE)
# Loop through the addresses to get the latitude and longitude of each address and add it to the
# origAddress data frame in new columns lat and lon
for(i in 1:nrow(origAddress))
{
# Print("Working...")
result <- geocode(origAddress$addresses[i], output = "latlona", source = "google")
origAddress$lon[i] <- as.numeric(result[1])
origAddress$lat[i] <- as.numeric(result[2])
origAddress$geoAddress[i] <- as.character(result[3])
}
# Write a CSV file containing origAddress to the working directory
write.csv(origAddress, "geocoded.csv", row.names=FALSE)
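If Google's API terms are a hurdle, a similar loop-free approach is possible with the tidygeocoder package, which can batch-geocode a whole data frame in one call against the free Nominatim (OSM) service. A sketch; the column name addresses is an assumption that must match your csv:

```r
library(tidygeocoder)
library(dplyr)

origAddress <- read.csv("your_file.csv", stringsAsFactors = FALSE)

# geocode() appends lat/lon columns for every row in one call;
# method = "osm" uses the free Nominatim geocoder (no API key needed)
geocoded <- origAddress %>%
  geocode(address = addresses, method = "osm", lat = lat, long = lon)

write.csv(geocoded, "geocoded.csv", row.names = FALSE)
```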