Draw Boundary by zip code and create a heat map - r

I need to create a heat map with 3-digit zip code boundaries.
I have 3-digit zip codes and count data like this:
zip <- c(790, 791, 792, 793)
count <- c(0, 100, 20, 30)
TX <- data.frame(zip, count)
I have also drawn a map of Texas:
library(ggplot2)
library(ggmap)
library(maps)
library(mapdata)
states <- map_data("state")
texas<- subset(states, region =="texas")
ggplot(data = texas) +
geom_polygon(aes(x = long, y = lat), fill = "gray", color = "black")
What I want to achieve is to (1) draw boundaries for the 3-digit zip codes and (2) create a heat map using the count column. The outcome should look like this, with heat map coloring.

This question does not contain reproducible sample data, so it took me a good amount of time to put the following together. Next time, please provide minimal reproducible data and the code you tried. (I doubt you really invested time in writing your code.)
Anyway, I think getting good polygon data for US zip codes is difficult without paying some money. This question provides good information. I obtained data from this link since the data was accessible. You will need to find suitable polygon data for yourself.
I also obtained data for the zip codes in Texas from here and saved it as "zip_code_database.csv."
I added an explanation to each piece of code below, so I will not write a thorough explanation here. Basically, you need to merge the polygons by extracting the first three digits of the zip codes. You also need to aggregate whatever value you have in your data by the 3-digit zip code. The other thing is to find the center points of the merged polygons so that the zip codes can be added as labels.
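The grouping step described here can be sketched with base R alone. The data frame below is made up for illustration (it is not the zip_code_database.csv file), and the column names are assumptions. Zip codes are turned into character strings so that leading zeros survive.

```r
# Toy 5-digit zip data (hypothetical values, for illustration only)
zips <- data.frame(
  zip   = c(79001, 79015, 79101, 79201, 79225),
  count = c(5, 7, 100, 12, 8)
)

# Zip codes read as numbers lose leading zeros (e.g. 02134 becomes 2134),
# so pad them back to five characters before taking the prefix
zips$zip5 <- sprintf("%05d", as.integer(zips$zip))

# The 3-digit group is the first three characters of the 5-digit code
zips$zip3 <- substr(zips$zip5, 1, 3)

# One row per 3-digit area, with counts summed within each area
by_zip3 <- aggregate(count ~ zip3, data = zips, FUN = sum)
by_zip3
```

The resulting table has the same shape as the TX data frame in the question (one row per 3-digit zip), so it could be joined to merged polygons by the 3-digit code.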
library(tidyverse)
library(rgdal)
library(rgeos)
library(maptools)
library(ggalt)
library(ggthemes)
library(ggrepel)
library(RColorBrewer)
# Prepare the zip poly data for US
mydata <- readOGR(dsn = ".", layer = "cb_2016_us_zcta510_500k")
# Texas zip code data
zip <- read_csv("zip_code_database.csv")
tx <- filter(zip, state == "TX")
# Get polygon data for TX only
mypoly <- subset(mydata, ZCTA5CE10 %in% tx$zip)
# Create a new group with the first three digits.
# Drop unnecessary factor levels.
# Add a fake numeric variable, which is used for coloring polygons later.
mypoly$group <- substr(mypoly$ZCTA5CE10, 1,3)
mypoly$ZCTA5CE10 <- droplevels(mypoly$ZCTA5CE10)
set.seed(111)
mypoly$value <- sample.int(n = 10000, size = nrow(mypoly), replace = TRUE)
# Merge polygons using the group variable
# Create a data frame for ggplot.
mypoly.union <- unionSpatialPolygons(mypoly, mypoly$group)
mymap <- fortify(mypoly.union)
# Check what the polygons look like
plot(mypoly)
plot(mypoly.union, add = TRUE, border = "red", lwd = 1)
# Convert SpatialPolygons to data frame and aggregate the fake values
mypoly.df <- as(mypoly, "data.frame") %>%
group_by(group) %>%
summarise(value = sum(value))
# Find a center point for each zip code area
centers <- data.frame(gCentroid(spgeom = mypoly.union, byid = TRUE))
centers$zip <- rownames(centers)
# Finally, drawing a graphic
ggplot() +
geom_cartogram(data = mymap, aes(x = long, y = lat, map_id = id), map = mymap) +
geom_cartogram(data = mypoly.df, aes(fill = value, map_id = group), map = mymap) +
geom_text_repel(data = centers, aes(label = zip, x = x, y = y), size = 3) +
scale_fill_gradientn(colours = rev(brewer.pal(10, "Spectral"))) +
coord_map() +
theme_map()
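The rgdal, rgeos and maptools packages used above have since been retired from CRAN, so the same idea may be easier to reproduce with sf. The following is only a sketch under that assumption: the four unit squares stand in for real ZCTA polygons, and the zip and value columns are invented. Grouping an sf object and calling summarise() dissolves the geometries within each group, which plays the role of unionSpatialPolygons().

```r
library(sf)
library(dplyr)
library(ggplot2)

# Helper that builds a unit square with its lower-left corner at (x0, y0)
unit_square <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# Four toy "zip code" polygons (planar coordinates, no CRS)
zcta <- st_sf(
  zip   = c("79001", "79002", "79101", "79102"),
  value = c(10, 20, 30, 40),
  geometry = st_sfc(unit_square(0, 0), unit_square(1, 0),
                    unit_square(0, 1), unit_square(1, 1))
)

# Dissolve by 3-digit prefix and sum the values per group
zip3 <- zcta %>%
  mutate(zip3 = substr(zip, 1, 3)) %>%
  group_by(zip3) %>%
  summarise(value = sum(value))

# Filled polygons with the 3-digit code as a label
p <- ggplot(zip3) +
  geom_sf(aes(fill = value)) +
  geom_sf_text(aes(label = zip3))
```

With real data, zcta would come from st_read() on the Census shapefile, and the zip column would be ZCTA5CE10.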

Related

Proximity Maps using R

I'm looking to create some proximity maps using R, which show how far areas are from certain points. I can't find any examples in R code, but I've found an output which is the sort of thing I want:
It doesn't necessarily have to have all the labelling/internal boundaries wizardry, but I'd like it to stop at the sea border (thinking of using the rgeos function gIntersection - see here).
I've tried doing a density plot as 'heatmaps' (this would be a pretty good solution/alternative) and putting a shapefile over the top (following this suggestion), but they're not lining up and I can't do a gIntersection, probably because there's not a coordinate system attached to the density plot.
I used your question to play a little with new libraries...
Get a UK map and define random points
library(raster)
library(sf)
library(ggplot2)
library(dplyr)
library(tidyr)
library(forcats)
library(purrr)
# Get UK map
GBR <- getData(name = "GADM", country = "GBR", level = 1)
GBR_sf <- st_as_sf(GBR)
# Define 3 points on the UK map
pts <- matrix(c(-0.4966766, -2.0772529, -3.8437793,
51.91829, 52.86147, 56.73899), ncol = 2)
# Project to British National Grid (EPSG:27700, metres) to allow buffering by distance
pts_sf <- st_sfc(st_multipoint(pts), crs = 4326) %>%
st_sf() %>%
st_transform(27700)
ggplot() +
geom_sf(data = GBR_sf) +
geom_sf(data = pts_sf, colour = "red")
Calculate buffer areas
We create a list of multipolygons, one for each buffer distance. The point dataset must be in projected coordinates (here British National Grid, EPSG:27700) because the buffer distance is expressed in the units of the coordinate system.
# Define distances to buffer
dists <- seq(5000, 150000, length.out = 5)
# Create buffer areas with each distances
pts_buf <- purrr::map(dists, ~st_buffer(pts_sf, .)) %>%
do.call("rbind", .) %>%
st_cast() %>%
mutate(
distmax = dists,
dist = glue::glue("<{dists/1000} km"))
# Plot: alpha allows to see overlapping polygons
ggplot() +
geom_sf(data = GBR_sf) +
geom_sf(data = pts_buf, fill = "red",
colour = NA, alpha = 0.1)
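Why the projection matters can be checked on a single point. This sketch assumes EPSG:27700 (British National Grid, in metres) as in the code above; the 5 km radius is arbitrary. After projecting, the buffer's area should be close to that of a circle with the same radius.

```r
library(sf)

# One of the points above, given as longitude/latitude (EPSG:4326)
pt <- st_sfc(st_point(c(-0.4966766, 51.91829)), crs = 4326)

# Project to British National Grid so that distances are in metres
pt_bng <- st_transform(pt, 27700)

# Buffer by 5000 m; the result is a polygon approximating a circle
buf <- st_buffer(pt_bng, dist = 5000)

# Compare the polygon's area with the area of an exact 5 km circle
area_ratio <- as.numeric(st_area(buf)) / (pi * 5000^2)
area_ratio
```

A ratio slightly below 1 is expected, because the buffer is a polygon inscribed in the circle.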
Remove overlapping
Buffer areas are overlapping. In the figure above, the more intense red color is due to multiple overlapping layers of transparent red. Let's remove the overlap: from each larger buffer we subtract the next smaller one. The smallest buffer then needs to be added back to the list.
# Remove part of polygons overlapping smaller buffer
pts_holes <- purrr::map2(tail(1:nrow(pts_buf),-1),
head(1:nrow(pts_buf),-1),
~st_difference(pts_buf[.x,], pts_buf[.y,])) %>%
do.call("rbind", .) %>%
st_cast() %>%
select(-distmax.1, -dist.1)
# Add smallest polygon
pts_holes_tot <- pts_holes %>%
rbind(filter(pts_buf, distmax == min(dists))) %>%
arrange(distmax) %>%
mutate(dist = forcats::fct_reorder(dist, distmax))
# Plot and define color according to dist
ggplot() +
geom_sf(data = GBR_sf) +
geom_sf(data = pts_holes_tot,
aes(fill = dist),
colour = NA) +
scale_fill_brewer(direction = 2)
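The ring construction can be checked on a toy case. This is a sketch with one planar point and made-up distances, not the UK data: each ring is a disc minus the next smaller disc, so the rings should tile the largest disc without overlap.

```r
library(sf)

centre <- st_sfc(st_point(c(0, 0)))   # planar toy point, no CRS
dists  <- c(1, 2, 3)

# One disc per distance; larger discs fully contain the smaller ones
discs <- lapply(dists, function(d) st_buffer(centre, d))

# Keep the smallest disc, then subtract disc i-1 from disc i
rings <- discs[1]
for (i in 2:length(dists)) {
  rings[[i]] <- st_difference(discs[[i]], discs[[i - 1]])
}

# Areas of the rings and of the original discs
ring_area <- sapply(rings, function(g) as.numeric(st_area(g)))
disc_area <- sapply(discs, function(g) as.numeric(st_area(g)))

# The ring areas should add up to the area of the largest disc
sum(ring_area) - disc_area[3]
```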
Remove areas in the sea
If you want to find proximity areas on land only, we need to remove the buffer areas that are in the sea. The intersection is computed between multipolygons with the same projection, so I first compute the union of the UK map and project it.
# Remove part of polygons in the sea
# Union and projection of UK map
GBR_sf_merc <- st_transform(st_union(GBR_sf), 27700)
pts_holes_uk <- st_intersection(pts_holes_tot,
GBR_sf_merc)
ggplot() +
geom_sf(data = GBR_sf) +
geom_sf(data = pts_holes_uk,
aes(fill = dist),
colour = NA) +
scale_fill_brewer(direction = 2)
And here is the final proximity map using sf, ggplot2 and a few other libraries...
Based on Sébastien's example, a more old-fashioned approach:
library(raster)
GBR <- getData(name = "GADM", country = "GBR", level = 1)
pts <- matrix(c(-0.4966766, -2.0772529, -3.8437793, 51.91829, 52.86147, 56.73899), ncol = 2)
r <- raster(GBR, res=1/12)
d <- distanceFromPoints(r, pts)
m <- mask(d, GBR)
plot(m)
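distanceFromPoints() returns a continuous distance surface. To get discrete bands like those in the sf answer, the distances can be binned with cut(). The values and break points below are invented for illustration; for a longitude/latitude raster the distances are expected to be in metres.

```r
# Hypothetical distances in metres, e.g. a few cell values taken from d
d_m <- c(1200, 42000, 99000, 150000, 300000)

# Bin into distance bands; right = FALSE makes each band [lower, upper)
bands <- cut(
  d_m / 1000,
  breaks = c(0, 5, 50, 100, 150, Inf),
  labels = c("<5 km", "<50 km", "<100 km", "<150 km", ">=150 km"),
  right  = FALSE
)

# Number of values falling in each band
table(bands)
```

For a whole raster, the same break points could be passed to plot() through its breaks argument, or the raster could be reclassified first.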

Unable to change color in R in a USA map

I am trying to change the background colors in US map for displaying presidential results for different states. I read so many posts regarding this color change but I was not able to change any of those colors. Below is my code, link for dataset and snapshot which I am getting:
#install.packages("ggplot2")
#install.packages("ggmap")
#install.packages("plyr")
#install.packages("raster")
#install.packages("stringr")
library(ggplot2) # for plotting and miscellaneous things
library(ggmap) # for plotting
library(plyr) # for merging datasets
library(raster) # to get map shapefile
library(stringr) # for string operation
# Get geographic data for USA
usa.shape<-getData("GADM", country = "usa", level = 1)
# Creating a data frame of map data
usa.df <- map_data("state")
#rename 'region' as 'state' and make it a factor variable
colnames(usa.df) [5] <- "State"
usa.df$State <- as.factor(usa.df$State)
#set working directory
setwd("C:/Users/Ashish/Documents/Stats projects/2/")
#input data from file separated by commas
usa.dat <- read.csv("data1.csv", header = T)
# printing data structure
str(usa.df)
# removing % sign from the data, and converting percentage win to numeric
usa.dat$Clinton <- as.numeric(sub("%","",usa.dat$Clinton))/1
usa.dat$Trump <- as.numeric(sub("%","",usa.dat$Trump))/1
usa.dat$Others <- as.numeric(sub("%","",usa.dat$Others))/1
# Creating a winner column based on the percentage
usa.dat$Winner = "Trump"
usa.dat[usa.dat$Clinton > usa.dat$Trump,]$Winner = "Clinton"
usa.dat$State <- tolower(usa.dat$State)
# Creating a chance column which corresponds to winning percentage of the candidate
usa.dat$chance <- usa.dat$Trump
a <- usa.dat[usa.dat$Clinton > usa.dat$Trump,]
usa.dat[usa.dat$Clinton > usa.dat$Trump,]$chance <- a$Clinton
# display the internal structure of the object
usa.dat
#join the usa.df and usa.dat objects on state variable
usa.df <- join(usa.df, usa.dat, by = "State", type = "inner")
str(usa.df)
states <- data.frame(state.center, state.abb) # centers of states and abbreviations
#function for plotting different regions of USA map based on the input data showing different coloring scheme
#for each state.
p <- function(data, title) {
ggp <- ggplot() +
# Draw borders of states
geom_polygon(data = data, aes(x = long, y = lat, group = group,
fill = Winner, alpha=chance), color = "black", size = 0.15) +
#scale_alpha_continuous(range=c(0,1))+
scale_color_gradientn(colours = c("#F08080","white","#5DADE2"),breaks = c(0,50,100),
labels=c("Clinton","Equal","Trump"),
limits=c(0,100),name="Election Forecast") +
# Add state abbreviations
geom_text(data = states, aes(x = x, y = y, label = state.abb), size = 2)+
guides(fill = guide_legend(direction='vertical', title='Candidate', label=TRUE, colours=c("red", "blue")))
return(ggp)
}
figure.title <- "2016 presidential election result"
# Save the map to a file for viewing (you can plot on the screen also, but it takes
# much longer that way. The ratio of US height to width is 1:1.9.)
#print(p(usa.df, brks.to.use, figure.title))
ggsave(p(usa.df, figure.title), height = 4, width = 4*1.9,
file = "election_result.jpg")
I would like to get same coloring scheme as displayed in Election forecast gradient.
Thanks to Alistaire for providing valuable feedback and a solution to the above problem. Adding scale_fill_brewer(type = 'qual', palette = 6) to the ggplot() call resolves the issue.
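For context: scale_color_gradientn() in the question only applies to the colour aesthetic, which is not mapped there, while the states are filled by Winner, a discrete variable. A discrete fill scale is therefore what changes the state colours. As an alternative to scale_fill_brewer(), explicit colours can be set with scale_fill_manual(). The two squares and the colour assignments below are made up for illustration.

```r
library(ggplot2)

# Two toy "states" drawn as unit squares (hypothetical data)
squares <- data.frame(
  x      = c(0, 1, 1, 0,  1, 2, 2, 1),
  y      = c(0, 0, 1, 1,  0, 0, 1, 1),
  State  = rep(c("a", "b"), each = 4),
  Winner = rep(c("Clinton", "Trump"), each = 4),
  chance = rep(c(55, 70), each = 4)
)

p <- ggplot(squares, aes(x = x, y = y, group = State)) +
  geom_polygon(aes(fill = Winner, alpha = chance), colour = "black") +
  # Discrete fill scale: one named colour per level of Winner
  scale_fill_manual(values = c(Clinton = "#5DADE2", Trump = "#F08080"),
                    name = "Candidate") +
  # Stronger colour for a larger winning percentage
  scale_alpha_continuous(range = c(0.3, 1), name = "Winning %")

# Fill colours actually used by the polygon layer
fills <- unique(ggplot_build(p)$data[[1]]$fill)
```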

Creating Zipcode US map in R and adding Legend on Map

I am new to R and need help. I need to make a map of zipcode. The dataframe is as follows -
Zipcode plpn
31139 138
85941 58
85349 104.01
87305 101
86515 98.08
79849 96.98
I want to plot this on US map and apply color by plpn variable.
plpn color
0-50 light blue
50-100 blue
100+ dark blue
Can you kindly tell me the easiest way to do it in R?
Note - I went through the webpages on this, but the code seems a bit complicated for me (https://www.r-bloggers.com/my-first-r-package-zipcode/).
I tried the following -
library(maps)
library(zipcode)
data(zipcode)
setwd("C:/Users/rkpanda/Documents/TEMP/Feb2017/")
mktng <- read.csv(file="zipcode_mrktng.csv", header=T, sep = ",")
mktng2 <- merge(mktng, zipcode, by.x='postal_cd', by.y = 'zip')
mktng3 <- subset(mktng2, plpn_by_credit > 50)
map("state")
points( mktng3$longitude, mktng3$latitude, pch= 20, cex= 0.5, col="blue")
Is there a way to apply label (city name associated with zipcode) to the dots that are appearing on the map ?
You can get the shapefiles from here:
https://www.census.gov/geo/maps-data/data/cbf/cbf_zcta.html
This is slow because it is plotting the entire US but it works:
library(rgdal)
library(dplyr)
library(ggplot2)
library(ggmap)
shp <- readOGR('shapefiles/cb_2015_us_zcta510_500k',
'cb_2015_us_zcta510_500k', stringsAsFactors = FALSE)
shpDF <- fortify(shp)
shpData <- shp@data
shpData$AWATER10 <- as.numeric(shpData$AWATER10)
shpData$id <- row.names(shpData)
shpDF <- shpDF %>%
left_join(shpData, by = 'id')
ggplot(shpDF) +
geom_polygon(aes(x = long, y = lat, group = group , fill = (AWATER10)),
color = 'gray') +
theme_minimal() +
xlim(-125, -65) +
ylim(24, 50)
The end result isn't that great at the zip code level: individual zip code areas are too small to make out on a map of the whole US.
To add in your own data just left join to the shpDF on the ZCTA5CE10 field.
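A minimal sketch of that join, using the sample rows from the question. The shp_attrs data frame is a hypothetical stand-in for the shapefile's attribute table, and the three colour names are only one way to express light blue, blue and dark blue. The zip codes are converted to zero-padded character strings so that they match ZCTA5CE10.

```r
# Sample data from the question
mktng <- data.frame(
  Zipcode = c(31139, 85941, 85349, 87305, 86515, 79849),
  plpn    = c(138, 58, 104.01, 101, 98.08, 96.98)
)

# ZCTA5CE10 is a 5-character string, so pad the numeric zip codes
mktng$ZCTA5CE10 <- sprintf("%05d", as.integer(mktng$Zipcode))

# Bin plpn into the three classes asked for: [0, 50), [50, 100), [100, Inf)
mktng$plpn_class <- cut(
  mktng$plpn,
  breaks = c(0, 50, 100, Inf),
  labels = c("0-50", "50-100", "100+"),
  right  = FALSE
)

# Hypothetical stand-in for the shapefile attributes (shpData above)
shp_attrs <- data.frame(
  ZCTA5CE10 = c("31139", "85941", "99999"),
  id        = c("0", "1", "2"),
  stringsAsFactors = FALSE
)

# Left join: keep every polygon, with NA where there is no marketing data
joined <- merge(shp_attrs, mktng, by = "ZCTA5CE10", all.x = TRUE)

# One colour per class, for use with a manual fill scale
class_colours <- c("0-50" = "lightblue", "50-100" = "blue", "100+" = "darkblue")
```

In the ggplot call, fill = plpn_class together with scale_fill_manual(values = class_colours) would then give the three-colour legend.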

Density count in heatmaps

I have a problem with my heatmap, which displays the density LEVEL, but doesn't say anything about the density count. (how many points are in the same area for example).
My data is divided in more columns, but the most important ones are: lat,lon.
I would like to have something like this, but with "count" : https://stackoverflow.com/a/24615674/5316566,
however, when I try to apply the code used in that answer, my maximum "level" density doesn't reflect my density count. (Instead of 7500 I get, for example, 6, even though I have thousands and thousands of data points concentrated there.)
That's my code:
us_map_g_str <- get_map(location = c(-90.0,41.5,-81.0,42.7), zoom = 7)
ggmap(us_map_g_str, extent = "device") +
geom_tile(data = data1, aes(x = as.numeric(lon), y = as.numeric(lat)), size = 0.3) +
stat_density2d(data = data1, aes(x = as.numeric(lon), y = as.numeric(lat), fill = ..level.., alpha = ..level..), size = 0.3, bins = 10, geom = "polygon") +
scale_fill_gradient(name= "Ios",low = "green", high = "red", trans= "exp") +
scale_alpha(range = c(0, 0.3), guide = FALSE)
This is what I get:
This is part of the data:
lat lon tag device
1 43.33622 -83.67445 0 iPhone5
2 43.33582 -83.69964 0 iPhone5
3 43.33623 -83.68744 0 iPhone5
4 43.33584 -83.72186 0 iPhone5
5 43.33616 -83.67526 0 iPhone5
6 43.25040 -83.78234 0 iPhone5
(The "tag" column is not important)
REVISED
I realised that my previous answer needs to be revised. So, here it is. If you want to find out how many data points exist in each level of a contour, you actually have a lot of things to do. If you are happy to use the leaflet option below, your life would be much easier.
First, let's get a map of Detroit, and create a sample data frame.
library(dplyr)
library(ggplot2)
library(ggmap)
library(sp) # needed below for Polygon(), SpatialPolygons(), CRS(), etc.
mymap <- get_map(location = "Detroit", zoom = 8)
### Create a sample data
set.seed(123)
mydata <- data.frame(long = runif(min = -84, max = -82.5, n = 100),
lat = runif(min = 42, max = 42.7, n = 100))
Now, we draw a map and save it as g.
g <- ggmap(mymap) +
stat_density2d(data = mydata,
aes(x = long, y = lat, fill = ..level..),
size = 0.5, bins = 10, geom = "polygon")
The real job begins here. In order to find out the number of data points in each level, you want to use the data frame that ggplot generates. This data frame holds the data for the polygons used to draw the level lines. You can see that in the following image, in which I drew three levels on a map.
### Create a data frame so that we can find how many data points exist
### in each level.
mydf <- ggplot_build(g)$data[[4]]
### Check where the polygon lines are. This is just for a check.
check <- ggmap(mymap) +
geom_point(data = mydata, aes(x = long, y = lat)) +
geom_path(data = subset(mydf, group == "1-008"), aes(x = x, y = y)) +
geom_path(data = subset(mydf, group == "1-009"), aes(x = x, y = y)) +
geom_path(data = subset(mydf, group == "1-010"), aes(x = x, y = y))
The next step is to create a level vector for a legend. We group the data by group (e.g., 1-010) and take the first row for each group using slice(). Then, ungroup the data and choose the 2nd column. Finally, create a vector with unlist(). We come back to lev in the end.
mydf %>%
group_by(group) %>%
slice(1) %>%
ungroup %>%
select(2) %>%
unlist -> lev
Now we split the polygon data (i.e., mydf) by group and create a polygon for each level. Since we have 11 levels (11 polygons), we use lapply(). In the lapply loop, we need to 1) extract the columns for longitude and latitude, 2) create a polygon, 3) convert the polygon to spatial polygons, 4) assign a CRS, 5) create a dummy data frame, and 6) create a SpatialPolygonsDataFrame.
mylist <- split(mydf, f = mydf$group)
test <- lapply(mylist, function(x){
xy <- x[, c(3,4)]
circle <- Polygon(xy, hole = as.logical(NA))
SP <- SpatialPolygons(list(Polygons(list(circle), ID = "1")))
proj4string(SP) <- CRS("+proj=longlat +ellps=WGS84")
df <- data.frame(value = 1, row.names = "1")
circleDF <- SpatialPolygonsDataFrame(SP, data = df)
})
Now we go back to the original data. What we need to do is to convert the data frame to a SpatialPointsDataFrame. This is because we need to subset the data and find how many data points exist in each polygon (in each level). First, get long and lat from your data.frame. Make sure that the order is lon/lat.
xy <- mydata[,c(1,2)]
Then, we create the SPDF (SpatialPointsDataFrame). You want to have an identical proj4string between the spatial polygons and the spatial points data.
spdf <- SpatialPointsDataFrame(coords = xy, data = mydata,
proj4string = CRS("+proj=longlat +ellps=WGS84"))
Then, we subset data (mydata) using each polygon.
ana <- lapply(test, function(y){
mydf <- as.data.frame(spdf[y, ])
})
Data points overlap across levels, so we have duplication. First we try to find the unique data points for each level. We bind the data frames in ana for the inner levels into one data frame, foo1. We also create a data frame, foo2, for the level whose unique data points we want to count. We make sure that the column names are identical between foo1 and foo2. Using setdiff() and nrow(), we can find the number of unique data points in each level.
total <- lapply(11:2, function(x){
foo1 <- bind_rows(ana[c(11:x)])
foo2 <- as.data.frame(ana[x-1])
names(foo2) <- names(foo1)
nrow(setdiff(foo2, foo1))
})
Finally, we need to find the number of data points in the innermost level, which is level 11. We take the data frame for level 11 in ana and count its rows.
bob <- nrow(as.data.frame(ana[11]))
out <- c(bob,unlist(total))
### check if total is 100
### sum(out)
### [1] 100
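The de-duplication logic can be seen more easily on a small example. This sketch replaces the data frames in ana with made-up vectors of point IDs (an assumption for illustration): each level contains every point that also falls inside the higher levels, and setdiff() keeps only the points that belong to a level and to no higher one.

```r
# Hypothetical point IDs falling inside each contour level;
# inner (higher) levels are nested within outer (lower) ones
within <- list(
  level1 = 1:10,
  level2 = c(2, 3, 5, 7, 8),
  level3 = c(3, 5)
)

# For each level, count the points that are not in any higher level
exclusive <- sapply(seq_along(within), function(i) {
  higher <- unlist(within[seq_along(within) > i])
  length(setdiff(within[[i]], higher))
})
names(exclusive) <- names(within)

# The exclusive counts should add up to the total number of points
sum(exclusive)
```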
We assign reversed out as names for lev. This is because we want to show how many data points exist for each level in a legend.
names(lev) <- rev(out)
Now we are ready to add a legend.
final <- g +
scale_fill_continuous(name = "Total",
guide = guide_legend(),
breaks = lev)
final
LEAFLET OPTION
If you use the leaflet package, you can group data points at different zoom levels. Leaflet counts the data points in certain areas and indicates the numbers in circles, as in the following figure. The more you zoom in, the more leaflet breaks up the data points into small groups. In terms of workload, this is much lighter. In addition, your map is interactive. This may be a better option.
library(leaflet)
leaflet(mydata) %>% # the point data (long/lat columns), not the contour polygons in mydf
addTiles() %>%
addMarkers(clusterOptions = markerClusterOptions())
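If a per-area count is all that is needed, another option (not part of the original answer) is to bin the points on a grid with geom_bin2d(), whose fill is the number of points in each cell. A kernel density level, by contrast, is a density per unit area and not a count. This sketch reuses the same kind of simulated points as above and leaves out the base map.

```r
library(ggplot2)

# Simulated points, generated the same way as mydata above
set.seed(123)
pts <- data.frame(long = runif(min = -84, max = -82.5, n = 100),
                  lat  = runif(min = 42, max = 42.7, n = 100))

# Each tile is coloured by the number of points that fall inside it
p <- ggplot(pts, aes(x = long, y = lat)) +
  geom_bin2d(bins = 10) +
  scale_fill_gradient(name = "Count", low = "green", high = "red")

# The per-cell counts computed by ggplot2; they add up to the number of points
binned <- ggplot_build(p)$data[[1]]
sum(binned$count)
```

The same layer could be added on top of a ggmap() object by passing data = pts and the aes() mapping to geom_bin2d() directly.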

Plotting text labels over geom_polygon data in ggmap in R

I am attempting to make a map with three layers using ggmap. The layers are as follows:
A map of the US (toner-lite)
a set of geometries that color the states on some value (simulated data below)
labels for the state names, as annotations in the center of each state.
To do this I have created a map of US states with states colored by a randomized value (rnorm) and this part is successful. From here I am attempting to print the abbreviations of each state at the longitude and latitude coordinates of each state's center, using geom_text. The part that fails is the 'geom_text' overlay, with the following error:
Error: 'x' and 'units' must have length > 0 In addition: Warning
messages: 1: In gpclibPermit() : support for gpclib will be
withdrawn from maptools at the next major release 2: Removed 855070
rows containing missing values (geom_text).
Here is the script, which I have worked hard to make run on its own. It will download the shapefile and the state center data, and simulate data to fill the states. I've tested it and it works up to the part I have commented out (the geom_text layer).
I have searched for answers to this already, so please let me know if you have any advice on how to do what I am attempting. If there is a better strategy for placing labels on top of the polygon fills, I am all ears (or eyes in this case).
###Combining Census data with a tract poly shapefile
library(maptools)
library(ggplot2)
library(gpclib)
library(ggmap)
library(rgdal)
library(dplyr)
#Set working directory to where you want your files to exist (or where they already exist)
setwd('~/Documents/GIS/USCensus/')
#Read and translate coord data for shape file of US States
if(!file.exists('tl_2014_us_state.shp')){
download.file('ftp://ftp2.census.gov/geo/tiger/TIGER2014/STATE/tl_2014_us_state.zip',
'tl_2014_us_state.zip')
files <- unzip('tl_2014_us_state.zip')
tract <- readOGR(".","tl_2014_us_state") %>% spTransform(CRS("+proj=longlat +datum=WGS84"))
} else {
tract <- readOGR(".","tl_2014_us_state") %>% spTransform(CRS("+proj=longlat +datum=WGS84"))
}
#two column dataset of state abbreviations and center of state
#Downloadable from: https://dev.maxmind.com/static/csv/codes/state_latlon.csv
if(!file.exists('state_latlon.csv')){
download.file('http://dev.maxmind.com/static/csv/codes/state_latlon.csv','state_latlon.csv')
}
centers <- read.csv('state_latlon.csv')
#Change values of longitude and latitude from state center data so as not to interfere with shapefile at merge
names(centers)[2:3] <- c('long_c','lat_c')
#simulated data for plotting values
mydata<- data.frame(rnorm(55, 0, 1)) #55 "states" in the coord dataset for state centers
names(mydata)[1] <- 'value'
#hold names in tract dataset and for simulated data
ntract<-names(tract)
ndata<-names(mydata)
#Turn geo data into R dataframe
gpclibPermit()
tract_geom<-fortify(tract,region="STUSPS")
#Merge state geo data with simulated data
state_data <- cbind(centers,mydata)
#merge state center and value data with shapefile data
tract_poly <- merge(state_data,tract_geom,by.x="state",by.y="id", all = F)
tract_poly<-tract_poly[order(tract_poly$order),]
#Create map of US
mymap <- get_stamenmap(bbox = c(left = -124.848974,
bottom = 24.396308,
right = -66.885444,
top = 49.384358),zoom=5,
maptype="toner-lite")
#This plots a map of the US with just the state names as labels (and a few other landmarks). Used for reference
USMap <- ggmap(mymap,extent='device') +
geom_polygon(aes(x = long, y = lat, group = group, fill = value),
data = tract_poly,
alpha = 1,
color = "black",
size = 0.2) #+
# geom_text(aes(x = long_c, y = lat_c, group = group, label = state),
# data= tract_poly,
# alpha = 1,
# color = "black")
USMap
That's a strange error message for what ended up being the problem. Somewhere along the way you have flipped the latitude and longitude for centers. (I also took into account elpi's advice above and didn't plot the Initials repeatedly by using your centers dataset directly). The code below works, but I'd recommend fixing your centers dataset.
centers$new_long <- centers$lat_c
centers$new_lat <- centers$long_c
USMap <- ggmap(mymap,extent='device') +
geom_polygon(aes(x = long, y = lat, group = group, fill = value),
data = tract_poly,
alpha = 1,
color = "black",
size = 0.2) +
geom_text(aes(x = new_long, y = new_lat, label = state),
data= centers,
alpha = 1,
color = "black")
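One likely source of the flip is the renaming step in the question. Assuming the CSV stores its columns in the order state, latitude, longitude (which the swap above suggests, but which is not verified here), naming columns 2:3 as long_c, lat_c reverses them. The sketch below uses two rows of rounded, illustrative coordinates and checks the result against the bounding box used for the base map.

```r
# Illustrative values only: columns in (state, latitude, longitude) order
centers <- data.frame(
  state     = c("TX", "CA"),
  latitude  = c(31.1, 37.2),
  longitude = c(-97.6, -119.7)
)

# Name the columns in the same order as they appear in the file
names(centers)[2:3] <- c("lat_c", "long_c")

# Sanity check: latitudes lie in [-90, 90], longitudes in [-180, 180]
coords_ok <- all(abs(centers$lat_c) <= 90) && all(abs(centers$long_c) <= 180)

# Stricter check: every center falls inside the map's bounding box
in_bbox <- centers$long_c >= -124.848974 & centers$long_c <= -66.885444 &
           centers$lat_c  >=   24.396308 & centers$lat_c  <=  49.384358
```

A check like this fails right away when the two columns are swapped, since a value such as -97.6 is not a valid latitude.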
Try this
centroids <- setNames(do.call("rbind.data.frame", by(tract_poly, tract_poly$group, function(x) {Polygon(x[c('long', 'lat')])@labpt})), c('long', 'lat'))
centroids$label <- tract_poly$state[match(rownames(centroids), tract_poly$group)]
USMap + with(centroids, annotate(geom="text", x = long, y=lat, label = label, size = 2.5))
(via)

Resources