Creating Zipcode US map in R and adding Legend on Map - r

I am new to R and need help making a map by zip code. The data frame looks like this:
Zipcode plpn
31139 138
85941 58
85349 104.01
87305 101
86515 98.08
79849 96.98
I want to plot this on a US map and color the points by the plpn variable:
plpn color
0-50 light blue
50-100 blue
100+ dark blue
Can you tell me the easiest way to do this in R?
Note: I went through the web pages on this, but the code seems a bit complicated for me (https://www.r-bloggers.com/my-first-r-package-zipcode/).
I tried the following:
library(maps)
library(zipcode)
data(zipcode)
setwd("C:/Users/rkpanda/Documents/TEMP/Feb2017/")
mktng <- read.csv(file="zipcode_mrktng.csv", header=T, sep = ",")
mktng2 <- merge(mktng, zipcode, by.x='postal_cd', by.y = 'zip')
mktng3 <- subset(mktng2, plpn_by_credit > 50)
map("state")
points( mktng3$longitude, mktng3$latitude, pch= 20, cex= 0.5, col="blue")
Is there a way to add labels (the city name associated with each zip code) to the dots that appear on the map?

You can get the shapefiles from here:
https://www.census.gov/geo/maps-data/data/cbf/cbf_zcta.html
This is slow because it is plotting the entire US but it works:
library(rgdal)
library(dplyr)
library(ggplot2)
library(ggmap)
shp <- readOGR('shapefiles/cb_2015_us_zcta510_500k',
'cb_2015_us_zcta510_500k', stringsAsFactors = FALSE)
shpDF <- fortify(shp)
shpData <- shp@data
shpData$AWATER10 <- as.numeric(shpData$AWATER10)
shpData$id <- row.names(shpData)
shpDF <- shpDF %>%
left_join(shpData, by = 'id')
ggplot(shpDF) +
geom_polygon(aes(x = long, y = lat, group = group , fill = (AWATER10)),
color = 'gray') +
theme_minimal() +
xlim(-125, -65) +
ylim(24, 50)
The resulting image isn't great at the zip code level; the zip code areas are too small to make out at this scale.
To add in your own data just left join to the shpDF on the ZCTA5CE10 field.
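A minimal sketch of that join, combined with the three colour classes the question asks for. The toy shpDF rows and the plpn_class column name are assumptions for illustration, and base R's merge(..., all.x = TRUE) stands in for left_join here.

```r
# Toy stand-in for the fortified polygon table (hypothetical rows);
# the real shpDF has many vertices per ZCTA.
shpDF <- data.frame(
  ZCTA5CE10 = c("31139", "31139", "85941", "99999"),
  long = c(-84.5, -84.4, -109.9, -150.0),
  lat = c(33.9, 33.8, 33.8, 61.0),
  order = 1:4,
  stringsAsFactors = FALSE
)

# Values from the question; zip codes are kept as character so they
# match the ZCTA5CE10 field.
mktng <- data.frame(
  Zipcode = c("31139", "85941", "85349", "87305", "86515", "79849"),
  plpn = c(138, 58, 104.01, 101, 98.08, 96.98),
  stringsAsFactors = FALSE
)

# Bin plpn into the three requested classes: [0,50], (50,100], (100,Inf)
mktng$plpn_class <- cut(mktng$plpn,
                        breaks = c(0, 50, 100, Inf),
                        labels = c("0-50", "50-100", "100+"),
                        include.lowest = TRUE)

# Left join: keep every polygon row, attach plpn where the ZCTA matches
joined <- merge(shpDF, mktng,
                by.x = "ZCTA5CE10", by.y = "Zipcode", all.x = TRUE)

# merge() does not preserve row order, and polygons must be drawn in order
joined <- joined[order(joined$order), ]
```

With the real shpDF, mapping fill = plpn_class inside aes() and adding scale_fill_manual(values = c("0-50" = "lightblue", "50-100" = "blue", "100+" = "darkblue")) should give the requested colours; ZCTAs without a match keep NA.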

Related

Map with grid cells coloured in function of point density (R, ggplot)

I'm trying to create a map of Europe with grid cells coloured based on the number of records within a cell. Here I attach an image as illustrative of the desired output (see Fig 1 of https://doi.org/10.3897/phytokeys.74.9723).
In order to produce this image I have developed a minimal reproducible example with random points distributed across Europe. I have been able to produce a similar figure with levelplot, but I'm particularly interested in doing this with ggplot as it will allow further customising. Is it possible to produce a similar figure with ggplot? If so, what approach should I follow?
Note: The size of the grid cells is irrelevant at the moment, but I'll adjust it depending on point density. All cells have to be the same size, as in the first example, and they will differ only in colour.
#Load libraries
library(rgdal) #v1.5-28
library(rgeos) #v.0.5-9
library(ggplot2) # 3.3.5
library(rworldmap) #plot worldmap v.1.3-6
library(dplyr) #v.1.0.7
#Create dataframe of coordinates that fall in Europe
coord <- data.frame(cbind(runif(1000,-15,45),runif(1000,30,75)))
colnames(coord) <- c("long","lat")
#Exclude ocean points following this post
URL <- "http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/110m/physical/ne_110m_ocean.zip"
fil <- basename(URL)
if (!file.exists(fil)) download.file(URL, fil)
fils <- unzip(fil)
oceans <- readOGR(grep("shp$", fils, value=TRUE), "ne_110m_ocean",
stringsAsFactors=FALSE, verbose=FALSE)
europe_coord <- data.frame(long = coord$long,
lat = coord$lat)
coordinates(europe_coord) <- ~long+lat
proj4string(europe_coord) <- CRS(proj4string(oceans))
ocean_points <- over(europe_coord, oceans)
#Add ocean points to dataset
coord$ocean <- ocean_points$featurecla
#Exclude ocean points
europe_land <- coord %>% filter(is.na(ocean))
#Load worldmap
world <- map_data("world")
#Plot europe spatial data
ggplot() + geom_map(data = world, map = world,
aes(long, lat, map_id = region), color = "white",
fill = "lightgray", size = 0.1) +
geom_point(data = europe_land,aes(long, lat),
alpha = 0.7, size = 0.05) + ylim(0,70) +
coord_sf(xlim = c(-15, 45), ylim = c(30, 75), expand = FALSE)
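One possible starting point, sketched in base R: snap each point to a regular grid and count the points per cell. The 5-degree cell size and the names cell, cell_long, cell_lat and grid_counts are assumptions for illustration, and the ocean filtering from the code above is left out.

```r
set.seed(42)

# Random points over the same extent as in the question
coord <- data.frame(long = runif(1000, -15, 45),
                    lat = runif(1000, 30, 75))

cell <- 5  # cell size in degrees (assumed)

# Snap each point to the centre of the grid cell it falls in
coord$cell_long <- floor(coord$long / cell) * cell + cell / 2
coord$cell_lat <- floor(coord$lat / cell) * cell + cell / 2

# Count the points per cell; cells without points do not appear
coord$n <- 1
grid_counts <- aggregate(n ~ cell_long + cell_lat, data = coord, FUN = sum)
```

grid_counts could then be drawn on top of the base map with geom_tile(data = grid_counts, aes(cell_long, cell_lat, fill = n)). Alternatively, geom_bin2d(binwidth = c(cell, cell)) does the binning inside ggplot2.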

Draw Boundary by zip code and create a heat map

I need to create heat map with 3 digit zip boundary.
I have 3 digit zip and count data like this
zip <- c(790, 791, 792, 793)
count <- c(0, 100, 20, 30)
TX <- data.frame(zip, count)
I also drew a map of Texas.
library(ggplot2)
library(ggmap)
library(maps)
library(mapdata)
states <- map_data("state")
texas<- subset(states, region =="texas")
ggplot(data = texas) +
geom_polygon(aes(x = long, y = lat), fill = "gray", color = "black")
What I want to achieve is to (1) draw boundaries by 3-digit zip code and (2) create the heat map using the count column. The outcome should look like this, with heat map coloring.
This question does not contain reproducible sample data, so I needed a good amount of time to put together the following. Next time, please provide minimal reproducible data and the code you tried. (I doubt you really invested time in writing your code.)
Anyway, I think getting good polygon data for US zip codes is difficult without paying some money. This question provides good information. I obtained data from this link since the data was accessible. You will have to find suitable polygon data for yourself.
I also obtained data for the zip codes in Texas from here and saved it as "zip_code_database.csv".
I added an explanation for each piece of code below, so I do not write a thorough explanation here. Basically, you need to merge the polygon data by extracting the first three digits of the zip codes. You also need to aggregate whatever value you have in your data by the 3-digit zip code. The other thing is to find the center points of the merged polygons so that the 3-digit zip codes can be added as labels.
library(tidyverse)
library(rgdal)
library(rgeos)
library(maptools)
library(ggalt)
library(ggthemes)
library(ggrepel)
library(RColorBrewer)
# Prepare the zip poly data for US
mydata <- readOGR(dsn = ".", layer = "cb_2016_us_zcta510_500k")
# Texas zip code data
zip <- read_csv("zip_code_database.csv")
tx <- filter(zip, state == "TX")
# Get polygon data for TX only
mypoly <- subset(mydata, ZCTA5CE10 %in% tx$zip)
# Create a new group with the first three digit.
# Drop unnecessary factor levels.
# Add a fake numeric variable, which is used for coloring polygons later.
mypoly$group <- substr(mypoly$ZCTA5CE10, 1,3)
mypoly$ZCTA5CE10 <- droplevels(mypoly$ZCTA5CE10)
set.seed(111)
mypoly$value <- sample.int(n = 10000, size = nrow(mypoly), replace = TRUE)
# Merge polygons using the group variable
# Create a data frame for ggplot.
mypoly.union <- unionSpatialPolygons(mypoly, mypoly$group)
mymap <- fortify(mypoly.union)
# Check how polygons are like
plot(mypoly)
plot(mypoly.union, add = T, border = "red", lwd = 1)
# Convert SpatialPolygons to data frame and aggregate the fake values
mypoly.df <- as(mypoly, "data.frame") %>%
group_by(group) %>%
summarise(value = sum(value))
# Find a center point for each zip code area
centers <- data.frame(gCentroid(spgeom = mypoly.union, byid = TRUE))
centers$zip <- rownames(centers)
# Finally, drawing a graphic
ggplot() +
geom_cartogram(data = mymap, aes(x = long, y = lat, map_id = id), map = mymap) +
geom_cartogram(data = mypoly.df, aes(fill = value, map_id = group), map = mymap) +
geom_text_repel(data = centers, aes(label = zip, x = x, y = y), size = 3) +
scale_fill_gradientn(colours = rev(brewer.pal(10, "Spectral"))) +
coord_map() +
theme_map()
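The code above colours the polygons with a fake value. A minimal sketch of how the asker's own TX counts could be keyed to the same 3-digit groups; the five ZCTA codes and the names groups and lookup are hypothetical.

```r
# Hypothetical 5-digit ZCTA codes standing in for mypoly$ZCTA5CE10
zcta <- c("79001", "79005", "79101", "79201", "79311")

# 3-digit group key, built the same way as mypoly$group above
groups <- data.frame(zcta = zcta,
                     group = substr(zcta, 1, 3),
                     stringsAsFactors = FALSE)

# The asker's data: the 3-digit zips are numeric, so convert them to
# zero-padded character before joining
TX <- data.frame(zip = c(790, 791, 792, 793),
                 count = c(0, 100, 20, 30))
TX$group <- sprintf("%03d", TX$zip)

# Attach the count of each 3-digit area to every 5-digit ZCTA in it
lookup <- merge(groups, TX[, c("group", "count")],
                by = "group", all.x = TRUE)
```

In the plot, TX (with its group column) could then take the place of mypoly.df, with fill = count instead of fill = value.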

Unable to change color in R in a USA map

I am trying to change the background colors in US map for displaying presidential results for different states. I read so many posts regarding this color change but I was not able to change any of those colors. Below is my code, link for dataset and snapshot which I am getting:
#install.packages("ggplot2")
#install.packages("ggmap")
#install.packages("plyr")
#install.packages("raster")
#install.packages("stringr")
library(ggplot2) # for plotting and miscellaneous things
library(ggmap) # for plotting
library(plyr) # for merging datasets
library(raster) # to get map shape file
library(stringr) # for string operation
# Get geographic data for USA
usa.shape<-getData("GADM", country = "usa", level = 1)
# Creating a data frame of map data
usa.df <- map_data("state")
#rename 'region' as 'state' and make it a factor variable
colnames(usa.df) [5] <- "State"
usa.df$State <- as.factor(usa.df$State)
#set working directory
setwd("C:/Users/Ashish/Documents/Stats projects/2/")
#input data from file separated by commas
usa.dat <- read.csv("data1.csv", header = T)
# printing data structure
str(usa.df)
# removing % sign from the data, and converting percentage win to numeric
usa.dat$Clinton <- as.numeric(sub("%","",usa.dat$Clinton))/1
usa.dat$Trump <- as.numeric(sub("%","",usa.dat$Trump))/1
usa.dat$Others <- as.numeric(sub("%","",usa.dat$Others))/1
# Creating a winner column based on the percentage
usa.dat$Winner = "Trump"
usa.dat[usa.dat$Clinton > usa.dat$Trump,]$Winner = "Clinton"
usa.dat$State <- tolower(usa.dat$State)
# Creating a chance column which corresponds to winning percentage of the candidate
usa.dat$chance <- usa.dat$Trump
a <- usa.dat[usa.dat$Clinton > usa.dat$Trump,]
usa.dat[usa.dat$Clinton > usa.dat$Trump,]$chance <- a$Clinton
# display the internal structure of the object
usa.dat
#join the usa.df and usa.dat objects on state variable
usa.df <- join(usa.df, usa.dat, by = "State", type = "inner")
str(usa.df)
states <- data.frame(state.center, state.abb) # centers of states and abbreviations
#function for plotting different regions of USA map based on the input data showing different coloring scheme
#for each state.
p <- function(data, title) {
ggp <- ggplot() +
# Draw borders of states
geom_polygon(data = data, aes(x = long, y = lat, group = group,
fill = Winner, alpha=chance), color = "black", size = 0.15) +
#scale_alpha_continuous(range=c(0,1))+
scale_color_gradientn(colours = c("#F08080","white","#5DADE2"),breaks = c(0,50,100),
labels=c("Clinton","Equal","Trump"),
limits=c(0,100),name="Election Forecast") +
# Add state abbreviations
geom_text(data = states, aes(x = x, y = y, label = state.abb), size = 2)+
guides(fill = guide_legend(direction='vertical', title='Candidate', label=TRUE, colours=c("red", "blue")))
return(ggp)
}
figure.title <- "2016 presidential election result"
# Save the map to a file for viewing (you can plot on the screen also, but it takes
# much longer that way. The ratio of US height to width is 1:1.9.)
#print(p(usa.df, brks.to.use, figure.title))
ggsave(p(usa.df, figure.title), height = 4, width = 4*1.9,
file = "election_result.jpg")
Image link:
Dataset: Dataset link
I would like to get same coloring scheme as displayed in Election forecast gradient.
Thanks to Alistaire for the valuable feedback and the solution to the above problem. Adding scale_fill_brewer(type = 'qual', palette = 6) to the ggplot() call resolves the issue.
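For reference, the Winner and chance columns that drive fill and alpha can be built more compactly. This is a sketch with made-up percentages, not the original dataset; ties go to "Trump", as in the code above.

```r
# Made-up rows in the same shape as data1.csv (values are illustrative)
usa.dat <- data.frame(State = c("State A", "State B", "State C"),
                      Clinton = c("34.4%", "61.7%", "47.0%"),
                      Trump = c("62.1%", "31.6%", "47.0%"),
                      stringsAsFactors = FALSE)

# Strip the % sign and convert to numeric
usa.dat$Clinton <- as.numeric(sub("%", "", usa.dat$Clinton, fixed = TRUE))
usa.dat$Trump <- as.numeric(sub("%", "", usa.dat$Trump, fixed = TRUE))

# Winner is a discrete variable; chance is the winner's percentage
usa.dat$Winner <- ifelse(usa.dat$Clinton > usa.dat$Trump, "Clinton", "Trump")
usa.dat$chance <- pmax(usa.dat$Clinton, usa.dat$Trump)
```

Because Winner is discrete and mapped to fill, only a discrete fill scale changes the state colours; scale_color_gradientn() acts on the colour aesthetic and has no effect on fill. Besides the brewer palette, scale_fill_manual(values = c(Clinton = "blue", Trump = "red")) is one way to pick the colours by hand.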

Density count in heatmaps

I have a problem with my heatmap, which displays the density LEVEL, but doesn't say anything about the density count. (how many points are in the same area for example).
My data is divided in more columns, but the most important ones are: lat,lon.
I would like to have something like this, but with "count": https://stackoverflow.com/a/24615674/5316566.
However, when I apply the code used in that answer, my maximum density "level" doesn't reflect my density count. (Instead of 7500 I get, for example, 6, even though I have thousands and thousands of data points concentrated there.)
That's my code:
us_map_g_str <- get_map(location = c(-90.0,41.5,-81.0,42.7), zoom = 7)
ggmap(us_map_g_str, extent = "device") +
geom_tile(data = data1, aes(x = as.numeric(lon), y = as.numeric(lat)), size = 0.3) +
stat_density2d(data = data1, aes(x = as.numeric(lon), y = as.numeric(lat), fill = ..level.., alpha = ..level..), size = 0.3, bins = 10, geom = "polygon") +
scale_fill_gradient(name= "Ios",low = "green", high = "red", trans= "exp") +
scale_alpha(range = c(0, 0.3), guide = FALSE)
This is what I get:
This is part of the data:
lat lon tag device
1 43.33622 -83.67445 0 iPhone5
2 43.33582 -83.69964 0 iPhone5
3 43.33623 -83.68744 0 iPhone5
4 43.33584 -83.72186 0 iPhone5
5 43.33616 -83.67526 0 iPhone5
6 43.25040 -83.78234 0 iPhone5
(The "tag" column is not important)
REVISED
I realised that my previous answer needs to be revised. So, here it is. If you want to find out how many data points exist in each level of a contour, you actually have a lot of things to do. If you are happy to use the leaflet option below, your life would be much easier.
First, let's get a map of Detroit, and create a sample data frame.
library(dplyr)
library(ggplot2)
library(ggmap)
mymap <- get_map(location = "Detroit", zoom = 8)
### Create a sample data
set.seed(123)
mydata <- data.frame(long = runif(min = -84, max = -82.5, n = 100),
lat = runif(min = 42, max = 42.7, n = 100))
Now, we draw a map and save it as g.
g <- ggmap(mymap) +
stat_density2d(data = mydata,
aes(x = long, y = lat, fill = ..level..),
size = 0.5, bins = 10, geom = "polygon")
The real job begins here. To find the number of data points in each level, you want to use the data frame that ggplot generates. This data frame holds the polygon data used to draw the level lines. You can see that in the following image, in which I draw three levels on a map.
### Create a data frame so that we can find how many data points exist
### in each level.
mydf <- ggplot_build(g)$data[[4]]
### Check where the polygon lines are. This is just for a check.
check <- ggmap(mymap) +
geom_point(data = mydata, aes(x = long, y = lat)) +
geom_path(data = subset(mydf, group == "1-008"), aes(x = x, y = y)) +
geom_path(data = subset(mydf, group == "1-009"), aes(x = x, y = y)) +
geom_path(data = subset(mydf, group == "1-010"), aes(x = x, y = y))
The next step is to create a level vector for a legend. We group the data by group (e.g., 1-010) and take the first row for each group using slice(). Then we ungroup the data and choose the 2nd column. Finally, we create a vector with unlist(). We come back to lev at the end.
mydf %>%
group_by(group) %>%
slice(1) %>%
ungroup %>%
select(2) %>%
unlist -> lev
Now we split the polygon data (i.e., mydf) by group and create a polygon for each level. Since we have 11 levels (11 polygons), we use lapply(). In the lapply loop, we need to: 1) extract the columns for longitude and latitude, 2) create a polygon, 3) convert the polygon to spatial polygons, 4) assign a CRS, 5) create a dummy data frame, and 6) create a SpatialPolygonsDataFrame.
mylist <- split(mydf, f = mydf$group)
test <- lapply(mylist, function(x){
xy <- x[, c(3,4)]
circle <- Polygon(xy, hole = as.logical(NA))
SP <- SpatialPolygons(list(Polygons(list(circle), ID = "1")))
proj4string(SP) <- CRS("+proj=longlat +ellps=WGS84")
df <- data.frame(value = 1, row.names = "1")
circleDF <- SpatialPolygonsDataFrame(SP, data = df)
})
Now we go back to the original data. What we need to do is convert the data frame to a SpatialPointsDataFrame. This is because we need to subset the data and find how many data points exist in each polygon (in each level). First, get long and lat from your data.frame. Make sure that the order is lon/lat.
xy <- mydata[,c(1,2)]
Then, we create the SPDF (SpatialPointsDataFrame). You want the spatial polygons and the spatial points to have an identical proj4string.
spdf <- SpatialPointsDataFrame(coords = xy, data = mydata,
proj4string = CRS("+proj=longlat +ellps=WGS84"))
Then, we subset data (mydata) using each polygon.
ana <- lapply(test, function(y){
mydf <- as.data.frame(spdf[y, ])
})
Data points overlap across levels, so we have duplication. First we find the unique data points for each level. We bind the data frames in ana for the levels already covered and create a data frame, foo1. We also create a data frame, foo2, for the next level out, in which we want to find the data points not yet counted. We make sure that the column names are identical between foo1 and foo2. Using setdiff() and nrow(), we can find the number of unique data points in each level.
total <- lapply(11:2, function(x){
foo1 <- bind_rows(ana[c(11:x)])
foo2 <- as.data.frame(ana[x-1])
names(foo2) <- names(foo1)
nrow(setdiff(foo2, foo1))
})
Finally, we need to find the number of data points for the innermost level, which is level 11. We take the data frame for level 11 in ana and count the number of rows.
bob <- nrow(as.data.frame(ana[11]))
out <- c(bob,unlist(total))
### check if total is 100
### sum(out)
### [1] 100
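The de-duplication idea can be illustrated with plain numbers. This sketch assumes the contour polygons are strictly nested (each level lies inside the previous one), which the setdiff() approach above does not require, and the counts in inside are made up.

```r
# Hypothetical number of points falling inside each contour polygon,
# ordered from the outermost level to the innermost one
inside <- c(100, 83, 61, 40, 22, 9)

# Points unique to a band = points inside this level minus points inside
# the next level in; the innermost level keeps its own count
per_band <- c(-diff(inside), inside[length(inside)])

# The bands partition the points inside the outermost contour
total_points <- sum(per_band)
```

With several separate density peaks the polygons are not nested like this, so the spatial subsetting shown above is the safer route.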
We assign reversed out as names for lev. This is because we want to show how many data points exist for each level in a legend.
names(lev) <- rev(out)
Now we are ready to add a legend.
final <- g +
scale_fill_continuous(name = "Total",
guide = guide_legend(),
breaks = lev)
final
LEAFLET OPTION
If you use the leaflet package, you can group data points at different zoom levels. Leaflet counts the data points in certain areas and shows the numbers in circles, as in the following figure. The more you zoom in, the more leaflet breaks the data points up into small groups. In terms of workload, this is much lighter. In addition, your map is interactive. This may be a better option.
library(leaflet)
leaflet(mydata) %>%  # the sample points (long/lat columns), not the contour data in mydf
addTiles() %>%
addMarkers(clusterOptions = markerClusterOptions())

Plotting text labels over geom_polygon data in ggmap in R

I am attempting to make a map with three layers using ggmap. The layers are as follows:
A map of the US (toner-lite)
a set of geometries that color the states on some value (simulated data below)
labels for the state names, as annotations in the center of each state.
To do this I have created a map of US states with states colored by a randomized value (rnorm) and this part is successful. From here I am attempting to print the abbreviations of each state at the longitude and latitude coordinates of each state's center, using geom_text. The part that fails is the 'geom_text' overlay, with the following error:
Error: 'x' and 'units' must have length > 0 In addition: Warning
messages: 1: In gpclibPermit() : support for gpclib will be
withdrawn from maptools at the next major release 2: Removed 855070
rows containing missing values (geom_text).
Here is the script, which I have worked hard to make runnable on its own. It downloads the shapefile and the state center data, and simulates data to fill the states. I've tested it and it works up to the part I have commented out (the geom_text layer).
I have searched for answers to this already, so please let me know if you have any advice on how to do what I am attempting. If there is a better strategy for placing labels on top of the polygon fills, I am all ears (or eyes in this case).
###Combining Census data with a tract poly shapefile
library(maptools)
library(ggplot2)
library(gpclib)
library(ggmap)
library(rgdal)
library(dplyr)
#Set working directory to where you want your files to exist (or where they already exist)
setwd('~/Documents/GIS/USCensus/')
#Read and translate coord data for shape file of US States
if(!file.exists('tl_2014_us_state.shp')){
download.file('ftp://ftp2.census.gov/geo/tiger/TIGER2014/STATE/tl_2014_us_state.zip',
'tl_2014_us_state.zip')
files <- unzip('tl_2014_us_state.zip')
tract <- readOGR(".","tl_2014_us_state") %>% spTransform(CRS("+proj=longlat +datum=WGS84"))
} else {
tract <- readOGR(".","tl_2014_us_state") %>% spTransform(CRS("+proj=longlat +datum=WGS84"))
}
#two column dataset of state abbreviations and center of state
#Downloadable from: https://dev.maxmind.com/static/csv/codes/state_latlon.csv
if(!file.exists('state_latlon.csv')){
download.file('http://dev.maxmind.com/static/csv/codes/state_latlon.csv','state_latlon.csv')
}
centers <- read.csv('state_latlon.csv')
#Change values of longitude and latitude from state center data so as not to interfere with shapefile at merge
names(centers)[2:3] <- c('long_c','lat_c')
#simulated data for plotting values
mydata<- data.frame(rnorm(55, 0, 1)) #55 "states" in the coord dataset for state centers
names(mydata)[1] <- 'value'
#hold names in tract dataset and for simulated data
ntract<-names(tract)
ndata<-names(mydata)
#Turn geo data into R dataframe
gpclibPermit()
tract_geom<-fortify(tract,region="STUSPS")
#Merge state geo data with simulated data
state_data <- cbind(centers,mydata)
#merge state center and value data with shapefile data
tract_poly <- merge(state_data,tract_geom,by.x="state",by.y="id", all = F)
tract_poly<-tract_poly[order(tract_poly$order),]
#Create map of US
mymap <- get_stamenmap(bbox = c(left = -124.848974,
bottom = 24.396308,
right = -66.885444,
top = 49.384358),zoom=5,
maptype="toner-lite")
#This plots a map of the US with just the state names as labels (and a few other landmarks). Used for reference
USMap <- ggmap(mymap,extent='device') +
geom_polygon(aes(x = long, y = lat, group = group, fill = value),
data = tract_poly,
alpha = 1,
color = "black",
size = 0.2) #+
# geom_text(aes(x = long_c, y = lat_c, group = group, label = state),
# data= tract_poly,
# alpha = 1,
# color = "black")
USMap
That's a strange error message for what ended up being the problem. Somewhere along the way you have flipped the latitude and longitude for centers. (I also took into account elpi's advice above and didn't plot the Initials repeatedly by using your centers dataset directly). The code below works, but I'd recommend fixing your centers dataset.
centers$new_long <- centers$lat_c
centers$new_lat <- centers$long_c
USMap <- ggmap(mymap,extent='device') +
geom_polygon(aes(x = long, y = lat, group = group, fill = value),
data = tract_poly,
alpha = 1,
color = "black",
size = 0.2) +
geom_text(aes(x = new_long, y = new_lat, label = state),
data= centers,
alpha = 1,
color = "black")
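A quick sanity check can catch this kind of swap before plotting. The rows below are illustrative, and the sketch assumes the CSV's columns come in state, latitude, longitude order, in which case names(centers)[2:3] <- c('long_c','lat_c') would mislabel them.

```r
# Illustrative rows, labelled the way the script above labels them
centers <- data.frame(state = c("AK", "AL", "AR"),
                      long_c = c(61.3850, 32.7990, 34.9513),
                      lat_c = c(-152.2683, -86.8073, -92.3809))

# A latitude can never be outside [-90, 90]; if one is, the two
# columns are almost certainly swapped
swapped <- any(abs(centers$lat_c) > 90)

if (swapped) {
  # Swap the two labels in place; the data stays where it is
  names(centers)[names(centers) %in% c("long_c", "lat_c")] <-
    c("lat_c", "long_c")
}
```

After the fix, geom_text can use x = long_c and y = lat_c directly, without the extra new_long and new_lat columns.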
Try this
centroids <- setNames(do.call("rbind.data.frame", by(tract_poly, tract_poly$group, function(x) {Polygon(x[c('long', 'lat')])@labpt})), c('long', 'lat'))
centroids$label <- tract_poly$state[match(rownames(centroids), tract_poly$group)]
USMap + with(centroids, annotate(geom="text", x = long, y=lat, label = label, size = 2.5))
(via)
