Complex Maps in R Using ggplot2 - r

I recently needed to create a map using combined statistical areas for the US - I want to share how this can be done. The same applies if one wants to use zip codes or other divisions (county, congressional districts, etc.)

1) Obtain the shapefiles that any program needs to draw a map - these define the polygons that make up the shapes (states, zips, CSAs, etc.). In the US the Census Bureau provides them at
http://www.census.gov/cgi-bin/geo/shapefiles2013/main
(I am sorry but these are so large that I can't provide the data for an example). Unzip them and place them into your directory of choice. Make sure you set it as the working directory using setwd()
2) Import the files into a spatial data frame object in R using readShapeSpatial from the maptools package. In my case I read two files: the state file to draw the boundary and the CSA file to draw the metro areas. For simplicity I created an id field in each; this is not required. Note that rgeos is required as well: by default maptools tries to use gpclib, which has some license issues, and rgeos does not have those.
require(maptools)
require(rgeos)
cbsa_bdry <- readShapeSpatial("tl_2013_us_csa.shp")
cbsa_bdry@data$id <- as.character(cbsa_bdry@data$GEOID)
state_bdry <- readShapeSpatial("tl_2013_us_state.shp")
state_bdry@data$id <- as.character(state_bdry@data$STUSPS)
3) The files need to be fortified to be used in ggplot - as far as I can tell this creates a data frame for plotting from the spatial data set. I do not fully understand it, but it is needed. Here you use the id field - in my case the state and the CSA number
cbsa_map_f <- fortify(cbsa_bdry, region="id")
state_map_f <- fortify(state_bdry, region="id")
At this point the fortified files can be mapped using ggplot. Normally, however, you want some other data to be mapped with them, for example a variable of interest that defines the color of the shape (say, the party that holds the governor's mansion, or the average income, or whatever your analysis dictates).
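To make the fortify step less opaque, here is a minimal sketch of the kind of data frame it returns: one row per polygon vertex, with a group column telling ggplot which vertices belong to the same shape. The two squares and the helper square() are made up for illustration; only the column names follow what fortify() produces for polygons.

```r
library(ggplot2)

# Hypothetical helper: vertices of a unit square in fortify()-style layout
square <- function(x0, y0, id) {
  data.frame(
    long  = c(x0, x0 + 1, x0 + 1, x0, x0),
    lat   = c(y0, y0, y0 + 1, y0 + 1, y0),
    order = 1:5,                      # drawing order of the vertices
    hole  = FALSE,                    # TRUE for rings that cut a hole
    piece = "1",                      # part number within the region
    id    = id,                       # the region= field passed to fortify()
    group = paste(id, "1", sep = ".") # id.piece, used in aes(group = group)
  )
}

# Two made-up regions standing in for fortified shapefile output
toy_map_f <- rbind(square(0, 0, "A"), square(2, 0, "B"))
str(toy_map_f)

# Same plotting pattern as for the real fortified data
p <- ggplot(toy_map_f, aes(long, lat, group = group)) +
  geom_polygon(fill = "light gray", color = "black")
print(p)
```

Each polygon is just an ordered list of points, which is why the row order matters in the next step.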
4) I will merge back the data portion of the spatial data frame for this example; in this case I am not merging any other data. Please note, I learned the hard way that merge can scramble the data because it sorts the result. If you use merge, make sure the data is not sorted as part of it; I am using join for that reason.
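require(plyr)
state_map_df <- join(state_map_f, state_bdry@data, by="id")
state_map_df$state <- state_map_df$STUSPS
# Assumption (not shown in the original post): build cbsa_map_df the same way and take the first state abbreviation in NAME (e.g. "..., TX-OK") as its state
cbsa_map_df <- join(cbsa_map_f, cbsa_bdry@data, by="id")
cbsa_map_df$state <- sub("^.*, ([A-Z]{2}).*$", "\\1", cbsa_map_df$NAME)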
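The sorting problem with merge can be reproduced with a few rows of base R. This is a sketch on made-up data (verts and attrs are hypothetical stand-ins for the fortified map and the attribute table), showing that merge() reorders the vertex rows by the key and that sorting on the order column puts them back.

```r
# Made-up vertex table: region "b" comes first, as it might in a shapefile
verts <- data.frame(
  id    = c("b", "b", "b", "a", "a", "a"),
  order = 1:6,
  long  = c(0, 1, 0, 5, 6, 5),
  lat   = c(0, 0, 1, 5, 5, 6)
)
# Made-up attribute table, one row per region
attrs <- data.frame(id = c("a", "b"), name = c("Alpha", "Beta"))

# merge() sorts on the key by default, so region "a" now comes first
merged <- merge(verts, attrs, by = "id")
print(merged$order)

# Restoring the original vertex order before plotting
restored <- merged[order(merged$order), ]
print(restored$order)
```

plyr's join keeps the row order of its first argument, so it avoids the extra sorting step.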
5) Do your chart - you need to draw the layers in order, the bottom layer first, then second one, or one can obscure the other. For simplicity I am only drawing Texas
tx <-subset(cbsa_map_df,state=="TX" )
tx1 <- subset(state_map_df,state=="TX")
require(ggplot2)
require(mapproj)
sample <- ggplot(NULL) +geom_polygon(data=tx1,aes(long,lat,group=group),color="blue",fill="light gray")+coord_map(projection="mercator")+theme_bw()
sample <- sample + geom_polygon(data=tx,aes(long,lat,fill=NAME,group=group),color="black")+theme(legend.position="bottom",axis.text.x = element_blank(),axis.text.y = element_blank(),axis.ticks = element_blank())
sample<- sample+guides(fill=guide_legend(title="MSA Name",nrow = 3,title.position="top",title.hjust=0.5,title.theme=element_text(face="bold",angle=0)))
sample <- sample + scale_x_continuous("")+scale_y_continuous("")
sample
I am not passing a data frame to the initial ggplot call that opens the chart, because each layer uses a different data frame. If I had merged in data for a continuous scale, I would map that variable to fill= in the CSA layer.
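A minimal sketch of the continuous-fill case, assuming a made-up per-region value (avg_income) and a toy two-square map in place of the fortified CSA data. None of these names or numbers come from the Census files.

```r
library(ggplot2)

# Toy stand-in for a fortified map: two squares, four vertices each
toy_map_f <- data.frame(
  long  = c(0, 1, 1, 0, 2, 3, 3, 2),
  lat   = c(0, 0, 1, 1, 0, 0, 1, 1),
  id    = rep(c("A", "B"), each = 4),
  group = rep(c("A.1", "B.1"), each = 4)
)

# Hypothetical variable of interest, one value per region
income <- data.frame(id = c("A", "B"), avg_income = c(52000, 61000))

# match() attaches the value without changing the vertex row order
toy_map_f$avg_income <- income$avg_income[match(toy_map_f$id, income$id)]

# Mapping a numeric column to fill gives a continuous color scale
p <- ggplot(toy_map_f, aes(long, lat, group = group, fill = avg_income)) +
  geom_polygon(color = "black") +
  scale_fill_gradient(low = "white", high = "darkblue")
print(p)
```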

Related

tmap is plotting a different legend (range of values?) for a cropped rasterlayer compared to original raster

I am extremely new to working with spatial data and so most of what I'm about to say is me trying to speak a foreign language. Right now I am trying to learn how to do this all in R (I am slightly more capable with this data in QGIS but for this solution, I am looking for R only).
My research involves ecological data in Pennsylvania (PA) and so I am playing around with cropping the US NLCD dataset to PA. I have a raster layer for the NLCD and a shapefile for the boundary of Pennsylvania. I am able to successfully crop the larger US raster down to PA as follows:
library(raster)
library(rgdal)
pabound <- readOGR(dsn="...",
layer="PAbound")
nlcdRast <- raster(".../NLCD_2016_Land_Cover_L48_20190424.img")
pabound <- spTransform(pabound,CRS(proj4string(nlcdRast)))
PAnlcd <- raster::crop(nlcdRast,pabound)
If I run the simple plot command for both nlcdRast and PAnlcd (i.e. plot(nlcdRast)), they keep the same color scheme. But when I run them through tmap, it seems to treat the cropped data differently, and I am not sure how to figure out why. Please see the plots below:
library(tmap)
tm_shape(nlcdRast) +
tm_raster()
And then when I plot the cropped version in tmap:
tm_shape(PAnlcd) +
tm_raster()
As you can see, it is not simply the color palette that is changing (I am confident I could figure that out) but the real problem is I'm losing the important information as seen in the legend. Whereas the full plot actually shows the categorical values for the raster NLCD, the cropped version now seems to show just some unknown numerical range. Even though it looks bad at the moment, I'd like to have the same legend/information as seen in the full US map.
I apologize for not having a more reproducible example but I am completely lost on what is happening here so I can't quite replicate it. I suppose right now I'm just looking for where to look to try and figure out what changed. Thank you in advance.
Cropping is changing the way the pixels are represented. To maintain your values use the stars package (also note I'm using the sf package for the shapefile):
library(stars)
library(sf)
# load in NLCD
nlcdRast <- read_stars(".../NLCD_2016_Land_Cover_L48_20190424.img")
# read in study area
pabound <- st_read(dsn="...", layer="PAbound")
# reproject pabound to match NLCD
pabound <- st_transform(pabound, crs = st_crs(nlcdRast))
# now crop
panlcd <- st_crop(nlcdRast, pabound)

Extracting gridded (netcdf) climate data for an entire watershed polygon

My goal is to extract precipitation data from the Daymet database (https://daymet.ornl.gov/) for each of my 68 watershed polygons. I was able to use the daymetr package to download the data:
download_daymet_ncss(location = c(53.5,-116.6, 48,-98),
start = 1992,
end = 2000,
param = "prcp",
path = "./Daymet gridded data/Precip_raw")#download the data
I realize this is quite a large area and might be part of the issue.
Once the data is downloaded, the challenge begins.
I've tried two approaches, the first to aggregate the data into annual values (using daymet_grid_agg from the daymetr package). But then extracting the correct areas from the raster generated is challenging (and I haven't been able to do successfully).
I then tried to use the RavenR package to generate a grid overlay from a netcdf (rvn_netcdf_to_gridshp).
fn<-"prcp_daily_1992_ncss.nc"
ncfile<-nc_open(fn)
outshp<-rvn_netcdf_to_gridshp(ncfile, projID = 4326)
This fails completely - either RStudio freezes, R cannot allocate 2.7 GB, or the shapefile is empty. I have tried increasing the memory size, but then R just runs forever and nothing seems to happen.
Next, I tried this simple approach (as per https://www.researchgate.net/post/How-to-get-data-only-within-shapefile-boundary-from-a-netcdf-data-file-in-R-software):
library(sf)
library(raster)
shp <- st_read(file.choose())
data <- brick(file.choose())
crs(data) <- "+proj=lcc +lon_0=-90 +lat_1=33 +lat_2=45"
output <- raster::mask(data, shp)
The output brick raster is full of NAs...
I have made the downloaded netcdf quite a bit smaller, but none of these approaches seem to work... And yes, my data are gridded (unlike in this case: How to extract NetCDF data frame by region using a polygon shapefile)

Crop out all mapped area beyond region of 13 U.S. states

I've just started with mapping in R and I've managed to convert a lat, lon dataframe to a raster file and then plot state borders on top of that.
Now I need to get it ready to publish and would like to include only the shape of my 13-state region (and no great lakes).
library(tmap)
library(sf)
usa <- st_as_sf(maps::map("state", fill=TRUE, plot=FALSE))
map_us<- tm_shape(usa)+
tm_borders()
tm_shape(raster_file) +
tm_raster(style = "cont", palette = "viridis", midpoint = 0)+
map_us
I'm having a hard time finding something out there that would provide a polygon for multiple states and I have been through a lot of mapping packages. Sorry I can't include my raster data here.
To crop a raster to an {sf} vector shape you have, in principle, two options:
crop at data level; this involves raster::mask() with possibly raster::crop() later to reduce the extent of the raster (masked raster retains the original size)
retain the data, and overlay a white polygon with a hole over your plot
Cropping on data level is more "pure", but will leave you with ragged edges (a raster cell has to be square).
Overlaying a white polygon is not so pure, but might be preferable if your key aim is a slick presentation (and purity of essence be damned...)
You will find both of them discussed, together with examples and sample code, in this post on the RStudio community site.
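The first option (crop at data level) can be sketched on synthetic data with the raster package. The raster values and the rectangular region below are made up; the rectangle only stands in for a real multi-state outline, which the question does not provide.

```r
library(raster)

# Synthetic 10 x 10 raster covering x, y in [0, 10]; values are arbitrary
r <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
values(r) <- seq_len(ncell(r))

# Hypothetical region of interest, given the same CRS as the raster
region <- as(extent(2, 6, 3, 7), "SpatialPolygons")
crs(region) <- crs(r)

# mask() sets cells outside the region to NA but keeps the full grid
masked <- mask(r, region)

# crop() then trims the grid to the region's bounding box
cropped <- crop(masked, region)

print(dim(masked))
print(dim(cropped))
```

With a real outline, the region would be the union of the selected state polygons. An {sf} object may need converting first, for example with as(x, "Spatial"), depending on the raster version in use.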

Rasterize error: Polygon to raster conversion produces horizontal lines

I am working with shapefiles in R that I need to convert from polygon to raster. While the vectors look perfect when plotted, when converted to raster using 'rasterize' they produce erroneous horizontal lines. Here is an example of the problem:
Here is a generic example of the code that I am using (sorry that I cannot upload the data itself as it is proprietary):
library(raster)
library(rgdal)
spdf.dat <- readOGR("directory here", "layer here")
# Plot polygon
plot(spdf.dat, col = 'dimgrey', border = 'black')
# Extract boundaries
ext <- extent(spdf.dat)
# Set resolution for rasterization
res <- 1
# determine no. of columns from extents and resolution
yrow <- round((ext@ymax - ext@ymin) / res)
xcol <- round((ext@xmax - ext@xmin) / res)
# Rasterize base
rast.base <- raster(ext, yrow, xcol, crs = projection(spdf.dat))
# Rasterize substrate polygons
rast <- rasterize(spdf.dat, rast.base, field = 1, fun = 'min', progress='text')
plot(rast, col = 'dimgrey')
Does this seem to be a problem with the source data or the rasterize function? Has anyone seen this sort of error before? Thank you for any advice that you can provide.
To make it official so the question is considered answered, I'll copy my comments here so that you can accept the answer.
When I look at your figure, it seems to me that the problematic lines in the raster sit at the same latitude as some islands. Try removing these islands from your dataset. If the problem disappears, you'll know that your data is the problem and where in your data the problem lies.
Another option is to try the gdalUtils package, which has a gdal_rasterize function. Maybe GDAL is less demanding about the input data.
I had a similar problem rasterizing the TIGER areal water data for the San Juan Islands in Washington State , as well as for Maui - both of these spatial polygon data frames at the default resolution returned by package Tigris using a raster defined by points 1 arc-second of lat/lon apart. There were several horizontal stripes starting at what appeared to be sharp bends of the coastline. Various simplification algorithms helped, but not predictably, and not perfectly.
Try the velox package, which takes some getting used to as it uses Reference Classes. It probably has size limits, as it uses the Boost geometry libraries and works in memory. You don't need to understand it all; I don't. It is fast compared to raster::rasterize (especially for large and complicated spatial lines data frames). I didn't see the hundred-fold speedups claimed, but I am not going to complain about a mere factor of 10 or 20. Most importantly, velox$rasterize() doesn't leave streaks at the locations where raster::rasterize did!
I found that it leaves a lot of memory garbage, and when converting large rasterLayers derived from velox$rasterize, running gc() was helpful before writing the raster in native R .grd format (in INT1S format to save disk space).
Just as a follow up to this question based on my experiences.
The horizontal lines are a result of these 'islands', as described above. However, they only occur if the polygon is 'multi-part'. If the 'islands' are distinct polygons rather than separate parts of one polygon, then raster::rasterize() works fine.
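One way to turn a multi-part polygon into distinct polygons is st_cast() from the sf package. This is a sketch on two made-up squares (a "mainland" and an "island"), not the proprietary data from the question, and sq() is a hypothetical helper.

```r
library(sf)

# Hypothetical helper: closed ring of a unit square with corner (x0, y0)
sq <- function(x0, y0) {
  rbind(c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0))
}

# One multi-part feature: a "mainland" square plus an "island" square
multi <- st_sfc(st_multipolygon(list(list(sq(0, 0)), list(sq(3, 0)))))

# Casting to POLYGON splits the parts into separate geometries
single <- st_cast(multi, "POLYGON")

print(length(multi))
print(length(single))
```

The split geometries can then be rasterized as separate polygons. Whether that removes the streaks for a given dataset still has to be checked on the real data.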

How do I add information from a data frame onto a US Map in R?

I have a data frame based on the global terrorism database called gtd in R.
I have the maps package downloaded/required
What I am trying to do is take data from my gtd data frame and add the points on a US map.
I know the code for the USA map is map("state")
in my gtd data frame, there is a column for latitude and longitude
all I want to do is plot the points on the US map to show location of where terrorist attacks happened in US
To get only the subset of my gtd data frame with US locations, I know I would use
subset(x=gtd, country_txt=="United States")
but from there, how do I plot the locations of the attacks as points on the USA map?
Much appreciated!
I'm going to make the broad assumption from clues in your post that you're using data from the global terrorism database. If you've managed to read in the huge CSV from it, what you want to do should work and you're really close to having the answer. Basically, you just need to add the points() to the map():
library(maps)
# assuming you have it in a CSV file
dat <- read.csv("gtd.csv")
# subset only points in the US
US <- dat[dat$country_txt == "United States",]
# plot the base US map
map('state')
# add points with really horrid default color, point type & size
# changing this is a good exercise for the poster :-)
points(x=US$longitude, y=US$latitude, col="red", cex=0.75)
This has nothing to do with ggplot2 though, so consider removing that tag. It can be done with ggplot2, but map() doesn't use it.
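For completeness, a rough ggplot2 version of the same idea, assuming the maps package is installed (map_data() needs it). The three points are made-up coordinates for illustration, not GTD records; with real data the second layer would use the US subset and its longitude and latitude columns.

```r
library(ggplot2)

# State outlines as a data frame (requires the maps package)
states <- map_data("state")

# Made-up points standing in for the US subset of the GTD data
attacks <- data.frame(
  longitude = c(-74.0, -87.6, -118.2),
  latitude  = c(40.7, 41.9, 34.1)
)

# Base layer first, then the points on top
p <- ggplot() +
  geom_polygon(data = states, aes(long, lat, group = group),
               fill = "white", color = "gray50") +
  geom_point(data = attacks, aes(longitude, latitude),
             color = "red", size = 1.5) +
  coord_quickmap()
print(p)
```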
