Wrong districts filled on state map plot - r

I have a shapefile of school districts in Texas and am trying to use ggplot2 to highlight 10 in particular. I've tinkered with it and gotten everything set up, but when I spot-checked it I realized the 10 districts highlighted are not in fact the ones I want to be highlighted.
The shapefile can be downloaded from this link to the Texas Education Agency Public Open Data Site.
#install.packages(c("ggplot2", "rgdal"))
library(ggplot2)
library(rgdal)
#rm(list=ls())
#setwd("path")
# read shapefile
tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))
# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")
# extract from shapefile data just the name and ID, then subset to only the districts of interest
dist_info <- data.frame(cbind(as.character(tex@data$NAME2), as.character(tex@data$FID)), stringsAsFactors=FALSE)
names(dist_info) <- c("name", "id")
dist_info <- dist_info[dist_info$name %in% districts, ]
# turn shapefile into df
tex_df <- fortify(tex)
# create dummy fill var for if the district is one to be highlighted
tex_df$yes <- as.factor(ifelse(tex_df$id %in% dist_info$id, 1, 0))
# plot the graph
ggplot(data=tex_df) +
geom_polygon(aes(x=long, y=lat, group=group, fill=yes), color="#CCCCCC") +
scale_fill_manual(values=cols) +
theme_void() +
theme(legend.position = "none")
As you'll see, when the plot gets created it looks like it's done exactly what I want. The problem is, the ten districts highlighted are not the ones in the districts vector above. I've re-run everything clean numerous times, double-checked that I wasn't having a factor/character conversion issue, and double-checked in the web data explorer that the IDs I get from the shapefile are indeed the ones that should match my list of names. I really have no idea where this issue could be coming from.
This is my first time working with shapefiles and rgdal, so if I had to guess, there's something simple about the structure that I don't understand, and hopefully one of you can quickly point it out for me. Thanks!
Here's the output:

Alternative 1
Add the argument region = "NAME2" to the fortify function; the id column will then contain your district names. Then create your dummy fill variable based on that column.
I am not familiar with Texas districts, but I assume the result is right.
tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))
# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")
# turn shapefile into df
tex_df <- fortify(tex, region = "NAME2")
# create dummy fill var for if the district is one to be highlighted
tex_df$yes <- as.factor(ifelse(tex_df$id %in% districts, 1, 0))
# plot the graph
ggplot(data=tex_df) +
geom_polygon(aes(x=long, y=lat, group=group, fill=yes), color="#CCCCCC") +
scale_fill_manual(values=cols) +
theme_void() +
theme(legend.position = "none")
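Why the original FID-based code highlighted the wrong districts, offered as a likely explanation rather than a confirmed one: fortify() labels each polygon with its feature ID, which for readOGR() is the row name of the attribute table ("0", "1", ...), not the FID column. If FID is numbered differently (assumed 1-based here), the %in% test lands on other rows. The table below is made up and only imitates tex@data.

```r
# Made-up stand-in for tex@data: readOGR() row names start at "0",
# while the FID column is assumed here to start at 1
attrs <- data.frame(NAME2 = c("Aldine", "Bryan", "Donna", "Laredo"),
                    FID = 1:4, stringsAsFactors = FALSE)
rownames(attrs) <- as.character(0:3)

# fortify() identifies each polygon by its feature ID (the row name)
poly_ids <- rownames(attrs)
targets <- c("Bryan", "Laredo")

# Names of the polygons whose id is in the given set of IDs
highlighted <- function(ids) attrs$NAME2[poly_ids %in% ids]

# Matching on FID compares 2 and 4 against the row names:
# this hits "Donna" (row name "2") and nothing for 4
wrong <- highlighted(as.character(attrs$FID[attrs$NAME2 %in% targets]))

# Matching on the row names picks the intended polygons
right <- highlighted(rownames(attrs)[attrs$NAME2 %in% targets])
```

With the real data, building dist_info from rownames(tex@data) instead of tex@data$FID should have the same effect, though this sketch has not been run against the TEA file.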
Alternative 2
This alternative doesn't pass the region argument to the fortify function, which addresses seeellayewhy's issue implementing the previous alternative. We add two layers, so there is no need to create a dummy variable or merge any data frame.
tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))
# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")
# Subset the shape file into two
tex1 <- subset(tex, NAME2 %in% districts)
tex2 <- subset(tex, !(NAME2 %in% districts))
# Create two data frames
tex_df1 <- fortify(tex1)
tex_df2 <- fortify(tex2)
# Plot two geom_polygon layers, one for each data frame
ggplot() +
geom_polygon(data = tex_df1,
aes(x = long, y = lat, group = group, fill = "highlight"),
color = "#CCCCCC") +
geom_polygon(data = tex_df2,
aes(x = long, y = lat, group = group, fill = "other")) +
# named values tie each label to its colour explicitly
scale_fill_manual(values = c(other = cols[1], highlight = cols[2])) +
theme_void() +
theme(legend.position = "none")

When trying to implement @mpalanco's solution of adding the "region" argument to the fortify() function, I got an error (Error: isTRUE(gpclibPermitStatus()) is not TRUE) that I couldn't solve with the fixes from numerous other Stack Overflow posts. I also tried broom::tidy(), the non-deprecated equivalent of fortify(), and got the same error.
Ultimately, I ended up implementing @luchanocho's solution from here. I don't like that it uses seq() to generate the ID, because that doesn't necessarily preserve the proper order, but my case was simple enough that I could go through every district and confirm that the correct ones were highlighted.
My code is below. The output is the same as @mpalanco's answer. Since he evidently got the right result with an approach that isn't as shaky as this one, I'm going to accept his answer, assuming it works. The solution below can be considered a workaround if others run into the same error I did.
#install.packages(c("ggplot2", "rgdal"))
library(ggplot2)
library(rgdal)
#rm(list=ls())
#setwd("path")
# read shapefile
tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))
# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")
# convert shapefile to a df
tex_df <- fortify(tex)
# generate temp df with IDs to merge back in
names_df <- data.frame(tex@data$NAME2)
names(names_df) <- "NAME2"
names_df$id <- seq(0, nrow(names_df)-1) # this is the part I felt was sketchy
final <- merge(tex_df, names_df, by="id")
final <- final[order(final$order), ] # merge() sorts rows by id; restore fortify's drawing order
# dummy out districts of interest
final$yes <- as.factor(ifelse(final$NAME2 %in% districts, 1, 0))
ggplot(data=final) +
geom_polygon(aes(x=long, y=lat, group=group, fill=yes), color="#CCCCCC") +
scale_fill_manual(values=cols) +
theme_void() +
theme(legend.position = "none")
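A sketch of the same workaround with the IDs taken from the row names instead of seq(), plus a re-sort after merge(). It assumes, as in the explanation above, that fortify()'s id equals the row names of tex@data; the small data frames are made up to stand in for fortify(tex) and tex@data.

```r
# Made-up stand-in for fortify(tex): polygon ID in 'id', drawing sequence in 'order'
tex_df <- data.frame(
  long  = c(0, 1, 1, 5, 6, 6),
  lat   = c(0, 0, 1, 5, 5, 6),
  order = 1:6,
  id    = c("0", "0", "0", "1", "1", "1"),
  stringsAsFactors = FALSE
)
# Made-up stand-in for tex@data, with readOGR-style row names
attrs <- data.frame(NAME2 = c("Aldine", "Bryan"), stringsAsFactors = FALSE)
rownames(attrs) <- c("0", "1")

# Take the IDs from the row names rather than generating them with seq()
names_df <- data.frame(id = rownames(attrs), NAME2 = attrs$NAME2,
                       stringsAsFactors = FALSE)

final <- merge(tex_df, names_df, by = "id")
# merge() sorts by the key, so put the vertices back in drawing order
final <- final[order(final$order), ]
final$yes <- factor(ifelse(final$NAME2 %in% "Bryan", 1, 0))
```

With the real objects, the names_df line would become data.frame(id = rownames(tex@data), NAME2 = tex@data$NAME2).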

Related

Putting Values on a County Map in R

I am using an Excel sheet for data. One column has FIPS numbers for GA counties and the other, labeled Count, has numbers from 1 to 5. I have made a map with these values using the following code:
library(usmap)
library(ggplot2)
library(rio)
carrierdata <- import("GA Info.xlsx")
plot_usmap( data = carrierdata, values = "Count", "counties", include = c("GA"), color="black") +
labs(title="Georgia")+
scale_fill_continuous(low = "#56B1F7", high = "#132B43", name="Count", label=scales::comma)+
theme(plot.background=element_rect(), legend.position="right")
I've included the picture of the map I get and a sample of the data I am using. Can anyone help me put the actual Count numbers on each county?
Thanks!
The usmap package is a good source for county maps, but the data it contains is in the format of data frames of x, y co-ordinates of county outlines, whereas you need the numbers plotted in the center of the counties. The package doesn't seem to contain the center co-ordinates for each county.
Although it's a bit of a pain, it is worth converting the map into a formal sf data frame format to give better plotting options, including the calculation of the centroid for each county. First, we'll load the necessary packages, get the Georgia data and convert it to sf format:
library(usmap)
library(sf)
library(ggplot2)
d <- us_map("counties")
d <- d[d$abbr == "GA",]
# split by fips so each polygon's name is its own fips code
GAc <- lapply(split(d, d$fips), function(x) st_polygon(list(cbind(x$x, x$y))))
GA <- st_sfc(GAc, crs = usmap_crs()@projargs)
GA <- st_sf(data.frame(fips = names(GAc), county = d$county[match(names(GAc), d$fips)], geometry = GA))
Now, obviously I don't have your numeric data, so I'll have to make some up, equivalent to the data you are importing from Excel. I'll assume your own carrierdata has a column named "fips" and another called "values":
set.seed(69)
carrierdata <- data.frame(fips = GA$fips, values = sample(5, nrow(GA), TRUE))
So now we left_join our imported data to the GA county data:
GA <- dplyr::left_join(GA, carrierdata, by = "fips")
And we can calculate the center point for each county:
GA$centroids <- st_centroid(GA$geometry)
All that's left now is to plot the result:
ggplot(GA) +
geom_sf(aes(fill = values)) +
geom_sf_text(aes(label = values, geometry = centroids), colour = "white")
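If you'd rather skip the sf conversion, a cruder option (a swapped-in technique, not what this answer does) is to average each county's outline vertices and use that as the label position. The vertex mean is not a true centroid and can sit off-centre for irregular shapes. The sketch assumes the long format used above (x, y and fips columns, as in older usmap versions) and uses made-up coordinates and counts.

```r
# Made-up outline data in the long format of us_map("counties"): one row per vertex
d <- data.frame(
  fips = rep(c("13001", "13003"), each = 4),
  x = c(0, 2, 2, 0, 10, 14, 14, 10),
  y = c(0, 0, 2, 2, 0, 0, 4, 4),
  stringsAsFactors = FALSE
)
counts <- data.frame(fips = c("13001", "13003"), Count = c(3, 5),
                     stringsAsFactors = FALSE)

# Mean of each county's vertices: a rough label position, not a true centroid
label_pos <- aggregate(cbind(x, y) ~ fips, data = d, FUN = mean)
label_pos <- merge(label_pos, counts, by = "fips")
```

Something like geom_text(data = label_pos, aes(x = x, y = y, label = Count), inherit.aes = FALSE) could then be added to a polygon-based plot; this has not been checked against every usmap version.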

R post-merge ggplot/qmap plots zipcode polygons incorrectly (jagged)

I have spent days searching this site and others for a solution, and haven't found it yet. If there is another page with my solution, and I missed it, I apologize.
I found this
but reloading ggplot2 and rgdal (after detaching) didn't fix it.
I am using demographic data at the ZCTA (zip code tabulation area) level to overlay polygons on a Google terrain map. I am able to get the polygons plotted correctly using qmap, but after I merge in the demographic data, the plots are all wrong. I've tried specifying the order and playing with the merge. (Heck, I've tried all sorts of things.) I'd love some help with this.
is a working plot, before the merge, and
is after.
Here's my code:
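library(rgdal)    # readOGR()
library(foreign)  # read.dta()
library(ggplot2)  # fortify(), geom_polygon()
library(plyr)     # join()
library(ggmap)    # qmap()
# shapefile from Census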
fips34 <-readOGR(".", "zt34_d00", stringsAsFactors = FALSE)
# zip code areas, 1 row per ZCTA with nonmissing Census data
ptInd <-read.dta("ptIndzcta.dta")
keepzips <- fips34
keepzips@data$id <-rownames(keepzips@data) # create idvar to remerge
keepzipsdat <- fortify(keepzips, region="id") # fortify
keepzipsdat <- keepzipsdat[order(keepzipsdat$order),] # clarify order
keepzipsdat <- join(keepzipsdat, keepzips@data, by="id") # remerge for zcta
qmap("new jersey", zoom = 8, maptype="terrain", color="bw") +
geom_polygon(aes(x=long, y=lat, group=group),
data=keepzipsdat) + coord_equal() # this map plots fine
# now merge in data to create choropleth
zip2 <- merge(keepzipsdat, ptInd, by.y="zcta5", by.x="ZCTA", all.x = TRUE)
zip2[order(zip2$order),] # reestablish order, is this necessary?
qmap("new jersey", zoom = 8, maptype="terrain", color="bw") +
geom_polygon(aes(x=long, y=lat, group=group),
data=zip2) + coord_equal() # this looks crazy
ggplot(data=zip2, aes(x=long, y=lat, group=group)) + geom_polygon()
# also crazy
# and this is before assigning a fill variable to the polygons
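A likely diagnosis, not a confirmed one: merge() returns rows sorted by the key, which breaks the point order geom_polygon() relies on, and the line zip2[order(zip2$order),] computes a re-sorted copy but never assigns it back, so zip2 stays scrambled. The toy tables below are made up to mimic keepzipsdat and ptInd.

```r
# Made-up stand-in for keepzipsdat: two ZCTAs, vertices in drawing order
keepzipsdat <- data.frame(
  ZCTA  = rep(c("08540", "07001"), each = 3),
  order = 1:6,
  long  = c(0, 1, 1, 5, 6, 6),
  lat   = c(0, 0, 1, 5, 5, 6),
  stringsAsFactors = FALSE
)
# Made-up stand-in for ptInd
ptInd <- data.frame(zcta5 = c("07001", "08540"), pct = c(0.2, 0.7),
                    stringsAsFactors = FALSE)

zip2 <- merge(keepzipsdat, ptInd, by.y = "zcta5", by.x = "ZCTA", all.x = TRUE)
# merge() sorts rows by ZCTA, so "07001" now comes first.
# Assign the re-sorted result; on its own the expression only prints a copy.
zip2 <- zip2[order(zip2$order), ]
```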

Spatial Plot in R : how to plot the polygon and color as per the data to be visualized

I have been trying to draw a county-based choropleth map in R to visualize my dataset for the State of Arizona.
For the thematic map I am using county polygon data from arizona.edu (Spatial Library); the data comes from az.gov.
I have the following code for plotting the county polygons:
library(maptools)
library(rgdal)
library(ggplot2)
library(plyr)
county <- readShapePoly(file.choose())
county@data$id <- rownames(county@data)
county.points <- fortify(county, coords="id")
county.df <- join(county.points, county@data, by="id")
ggplot(county.df) + aes(long,lat,group=group, fill="id") +
geom_polygon() +
geom_path(color="white") +
coord_equal() +
scale_fill_brewer("County Arizona")
This code gives me no error, but also no output.
My shapefile source is here.
My data source is here.
I can't speak to why your code is not generating output - there are too many possible reasons - but is this what you are trying to achieve?
Code
library(rgdal)
library(ggplot2)
library(plyr)
library(RColorBrewer)
setwd("< directory with all your files >")
map <- readOGR(dsn=".",layer="ALRIS_tigcounty")
marriages <- read.csv("marriages.2012.csv",header=T,skip=3)
marriages <- marriages[2:16,]
marriages$County <- tolower(gsub(" ","",marriages$County))
marriages$Total <- as.numeric(as.character(marriages$Total))
data <- data.frame(id=rownames(map@data), NAME=map@data$NAME, stringsAsFactors=F)
data <- merge(data,marriages,by.x="NAME",by.y="County",all.x=T)
map.df <- fortify(map)
map.df <- join(map.df,data, by="id")
ggplot(map.df, aes(x=long, y=lat, group=group))+
geom_polygon(aes(fill=Total))+
geom_path(colour="grey50")+
scale_fill_gradientn("2012 Marriages",
colours=rev(brewer.pal(8,"Spectral")),
trans="log",
breaks=c(100,300,1000,3000,10000))+
theme(axis.text=element_blank(),
axis.ticks=element_blank(),
axis.title=element_blank())+
coord_fixed()
Explanation
To generate a choropleth map, ultimately we need to associate polygons with your datum of interest (total marriages by county). This is a three-step process: first we associate polygon ID with county name:
data <- data.frame(id=rownames(map@data), NAME=map@data$NAME, stringsAsFactors=F)
Then we associate county name with total marriages:
data <- merge(data,marriages,by.x="NAME",by.y="County",all.x=T)
Then we associate the result with the polygon coordinate data:
map.df <- join(map.df,data, by="id")
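The three steps can be sketched in base R with made-up tables (the Greenlee total comes from the data discussed below, the Santa Cruz one is invented). Here match() stands in for plyr::join(), a substitution rather than the answer's code, because like a left join it leaves the polygon rows in their original order.

```r
# Made-up stand-ins: fortified polygon rows, map@data, and the marriage totals
map.df <- data.frame(id = c("0", "0", "1", "1"), order = 1:4,
                     stringsAsFactors = FALSE)
map_attrs <- data.frame(NAME = c("santacruz", "greenlee"),
                        stringsAsFactors = FALSE)
rownames(map_attrs) <- c("0", "1")
marriages <- data.frame(County = c("greenlee", "santacruz"),
                        Total = c(50, 400), stringsAsFactors = FALSE)

# Step 1: polygon ID -> county name
data <- data.frame(id = rownames(map_attrs), NAME = map_attrs$NAME,
                   stringsAsFactors = FALSE)
# Step 2: county name -> total marriages
data <- merge(data, marriages, by.x = "NAME", by.y = "County", all.x = TRUE)
# Step 3: attach the totals to every polygon row without reordering map.df
map.df$Total <- data$Total[match(map.df$id, data$id)]
```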
Your specific case has a lot of potential traps:
The link you provided was to a PDF, which is useless for this purpose. But poking around a bit revealed an Excel file with the same data. Even this file needs cleaning: the numbers use "," as a thousands separator, which has to be turned off, and some of the cells have footnotes, which have to be removed. Finally, we have to save it as a CSV file.
Since we are matching on county name, the names have to match! In the shapefile attribute table, the county names are all lower case and spaces have been removed (e.g., "Santa Cruz" is "santacruz"). So we need to lowercase the county names and remove spaces:
marriages$County <- tolower(gsub(" ","",marriages$County))
The totals column comes in as a factor, which has to be converted to numeric:
marriages$Total <- as.numeric(as.character(marriages$Total))
Your actual data is highly skewed: Maricopa County had 23,600 marriages, Greenlee had 50. So a linear color scale is not very informative, and we use a logarithmic scale instead:
scale_fill_gradientn("2012 Marriages",
colours=rev(brewer.pal(8,"Spectral")),
trans="log",
breaks=c(100,300,1000,3000,10000))+

R ggplot2 merge with shapefile and csv data to fill polygons

We produce maps daily that show a calculated temperature level for 30 distinct areas of our region; each area is filled with a different colour depending on the level. These maps look like this:
Now I want to switch map generation to R. I've downloaded provincial and municipal boundaries (you can find boundaries for the whole of Spain, or the subset for my region here) and managed to plot them with ggplot2 following Hadley's example.
I can also produce an ASCII file that contains two columns: identifier (CODINE) and daily level. You can download it here.
This is my first script attempting to plot shapefiles with R and ggplot2 so there may be mistakes and for sure it can be improved, suggestions welcome. The following code (based on Hadley's previously mentioned) works for me:
require("rgdal")
require("maptools")
require("ggplot2")
require("plyr")
# Reading municipal boundaries
esp = readOGR(dsn=".", layer="lineas_limite_municipales_etrs89")
muni=subset(esp, esp$PROV1 == "46" | esp$PROV1 == "12" | esp$PROV1 == "3")
muni@data$id = rownames(muni@data)
muni.points = fortify(muni, region="id")
muni.df = join(muni.points, muni@data, by="id")
# Reading province boundaries
prov = readOGR(dsn=".", layer="poligonos_provincia_etrs89")
pr=subset(prov, prov$CODINE == "46" | prov$CODINE == "12" | prov$CODINE == "03" )
pr@data$id = rownames(pr@data)
pr.points = fortify(pr, region="id")
pr.df = join(pr.points, pr@data, by="id")
ggplot(muni.df) + aes(long,lat,group=group) + geom_path(color="blue") +
  coord_equal() + geom_path(data=pr.df,
  aes(x=long, y=lat, group=group), color="red", size=0.5)
This code plots a nice map with all the boundaries
For polygon filling by level I tried to read and then merge as suggested in http://tormodboe.wordpress.com/2011/02/22/g%C3%B8y-med-kart-2/
level=read.csv("levels.dat",header=T,sep=" ")
munlevel=merge(muni.df,level,by="CODINE")
but it gives an error
Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column
I am not familiar with shapefiles; maybe I need to learn more about shapefile attributes to find the right field for merging the two data sets. How can I merge the data so I can plot the lines (municipal boundaries) and then fill them with levels?
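That error means at least one of the two data frames has no column called CODINE: a single by = "CODINE" requires the name in both. A quick check with made-up tables, assuming (as the answer below finds) that the municipality attributes store the code as CODIGOINE rather than CODINE:

```r
# Made-up tables: the polygon data has CODIGOINE, the level file has CODINE
muni.df <- data.frame(id = c("0", "1"), CODIGOINE = c("46001", "12005"),
                      stringsAsFactors = FALSE)
level <- data.frame(CODINE = c(46001L, 12005L), LEVEL = c(2L, 4L))

# Which tables actually contain the key?
has_key <- c(muni.df = "CODINE" %in% names(muni.df),
             level   = "CODINE" %in% names(level))

# Reproduce the error and capture its message instead of stopping
msg <- tryCatch(merge(muni.df, level, by = "CODINE"),
                error = function(e) conditionMessage(e))
```

Using by.x and by.y for the two differently named columns avoids the error, but the codes also need the same type and zero-padding before any rows will match.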
[NB: This question was asked over a month ago so OP has probably found a different way to solve their problem. I stumbled upon it while working on this related question. This answer is included in hopes it will benefit someone else.]
This appears to be what OP is asking for...
... and was produced with the following code:
require("rgdal")
require("maptools")
require("ggplot2")
require("plyr")
require("stringr")  # str_pad()
# read temperature data
setwd("<location of your data file>")
temp.data <- read.csv(file = "levels.dat", header=TRUE, sep=" ", na.strings="NA", dec=".", strip.white=TRUE)
temp.data$CODINE <- str_pad(temp.data$CODINE, width = 5, side = 'left', pad = '0')
# read municipality polygons
setwd("<location of your shapefile>")
esp <- readOGR(dsn=".", layer="poligonos_municipio_etrs89")
muni <- subset(esp, esp$PROVINCIA == "46" | esp$PROVINCIA == "12" | esp$PROVINCIA == "3")
# fortify and merge: muni.df is used in ggplot
muni@data$id <- rownames(muni@data)
muni.df <- fortify(muni)
muni.df <- join(muni.df, muni@data, by="id")
muni.df <- merge(muni.df, temp.data, by.x="CODIGOINE", by.y="CODINE", all.x=T, all.y=F)
muni.df <- muni.df[order(muni.df$order), ]  # merge() reorders rows; restore the drawing order
# create the map layers
ggp <- ggplot(data=muni.df, aes(x=long, y=lat, group=group))
ggp <- ggp + geom_polygon(aes(fill=LEVEL)) # draw polygons
ggp <- ggp + geom_path(color="grey", linetype=2) # draw boundaries
ggp <- ggp + coord_equal()
ggp <- ggp + scale_fill_gradient(low = "#ffffcc", high = "#ff4444",
space = "Lab", na.value = "grey50",
guide = "colourbar")
ggp <- ggp + labs(title="Temperature Levels: Comunitat Valenciana")
# render the map
print(ggp)
Explanation:
Shapefiles imported into R with readOGR(...) are of class SpatialPolygonsDataFrame and have two main sections: a polygon section, which contains the coordinates of all the points on each polygon, and a data section, which contains information about each polygon (one row per polygon). These can be referenced, e.g., using muni@polygons and muni@data. The utility function fortify(...) converts the polygon section to a data frame organized for plotting with ggplot. So the basic workflow is:
[1] Import temperature data file (temp.data)
[2] Import polygon shapefile of municipalities (muni)
[3] Convert muni polygons to a data frame for plotting (muni.df <- fortify(...))
[4] Join columns from muni@data to muni.df
[5] Join columns from temp.data to muni.df
[6] Make the plot
The joins must be done on common fields, and this is where most of the problems come in. Each polygon in the original shapefile has a unique ID attribute. Running fortify(...) on the shapefile creates a column, id, which is based on this. But there is no ID column in the data section. Instead, the polygon IDs are stored as row names. So first we must add an id column to muni@data as follows:
muni@data$id <- rownames(muni@data)
Now we have an id field in muni@data and a corresponding id field in muni.df, so we can do the join:
muni.df <- join(muni.df, muni@data, by="id")
To create the map we will need to set fill colors based on temperature level. To do that we need to join the LEVEL column from temp.data to muni.df. In temp.data there is a field CODINE which identifies the municipality. There is also, now, a corresponding field CODIGOINE in muni.df. But there's a problem: CODIGOINE is char(5), with leading zeros, whereas CODINE is integer which means leading zeros are missing (imported from Excel, perhaps?). So just joining on these two fields produces no matches. We must first convert CODINE into char(5) with leading zeros:
temp.data$CODINE <- str_pad(temp.data$CODINE, width = 5, side = 'left', pad = '0')
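If you'd rather not load stringr just for this, base R's sprintf() does the same zero-padding, assuming CODINE was read in as whole numbers. The codes below are made up.

```r
# Made-up municipality codes as read.csv() would return them (leading zeros lost)
CODINE <- c(3001L, 46250L, 12005L)

# Pad to five characters with leading zeros, matching a char(5) CODIGOINE
padded <- sprintf("%05d", CODINE)
```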
Now we can join temp.data to muni.df based on the corresponding fields.
muni.df <- merge(muni.df, temp.data, by.x="CODIGOINE", by.y="CODINE", all.x=T, all.y=F)
We use merge(...) instead of join(...) because the join fields have different names and join(...) requires them to have the same name. (Note, however, that join(...) is faster and should be used if possible.) Because merge(...) sorts its result by the key column, it is also worth re-sorting muni.df by its order column afterwards so each polygon's points are drawn in sequence. So, finally, we have a data frame which contains all the information for plotting the polygons and the temperature LEVEL which can be used to establish the fill color for each polygon.
Some notes on OP's original code:
OP's first map (the green one at the top) identifies "30 distinct areas for our region...". I could find no shapefile identifying those areas. The municipality file identifies 543 municipalities, and I could see no way to group these into 30 areas. In addition, the temperature level file has 542 rows, one for each municipality (more or less).
OP was importing line shapefiles for the municipalities to draw the boundaries. You don't need those, because geom_polygon(...) will draw (and fill) the polygons and geom_path(...) will draw the boundaries.

State level unemployment in R

This is a newbie question. I want to plot state-level unemployment on a US map. There have been thorough discussions here and elsewhere about how to plot county-level unemployment and the issues associated with it, but the code looks intimidating to me. Is there simple code out there that takes two columns, a state code and a factor variable indicating numeric intervals, and yields a US map colored by the factor variable? A supplementary question: if I want to go a little further and create a similar plot with the unemployment rate in major US cities, how do I modify the code?
Thank you in advance.
Here is a quick piece of code with comments explaining each step. Let me know if you have questions.
# load libraries
library(XML)
library(ggplot2)
library(maps)
library(plyr)
# read the data from the bls website with correct column formats
unemp = readHTMLTable('http://www.bls.gov/web/laus/laumstrk.htm',
  colClasses = c('character', 'character', 'numeric'))[[2]]
# rename columns and convert region to lowercase
names(unemp) = c('rank', 'region', 'rate')
unemp$region = tolower(unemp$region)
# get us state map data and merge with unemp
us_state_map = map_data('state')
map_data = merge(unemp, us_state_map, by = 'region')
# keep data sorted by polygon order
map_data = arrange(map_data, order)
# plot map using ggplot2 (coord_map() needs the mapproj package installed)
p0 = ggplot(map_data, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = cut_number(rate, 5))) +
  geom_path(colour = 'gray', linetype = 2) +
  scale_fill_brewer('Unemployment Rate (Jan 2011)', palette = 'PuRd') +
  coord_map()
print(p0)
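cut_number(rate, 5) is what produces the "factor variable indicating numeric intervals" the question asks about: five groups with roughly equal numbers of states. A base-R approximation with cut() and quantile() breaks, using made-up rates, shows what that factor looks like; ggplot2's own implementation may differ in edge cases such as ties.

```r
# Made-up unemployment rates for ten regions
rate <- c(4.0, 4.5, 5.1, 5.8, 6.2, 7.0, 7.7, 8.3, 9.1, 10.4)

# Five equal-count intervals, similar in spirit to ggplot2::cut_number(rate, 5)
breaks <- quantile(rate, probs = seq(0, 1, by = 0.2))
bins <- cut(rate, breaks = breaks, include.lowest = TRUE)
```

A precomputed factor like bins can be passed to fill directly, which is one way to use your own state-code/interval table instead of the BLS download.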
Ramnath nailed this one. If you're still looking for other solutions, there's a decent example using other packages at the SAS-and-R blog.
