R ggplot2 merge with shapefile and csv data to fill polygons

R ggplot2 merge with shapefile and csv data to fill polygons - r

We daily produce maps that show a calculated level for temperature in 30 distinct areas of our region, each area is filled with a different colour depending on the level. This maps look like
Now I want to switch map generation to R. I've downloaded provincial and municipal boundaries (you can find boundaries for whole Spain or here the subset for my region) and managed to plot them with ggplot2 following Hadley's example.
I can also produce an ascii file that contains two columns: identifier (CODINE) and daily level. You can download here.
This is my first script attempting to plot shapefiles with R and ggplot2 so there may be mistakes and for sure it can be improved, suggestions welcome. The following code (based on Hadley's previously mentioned) works for me:
> require("rgdal")
> require("maptools")
> require("ggplot2")
> require("plyr")
# Reading municipal boundaries
esp = readOGR(dsn=".", layer="lineas_limite_municipales_etrs89")
muni=subset(esp, esp$PROV1 == "46" | esp$PROV1 == "12" | esp$PROV1 == "3")
muni#data$id = rownames(muni#data)
muni.points = fortify(muni, region="id")
muni.df = join(muni.points, muni#data, by="id")
# Reading province boundaries
prov = readOGR(dsn=".", layer="poligonos_provincia_etrs89")
pr=subset(prov, prov$CODINE == "46" | prov$CODINE == "12" | prov$CODINE == "03" )
pr#data$id = rownames(pr#data)
pr.points = fortify(pr, region="id")
pr.df = join(pr.points, pr#data, by="id")
ggplot(muni.df) + aes(long,lat,group=group) + geom_path(color="blue") +
+ coord_equal()+ geom_path(data=pr.df, +
aes(x=long, y=lat, group=group),color="red", size=0.5)
This code plots a nice map with all the boundaries
For polygon filling by level I tried to read and then merge as suggested in http://tormodboe.wordpress.com/2011/02/22/g%C3%B8y-med-kart-2/
level=read.csv("levels.dat",header=T,sep=" ")
munlevel=merge(muni.df,level,by="CODINE")
but it gives an error
Error en fix.by(by.x, x) : 'by' must specify a uniquely valid column
I am not familiar with shapefiles, maybe I need to learn more on shp data attributes to find the right choice to merge both data sets. How can I merge data so I can plot the lines (municipal boundaries) and then fill it with levels?

[NB: This question was asked over a month ago so OP has probably found a different way to solve their problem. I stumbled upon it while working on this related question. This answer is included in hopes it will benefit someone else.]
This appears to be what OP is asking for...
... and was produced with the following code:
require("rgdal")
require("maptools")
require("ggplot2")
require("plyr")
# read temperature data
setwd("<location if your data file>")
temp.data <- read.csv(file = "levels.dat", header=TRUE, sep=" ", na.string="NA", dec=".", strip.white=TRUE)
temp.data$CODINE <- str_pad(temp.data$CODINE, width = 5, side = 'left', pad = '0')
# read municipality polygons
setwd("<location of your shapefile")
esp <- readOGR(dsn=".", layer="poligonos_municipio_etrs89")
muni <- subset(esp, esp$PROVINCIA == "46" | esp$PROVINCIA == "12" | esp$PROVINCIA == "3")
# fortify and merge: muni.df is used in ggplot
muni#data$id <- rownames(muni#data)
muni.df <- fortify(muni)
muni.df <- join(muni.df, muni#data, by="id")
muni.df <- merge(muni.df, temp.data, by.x="CODIGOINE", by.y="CODINE", all.x=T, a..ly=F)
# create the map layers
ggp <- ggplot(data=muni.df, aes(x=long, y=lat, group=group))
ggp <- ggp + geom_polygon(aes(fill=LEVEL)) # draw polygons
ggp <- ggp + geom_path(color="grey", linestyle=2) # draw boundaries
ggp <- ggp + coord_equal()
ggp <- ggp + scale_fill_gradient(low = "#ffffcc", high = "#ff4444",
space = "Lab", na.value = "grey50",
guide = "colourbar")
ggp <- ggp + labs(title="Temperature Levels: Comunitat Valenciana")
# render the map
print(ggp)
Explanation:
Shapefiles imported into R with readOGR(...) are of type SpacialDataFrame and have two main sections: a ploygon section which contains the coordinates of all the points on each polygon, and a data section which contains information about each polygon (so, one row per polygon). These can be referenced, e.g., using muni#polygons and muni#data. The utility function fortify(...) converts the polygon section to a data frame organized for plotting with ggplot. So the basic workflow is:
[1] Import temperature data file (temp.data)
[2] Import polygon shapefile of municipalities (muni)
[3] Convert muni polygons to a data frame for plotting (muni.df <- fortify(...))
[4] Join columns from muni#data to muni.df
[5] Join columns from temp.data to muni.df
[6] Make the plot
The joins must be done on common fields, and this is where most of the problems come in. Each polygon in the original shapefile has a unique ID attribute. Running fortify(...) on the shapefile creates a column, id, which is based on this. But there is no ID column in the data section. Instead, the polygon IDs are stored as row names. So first we must add an id column to muni#data as follows:
muni#data$id <- rownames(muni#data)
Now we have an id field in muni#data and a corresponding id field in muni.df, so we can do the join:
muni.df <- join(muni.df, muni#data, by="id")
To create the map we will need to set fill colors based on temperature level. To do that we need to join the LEVEL column from temp.data to muni.df. In temp.data there is a field CODINE which identifies the municipality. There is also, now, a corresponding field CODIGOINE in muni.df. But there's a problem: CODIGOINE is char(5), with leading zeros, whereas CODINE is integer which means leading zeros are missing (imported from Excel, perhaps?). So just joining on these two fields produces no matches. We must first convert CODINE into char(5) with leading zeros:
temp.data$CODINE <- str_pad(temp.data$CODINE, width = 5, side = 'left', pad = '0')
Now we can join temp.dat to muni.df based on the corresponding fields.
muni.df <- merge(muni.df, temp.data, by.x="CODIGOINE", by.y="CODINE", all.x=T, a..ly=F)
We use merge(...) instead of join(...) because the join fields have different names and join(...) requires them to have the same name. (Note, however that join(...) is faster and should be used if possible). So, finally, we have a data frame which contains all the information for plotting the polygons and the temperature LEVEL which can be used to establish the fill color for each polygon.
Some notes on OP's original code:
OP's first map (the green one at the top) identifies "30 distinct areas for our region...". I could find no shapefile identifying those areas. The municipality file identifies 543 municipalities, and I could see no way to group these into 30 areas. In addition, the temperature level file has 542 rows, one for each municipality (more or less).
OP was importing line shapefiles for municipality to draw the boundaries. You don't need that because geom_polygon(...) will draw (and fill) the polygons and geom_path(...) will draw the boundaries.

Related

Putting Values on a County Map in R

I am using an excel sheet for data. One column has FIPS numbers for GA counties and the other is labeled Count with numbers 1 - 5. I have made a map with these values using the following code:
library(usmap)
library(ggplot2)
library(rio)
carrierdata <- import("GA Info.xlsx")
plot_usmap( data = carrierdata, values = "Count", "counties", include = c("GA"), color="black") +
labs(title="Georgia")+
scale_fill_continuous(low = "#56B1F7", high = "#132B43", name="Count", label=scales::comma)+
theme(plot.background=element_rect(), legend.position="right")
I've included the picture of the map I get and a sample of the data I am using. Can anyone help me put the actual Count numbers on each county?
Thanks!
Data

The usmap package is a good source for county maps, but the data it contains is in the format of data frames of x, y co-ordinates of county outlines, whereas you need the numbers plotted in the center of the counties. The package doesn't seem to contain the center co-ordinates for each county.
Although it's a bit of a pain, it is worth converting the map into a formal sf data frame format to give better plotting options, including the calculation of the centroid for each county. First, we'll load the necessary packages, get the Georgia data and convert it to sf format:
library(usmap)
library(sf)
library(ggplot2)
d <- us_map("counties")
d <- d[d$abbr == "GA",]
GAc <- lapply(split(d, d$county), function(x) st_polygon(list(cbind(x$x, x$y))))
GA <- st_sfc(GAc, crs = usmap_crs()#projargs)
GA <- st_sf(data.frame(fips = unique(d$fips), county = names(GAc), geometry = GA))
Now, obviously I don't have your numeric data, so I'll have to make some up, equivalent to the data you are importing from Excel. I'll assume your own carrierdata has a column named "fips" and another called "values":
set.seed(69)
carrierdata <- data.frame(fips = GA$fips, values = sample(5, nrow(GA), TRUE))
So now we left_join our imported data to the GA county data:
GA <- dplyr::left_join(GA, carrierdata, by = "fips")
And we can calculate the center point for each county:
GA$centroids <- st_centroid(GA$geometry)
All that's left now is to plot the result:
ggplot(GA) +
geom_sf(aes(fill = values)) +
geom_sf_text(aes(label = values, geometry = centroids), colour = "white")

Shading a map with underlying data in Julia

I want to create a map of Germany where each state is shaded according to its gross domestic product. I know how to do this in R (and put the code below). Is there a possibility to do this in Julia in an equally simple way?
library(tidyverse)
library(ggplot2)
library(sf)
shpData = st_read("./geofile.shp")
GDPData <- read.delim("./stateGDP.csv", header=FALSE)
GDPData <- rename(GDPData,StateName=V1,GDP=V2)
GDPData %>%
left_join(shpData) ->mergedData
ggplot(mergedData) + geom_sf(data = mergedData, aes(fill = BIP,geometry=geometry)) + coord_sf(crs = st_crs(mergedData))-> pBIP1

You'd load the Shapefile and use Plots to plot it.
The ideomatic code is something like
using Plots, Shapefile, CSV
shp = Shapefile.shapes(Shapefile.Table("geofile.shp"))
GDPData = CSV.read("stateGDP.csv")
plot(shp, fill_z = GDPData.V2')
Note the ' which transposes the values to a column vector - this will tell Plots to apply the colors to individual polygons.

Wrong districts filled on state map plot

I have a shapfile of school districts in Texas and am trying to use ggplot2 to highlight 10 in particular. I've tinkered with it and gotten everything set up, but when I spot checked it I realized the 10 districts highlighted are not in fact the ones I want to be highlighted.
The shapefile can be downloaded from this link to the Texas Education Agency Public Open Data Site.
#install.packages(c("ggplot2", "rgdal"))
library(ggplot2)
library(rgdal)
#rm(list=ls())
#setwd("path")
# read shapefile
tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp")
# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")
# extract from shapefile data just the name and ID, then subset to only the districts of interest
dist_info <- data.frame(cbind(as.character(tex#data$NAME2), as.character(tex#data$FID)), stringsAsFactors=FALSE)
names(dist_info) <- c("name", "id")
dist_info <- dist_info[dist_info$name %in% districts, ]
# turn shapefile into df
tex_df <- fortify(tex)
# create dummy fill var for if the district is one to be highlighted
tex_df$yes <- as.factor(ifelse(tex_df$id %in% dist_info$id, 1, 0))
# plot the graph
ggplot(data=tex_df) +
geom_polygon(aes(x=long, y=lat, group=group, fill=yes), color="#CCCCCC") +
scale_fill_manual(values=cols) +
theme_void() +
theme(legend.position = "none")
As you'll see, when the plot gets created it looks like it's done exactly what I want. The problem is, those ten districts highlighted are not hte ones in the districts vector above. I've re-ran everything clean numerous times, double checked that I wasn't having a factor/character conversion issue, and double checked within the web data explorer that the IDs that I get from the shapefile are indeed the ones that should match with my list of names. I really have no idea where this issue could be coming from.
This is my first time working with shapefiles and rgdal so if I had to guess there's something simple about the structure that I don't understand and hopefully one of you can quickly point it out for me. Thanks!
Here's the output:

Alternative 1
With the fortify function add the argument region specifying "NAME2", the column id will include your district names then. Then create your dummy fill variable based on that column.
I am not familiar with Texas districts, but I assume the result is right.
tex <- tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))
# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")
# turn shapefile into df
tex_df <- fortify(tex, region = "NAME2")
# create dummy fill var for if the district is one to be highlighted
tex_df$yes <- as.factor(ifelse(tex_df$id %in% districts, 1, 0))
# plot the graph
ggplot(data=tex_df) +
geom_polygon(aes(x=long, y=lat, group=group, fill=yes), color="#CCCCCC") +
scale_fill_manual(values=cols) +
theme_void() +
theme(legend.position = "none")
Alternative 2
Without passing the argument region to fortify function. Addressing seeellayewhy's issue implementing previous alternative. We add two layers, no need to create dummy variable or merge any data frame.
tex <- tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))
# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")
# Subset the shape file into two
tex1 <- subset(tex, NAME2 %in% districts)
tex2 <- subset(tex, !(NAME2 %in% districts))
# Create two data frames
tex_df1 <- fortify(tex1)
tex_df2 <- fortify(tex2)
# Plot two geom_polygon layers, one for each data frame
ggplot() +
geom_polygon(data = tex_df1,
aes(x = long, y = lat, group = group, fill = "#CCCCCC"),
color = "#CCCCCC")+
geom_polygon(data = tex_df2,
aes(x = long, y = lat, group = group, fill ="#003082")) +
scale_fill_manual(values=cols) +
theme_void() +
theme(legend.position = "none")

When trying to implement #mpalanco's solution of adding the "region" argument to the fortify() function, I got an error that I could solve through numerous other stack posts (Error: isTRUE(gpclibPermitStatus()) is not TRUE). I also tried using broom::tidy() which is the non-deprecated euqivalent to fortify() and had the same error.
Ultimately, I ended up implementing #luchanocho's solution from here. I don't like the fact that it uses seq() to generate the ID because it's not necessarily preserving the proper order, but my case was simple enough that I could go through every district and confirm that the correct ones were highlighted.
My code is below. Output is the same as #mpalanco's answer. Since he obviously got the right result and used something that's not shaky the way the implemented solution is, I'm going to give him the answer assuming it works. The solution below can be considered a workaround if others experience the same error I got.
#install.packages(c("ggplot2", "rgdal"))
library(ggplot2)
library(rgdal)
#rm(list=ls())
#setwd("path")
# read shapefile
tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp")
# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")
# convert shapefile to a df
tex_df <- fortify(tex)
# generate temp df with IDs to merge back in
names_df <- data.frame(tex#data$NAME2)
names(names_df) <- "NAME2"
names_df$id <- seq(0, nrow(names_df)-1) # this is the part I felt was sketchy
final <- merge(tex_df, names_df, by="id")
# dummy out districts of interest
final$yes <- as.factor(ifelse(final$NAME2 %in% districts, 1, 0))
ggplot(data=final) +
geom_polygon(aes(x=long, y=lat, group=group, fill=yes), color="#CCCCCC") +
scale_fill_manual(values=cols) +
theme_void() +
theme(legend.position = "none")

Overlay multiple lines from data frame with index column onto existing plot

I have a dataframe with 3 columns, (Id, Lat, Long), you can construct a small section of this with the following data:
df <- data.frame(
Id=c(1,1,2,2,2,2,2,2,3,3,3,3,3,3),
Lat=c(58.12550, 58.17426, 58.46461, 58.45812, 58.45207, 58.44512, 58.43358, 58.42727, 57.77700, 57.76034, 57.73614, 57.72411, 57.70498, 57.68453),
Long=c(-5.098068, -5.314452, -4.914108, -4.899922, -4.887067, -4.873312, -4.852384, -4.840817, -5.666568, -5.648711, -5.617588, -5.594681, -5.557740, -5.509405))
The Id column is an index column. So all the rows with the same Id number have the coordinates for a single line. In my data frame this Id number varies from 1 through to 7696. So I have 7696 lines to plot.
Each Id number relates to an individual separate line of Lat and Long coordinates. What I want to do is overlay onto an existing plot all of these 7696 individual lines.
With the example data above this contains the Lat & Long coordinates for lines 1, 2, 3.
What is the best way to overlay all these lines onto an existing plot, I was thinking maybe some kind of loop?

Using ggplot2:
#dummy data
df <- data.frame(
Id=c(1,1,2,2,2,2,2,2,3,3,3,3,3,3),
Lat=c(58.12550, 58.17426, 58.46461, 58.45812, 58.45207, 58.44512, 58.43358, 58.42727, 57.77700, 57.76034, 57.73614, 57.72411, 57.70498, 57.68453),
Long=c(-5.098068, -5.314452, -4.914108, -4.899922, -4.887067, -4.873312, -4.852384, -4.840817, -5.666568, -5.648711, -5.617588, -5.594681, -5.557740, -5.509405))
library(ggplot2)
#plot
ggplot(data=df,aes(Lat,Long,colour=as.factor(Id))) +
geom_line()
Using base R:
#plot blank
with(df,plot(Lat,Long,type="n"))
#plot lines
for(i in unique(df$Id))
with(df[ df$Id==i,],lines(Lat,Long,col=i))

To be honest, I think that any approach to take is going to result in a very cluttered plot since you have so many Ids (unless their lines do not overlap much). Either way, I would probably use ggplot2 for this.
##
if( !("ggplot2" %in% installed.packages()[,1]) ){
install.packages("ggplot2",dependencies=TRUE)
}
library(ggplot2)
##
D <- data.frame(
Id=Id,
Lat=Lat,
Long=Long
)
##
ggplot(data=D,aes(x=Lat,y=Long,group=Id,color=Id))+
geom_point()+ ## you might want to omit geom_point() in your plot
geom_line()
##
The reason I used group=Id, color=Id in aes() rather than passing Id as a factor to aes() and just using color=Id is that you will end up with a legend containing 7000+ factor levels (the majority of which will not be visible in the plot area).

Spatial Plot in R : how to plot the polygon and color as per the data to be visualized

I have been trying to draw the county based Choropleth map in R for visualizing my dataset for the State of Arizona.
For plotting the thematic map using the polygon bases data for the county from arizona.edu (Spatial Library) and data is from az.gov
It have the following for plotting the COUNTY polygon-
library(maptools)
library(rgdal)
library(ggplot2)
library(plyr)
county <- readShapePoly(file.choose())
county#data$id <- rownames(county#data)
county.points <- fortify(county, coords="id")
county.df <- join(county.points, county#data, by="id")
ggplot(county.df) + aes(long,lat,group=group, fill="id") +
geom_polygon() +
geom_path(color="white") +
coord_equal() +
scale_fill_brewer("County Arizona")
This code is not giving me any error and also no output.
My Source of Shape file here
Data Source here

I can't speak to why your code is not generating output - there are too many possible reasons - but is this what you are trying to achieve?
Code
library(rgdal)
library(ggplot2)
library(plyr)
library(RColorBrewer)
setwd("< directory with all your files >")
map <- readOGR(dsn=".",layer="ALRIS_tigcounty")
marriages <- read.csv("marriages.2012.csv",header=T,skip=3)
marriages <- marriages[2:16,]
marriages$County <- tolower(gsub(" ","",marriages$County))
marriages$Total <- as.numeric(as.character(marriages$Total))
data <- data.frame(id=rownames(map#data), NAME=map#data$NAME, stringsAsFactors=F)
data <- merge(data,marriages,by.x="NAME",by.y="County",all.x=T)
map.df <- fortify(map)
map.df <- join(map.df,data, by="id")
ggplot(map.df, aes(x=long, y=lat, group=group))+
geom_polygon(aes(fill=Total))+
geom_path(colour="grey50")+
scale_fill_gradientn("2012 Marriages",
colours=rev(brewer.pal(8,"Spectral")),
trans="log",
breaks=c(100,300,1000,3000,10000))+
theme(axis.text=element_blank(),
axis.ticks=element_blank(),
axis.title=element_blank())+
coord_fixed()
Explanation
To generate a choropleth map, ultimately we need to associate polygons with your datum of interest (total marriages by county). This is a three step process: first we associate polygon ID with county name:
data <- data.frame(id=rownames(map#data), NAME=map#data$NAME, stringsAsFactors=F)
Then we associate county name with total marriages:
data <- merge(data,marriages,by.x="NAME",by.y="County",all.x=T)
Then we associate the result with the polygon coordinate data:
map.df <- join(map.df,data, by="id")
Your specific case has a lot of potential traps:
The link you provided was to a pdf - utterly useless. But poking around a bit revealed an Excel file with the same data. Even this file needs cleaning: the data has "," separators, which need to be turned off, and some of the cells have footnotes, which have to be removed. Finally, we have to save as a csv file.
Since we are matching on county name, the names have to match! In the shapefile attributes table, the county names are all lower case, and spaces have been removed (e.g., "Santa Cruz" is "santacruz". So we need to lowercase the county names and remove spaces:
marriages$County <- tolower(gsub(" ","",marriages$County))
The totals column comes in as a factor, which has to be converted to numeric:
marriages$Total <- as.numeric(as.character(marriages$Total))
Your actual data is highly skewed: maricopa county had 23,600 marriages, greenlee had 50. So using a linear color scale is not very informative. Consequently, we use a logarithmic scale:
scale_fill_gradientn("2012 Marriages",
colours=rev(brewer.pal(8,"Spectral")),
trans="log",
breaks=c(100,300,1000,3000,10000))+

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

R ggplot2 merge with shapefile and csv data to fill polygons - r

Related

Putting Values on a County Map in R

Shading a map with underlying data in Julia

Wrong districts filled on state map plot

Overlay multiple lines from data frame with index column onto existing plot

Spatial Plot in R : how to plot the polygon and color as per the data to be visualized

Categories

Resources