I have a dataset containing different states, zip codes, and claim counts each in separate columns. I am trying to create a plot to show the total claim count according to zip codes for the state of MA.
I used this to filter by MA:
MA_medicare <- medicare %>%
filter(medicare$NPPES.Provider.State == "MA")
I then used this to set the fips code for plot_usmap:
MA_medicare$NPPES.Provider.State <- fips(MA_medicare$NPPES.Provider.State)
setnames(MA_medicare, old=c("NPPES.Provider.State"), new=c("fips"))
And last tried to graph (not sure why this doesn't work):
plot_usmap(data = MA_medicare, values= c("Total.Claim.Count", "NPPES.Provider.Zip.Code"), include = c("MA")) + scale_fill_continuous(low= "white", high= "red") + theme(legend.position = "right")
Error: Aesthetics must be either length 1 or the same as the data (4350838): fill
I'm the developer of usmap. plot_usmap only accepts one column of values for plotting so you're probably looking for the following:
plot_usmap(data = MA_medicare, values = "Total.Claim.Count", include = c("MA"))
However, your data is by zip code, and currently usmap doesn't support zip code maps (only state and county level maps). It uses the FIPS column to assign colors to states/counties on the map. Since you defined the FIPS codes by state, you'll just get the entire state of Massachusetts filled in with one solid color.
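Since usmap colors regions by FIPS code, one workaround is to roll the zip-level counts up to counties and plot those. The sketch below assumes the claims have already been aggregated to one row per Massachusetts county; the county_claims data frame and its claim numbers are made up for illustration, and the zip-to-county lookup needed to build it from the real data is not shown.

```r
library(usmap)
library(ggplot2)

set.seed(1)
# Hypothetical county totals: Massachusetts county FIPS codes are 25001, 25003, ..., 25027
county_claims <- data.frame(
  fips = sprintf("25%03d", seq(1, 27, by = 2)),
  Total.Claim.Count = sample(1000:50000, size = 14),  # made-up numbers
  stringsAsFactors = FALSE
)

# One value column, one row per county, FIPS codes in a column named "fips"
p <- plot_usmap(
  regions = "counties",
  include = "MA",
  data = county_claims,
  values = "Total.Claim.Count"
) +
  scale_fill_continuous(low = "white", high = "red") +
  theme(legend.position = "right")

print(p)
```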
I have plotted a figure of the US states in R.
Here is the very simple code:
library(usmap)
library(ggplot2)
plot_usmap(region = 'states')
And here is the resulting figure:
[Figure: map of US states in R; the states are not colored]
Furthermore, I have a CSV file containing the names of the US states and a color value, equal to red if that state voted for Republicans or blue if the state voted for Democrats. These are the top rows of the CSV file:
State     Color
Alabama   #E81B23
Alaska    #E81B23
Arizona   #1405bd
Arkansas  #E81B23
How can I fill the states of my figure based on the colors in the CSV file?
To color the regions specified in the plot_usmap() function, you can provide your data via data= and then set the values= argument to the column in your data used for mapping the colors.
Here's an example with some randomly-generated data. The plot_usmap() is using a dataset that includes the 50 US states + the District of Columbia, so you'll want to make sure they are all in your dataset or you may get some NA labels.
library(usmap)
library(ggplot2)
set.seed(1234)
color_data <- data.frame(
state = c(state.name, "District of Columbia"),
the_colors = sample(c("A", "B"), size=51, replace=TRUE)
)
plot_usmap(
region = "states",
data = color_data,
values = "the_colors",
color="white"
) +
scale_fill_manual(values=c("#E81B23", "#1405bd"))
Note that I think the lines between the states look good in white, so color="white" fixes that. You may also notice that you typically don't specify the actual color in the dataframe - you can specify that via scale_fill_manual(values=...). In your case, you can use scale_fill_identity().
For your data, just make sure the "State" column in your dataset is renamed "state" and it should work.
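As a rough sketch of the scale_fill_identity() route, the code below builds a stand-in for the CSV, renames "State" to "state", and lets the hex codes in the "Color" column be used directly as fill colors. The csv_data data frame is hypothetical and its colors are assigned at random, so they are not real election results; in practice the data frame would come from reading the CSV file.

```r
library(usmap)
library(ggplot2)

set.seed(1234)
# Hypothetical stand-in for the CSV file: random colors, not real results
csv_data <- data.frame(
  State = c(state.name, "District of Columbia"),
  Color = sample(c("#E81B23", "#1405bd"), size = 51, replace = TRUE),
  stringsAsFactors = FALSE
)

# plot_usmap() looks for a column called "state" (or "fips")
names(csv_data)[names(csv_data) == "State"] <- "state"

# scale_fill_identity() uses the values in "Color" as the actual fill colors
p <- plot_usmap(
  regions = "states",
  data = csv_data,
  values = "Color",
  color = "white"
) +
  scale_fill_identity()

print(p)
```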
I am newish to R and very new to GIS plotting on R with sf and ggplot2 packages. I have a dataset "comuni" containing all communes in Italy (similar to counties) and one of all motorways in Italy called "only_motorway". I know that I can use certain regions as a cookie cutter and keep only the motorways that are contained within such regions using st_intersection() function. However, I would like to do the inverse where, given I have a shapefile of the A3 motorway, I would like to keep only those communes that are crossed by that specific motorway.
I've tried using st_intersection function in the following way:
only_motorway_A3 <- only_motorway %>%
filter(ref == "A3")
comuni_A3 <- st_intersection(only_motorway_A3,comuni)
ggplot() +
geom_sf(data = comuni_A3,
color = "black", size = 0.1, fill = "black") +
geom_sf(data = only_motorway_A3, size = 0.15, color = "green") +
coord_sf(crs = 32632) +
theme_void()
But the result is the picture below:
[Image: ggplot output]
i.e. both only_motorway_A3 and comuni_A3 have the same geometry column and they both plot the highway line. What I wanted to plot instead was the highway line (in green) from only_motorway_A3 and, all around it, the communes crossed by it (in black) from comuni_A3. I hope it is clear, and thank you in advance for your help!
Consider a sf::st_join() call, passing your polygons first and your line string object second, with the parameter left set to FALSE.
It will perform an inner (filtering) spatial join of the two objects. Only those polygons (the first argument) that intersect a motorway will be retained, and they keep their polygon geometry.
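A minimal sketch of that join on toy data, assuming nothing about the real files: the three squares stand in for the communes and the single line for the A3 motorway, and all object names here are made up for illustration.

```r
library(sf)

# Helper for the toy data: a unit square whose lower-left corner is (x0, 0)
make_square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Hypothetical communes: three squares side by side
comuni_toy <- st_sf(
  name = c("A", "B", "C"),
  geometry = st_sfc(make_square(0), make_square(1), make_square(2), crs = 32632)
)

# Hypothetical motorway: a line crossing squares A and B only
motorway_toy <- st_sf(
  ref = "A3",
  geometry = st_sfc(st_linestring(rbind(c(0.5, 0.5), c(1.5, 0.5))), crs = 32632)
)

# Inner spatial join: polygons first, lines second, left = FALSE
comuni_crossed <- st_join(comuni_toy, motorway_toy, left = FALSE)

comuni_crossed$name
```

Note that st_join() returns one row per matching pair, so a polygon crossed by several line features appears more than once; if the motorway is split into many segments, the duplicated polygons may need to be dropped afterwards.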
So I'm plotting a shape file (from the ONS) of Great Britain split into 11 regions with the hope of creating a choropleth map based on COVID-19 cases.
I join the covid data with the shape file so that I can work within 1 data frame, joining on the region name.
I've used the longitude and latitude fields of the shape file for the x and y values within the aesthetics.
covid <- data.frame(Name = c("Scotland","Eastern","West Midlands","Yorkshire and the Humber","East Midlands","London","South West","South East","North West","North East","Wales"),
Cases = c(20,50,45,30,25,75,100,5,60,35,80))
#'greatb' is the name of the shape file
join <- merge(greatb,covid,by=c("NAME","Name"),by.x=c("NAME"),by.y=c("Name"), all=TRUE)
ggplot()+
geom_polygon(data=join, aes(x=long, y=lat, group=group, fill=Cases))
However, it seems that once I do this I can't use a variable name to fill the regions of the map. I get confronted with the error message: object 'Cases' not found
I'm unsure why I get this message though, as 'covid$data' is clearly an object and therefore so is 'join$data'. Can anyone help me with this?
Despite having some experience with R, I am much less experienced using R for GIS-like tasks.
I have a shapefile of all communities within Germany and created a new object that only shows the borders of the 16 states of Germany.
gem <- readOGR("path/to/shapefile.shp") # reading shapefile
gemsf <- st_read("path/to/shapefile.shp") # reading shapefile as sf object
f00 <- gUnaryUnion(gem, id = gem@data$SN_L) # SN_L is the column of the various states - this line creates a new sp object with only the states instead of all communities
f002 <- sf::st_as_sf(f00, coords = c("x","y")) # turning the object into an sf object, so graphing with ggplot is easier
To check my work so far I plotted the base data (communities) using
gemsf %>%
ggplot(data = .,) + geom_sf( aes(fill = SN_L)) # fill by state
as well as plot(f002) which creates a plot of the 16 states, while the ggplot-code provides a nice map of Germany by community, with each state filled in a different color.
Now I'd like to overlay this with a second layer that indicates the borders of the states (so if you e.g. plot population density you can still distinguish states easily).
My attempt to do so, using the "standard procedure" of adding another layer,
ggplot() +
geom_sf(data = gemsf, aes(fill = SN_L)) + # fill by state
geom_sf(data = f002) # since the f002 data frame/sf object ONLY has a geometry column, there is no aes()
results in the following output: https://i.ibb.co/qk9zWRY/ggplot-map-layer.png
So how do I get to add a second layer that only provides the borders and does not cover the actual layer of interest below? In QGIS or ArcGIS, this is common procedure and not a problem, and I'd like to be able to recreate this in R, too.
Thank you very much for your help!
I found a solution which I want to share with everyone.
ggplot() +
geom_sf(data = gemsf_data, aes(fill = log(je_km2))) + # fill by log(je_km2)
geom_sf(data = f002, alpha = 0, color = "black") + # since the f002 data frame/sf object ONLY has a geometry column, there is no aes()
theme_minimal()
The trick was adding "alpha" not in the aes() part, but rather just as shown above.
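Another way to get the same effect, sketched here on toy data, is to set fill = NA on the border layer so that only the outlines are drawn. The communities_toy and states_toy objects and the density values are made up for illustration and stand in for gemsf_data and f002.

```r
library(sf)
library(ggplot2)

# Helper for the toy data: a rectangle from x0 to x1 with height 1
make_rect <- function(x0, x1) {
  st_polygon(list(rbind(c(x0, 0), c(x1, 0), c(x1, 1), c(x0, 1), c(x0, 0))))
}

# Hypothetical communities: four unit squares with made-up density values
communities_toy <- st_sf(
  density = c(10, 40, 25, 80),
  geometry = st_sfc(make_rect(0, 1), make_rect(1, 2), make_rect(2, 3), make_rect(3, 4))
)

# Hypothetical states: two rectangles, each covering two communities
states_toy <- st_sf(
  geometry = st_sfc(make_rect(0, 2), make_rect(2, 4))
)

p <- ggplot() +
  geom_sf(data = communities_toy, aes(fill = density)) +   # layer of interest
  geom_sf(data = states_toy, fill = NA, color = "black") + # borders only, no fill
  theme_minimal()

print(p)
```

Both fill = NA and alpha = 0 are fixed values set outside aes(), which is why they apply to the whole layer instead of being mapped from a column.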
We produce maps daily that show a calculated temperature level in 30 distinct areas of our region; each area is filled with a different colour depending on the level. These maps look like this: [image]
Now I want to switch map generation to R. I've downloaded provincial and municipal boundaries (you can find boundaries for whole Spain or here the subset for my region) and managed to plot them with ggplot2 following Hadley's example.
I can also produce an ascii file that contains two columns: identifier (CODINE) and daily level. You can download here.
This is my first script attempting to plot shapefiles with R and ggplot2 so there may be mistakes and for sure it can be improved, suggestions welcome. The following code (based on Hadley's previously mentioned) works for me:
require("rgdal")
require("maptools")
require("ggplot2")
require("plyr")
# Reading municipal boundaries
esp = readOGR(dsn=".", layer="lineas_limite_municipales_etrs89")
muni=subset(esp, esp$PROV1 == "46" | esp$PROV1 == "12" | esp$PROV1 == "3")
muni@data$id = rownames(muni@data)
muni.points = fortify(muni, region="id")
muni.df = join(muni.points, muni@data, by="id")
# Reading province boundaries
prov = readOGR(dsn=".", layer="poligonos_provincia_etrs89")
pr=subset(prov, prov$CODINE == "46" | prov$CODINE == "12" | prov$CODINE == "03" )
pr@data$id = rownames(pr@data)
pr.points = fortify(pr, region="id")
pr.df = join(pr.points, pr@data, by="id")
ggplot(muni.df) + aes(long,lat,group=group) + geom_path(color="blue") +
coord_equal() + geom_path(data=pr.df,
aes(x=long, y=lat, group=group), color="red", size=0.5)
This code plots a nice map with all the boundaries
For polygon filling by level I tried to read and then merge as suggested in http://tormodboe.wordpress.com/2011/02/22/g%C3%B8y-med-kart-2/
level=read.csv("levels.dat",header=T,sep=" ")
munlevel=merge(muni.df,level,by="CODINE")
but it gives an error
Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column
I am not familiar with shapefiles, maybe I need to learn more on shp data attributes to find the right choice to merge both data sets. How can I merge data so I can plot the lines (municipal boundaries) and then fill it with levels?
[NB: This question was asked over a month ago so OP has probably found a different way to solve their problem. I stumbled upon it while working on this related question. This answer is included in hopes it will benefit someone else.]
This appears to be what OP is asking for...
... and was produced with the following code:
require("rgdal")
require("maptools")
require("ggplot2")
require("plyr")
require("stringr") # provides str_pad()
# read temperature data
setwd("<location of your data file>")
temp.data <- read.csv(file = "levels.dat", header=TRUE, sep=" ", na.strings="NA", dec=".", strip.white=TRUE)
temp.data$CODINE <- str_pad(temp.data$CODINE, width = 5, side = 'left', pad = '0')
# read municipality polygons
setwd("<location of your shapefile>")
esp <- readOGR(dsn=".", layer="poligonos_municipio_etrs89")
muni <- subset(esp, esp$PROVINCIA == "46" | esp$PROVINCIA == "12" | esp$PROVINCIA == "3")
# fortify and merge: muni.df is used in ggplot
muni@data$id <- rownames(muni@data)
muni.df <- fortify(muni)
muni.df <- join(muni.df, muni@data, by="id")
muni.df <- merge(muni.df, temp.data, by.x="CODIGOINE", by.y="CODINE", all.x=T, all.y=F)
# create the map layers
ggp <- ggplot(data=muni.df, aes(x=long, y=lat, group=group))
ggp <- ggp + geom_polygon(aes(fill=LEVEL)) # draw polygons
ggp <- ggp + geom_path(color="grey", linetype=2) # draw boundaries
ggp <- ggp + coord_equal()
ggp <- ggp + scale_fill_gradient(low = "#ffffcc", high = "#ff4444",
space = "Lab", na.value = "grey50",
guide = "colourbar")
ggp <- ggp + labs(title="Temperature Levels: Comunitat Valenciana")
# render the map
print(ggp)
Explanation:
Shapefiles imported into R with readOGR(...) are Spatial*DataFrame objects (a SpatialPolygonsDataFrame for a polygon shapefile) and have two main sections: a polygon section which contains the coordinates of all the points on each polygon, and a data section which contains information about each polygon (so, one row per polygon). These can be referenced, e.g., using muni@polygons and muni@data. The utility function fortify(...) converts the polygon section to a data frame organized for plotting with ggplot. So the basic workflow is:
[1] Import temperature data file (temp.data)
[2] Import polygon shapefile of municipalities (muni)
[3] Convert muni polygons to a data frame for plotting (muni.df <- fortify(...))
[4] Join columns from muni@data to muni.df
[5] Join columns from temp.data to muni.df
[6] Make the plot
The joins must be done on common fields, and this is where most of the problems come in. Each polygon in the original shapefile has a unique ID attribute. Running fortify(...) on the shapefile creates a column, id, which is based on this. But there is no ID column in the data section. Instead, the polygon IDs are stored as row names. So first we must add an id column to muni@data as follows:
muni@data$id <- rownames(muni@data)
Now we have an id field in muni@data and a corresponding id field in muni.df, so we can do the join:
muni.df <- join(muni.df, muni@data, by="id")
To create the map we will need to set fill colors based on temperature level. To do that we need to join the LEVEL column from temp.data to muni.df. In temp.data there is a field CODINE which identifies the municipality. There is also, now, a corresponding field CODIGOINE in muni.df. But there's a problem: CODIGOINE is char(5), with leading zeros, whereas CODINE is integer which means leading zeros are missing (imported from Excel, perhaps?). So just joining on these two fields produces no matches. We must first convert CODINE into char(5) with leading zeros:
temp.data$CODINE <- str_pad(temp.data$CODINE, width = 5, side = 'left', pad = '0')
Now we can join temp.data to muni.df based on the corresponding fields.
muni.df <- merge(muni.df, temp.data, by.x="CODIGOINE", by.y="CODINE", all.x=T, all.y=F)
We use merge(...) instead of join(...) because the join fields have different names and join(...) requires them to have the same name. (Note, however that join(...) is faster and should be used if possible). So, finally, we have a data frame which contains all the information for plotting the polygons and the temperature LEVEL which can be used to establish the fill color for each polygon.
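The padding and merge steps can be tried in isolation on toy data. In this sketch the muni_toy and temp_toy data frames and their values are made up for illustration, and base R's formatC() is used for the zero-padding as a swapped-in alternative to str_pad().

```r
# Hypothetical stand-ins for muni.df and temp.data
muni_toy <- data.frame(
  CODIGOINE = c("03001", "12005", "46250"),  # character codes with leading zeros
  stringsAsFactors = FALSE
)
temp_toy <- data.frame(
  CODINE = c(3001L, 46250L),                 # integer codes, leading zeros lost
  LEVEL  = c(2L, 4L)
)

# Convert the integer codes to 5-character strings with leading zeros
temp_toy$CODINE <- formatC(temp_toy$CODINE, width = 5, flag = "0", format = "d")

# Join on differently named key columns; keep every row of muni_toy
merged <- merge(muni_toy, temp_toy,
                by.x = "CODIGOINE", by.y = "CODINE",
                all.x = TRUE, all.y = FALSE)

print(merged)
```

Municipalities with no matching temperature row keep their polygon data and get NA for LEVEL, which is what na.value in the fill scale then colors.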
Some notes on OP's original code:
OP's first map (the green one at the top) identifies "30 distinct areas for our region...". I could find no shapefile identifying those areas. The municipality file identifies 543 municipalities, and I could see no way to group these into 30 areas. In addition, the temperature level file has 542 rows, one for each municipality (more or less).
OP was importing line shapefiles for municipality to draw the boundaries. You don't need that because geom_polygon(...) will draw (and fill) the polygons and geom_path(...) will draw the boundaries.