I am trying to add state abbreviations to a US map generated using ggplot2 and having some difficulties with it. I believe the "fill = " option is causing it, but I am not sure.
Below I provide the code I am using. Initially, I generate the map the way I want except for the state names. Next, I try to overlay state abbreviations on the same map.
Unfortunately it is not working out for me. If I comment out "fill = " option from the first map, I can generate a map with state abbreviations. But that map does not show what I intend to show. I have tried several ways. I am just leaving one option in the code for the moment.
To add the state abbreviations, I am following some of the suggestions I have read in this forum. In particular, I am trying to follow the advice from a discussion titled "ggplot centered names on a map" dated February 25, 2012.
I would appreciate any help on how I can add/overlay the state abbreviations to the first map.
# Master US location data
states <- map_data("state")
# Read in the data
rate <- read.csv("~/R/HealthCare/Data/Test_data.csv")
names(rate) <- tolower(names(rate))
rate$numer <- as.factor(rate$numer)
rate$region <- tolower(rate$statename)
# Create data for US mapping
tomap <- merge(states, rate, sort = FALSE, by = "region")
tomap <- tomap[order(tomap$order), ]
## US Map
# 1. Target Map (w/o state abbr)
p <- qplot(long, lat, data = tomap,
group = group,
fill = numer,
geom = "polygon")
p + scale_fill_brewer(palette = "Greens",
guide = guide_legend(reverse = TRUE),
labels = c("1st cat", "2nd cat",
"3rd cat", "4th cat"))
# 2. Add State Abbreviations to Target Map
stannote <- aggregate(cbind(long, lat, group, numer) ~ stateabbr, data = tomap,
FUN=function(x)mean(range(x)))
q <- qplot(long, lat, data = tomap,
group = group,
#fill = numer,
fill = "red", #testing
geom = "polygon") +
geom_text(data=stannote, aes(long, lat, label = stateabbr), color = "blue", size=3) +
coord_map()
q
The sample data file looks like the following –
StateName,StateAbbr,Numer
Alabama,AL,0
Alaska,AK,0
Arizona,AZ,0
Arkansas,AR,0
California,CA,0
Colorado,CO,0
Connecticut,CT,0
Delaware,DE,0
District of Columbia,DC,1
Florida,FL,0
Georgia,GA,0
Hawaii,HI,0
Idaho,ID,1
Illinois,IL,0
Indiana,IN,0
Iowa,IA,1
Kansas,KS,0
Kentucky,KY,1
Louisiana,LA,1
Maine,ME,2
Maryland,MD,0
Massachusetts,MA,2
Michigan,MI,0
Minnesota,MN,1
Mississippi,MS,0
Missouri,MO,0
Montana,MT,0
Nebraska,NE,0
Nevada,NV,1
New Hampshire,NH,1
New Jersey,NJ,2
New Mexico,NM,1
New York,NY,3
North Carolina,NC,0
North Dakota,ND,1
Ohio,OH,0
Oklahoma,OK,0
Oregon,OR,2
Pennsylvania,PA,0
Rhode Island,RI,0
South Carolina,SC,0
South Dakota,SD,1
Tennessee,TN,0
Texas,TX,0
Utah,UT,1
Vermont,VT,2
Virginia,VA,0
Washington,WA,2
West Virginia,WV,0
Wisconsin,WI,0
Wyoming,WY,0
As often happens to me with R, it turns out the error message was telling you exactly what was happening (it just takes a while to figure it out). The numer variable in your second dataset, stannote, is continuous (check the structure with str(stannote) to see this), while numer in tomap is a factor, so the discrete fill scale rejects it. You can fix this by changing that variable to a factor. Watch out, though: using cbind inside aggregate coerces the factor to its integer codes, so numer in stannote runs from 1 to 4 instead of 0 to 3.
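The coercion is easy to check on a small made-up factor. This is only a minimal sketch: the vectors `numer` and `long` below are toy data, not the asker's file.

```r
# Toy factor with levels "0".."3" (a stand-in for rate$numer)
numer <- factor(c(0, 1, 2, 3, 0))
long  <- c(-86, -150, -111, -92, -119)

# cbind() drops the factor class and keeps the internal integer codes
m <- cbind(long, numer)
codes <- m[, "numer"]

# To get the original values back, convert via the labels instead
values <- as.numeric(as.character(numer))

print(codes)
print(values)
```

The codes are shifted by one relative to the original values, which is why Option 1 below relabels the levels explicitly.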
Option 1:
stannote$numer = factor(stannote$numer, labels = c(0, 1, 2, 3))
qplot(long, lat, data = tomap,
group = group,
fill = numer, #testing
geom = "polygon") +
geom_text(data=stannote, aes(long, lat, label = stateabbr),
color = "blue", size=3) + scale_fill_brewer(palette = "Greens")
Alternatively, you could remove the fill aesthetic that you set for the overall plot from the call to geom_text using fill = NULL. You don't actually need fill for the text, just for the polygons. This is a situation where if you were using ggplot instead of qplot you might just set the fill aesthetic for geom_polygon.
Option 2:
stannote$numer = as.numeric(stannote$numer)
qplot(long, lat, data = tomap,
group = group,
fill = numer, #testing
geom = "polygon") +
geom_text(data=stannote, aes(long, lat, label = stateabbr, fill = NULL),
color = "blue", size=3) + scale_fill_brewer(palette = "Greens")
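For the ggplot() route mentioned above, a rough sketch could look like the following. The data frames `tomap_toy` and `stannote_toy` are invented stand-ins (two square "states") so the snippet runs without the CSV; the point is only that fill is mapped inside geom_polygon(), so geom_text() never sees it.

```r
library(ggplot2)

# Toy stand-in for `tomap`: two square "states", four vertices each
tomap_toy <- data.frame(
  long      = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat       = c(0, 0, 1, 1,  0, 0, 1, 1),
  group     = rep(1:2, each = 4),
  numer     = factor(rep(c(0, 1), each = 4)),
  stateabbr = rep(c("AA", "BB"), each = 4)
)

# One label position per state: midpoint of the bounding box.
# numer is left out of cbind(), so it is never coerced.
stannote_toy <- aggregate(cbind(long, lat) ~ stateabbr, data = tomap_toy,
                          FUN = function(x) mean(range(x)))

p <- ggplot(tomap_toy, aes(long, lat)) +
  geom_polygon(aes(group = group, fill = numer)) +  # fill mapped only here
  geom_text(data = stannote_toy, aes(label = stateabbr),
            colour = "blue", size = 3) +
  scale_fill_brewer(palette = "Greens")
```

With the real data you would swap in tomap and stannote and add coord_map() if you want the projection.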
I am trying to make a map of the U.S. which shows two categorical variables, for example the income group of the state and the region the state belongs in. The idea is to use the "fill" aesthetic to show the income level of each state, and then the "color" aesthetic to show the outlines of each region. The information that I am trying to communicate is that lower-income and higher-income states are clustered in certain regions.
An alternative would be to somehow show the regional boundaries with a bolder or thicker boundary than the state boundaries, but I am also unsure how to do this. Other ideas which communicate the same information would also be welcome.
Ideally, it would be some combination of the following two plots:
## Create map data
state_map_data <- map_data("state")
state_regions <- tibble(state_name = tolower(state.name), state.region,
as_tibble(state.x77)) %>%
mutate(income_cat = cut(Income, breaks = 3,
labels = c("low", "medium", "high")))
state_map_data <- state_map_data %>%
left_join(state_regions,
by = c("region" = "state_name"))
## Map with just income
p1 <- ggplot() +
geom_polygon(data = state_map_data,
aes(x = long, y = lat, group = group,
fill = income_cat))
print(p1)
This generates the following map with income
## Map with just regions
p2 <- ggplot() +
geom_polygon(data = state_map_data,
aes(x = long, y = lat, group = group,
color = state.region))
print(p2)
This generates the following map with regions
## Map with both
p <- ggplot() +
geom_polygon(data = state_map_data,
aes(x = long, y = lat, group = group,
fill = income_cat)) +
geom_polygon(data = state_map_data,
aes(x = long, y = lat, group = group,
color = state.region))
print(p)
This does not produce the expected results of a map with both a color outline by region and filled states by income as seen here
The way you have your code you are drawing two sets of polygons, with state.region polygons on top of the income_cat polygons. Instead, you want to draw one set of polygons with the correct outline color and fill color:
ggplot() +
geom_polygon(data = state_map_data,
aes(x = long, y = lat, group = group,
fill = income_cat, color = state.region)
)
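If you would rather keep two separate layers, the reason the second one hides the first is that geom_polygon() draws with a default fill unless told otherwise. A minimal sketch, using an invented two-polygon data frame `toy` rather than the real state_map_data, sets fill = NA on the outline layer:

```r
library(ggplot2)

# Toy stand-in for state_map_data: two adjacent squares
toy <- data.frame(
  long         = c(0, 1, 1, 0,  1, 2, 2, 1),
  lat          = c(0, 0, 1, 1,  0, 0, 1, 1),
  group        = rep(1:2, each = 4),
  income_cat   = factor(rep(c("low", "high"), each = 4)),
  state.region = factor(rep(c("South", "West"), each = 4))
)

p <- ggplot(toy, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = income_cat)) +            # filled states
  geom_polygon(aes(color = state.region), fill = NA) # outlines only, no fill
```

Either approach should give filled states with region-coloured outlines; the single-layer version above is simpler when both variables live in the same data frame.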
I made a map in R and was wondering how to label the States Codes (variable which is in my dataset) appropriately. Using the simple geom_text or even geom_text_repel I get a lot of labels for each State (I can actually understand why), as I proceed to show:
Map
How can I solve it so each State gets 1 and only 1 text abbreviation (these State Codes are in my dataset as a variable under the name State Codes)? Thanks in advance.
Code below:
library(tidyverse)
library(maps)
library(wesanderson)
library(hrbrthemes)
ggplot(data = data,
mapping = aes(x = long,
y = lat,
group = group,
fill = black_percentage)) +
geom_polygon(col = "black") +
geom_text(aes(label = black_percentage)) +
theme_void() +
theme(legend.position = "bottom",
legend.title = element_blank(),
plot.title = element_text(hjust = 0.5, family = "Times", face = "bold"),
plot.subtitle = element_text(hjust = 0.5, family = "Times", face = "italic"),
plot.caption = element_text(family = "Times", face = "italic"),
legend.key.height = unit(0.85, "cm"),
legend.key.width = unit(0.85, "cm")) +
scale_fill_gradient(low = "#E6A0C4",
high = "#7294D4") +
labs(title = "Percentage of Black People, US States 2018",
subtitle = "Pink colors represent lower percentages. Light-blue colors represent higher percentages")
ggsave("failed_map.png") # saves the last plot; ggsave() is a separate call, not a layer
Can you provide the/some sample data?
One possible reason for multiple labels is that each state has multiple rows in the data, so ggplot thinks it needs to plot multiple labels. If you only need a single label, a solution is to create a separate summary dataset, which has only one row for each state/label. You then provide this summary data to geom_text() rather than the original data. Although not the problem in this instance, this is also a solution to the common problem of 'blurry' labels: when tens or hundreds of labels are printed on top of one another they appear blurry, but a single label appears fine.
Looking at your code and mapping aesthetics, it looks like geom_text() is inheriting the x and y aesthetics from the first ggplot() line. Therefore geom_text() will make a label for every value of x and y (long and lat) per state. This also explains why the labels all appear to follow the state borders.
I would suggest that you summarise each state to a single (x, y) coordinate (e.g. the middle of the state), and give this to geom_text(). Again, without some sample data it may be hard to explain, but something like:
# make the summary label dataframe
state_labels <- your_data %>%
group_by(state) %>%
summarise(
long = mean(long),
lat = mean(lat),
mean_black = mean(black_percentage)
)
# then we plot it
ggplot(data = data,
mapping = aes(x = long,
y = lat,
group = group,
fill = black_percentage)) +
geom_polygon(col = "black") +
geom_text(data = state_labels, aes(label = mean_black))
Because the x and y columns have the same names in your data and in the new state_labels summary (long and lat), geom_text() will 'inherit' (assume/use) the x and y aesthetics that you supplied in the first ggplot() call. This is convenient, but it can cause you grief when the two datasets do not share column names or when you want different aesthetics per layer. Here geom_text() also inherits group = group and fill = black_percentage, and state_labels contains neither column, so the call above will most likely stop with an "object not found" error even though geom_text() makes no use of fill. To disable aesthetic inheritance, pass inherit.aes = FALSE to the geom. In this instance it would look like this; note that geom_text() is now given its own x and y aesthetics.
ggplot(data = data,
mapping = aes(x = long,
y = lat,
group = group,
fill = black_percentage)) +
geom_polygon(col = "black") +
geom_text(data = state_labels, aes(x = long, y = lat, label = mean_black), inherit.aes = FALSE)
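Since no sample data was posted, here is a small self-contained sketch of the same idea on invented data. The data frame `map_toy` and its `state` column are assumptions made up for illustration (two square "states", four vertices each).

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the map data
map_toy <- data.frame(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(1:2, each = 4),
  state = rep(c("aa", "bb"), each = 4),
  black_percentage = rep(c(10, 30), each = 4)
)

# One row per state: mean vertex position plus the value to print
state_labels <- map_toy %>%
  group_by(state) %>%
  summarise(long = mean(long),
            lat = mean(lat),
            mean_black = mean(black_percentage))

p <- ggplot(map_toy, aes(long, lat, group = group, fill = black_percentage)) +
  geom_polygon(col = "black") +
  geom_text(data = state_labels,
            aes(x = long, y = lat, label = mean_black),
            inherit.aes = FALSE)

# Number of labels the text layer will draw
n_labels <- nrow(layer_data(p, 2))
```

The mean of the boundary vertices is only a rough centre; for oddly shaped states the midpoint of the range or a proper centroid may sit better.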
EDIT: If you want a single label but the label is not a numeric value, so you can't calculate a summary statistic using mean or similar, the same principles apply: create a summarised version of the data with a single pair of coordinates and a single label for each state, i.e. one row per state. There are many ways to do this, but my go-to would be something like dplyr::first or similar.
# make the summary label dataframe
state_labels <- your_data %>%
group_by(state) %>%
summarise(
long = mean(long),
lat = mean(lat),
my_label = first(`State Codes`)
)
# then we plot it
ggplot(data = data,
mapping = aes(x = long,
y = lat,
group = group,
fill = black_percentage)) +
geom_polygon(col = "black") +
geom_text(data = state_labels, aes(x = long, y = lat, label = my_label), inherit.aes = FALSE)
Finally, ggplot has several built-in functions to plot and map spatial data. It is a good idea to use these where possible, as it will make your life a lot easier. A great 3-part tutorial can be found here, and it even includes an example of exactly what you are trying to do.
This is a reproducible example of an issue I am facing. I am trying to create maps with ggplot2 in multiple stages.
Consider the data border, with the polygons of the US states at the Mexican border, and border.county, with the polygons of the counties in these states. The following code allows you to get the data:
library(maps)
library(ggmap)
library(ggplot2)
USA <- get_googlemap(center = 'usa', zoom = 4,
style = 'administrative|element:labels|visibility:off')
us.df <- map_data("state")
border <- subset(us.df,
region %in% c("california","arizona","new mexico","texas"))
counties <- map_data("county")
border.county <- subset(counties,
region %in% c("california","arizona","new mexico","texas"))
Now I want to create a map, with a Google Maps background, showing the state polygons and the county borders. If I do the following, it works neatly:
Allmap <- ggmap(USA) +
geom_polygon(aes(x = long, y = lat, fill = region, group = group),
data=border, color = "white") +
geom_polygon(aes(x = long, y = lat, group = group),
data=border.county, fill=NA, color="red")
Now if I wanted to create this map in multiple stages, I hit problems. I just want the county boundaries for background information (as sort of 'recurrent theme'), and I will create multiple maps with changing information at the state level. So I create the 'background map' with counties, which works fine:
Countmap <- ggmap(USA) +
geom_polygon(aes(x = long, y = lat, group = group),
data=border.county, fill=NA, color="red")
And now I try to combine it with the state maps:
Statmap <- ggmap(USA) +
geom_polygon(aes(x = long, y = lat, fill = region, group = group),
data=border, color = "white") +
Countmap
That gives me the error:
Error: Don't know how to add o to a plot
How can I solve this? I can combine the maps in the other way (as in: Statmap <- Countmap + geom_polygon(aes(x = long, y = lat, fill = region, group = group), data=border, color = "white")); however, that puts the counties under the state boundaries.
I also know this specific problem has the easy solution of just drawing a map with the states first and combining it with the counties in a second stage. However, in my real scenario that is not an option, because the recurrent theme of the map is something that needs to be drawn second: cities and geographic borders (like my county boundaries here).
This is the map I want to create:
If I understand your description correctly, you don't want to combine maps. You want to combine layers, specifically, to overlay the county outlines on changing state-level maps.
Try this:
# define county outlines as a geom_polygon layer
county.layer <- geom_polygon(aes(x = long, y = lat, group = group),
data = border.county, fill = NA, color = "red")
# add county.layer as the last layer to your state-level map
Statmap <- ggmap(USA) +
geom_polygon(aes(x = long, y = lat, fill = region, group = group),
data=border, color = "white") +
county.layer
Statmap
Edit in response to comment
If you have multiple county layers to plot, place them in a list:
border.county2 <- subset(counties, region %in% c("montana"))
layer2 <- list(geom_polygon(aes(x = long, y = lat, group = group),
data = border.county2, fill = NA, color = "blue"),
geom_polygon(aes(x = long, y = lat, group = group),
data = border.county, fill = NA, color = "red"))
Statmap <- ggmap(USA) +
geom_polygon(aes(x = long, y = lat, fill = region, group = group),
data=border, color = "white") +
layer2
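To show the reuse across several state-level maps without needing a Google Maps tile, here is a rough sketch on invented polygons. The data frames `toy_states` and `toy_counties` and the helper `make_map()` are hypothetical, introduced only for illustration; with the real data you would start from ggmap(USA) instead of ggplot().

```r
library(ggplot2)

# Toy stand-ins for `border` and `border.county`
toy_states <- data.frame(long = c(0, 2, 2, 0), lat = c(0, 0, 2, 2),
                         group = 1, region = "texas")
toy_counties <- data.frame(long  = c(0, 1, 1, 0,  1, 2, 2, 1),
                           lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
                           group = rep(1:2, each = 4))

# The recurring overlay, stored once as a layer (not as a plot)
county.layer <- geom_polygon(aes(x = long, y = lat, group = group),
                             data = toy_counties, fill = NA, color = "red")

# Hypothetical helper: any state layer first, county outlines always on top
make_map <- function(state_layer) {
  ggplot() + state_layer + county.layer
}

map_a <- make_map(geom_polygon(aes(long, lat, group = group, fill = region),
                               data = toy_states, color = "white"))
map_b <- make_map(geom_polygon(aes(long, lat, group = group),
                               data = toy_states, fill = "grey80"))
```

Because layers are drawn in the order they are added, the stored layer ends up above whatever state-level layer is passed in.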
I see some people using R to fill maps from shapefiles, e.g. here. With my R knowledge, I only read the shapefile and plot it.
With QGIS I added an extra column to the attribute table called "OCCURENCE". This column is the year when a specific event occurred. I'd like to fill each country according to the year of occurrence using a color scale, leaving the countries with no data without fill. The folder with the shapefiles is here. As an example, I added some years to a few countries, and I would like to obtain something like:
and a legend.
library(rgdal)
library(ggplot2)
World <- readOGR(dsn = "mundo", layer = "mundo")
class(World)
World2 <- fortify(World)
class(World2)
ggplot() +
geom_polygon(data = World2, aes(x = long, y = lat, group = group),
colour = "black", size = 0.5, fill = "white")
Any help??
The simple way, as @jazzurro said, is to work with the sf package. But you can also achieve this with the method you proposed by adding a couple of steps.
You need to add an id field to the SpatialPolygonsDataFrame and join its attributes to the output of fortify(). With this, you can fill polygons by any field:
library(rgdal)
library(ggplot2)
World <- readOGR(dsn = "mundo", layer = "mundo")
class(World)
World2 <- fortify(World)
class(World2)
World@data$id <- 0:(dim(World@data)[1]-1) # add id field
World2_join = plyr::join(x = World2,y = World@data, by="id") # join by id
ggplot() +
geom_polygon(data = World2_join, aes(x = long, y = lat, group = group, fill = OCCURENCE), # fill by OCCURENCE
colour = "black", size = 0.5)
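The join step can be tried out without the shapefile. The sketch below uses invented data frames (`verts` standing in for the fortify() output, `attrs` for the attribute table) and base merge() instead of plyr::join. The na.value = NA setting is my assumption for leaving countries with no year unfilled, as the question asks.

```r
library(ggplot2)

# Toy stand-in for fortify(World): vertices of three squares keyed by id
verts <- data.frame(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2,  4, 5, 5, 4),
  lat   = rep(c(0, 0, 1, 1), times = 3),
  order = 1:12,
  group = rep(c("0.1", "1.1", "2.1"), each = 4),
  id    = rep(c("0", "1", "2"), each = 4)
)

# Toy attribute table: the third country has no year recorded
attrs <- data.frame(id = c("0", "1", "2"),
                    OCCURENCE = c(1990, 2010, NA))

# Join attributes onto the vertices, then restore the drawing order
joined <- merge(verts, attrs, by = "id")
joined <- joined[order(joined$order), ]

p <- ggplot(joined, aes(long, lat, group = group, fill = OCCURENCE)) +
  geom_polygon(colour = "black") +
  scale_fill_gradient(low = "lightyellow", high = "darkred", na.value = NA)

built <- layer_data(p, 1)  # forces the plot to build
```

Restoring the vertex order after the join matters, since polygons drawn from shuffled rows come out scrambled.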
I am very new to working with spatial data in R, so I was hoping someone could point out where I am making a mistake.
The map works perfectly without the geom_text element. However, when I try to label the different regions on my map, the whole plot becomes black.
#First function loads spatial data, second loads arguments data
geo_data <- readOGR(dsn = file.choose(), layer = "CIV_adm01")
population_data <- read.csv(file.choose(), stringsAsFactors = FALSE) #load new attributes
population_data <- as.factor(population_data$phones)
# merge on common variable, here called 'NAME_1'
m <- merge(geo_data, population_data, by='NAME_1')
# saves shapefile in directory
shapefile(m, "path/merged.shp")
m_f <- fortify(m) #defines polygons, but loses attributes of the data
m$id <- row.names(m) #needed for join, extracts ID column
m_f <- left_join(m_f, m@data) # join the data
#change "fill" to variable you wish to analyze
map <- ggplot(m_f, aes(long, lat, group = group, fill = phones)) +
geom_polygon() +
coord_equal() +
geom_text(aes(x=long,y=lat,label=NAME_1), data = m_f, size=3, alpha = 0.3) +
labs(x = "", y = "",
fill = "Concentration of Phones") + #title of legend
ggtitle("Phones per region") #title of map
#Map Style 1: Blue color
map + scale_fill_gradient(high = "#132B43", low = "#56B1F7", space = "Lab",
na.value = "grey50", guide = "colourbar")
This is the result I get.