Labeling a US States map - r

I made a map in R and was wondering how to label it with the state codes (a variable in my dataset). With plain geom_text, or even geom_text_repel, I get many labels for each state (I can actually understand why), as shown here:
Map
How can I solve it so each State gets 1 and only 1 text abbreviation (these State Codes are in my dataset as a variable under the name State Codes)? Thanks in advance.
Code below:
library(tidyverse)
library(maps)
library(wesanderson)
library(hrbrthemes)
ggplot(data = data,
mapping = aes(x = long,
y = lat,
group = group,
fill = black_percentage)) +
geom_polygon(col = "black") +
geom_text(aes(label = black_percentage)) +
theme_void() +
theme(legend.position = "bottom",
legend.title = element_blank(),
plot.title = element_text(hjust = 0.5, family = "Times", face = "bold"),
plot.subtitle = element_text(hjust = 0.5, family = "Times", face = "italic"),
plot.caption = element_text(family = "Times", face = "italic"),
legend.key.height = unit(0.85, "cm"),
legend.key.width = unit(0.85, "cm")) +
scale_fill_gradient(low = "#E6A0C4",
high = "#7294D4") +
labs(title = "Percentage of Black People, US States 2018",
subtitle = "Pink colors represent lower percentages. Light-blue colors represent higher percentages")
# ggsave() is called on its own rather than added to the plot with `+`
ggsave("failed_map.png")

Can you provide the/some sample data?
One possible reason for multiple labels is that each state has multiple rows in the data, so ggplot thinks it needs to plot multiple labels. If you only need a single label, a solution is to create a separate summary dataset that has only one row for each state/label. You then provide this summary data to geom_text() rather than the original data. Although it is not the problem in this instance, the same approach fixes the common problem of 'blurry' labels: when tens or hundreds of labels are printed on top of one another they appear blurry, but a single label printed once appears fine.
Looking at your code and mapping aesthetics, it looks like geom_text() is inheriting the x and y aesthetics from the first ggplot() line. Therefore geom_text() will make a label for every value of x and y (long and lat) per state. This also explains why the labels all appear to follow the state borders.
I would suggest that you summarise each state to a single (x, y) coordinate (e.g. the middle of the state), and give this to geom_text(). Again, without some sample data it may be hard to explain, but something like:
# make the summary label dataframe
state_labels <- your_data %>%
group_by(state) %>%
summarise(
long = mean(long),
lat = mean(lat),
mean_black = mean(black_percentage)
)
# then we plot it
ggplot(data = data,
mapping = aes(x = long,
y = lat,
group = group,
fill = black_percentage)) +
geom_polygon(col = "black") +
geom_text(data = state_labels, aes(label = mean_black))
Because the x and y coordinates have the same names in your data and in the new state_labels summary (long and lat), geom_text() will 'inherit' (assume/use) the x and y aesthetics that you supplied in the first line of ggplot(). This is convenient, but it can cause grief when the two datasets do not share column names or when you want a layer to use different aesthetics. Inheritance applies to every aesthetic set in ggplot(), not just x and y: here geom_text() also inherits group = group and fill = black_percentage, and since state_labels has neither column, the call above is likely to stop with an 'object not found' error. To disable aesthetic inheritance, pass inherit.aes = FALSE to the geom. In this instance it would look like this; note how we now give geom_text() its own x and y aesthetics.
ggplot(data = data,
mapping = aes(x = long,
y = lat,
group = group,
fill = black_percentage)) +
geom_polygon(col = "black") +
geom_text(data = state_labels, aes(x = long, y = lat, label = mean_black), inherit.aes = FALSE)
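The summary step on its own can be sketched without any packages. This is only a minimal sketch on made-up data: toy_map, its values, and the helper name bbox_mid are all hypothetical. It uses the midpoint of each state's bounding box, mean(range(x)), rather than mean(x), on the assumption that a plain mean of border vertices gets pulled toward stretches of border that happen to have many points.

```r
# Toy polygon vertices standing in for real map data (hypothetical values).
# State "A" has an extra vertex near its upper-right corner.
toy_map <- data.frame(
  state = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  code  = c("AA", "AA", "AA", "AA", "AA", "BB", "BB", "BB", "BB"),
  long  = c(0, 4, 4, 0, 3.9, 10, 12, 12, 10),
  lat   = c(0, 0, 2, 2, 1.9, 5, 5, 9, 9)
)

# Midpoint of the bounding box along one axis.
bbox_mid <- function(x) mean(range(x))

# One row per state, carrying the label text (code) and a single coordinate.
state_labels <- aggregate(cbind(long, lat) ~ state + code,
                          data = toy_map, FUN = bbox_mid)

# One row per state: A at (2, 1), B at (11, 7).
print(state_labels)

# For comparison, the plain mean for state A is pulled toward the extra vertex.
mean_long_A <- mean(toy_map$long[toy_map$state == "A"])
```

The resulting state_labels data frame has the same shape as the dplyr version above, so it can be passed to geom_text() in the same way.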
EDIT If you want a single label, but the label is not a numeric value and you can't calculate a summary statistic using mean or similar, then the same principles apply: you want to create a summarised version of the data, with a single coordinate pair and a single label for each state, i.e. one row per state. There are many ways to do this, but my go-to would be something like dplyr::first or similar.
# make the summary label dataframe
state_labels <- your_data %>%
group_by(state) %>%
summarise(
long = mean(long),
lat = mean(lat),
my_label = first(`State Codes`)
)
# then we plot it
ggplot(data = data,
mapping = aes(x = long,
y = lat,
group = group,
fill = black_percentage)) +
geom_polygon(col = "black") +
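# inherit.aes = FALSE: state_labels has no group or black_percentage column
geom_text(data = state_labels, aes(x = long, y = lat, label = my_label), inherit.aes = FALSE)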
Finally, ggplot has several built-in functions to plot and map spatial data. It is a good idea to use these where possible, as it will make your life a lot easier. A great 3-part tutorial can be found here, and it even includes an example of exactly what you are trying to do.

Related

ggplot: Using factors to determine colour, shape and fill (issue with fill/color)

I have the following:
set.seed(100)
df <- data.frame(
lng = runif(n=20, min=5, max=10),
lat = runif(n=20, min=40, max=50),
year = rep(c("2001","2002","2003","2004"), each=5),
season = sample(c("spring", "autumn"), 10, replace = T),
info = sample(c("yes","no"), 10, replace = T)
)
Which can be plotted by:
ggplot() +
geom_point(data=df,
aes(x = lng,
y = lat,
color = year,
shape = season),
size=3)
To produce:
Great. But I want a red outline on the shapes where info == "yes".
The desired output would be:
Not made using actual data, just for demonstrative purpose. Made in powerpoint.
Admittedly it is similar to this question here, but not quite.
I am happy to split the df using a filter and then use two geom_point() layers, if that is easier.
Many thanks
Jim
Below is a quick solution (not the best one): use another scale, here size, and then use guides() to manually specify the shape that appears in the legend. You need to plot the bigger red shapes first and then plot the regular points over them, so that the red looks like an outline:
ggplot() +
geom_point(data=subset(df,info=="yes"),
aes(x=lng,y=lat,shape = season,size=info),col="red") +
scale_size_manual(values=3.6)+
geom_point(data=df,
aes(x = lng,
y = lat,
color = year,
shape = season),
size=3)+
guides(size = guide_legend(override.aes = list(shape = 1)))
You can change the legend for the shape by playing around with the options in guides().

Show only data labels for every N day and highlight a line for a specific variable in R ggplot

I'm trying to figure out two problems in R ggplot:
Show only data labels for every N day/data point
Highlight (make the line bigger and/or dotted) for a specific variable
My code is below:
ggplot(data = sales,
aes(x = dates, y = volume, colour = Country, size = ifelse(Country=="US", 1, 0.5), group = Country)) +
geom_line() +
geom_point() +
geom_text(data = sales, aes(label=volume), size=3, vjust = -0.5)
I can't find a way to space out the data labels: currently they are shown for every data point on every day, which makes the plot very hard to read.
As for #2, unfortunately, the size with ifelse doesn't work: the 'US' line becomes huge, and I can't change that size no matter what I specify in the first parameter of ifelse.
Would appreciate any help!
As no data was provided the solution is probably not perfect, but nonetheless shows you the general approach. Try this:
sales_plot <- sales %>%
# Create label
# e.g. assuming dates are in Date-Format labels are "only" created for even days
mutate(label = ifelse(lubridate::day(dates) %% 2 == 0, volume, ""))
ggplot(data = sales_plot,
# To adjust the size: Simply set labels. The actual size is set in scale_size_manual
aes(x = dates, y = volume, colour = Country, size = ifelse(Country == "US", "US", "other"), group = Country)) +
geom_line() +
geom_point() +
geom_text(aes(label = label), size = 3, vjust = -0.5) +
# Set the size according the labels
scale_size_manual(values = c(US = 2, other = .9))
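The same labelling idea can be generalised from "even days" to "every Nth data point per country". The following is only a sketch on made-up data (sales_toy and its values are hypothetical, as is the choice n = 3), and it assumes the rows are already sorted by date within each country.

```r
# Hypothetical stand-in for the sales data: 6 days for each of 2 countries.
sales_toy <- data.frame(
  Country = rep(c("US", "DE"), each = 6),
  dates   = rep(as.Date("2020-01-01") + 0:5, times = 2),
  volume  = c(10, 12, 11, 15, 14, 18, 5, 6, 7, 6, 8, 9)
)

n <- 3  # keep a label on every 3rd point

# Position of each row within its own country (1, 2, 3, ...).
idx <- ave(seq_len(nrow(sales_toy)), sales_toy$Country, FUN = seq_along)

# Label the 1st, (n + 1)th, (2n + 1)th, ... point; leave the rest blank.
sales_toy$label <- ifelse((idx - 1) %% n == 0,
                          as.character(sales_toy$volume), "")

print(sales_toy)
```

The label column can then be mapped in geom_text(aes(label = label)) exactly as in the answer above; blank strings simply draw nothing.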

ggplot2: Adding fill aesthetic to smooth geom inside stat_summary

I have what I think is a version of remove data points when using stat_summary to generate mean and confidence band or How to set multiple colours in a ggplot2 stat_summary plot? and may also relate to this bug report relating to the SE parameter https://github.com/tidyverse/ggplot2/issues/1546, but I can't seem to figure out what I am doing wrong.
I have weekly data and I am trying to plot current year, previous year, 5 year average, and 5 year range. I can get the plot and all the elements that I want, but I can't get the fill in the range to relate to my scale_fill command.
Here is the code I am using:
library(plyr)
require(dplyr)
require(tidyr)
library(ggplot2)
library(lubridate)
library(zoo)
library(viridis)
ggplot(df1,aes(week,value)) +
geom_point(data=subset(df1,year(date)==year(Sys.Date()) ),size=1.7,aes(colour="1"))+
geom_line(data=subset(df1,year(date)==year(Sys.Date()) ),size=1.7,aes(colour="1"))+
geom_line(data=subset(df1,year(date)==year(Sys.Date())-1 ),size=1.7,aes(colour="2"))+
geom_point(data=subset(df1,year(date)==year(Sys.Date())-1 ),size=1.7,aes(colour="2"))+
#stat_summary(data=subset(df1,year(date)<year(Sys.Date()) &year(date)>year(Sys.Date())-6),geom = 'smooth', alpha = 0.2,size=1.7,
# fun.data = median_hilow,aes(colour=c("1","2","3"),fill="range"))+
stat_summary(data=subset(df1,year(date)<year(Sys.Date()) &year(date)>year(Sys.Date())-6),geom="smooth",fun.y = mean, fun.ymin = min, fun.ymax = max,size=1.7,aes(colour="c",fill="b"))+
#stat_summary(fun.data=mean_cl_normal, geom='smooth', color='black')+
scale_color_viridis("",discrete=TRUE,option="C",labels=c(year(Sys.Date()), year(Sys.Date())-1,paste(year(Sys.Date())-6,"-",year(Sys.Date())-1,"\naverage",sep ="")))+
scale_fill_viridis("",discrete=TRUE,option="C",labels=paste(year(Sys.Date())-6,"-",year(Sys.Date())-1,"\nrange",sep =""))+
#scale_fill_continuous()+
scale_x_continuous(limits=c(min(df1$week),max(df1$week)),expand=c(0,0))+
theme_minimal()+theme(
legend.position = "bottom",
legend.margin=margin(c(0,0,0,0),unit="cm"),
legend.text = element_text(colour="black", size = 12),
plot.caption = element_text(size = 14, face = "italic"),
plot.title = element_text(face = "bold"),
plot.subtitle = element_text(size = 14, face = "italic"),
#panel.grid.minor = element_blank(),
text = element_text(size = 14,face = "bold"),
axis.text.y =element_text(size = 14,face = "bold", colour="black"),
axis.text.x=element_text(size = 14,face = "bold", colour="black",angle=90, hjust=1),
)+
labs(y="Crude Oil Imports \n(Weekly, Thousands of Barrels per Day)",x="Week",
title=paste("US Imports of Crude Oil",sep=""),
caption="Source: EIA API, graph by Andrew Leach.")
I have placed a test.Rdata file here with the df1 data frame: https://drive.google.com/file/d/1aMt4WQaOi1vFJcMlgXFY7dzF_kjbgBiU/view?usp=sharing
Ideally, I'd like to have a fill legend item that looks like this, only with the text as I have it in my graph:
Any help would be much appreciated.
The short answer is that you seem to be misunderstanding how ggplot's scale_xx_xx commands are meant to be used (this trips up a lot of people). Whenever possible, the intention is for the aesthetics (the aes() bit inside most geoms) to be mapped to the scale functions. For example, the following code maps year to line color:
plot.simple <- ggplot(data = df1, aes(x = week, y = value, color = as.factor(year(date)))) +
geom_line()
print(plot.simple)
Since we specified that year (converted to a factor) should be used to define line color, ggplot defaults to using scale_color_hue. We could use a different scale:
plot.gray <- ggplot(data = df1, aes(x = week, y = value, color = as.factor(year(date)))) +
geom_line() +
scale_color_grey()
print(plot.gray)
If we don't want to tie aesthetics such as color or fill to values in the data, we can just specify them outside of the call to aes(). Typically you only do this if you don't have multiple values for an aesthetic:
plot.simple <- ggplot(data = df1, aes(x = week, y = value, color = as.factor(year(date)))) +
geom_line(alpha = 0.2)
print(plot.simple)
But you're in the unenviable position of wanting both of these things at once. For your 2017 and 2018 lines, color is meaningful. For the summary ribbon and its associated line, color is just decorative. In such cases, I usually avoid ggplot's built-in summary functions, since they can often "help" in ways that end up confusing or cumbersome.
I would suggest creating two data sets, one containing the 2017 and 2018 years, and the other containing the summary statistics for the ribbon:
df.years <- df1 %>%
mutate(year = year(date)) %>%
filter(year >= year(Sys.Date()) - 1)
df.year.range <- df1 %>%
mutate(year = year(date)) %>%
filter(year >= year(Sys.Date()) - 5 & year <= year(Sys.Date()) - 1) %>% # the previous 5 full years
group_by(week) %>%
summarize(mean = mean(value), min = min(value), max = max(value))
We can then trick ggplot into printing a nice title for the fill on the legend, by setting fill inside aes to the intended string. Because fill is set in aes(), we control its color with scale_fill_manual.
the.plot <- ggplot() +
geom_ribbon(data = df.year.range, aes(x = week, ymin = min, ymax = max, fill = 'Previous 5 Year Range\nof Weekly Imports')) +
geom_line(data = df.year.range, aes(x = week, y = mean), color = 'purple') +
geom_line(data = df.years, aes(x = week, y = value, color = as.factor(year))) +
geom_point(data = filter(df.years, year == year(Sys.Date())), aes(x = week, y = value, color = as.factor(year))) +
scale_fill_manual(values = '#ffccff')
print(the.plot)
This is still rather cumbersome, because you have quite a few different elements tied to various different sources of data (lines for some years, points for others, a ribbon for a summary, etc). But it gets the job done!

Strange behavior on ggplot2

I'm trying to do a map to identify specific areas by coloring them. First, I made this plot to check if the data was ok (Setor is the sector's number):
ggplot(aes(x = long, y = lat, fill = Setor), data = mapa2010) + geom_polygon(colour = 'black') # data is ok
Then I tried to make the plot, filling by another variable (AGSN):
ggplot(aes(x = long, y = lat, fill = AGSN), data = mapa2010) + geom_polygon(colour = 'black')
The data is exactly the same, and there are no lines of code between these two commands. I've already tried reordering the data, but the plot is still wrong.
Does anyone know why this happens, and how to solve it?
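Adding group = group inside aes() for the second plot solves it. I don't know for sure why only the second map needs it; a likely reason is that, without an explicit group, ggplot2 groups rows by the discrete variables in the plot, so Setor (one value per polygon) happens to separate the polygons while AGSN (shared by several polygons) does not.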
ggplot(aes(x = long, y = lat, fill = AGSN, group = group), data = mapa2010[with(mapa2010, order(AGSN, id, piece, order)), ]) + geom_polygon(colour = 'black')
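A likely explanation for the difference between the two plots is ggplot2's default grouping: when no group aesthetic is given, rows are grouped by the combination of the discrete variables mapped in the plot. The sketch below uses made-up data (toy and its values are hypothetical) and only counts how many separate polygons each grouping choice would imply; it does not draw anything.

```r
# Three polygons of 4 vertices each; polygons s1 and s2 share one AGSN value.
toy <- data.frame(
  Setor = rep(c("s1", "s2", "s3"), each = 4),
  AGSN  = rep(c("a", "a", "b"), each = 4),
  group = rep(c("1.1", "2.1", "3.1"), each = 4)
)

# Grouping implied by fill = Setor alone: one group per polygon.
n_by_setor <- length(unique(toy$Setor))

# Grouping implied by fill = AGSN alone: s1 and s2 fall into the same group,
# so their vertices would be joined into a single polygon.
n_by_agsn <- length(unique(toy$AGSN))

# Adding group = group separates them again.
n_by_both <- nlevels(interaction(toy$AGSN, toy$group, drop = TRUE))

print(c(setor = n_by_setor, agsn = n_by_agsn, agsn_and_group = n_by_both))
```

If the real Setor column has one value per polygon while AGSN spans several polygons, this would explain why only the second map needs group = group.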

Adding State abbreviations to map generated using ggplot2

I am trying to add state abbreviations to a US map generated using ggplot2 and having some difficulties with it. I believe the "fill = " option is causing it, but I am not sure.
Below I provide the code I am using. Initially, I generate the map the way I want except for the state names. Next, I try to overlay state abbreviations on the same map.
Unfortunately it is not working out for me. If I comment out "fill = " option from the first map, I can generate a map with state abbreviations. But that map does not show what I intend to show. I have tried several ways. I am just leaving one option in the code for the moment.
To add the state abbreviations, I am following some of the suggestions I have read in this forum. In particular, I am trying to follow the advice from a discussion titled "ggplot centered names on a map" dated February 25, 2012.
I would appreciate any help on how I can add/overlay the state abbreviations to the first map.
# Master US location data
states <- map_data("state")
# Read in the data
rate <- read.csv("~/R/HealthCare/Data/Test_data.csv")
names(rate) <- tolower(names(rate))
rate$numer <- as.factor(rate$numer)
rate$region <- tolower(rate$statename)
# Create data for US mapping
tomap <- merge(states, rate, sort = FALSE, by = "region")
tomap <- tomap[order(tomap$order), ]
## US Map
# 1. Target Map (w/o state abbr)
p <- qplot(long, lat, data = tomap,
group = group,
fill = numer,
geom = "polygon")
p + scale_fill_brewer(palette = "Greens",
guide = guide_legend(reverse = TRUE),
labels = c("1st cat", "2nd cat",
"3rd cat", "4th cat"))
# 2. Add State Abbreviations to Target Map
stannote <- aggregate(cbind(long, lat, group, numer) ~ stateabbr, data = tomap,
FUN=function(x)mean(range(x)))
q <- qplot(long, lat, data = tomap,
group = group,
#fill = numer,
fill = "red", #testing
geom = "polygon") +
geom_text(data=stannote, aes(long, lat, label = stateabbr), color = "blue", size=3) +
coord_map()
q
The sample data file looks like the following –
StateName,StateAbbr,Numer
Alabama,AL,0
Alaska,AK,0
Arizona,AZ,0
Arkansas,AR,0
California,CA,0
Colorado,CO,0
Connecticut,CT,0
Delaware,DE,0
District of Columbia,DC,1
Florida,FL,0
Georgia,GA,0
Hawaii,HI,0
Idaho,ID,1
Illinois,IL,0
Indiana,IN,0
Iowa,IA,1
Kansas,KS,0
Kentucky,KY,1
Louisiana,LA,1
Maine,ME,2
Maryland,MD,0
Massachusetts,MA,2
Michigan,MI,0
Minnesota,MN,1
Mississippi,MS,0
Missouri,MO,0
Montana,MT,0
Nebraska,NE,0
Nevada,NV,1
New Hampshire,NH,1
New Jersey,NJ,2
New Mexico,NM,1
New York,NY,3
North Carolina,NC,0
North Dakota,ND,1
Ohio,OH,0
Oklahoma,OK,0
Oregon,OR,2
Pennsylvania,PA,0
Rhode Island,RI,0
South Carolina,SC,0
South Dakota,SD,1
Tennessee,TN,0
Texas,TX,0
Utah,UT,1
Vermont,VT,2
Virginia,VA,0
Washington,WA,2
West Virginia,WV,0
Wisconsin,WI,0
Wyoming,WY,0
As often happens to me with R, it turns out the error message was telling you exactly what was happening (it just takes a while to figure it out). Your numer variable in your second dataset stannote is continuous (check the structure with str(stannote) to see this). So you can just change that variable to a factor. Watch out, though: when you used cbind in aggregate I think you forced the factor to be turned into a numeric variable and so numer in stannote goes from 1-4 instead of 0-3.
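The shift from 0-3 to 1-4 can be reproduced in isolation. In this small sketch (the vector numer here is made-up example data), cbind() on a factor returns the underlying integer level codes rather than the original labels, and converting via as.character() recovers the original values.

```r
# A factor whose labels are 0..3, as in rate$numer after as.factor().
numer <- factor(c(0, 1, 2, 3, 0))

# cbind() drops the factor class and keeps the integer level codes.
codes <- cbind(numer)[, 1]

# Going through the labels recovers the original numbers.
restored <- as.numeric(as.character(numer))

print(codes)     # 1 2 3 4 1
print(restored)  # 0 1 2 3 0
```

This is why the labels argument in Option 1 below maps the codes back to 0-3.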
Option 1:
stannote$numer = factor(stannote$numer, labels = c(0, 1, 2, 3))
qplot(long, lat, data = tomap,
group = group,
fill = numer, #testing
geom = "polygon") +
geom_text(data=stannote, aes(long, lat, label = stateabbr),
color = "blue", size=3) + scale_fill_brewer(palette = "Greens")
Alternatively, you could remove the fill aesthetic that you set for the overall plot from the call to geom_text using fill = NULL. You don't actually need fill for the text, just for the polygons. This is a situation where if you were using ggplot instead of qplot you might just set the fill aesthetic for geom_polygon.
Option 2:
stannote$numer = as.numeric(stannote$numer)
qplot(long, lat, data = tomap,
group = group,
fill = numer, #testing
geom = "polygon") +
geom_text(data=stannote, aes(long, lat, label = stateabbr, fill = NULL),
color = "blue", size=3) + scale_fill_brewer(palette = "Greens")
