How do I fill certain counties on a US map in R?

I am trying to construct a map of the eastern US with the counties lying in Appalachia highlighted as a certain color, while non-Appalachian counties are left white. I have constructed a county map of the eastern US using the following code:
library(usmap)
library(maps)
library(ggplot2)
us.counties = map_data('county')
head(us.counties)
#> long lat group order region subregion
#> 1 -86.50517 32.34920 1 1 alabama autauga
#> 2 -86.53382 32.35493 1 2 alabama autauga
#> 3 -86.54527 32.36639 1 3 alabama autauga
#> 4 -86.55673 32.37785 1 4 alabama autauga
#> 5 -86.57966 32.38357 1 5 alabama autauga
#> 6 -86.59111 32.37785 1 6 alabama autauga
plot_usmap("counties",
include = c(.east_north_central, .east_south_central, .south_atlantic,
.south_region, .northeast_region),
exclude = c('TX', 'AR', 'LA', 'OK'))
This returned a map of the eastern US with the counties outlined.
I also have the following data frame appalachian.counties containing a list of all US counties in Appalachia by name and state they are in.
> head(appalachian.counties)
region subregion
1 alabama bibb
2 alabama blount
3 alabama calhoun
4 alabama chambers
5 alabama cherokee
6 alabama chilton
I would like to construct a map that looks like the blank map above, but with the Appalachian counties listed in the data frame appalachian.counties filled in blue and the Appalachian counties in Kentucky specifically filled in red. Is this possible?

You could try this:
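library(ggplot2)  # provides map_data() and the plotting functions
library(maps)     # supplies the county polygons that map_data() reads
us.counties <- map_data('county')
# Every county in a state that contains at least one Appalachian county
states <- us.counties[us.counties$region %in% appalachian.counties$region,]
# Match on state and county name together, because county names repeat across states
app <- us.counties[paste(us.counties$region, us.counties$subregion) %in%
  paste(appalachian.counties$region, appalachian.counties$subregion),]
ken <- app[app$region == "kentucky",]
# Layers are drawn in order: white background, blue Appalachia, red Kentucky on top
ggplot(states, aes(long, lat, group = group)) +
  geom_polygon(fill = "white", colour = "gray75") +
  geom_polygon(fill = "blue", data = app, colour = "white") +
  geom_polygon(fill = "red", data = ken, colour = "white") +
  coord_equal() +
  theme_void()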
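As an alternative sketch (not part of the answer above), each county can be tagged with a category and drawn in a single layer, with the colours assigned through scale_fill_manual(). This also produces a legend. The four-row appalachian.counties data frame below is a hypothetical stand-in for the real one, and the column name area is an assumption introduced only for illustration.

```r
library(ggplot2)  # map_data() also needs the maps package to be installed

# Hypothetical stand-in for appalachian.counties (only four rows)
appalachian.counties <- data.frame(
  region    = c("alabama", "alabama", "kentucky", "kentucky"),
  subregion = c("bibb", "blount", "pike", "floyd")
)

us.counties <- map_data("county")
states <- us.counties[us.counties$region %in% appalachian.counties$region, ]

# Build "state county" keys so that same-named counties in other states do not match
key     <- paste(states$region, states$subregion)
app_key <- paste(appalachian.counties$region, appalachian.counties$subregion)

# 'area' is an illustrative column name holding the fill category
states$area <- "Other"
states$area[key %in% app_key] <- "Appalachia"
states$area[key %in% app_key & states$region == "kentucky"] <- "Appalachian Kentucky"

p <- ggplot(states, aes(long, lat, group = group, fill = area)) +
  geom_polygon(colour = "gray75") +
  scale_fill_manual(values = c("Other" = "white",
                               "Appalachia" = "blue",
                               "Appalachian Kentucky" = "red")) +
  coord_equal() +
  theme_void()
p
```

With the full appalachian.counties data frame in place of the stand-in, the rest of the code should not need to change.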

Related

How do I use the usmap package to convert the state variable to the FIPS code?

I also want to change the state column to be in terms of the FIPS code. Just not sure what parameters to use and how to do this since I am new to R.
Here are the parameters given by R:
plot_usmap(regions = c("states", "state", "counties", "county"),
include = c(), data = data.frame(), values = "values",
theme = theme_map(), lines = "black", labels = FALSE,
label_color = "black")
It is unclear exactly what you are trying to achieve without an example, but here is how I was able to convert a column state in a data.frame from the abbreviation to the FIPS code:
> library(usmap)
> df <- statepop[1:5, -1]
> names(df)[1] <- 'state'
> df
# A tibble: 5 x 3
state full pop_2015
<chr> <chr> <dbl>
1 AL Alabama 4858979
2 AK Alaska 738432
3 AZ Arizona 6828065
4 AR Arkansas 2978204
5 CA California 39144818
> df$fips <- fips(df$state)
> df
# A tibble: 5 x 4
state full pop_2015 fips
<chr> <chr> <dbl> <chr>
1 AL Alabama 4858979 01
2 AK Alaska 738432 02
3 AZ Arizona 6828065 04
4 AR Arkansas 2978204 05
5 CA California 39144818 06
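The same idea as a self-contained sketch that does not rely on the statepop data set. The data frame df is a hypothetical example; the population figures are copied from the output above. Passing the result to plot_usmap() assumes that the function matches rows through a column named fips, which is how the data argument is described in the usmap documentation.

```r
library(usmap)

# Hypothetical input: any data frame with a column of state abbreviations
df <- data.frame(
  state    = c("AL", "AK", "AZ"),
  pop_2015 = c(4858979, 738432, 6828065)
)

# fips() accepts a vector of states and returns zero-padded character codes
df$fips <- fips(df$state)
df

# Full state names are accepted as well as abbreviations
fips("Alabama")

# The fips column lets plot_usmap() join the values to the map
p <- plot_usmap(regions = "states", data = df, values = "pop_2015")
p
```

Because the codes are character strings, leading zeros such as "01" are preserved; converting them to numbers would drop the zeros and break the join.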

Modify x axis label for each facet

I have this mosaic plot.
I'd like each facet to show x-axis labels only for the individuals that actually appear in that facet.
For example, the last facet contains only 7 bars, so I'd like to show x-axis labels for just those 7 bars.
I hope that is clear enough.
Here are my code and data:
p <- ggplot(data = newdata) +
  geom_mosaic(aes(weight = frequency, x = product(region), fill = factor(categ)), na.rm = TRUE) +
  facet_grid(~cutt) +
  theme(axis.text.x = element_text(angle = 90, hjust = .1)) +
  guides(fill = guide_legend(title = "Type of Crime", reverse = TRUE))
head(newdata)
region categ frequency median_income cutt vec
1 alabama burglary 0.25773 42917 39k-51k 0
2 alabama larceny 0.67646 42917 39k-51k 0
3 alabama motor_veichle_theft 0.06581 42917 39k-51k 0
4 arizona burglary 0.20239 50036 39k-51k 0
5 arizona larceny 0.71590 50036 39k-51k 0
6 arizona motor_veichle_theft 0.08171 50036 39k-51k 0

Sorting Y Axis Values ggplot

I'm trying to create a dotplot where countries are listed on my Y axis from A-Z top to bottom. The medal count will be the X axis for each of the four plots, one each for gold, silver, bronze, and total. Of course, ggplot prefers to plot countries from Z-A and despite reading all about the problem, I haven't resolved the issue. I appreciate any straightforward help on both the coding and comprehension fronts.
mdat <- melt(raw, value.name = "Count", variable.name = "Place", id.var = "Country")
mdat[, "Place"] <- factor(mdat[, "Place"], levels=c("Gold", "Silver", "Bronze", "Total"))
##I know my problem is likely on or around the above line ##
plot1 <- ggplot(mdat, aes(x = Count, y = Country, colour = Place)) +
geom_point() +
facet_grid(.~Place) + theme_bw()+
scale_colour_manual(values=c("#FFCC33", "#999999", "#CC6600", "#000000"))
print(plot1)
Algeria Gold 4
Argentina Gold 5
Armenia Gold 1
Algeria Silver 2
Argentina Silver 5
Armenia Silver 2
Algeria Bronze 4
Argentina Bronze 2
Armenia Bronze 0
You have to sort the levels of Country before you plot. Also, there is no Total level in the data you provided. The following approach should give you the desired result:
Reading the data (including a Total level for the Place variable):
mdat <- read.table(text="Country Place Count
Algeria Gold 4
Argentina Gold 5
Armenia Gold 1
Algeria Silver 2
Argentina Silver 5
Armenia Silver 2
Algeria Bronze 4
Argentina Bronze 2
Armenia Bronze 0
Algeria Total 10
Argentina Total 12
Armenia Total 3", header=TRUE)
Sorting the levels of the Country variable:
mdat$Country <- factor(mdat$Country,levels=sort(unique(mdat$Country),decreasing=TRUE))
Getting your Place variable in the correct order:
# Setting the levels explicitly fixes the facet order and the colour order without relabelling any rows
mdat$Place <- factor(mdat$Place, levels = c("Gold", "Silver", "Bronze", "Total"))
Creating the plot:
ggplot(mdat, aes(x = Count, y = Country, colour = Place)) +
geom_point(size=4) +
facet_grid(.~Place) + theme_bw()+
scale_colour_manual(values=c("#FFCC33","#999999","#CC6600","#000000"))
which gives the following plot:
As you have already melted your data, I suspect that there is no Total variable in the raw data frame. You can calculate it with:
raw$Total <- rowSums(raw[, c("Gold", "Silver", "Bronze")])  # assumes raw has columns named Gold, Silver and Bronze
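The steps can be put together in one sketch. The wide data frame raw is a hypothetical reconstruction from the counts shown in the question, and the reshaping is done in base R rather than with melt(). Instead of reversing the factor levels of Country, this version reverses the axis with scale_y_discrete(limits = rev), which assumes ggplot2 3.3.0 or later, where limits may be a function.

```r
library(ggplot2)

# Hypothetical wide table rebuilt from the counts in the question
raw <- data.frame(
  Country = c("Algeria", "Argentina", "Armenia"),
  Gold    = c(4, 5, 1),
  Silver  = c(2, 5, 2),
  Bronze  = c(4, 2, 0)
)

# Add the Total column before reshaping
raw$Total <- rowSums(raw[, c("Gold", "Silver", "Bronze")])

# Reshape to long format: one row per country and place
places <- c("Gold", "Silver", "Bronze", "Total")
mdat <- data.frame(
  Country = rep(raw$Country, times = length(places)),
  Place   = factor(rep(places, each = nrow(raw)), levels = places),
  Count   = unlist(raw[places], use.names = FALSE)
)

# A discrete y axis puts the first value at the bottom, so reversing the
# limits places the countries A-Z from top to bottom
p <- ggplot(mdat, aes(x = Count, y = Country, colour = Place)) +
  geom_point(size = 4) +
  facet_grid(. ~ Place) +
  scale_y_discrete(limits = rev) +
  scale_colour_manual(values = c("#FFCC33", "#999999", "#CC6600", "#000000")) +
  theme_bw()
p
```

Reversing the axis leaves the data untouched, which can be convenient if the same factor is reused in tables or other plots.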

Changing a continuous scale from decimal to percents

The scale for penetration is listed as a decimal (.5 and down), but I am having a problem changing it to a percent.
I tried to format it in my data as a percentage using this code
penetration_levels$Penetration<-sprintf("%.1f %%", 100*penetration_levels$Penetration)
which worked from a format sense, but when I tried to graph the plot I got an error saying penetration was used as a discrete, not continuous scale.
To fix that, I used this code to convert it back to a numeric variable
penetration_levels$Penetration<-as.numeric(as.character(penetration_levels$Penetration))
Which returned a bunch of NAs. Does anyone know any other method of how I can change it to a percent?
Here is the ggplot code I used to draw the map:
map <- ggplot(penetration_levels, aes(long, lat, group = region, fill = Penetration)) + geom_polygon() + coord_equal() + scale_fill_gradient2(low = "red", mid = "white", high = "green", midpoint = .25)
map <- map + geom_point(data=mydata, aes(x=long, y=lat,group=1,fill=0, size=Annualized.Opportunity), color="gray6") + scale_size(name="Total Annual Opportunity-Millions",range=c(2,4))
map <- map + theme(plot.title = element_text(size = 12,face="bold"))
map
Head of mydata and penetration
head(mydata)
Sold.To.Customer City State Annualized.Opportunity location lat long
21 10000110 NEW YORK NY 12.142579 NEW YORK,NY 40.71435 -74.00597
262 10016487 FORT LAUDERDALE FL 12.087310 FORT LAUDERDALE,FL 26.12244 -80.13732
349 11001422 ALLEN PARK MI 10.910575 ALLEN PARK,MI 42.25754 -83.21104
19 10000096 ALTON IL 10.040067 ALTON,IL 38.89060 -90.18428
477 11067228 BAY CITY TX 10.030829 BAY CITY,TX 28.98276 -95.96940
230 10014909 BETHPAGE NY 9.320271 BETHPAGE,NY 40.74427 -73.48207
head(penetration_levels)
State region long lat group order subregion state To From Total Penetration
17 AL alabama -87.46201 30.38968 1 1 <NA> AL 10794947 12537359 23332307 0.462661
18 AL alabama -87.48493 30.37249 1 2 <NA> AL 10794947 12537359 23332307 0.462661
22 AL alabama -87.52503 30.37249 1 3 <NA> AL 10794947 12537359 23332307 0.462661
36 AL alabama -87.53076 30.33239 1 4 <NA> AL 10794947 12537359 23332307 0.462661
37 AL alabama -87.57087 30.32665 1 5 <NA> AL 10794947 12537359 23332307 0.462661
65 AL alabama -87.58806 30.32665 1 6 <NA> AL 10794947 12537359 23332307 0.462661
I also just noticed a white strip in Washington that looks like a missing polygon. Does anyone know why that happens? I tried re-merging my data and ordering it again, but got the same result.
Any insight would be greatly appreciated.
You can load the scales package and use scale_fill_continuous(labels = percent). Here percent is a formatting function from scales that is passed to the labels argument. It is not well documented in the arguments section of the help text, but the examples section of the scale documentation shows it together with other convenient formatters from the scales package.
A small example:
library(ggplot2)
library(scales)  # provides percent()
df <- data.frame(long = 1:10, lat = 1:10,
  penetration = seq(from = 0.1, to = 1, by = 0.1))
# The data stay numeric; only the legend labels are formatted as percentages
ggplot(data = df, aes(x = long, y = lat, fill = penetration)) +
  geom_point(shape = 21, size = 6) +
  scale_fill_continuous(labels = percent)
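The sketch below illustrates why the sprintf() route from the question ends in NAs, and how the labels argument can be attached to the scale_fill_gradient2() call the question already uses. A plot keeps only one fill scale, so adding a separate scale_fill_continuous() would replace the red/white/green gradient. The data frame is the toy one from the example above, not the real penetration_levels.

```r
library(ggplot2)
library(scales)

x <- c(0.462661, 0.25)

# sprintf() returns character strings, so ggplot treats them as discrete
as_text <- sprintf("%.1f %%", 100 * x)
as_text

# The "%" sign makes the strings unparseable, hence the NAs
back <- suppressWarnings(as.numeric(as_text))
back

# Keep the column numeric and format only the legend labels
df <- data.frame(long = 1:10, lat = 1:10,
                 penetration = seq(from = 0.1, to = 1, by = 0.1))

p <- ggplot(df, aes(long, lat, fill = penetration)) +
  geom_point(shape = 21, size = 6) +
  scale_fill_gradient2(low = "red", mid = "white", high = "green",
                       midpoint = 0.25, labels = percent)
p
```

The general pattern is to leave the data as numbers and let the scale handle the presentation.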

Change colour scheme for ggplot geom_polygon in R

I'm creating a map using the maps library and ggplot's geom_polygon. I'd simply like to change the default blue, red, purple colour scheme to something else. I'm extremely new to ggplot so please forgive if I'm just not using the right data types. Here's what the data I'm using looks like:
> head(m)
region long lat group order subregion Group.1 debt.to.income.ratio.mean ratio total
17 alabama -87.46201 30.38968 1 1 <NA> alabama 12.4059 20.51282 39
18 alabama -87.48493 30.37249 1 2 <NA> alabama 12.4059 20.51282 39
19 alabama -87.52503 30.37249 1 3 <NA> alabama 12.4059 20.51282 39
20 alabama -87.53076 30.33239 1 4 <NA> alabama 12.4059 20.51282 39
21 alabama -87.57087 30.32665 1 5 <NA> alabama 12.4059 20.51282 39
22 alabama -87.58806 30.32665 1 6 <NA> alabama 12.4059 20.51282 39
> head(v)
Group.1 debt.to.income.ratio.mean ratio region total
alabama alabama 12.40590 20.51282 alabama 39
alaska alaska 11.05333 33.33333 alaska 6
arizona arizona 11.62867 25.55556 arizona 90
arkansas arkansas 11.90300 5.00000 arkansas 20
california california 11.00183 32.59587 california 678
colorado colorado 11.55424 30.43478 colorado 92
Here's the code:
library(ggplot2)
library(maps)
states <- map_data("state")
m <- merge(states, v, by="region")
m <- m[order(m$order),]
p<-qplot(long, lat, data=m, group=group, fill=ratio, geom="polygon")
I've tried the below and more:
cols <- c("8" = "red","4" = "blue","6" = "darkgreen", "10" = "orange")
p + scale_colour_manual(values = cols)
p + scale_colour_brewer(palette="Set1")
p + scale_color_manual(values=c("#CC6666", "#9999CC"))
The problem is that you are using a color scale but are using the fill aesthetic in the plot. You can use scale_fill_gradient() for two colors and scale_fill_gradient2() for three colors:
p + scale_fill_gradient(low = "pink", high = "green") #UGLY COLORS!!!
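A minimal sketch of the three-colour variant on a toy grid, so that it runs without the merged map data. The data frame df and the midpoint of 20 are assumptions chosen for illustration. The scale must be a fill scale because the plot maps ratio to fill; a colour scale would be ignored, since no colour aesthetic is mapped.

```r
library(ggplot2)

# Toy stand-in for the merged map data: a 5 x 5 grid with a continuous value
df <- expand.grid(x = 1:5, y = 1:5)
df$ratio <- seq(5, 35, length.out = nrow(df))

# Diverging scale: values below the midpoint shade towards 'low',
# values above it towards 'high'
p <- ggplot(df, aes(x, y, fill = ratio)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 20)
p
```

On the real data the same scale_fill_gradient2() call can be added to p from the question, with a midpoint chosen from the range of ratio.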
I was getting issues with scale_fill_brewer() complaining about a continuous variable supplied when a discrete variable was expected. One easy fix is to create discrete bins with cut() and then use that as the fill aesthetic:
m$breaks <- cut(m$ratio, 5) #Change to number of bins you want
p <- qplot(long, lat, data = m, group = group, fill = breaks, geom = "polygon")
p + scale_fill_brewer(palette = "Blues")
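The binning step can also be sketched with explicit break points, which gives round, readable legend labels instead of the equal-width intervals that cut(x, 5) computes. The values and breaks below are hypothetical. The last plot shows scale_fill_distiller(), which applies a ColorBrewer palette to a continuous fill without binning. ggplot() is used in place of qplot() because qplot() is deprecated in recent ggplot2 releases.

```r
library(ggplot2)

# Hypothetical ratios; the breaks are chosen by hand
df <- data.frame(x = 1:4, ratio = c(5, 12.4, 20.5, 33.3))
df$bins <- cut(df$ratio, breaks = c(0, 10, 20, 30, 40))
levels(df$bins)

# Discrete fill: one Brewer colour per bin
p_binned <- ggplot(df, aes(x, 1, fill = bins)) +
  geom_tile() +
  scale_fill_brewer(palette = "Blues")
p_binned

# Continuous fill: the distiller scale interpolates the same palette
p_continuous <- ggplot(df, aes(x, 1, fill = ratio)) +
  geom_tile() +
  scale_fill_distiller(palette = "Blues", direction = 1)
p_continuous
```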
