Changing a continuous scale from decimal to percents - r

The scale for penetration is listed as a decimal (.5 and down), but I am having a problem changing it to a percent.
I tried to format it in my data as a percentage using this code
penetration_levels$Penetration<-sprintf("%.1f %%", 100*penetration_levels$Penetration)
That worked for formatting, but when I tried to draw the plot I got an error saying Penetration was discrete, not continuous.
To fix that, I used this code to convert it back to a numeric variable:
penetration_levels$Penetration<-as.numeric(as.character(penetration_levels$Penetration))
Which returned a bunch of NAs. Does anyone know any other method of how I can change it to a percent?
Here is the code I used to map
ggplot code:
map <- ggplot(penetration_levels, aes(long, lat, group=region, fill=Penetration)) + geom_polygon() + coord_equal() + scale_fill_gradient2(low="red",mid="white",high="green",midpoint=.25)
map <- map + geom_point(data=mydata, aes(x=long, y=lat,group=1,fill=0, size=Annualized.Opportunity), color="gray6") + scale_size(name="Total Annual Opportunity-Millions",range=c(2,4))
map <- map + theme(plot.title = element_text(size = 12,face="bold"))
map
Head of mydata and penetration
head(mydata)
Sold.To.Customer City State Annualized.Opportunity location lat long
21 10000110 NEW YORK NY 12.142579 NEW YORK,NY 40.71435 -74.00597
262 10016487 FORT LAUDERDALE FL 12.087310 FORT LAUDERDALE,FL 26.12244 -80.13732
349 11001422 ALLEN PARK MI 10.910575 ALLEN PARK,MI 42.25754 -83.21104
19 10000096 ALTON IL 10.040067 ALTON,IL 38.89060 -90.18428
477 11067228 BAY CITY TX 10.030829 BAY CITY,TX 28.98276 -95.96940
230 10014909 BETHPAGE NY 9.320271 BETHPAGE,NY 40.74427 -73.48207
head(penetration_levels)
State region long lat group order subregion state To From Total Penetration
17 AL alabama -87.46201 30.38968 1 1 <NA> AL 10794947 12537359 23332307 0.462661
18 AL alabama -87.48493 30.37249 1 2 <NA> AL 10794947 12537359 23332307 0.462661
22 AL alabama -87.52503 30.37249 1 3 <NA> AL 10794947 12537359 23332307 0.462661
36 AL alabama -87.53076 30.33239 1 4 <NA> AL 10794947 12537359 23332307 0.462661
37 AL alabama -87.57087 30.32665 1 5 <NA> AL 10794947 12537359 23332307 0.462661
65 AL alabama -87.58806 30.32665 1 6 <NA> AL 10794947 12537359 23332307 0.462661
I also just noticed a white strip in Washington, as if a polygon were missing. Do you happen to know why that is? I tried to re-merge my data and order it again, but got the same result.
Any insight would be greatly appreciated.

You can load the scales package and use scale_fill_continuous(labels = percent). The percent formatter is not very well documented in the arguments section of the help text, but an example of this function, and of other convenient formatters from the scales package, can be found in the examples section here. The same labels argument also works in the scale_fill_gradient2() call from your code.
A small example:
library(ggplot2)
library(scales)
df <- data.frame(long = 1:10, lat = 1:10,
                 penetration = seq(from = 0.1, to = 1, by = 0.1))
# keep penetration numeric; only the legend labels are formatted as percents
ggplot(data = df, aes(x = long, y = lat, fill = penetration)) +
  geom_point(shape = 21, size = 6) +
  scale_fill_continuous(labels = percent)
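To see why the sprintf() route in the question ends in NAs, here is a minimal base-R sketch. The sample values and the helper name percent_label are made up for illustration; the helper is only a stand-in for the percent formatter from scales.

```r
# Made-up penetration values stored as proportions
penetration <- c(0.462661, 0.25, 0.5)

# sprintf() returns character strings, so the column is no longer numeric
as_text <- sprintf("%.1f %%", 100 * penetration)

# Converting the strings back fails because of the " %" suffix
round_trip <- suppressWarnings(as.numeric(as_text))

# Hypothetical label helper: format only the labels, leave the data numeric
percent_label <- function(x) sprintf("%.1f%%", 100 * x)
legend_labels <- percent_label(penetration)
```

A function like percent_label can be passed as labels = percent_label to a continuous fill scale, in the same way as percent above, so the data column never has to change type.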

Related

How do I fill certain counties on a US map in R?

I am trying to construct a map of the eastern US with the counties lying in Appalachia highlighted as a certain color, while non-Appalachian counties are left white. I have constructed a county map of the eastern US using the following code:
library(usmap)
library(maps)
library(ggplot2)
us.counties = map_data('county')
head(us.counties)
#> long lat group order region subregion
#> 1 -86.50517 32.34920 1 1 alabama autauga
#> 2 -86.53382 32.35493 1 2 alabama autauga
#> 3 -86.54527 32.36639 1 3 alabama autauga
#> 4 -86.55673 32.37785 1 4 alabama autauga
#> 5 -86.57966 32.38357 1 5 alabama autauga
#> 6 -86.59111 32.37785 1 6 alabama autauga
plot_usmap("counties",
include = c(.east_north_central, .east_south_central, .south_atlantic,
.south_region, .northeast_region),
exclude = c('TX', 'AR', 'LA', 'OK'))
Which returned this map showing the eastern US with counties outlined.
I also have the following data frame appalachian.counties containing a list of all US counties in Appalachia by name and state they are in.
> head(appalachian.counties)
region subregion
1 alabama bibb
2 alabama blount
3 alabama calhoun
4 alabama chambers
5 alabama cherokee
6 alabama chilton
I would like to construct a map that looks like the blank map included above, but with the Appalachian counties listed in the data frame appalachian.counties filled in blue and the Appalachian counties specifically in Kentucky filled in red. Is this possible?
You could try this:
library(usmap)
library(maps)
library(ggplot2)
us.counties <- map_data('county')
# every county in a state that has at least one Appalachian county
states <- us.counties[us.counties$region %in% appalachian.counties$region,]
# Appalachian counties, matched on state and county name together
app <- us.counties[paste(us.counties$region, us.counties$subregion) %in%
                   paste(appalachian.counties$region, appalachian.counties$subregion),]
# Appalachian counties in Kentucky
ken <- app[app$region == "kentucky",]
ggplot(states, aes(long, lat, group = group)) +
  geom_polygon(fill = "white", colour = "gray75") +
  geom_polygon(fill = "blue", data = app, colour = "white") +
  geom_polygon(fill = "red", data = ken, colour = "white") +
  coord_equal() +
  theme_void()
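The paste() step in the answer above matters because county names repeat across states. A small base-R sketch with toy data (the rows and the membership list are invented for illustration, not the real Appalachian list) shows the difference between matching on county name alone and matching on state plus county:

```r
# Toy county table: "cherokee" appears in two states
us <- data.frame(
  region    = c("alabama", "georgia", "kentucky", "kentucky"),
  subregion = c("cherokee", "cherokee", "bell", "fayette"),
  stringsAsFactors = FALSE
)

# Toy membership list (hypothetical, for illustration only)
appalachia <- data.frame(
  region    = c("alabama", "kentucky"),
  subregion = c("cherokee", "bell"),
  stringsAsFactors = FALSE
)

# Matching on county name alone also picks up the Georgia row
by_name_only <- us$subregion %in% appalachia$subregion

# Matching on a combined "state county" key keeps only the intended rows
by_state_and_name <- paste(us$region, us$subregion) %in%
  paste(appalachia$region, appalachia$subregion)
```

Here by_name_only flags three rows while by_state_and_name flags two, which is why the combined key is the safer filter.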

Drawing colored US State map with cut_number() in R

I have a dataframe called "drawdata":
GeoName Ranking
1 Alabama 15
2 Alaska 2
3 Arizona 28
4 Arkansas 12
5 California 19
6 Colorado 7
7 Connecticut 42
8 Delaware 37
9 District of Columbia 9
10 Florida 38
11 Georgia 11
12 Hawaii 48
13 Idaho 10
14 Illinois 16
15 Indiana 26
16 Iowa 34
17 Kansas 27
18 Kentucky 20
19 Louisiana 4
20 Maine 51
21 Maryland 30
22 Massachusetts 39
23 Michigan 14
24 Minnesota 23
25 Mississippi 41
26 Missouri 32
27 Montana 25
28 Nebraska 21
29 Nevada 45
30 New Hampshire 47
31 New Jersey 33
32 New Mexico 5
33 New York 44
34 North Carolina 13
35 North Dakota 31
36 Ohio 35
37 Oklahoma 6
38 Oregon 18
39 Pennsylvania 40
40 Rhode Island 49
41 South Carolina 29
42 South Dakota 46
43 Tennessee 43
44 Texas 3
45 Utah 17
46 Vermont 50
47 Virginia 8
48 Washington 24
49 West Virginia 22
50 Wisconsin 36
51 Wyoming 1
And I want to draw a US State map with different colors for each ranking. The code I have is:
names(drawdata) = c('region','value')
drawdata[,1] = tolower(drawdata[,1])
states = data.frame(state.center, state.abb)
states_map = map_data("state")
df = merge(drawdata, states_map, by = "region")
df$num = 49
p1 = ggplot(data = df, aes(x = long, y = lat, group = group))
p1 = p1 + geom_polygon(aes(fill = cut_number(value, num[1])))
p1 = p1 + geom_path(colour = 'gray', linetype = 2)
p1 = p1 + scale_fill_brewer('', palette = 'PuRd')
p1 = p1 + coord_map()
p1 = p1 + scale_x_continuous(breaks=NULL) + scale_y_continuous(breaks=NULL)
p1 = p1 + theme(legend.position="none")
p1 = p1 + geom_text(data = states, aes(x = x, y = y, label = state.abb, group = NULL), size = 2)
p1
This works perfectly if 'num', the number of colors to fill, is small. However, when I set 'num = 49', it produces an error:
Error in cut.default(x, breaks(x, "n", n), include.lowest = TRUE, ...) :
'breaks' are not unique
When I alter the code from
p1 = p1 + geom_polygon(aes(fill = cut_number(value, num[1])))
to
p1 = p1 + geom_polygon(aes(fill = cut_number(unique(value), num[1])))
then it gives me a different error:
Error: Aesthetics must either be length one, or the same length as the dataProblems:cut_number(unique(value), num[1])
I want a map where each of the 49 states has a different color reflecting its 'Ranking'. Any help is much appreciated!
Brewer palettes deliberately have small maximums (generally < 12) since it's pretty much impossible for humans to map the subtle differences to the discrete values you have. You can achieve what you're looking for by "faking" it with scale_fill_gradient2 (NOTE: I deliberately left the legend in as you should too):
library(ggplot2)
names(drawdata) <- c('region','value')
drawdata[,1] <- tolower(drawdata[,1])
states <- data.frame(state.center, state.abb)
states <- states[!(states$state.abb %in% c("AK", "HI")),] # they aren't part of states_map
states_map <- map_data("state")
p1 <- ggplot()
# borders
p1 <- p1 + geom_map(data=states_map, map=states_map,
aes(x=long, y=lat, map_id=region),
color="white", size=0.15)
# fills
p1 <- p1 + geom_map(data=drawdata, map=states_map,
aes(fill=value, map_id=region),
color="white", size=0.15)
# labels
p1 <- p1 + geom_text(data=states,
aes(x=x, y=y, label=state.abb, group=NULL), size=2)
# decent projection
p1 <- p1 + coord_map("albers", lat0=39, lat1=45)
p1 <- p1 + scale_fill_gradient2(low="#f7f4f9", mid="#df65b0", high="#67001f")
# better theme
p1 <- p1 + labs(x=NULL, y=NULL)
p1 <- p1 + theme_bw()
p1 <- p1 + theme(panel.grid=element_blank())
p1 <- p1 + theme(panel.border=element_blank())
p1 <- p1 + theme(axis.ticks=element_blank())
p1 <- p1 + theme(axis.text=element_blank())
p1
You can get an even better result with scale_fill_distiller, which does a lot behind the scenes to let you use a Color Brewer palette with continuous data (I'd argue you do not have continuous data, though):
p1 <- p1 + scale_fill_distiller(palette="PuRd")
I'd strongly suggest continuing to use cut like you had originally and having a max of 9 breaks to fit into the Color Brewer palette you're trying to work with. In reality, folks are still going to need a table to really grok the rankings (never assume Americans know either state shapes, locations or even the two-letter abbreviations for them), so I'd also pretty much just suggest using an actual table with full names at least with this choropleth if not in place of it.
Note also that the way you're trying to build the map deliberately excluded Alaska, Hawaii and the District of Columbia. You'll need to use a real shapefile and something like I cover here to get them to show up nicely.
If you want different colors for each state, using a gradient, you can work with scale_fill_gradient. Here is one version, using green and red at the ends of the gradient, so that each state is on that scale.
ggplot(data = df, aes(x = long, y = lat, group = group)) +
geom_polygon(aes(fill = value)) +
geom_path(colour = 'gray', linetype = 2) +
scale_fill_gradient(low = "green", high = "red") +
coord_map() +
scale_x_continuous(breaks=NULL) + scale_y_continuous(breaks=NULL) +
theme(legend.position="none") +
geom_text(data = states, aes(x = x, y = y, label = state.abb, group = NULL), size = 2)
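On the "'breaks' are not unique" error from the question: a likely cause (an interpretation, not something the answers state) is that the merged df holds one row per polygon vertex, so each ranking is repeated a different number of times and quantile-based breaks can coincide. The sketch below uses base R's quantile() and cut() as a stand-in for cut_number(), with made-up numbers.

```r
# Made-up data: three "states" with very different numbers of vertices
value <- rep(c(1, 2, 3), times = c(50, 3, 2))

# Quantile-based breaks for three bins; the heavy repetition of 1
# makes several quantiles land on the same value
breaks <- quantile(value, probs = seq(0, 1, length.out = 4))
has_duplicates <- anyDuplicated(breaks) > 0

# cut() refuses duplicated breaks; capture the message instead of stopping
result <- tryCatch(
  cut(value, breaks, include.lowest = TRUE),
  error = function(e) conditionMessage(e)
)

# One level per distinct value avoids quantiles altogether
fill_factor <- factor(value)
```

A factor with one level per state sidesteps the error, but it still needs as many colours as there are states, which is the limitation of the Brewer palettes described in the first answer.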

Maps, ggplot2, fill by state is missing certain areas on the map

I am working with maps and ggplot2 to visualize the number of certain crimes in each state for different years. The data set that I am working with was produced by the FBI and can be downloaded from their site or from here (if you don't want to download the dataset I don't blame you, but it is way too big to copy and paste into this question, and including a fraction of the data set wouldn't help, as there wouldn't be enough information to recreate the graph).
The problem is easier seen than described.
As you can see California is missing a large chunk as well as a few other states. Here is the code that produced this plot:
# load libraries
library(maps)
library(ggplot2)
# load data
fbi <- read.csv("http://www.hofroe.net/stat579/crimes-2012.csv")
fbi <- subset(fbi, state != "United States")
states <- map_data("state")
# merge data sets by region
fbi$region <- tolower(fbi$state)
fbimap <- merge(fbi, states, by="region")
# plot robbery numbers by state for year 2012
fbimap12 <- subset(fbimap, Year == 2012)
qplot(long, lat, geom="polygon", data=fbimap12,
facets=~Year, fill=Robbery, group=group)
This is what the states data looks like:
long lat group order region subregion
1 -87.46201 30.38968 1 1 alabama <NA>
2 -87.48493 30.37249 1 2 alabama <NA>
3 -87.52503 30.37249 1 3 alabama <NA>
4 -87.53076 30.33239 1 4 alabama <NA>
5 -87.57087 30.32665 1 5 alabama <NA>
6 -87.58806 30.32665 1 6 alabama <NA>
And this is what the fbi data looks like:
Year Population Violent Property Murder Forcible.Rape Robbery
1 1960 3266740 6097 33823 406 281 898
2 1961 3302000 5564 32541 427 252 630
3 1962 3358000 5283 35829 316 218 754
4 1963 3347000 6115 38521 340 192 828
5 1964 3407000 7260 46290 316 397 992
6 1965 3462000 6916 48215 395 367 992
Aggravated.Assault Burglary Larceny.Theft Vehicle.Theft abbr state region
1 4512 11626 19344 2853 AL Alabama alabama
2 4255 11205 18801 2535 AL Alabama alabama
3 3995 11722 21306 2801 AL Alabama alabama
4 4755 12614 22874 3033 AL Alabama alabama
5 5555 15898 26713 3679 AL Alabama alabama
6 5162 16398 28115 3702 AL Alabama alabama
I then merged the two sets along region. The subset I am trying to plot is
region Year Robbery long lat group
8283 alabama 2012 5020 -87.46201 30.38968 1
8284 alabama 2012 5020 -87.48493 30.37249 1
8285 alabama 2012 5020 -87.95475 30.24644 1
8286 alabama 2012 5020 -88.00632 30.24071 1
8287 alabama 2012 5020 -88.01778 30.25217 1
8288 alabama 2012 5020 -87.52503 30.37249 1
... ... ... ...
Any ideas on how I can create this plot without those ugly missing spots?
I played with your code. What I can tell is that something went wrong when you used merge. I drew the states map using geom_path and confirmed that there were a couple of odd lines which do not exist in the original map data. I then investigated further by comparing merge and inner_join. Both do the same job here, but I found one difference: with merge, the order column was no longer in the right sequence, whereas with inner_join it was. You can see this in the California rows below. Your approach was right, but merge did not work in your favour. I am not sure why the function changed the order, though.
library(dplyr)
### Call US map polygon
states <- map_data("state")
### Get crime data
fbi <- read.csv("http://www.hofroe.net/stat579/crimes-2012.csv")
fbi <- subset(fbi, state != "United States")
fbi$state <- tolower(fbi$state)
### Check whether both files use identical state names: the answer is NO
### states$region has no "alaska" or "hawaii", and no entry spelled "washington d. c."
### fbi$state has no entry spelled "district of columbia".
setdiff(fbi$state, states$region)
#[1] "alaska" "hawaii" "washington d. c."
setdiff(states$region, fbi$state)
#[1] "district of columbia"
### Select data for 2012 and choose two columns (i.e., state and Robbery)
fbi2 <- fbi %>%
filter(Year == 2012) %>%
select(state, Robbery)
Now I created two data frames with merge and inner_join.
### Create two data frames with merge and inner_join
ana <- merge(fbi2, states, by.x = "state", by.y = "region")
bob <- inner_join(fbi2, states, by = c("state" ="region"))
ana %>%
filter(state == "california") %>%
slice(1:5)
# state Robbery long lat group order subregion
#1 california 56521 -119.8685 38.90956 4 676 <NA>
#2 california 56521 -119.5706 38.69757 4 677 <NA>
#3 california 56521 -119.3299 38.53141 4 678 <NA>
#4 california 56521 -120.0060 42.00927 4 667 <NA>
#5 california 56521 -120.0060 41.20139 4 668 <NA>
bob %>%
filter(state == "california") %>%
slice(1:5)
# state Robbery long lat group order subregion
#1 california 56521 -120.0060 42.00927 4 667 <NA>
#2 california 56521 -120.0060 41.20139 4 668 <NA>
#3 california 56521 -120.0060 39.70024 4 669 <NA>
#4 california 56521 -119.9946 39.44241 4 670 <NA>
#5 california 56521 -120.0060 39.31636 4 671 <NA>
ggplot(data = bob, aes(x = long, y = lat, fill = Robbery, group = group)) +
geom_polygon()
The problem is in the order of arguments to merge
fbimap <- merge(fbi, states, by="region")
has the thematic data first and the geo data second. Switching the order with
fbimap <- merge(states, fbi, by="region")
the polygons should all close up.
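Whichever argument order is used, a defensive option is to re-sort the merged data by the order column before plotting, as the merge code in some of the other questions on this page does. A minimal base-R sketch with invented coordinates and values:

```r
# Toy polygon data: two regions, three vertices each, drawn in `order`
shape <- data.frame(
  region = rep(c("b", "a"), each = 3),
  order  = 1:6,
  long   = c(0, 1, 1, 2, 3, 3),
  lat    = c(0, 0, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Toy thematic data: one value per region
values <- data.frame(
  region  = c("a", "b"),
  Robbery = c(10, 20),
  stringsAsFactors = FALSE
)

# merge() sorts on the key and does not promise to keep vertex order
merged <- merge(values, shape, by = "region")

# Restore the drawing order explicitly before handing the data to ggplot
merged <- merged[order(merged$order), ]
```

After the re-sort, the vertices of each polygon are back in the sequence geom_polygon() expects, regardless of how merge() arranged the rows.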

Adding NAs to a continuous scale in ggplot2

I created this geographic chart in R and am having a problem showing the NA values (the black areas) in the legend.
Here is the code I used for mapping:
map<-ggplot(penetration_levels,aes(long,lat,group=region,fill=Penetration))+geom_polygon()+coord_equal()+scale_fill_gradient2(low="Red",mid="white",high="Green",midpoint=.33,na.value="Black",labels=percent)
map<-map+geom_point(data=mydata, aes(x=long, y=lat,group=1,fill=0, size=Annualized.Opportunity),color="gray6") + scale_size(name="Total Annual Opportunity-Millions",range=c(1,6))
map<-map+borders("state", colour="black", alpha=0.8)
map<-map+theme(plot.title = element_text(size = 12))
map<-map+theme_bw()+theme(plot.background = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.border = element_blank())
map
I tried a few things using map + legend() and different ways of adding the NAs as an aesthetic, but I was having difficulties, the primary problem being user error.
Basically, what I am looking to do is add a legend entry under the existing one that has a filled box and says "No info available".
merge code:
states<-merge(states,statelookup,by="region",all.x=T)
states<-states[order(states$order),]
penetration_levels<-merge(states,penetration_levels,by="State",all.x=T)
penetration_levels<-penetration_levels[order(penetration_levels$order),]
heads of Variables:
head(penetration_levels)
State region long lat group order subregion state From To Total Penetration
23 AL alabama -87.46201 30.38968 1 1 <NA> AL 3104873 2691875 5796748 0.4643768
24 AL alabama -87.48493 30.37249 1 2 <NA> AL 3104873 2691875 5796748 0.4643768
53 AL alabama -87.52503 30.37249 1 3 <NA> AL 3104873 2691875 5796748 0.4643768
54 AL alabama -87.53076 30.33239 1 4 <NA> AL 3104873 2691875 5796748 0.4643768
55 AL alabama -87.57087 30.32665 1 5 <NA> AL 3104873 2691875 5796748 0.4643768
56 AL alabama -87.58806 30.32665 1 6 <NA> AL 3104873 2691875 5796748 0.4643768
head(mydata)
Sold.To.Customer City State From.To Annualized.Opportunity location lat long
16426 10000110 NEW YORK NY FROM 13.39604 NEW YORK,NY 40.71435 -74.00597
117702 10016487 INDEPENDENCE OH FROM 12.99607 INDEPENDENCE,OH 41.36866 -81.63790
165397 11001422 DETROIT MI FROM 11.37319 DETROIT,MI 42.33143 -83.04575
13322 10000096 SAINT LOUIS MO FROM 10.79246 SAINT LOUIS,MO 38.62700 -90.19940
224992 11067228 HOUSTON TX FROM 10.69957 HOUSTON,TX 29.76019 -95.36939
101902 10014909 MANHASSET NY FROM 10.59856 MANHASSET,NY 40.79788 -73.69957

Change colour scheme for ggplot geom_polygon in R

I'm creating a map using the maps library and ggplot's geom_polygon. I'd simply like to change the default blue, red, purple colour scheme to something else. I'm extremely new to ggplot so please forgive if I'm just not using the right data types. Here's what the data I'm using looks like:
> head(m)
region long lat group order subregion Group.1 debt.to.income.ratio.mean ratio total
17 alabama -87.46201 30.38968 1 1 <NA> alabama 12.4059 20.51282 39
18 alabama -87.48493 30.37249 1 2 <NA> alabama 12.4059 20.51282 39
19 alabama -87.52503 30.37249 1 3 <NA> alabama 12.4059 20.51282 39
20 alabama -87.53076 30.33239 1 4 <NA> alabama 12.4059 20.51282 39
21 alabama -87.57087 30.32665 1 5 <NA> alabama 12.4059 20.51282 39
22 alabama -87.58806 30.32665 1 6 <NA> alabama 12.4059 20.51282 39
> head(v)
Group.1 debt.to.income.ratio.mean ratio region total
alabama alabama 12.40590 20.51282 alabama 39
alaska alaska 11.05333 33.33333 alaska 6
arizona arizona 11.62867 25.55556 arizona 90
arkansas arkansas 11.90300 5.00000 arkansas 20
california california 11.00183 32.59587 california 678
colorado colorado 11.55424 30.43478 colorado 92
Here's the code:
library(ggplot2)
library(maps)
states <- map_data("state")
m <- merge(states, v, by="region")
m <- m[order(m$order),]
p<-qplot(long, lat, data=m, group=group, fill=ratio, geom="polygon")
I've tried the below and more:
cols <- c("8" = "red","4" = "blue","6" = "darkgreen", "10" = "orange")
p + scale_colour_manual(values = cols)
p + scale_colour_brewer(palette="Set1")
p + scale_color_manual(values=c("#CC6666", "#9999CC"))
The problem is that you are using a color scale but are using the fill aesthetic in the plot. You can use scale_fill_gradient() for two colors and scale_fill_gradient2() for three colors:
p + scale_fill_gradient(low = "pink", high = "green") #UGLY COLORS!!!
I was getting issues with scale_fill_brewer() complaining about a continuous variable supplied when a discrete variable was expected. One easy fix is to create discrete bins with cut() and then use that as the fill aesthetic:
m$breaks <- cut(m$ratio, 5) #Change to number of bins you want
p <- qplot(long, lat, data = m, group = group, fill = breaks, geom = "polygon")
p + scale_fill_brewer(palette = "Blues")
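The binning step can be tried on its own before plotting. This base-R sketch uses made-up ratio values and compares the equal-width bins from cut(x, 5) with quantile-based bins, which is one possible alternative when the values are skewed:

```r
# Made-up ratios, loosely in the range of the data shown above
ratio <- c(5.0, 20.5, 25.6, 30.4, 32.6, 33.3)

# Equal-width bins: five intervals spanning the range of the data
bins <- cut(ratio, 5)

# Quantile-based bins: roughly equal counts per bin;
# unique() guards against repeated break points
quantile_breaks <- unique(quantile(ratio, probs = seq(0, 1, by = 0.25)))
quantile_bins <- cut(ratio, quantile_breaks, include.lowest = TRUE)
```

Either factor can be stored as a column and mapped to fill, after which a discrete scale such as scale_fill_brewer() applies.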

Resources