I'm trying to create a choropleth map that shows state population and also labels the capital cities. I initially had two data frames but was not able to add ggplot 1 to ggplot 2, so I combined the two data frames; part of the table looks like this:
Basically, I'm trying to combine these two maps into one (images not shown).
I've written
ggplot(spr, aes(long, lat)) + borders("state") + geom_point() +
coord_quickmap() +geom_label_repel(aes(label = city), size = 2) +
geom_polygon(aes(long, lat, group = capital, fill = pcls),color = "grey") +
coord_map("bonne", parameters=45) +ggthemes::theme_map() +
scale_fill_brewer(palette = "Reds")
but the map looks off:
I think the polygon part is what is throwing me off, but I'm not sure what to do about it.
You'll need shapefiles, or at least known borders to map the data onto.
In keeping with your question from the other day, you can still use the "state" map data. scale_fill_brewer is designed for use with discrete variables, so for a continuous variable use scale_fill_gradientn and pass it the colors from brewer.pal. Add the capitals layer in there as desired.
library(ggplot2)
library(usmap)
library(maps)
library(ggrepel)
library(ggthemes)
library(RColorBrewer) # provides brewer.pal(), used in scale_fill_gradientn below
us <- map_data("state") # get the data to plot and map data to
data(statepop)
pops <- statepop
pops$full <- tolower(pops$full)
ggplot() + geom_map(data = us, map = us, aes(long, lat, map_id = region), fill = "#ffffff", color = "#ffffff", size = 0.15) +
geom_map(data = pops, map = us, aes(fill = pop_2015, map_id = full), size = 0.15) +
coord_map("bonne", parameters=45) +
scale_fill_gradientn(colors = brewer.pal(9, "Reds")) + #adjust the number as necessary
borders("state") +
ggthemes::theme_map()
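As a side note on the discrete-versus-continuous point: if you would rather keep scale_fill_brewer, one option is to bin the continuous values into classes first. This is only a minimal sketch with made-up numbers; pops_demo, pop_class, the state names and the break points are all illustrative assumptions, not the question's data.

```r
library(ggplot2)

# Hypothetical population figures, for illustration only
pops_demo <- data.frame(
  state = c("state_a", "state_b", "state_c", "state_d"),
  pop   = c(5e5, 3e6, 8e6, 2.5e7)
)

# cut() turns the continuous column into a factor of classes,
# which is what scale_fill_brewer expects
pops_demo$pop_class <- cut(
  pops_demo$pop,
  breaks = c(0, 1e6, 5e6, 1e7, Inf),
  labels = c("< 1M", "1M-5M", "5M-10M", "> 10M")
)

# Bars are used here only to keep the sketch self-contained;
# the same fill mapping works in geom_map or geom_polygon
p <- ggplot(pops_demo, aes(state, pop, fill = pop_class)) +
  geom_col() +
  scale_fill_brewer(palette = "Reds")
```

Whether binned classes or a continuous gradient reads better depends on the map; the gradient in the answer above avoids having to choose break points.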
I would like to create a map of the US showing both state and county boundaries (i.e. state boundaries in a different color). I typically do this using either shape files that I import or using ggplot2's map_data function. However, I face several obstacles.
1) I cannot install gdal and geos in my computing environment so that precludes the use of any shape files or GeoJSON files (my attempts to map county level shape files loaded using fastshp have not been successful but I'm open to any solution that can reproduce the map below but with state boundaries included).
2) I need to include Hawaii and Alaska, so that excludes the use of map_data from ggplot2.
3) I need the map to include both state AND county boundaries, which makes the use of the usmap package problematic, as it's a wrapper around ggplot2 but without the ease and general ability to customize to the level of a raw ggplot2 object.
4) Also, I cannot make use of the sf package because it has a non-R library dependency (the units package depends on the C library libudunits2).
What I need: A map that can project Alaska and Hawaii and display state and county boundaries using contrasting colors and I need to accomplish all this without resorting to any packages that rely on rgeos, rgdal, and/or units.
What I've tried thus far, using plot_usmap from the usmap package:
library(dplyr)
library(stringr)
library(ggplot2)
library(usmap)
library(mapproj)
devtools::install_github("wmurphyrd/fiftystater")
library(fiftystater)
county_data<-read.csv("https://www.ers.usda.gov/webdocs/DataFiles/48747/PovertyEstimates.csv?v=2529") %>% #
filter(Area_name != "United States") %>%
select(FIPStxt, Stabr, Area_name, PCTPOVALL_2017) %>%
rename(fips = FIPStxt)
crimes <- data.frame(state = tolower(rownames(USArrests)), USArrests)
state_map <- map_data("state")
plot_usmap(data = county_data, values = "PCTPOVALL_2017", color = "white") +
geom_map(data = crimes, aes(map_id = state), map = fifty_states, color= "red") +
geom_path(data = state_map, aes(x =long , y=lat), color= "red")+
expand_limits(x = fifty_states$long, y = fifty_states$lat) +
theme(legend.position = "none") +
theme_map() #no go
plot_usmap(data = county_data, values = "PCTPOVALL_2017", color = "white") +
geom_map(data = crimes, aes(map_id = state), map = fifty_states, color= "red") +
expand_limits(x = fifty_states$long, y = fifty_states$lat) +
theme(legend.position = "none") +
theme_map() #no go
plot_usmap(data = county_data, values = "PCTPOVALL_2017", color = "white") +
geom_map(data = crimes, aes(map_id = state, color= "red"), map = fifty_states) +
expand_limits(x = fifty_states$long, y = fifty_states$lat) +
theme(legend.position = "none") +
theme_map() #no go
What I suspect is happening is that one layer (the original ggplot code) is projected using a different CRS than the other layer, the one generated by plot_usmap. That second layer results in a very small red dot (see circle in map below). I'm not sure how to re-project without geos/gdal installed. See the map below with the black circle highlighting where the red dot is.
Ok after some suggestions from the package author and some of my own tinkering around I was finally able to get my desired output.
This approach is ideal for folks looking to generate a US map w/ Alaska and Hawaii included who...
1) Do not have the ability to install non-R packages in the environment their R engine is running on (e.g. lack admin access)
2) Need to map both county and state boundaries using contrasting colors
library(dplyr)
library(ggplot2)
library(usmap)
#Example data (poverty rates)
county_data<-read.csv("https://www.ers.usda.gov/webdocs/DataFiles/48747/PovertyEstimates.csv?v=2529") %>% #
filter(Area_name != "United States") %>%
select(FIPStxt, Stabr, Area_name, PCTPOVALL_2018) %>%
rename(fips = FIPStxt)
states <- plot_usmap("states",
color = "red",
fill = alpha(0.01)) #this parameter is necessary to get counties to show on top of states
counties <- plot_usmap(data = county_data,
values = "PCTPOVALL_2018",
color = "black",
size = 0.1)
Using the layers meta info already embedded in the data from us_map
ggplot() +
counties$layers[[1]] + #counties needs to be on top of states for this to work
states$layers[[1]] +
counties$theme +
coord_equal() +
theme(legend.position="none") +
scale_fill_gradient(low='white', high='grey20') #toggle fill schema using vanilla ggplot scale_fill function
Using just the raw data obtained from the us_map package
ggplot() +
geom_polygon(data=counties[[1]],
aes(x=x,
y=y,
group=group,
fill = PCTPOVALL_2018), # column of the data passed to this layer
color = "black",
size = 0.1) +
geom_polygon(data=states[[1]],
aes(x=x,
y=y,
group=group),
color = "red",
fill = alpha(0.01)) +
coord_equal() +
ggthemes::theme_map() + # theme_map() comes from ggthemes, which is not loaded above
theme(legend.position="none") +
scale_fill_gradient(low='white', high='grey20')
I have a basic map of India with states and borders, some labels, and a number of other specifications stored as a gg object. I'd like to generate a number of maps with a district layer, which will bear data from different variables.
To prevent the district layer from overwriting the state and country borders, it must come before all the previous code, which I'd like to avoid repeating.
I thought I could do this by calling on $layers for the gg object as per this answer. However, it throws an error. Reprex is below:
library(ggplot2)
library(sf)
library(raster)
# Download district and state data (should be less than 10 Mb in total)
distSF <- st_as_sf(getData("GADM",country="IND",level=2))
stateSF <- st_as_sf(getData("GADM",country="IND",level=1))
# Add border
countryborder <- st_union(stateSF)
# Basic plot
basicIndia <- ggplot() +
geom_sf(data = stateSF, color = "white", fill = NA) +
geom_sf(data = countryborder, color = "blue", fill = NA) +
theme_dark()
basicIndia
# Data-bearing plot
districts <- ggplot() +
geom_sf(data = distSF, fill = "gold")
basicIndia$layers <- c(geom_sf(data = distSF, fill = "gold"), basicIndia$layers)
basicIndia
#> Error in y$layer_data(plot$data): attempt to apply non-function
Intended outcome
Any help would be much appreciated!
I'm still not sure if I'm missing a detail of what you're looking for, but ggplot2 draws layers in the order you provide them. So something like
ggplot(data) +
geom_col() +
geom_point(...) +
geom_line(...)
will draw columns, then points on top of those, then lines on top of the previous layers.
Same goes for sf plots, which makes it easy to make a plot like this of multiple geographic levels.
(I'm using rmapshaper::ms_simplify on the sf objects just to simplify them and speed things up for plotting.)
library(dplyr)
library(ggplot2)
library(sf)
library(raster)
distSF <- st_as_sf(getData("GADM",country="IND",level=2)) %>% rmapshaper::ms_simplify()
...
Then you can plot by adding up the layers in the order you need them displayed. Keep in mind that if you needed to do other calculations with any of these sfs, you could do that in advance or inside your geom_sf.
ggplot() +
geom_sf(data = distSF, fill = "gold", size = 0.1) +
geom_sf(data = stateSF, color = "white", fill = NA) +
geom_sf(data = countryborder, color = "blue", fill = NA)
Regarding trying to add one plot to another: ggplot2 works in layers, so you create a single base ggplot object, then add geometries on top of it. So you could make, for example, two valid plots:
state_plot <- ggplot(stateSF) +
geom_sf(color = "white", fill = NA)
country_plot <- ggplot(countryborder) +
geom_sf(color = "blue", fill = NA)
But you can't add them, because you would have 2 base ggplot objects. This should be the error you mentioned:
state_plot +
country_plot
#> Error: Don't know how to add country_plot to a plot
Instead, if you need to make a plot, then add something else on top of it, make the base ggplot, then add geometry layers, such as a geom_sf with a different set of data.
state_plot +
geom_sf(data = countryborder, fill = NA, color = "blue")
Created on 2018-10-29 by the reprex package (v0.2.1)
If you look at geom_sf(data=distSF) you'll see that it is a list made up of two elements - you want the first one which contains the layer information, so geom_sf(data = distSF, fill = "gold")[[1]] should work.
districts <- ggplot() +
geom_sf(data = distSF, fill = "gold")
basicIndia$layers <- c(geom_sf(data = distSF, fill = "gold")[[1]], basicIndia$layers)
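The same prepend-a-layer idea can be tried without downloading any map data. The sketch below is an illustration only: it uses mtcars and ordinary geoms as stand-ins (base_plot and under_layer are made-up names). A single non-sf geom call such as geom_line() returns one layer object rather than a list, so no [[1]] is needed there; wrapping it in list() keeps the concatenation explicit.

```r
library(ggplot2)

# Stand-in for an existing plot whose code you do not want to repeat
base_plot <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(colour = "blue")

# The layer that should be drawn underneath everything else
under_layer <- geom_line(colour = "grey70")

# Prepend it: layers are drawn in list order, so the line now sits below the points
base_plot$layers <- c(list(under_layer), base_plot$layers)
```

Printing base_plot afterwards should show the points on top of the grey line, which mirrors putting the district layer beneath the state and country borders.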
I am having trouble separating the legends in a ggplot2 graph with multiple layers. My plot fills different municipalities according to the number of textile companies present there, and I also plot the plant locations with geom_point. My guess is to use override.aes somehow, but I haven't been able to make it work yet. The solutions that I have read do not deal with a different variable for the plots detailed in the aes() of geom_point().
If you want to test the code below, you can download the shapefile for the Brazilian municipalities here, use readOGR and fortify, then fill the municipalities with any variable you like via fill, and set arbitrary points within Brazil for geom_point() by creating separate variables, such as lat_plant and long_plant below. The region column details Brazilian regions; in this case, "1" denotes the northern region of Brazil.
The Code
#setting the ggplot
library(ggplot2)
gg2 <- ggplot(data = out[out$region == "1", ],
aes(x = long, y = lat, group = group, fill = as.factor(companies))) +
geom_polygon() +
ggtitle("title") +
scale_fill_discrete(name = "Number of Textile Companies") +
theme(plot.title = element_text(size = 30, face = "bold")) +
theme(legend.text = element_text(size = 12),
legend.title = element_text(colour = "blue", size = 16, face = "bold"))
#graph output
gg2 +
geom_point(data = out[out$region =="1",], aes(x = long_plant, y = lat_plant), color = "red")
What I am getting as legend is this:
And I would like to separate it, so that the legend shows the dots as plant locations and the colors as the fill for the number of textile companies in the region.
I'll leave another option for you. hmgeiger treated the number of textile companies as a factor, but I treated it as a continuous variable instead. Since there is no reproducible data, I created sample data myself: I drew random points using the longitude and latitude range of Brazil and kept only the points that fall inside the country. whatever2 contains those points. I also used a small trick: I added a new column called Factory location, a dummy variable that gives the data points a color entry in the final graphic. hmgeiger created Dummy.var containing descriptive text; I left "" in this column instead, since you may not want to see any text in the legend.
For your legend issue, as Antonio mentioned and hmgeiger did, you need to put color inside aes() in geom_point(). That solves it. I went one step further: if you do not know how many factories exist in each municipality, you need to count them. I did that with poly.counts() from the GISTools package and created another data frame that contains the number of factories in each polygon.
When I drew the map, I had three layers. One is for the polygons and another for filling the polygons with colors. Both are drawn with geom_cartogram() from the ggalt package. The key point is that you need a common key column for map_id: id in the first geom_cartogram() and ind in the second geom_cartogram() hold identical information. In geom_point() you need color inside aes(). The legend then has a continuous bar for the number of factories and a single dot for factory location, with no text next to it, which keeps the legend tidy, I think.
library(raster)
library(tidyverse)
library(GISTools)
library(RColorBrewer)
library(ggalt)
library(ggthemes)
# Get polygon data for Brazil
brazil <- getData("GADM", country = "brazil", level = 1)
mymap <- fortify(brazil)
# Create dummy data staying in the polygons
# For more information: https://stackoverflow.com/questions/47696382/removing-data-outside-country-map-boundary-in-r/47699405#47699405
set.seed(123)
mydata <- data.frame(long = runif(200, min = quantile(mymap$long)[1], max = quantile(mymap$long)[4]),
lat = runif(200, min = quantile(mymap$lat)[1], max = quantile(mymap$lat)[4]),
factory = paste("factory ", 1:200, sep = ""),
stringsAsFactors = FALSE)
spdf <- SpatialPointsDataFrame(coords = mydata[, c("long", "lat")], data = mydata,
proj4string = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"))
whatever <- spdf[!is.na(over(spdf, as(brazil, "SpatialPolygons"))), ]
whatever2 <- as.data.frame(whatever) %>%
mutate(`Factory location` = "")
# Now I check how many data points (factories) exist in each polygon
# and create a data frame
factory.num <- poly.counts(pts = whatever, polys = brazil)
factory.num <- stack(factory.num)
ggplot() +
geom_cartogram(data = mymap, aes(x = long, y = lat, map_id = id),
map = mymap) +
geom_cartogram(data = factory.num, aes(fill = values, map_id = ind),
map = mymap) +
geom_point(data = whatever2, aes(x = long, y = lat, color = `Factory location`)) +
scale_fill_gradientn(name = "Number of factories", colours = brewer.pal(5, "Greens")) +
coord_map() +
theme_map()
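The legend part of this answer can be looked at in isolation, without any spatial packages. The following is a minimal sketch on toy data (regions, plants and the label text are made up for illustration): mapping a constant string to colour inside aes() is what creates a separate legend entry for the points, and scale_colour_manual then sets the actual dot color.

```r
library(ggplot2)

# Toy stand-ins for the polygons and the plant coordinates
regions <- data.frame(x = c(1, 2, 3), y = c(1, 1, 1),
                      companies = factor(c(0, 1, 3)))
plants  <- data.frame(x = c(1.1, 2.9), y = c(1, 1))

p <- ggplot() +
  geom_tile(data = regions, aes(x, y, fill = companies)) +
  # colour is mapped inside aes(), so it gets its own legend
  geom_point(data = plants, aes(x, y, colour = "Plant location")) +
  scale_colour_manual(name = NULL, values = c("Plant location" = "red")) +
  labs(fill = "Number of textile companies")
```

With colour = "red" outside aes(), as in the question, the points are drawn in red but no separate legend is generated for them.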
FYI, the link you posted to download the shape file is quite slow, at least to download to a US computer.
This link has downloads that work a lot better, and also shows how to read in shape data: https://dioferrari.wordpress.com/2014/11/27/plotting-maps-using-r-example-with-brazilian-municipal-level-data/
I made an example using the regions rather than municipalities data to keep it simple.
Data I used available for download here: https://drive.google.com/file/d/0B64xLcn8DZfwakNMbHFLQWo4YzA/view?usp=sharing
#Load libraries.
library(rgeos)
library(rgdal)
library(ggplot2)
#Read in and format map data.
regions_OGR <- readOGR(dsn="/Users/hmgeiger/Downloads/regioes_2010",
layer = "regioes_2010")
map_regions <- spTransform(regions_OGR,CRS("+proj=longlat +datum=WGS84"))
map_regions_fortified <- fortify(map_regions)
#We make there be 0, 1, or 3 textile companies.
#map_regions_fortified is in order by ID (region).
#So, we add a column with the number of textile companies
#repeated the right number of times for how many of each region there is.
num_rows_per_region <- data.frame(table(map_regions_fortified$id))
map_regions_fortified <- data.frame(map_regions_fortified,
Num.factories = factor(rep(c(1,0,1,3,1),times=num_rows_per_region$Freq)))
#First, plot without any location dots.
ggplot()+geom_polygon(data=map_regions_fortified,
aes(x = long,y = lat, group=group, fill=Num.factories),colour="black")
Now, let's add the factory locations.
#Set latitude and longitude based on the number of factories per region.
factory_locations <- data.frame(long = c(-65,-55,-51,-44,-42,-38),
lat = c(-5,-15,-27,-7,-12,-8))
#Add a dummy variable, which then allows the colour of the dots
#to be a part of the legend.
factory_locations <- data.frame(factory_locations,
Dummy.var = rep("One dot = one factory location",times=nrow(factory_locations)))
#Replot adding factory location dots.
#We will use black dots here since will be easier to see.
#Note: each + must end a line; a line that starts with + is parsed as a new statement.
ggplot() +
geom_polygon(data = map_regions_fortified,
aes(x = long, y = lat, group = group, fill = Num.factories), colour = "black") +
geom_point(data = factory_locations, aes(x = long, y = lat, colour = Dummy.var)) +
scale_colour_manual(values = "black") + labs(colour = "")
#Bonus: Let's change the color vector to something more color-blind friendly.
mycol <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442",
"#0072B2", "#D55E00", "#CC79A7","#490092")
ggplot() +
geom_polygon(data = map_regions_fortified,
aes(x = long, y = lat, group = group, fill = Num.factories), colour = "black") +
geom_point(data = factory_locations, aes(x = long, y = lat, colour = Dummy.var)) +
scale_colour_manual(values = "black") + labs(colour = "") +
scale_fill_manual(values = mycol)
My challenge is to add several text labels around the same point on a map. The MWE data frame puts six sports teams around New York City.
library(ggplot2) # map_data() and the plotting functions used below
library(maps)
library(mapproj)
library(maptools)
all_states <- map_data("state") # load map outline and borders for US states
ny <- subset(all_states, region == "new york") # select only New York
nyteams <- c("Mets", "Yankees", "Knicks", "Giants", "Islanders", "Jets") # for text labels
df <- data.frame(long = rep(-73.99, times = 6), lat = rep(40.71, times =6)) # NYC coordinates for each team
df <- cbind(nyteams, df) # combine the columns to create the data frame for ggplot2
df <- cbind(df, rownum = seq_len(nrow(df))) # variable for spreading text labels by vertical latitude
It is simple to add the text labels vertically by incrementing the latitude of each point.
df$lat2 <- df$lat + (0.1*df$rownum) # to spread the text labels up the latitude
ggplot(data = df, aes(long, lat2)) +
geom_polygon(data = ny, aes(x=long, y=lat, group = group), colour="grey70", fill="white") +
coord_map("mercator") + # did not include geom_point() since text labels are sufficient
geom_text(aes(label = nyteams), size = 3)
But I worked out manually a system for placing the first team at the NYC latitude and longitude, the 2nd team just above it, the 3rd team to the right, the 4th just below it, and the 5th to the left (I can shorten names of teams to avoid over-writing Islanders, for example), etc..
df$lat2 <- df$lat + c(0, 0.1, 0.0, -0.1, 0.0, 0.2)
df$long2 <- df$long + c(0, 0.0, 0.3, 0.0, -0.5, 0.0)
ggplot(data = df, aes(long2, lat2)) +
geom_polygon(data = ny, aes(x=long, y=lat, group = group), colour="grey70", fill="white") +
coord_map("mercator") +
geom_text(aes(label = nyteams), size = 3)
Programming Question: How might R create such multiple placements of text labels without so much manual intervention?
I tried position = "jitter" and position = "dodge" to no avail.
Several other questions on SO have asked about adding multiple text annotations to a map. Limitations afflict all of them.
Can you plot a table onto a ggmap similar to annotation_custom method for non- Cartesian coordinates
Dynamic position for ggplot2 objects (especially geom_text)?
But directlabels package does not work on individual points
https://stats.stackexchange.com/questions/16057/how-do-i-avoid-overlapping-labels-in-an-r-plot/62856#62856
FField package
Here's one "semi"-automated approach, which places the labels in a circular pattern around the point.
ny.coords <- data.frame(long=-73.99, lat=40.71)
n <- length(nyteams)
r <- 0.3
th <- seq(0,2*(n-1)/n*pi,len=n)
coords <- data.frame(long=r*sin(th)+ny.coords$long,lat=r*cos(th)+ny.coords$lat)
ggplot(data=ny,aes(x=long,y=lat)) +
geom_polygon(data = ny, aes(group = group), colour="grey70", fill="white") +
geom_text(data=coords,aes(label = nyteams), size = 3)+
geom_point(data=ny.coords,color="red",size=3)+
coord_map("mercator",xlim=c(-75,-73),ylim=c(40,41.5))
The "semi" bit is that I picked a radius (r) based on the scale of the map, but you could probably automate that as well.
EDIT: Response to OP's comment.
There's nothing in this approach that explicitly avoids overlaps. However, changing the line
th <- seq(0,2*(n-1)/n*pi,len=n)
to
th <- seq(0,2*(n-1)/n*pi,len=n) + pi/(2*n)
produces this:
which has the label positions rotated a bit and can (sometimes) avoid overlaps, if there are not too many labels.
Also, you should check out the directlabels package.
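The circular placement above, including the rotation from the edit, can be wrapped in a small helper so that it is reusable for other points. This is only a sketch: circle_offsets and radius_from_xlim are hypothetical helper names, and deriving the radius as 15% of the plotted longitude range is an assumption chosen because it reproduces r = 0.3 for the limits used above.

```r
# Offsets for n labels spread evenly on a circle of radius r.
# `start` rotates the whole pattern (the edit above uses pi / (2 * n)).
circle_offsets <- function(n, r, start = 0) {
  th <- start + 2 * pi * (seq_len(n) - 1) / n
  data.frame(dlong = r * sin(th), dlat = r * cos(th))
}

# Assumption: a radius of 15% of the plotted longitude range is readable
radius_from_xlim <- function(xlim, frac = 0.15) frac * diff(xlim)

nyteams <- c("Mets", "Yankees", "Knicks", "Giants", "Islanders", "Jets")
r   <- radius_from_xlim(c(-75, -73))
off <- circle_offsets(length(nyteams), r)

# Label coordinates around the NYC point used in the question
coords <- data.frame(team = nyteams,
                     long = -73.99 + off$dlong,
                     lat  = 40.71 + off$dlat)
```

coords can then be passed to geom_text(data = coords, aes(long, lat, label = team)) in place of the hand-built data frame. Like the original, this does nothing to detect overlaps; it only spaces the labels evenly.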
I have sampled 10,000 coordinates from my data in this file. I have around 130,000 points.
https://www.dropbox.com/s/40hfyx6a5hsjuv7/data.csv
I am trying to plot these points on the Americas map using ggplot2. Here is my code.
library(ggplot2)
library(maps)
map_world <- map_data("world")
map_world <- subset(map_world, (lat >= -60 & lat <= 75))
map_world <- subset(map_world, (long >= -170 & long <= -30))
p <- ggplot(data = data_coords, legend = FALSE) +
geom_polygon(data = map_world, aes(x = long, y = lat, group = group)) +
geom_point(aes(x = lon, y = lat), shape = 19, size = 0.00001,
alpha = 0.3, colour = "red") +
theme(panel.grid.major = element_blank()) +
theme(panel.grid.minor = element_blank()) +
theme(axis.text.x = element_blank(),axis.text.y = element_blank()) +
theme(axis.ticks = element_blank()) +
xlab("") + ylab("")
png("my_plot.png", width = 8000, height = 7000, res = 1000)
print(p)
dev.off()
The points seem to cover the whole area in which they were plotted. I would like them to be smaller so that they better represent a location. You can see that I've set the size to 0.00001. I was just trying to see if it has any effect, but below a certain limit it doesn't seem to help. Is this the best that is possible at this resolution, or could the point size be reduced further?
I had actually plotted around 400,000 points but only on the US map before and they looked much better like below. Hoping to get something like this. Thanks.
https://www.dropbox.com/s/8d0niu9g6ygz0wo/Clusters_reduced.png
Try playing with very small values of alpha, instead of the point size:
http://docs.ggplot2.org/0.9.3.1/geom_point.html
# Varying alpha is useful for large datasets
d <- ggplot(diamonds, aes(carat, price))
d + geom_point(alpha = 1/1000)
Edit:
Additional ideas are given in the documentation. Here's a summary:
Details
The scatterplot is useful for displaying the relationship between two continuous variables, although it can also be used with one continuous and one categorical variable, or two categorical variables. See geom_jitter for possibilities.
The bubblechart is a scatterplot with a third variable mapped to the size of points. There are no special names for scatterplots where another variable is mapped to point shape or colour, however.
The biggest potential problem with a scatterplot is overplotting: whenever you have more than a few points, points may be plotted on top of one another. This can severely distort the visual appearance of the plot. There is no one solution to this problem, but there are some techniques that can help. You can add additional information with stat_smooth, stat_quantile or stat_density2d. If you have few unique x values, geom_boxplot may also be useful. Alternatively, you can summarise the number of points at each location and display that in some way, using stat_sum.
Another technique is to use transparent points, geom_point(alpha = 0.05).
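One way to "summarise the number of points at each location", as the documentation puts it, is to snap the coordinates to a grid and count the points per cell. The sketch below is an illustration under stated assumptions: the coordinates are simulated rather than read from data.csv, and the 5-degree cell size is an arbitrary choice.

```r
library(ggplot2)

set.seed(42)
# Simulated coordinates standing in for data_coords (illustration only)
pts <- data.frame(lon = runif(5000, -170, -30),
                  lat = runif(5000, -60, 75))

cell <- 5  # grid cell size in degrees (an arbitrary choice)
pts$lon_bin <- floor(pts$lon / cell) * cell
pts$lat_bin <- floor(pts$lat / cell) * cell
pts$n <- 1

# One row per occupied grid cell, with the number of points in it
counts <- aggregate(n ~ lon_bin + lat_bin, data = pts, FUN = sum)

# Each tile is coloured by how many points fell into it
p <- ggplot(counts, aes(lon_bin, lat_bin, fill = n)) +
  geom_tile()
```

With the real data, a smaller cell size would probably be needed to keep individual clusters visible.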
Edit 2:
Combining the details from the manual with the hints in Transparency and Alpha levels for ggplot2 stat_density2d with maps and layers in R
This might look like the solution:
library(ggplot2)
library(maps)
data_coords <- read.csv("C:/Downloads/data.csv")
map_world <- map_data("world")
map_world <- subset(map_world, (lat >= -60 & lat <= 75))
map_world <- subset(map_world, (long >= -170 & long <= -30))
p <- ggplot( data = data_coords, legend = FALSE) +
geom_polygon( data = map_world, aes(x = long, y = lat, group = group)) +
stat_density2d( data = data_coords, aes(x=lon, y=lat, fill = as.factor(..level..)), size=1, bins=10, geom='polygon') +
scale_fill_manual(values = c("yellow","red","green","royalblue", "black","white","orange","brown","grey"))
png("my_plot2k.png", width = 2000, height = 2000, res = 500)
print(p)
dev.off()
Resulting image (not the best colour palette used):