How to properly join data and geometry using ggmap - r

An image is worth a thousand words:
Observed behaviour: As can be seen from the image above, countries' names do not match with their actual geometries.
Expected behaviour: I would like to properly join a data frame with its geometry and display the result in ggmap.
I've joined data frames before, but things go wrong here because ggmap apparently needs the data frame to be "fortified" (I don't really know what that means) before it can display the results.
This is what I've done so far:
library(rgdal)
library(dplyr)
library(broom)
library(ggmap)
# Load GeoJSON file with countries.
countries = readOGR(dsn = "https://gist.githubusercontent.com/ccamara/fc26d8bb7e777488b446fbaad1e6ea63/raw/a6f69b6c3b4a75b02858e966b9d36c85982cbd32/countries.geojson")
# Load dataframe.
df = read.csv("https://gist.githubusercontent.com/ccamara/fc26d8bb7e777488b446fbaad1e6ea63/raw/a6f69b6c3b4a75b02858e966b9d36c85982cbd32/sample-dataframe.csv")
# Join geometry with dataframe.
countries$iso_a2 = as.factor(countries$iso_a2)
countries@data = left_join(countries@data, df, by = c('iso_a2' = 'country_code'))
# Convert to dataframe so it can be used by ggmap.
countries.t = tidy(countries)
# Here's where the problem starts, as by doing so, data has been lost!
# Recover attributes' table that was destroyed after using broom::tidy.
countries@data$id = rownames(countries@data) # Adding a new id variable.
countries.t = left_join(countries.t, countries@data, by = "id")
ggplot(data = countries.t,
aes(long, lat, fill = country_name, group = group)) +
geom_polygon() +
geom_path(colour="black", lwd=0.05) + # polygon borders
coord_equal() +
ggtitle("Data and geometry have been messed!") +
theme(axis.text = element_blank(), # change the theme options
axis.title = element_blank(), # remove axis titles
axis.ticks = element_blank()) # remove axis ticks

Your approach is reasonable, but I would suggest rethinking the design, for two simple reasons:
1) While GeoJSON is the future, R still relies heavily on the sp package and its corresponding sp* objects, and you will soon wish you had switched early on. Most packages (if not all) rely on sp* objects.
2) ggplot has great plotting capabilities combined with ggmap, but it's still quite limited compared to sp* in combination with leaflet for R etc.
Probably the fastest way to go is as simple as:
library(sp)
library(dplyr)
library(geojsonio)
library(tmap)
#get sp* object instead of geojson
countries <- geojsonio::geojson_read("foo.geojson",what = "sp")
#match sp* object with your data.frame
countries@data <- dplyr::left_join(countries@data, your_df, by = c("identifier_1" = "identifier_2"))
#creates a fast and nice looking plot / lots of configuration available
p1 <- tm_shape(countries) +
tm_polygons()
p1
#optional interactive leaflet plot
tmap_leaflet(p1)
I wrote this from memory, so bear with me if there are minor issues.
It is a different approach, but at least in my eyes it is a faster and more concise one in R right now (hopefully GeoJSON will receive more support in the future).

There is a reason for the messed up behaviour.
countries starts out as a large SpatialPolygonsDataFrame with 177 elements (and correspondingly 177 rows in countries@data). When you perform left_join on countries@data and df, the number of elements in countries isn't affected, but the number of rows in countries@data grows to 210.
Fortifying countries using broom::tidy converts countries, with its 177 elements, into a data frame with id running from 0 to 176. (I'm not sure why it's zero-indexed, but I usually prefer to specify the regions explicitly anyway).
Adding id to countries@data based on rownames(countries@data), on the other hand, results in id values running from 1 to 210, since that's the number of rows in countries@data after the earlier join with df. Consequently, everything is out of sync.
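The mismatch can be reproduced on a toy scale. The following is only a sketch in base R with made-up data (three "polygons" and a lookup table in which one code appears twice); merge() stands in for left_join(), and none of the names come from the original files:

```r
# Hypothetical attribute table: ids "0".."2", like the ids tidy() produces.
polygons <- data.frame(id = as.character(0:2),
                       iso_a2 = c("ES", "FR", "IT"),
                       stringsAsFactors = FALSE)

# Hypothetical lookup table where "ES" appears twice (one-to-many match).
df <- data.frame(country_code = c("ES", "ES", "FR", "IT"),
                 value = 1:4,
                 stringsAsFactors = FALSE)

# Left join: keeps every polygon row, duplicating rows that match more than once.
joined <- merge(polygons, df, by.x = "iso_a2", by.y = "country_code", all.x = TRUE)

nrow(polygons)  # 3
nrow(joined)    # 4: the join added a row

# Row names of the joined table are regenerated as "1".."4",
# so they no longer line up with the original ids "0".."2".
new_id <- rownames(joined)
setdiff(new_id, polygons$id)  # "3" "4": ids with no geometry
setdiff(polygons$id, new_id)  # "0": geometry with no attributes
```

Any id derived from row names after such a join describes the joined table, not the geometry it is supposed to label.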
Try the following instead:
# (we start out right after loading countries & df)
# no need to join geometry with df first
# convert countries to data frame, specifying the regions explicitly
# (note I'm using the name column rather than the iso_a2 column from countries@data;
# this is because there are some repeat -99 values in iso_a2, and we want
# one-to-one matching.)
countries.t = tidy(countries, region = "name")
# join with the original file's data
countries.t = left_join(countries.t, countries@data, by = c("id" = "name"))
# join with df
countries.t = left_join(countries.t, df, by = c("iso_a2" = "country_code"))
# no change to the plot's code, except for ggtitle
ggplot(data = countries.t,
aes(long, lat, fill = country_name, group = group)) +
geom_polygon() +
geom_path(colour="black", lwd = 0.05) +
coord_equal() +
ggtitle("Data and geometry are fine") +
theme(axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank())
p.s. You don't actually need the ggmap package for this. Just the ggplot2 package that it loads.
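Since the choice of region column hinges on it being unique, a quick check before calling tidy() may help. This is a minimal base R sketch on an invented attribute table (the names and codes are placeholders, not the real contents of the GeoJSON file), and is_unique_key is a hypothetical helper:

```r
# Hypothetical attribute table with repeated "-99" codes in iso_a2.
attrs <- data.frame(
  name   = c("Alpha", "Beta", "Gamma", "Delta"),
  iso_a2 = c("-99", "-99", "-99", "DL"),
  stringsAsFactors = FALSE
)

# TRUE when every value occurs exactly once, i.e. the column is safe
# to use as a one-to-one join key.
is_unique_key <- function(x) !anyDuplicated(x)

is_unique_key(attrs$iso_a2)  # FALSE: "-99" is repeated
is_unique_key(attrs$name)    # TRUE

# Which values are repeated?
unique(attrs$iso_a2[duplicated(attrs$iso_a2)])  # "-99"
```

On the real data the same check would be run on countries@data$name and countries@data$iso_a2.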

Related

Map FAO fishing areas in R

I would like to make a map in R that colours in the FAO Fishing Areas according to a data set (in my case, length data of shark species).
I would prefer to do a choropleth map in ggplot but other types of maps are also fine. Worst case scenario a base map of FAO areas that I can add bubbles to. Even just an existing base map of FAO areas would be great. Any suggestions welcome!
I went to this page and clicked through to find this link to retrieve a GeoJSON file:
download.file("http://www.fao.org/fishery/geoserver/fifao/ows?service=WFS&request=GetFeature&version=1.0.0&typeName=fifao:FAO_AREAS_CWP&outputFormat=json", dest="FAO.json")
From here on, I was following this example from the R graph gallery, with a little help from this SO question and these notes:
library(geojsonio)
library(sp)
library(broom)
library(ggplot2)
library(dplyr) ## for joining values to map
spdf <- geojson_read("FAO.json", what = "sp")
At this point, plot(spdf) will bring up a plain (base-R) plot of the regions.
spdf_fortified <- tidy(spdf)
## make up some data to go with ...
fake_fish <- data.frame(id = as.character(1:324), value = rnorm(324))
spdf2 <- spdf_fortified %>% left_join(fake_fish, by = "id")
ggplot() +
geom_polygon(data = spdf2, aes( x = long, y = lat, group = group,
fill = value), color="grey") +
scale_fill_viridis_c() +
theme_void() +
theme(plot.background = element_rect(fill = 'lightgray', colour = NA)) +
# ggplot2 keeps only the last coordinate system added, so a coord_map() here would be replaced
coord_sf(crs = "+proj=cea +lon_0=0 +lat_ts=45") ## Gall projection
ggsave("FAO.png")
notes
some of the steps are slow, it might be worth looking up how to coarsen/lower resolution of a spatial polygons object (if you just want to show the picture, the level of resolution might be overkill)
to be honest the default sequential colour scheme might be better but all the cool kids seem to like "viridis" these days so ...
There are probably better ways to do a lot of these pieces (e.g. set map projection, fill in background colour for land masses, ... ?)

How to combine sf elements (layers) in R

Despite having some experience with R, I am much less experienced using R for GIS-like tasks.
I have a shapefile of all communities within Germany and created a new object that only shows the borders of the 16 states of Germany.
gem <- readOGR("path/to/shapefile.shp") # reading shapefile
gemsf <- st_read("path/to/shapefile.shp") # reading shapefile as sf object
f00 <- gUnaryUnion(gem, id = gem@data$SN_L) # SN_L is the column of the various states - this line creates a new sp object with only the states instead of all communities
f002 <- sf::st_as_sf(f00, coords = c("x","y")) # turning the object into an sf object, so graphing with ggplot is easier
To check my work so far I plotted the base data (communities) using
gemsf %>%
ggplot(data = .) + geom_sf(aes(fill = SN_L)) # fill by state
as well as plot(f002) which creates a plot of the 16 states, while the ggplot-code provides a nice map of Germany by community, with each state filled in a different color.
Now I'd like to overlay this with a second layer that indicates the borders of the states (so if you e.g. plot population density you can still distinguish states easily).
My attempt, which uses the "standard procedure" of adding another layer,
ggplot() +
geom_sf(data = gemsf, aes(fill = SN_L)) + # fill by state
geom_sf(data = f002) # since the f002 data frame/sf object ONLY has a geometry column, there is no aes()
results in the following output: https://i.ibb.co/qk9zWRY/ggplot-map-layer.png
So how do I get to add a second layer that only provides the borders and does not cover the actual layer of interest below? In QGIS or ArcGIS, this is common procedure and not a problem, and I'd like to be able to recreate this in R, too.
Thank you very much for your help!
I found a solution which I want to share with everyone.
ggplot() +
geom_sf(data = gemsf_data, aes(fill = log(je_km2))) + # fill by log(je_km2)
geom_sf(data = f002, alpha = 0, color = "black") + # since the f002 data frame/sf object ONLY has a geometry column, there is no aes()
theme_minimal()
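The trick was to set "alpha" as a fixed argument outside aes(), as shown above, so that the state polygons are drawn with a fully transparent fill and only their borders remain visible.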
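A variation on the same idea is to switch the fill off with fill = NA instead of making it transparent. The sketch below is only an illustration under assumptions: it uses the North Carolina shapefile that ships with sf as a stand-in for the German data, and dissolves all counties into a single outline with st_union() rather than into states.

```r
library(sf)
library(ggplot2)

# Stand-in data: North Carolina counties bundled with the sf package.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Dissolve all counties into one geometry; this plays the role of f002.
outline <- st_union(nc)

p <- ggplot() +
  geom_sf(data = nc, aes(fill = AREA)) +                # layer of interest
  geom_sf(data = outline, fill = NA, colour = "black")  # borders only, no fill
p
```

Because fill is set outside aes(), it applies to the whole layer as a constant and leaves the filled layer underneath visible.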

R crashed when using Geom_Point for large data frame

Background: I have a large data frame data_2014, containing ~ 1,000,000 rows like this
library(tidyverse)
tibble(
date_time = "4/1/2014 0:11:00",
Lat = 40.7690,
Lon = -73.9549,
Base = "B02512"
)
Problem: I want to create a plot like this
This is what I've attempted to do:
library(tidyverse)
library(ggthemes)
library(scales)
min_lat <- 40.5774
max_lat <- 40.9176
min_long <- -74.15
max_long <- -73.7004
ggplot(data_2014, aes(Lon, Lat)) +
geom_point(size = 1, color = "chocolate") +
scale_x_continuous(limits = c(min_long, max_long)) +
scale_y_continuous(limits = c(min_lat, max_lat)) +
theme_map() +
ggtitle("NYC Map Based on Uber Rides Data (April-September 2014)")
However, when I ran this code, Rstudio crashed. I'm not particularly sure how to fix or improve this. Is there any suggestion?
A million points is a lot for ggplot2, but do-able if your computer is good enough. Yours may or may not be. Short of getting a bigger computer here's what you should do.
This is spatial data, so use the sf package.
library(sf)
data_2014 <- st_as_sf(data_2014, coords = c('Lon', 'Lat')) %>%
st_set_crs(4326)
If you're only plotting the points, get rid of the columns of data you don't need. I'm guessing they might include trip distance, time, borough, etc. Use dplyr's select, or whatever other method you're familiar with.
Try plotting some of the data, and then a little more. See where your computer slows down & stop there. You can plot the data from row 1:n, or sample x number of rows.
# try starting with 100,000 and go up from there.
n <- 100000
ggplot(data_2014[1:n,]) +
geom_sf()
# Alternatively sample a fraction of the data.
# Start with ~10% and go up until R crashes again.
data_2014 %>%
sample_frac(.1) %>%
ggplot() +
geom_sf()
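The same subsetting idea also works without sf or dplyr. Here is a minimal base R sketch, assuming a made-up data frame of random points inside the bounding box from the question (rides is a hypothetical stand-in for data_2014):

```r
set.seed(1)  # make the random sample reproducible

# Hypothetical stand-in for data_2014: random points inside the NYC bounding box.
rides <- data.frame(
  Lon = runif(1e5, min = -74.15, max = -73.7004),
  Lat = runif(1e5, min = 40.5774, max = 40.9176)
)

# Draw n rows at random, without replacement.
n <- 1000
rides_sample <- rides[sample(nrow(rides), n), ]

# pch = "." draws each point as a single small dot, which keeps rendering cheap.
plot(rides_sample$Lon, rides_sample$Lat, pch = ".", col = "chocolate",
     xlab = "Lon", ylab = "Lat")
```

Increasing n step by step shows roughly where the machine starts to struggle.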

Putting Values on a County Map in R

I am using an Excel sheet for data. One column has FIPS numbers for GA counties and the other is labeled Count with numbers 1 - 5. I have made a map with these values using the following code:
library(usmap)
library(ggplot2)
library(rio)
carrierdata <- import("GA Info.xlsx")
plot_usmap( data = carrierdata, values = "Count", "counties", include = c("GA"), color="black") +
labs(title="Georgia")+
scale_fill_continuous(low = "#56B1F7", high = "#132B43", name="Count", label=scales::comma)+
theme(plot.background=element_rect(), legend.position="right")
I've included the picture of the map I get and a sample of the data I am using. Can anyone help me put the actual Count numbers on each county?
Thanks!
Data
The usmap package is a good source for county maps, but the data it contains is in the format of data frames of x, y co-ordinates of county outlines, whereas you need the numbers plotted in the center of the counties. The package doesn't seem to contain the center co-ordinates for each county.
Although it's a bit of a pain, it is worth converting the map into a formal sf data frame format to give better plotting options, including the calculation of the centroid for each county. First, we'll load the necessary packages, get the Georgia data and convert it to sf format:
library(usmap)
library(sf)
library(ggplot2)
d <- us_map("counties")
d <- d[d$abbr == "GA",]
GAc <- lapply(split(d, d$county), function(x) st_polygon(list(cbind(x$x, x$y))))
GA <- st_sfc(GAc, crs = usmap_crs()@projargs)
GA <- st_sf(data.frame(fips = unique(d$fips), county = names(GAc), geometry = GA))
Now, obviously I don't have your numeric data, so I'll have to make some up, equivalent to the data you are importing from Excel. I'll assume your own carrierdata has a column named "fips" and another called "values":
set.seed(69)
carrierdata <- data.frame(fips = GA$fips, values = sample(5, nrow(GA), TRUE))
So now we left_join our imported data to the GA county data:
GA <- dplyr::left_join(GA, carrierdata, by = "fips")
And we can calculate the center point for each county:
GA$centroids <- st_centroid(GA$geometry)
All that's left now is to plot the result:
ggplot(GA) +
geom_sf(aes(fill = values)) +
geom_sf_text(aes(label = values, geometry = centroids), colour = "white")

Chloropleth map in R looks bizarre

I have a data set of ~25,000 people that have complete postal codes. I'm trying to create a map of Canada at the FSA level but always seem to get bizarre results. I would appreciate if someone could point out where my mistakes are happening or what I'm missing.
library(rgeos)
library(maptools)
library(ggplot2)
fsas = readShapeSpatial('./Resources/FSA/gfsa000a11a_e.shp')
data = fortify(fsas, region = 'CFSAUID')
data$fsa = factor(data$id)
data$id = NULL
df$fsa = substr(df$Postal, 1, 3)
prvdr_cts = data.frame(table(df$fsa)) ; names(prvdr_cts) = c('fsa', 'ct')
plot.data = merge(data, prvdr_cts, by = 'fsa')
ggplot(plot.data, aes(x = long, y = lat, group = group, fill = ct)) +
geom_polygon() +
coord_equal()
This is my resulting plot
I got my map file from http://www12.statcan.gc.ca/census-recensement/2011/geo/bound-limit/bound-limit-2011-eng.cfm under 'Forward sortation areas'. df has two columns Person ID and FSA.
I've seen similar problems before when I forgot group = group (as @r.bot points out), but as you have that I wonder if it's because the shapefile you're using is highly detailed.
I suggest loading the shapefile with the sf package, which has superseded readShapePoly. It is faster, and you don't need to fortify(). I've also simplified the shapefile somewhat to make plotting faster. Finally, you need the development version of ggplot2 to use the new geom_sf() (at the time of writing):
install.packages(c("rmapshaper", "sf", "devtools"))
devtools::install_github("tidyverse/ggplot2")
library("ggplot2")
fsas = sf::read_sf("gfsa000a11a_e.shp")
fsas = rmapshaper::ms_simplify(fsas, keep = 0.05)
ggplot(fsas) + geom_sf()

Resources