Create choropleth map from coordinate points - r

I have a data frame consisting of multiple data points with specific geocoordinates (latitude and longitude). I'm looking to create a choropleth-style world map where geographical regions are shaded according to how many data points fall within the boundaries of the region.
Is there a simple way to accomplish what I'm trying to do in R, preferably using the "maps" package's world map and the "ggplot2" map-plotting functions?
Here is a minimal reproducible example of what I have so far:
library(ggplot2)
library(maps)
data <- data.frame(lat = 40.730610, lon = -73.935242)
ggplot() +
geom_polygon(data = map_data("world"), aes(x = long, y = lat, group = group, fill = group)) +
coord_fixed(1.3)
I've noticed that the fill aesthetic can be used to create a choropleth effect. Here, setting fill = group inside the aes() call of geom_polygon() shades each polygon group in a different color.

There are many ways to achieve this. The general idea is to convert both the point data and the polygon data to spatial objects, and then count how many points fall within each polygon. This can be done with the sp package, which is widespread and well known in the R community, but I decided to use the sf package because sf is positioned to become the next-generation standard for spatial objects in R (https://cran.r-project.org/web/packages/sf/index.html), so knowing how to use it will probably pay off.
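A minimal sketch of that idea on toy data, assuming only the sf package: two hypothetical square "regions" and three hypothetical points, built without a CRS so sf uses planar geometry (real lon/lat data would use crs = 4326 as below). The names sq, regions and pts are illustrative only.

```r
library(sf)

# Hypothetical helper: a unit square whose lower-left corner is at (x0, 0)
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}

# Two hypothetical regions side by side (no CRS, so planar GEOS operations are used)
regions <- st_sf(region = c("A", "B"), geometry = st_sfc(sq(0), sq(1)))

# Three hypothetical points: two inside A, one inside B
pts <- st_as_sf(data.frame(lon = c(0.2, 0.5, 1.5), lat = c(0.5, 0.5, 0.5)),
                coords = c("lon", "lat"))

# st_intersects(regions, pts) lists, for each region, the indices of the points in it;
# lengths() turns that sparse list into one count per region
regions$Count <- lengths(st_intersects(regions, pts))
```

With st_intersects() a point lying exactly on a shared border would be counted for both regions; for scattered real-world coordinates that is rarely an issue.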
First, the OP provided a single example point, but I added more points so that we can see how to count the points and aggregate the data. To do so, I used the ggmap package to geocode some cities that I selected as examples.
# Load packages
library(tidyverse)
library(ggmap)
library(maps)
library(maptools)  # note: maptools was archived on CRAN in 2023
library(sf)
# Create point data as a data frame
point_data <- data.frame(lat = 40.730610, lon = -73.935242)
# Geocode a series of cities
# (current ggmap versions require a Google API key registered with register_google() first)
city <- c("Detroit", "Seattle", "Toronto", "Denver", "Mexico City", "Paris", "New Orleans",
"Tokyo", "Osaka", "Beijing", "Canberra", "New York", "Istanbul", "New Delhi",
"London", "Taipei", "Seoul", "Manila", "Bangkok", "Lagos", "Chicago", "Shanghai")
point_data2 <- geocode(city)
# Combine OP's example and the geocoding result
point_data3 <- bind_rows(point_data, point_data2)
Next, I converted the point_data3 data frame to an sf object. I also got the world polygon data from the maps package and converted it to an sf object.
# Convert to simple feature object
point_sf <- st_as_sf(point_data3, coords = c("lon", "lat"), crs = 4326)
# Get world map data
worldmap <- maps::map("world", fill = TRUE, plot = FALSE)
# Convert world to sp class
IDs <- sapply(strsplit(worldmap$names, ":"), "[", 1L)
world_sp <- map2SpatialPolygons(worldmap, IDs = IDs,
proj4string = CRS("+proj=longlat +datum=WGS84"))
# Convert world_sp to simple feature object
world_sf <- st_as_sf(world_sp)
# Add country ID (slots of the sp object are accessed with @)
world_sf <- world_sf %>%
mutate(region = map_chr(seq_along(world_sp@polygons), function(i){
world_sp@polygons[[i]]@ID
}))
Now both point_sf and world_sf are sf objects. We can use the st_within function to examine which points are within which polygons.
# Use st_within
result <- st_within(point_sf, world_sf, sparse = FALSE)
# Calculate the total count of each polygon
# Store the result as a new column "Count" in world_sf
world_sf <- world_sf %>%
mutate(Count = apply(result, 2, sum))
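To make the counting step concrete, here is a toy sketch in base R. The logical matrix is a hypothetical stand-in for st_within(point_sf, world_sf, sparse = FALSE): one row per point, one column per polygon. colSums() gives the same counts as apply(result, 2, sum), and the sparse form (the default sparse = TRUE, one list element per point) can be tabulated instead, which avoids building a large dense matrix when there are many points.

```r
# Hypothetical dense result: 3 points x 3 polygons, TRUE where a point lies in a polygon
result <- matrix(c(TRUE,  FALSE, FALSE,
                   TRUE,  FALSE, FALSE,
                   FALSE, FALSE, TRUE),
                 nrow = 3, byrow = TRUE)

# Count points per polygon (column) from the dense matrix
count_apply   <- apply(result, 2, sum)
count_colsums <- colSums(result)

# Hypothetical sparse equivalent: for each point, the index of the polygon containing it
within_sparse <- list(1L, 1L, 3L)
# tabulate() counts how often each polygon index occurs; nbins keeps empty polygons as 0
count_sparse <- tabulate(unlist(within_sparse), nbins = 3)
```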
The total count for each polygon is now in the Count column of world_sf. To plot it the way the OP did, we can get the world data frame with the map_data function and then merge world_data with world_df (world_sf without its geometry column).
# Convert world_sf to a data frame world_df
world_df <- world_sf
st_geometry(world_df) <- NULL
# Get world data frame
world_data <- map_data("world")
# Merge world_data and world_df
world_data2 <- world_data %>%
left_join(world_df, by = c("region"))
Now we are ready to plot the data. The following code is the same as the OP's ggplot code except that the input data is now world_data2 and fill = Count.
ggplot() +
geom_polygon(data = world_data2, aes(x = long, y = lat, group = group, fill = Count)) +
coord_fixed(1.3)

Related

How to create State & district level map using GADM and ggplot?

I am using COVID data and looking to plot state- and district-level Indian data on a map.
I have the state and district names for India along with the case counts, but I do not have the latitude/longitude data needed to map them.
I came across this SO post: How to map an Indian state with districts in r?
and tried raster::getData("GADM", country = "India", level = 2) %>% as_tibble(), but this doesn't work as the result doesn't have lat, lon, shapefile, etc.
library(raster)
library(rgdal)
library(rgeos)
state_level_map <- raster::getData("GADM", country = "India", level = 1) %>%
as_tibble() %>%
filter(NAME_1 == "Rajasthan") %>%
fortify()
ggplot() +
geom_map(data= state_level_map, map = state_level_map,
aes(x = long, y = lat, map_id = id, group = group))
I am new to spatial data and maps and not sure how to proceed in this situation. Is it possible to get lat, lon, shapefiles, etc. for states/districts by name from any R package, or is the only way to look up the coordinates manually?
Appreciate any help.
You were almost there. Use sf for that.
library(raster)
library(sf)
library(dplyr)
library(ggplot2)
# note: recent raster versions deprecate getData() in favour of the geodata package
state_level_map <- raster::getData("GADM", country = "India", level = 1) %>%
st_as_sf() %>%
filter(NAME_1 == "Rajasthan")
ggplot() +
geom_sf(data = state_level_map)
You can then use aes() to change the aesthetics of the ggplot using variables, as you normally would.
sf uses a data-frame-like structure that stores both the attribute data and the geometries in a single, easy-to-use data frame; just have a look at print(state_level_map). That means you can join your own data by state or district name (use level = 2 for districts) to augment your attributes and visualize them through aes(color = yourjoinedvar).
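A minimal sketch of that join, assuming only sf: the two squares stand in for state_level_map (a hypothetical toy, not GADM data) and the case counts are made up. merge() on an sf object keeps the geometry column, and all.x = TRUE keeps polygons that have no matching row in your data.

```r
library(sf)

# Hypothetical helper: a unit square whose lower-left corner is at (x0, 0)
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}

# Hypothetical stand-in for state_level_map: polygons with a NAME_1 attribute column
states <- st_sf(NAME_1 = c("Rajasthan", "Gujarat"), geometry = st_sfc(sq(0), sq(1)))

# Hypothetical case counts keyed by the same names
cases <- data.frame(NAME_1 = c("Rajasthan", "Gujarat"), Cases = c(120, 80))

# Join attributes by name; the result is still an sf object with geometries
states_cases <- merge(states, cases, by = "NAME_1", all.x = TRUE)
```

The joined object can then be plotted with ggplot(states_cases) + geom_sf(aes(fill = Cases)). The names must match exactly, so spelling differences between your data and GADM are the usual thing to check.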

Spatial Visualization using country names - R

I have a data frame with several columns, and I'd like to visualize some of this information.
I want to display the information on a world map, like this:
In my data frame I have a column with the country names, but I don't have latitude/longitude information.
How can I make this plot using only the country names?
Many options. One is the rworldmap package.
library(rworldmap)
We need some data to map.
COVID <- read.csv("https://opendata.ecdc.europa.eu/covid19/casedistribution/csv", na.strings = "", fileEncoding = "UTF-8-BOM")
Aggregate to get the total cases.
library(dplyr)
CASES <- COVID %>% group_by(countriesAndTerritories) %>%
summarise(`Total cases` = sum(cases)) %>%
mutate(countriesAndTerritories=gsub("_", " ", countriesAndTerritories))
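For readers not using dplyr, the same aggregation can be sketched in base R with aggregate(); the rows below are hypothetical stand-ins for the ECDC file. One difference: the formula interface drops rows with NA cases by default, whereas sum(cases) inside summarise() would return NA for such a country.

```r
# Hypothetical rows in the shape of the ECDC file: one row per country per day
covid_toy <- data.frame(
  countriesAndTerritories = c("United_Kingdom", "United_Kingdom", "France"),
  cases = c(10, 5, 7)
)

# Sum cases per country (result is sorted by country name)
cases_toy <- aggregate(cases ~ countriesAndTerritories, data = covid_toy, FUN = sum)
names(cases_toy)[2] <- "Total cases"

# Replace underscores so the names look like the map's country names
cases_toy$countriesAndTerritories <- gsub("_", " ", cases_toy$countriesAndTerritories)
```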
If you've already got your data, then you can start from here. Just two steps.
Step 1.
Join the map with your own data using "NAME" for joinCode and the name of the variable in your data that represents the country name for nameJoinColumn.
COVID.map <- joinCountryData2Map(CASES, joinCode = "NAME", nameJoinColumn = "countriesAndTerritories")
Step 2. Plot this object.
par(mar=c(0,0,1,0))
mapCountryData(COVID.map, nameColumnToPlot="Total cases")
It's not a particularly useful map because the data are highly skewed, but you can see how easy it is. The most difficult part is ensuring that your country names match those used by the package. You can see these with:
countryRegions$ADMIN
[1] "Afghanistan" "Akrotiri Sovereign Base Area" "Aland"
[4] "Albania" "Algeria" "American Samoa"
There's also a country synonym database:
countrySynonyms
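Before joining, a quick base R check can list which of your country names have no exact match in the map's names. The two vectors below are hypothetical; in practice you would compare your own name column against something like countryRegions$ADMIN.

```r
# Hypothetical names from your data and from the map
my_names  <- c("France", "United States of America", "Cote dIvoire")
map_names <- c("France", "United States of America", "Ivory Coast", "Germany")

# Names in your data that the map does not contain; recode these before joining
unmatched <- setdiff(my_names, map_names)
```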
A ggplot version:
library(ggplot2)
library(scales)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)
world <- ne_countries(scale = "medium", returnclass = "sf")
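# all.x = TRUE keeps countries with no case data (they are drawn in the NA colour)
COVID.world <- merge(world, CASES, by.x = "admin", by.y = "countriesAndTerritories", all.x = TRUE)
ggplot(data = COVID.world) +
geom_sf(aes(fill = `Total cases`)) +
scale_fill_gradient(labels = comma) +
theme_void()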

Putting Values on a County Map in R

I am using an excel sheet for data. One column has FIPS numbers for GA counties and the other is labeled Count with numbers 1 - 5. I have made a map with these values using the following code:
library(usmap)
library(ggplot2)
library(rio)
carrierdata <- import("GA Info.xlsx")
plot_usmap( data = carrierdata, values = "Count", "counties", include = c("GA"), color="black") +
labs(title="Georgia")+
scale_fill_continuous(low = "#56B1F7", high = "#132B43", name="Count", label=scales::comma)+
theme(plot.background=element_rect(), legend.position="right")
I've included the picture of the map I get and a sample of the data I am using. Can anyone help me put the actual Count numbers on each county?
Thanks!
The usmap package is a good source for county maps, but its data are data frames of x, y coordinates of the county outlines, whereas you need the numbers plotted at the center of each county. The package doesn't seem to contain the center coordinates for each county.
Although it's a bit of a pain, it is worth converting the map into a formal sf data frame format to give better plotting options, including the calculation of the centroid for each county. First, we'll load the necessary packages, get the Georgia data and convert it to sf format:
library(usmap)
library(sf)
library(ggplot2)
d <- us_map("counties")
d <- d[d$abbr == "GA",]
# split by FIPS code so each polygon is labelled with its own code
GAc <- lapply(split(d, d$fips), function(x) st_polygon(list(cbind(x$x, x$y))))
GA <- st_sfc(GAc, crs = usmap_crs()@projargs)
GA <- st_sf(data.frame(fips = names(GAc), county = d$county[match(names(GAc), d$fips)], geometry = GA))
Now, obviously I don't have your numeric data, so I'll have to make some up, equivalent to the data you are importing from Excel. I'll assume your own carrierdata has a column named "fips" and another called "values":
set.seed(69)
carrierdata <- data.frame(fips = GA$fips, values = sample(5, nrow(GA), TRUE))
So now we left_join our imported data to the GA county data:
GA <- dplyr::left_join(GA, carrierdata, by = "fips")
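If your real carrierdata comes from Excel, the FIPS column may be imported as numbers while the map's fips column holds character codes such as "13001". dplyr's left_join() will not join a character column to a numeric one, and numeric codes also lose leading zeros for states like Alabama. A small sketch of normalising them (the values are hypothetical):

```r
# Hypothetical FIPS codes as a spreadsheet import might return them: numbers,
# so the leading zero of Alabama's 01001 is lost
fips_excel <- c(1001, 13001)

# Rebuild 5-character codes with leading zeros to match the map's character fips
fips_fixed <- sprintf("%05d", as.integer(fips_excel))
```

Applied to your data, that would be carrierdata$fips <- sprintf("%05d", as.integer(carrierdata$fips)) before the join.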
And we can calculate the center point for each county:
GA$centroids <- st_centroid(GA$geometry)
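For intuition about what st_centroid() computes for a county outline, here is a base R sketch of the area-weighted centroid of a single simple polygon using the shoelace formula. polygon_centroid is a hypothetical helper for illustration only; st_centroid() also handles multi-part geometries, which this sketch does not.

```r
# Hypothetical helper: centroid of one simple polygon from its outline vertices (x, y).
# Works whether or not the last vertex repeats the first (the closing edge then adds 0).
polygon_centroid <- function(x, y) {
  # Pair each vertex with the next one, wrapping around to close the ring
  x2 <- c(x[-1], x[1])
  y2 <- c(y[-1], y[1])
  cross <- x * y2 - x2 * y          # per-edge cross products
  a <- sum(cross) / 2               # signed area
  c(x = sum((x + x2) * cross) / (6 * a),
    y = sum((y + y2) * cross) / (6 * a))
}

# A 2 x 2 square and a right triangle, both with centroid (1, 1)
square_c   <- polygon_centroid(c(0, 2, 2, 0), c(0, 0, 2, 2))
triangle_c <- polygon_centroid(c(0, 3, 0), c(0, 0, 3))
```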
All that's left now is to plot the result:
ggplot(GA) +
geom_sf(aes(fill = values)) +
geom_sf_text(aes(label = values, geometry = centroids), colour = "white")

Keep getting NAs when I run Over() function on Points(Lat,Lon) on shapefile polygons

I am not sure why I keep getting NA whenever I run the over() function with latitude/longitude points on the polygon from a shapefile. Please note that this is my first time doing spatial analysis; I have done my research and tried to replicate things, but didn't succeed. I need the points that are outside of the polygon to be NA, so I can focus on the real data.
I read these sources since they relate to my problem, but I can't work it out:
sp::over() for point in polygon analysis
https://gis.stackexchange.com/questions/133625/checking-if-points-fall-within-polygon-shapefile
https://gis.stackexchange.com/questions/278723/r-error-in-checking-point-inside-polygon
Here is my code chunk
library(sp)
library(rgdal)
library(readr)
gainsville_df <- read_csv("311_Service_Requests__myGNV_.csv")
gnv <- readOGR("~\\Downloads\\GIS_cgbound", layer = "cgbound")
gnv_latlon <- spTransform(gnv, CRS("+proj=longlat +ellps=WGS84 +datum=WGS84"))
gnv_raw <- data.frame(Longitude= gainsville_df$Longitude, Latitude= gainsville_df$Latitude)
coordinates(gnv_raw) <- ~Longitude + Latitude
proj4string(gnv_raw) <- proj4string(gnv)
over(gnv_raw, as(gnv,"SpatialLinesDataFrame"))
# Yields:
# FID_cgboun Id Perimeter Area Acres Hectares Shape_Leng
#1 NA NA NA NA NA NA NA
# Desired output:
# I should see which Gainesville latitudes and longitudes are within the shapefile
# polygon, so I can drop the outliers that have NA. According to this, none of my lat/lon points
# are inside the polygon.
The datafiles are here:
Shapefile: https://github.com/THsTestingGround/SO_readOGR_quest/tree/master/GIS_cgbound
reading csv file: https://github.com/THsTestingGround/SO_readOGR_quest/blob/master/311_Service_Requests__myGNV_.csv
I would appreciate it if someone could help me out.
I realized that your point data stores its coordinates as text in the Location column, e.g. POINT (-82.34323174 29.67058748), which is the WKT form that sf uses for point geometries. Hence, I reconstructed your data first and assigned a projection (CRS) as well.
library(tidyverse)
library(sf)
library(RCurl)
url <- getURL("https://raw.githubusercontent.com/THsTestingGround/SO_readOGR_quest/master/311_Service_Requests__myGNV_.csv")
mydf <- read_csv(url) %>%
mutate(Location = gsub(x = Location, pattern = "POINT \\(|\\)", replacement = "")) %>%
# convert = TRUE turns the split text into numeric lon/lat columns
separate(col = "Location", into = c("lon", "lat"), sep = " ", convert = TRUE) %>%
st_as_sf(coords = c("lon", "lat")) %>%
st_set_crs(4326)
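The gsub()/separate() steps boil down to stripping the WKT wrapper and splitting on the space. A base R sketch on hypothetical Location values (sf can also parse such WKT strings directly with st_as_sfc()):

```r
# Hypothetical Location values in the same WKT text form as the CSV
location <- c("POINT (-82.34323174 29.67058748)", "POINT (-82.3 29.6)")

# Remove "POINT (" and ")" and split the remaining "lon lat" text on the space
coords <- do.call(rbind, strsplit(gsub("POINT \\(|\\)", "", location), " "))

# Convert the character pieces to numbers
lon <- as.numeric(coords[, 1])
lat <- as.numeric(coords[, 2])
```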
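I imported your shapefile with the sf package to match your data (mydf in this demonstration), which is now an sf object. When I imported it, I found that the shapefile contains LINESTRING geometries, not polygons. I believe this is why over() did not work: a point can lie inside the area enclosed by a line, but it practically never lies on the line itself. So I created polygons from the lines; specifically, I merged all seven polygons into one. (Note also that in your code the lon/lat points were given gnv's original CRS and overlaid on gnv rather than gnv_latlon, which may also prevent any matches.)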
mypoly <- st_read("cgbound.shp") %>%
st_transform(crs = 4326) %>%
st_polygonize() %>%
st_union()
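To illustrate why the LINESTRING geometry matters, here is a toy sketch assuming only sf: a square drawn as a closed line, and the same square after st_polygonize(). A point in the middle intersects the polygon but not the line, which is consistent with over() returning NA against line geometries.

```r
library(sf)

# A hypothetical square boundary stored as a closed LINESTRING (no CRS, planar)
ring <- st_sfc(st_linestring(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))

# st_polygonize() returns a GEOMETRYCOLLECTION; extract the POLYGON part
poly <- st_collection_extract(st_polygonize(ring), "POLYGON")

# A point in the middle of the square
pt <- st_sfc(st_point(c(0.5, 0.5)))

on_line <- lengths(st_intersects(pt, ring)) > 0   # the point is not on the boundary line
in_poly <- lengths(st_intersects(pt, poly)) > 0   # but it is inside the polygon
```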
Let's check what your data points and polygon look like. Some of your data points clearly fall outside the polygon.
ggplot() +
geom_sf(data = mypoly) +
geom_point(data = mydf, aes(x = Longitude, y = Latitude))
You said, "I need some points which are outside of the polygon to be NA." So I created a new column, check, in mydf using st_intersects(). If a data point is inside the polygon, check is TRUE; otherwise it is FALSE.
mutate(mydf,
check = as.vector(st_intersects(x = mydf, y = mypoly, sparse = FALSE))) -> result
Finally, check how the data points were classified.
ggplot() +
geom_sf(data = mypoly) +
geom_point(data = result, aes(x = Longitude, y = Latitude, color = check))
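If you want to use over() together with this sf approach, you can do the following. Here over() returns the index of the polygon each point falls in, or NA for points outside it.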
mutate(mydf,
check = over(as(mydf, "Spatial"), as(mypoly, "Spatial")))
The last step is to subset the data:
filter(result, check == TRUE)
THE SIMPLEST WAY
I've demonstrated how things work with this sf approach, but the following is actually all you need. st_filter() keeps only the data points that fall within mypoly, so points outside are removed. If you don't need NAs for those points, this is much easier.
st_filter(x = mydf, y = mypoly, .predicate = st_intersects) -> result2
ggplot() +
geom_sf(data = mypoly) +
geom_sf(data = result2)

Mapping coordinates to the world map and labeling them

I have a data frame with three columns: city_name, longitude, latitude. Using ggplot, I am trying to plot the data with longitude and latitude as the coordinates of each city, and to label each point with the city name. Unfortunately the scale isn't quite right, so the points are not mapped to the right locations.
Example data for dataframe:
city_name <- c("Berlin", "Brussels", "Paris")
longitude <- c("13.405", "4.3517", "2.3522")
latitude <- c("52.52", "50.8503", "48.8566")
df <- data.frame(city_name, longitude, latitude)
I am using ggplot2.
mapWorld <- borders("world", colour="gray50", fill="gray50") # create a layer of borders
ggplot(df, aes(x= longitude, y= latitude, label=Name))+
geom_point() +geom_text(aes(label=city_name),hjust=0, vjust=0) + mapWorld
Current result:
https://imgur.com/K3RvqTm
Expected result would be mapping the coordinates to their correct location.
Thank you all in advance!
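The issue seems to stem from the format of your latitude and longitude data: because each value is quoted, the columns are character vectors, which ggplot treats as discrete categories rather than positions on a continuous axis. Write the coordinates without quotes so they are numeric.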
I also recommend leaflet for a wider array of mapping functionality. The code below worked for me:
longitude <- c(13.405, 4.3517, 2.3522)
latitude <- c(52.52, 50.8503, 48.8566)
df <- data.frame(city_name, longitude, latitude)
library(leaflet)
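# Not needed here (the vectors are already numeric), but converts the columns if they were read in as text
df$longitude <- as.numeric(df$longitude)
df$latitude <- as.numeric(df$latitude)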
leaflet() %>%
addTiles()%>%
addMarkers(data=df,lng=~longitude,lat=~latitude) %>%
setView(10,50,zoom=4)
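If the coordinates arrive as text (for example from a CSV or the quoted vectors in the question), as.numeric() converts character vectors directly. One pitfall worth knowing: if the column is a factor (data.frame() converted strings to factors by default before R 4.0.0), as.numeric() returns the internal level codes, so convert through character first. A small base R sketch with the question's values:

```r
# Coordinates stored as text, as in the question's example data
longitude <- c("13.405", "4.3517", "2.3522")

# Character vectors convert directly
lon_num <- as.numeric(longitude)

# A factor version: as.numeric() would give level codes, not the coordinates
lon_fac <- factor(longitude)
lon_codes <- as.numeric(lon_fac)

# Converting through character recovers the actual values
lon_from_factor <- as.numeric(as.character(lon_fac))
```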
On top of the solution already provided, you might find it helpful to look into the sf package which, in my opinion, makes spatial data much more pleasant to work with. For example you can do:
library(ggrepel)
library(sf)
library(ggplot2)
mapWorld <- borders("world", colour="gray50", fill="gray50") # create a layer of borders
# define data frame ensuring lat and lon are numeric vectors
df <- data.frame(city_name = c("Berlin", "Brussels", "Paris"),
longitude = c(13.405, 4.3517, 2.3522),
latitude = c(52.52, 50.8503, 48.8566))
# convert into an sf object, letting it know the columns we want to use for X and Y
# setting crs = 4326 for lon/lat data and remove = F to stop those columns from being dropped
df_sf <- st_as_sf(df, coords=c('longitude', 'latitude'), crs = 4326, remove = F)
# it plays nicely with ggplot via the 'geom_sf' geom
ggplot(df_sf)+
mapWorld +
geom_sf() +
geom_text_repel(aes(x=longitude, y=latitude,label=city_name))
You'll notice that sf objects come with their own geometry column, which ggplot recognises and handles well. One thing to note: be careful with your layer ordering. In your original code mapWorld is added as the last layer, so it is drawn on top of everything else and may cover your points!
