I have a dataframe with several columns containing information I'd like to visualize.
I want to display the information on a world map, like this:
My dataframe has a column with the country names, but I don't have latitude/longitude information.
How can I make this plot using only the country names?
Many options. One is the rworldmap package.
library(rworldmap)
We need some data to map.
COVID <- read.csv("https://opendata.ecdc.europa.eu/covid19/casedistribution/csv", na.strings = "", fileEncoding = "UTF-8-BOM")
Aggregate to get the total cases.
library(dplyr)
CASES <- COVID %>% group_by(countriesAndTerritories) %>%
summarise(`Total cases` = sum(cases)) %>%
mutate(countriesAndTerritories=gsub("_", " ", countriesAndTerritories))
If you've already got your data, then you can start from here. Just two steps.
Step 1.
Join the map with your own data using "NAME" for joinCode and the name of the variable in your data that represents the country name for nameJoinColumn.
COVID.map <- joinCountryData2Map(CASES, joinCode = "NAME", nameJoinColumn = "countriesAndTerritories")
Step 2. Plot this object.
par(mar=c(0,0,1,0))
mapCountryData(COVID.map, nameColumnToPlot="Total cases")
It's not a particularly useful map because the data are highly skewed. But you can see how easy it is. The most difficult part is to ensure your country names match those from the package. You can see these from:
countryRegions$ADMIN
[1] "Afghanistan" "Akrotiri Sovereign Base Area" "Aland"
[4] "Albania" "Algeria" "American Samoa"
There's also a country synonym data base:
countrySynonyms
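Before joining, it can help to list which of your country names have no counterpart in the package's names. The following is only a base R sketch: `reference_names` is a hypothetical stand-in for countryRegions$ADMIN, and `my_names` is made-up input.

```r
# Hypothetical stand-in for the package's reference names (e.g. countryRegions$ADMIN)
reference_names <- c("Afghanistan", "Albania", "Algeria", "United States of America")

# Made-up country names as they might appear in your own data
my_names <- c("Afghanistan", "Algeria", "United_States_of_America", "Albania")

# Same clean-up as in the aggregation step: underscores become spaces
my_names_clean <- gsub("_", " ", my_names)

# Names without a match would be dropped from the map unless fixed by hand
unmatched_before <- setdiff(my_names, reference_names)
unmatched_after <- setdiff(my_names_clean, reference_names)
print(unmatched_before)
print(unmatched_after)
```

Anything left in `unmatched_after` would need to be renamed manually or looked up in the synonym table before the join.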
A ggplot version:
library(ggplot2)
library(scales)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)
world <- ne_countries(scale = "medium", returnclass = "sf")
COVID.world <- merge(world, CASES, by.x="admin", by.y="countriesAndTerritories")
ggplot(data = COVID.world) +
  # the column created by summarise() is called `Total cases`, so it needs backticks
  geom_sf(aes(fill = `Total cases`)) +
  scale_fill_gradient(labels = comma) +
  theme_void()
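Because the totals are highly skewed, a continuous fill tends to leave most countries in nearly the same shade. One common remedy, not part of the answer above, is to bin the values into quantile classes (or to log-transform them) before mapping. A base R sketch with made-up numbers; `cases` and `classes` are illustrative names:

```r
# Made-up, strongly skewed case totals
cases <- c(12, 85, 430, 2100, 15000, 98000, 1200000)

# Quartile break points of the data
breaks <- quantile(cases, probs = seq(0, 1, by = 0.25))

# Assign each value to a quartile class; include.lowest keeps the minimum
classes <- cut(cases, breaks = breaks, include.lowest = TRUE, dig.lab = 10)

print(table(classes))
```

A factor like `classes` could then be mapped to the fill with a discrete scale instead of the continuous gradient.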
I am using COVID data and want to plot state- and district-level Indian data on a map.
I have the state and district names of India along with case counts, but I do not have the lat/long for them.
I came across this SO post: How to map an Indian state with districts in r?
and tried raster::getData("GADM", country = "India", level = 2) %>% as_tibble(), but this doesn't work because the result has no lat, lon, shapefile, etc.
library(raster)
library(rgdal)
library(rgeos)
state_level_map <- raster::getData("GADM", country = "India", level = 1) %>%
as_tibble() %>%
filter(NAME_1 == "Rajasthan") %>%
fortify()
ggplot() +
geom_map(data= state_level_map, map = state_level_map,
aes(x = long, y = lat, map_id = id, group = group))
I am new to spatial data and maps and am not sure how to proceed in this situation. Is it possible to get lat, lon, shapefiles, etc. for state/district names from any R package, or is the only way to look up the lat/lon manually?
Appreciate any help.
You were almost there. Use sf for that.
library(raster)
library(sf)
library(rgeos)
library(dplyr)
library(ggplot2) # needed for ggplot() and geom_sf() below
state_level_map <- raster::getData("GADM", country = "India", level = 1) %>%
st_as_sf() %>%
filter(NAME_1 == "Rajasthan")
ggplot() +
geom_sf(data = state_level_map)
You can then use aes() to set the aesthetics of the ggplot as you normally would, using variables.
sf uses a dataframe-like structure that holds both attribute data and geometries in a single, easy-to-use dataframe; just have a look at print(state_level_map). That means you can join data by district name to augment your attributes and visualize them through aes(color = yourjoinedvar).
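The join described above can be sketched as follows. The two square "districts" and the case counts are made up for illustration; the NAME_2 column only mimics the naming used in GADM level 2 data.

```r
library(sf)

# Helper that builds a unit square starting at x0 (toy geometry, not real GADM data)
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}

# Toy sf object: one row per district, attributes plus geometry
districts_sf <- st_sf(
  NAME_2 = c("Ajmer", "Jaipur"),
  geometry = st_sfc(square(0), square(1), crs = 4326)
)

# Made-up case counts keyed by district name
cases <- data.frame(NAME_2 = c("Jaipur", "Ajmer"), cases = c(120, 45))

# merge() keeps the sf class when the sf object comes first
districts_cases <- merge(districts_sf, cases, by = "NAME_2", all.x = TRUE)
print(districts_cases)

# With ggplot2 loaded, the joined column can then drive the fill:
# ggplot(districts_cases) + geom_sf(aes(fill = cases))
```

For polygons, mapping the joined variable to `fill` rather than `color` is usually what gives a choropleth; `color` only affects the outlines.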
I am using usmap and ggplot to plot population on a map. My data has two columns - population and zipcodes.
Question: How can I display data at city level using the same libraries, or are there other libraries that can do the job?
Question: I am plotting a California map and I want to zoom in on LA county and nearby counties.
The code below gives me a nice California map with population as the fill color.
library(usmap)
library(ggplot2)
usmap::plot_usmap("counties",
include = ("CA") )
plot_usmap(data = data, values = "pop_2015", include = c("CA"), color = "grey") +
theme(legend.position = "right")+scale_fill_gradient(trans = "log10")
The tigris package makes downloading zip code tabulation areas fairly simple. You can download as a simple features dataframe so joining your data by zip code using dplyr functions is fairly easy. Here is a quick example:
library(tigris)
library(dplyr)
library(ggplot2)
df <- zctas(cb = TRUE,
starts_with = c("778"),
class = "sf")
## generate some sample data that
## can be joined to the downloaded data
sample_data <- tibble(zips = df$ZCTA5CE10,
                      values = rnorm(n = nrow(df))) # one random value per ZCTA
## left join the sample data to the downloaded data
df <- df %>%
left_join(sample_data,
by = c("ZCTA5CE10" = "zips"))
## plot something
ggplot(df) +
geom_sf(aes(fill = values))
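For the zoom question, one option that stays within usmap is to pass county FIPS codes to `include`. This is a sketch rather than a tested recipe for the asker's data: the choice of "nearby" counties is mine, and adding `data =`/`values =` would assume a county-level data frame with a `fips` column, which zip-code data does not have without aggregation.

```r
library(usmap)
library(ggplot2)

# County FIPS codes: Los Angeles plus a few neighbouring counties (my selection)
la_area <- c("06037", # Los Angeles
             "06059", # Orange
             "06111", # Ventura
             "06071", # San Bernardino
             "06065") # Riverside

# Restrict the county map to those FIPS codes
p <- plot_usmap(regions = "counties", include = la_area, color = "grey")
print(p)
```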
Rephrasing the question: I am preparing a report, and one part of it is a spatial visualization.
I have two datasets. The first (Scores) contains countries with their scores. The second (Locations) contains the exact longitude and latitude of specific locations inside those countries. Here are some examples:
Scores = data.frame( Country = c("Lebanon","UK","Chile"), Score =c(1,3.5,5))
Locations = data.frame(Location_Name = c("London Bridge", "US Embassy in Lebanon" , "Embassy of Peru in Santiago"),
LONG = c(-0.087749, 35.596614, -70.618236),
LAT = c(51.507911, 33.933586, -33.423285))
What I want to achieve is a filled map of the world (my dataset has every country), with each country colored inside its boundaries by its score (Scores$Score) on a continuous scale.
On top of that, I would like to add pins, bubbles or some other marker for the locations in the Locations dataframe.
So my desired outcome would be combination of this view:
and this view:
Ideally, I would also like to be able to draw a 2 km radius around each location in the Locations data.frame.
I know how to do these separately but can't seem to combine them in one clean map.
I would really appreciate any help or tips on this; I have been stuck on it for a whole day.
As suggested by @agila, you can use the tmap package.
First merge your Scores data with World so you can fill countries based on the Scores data. Note that your Country column must match the name column in World exactly when merging.
You will need st_as_sf from the sf package to turn your Locations into an sf object that can be added to the map.
tm_dots can show points. An alternative for bubbles is tm_bubbles.
library(tmap)
library(sf)
data(World)
Scores = data.frame(Country = factor(c("Mexico","Brazil","Chile"), levels = levels(World$name)),
Score =c(1,3.5,5))
Locations = data.frame(Location_Name = c("Rio de Janeiro", "US Embassy in Lebanon" , "Embassy of Peru in Santiago"),
LONG = c(-43.196388, 35.596614, -70.618236),
LAT = c(-22.908333, 33.933586, -33.423285))
map_data <- merge(World, Scores, by.x = "name", by.y = "Country", all = TRUE)
# declare the coordinates as lon/lat (WGS84) so they can be overlaid on the map
locations_sf <- st_as_sf(Locations, coords = c('LONG', 'LAT'), crs = 4326)
tm_shape(map_data) +
tm_polygons("Score", palette = "-Blues") +
tm_shape(locations_sf) +
tm_dots(size = .1)
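The 2 km radius from the question is not covered above. One way to sketch it with sf is to project a point to a metric coordinate system, buffer it by 2000 m, and transform back. The example below uses a single location and EPSG:27700 (British National Grid), which only suits points in Great Britain; for locations spread across the world, a suitable projected CRS would have to be chosen per location, which is an assumption left open here.

```r
library(sf)

# One example point (London Bridge, from the question) in lon/lat WGS84
pt <- st_as_sf(data.frame(Location_Name = "London Bridge",
                          LONG = -0.087749, LAT = 51.507911),
               coords = c("LONG", "LAT"), crs = 4326)

# Project to a metric CRS so the buffer distance is in metres
pt_m <- st_transform(pt, 27700)

# 2000 m buffer around the point
buffer_m <- st_buffer(pt_m, dist = 2000)

# Back to lon/lat so the circle can be overlaid on the world map
buffer_2km <- st_transform(buffer_m, 4326)

# Area in square metres, roughly pi * 2000^2
print(st_area(buffer_m))
```

The resulting polygon could then be added as another layer, for example with tm_shape(buffer_2km) + tm_borders(), although at world scale a 2 km circle will be far too small to see without zooming in.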
I have a shapefile of school districts in Texas and am trying to use ggplot2 to highlight 10 in particular. I've tinkered with it and gotten everything set up, but when I spot-checked it I realized the 10 districts highlighted are not in fact the ones I want highlighted.
The shapefile can be downloaded from this link to the Texas Education Agency Public Open Data Site.
#install.packages(c("ggplot2", "rgdal"))
library(ggplot2)
library(rgdal)
#rm(list=ls())
#setwd("path")
# read shapefile
tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))
# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")
# extract from shapefile data just the name and ID, then subset to only the districts of interest
dist_info <- data.frame(cbind(as.character(tex@data$NAME2), as.character(tex@data$FID)), stringsAsFactors=FALSE)
names(dist_info) <- c("name", "id")
dist_info <- dist_info[dist_info$name %in% districts, ]
# turn shapefile into df
tex_df <- fortify(tex)
# create dummy fill var for if the district is one to be highlighted
tex_df$yes <- as.factor(ifelse(tex_df$id %in% dist_info$id, 1, 0))
# plot the graph
ggplot(data=tex_df) +
geom_polygon(aes(x=long, y=lat, group=group, fill=yes), color="#CCCCCC") +
scale_fill_manual(values=cols) +
theme_void() +
theme(legend.position = "none")
As you'll see, when the plot gets created it looks like it's done exactly what I want. The problem is, the ten districts highlighted are not the ones in the districts vector above. I've re-run everything clean numerous times, double-checked that I wasn't having a factor/character conversion issue, and double-checked within the web data explorer that the IDs I get from the shapefile are indeed the ones that should match my list of names. I really have no idea where this issue could be coming from.
This is my first time working with shapefiles and rgdal, so if I had to guess, there's something simple about the structure that I don't understand, and hopefully one of you can quickly point it out for me. Thanks!
Here's the output:
Alternative 1
Pass the argument region = "NAME2" to the fortify function; the id column will then contain your district names. Then create your dummy fill variable based on that column.
I am not familiar with Texas districts, but I assume the result is right.
tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))
# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")
# turn shapefile into df
tex_df <- fortify(tex, region = "NAME2")
# create dummy fill var for if the district is one to be highlighted
tex_df$yes <- as.factor(ifelse(tex_df$id %in% districts, 1, 0))
# plot the graph
ggplot(data=tex_df) +
geom_polygon(aes(x=long, y=lat, group=group, fill=yes), color="#CCCCCC") +
scale_fill_manual(values=cols) +
theme_void() +
theme(legend.position = "none")
Alternative 2
This alternative does not pass the region argument to the fortify function, which addresses seeellayewhy's problem implementing the previous alternative. We add two layers, so there is no need to create a dummy variable or merge any data frames.
tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))
# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")
# Subset the shape file into two
tex1 <- subset(tex, NAME2 %in% districts)
tex2 <- subset(tex, !(NAME2 %in% districts))
# Create two data frames
tex_df1 <- fortify(tex1)
tex_df2 <- fortify(tex2)
# Plot two geom_polygon layers, one for each data frame
# Fixed colours are set outside aes(), so no fill scale or legend is needed
ggplot() +
  # all other districts in grey
  geom_polygon(data = tex_df2,
               aes(x = long, y = lat, group = group),
               fill = cols[1]) +
  # districts of interest in blue
  geom_polygon(data = tex_df1,
               aes(x = long, y = lat, group = group),
               fill = cols[2], color = "#CCCCCC") +
  theme_void()
When trying to implement @mpalanco's solution of adding the "region" argument to the fortify() function, I got an error that I could not solve through numerous other stack posts (Error: isTRUE(gpclibPermitStatus()) is not TRUE). I also tried using broom::tidy(), which is the non-deprecated equivalent of fortify(), and had the same error.
Ultimately, I ended up implementing @luchanocho's solution from here. I don't like the fact that it uses seq() to generate the ID, because that does not necessarily preserve the proper order, but my case was simple enough that I could go through every district and confirm that the correct ones were highlighted.
My code is below. The output is the same as in @mpalanco's answer. Since he obviously got the right result and used something that isn't shaky the way the implemented solution is, I'm going to give him the answer, assuming it works. The solution below can be considered a workaround if others experience the same error I got.
#install.packages(c("ggplot2", "rgdal"))
library(ggplot2)
library(rgdal)
#rm(list=ls())
#setwd("path")
# read shapefile
tex <- readOGR(dsn = paste0(getwd(), "/Current_Districts/Current_Districts.shp"))
# colors to use and districts to highlight
cols<- c("#CCCCCC", "#003082")
districts <- c("Aldine", "Laredo", "Spring Branch", "United", "Donna", "Brownsville", "Houston", "Bryan", "Galena Park", "San Felipe-Del Rio Cons")
# convert shapefile to a df
tex_df <- fortify(tex)
# generate temp df with IDs to merge back in
names_df <- data.frame(tex@data$NAME2)
names(names_df) <- "NAME2"
names_df$id <- seq(0, nrow(names_df)-1) # this is the part I felt was sketchy
final <- merge(tex_df, names_df, by="id")
# dummy out districts of interest
final$yes <- as.factor(ifelse(final$NAME2 %in% districts, 1, 0))
ggplot(data=final) +
geom_polygon(aes(x=long, y=lat, group=group, fill=yes), color="#CCCCCC") +
scale_fill_manual(values=cols) +
theme_void() +
theme(legend.position = "none")
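A possibly less fragile variant of the seq() step is to take the ids from the row names of the attribute table instead of generating them. This rests on an assumption I have not verified against this shapefile: that the id column produced by fortify() without `region` equals the row names of tex@data, as is normally the case for objects read with readOGR(). The base R sketch below uses small stand-in data frames (`attr_df`, `vertices`) rather than the real objects.

```r
# Stand-in for tex@data: row names play the role of the polygon IDs
attr_df <- data.frame(NAME2 = c("Aldine", "Bryan", "Donna"),
                      row.names = c("0", "1", "2"))

# Stand-in for fortify(tex): several vertices per polygon id
vertices <- data.frame(id = c("2", "2", "0", "0", "1"),
                       long = c(-98.0, -98.1, -95.3, -95.4, -96.3),
                       lat = c(26.1, 26.2, 29.9, 30.0, 30.6))

# Build the lookup from the row names rather than from seq()
names_df <- data.frame(id = rownames(attr_df), NAME2 = attr_df$NAME2)

# Attach the district name to every vertex
final <- merge(vertices, names_df, by = "id")
print(final)
```

With the real objects this would correspond to names_df$id <- rownames(tex@data). Since merge() may reorder rows, sorting the result by the `order` column that fortify() returns before plotting seems a sensible extra precaution.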
I have a data frame consisting of multiple data points with specific geocoordinates (latitude and longitude). I'm looking to create a choropleth-style world map where geographical regions are shaded according to how many data points fall within the boundaries of the region.
Is there a simple way to accomplish what I'm trying to do in R, preferably using the "maps" package's world map and the "ggplot2" map-plotting functions?
Here is a minimal reproducible example of what I have:
library(ggplot2)
library(maps)
data <- data.frame(lat = 40.730610, lon = -73.935242)
ggplot() +
geom_polygon(data = map_data("world"), aes(x = long, y = lat, group = group, fill = group)) +
coord_fixed(1.3)
I've noticed that the fill parameter on plot item functions can be used to create a choropleth effect. Here, the fill parameter on the aes() function of the geom_polygon() function is used to create a choropleth where each group is color coded differently.
There are many ways to achieve this task. The general idea is to convert both the point data and the polygon data to spatial objects, and then count how many points fall within each polygon. We could do this with the sp package, which is widespread and well known in the R community, but I decided to use the sf package because sf is becoming the next-generation standard for spatial objects in R (https://cran.r-project.org/web/packages/sf/index.html). Knowing the usage and functionality of sf will probably be beneficial.
First, the OP provided an example point, but I decided to add more points so that we can see how to count the points and aggregate the data. To do so, I used the ggmap package to geocode some cities that I selected as examples.
# Load package
library(tidyverse)
library(ggmap)
library(maps)
library(maptools)
library(sf)
# Create point data as a data frame
point_data <- data.frame(lat = 40.730610, lon = -73.935242)
# Geocode a series of cities
city <- c("Detroit", "Seattle", "Toronto", "Denver", "Mexico City", "Paris", "New Orleans",
"Tokyo", "Osaka", "Beijing", "Canberra", "New York", "Istanbul", "New Delhi",
"London", "Taipei", "Seoul", "Manila", "Bangkok", "Lagos", "Chicago", "Shanghai")
point_data2 <- geocode(city)
# Combine OP's example and the geocoding result
point_data3 <- bind_rows(point_data, point_data2)
Next, I converted the point_data3 data frame to the sf object. I will also get the polygon data of the world using the maps package and convert it to an sf object.
# Convert to simple feature object
point_sf <- st_as_sf(point_data3, coords = c("lon", "lat"), crs = 4326)
# Get world map data
worldmap <- maps::map("world", fill = TRUE, plot = FALSE)
# Convert world to sp class
IDs <- sapply(strsplit(worldmap$names, ":"), "[", 1L)
world_sp <- map2SpatialPolygons(worldmap, IDs = IDs,
proj4string = CRS("+proj=longlat +datum=WGS84"))
# Convert world_sp to simple feature object
world_sf <- st_as_sf(world_sp)
# Add country ID
world_sf <- world_sf %>%
  mutate(region = map_chr(1:length(world_sp@polygons), function(i){
    world_sp@polygons[[i]]@ID
  }))
Now both point_sf and world_sf are sf objects. We can use the st_within function to examine which points are within which polygons.
# Use st_within
result <- st_within(point_sf, world_sf, sparse = FALSE)
# Calculate the total count of each polygon
# Store the result as a new column "Count" in world_sf
world_sf <- world_sf %>%
mutate(Count = apply(result, 2, sum))
The total count information is in the Count column of world_sf. We can get the world data frame as the OP did using the map_data function. We can then merge world_data and world_df.
# Convert world_sf to a data frame world_df
world_df <- world_sf
st_geometry(world_df) <- NULL
# Get world data frame
world_data <- map_data("world")
# Merge world_data and world_df
world_data2 <- world_data %>%
left_join(world_df, by = c("region"))
Now we are ready to plot the data. The following code is the same as the OP's ggplot code except that the input data is now world_data2 and fill = Count.
ggplot() +
geom_polygon(data = world_data2, aes(x = long, y = lat, group = group, fill = Count)) +
coord_fixed(1.3)
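The counting step can also be seen in isolation on toy data. The sketch below is a variation, not the code from the answer: it uses st_intersects() with lengths() instead of the dense st_within() matrix, and the two squares and four points are made up.

```r
library(sf)

# Helper that builds a unit square starting at x0 (toy geometry)
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}

# Two toy regions
polys <- st_sf(region = c("A", "B"),
               geometry = st_sfc(square(0), square(2)))

# Four toy points: two inside A, one inside B, one outside both
pts <- st_as_sf(data.frame(x = c(0.2, 0.8, 2.5, 5), y = c(0.5, 0.5, 0.5, 0.5)),
                coords = c("x", "y"))

# st_intersects() lists, for each polygon, the points it contains;
# lengths() turns that list into per-polygon counts
polys$Count <- lengths(st_intersects(polys, pts))
print(polys)
```

Because the result is already an sf object with a Count column, it could also be drawn directly with geom_sf(aes(fill = Count)) rather than being merged back into map_data("world").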