Difficulty in customising cartogram output in RStudio

I am able to produce a cartogram using cartogram::cartogram_cont(), but then have difficulty customising the styling.
I have used broom::tidy() and dplyr::left_join() to fortify the cartogram, but I suspect the tidy stage has interfered with the plotting order (plotOrder). If possible, I will include the output cartograms.
I'm attempting to replicate this type of output, but for my own locality. Please note that the dataset used for the weighting in cartogram_cont() is not particularly significant; it is just a proof of concept at this stage:
[R Graph Gallery][1]
[1]: https://www.r-graph-gallery.com/331-basic-cartogram/
Shapefile from: [Lle Shapefile Location][2]
[2]: http://lle.gov.wales/catalogue/item/LocalAuthorities/?lang=en
library(dplyr)
library(leaflet)
library(maptools)
library(cartogram)
library(devtools)
install_github("HanOostdijk/odataR" , build_vignettes = T)
library(odataR)
library(tidyr)
library(rgdal)
library(htmltools)
#Read in shapefile and transform shape
#dsn = folder name, layer = filename but drop the .shp
shapefile <- readOGR(dsn = "Wales Shapefile",
layer = "localauthoritiesPolygon") %>%
#Transform coordinate referencing system
spTransform(CRS("+init=epsg:4326"))
#Next step is to join an interesting dataset to the shapefile using dplyr, then pass this to the cartogram package to render.
#Gone for the teacher sickness dataset from Stats Wales. Noticed it's only up to 2017, wonder if they've stopped collecting.
teacher_sickness_data <- odataR_query('http://open.statswales.gov.wales/dataset/schw0001')
#Check values for join.
categories <- unique(teacher_sickness_data$Area_ItemName_ENG)
categories_shp <- shapefile@data$name_en
categories
categories_shp
#Teacher data has "All Welsh local authorities". Not contained in shapefile so remove.
UA_sickness_data <- teacher_sickness_data[-c(2, 4:6, 8, 9, 11:13, 15:17)] %>%
filter(Area_ItemName_ENG != "All Welsh local authorities")
#Perform join to shapefile
shapefile_1 <- shapefile %>%
merge(UA_sickness_data, by.x = "name_en", by.y = "Area_ItemName_ENG",
duplicateGeoms = TRUE)
#The Shiny app will allow a choice of inputs to achieve one row per polygon. However, to test
#the cartogram functions, perform some test filtering.
data_filtered <- UA_sickness_data %>%
filter(Year_ItemName_ENG == 2017) %>%
filter(Type_ItemName_ENG == "Full-time") %>%
filter(Variable_ItemName_ENG == "Total days of sick leave")
test_merge <- shapefile %>%
merge(data_filtered, by.x = "name_en", by.y = "Area_ItemName_ENG")
nc_pal <- colorNumeric(palette = "Reds",
domain = log(test_merge@data$Data))
m <-test_merge %>%
leaflet() %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addPolygons(weight = 1,
color = ~nc_pal(log(Data)),
label = ~name_en,
highlight = highlightOptions(weight = 3,
color = "crimson",
bringToFront = TRUE),
popup = ~ paste0(Variable_ItemName_ENG, "<br/>",
"<b/>",
Data))
m
wales_cart <- cartogram_cont(test_merge, "Data", itermax=5)
plot(wales_cart)
[![Wales_Cartogram][3]][3]
[3]: https://i.stack.imgur.com/2tsMC.png
library(tidyverse)
library(ggmap)
library(broom)
library(rgeos) #used for gBuffer
#A zero-width buffer repairs invalid polygon geometry so that tidy() can dissolve by the region of choice.
wales_cart_buffered <- gBuffer(wales_cart, byid=TRUE, width=0)
#tidy cartogram in order to pass to ggplot
spdf_fortified_wales <- tidy(wales_cart_buffered, region = "name_en")
#Now perform a join based on english UA names
spdf_fortified_wales_joined <- spdf_fortified_wales %>%
left_join(. , wales_cart@data, by=c("id"="name_en"))
ggplot() +
geom_polygon(data = spdf_fortified_wales_joined, aes(fill = Data, x = long, y = lat, group = "name_en") , size=0, alpha=0.9) +
coord_map() +
theme_void()
[![incorrect_ggplot][4]][4]
[4]: https://i.stack.imgur.com/as0Z4.png
Success Criteria: Polygons are rendered correctly distorted and colour scale reflects weighting variable.
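A likely culprit, offered as an assumption rather than a confirmed answer: `aes(group = "name_en")` maps a constant string, so ggplot2 treats every vertex as part of one single polygon and draws edges between unrelated regions. broom::tidy() already returns a `group` column (one value per polygon piece), which is what the `group` aesthetic normally maps to. The sketch below uses two hypothetical squares (`squares` and its values are made up) in the same long/lat/id/group layout and counts how many polygons ggplot2 builds in each case via `layer_data()`.

```r
library(ggplot2)

# Two hypothetical square regions in the long format returned by broom::tidy():
# one row per vertex, an `id` per region and a `group` per polygon piece
squares <- data.frame(
  long  = c(0, 1, 1, 0, 2, 3, 3, 2),
  lat   = c(0, 0, 1, 1, 0, 0, 1, 1),
  id    = rep(c("A", "B"), each = 4),
  group = rep(c("A.1", "B.1"), each = 4),
  Data  = rep(c(10, 20), each = 4)
)

# Mapping group to a constant string puts every vertex in the same polygon
p_constant <- ggplot(squares) +
  geom_polygon(aes(x = long, y = lat, group = "name_en", fill = Data))

# Mapping group to the tidy() column draws one polygon per piece
p_pieces <- ggplot(squares) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = Data)) +
  coord_equal()

# Number of distinct polygons ggplot2 builds in each case
n_constant <- length(unique(layer_data(p_constant)$group))
n_pieces   <- length(unique(layer_data(p_pieces)$group))
```

In the question's code, the equivalent change would be `aes(fill = Data, x = long, y = lat, group = group)` on `spdf_fortified_wales_joined`.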

Related

Projection issues after rasterize

I am having issues when plotting SST data after rasterizing the original data file. I eventually want to extract SST at a series of points for each day over a period of time, and after getting a lot of NAs from the extraction I noticed that the raster is shifted relative to the coastline. I suspect I messed up my projection/transformation, but everything seems to work.
Query to get 1 month of SST data as a tibble, from the heatwaveR documentation. According to ERDDAP, the CRS for this data is EPSG:4326. The resolution needed later also comes from this step.
library(rerddap)
library(ggplot2)
library(dplyr)
library(raster)
library(rasterVis)
library(viridis)
library(purrr)
library(sf)
SST_sub_dl <- function(time_df){
SST_data_tas <- griddap(x = "NOAA_DHW_Lon0360",
url = "https://coastwatch.pfeg.noaa.gov/erddap/",
time = c(time_df$start, time_df$end),
latitude = c(-44.5, -38.5),
longitude = c(142, 150),
store=disk(),
fields = "CRW_SST")$data%>%
mutate(time = as.Date(stringr::str_remove(time, "T00:00:00Z"))) %>%
dplyr::rename(t = time, temp = CRW_SST) %>%
select(lon, lat, t, temp) %>%
na.omit()
}
#Assign years desired
tas_time <- data.frame(date_index = 1:2,
start = as.Date(c("2011-01-01")),
end = as.Date(c("2011-01-31")))
# The time this takes will vary greatly based on connection speed
#takes me 26 seconds
system.time(
SST_data_tas <- tas_time %>%
group_by(date_index) %>%
group_modify(~SST_sub_dl(.x)) %>%
ungroup() %>%
select(lon, lat, t, temp))
Now get the Australia shapefile:
aus.gadm<-getData("GADM",country="AUS",level=1, path=tempdir())
tas.gadm<-filter(aus.gadm,NAME_1=="Tasmania")
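#clip function (gIntersection() comes from rgeos)
library(rgeos)
gClip <- function(shp, bb){
# inherits() is needed because class() of a matrix is c("matrix", "array") in R >= 4.0
if(inherits(bb, "matrix")) b_poly <- as(extent(as.vector(t(bb))), "SpatialPolygons")
else b_poly <- as(extent(bb), "SpatialPolygons")
gIntersection(shp, b_poly, byid = T)
}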
#clip to remove outer islands
b<-st_bbox(c(xmin=142,xmax=150,ymin=-44.5,ymax=-38.5),crs=4326)
tasss.gadm<-gClip(tas.gadm,b)
tasss.gadm.fort<-sf::st_as_sf(tasss.gadm,"+proj=longlat +datum=WGS84 +no_defs")
Here, I plot the SST data from the data frame, and the SST data and the Tasmania shapefile line up well.
figure1<-SST_data_tas %>%
filter(t == "2011-01-01") %>%
ggplot(aes(x = lon, y = lat)) +
geom_tile(aes(fill = temp)) +
scale_fill_viridis_c(na.value="transparent") +
coord_quickmap(expand = F) +
labs(x = NULL, y = NULL, fill = "SST (°C)") +
theme(legend.position = "bottom")+
geom_sf(data=tasss.gadm.fort,inherit.aes=F,fill=NA)
Figure 1 output
Of course, since I eventually want to extract SST values underneath points, I need to convert this to a raster stack, one layer for each day.
#create blank raster
r_tas_obj<- raster(xmn=min(SST_data_tas$lon), xmx=max(SST_data_tas$lon), ymn=min(SST_data_tas$lat),
ymx=max(SST_data_tas$lat),res=c(0.04999,0.05),crs=sf::st_crs(4326)[[2]])
#rasterize SST data into a rasterstack
system.time(SST_data_tas_stack <- SST_data_tas %>%
group_split(t) %>%
purrr::map(~rasterize(x=.x[,c("lon", "lat")],y=r_tas_obj,field=.x[,4],fun=mean))%>%
stack())
#plot
figure2<-SST_data_tas_stack$layer.1%>%
gplot()+
geom_raster(aes(x = x, y = y,fill=value),stat='identity') +
scale_fill_viridis_c(na.value="transparent") +
coord_quickmap(expand = F) +
labs(x = NULL, y = NULL, fill = "SST (°C)") +
theme(legend.position = "bottom")+
geom_sf(data=tasss.gadm.fort,inherit.aes=F,fill=NA)
#CRS match...
compareCRS(SST_data_tas_stack,tasss.gadm.fort)
Figure 2 output
What you will notice is that my raster result is slightly shifted compared with the original plot of the data made with geom_tile. I suspect this is due to an error in the rasterization process, but the CRSs of my Tasmania shapefile and the SST raster match.
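One possible explanation, stated as an assumption since no answer is included here: the lon/lat values from ERDDAP are cell centres, but `raster(xmn = min(lon), xmx = max(lon), ...)` treats them as cell edges. The resulting grid has one column (and row) fewer than the data, and each point is dropped into a cell whose centre sits half a cell away, which would show up as a small, uniform shift. Padding the extent by half the resolution on each side (or building the grid from the centres with raster::rasterFromXYZ()) should keep the centres aligned. The base R sketch below only checks that arithmetic along one axis for a hypothetical 0.05° grid.

```r
# Hypothetical regular grid of cell centres at 0.05 degree spacing
res <- 0.05
lon_centres <- seq(142, 150, by = res)

# Centres of the cells produced by an extent [xmin, xmax] at resolution res
cell_centres <- function(xmin, xmax, res) {
  n <- round((xmax - xmin) / res)
  xmin + res * (seq_len(n) - 0.5)
}

# Extent taken straight from the data centres (as in the question)
naive_centres <- cell_centres(min(lon_centres), max(lon_centres), res)

# Extent padded by half a cell so the outer edges enclose the centres
padded_centres <- cell_centres(min(lon_centres) - res / 2,
                               max(lon_centres) + res / 2, res)

# Offset between the first data centre and the first raster cell centre
naive_offset  <- naive_centres[1] - lon_centres[1]   # about half a cell (0.025)
padded_offset <- padded_centres[1] - lon_centres[1]  # about 0
```

Applied to the question, the padded grid would look something like `raster(xmn = min(lon) - 0.025, xmx = max(lon) + 0.025, ymn = min(lat) - 0.025, ymx = max(lat) + 0.025, res = 0.05, crs = ...)`, assuming the data really sit on a uniform 0.05° grid.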

How do you compare similarities between variables in an R data frame, based on two categorical variables and one numeric variable

I have a data frame with three variables of interest: LGA (Location), Offence Category and Total (numeric).
What I am hoping to do, is compare the distance/similarity between each LGA, based on the Total value, in order to create a heat map or similar structure. Is this possible? And if so, what would the process be?
Here is a snippet of the data frame:
I don't really understand your question, but here is an example of a heatmap and a clustered heatmap for 'similar' data:
# Load libraries
library(tidyverse)
library(readxl)
library(httr)
# Find some data
url1 <- "https://www.bocsar.nsw.gov.au/Documents/lga/NewSouthWales.xlsx"
# Get the data and remove missing data points (NA's)
GET(url1, write_disk(tf <- tempfile(fileext = ".xlsx")))
df <- read_excel(path = tf, 2L, skip = 5) %>%
na.omit()
df2 <- df %>%
# format the data to "long format" for plotting
pivot_longer(cols = -c(`Premises type`)) %>%
# Change "Premises type" and "name" to factors
mutate(`Premises type` = factor(
`Premises type`, levels = unique(`Premises type`))
) %>%
mutate(name = factor(
name, levels = unique(name))
) %>%
# Remove the "Total" counts
filter(`Premises type` != "Total")
# Define colours for text (white for dark fill, black for light fill)
hcl <- farver::decode_colour(viridisLite::inferno(length(df2$value)), "rgb", "hcl")
label_col <- ifelse(hcl[, "l"] > 50, "black", "white")
# Plot the data (log scale for fill)
ggplot(df2, aes(y = fct_rev(`Premises type`),
x = name, fill = log(value))) +
geom_tile() +
geom_text(aes(label = value, color = factor(value)),
show.legend = FALSE, size = 2.5) +
theme(axis.text.x = element_text(angle = 45, hjust = 1.05),
axis.title = element_blank()) +
scale_color_manual(values = label_col) +
scale_fill_viridis_c(option = "inferno", na.value = "black")
And a clustered heatmap (similar Premises Type / Crime types cluster together):
# Load the raw data and format for pheatmap (expects a matrix)
dm <- read_excel(path = tf, 2L, skip = 5) %>%
na.omit() %>%
column_to_rownames(var = "Premises type")
# Plot the data
pheatmap::pheatmap(as.matrix(dm), scale = "row")
Edit
I haven't used it before, so I don't know if the output is correct, but based on this SO post you can use cluster::daisy() to get the Gower dissimilarity for "Premises Type" and then plot it using pheatmap, e.g.
library(cluster)
# metric = "gower" must be requested explicitly: daisy() defaults to Euclidean for all-numeric data
pheatmap::pheatmap(as.matrix(daisy(dm, metric = "gower")))
Edit 2
You only need two variables for this heatmap (i.e. "Local government Area" (Character) and "Total" (Numeric) should be fine):
# Load libraries
library(tidyverse)
library(readxl)
library(httr)
library(cluster)
library(pheatmap)
# Find some data
url1 <- "https://www.bocsar.nsw.gov.au/Documents/lga/NewSouthWales.xlsx"
# Get the data and remove missing data points (NA's)
GET(url1, write_disk(tf <- tempfile(fileext = ".xlsx")))
df <- read_excel(path = tf, 2L, skip = 5) %>%
na.omit()
# Select two variables, then set the Premises type as the rownames
df3 <- df %>%
select(`Premises type`, Robbery) %>%
column_to_rownames(var = "Premises type")
# (in your case, use column_to_rownames(var = "Local government Area"))
# Then plot the heatmap of the pairwise dissimilarities
pheatmap(as.matrix(daisy(df3, metric = "gower")),
labels_row = rownames(df3),
labels_col = rownames(df3))
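To address the original question more directly (distance between LGAs based on Total across offence categories), one approach is to reshape the long data into a matrix with one row per LGA and one column per offence category, compute pairwise distances between the rows, and draw those distances as a heatmap. The sketch below uses base R only (xtabs(), dist(), heatmap()); the LGA names, categories and totals are hypothetical placeholders, and Euclidean distance is just one choice (cluster::daisy() or scaling the columns first are alternatives).

```r
# Hypothetical long-format data: one row per LGA x offence category
crime <- data.frame(
  LGA     = rep(c("LGA_A", "LGA_B", "LGA_C"), each = 3),
  Offence = rep(c("Assault", "Robbery", "Theft"), times = 3),
  Total   = c(120, 15, 300, 110, 12, 280, 40, 30, 90)
)

# Wide matrix: rows = LGAs, columns = offence categories, cells = Total
wide <- xtabs(Total ~ LGA + Offence, data = crime)
lga_profile <- matrix(as.vector(wide), nrow = nrow(wide),
                      dimnames = dimnames(wide))

# Pairwise Euclidean distances between the LGA profiles
dist_mat <- as.matrix(dist(lga_profile))

# Heatmap of the LGA-by-LGA distance matrix (rows/columns reordered by clustering)
heatmap(dist_mat, symm = TRUE)
```

Here LGA_A and LGA_B have similar offence profiles, so their cell in the heatmap is much closer to zero than LGA_A vs LGA_C.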

Leaflet popup graphs don't correspond to map

I am trying to make a choropleth map as an HTML widget using the leaflet package. I don't want to deal with Shiny for this. I have COVID death time series data for each state. I would like to be able to click on a state and have the corresponding time series graph pop up. I have gotten close, but my problem is that the graphs that pop up when you click on a state do not correspond to that state. For example, if you click on Ohio, the West Virginia graph pops up.
Shapefile data: https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2019&layergroup=States+%28and+equivalent%29
Covid data: https://data.cdc.gov/Case-Surveillance/United-States-COVID-19-Cases-and-Deaths-by-State-o/9mfq-cb36
library(tidyverse)
library(lubridate)
library(readr)
library(leaflet)
library(tigris)
library(rgdal)
library(leafpop)
states <- readOGR(dsn = "tl_2019_us_state", layer = "tl_2019_us_state")
covid_deaths<- read_csv("covid_deaths_usafacts.csv")
Clean_Deaths<- covid_deaths%>%
select(submission_date, state, tot_cases,new_case,tot_death,new_death)%>%
filter(new_death>=0)%>%
mutate(submission_date=as.Date(Clean_Deaths$submission_date, "%m/%d/%Y"))
my_list <- list()
loop<-for (i in unique(Clean_Deaths$state)) {
state<-Clean_Deaths%>% filter(state==i)
plot<-ggplot(state, aes(x = submission_date, y = new_death)) +
geom_line()+scale_x_date(date_breaks = "1 month",date_labels = "%b")+labs(title = i)
my_list[[i]] <- plot
}
m1 <- leaflet() %>%
addTiles() %>%
setView(lng = -120.5, lat = 44, zoom = 6)%>%
addPolygons(data = states,
fillColor = "red",
fillOpacity = 0.6,
color = "darkgrey",
weight = 1.5,
popup = popupGraph(my_list)
)
m1
I think you have state abbreviations in Clean_Deaths$state (e.g., "NY") and full state names in states$NAME (e.g., "New York").
In your filter, you can convert from one to the other. Your for loop can then iterate over states$NAME, which matches the data used in your map:
for (i in states$NAME) {
state<-Clean_Deaths%>% filter(state==state.abb[match(i, state.name)])
plot<-ggplot(state, aes(x = submission_date, y = new_death)) +
geom_line()+scale_x_date(date_breaks = "1 month",date_labels = "%b")+labs(title = i)
my_list[[i]] <- plot
}
Here is a comparable, simplified version using lapply:
my_list <- lapply(states$NAME, function(i) {
Clean_Deaths %>%
filter(state == state.abb[match(i, state.name)]) %>%
ggplot(aes(x = submission_date, y = new_death)) +
geom_line() +
scale_x_date(date_breaks = "1 month",date_labels = "%b") +
labs(title = i)
})
As an aside, your mutate() call above should not reference the data frame by name; inside the pipe you can refer to the column directly:
mutate(submission_date=as.Date(submission_date, "%m/%d/%Y"))
Let me know if this addresses your problem.
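A related detail worth checking (my understanding, not something stated in the thread): the popups created by popupGraph() are attached to polygons by position, so the list of graphs has to be in the same order as the polygons in `states`. Polygons with no matching abbreviation, such as "District of Columbia" (not in the built-in state.name) or the territories, give NA from state.abb[match(...)] and would get an empty graph. The base R sketch below uses a hypothetical four-polygon subset and a toy deaths table to show the lookup and the resulting one-entry-per-polygon list.

```r
# Hypothetical polygon order, standing in for states$NAME
polygon_names <- c("Ohio", "West Virginia", "District of Columbia", "Oregon")

# Hypothetical deaths table keyed by abbreviation, like Clean_Deaths$state
deaths <- data.frame(state = c("WV", "OH", "OR", "OH"),
                     new_death = c(3, 5, 1, 7))

# Full name -> abbreviation using the built-in state.name / state.abb vectors
abbr <- state.abb[match(polygon_names, state.name)]

# One subset per polygon, in polygon order; NA abbreviations get zero rows
per_polygon <- lapply(abbr, function(code) {
  if (is.na(code)) deaths[0, ] else deaths[deaths$state == code, ]
})
names(per_polygon) <- polygon_names

# Rows available for each polygon's graph
rows_per_polygon <- vapply(per_polygon, nrow, integer(1))
```

Checking for zero-row entries like this before building the ggplot list makes it easy to spot polygons that will get an empty popup.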

Merge and plot multiple isochrones

What I would like to do
I would like to plot isochrones from multiple locations on a map so I can visually find the travel time from an arbitrary town to the closest location. It should look like a 2D kernel density plot:
library(purrr)
library(ggmap)
locations <- tibble::tribble(
~city, ~lon, ~lat,
"Hamburg", 9.992246, 53.550354,
"Berlin", 13.408163, 52.518527,
"Rostock", 12.140776, 54.088581
)
data <- map2_dfr(locations$lon, locations$lat, ~ data.frame(lon = rnorm(10000, .x, 0.8),
lat = rnorm(10000, .y, 0.7)))
ger <- c(left = min(locations$lon) - 1, bottom = min(locations$lat) - 1,
right = max(locations$lon) + 1, top = max(locations$lat) + 1)
get_stamenmap(ger, zoom = 7, maptype = "toner-lite") %>%
ggmap() +
stat_density_2d(data = data, aes(x= lon, y = lat, fill = ..level.., alpha = ..level..),
geom = "polygon") +
scale_fill_distiller(palette = "Blues", direction = 1, guide = FALSE) +
scale_alpha_continuous(range = c(0.1,0.3), guide = FALSE)
What I tried
You can easily get isochrones via osrm and plot them with leaflet. However, these isochrones are independent from each other. When I plot them they overlap each other.
library(osrm)
library(leaflet)
library(purrr)
library(ggmap)
locations <- tibble::tribble(
~city, ~lon, ~lat,
"Hamburg", 9.992246, 53.550354,
"Berlin", 13.408163, 52.518527,
"Rostock", 12.140776, 54.088581
)
isochrone <- map2(locations$lon, locations$lat,
~ osrmIsochrone(loc = c(.x, .y),
breaks = seq(0, 120, 30))) %>%
do.call(what = rbind)
isochrone@data$drive_times <- factor(paste(isochrone@data$min, "to",
isochrone@data$max, "minutes"))
factpal <- colorFactor("Blues", isochrone@data$drive_times, reverse = TRUE)
leaflet() %>%
setView(mean(locations$lon), mean(locations$lat), zoom = 7) %>%
addProviderTiles("Stamen.TonerLite") %>%
addPolygons(fill = TRUE, stroke = TRUE, color = "black",
fillColor = ~factpal(isochrone@data$drive_times),
weight = 0.5, fillOpacity = 0.6,
data = isochrone, popup = isochrone@data$drive_times,
group = "Drive Time") %>%
addLegend("bottomright", pal = factpal, values = isochrone@data$drive_times,
title = "Travel time")
How can I merge these isochrone so that they don't overlap?
Really cool question. What you want to do is merge the shapes by ID, so all the 0-30 minute areas are one shape, all the 30-60 minute areas are another, and so on. There are ways to do this with other spatial packages, but it seems well-suited to sf, which uses dplyr-style functions.
After you create isochrone, you can convert it to an sf object, make the same type of distance label, group by ID, and call summarise(). By default, summarising an sf object performs a spatial union, so you don't need to supply a function there.
library(sf)
library(dplyr)
iso_sf <- st_as_sf(isochrone)
iso_union <- iso_sf %>%
mutate(label = paste(min, max, sep = "-")) %>%
group_by(id, label) %>%
summarise()
I didn't have leaflet handy, so here's just the default plot method:
plot(iso_union["label"], pal = RColorBrewer::brewer.pal(4, "Blues"))
I'm not sure what's up with the areas that have abrupt vertical edges, but those are in your plot as well.
I had a hard time using the map2 method you used because it does both a union and, I think, another set-theory-like operation to create specific intervals. Instead, I would recommend creating a raster layer from the isochrone layers you create and applying one opacity to that single raster, like the ggmap example does. There's an excellent blog post that I borrowed a lot of code from here (along with code from user camille).
It uses a different API that requires a Mapbox token, but it is free. Another limitation is that it won't return isochrones as large as you'd like, so I recreated the example in another location where the three points are closer together to prove the method.
I also didn't bother vectorizing the creation of the isochrone web requests, so I leave that to someone smarter.
# First be sure to get your mapbox token
library(fasterize)
library(sf)
library(mapboxapi)
library(leaflet)
#mapboxapi::mb_access_token("Go get the token and put it here",
# install = TRUE, overwrite = TRUE)
# location given as numeric c(longitude, latitude); a quoted string would be geocoded as an address
isos1 <- mb_isochrone(
location = c(-149.883234, 61.185765),
profile = "driving",
time = c(5,10,15)
)
isos2 <- mb_isochrone(
location = c(-149.928200, 61.191227),
profile = "driving",
time = c(5,10,15)
)
isos3 <- mb_isochrone(
location = c(-149.939484, 61.160192),
profile = "driving",
time = c(5,10,15)
)
library(sf)
library(dplyr)
isocrones <- rbind(isos1,isos2,isos3)
iso_sf <- st_as_sf(isocrones)
iso_union <- iso_sf %>%
group_by(time) %>%
summarise()
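library(raster) # raster() builds the template grid used by fasterize()
# Project to a metric CRS (UTM zone 6N covers Anchorage) so resolution = 100 means 100 m
isos_proj <- st_transform(iso_sf, 32606)
template <- raster(isos_proj, resolution = 100)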
iso_surface <- fasterize(isos_proj, template, field = "time", fun = "min")
pal <- colorNumeric("viridis", isos_proj$time, na.color = "transparent")
leaflet() %>%
addTiles() %>%
addRasterImage(iso_surface, colors = pal, opacity = 0.5) %>%
addLegend(values = isos_proj$time, pal = pal,
title = "Minutes of Travel") %>%
addMarkers(lat = c(61.185765, 61.191227, 61.160192), lng = c(-149.883234, -149.928200, -149.939484))
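The key step in the raster approach is fasterize(..., fun = "min"): where isochrones from different locations overlap, each cell keeps the smallest travel time, i.e. the time to the closest location, so the bands no longer overlap. The base R sketch below illustrates that per-cell minimum on a hypothetical grid, using straight-line distance at a hypothetical constant speed as a stand-in for road travel time (this is not what osrm or Mapbox compute), and then bins the result into 30-minute bands like the question's breaks.

```r
# Hypothetical grid of cell centres (units: km)
grid <- expand.grid(x = seq(0, 100, by = 5), y = seq(0, 100, by = 5))

# Hypothetical site locations and a constant speed (km per minute)
sites <- data.frame(x = c(20, 70, 40), y = c(30, 60, 90))
speed <- 1

# Travel time from each site to every cell: one column per site
times <- sapply(seq_len(nrow(sites)), function(i) {
  sqrt((grid$x - sites$x[i])^2 + (grid$y - sites$y[i])^2) / speed
})

# Per-cell minimum across sites: time to the closest site
grid$nearest <- apply(times, 1, min)

# Bin into non-overlapping 30-minute bands
grid$band <- cut(grid$nearest, breaks = seq(0, 120, 30), include.lowest = TRUE)
```

Because every cell ends up with exactly one value, each cell falls in exactly one band, which is the non-overlapping result the question asks for.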

How to label an individual state on the map while the others at sub-divisional level

I've managed to produce a map; however, I need to add one label for a state (level 2) that includes subdivisions (level 3), instead of labeling each subdivision (for this state only). In the data "newpak", rows 641-664 correspond to this state. Is there any way to place only one name above this state?
library(dplyr)
library(raster)
library(sf)
library(tidyverse)
library(ggrepel)
devtools::install_github("tidyverse/ggplot2", force = TRUE)
library(ggplot2)
pak <- getData("GADM",country="PAK",level=3)
pak <- st_as_sf(pak) %>%
mutate(
lon = map_dbl(geometry, ~st_centroid(.x)[[1]]),
lat = map_dbl(geometry, ~st_centroid(.x)[[2]]))
ggplot(pak) + geom_sf() + geom_text(aes(label = NAME_3, x = lon, y = lat), size = 2)
ind <- getData("GADM",country="IND",level=3)
ind <- st_as_sf(ind) %>%
mutate(
lon = map_dbl(geometry, ~st_centroid(.x)[[1]]),
lat = map_dbl(geometry, ~st_centroid(.x)[[2]]))
jnk <- subset(ind, OBJECTID >= 641 & OBJECTID <= 664 )
newpak <- rbind(pak, jnk)
regionalValues <- runif(165) # Simulate a value for each region between 0 and 1
ggplot(newpak) + geom_sf(aes(fill = regionalValues)) + geom_text(aes(label = NAME_3, x = lon, y = lat), size = 2)
Here's a complete solution using the sf package.
library(raster)
library(sf)
library(tidyverse)
# download PAK data and convert to sf
pak <- getData("GADM",country="PAK",level=3) %>%
st_as_sf()
# download IND data, convert to sf, filter out
# desired area, and add NAME_3 label
jnk <- getData("GADM",country="IND",level=3) %>%
st_as_sf() %>%
filter(OBJECTID %>% between(641, 664)) %>%
group_by(NAME_0) %>%
summarize() %>%
mutate(NAME_3 = "Put desired region name here")
regionalValues <- runif(142) # Simulate a value for each region between 0 and 1
# combine the two dataframes, find the center for each
# region, and the plot with ggplot
pak %>%
select(NAME_0, NAME_3, geometry) %>%
rbind(jnk) %>%
mutate(
lon = map_dbl(geometry, ~st_centroid(.x)[[1]]),
lat = map_dbl(geometry, ~st_centroid(.x)[[2]])
) %>%
ggplot() +
geom_sf(aes(fill = regionalValues)) +
geom_text(aes(label = NAME_3, x = lon, y = lat), size = 2) +
scale_fill_distiller(palette = "Spectral")
Some notes:
I used dplyr::filter (which has a method for sf objects) instead of raster::subset to get the desired subset of the IND data, because I feel it's more idiomatic tidyverse code.
To combine areas with sf, you can group the different regions by a common variable with group_by and then simply call summarize. This is the method I used in my solution above. The sf package has other functions worth looking at that accomplish similar results: st_combine and st_union.
Using st_centroid for the purpose of plotting the region labels is not necessarily the best method for finding a good location for region labels. I used it because it's the most convenient. You might try other methods, including manual placement of labels.
I changed the fill palette to a diverging color palette because I think it more clearly shows the difference between one region and the next. You can see some of the color palettes available with RColorBrewer::display.brewer.all()
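If you only need a single label position for the combined state (rather than the dissolved geometry itself), note that the centroid of several non-overlapping polygons equals the area-weighted mean of their individual centroids. The base R sketch below computes that with the shoelace formula for two hypothetical rectangular sub-divisions; in practice, summarize()/st_union() followed by st_centroid() (as in the answer), or sf::st_point_on_surface() (which always returns a point inside the shape), does this for you.

```r
# Area and centroid of a simple polygon from its vertices (shoelace formula)
polygon_centroid <- function(x, y) {
  x_next <- c(x[-1], x[1])
  y_next <- c(y[-1], y[1])
  cross <- x * y_next - x_next * y
  signed_area <- sum(cross) / 2
  c(area = abs(signed_area),
    cx = sum((x + x_next) * cross) / (6 * signed_area),
    cy = sum((y + y_next) * cross) / (6 * signed_area))
}

# Two hypothetical adjacent sub-divisions (vertices listed in order)
sub1 <- polygon_centroid(x = c(0, 1, 1, 0), y = c(0, 0, 1, 1))
sub2 <- polygon_centroid(x = c(1, 3, 3, 1), y = c(0, 0, 1, 1))

# Area-weighted mean of the sub-division centroids = centroid of their union
parts <- rbind(sub1, sub2)
label_x <- weighted.mean(parts[, "cx"], parts[, "area"])
label_y <- weighted.mean(parts[, "cy"], parts[, "area"])
```

The union of the two rectangles is the 3 x 1 rectangle [0, 3] x [0, 1], whose centroid is (1.5, 0.5), matching label_x and label_y.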
