I am able to produce a cartogram using cartogram::cartogram_cont(), but I am having difficulty customising the styling.
I have used broom::tidy() and dplyr::left_join() to fortify the cartogram, but I think perhaps the tidy stage has interfered with the plotOrder. If possible, I will include the output cartograms.
I'm attempting to replicate this type of output, but for my own locality. Please note that the dataset used for the weighting in cartogram_cont() is not particularly significant; it is just a proof of concept at this stage:
[R Graph Gallery][1]
[1]: https://www.r-graph-gallery.com/331-basic-cartogram/
Shapefile from: [Lle Shapefile Location][2]
[2]: http://lle.gov.wales/catalogue/item/LocalAuthorities/?lang=en
library(dplyr)
library(leaflet)
library(maptools)
library(cartogram)
library(devtools)
install_github("HanOostdijk/odataR" , build_vignettes = T)
library(odataR)
library(tidyr)
library(rgdal)
library(htmltools)
#Read in shapefile and transform shape
#dsn = folder name, layer = filename but drop the .shp
shapefile <- readOGR(dsn = "Wales Shapefile",
layer = "localauthoritiesPolygon") %>%
#Transform coordinate referencing system
spTransform(CRS("+init=epsg:4326"))
#Next step is to join an interesting dataset to the shapefile using dplyr, then pass this to the cartogram package to render.
#Gone for the teacher sickness dataset from Stats Wales. Noticed it's only up to 2017, wonder if they've stopped collecting.
teacher_sickness_data <- odataR_query('http://open.statswales.gov.wales/dataset/schw0001')
#Check values for join.
categories <- unique(teacher_sickness_data$Area_ItemName_ENG)
categories_shp <- shapefile@data$name_en
categories
categories_shp
#Teacher data has "All Welsh local authorities". Not contained in shapefile so remove.
UA_sickness_data <- teacher_sickness_data[-c(2, 4:6, 8, 9, 11:13, 15:17)] %>%
filter(Area_ItemName_ENG != "All Welsh local authorities")
#Perform join to shapefile
shapefile_1 <- shapefile %>%
merge(UA_sickness_data, by.x = "name_en", by.y = "Area_ItemName_ENG",
duplicateGeoms = TRUE)
#Shiny App will allow choice of inputs to achieve one row per polygon. However, for testing
#functionality with cartogram functions, perform test filtering.
data_filtered <- UA_sickness_data %>%
filter(Year_ItemName_ENG == 2017) %>%
filter(Type_ItemName_ENG == "Full-time") %>%
filter(Variable_ItemName_ENG == "Total days of sick leave")
test_merge <- shapefile %>%
merge(data_filtered, by.x = "name_en", by.y = "Area_ItemName_ENG")
nc_pal <- colorNumeric(palette = "Reds",
domain = log(test_merge@data$Data))
m <-test_merge %>%
leaflet() %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addPolygons(weight = 1,
color = ~nc_pal(log(Data)),
label = ~name_en,
highlight = highlightOptions(weight = 3,
color = "crimson",
bringToFront = TRUE),
popup = ~ paste0(Variable_ItemName_ENG, "<br/>",
"<b/>",
Data))
m
wales_cart <- cartogram_cont(test_merge, "Data", itermax=5)
plot(wales_cart)
[![Wales_Cartogram][3]][3]
[3]: https://i.stack.imgur.com/2tsMC.png
library(tidyverse)
library(ggmap)
library(broom)
library(rgeos) #used for gBuffer
#Buffer allows to tidy cartogram based on factor of choice.
wales_cart_buffered <- gBuffer(wales_cart, byid=TRUE, width=0)
#tidy cartogram in order to pass to ggplot
spdf_fortified_wales <- tidy(wales_cart_buffered, region = "name_en")
#Now perform a join based on english UA names
spdf_fortified_wales_joined <- spdf_fortified_wales %>%
left_join(. , wales_cart@data, by=c("id"="name_en"))
ggplot() +
geom_polygon(data = spdf_fortified_wales_joined, aes(fill = Data, x = long, y = lat, group = "name_en") , size=0, alpha=0.9) +
coord_map() +
theme_void()
[![incorrect_ggplot][4]][4]
[4]: https://i.stack.imgur.com/as0Z4.png
Success Criteria: Polygons are rendered correctly distorted and colour scale reflects weighting variable.
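One thing worth checking against these criteria: in the geom_polygon() call, `group = "name_en"` inside aes() is a constant string rather than a column reference, so ggplot2 places every vertex in a single group and joins all polygons into one path. The data frame returned by tidy() carries a `group` column that identifies each ring. A minimal sketch with two made-up squares (the `toy` data frame is hypothetical and only stands in for `spdf_fortified_wales_joined`):

```r
library(ggplot2)

# Two unit squares in the long/lat/group layout that broom::tidy() returns.
toy <- data.frame(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(c("A.1", "B.1"), each = 4),
  Data  = rep(c(10, 20), each = 4)
)

# Constant string: every vertex ends up in the same group.
p_bad <- ggplot(toy) +
  geom_polygon(aes(x = long, y = lat, fill = Data, group = "name_en"))

# Column reference: one group per polygon ring.
p_good <- ggplot(toy) +
  geom_polygon(aes(x = long, y = lat, fill = Data, group = group))

# Count the groups ggplot2 actually computed for each plot.
n_groups <- function(p) length(unique(ggplot_build(p)$data[[1]]$group))
n_groups(p_bad)
n_groups(p_good)
```

If the same pattern applies here, switching to `group = group` in the original call may be enough to restore the individual polygons.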
I am having issues plotting SST data after rasterizing the original data file. I eventually want to extract SST at a series of points for each day over a period of time, and I noticed a shift in the raster relative to the coastline after I received a bunch of NAs from the extraction. I suspect I messed up my projection/transformation, but everything seems to work.
Query to get 1 month of SST data as a tibble, adapted from the heatwaveR documentation. According to ERDDAP, the CRS for this data is EPSG:4326. The resolution needed later also comes from this step.
library(rerddap)
library(ggplot2)
library(dplyr)
library(raster)
library(rasterVis)
library(viridis)
library(purrr)
library(sf)
SST_sub_dl <- function(time_df){
SST_data_tas <- griddap(x = "NOAA_DHW_Lon0360",
url = "https://coastwatch.pfeg.noaa.gov/erddap/",
time = c(time_df$start, time_df$end),
latitude = c(-44.5, -38.5),
longitude = c(142, 150),
store=disk(),
fields = "CRW_SST")$data%>%
mutate(time = as.Date(stringr::str_remove(time, "T00:00:00Z"))) %>%
dplyr::rename(t = time, temp = CRW_SST) %>%
select(lon, lat, t, temp) %>%
na.omit()
}
#Assign years desired
tas_time <- data.frame(date_index = 1:2,
start = as.Date(c("2011-01-01")),
end = as.Date(c("2011-01-31")))
# The time this takes will vary greatly based on connection speed
#takes me 26 seconds
system.time(
SST_data_tas <- tas_time %>%
group_by(date_index) %>%
group_modify(~SST_sub_dl(.x)) %>%
ungroup() %>%
select(lon, lat, t, temp))
Now get Australia shapefile
aus.gadm<-getData("GADM",country="AUS",level=1, path=tempdir())
tas.gadm<-filter(aus.gadm,NAME_1=="Tasmania")
#clip function
gClip <- function(shp, bb){
if(class(bb) == "matrix") b_poly <- as(extent(as.vector(t(bb))), "SpatialPolygons")
else b_poly <- as(extent(bb), "SpatialPolygons")
gIntersection(shp, b_poly, byid = T)
}
#clip to remove outer islands
b<-st_bbox(c(xmin=142,xmax=150,ymin=-44.5,ymax=-38.5),crs=4326)
tasss.gadm<-gClip(tas.gadm,b)
tasss.gadm.fort<-sf::st_as_sf(tasss.gadm,"+proj=longlat +datum=WGS84 +nodefs")
Here, I plot the SST data from the dataframe and get SST data and Tasmania shapefile lined up well.
figure1<-SST_data_tas %>%
filter(t == "2011-01-01") %>%
ggplot(aes(x = lon, y = lat)) +
geom_tile(aes(fill = temp)) +
scale_fill_viridis_c(na.value="transparent") +
coord_quickmap(expand = F) +
labs(x = NULL, y = NULL, fill = "SST (°C)") +
theme(legend.position = "bottom")+
geom_sf(data=tasss.gadm.fort,inherit.aes=F,fill=NA)
Figure 1 output
Of course, since I eventually want to extract SST values underneath points, I need to convert this to a raster stack, one layer for each day.
#create blank raster
r_tas_obj<- raster(xmn=min(SST_data_tas$lon), xmx=max(SST_data_tas$lon), ymn=min(SST_data_tas$lat),
ymx=max(SST_data_tas$lat),res=c(0.04999,0.05),crs=sf::st_crs(4326)[[2]])
#rasterize SST data into a rasterstack
system.time(SST_data_tas_stack <- SST_data_tas %>%
group_split(t) %>%
purrr::map(~rasterize(x=.x[,c("lon", "lat")],y=r_tas_obj,field=.x[,4],fun=mean))%>%
stack())
#plot
figure2<-SST_data_tas_stack$layer.1%>%
gplot()+
geom_raster(aes(x = x, y = y,fill=value),stat='identity') +
scale_fill_viridis_c(na.value="transparent") +
coord_quickmap(expand = F) +
labs(x = NULL, y = NULL, fill = "SST (°C)") +
theme(legend.position = "bottom")+
geom_sf(data=tasss.gadm.fort,inherit.aes=F,fill=NA)
#CRS match...
compareCRS(SST_data_tas_stack,tasss.gadm.fort)
Figure 2 output
What you will notice is that my raster result is slightly shifted compared to the original plot of the data using geom_tile. I suspect this is due to an error in the rasterization process, but the CRS of my Tasmania shapefile and the SST raster match.
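A possible explanation, offered as a sketch rather than a confirmed diagnosis: the lon/lat values in the data frame are cell centres, whereas raster() treats xmn, xmx, ymn and ymx as the outer cell edges. Building the extent straight from min()/max() of the centres makes the grid half a cell too small on each side, which would show up as a small shift. The helper `extent_from_centres()` and the coordinates below are hypothetical, and a 0.05 degree spacing is assumed:

```r
# Hypothetical helper: pad the range of cell centres by half a cell on each side.
extent_from_centres <- function(centres, res) {
  c(min = min(centres) - res / 2, max = max(centres) + res / 2)
}

res <- 0.05                                  # assumed grid spacing in degrees
lon <- seq(142.025, 149.975, by = res)       # made-up cell centres
lat <- seq(-44.475, -38.525, by = res)

x_ext <- extent_from_centres(lon, res)
y_ext <- extent_from_centres(lat, res)

# With padded edges, the number of columns/rows equals the number of centres.
n_col <- round((x_ext[["max"]] - x_ext[["min"]]) / res)
n_row <- round((y_ext[["max"]] - y_ext[["min"]]) / res)
c(n_col = n_col, n_row = n_row, n_lon = length(lon), n_lat = length(lat))
```

The padded values could then be passed as xmn, xmx, ymn and ymx to raster() with a single res of 0.05, in place of the min()/max() values and the res = c(0.04999, 0.05) used above.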
I have a data frame with three variables of interest: LGA (Location), Offence Category and Total (numeric).
What I am hoping to do, is compare the distance/similarity between each LGA, based on the Total value, in order to create a heat map or similar structure. Is this possible? And if so, what would the process be?
Here is a snippet of the data frame:
I don't really understand your question, but here is an example of a heatmap and a clustered heatmap for 'similar' data:
# Load libraries
library(tidyverse)
library(readxl)
library(httr)
# Find some data
url1 <- "https://www.bocsar.nsw.gov.au/Documents/lga/NewSouthWales.xlsx"
# Get the data and remove missing data points (NA's)
GET(url1, write_disk(tf <- tempfile(fileext = ".xlsx")))
df <- read_excel(path = tf, 2L, skip = 5) %>%
na.omit()
df2 <- df %>%
# format the data to "long format" for plotting
pivot_longer(cols = -c(`Premises type`)) %>%
# Change "Premises type" and "name" to factors
mutate(`Premises type` = factor(
`Premises type`, levels = unique(`Premises type`))
) %>%
mutate(name = factor(
name, levels = unique(name))
) %>%
# Remove the "Total" counts
filter(`Premises type` != "Total")
# Define colours for text (white for dark fill, black for light fill)
hcl <- farver::decode_colour(viridisLite::inferno(length(df2$value)), "rgb", "hcl")
label_col <- ifelse(hcl[, "l"] > 50, "black", "white")
# Plot the data (log scale for fill)
ggplot(df2, aes(y = fct_rev(`Premises type`),
x = name, fill = log(value))) +
geom_tile() +
geom_text(aes(label = value, color = factor(value)),
show.legend = FALSE, size = 2.5) +
theme(axis.text.x = element_text(angle = 45, hjust = 1.05),
axis.title = element_blank()) +
scale_color_manual(values = label_col) +
scale_fill_viridis_c(option = "inferno", na.value = "black")
And a clustered heatmap (similar Premises Type / Crime types cluster together):
# Load the raw data and format for pheatmap (expects a matrix)
dm <- read_excel(path = tf, 2L, skip = 5) %>%
na.omit() %>%
column_to_rownames(var = "Premises type")
# Plot the data
pheatmap::pheatmap(as.matrix(dm), scale = "row")
Edit
I haven't used it before, so I don't know if the output is correct, but based on this SO post you can use cluster::daisy() to get the gower dissimilarity for "Premises Type" then plot using pheatmap, e.g.
library(cluster)
pheatmap::pheatmap(as.matrix(daisy(dm)))
Edit 2
You only need two variables for this heatmap (i.e. "Local government Area" (Character) and "Total" (Numeric) should be fine):
# Load libraries
library(tidyverse)
library(readxl)
library(httr)
library(cluster)
library(pheatmap)
# Find some data
url1 <- "https://www.bocsar.nsw.gov.au/Documents/lga/NewSouthWales.xlsx"
# Get the data and remove missing data points (NA's)
GET(url1, write_disk(tf <- tempfile(fileext = ".xlsx")))
df <- read_excel(path = tf, 2L, skip = 5) %>%
na.omit()
# Select two variables, then set the Premises type as the rownames
df3 <- df %>%
select(`Premises type`, Robbery) %>%
column_to_rownames(var = "Premises type")
# (in your case, use column_to_rownames(var = "Local government Area"))
# Then plot the heatmap
pheatmap(daisy(as.matrix(df3)),
labels_row = rownames(df3),
labels_col = rownames(df3))
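For comparison, the same idea can be sketched in base R with dist(), which computes pairwise Euclidean distances; with a single numeric column this is simply the absolute difference in Total. The LGA names and totals below are invented for illustration:

```r
# Invented totals per LGA; replace with the real data frame.
lga <- data.frame(
  LGA   = c("Albury", "Ballina", "Bathurst", "Bega Valley"),
  Total = c(120, 95, 310, 60)
)

# Pairwise distances between LGAs based on Total alone.
d <- dist(setNames(lga$Total, lga$LGA))
dm_lga <- as.matrix(d)
dm_lga

# Base-R heatmap of the distance matrix (symmetric, so rows and columns
# are ordered by the same clustering).
heatmap(dm_lga, symm = TRUE, scale = "none")
```

Small values in `dm_lga` mean two LGAs have similar totals; the matrix can also be handed to pheatmap() in place of the daisy() result.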
I am trying to make a choropleth map as an HTML widget using the leaflet package. I don't want to deal with shiny for this. I have COVID death time series data for each state. I would like to be able to click on a state and have the corresponding time series graph pop up. I have gotten close, but the graphs that pop up when you click on a state do not correspond to that state. For example, if you click on Ohio, a West Virginia graph pops up.
Shapefile data: https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2019&layergroup=States+%28and+equivalent%29
Covid data: https://data.cdc.gov/Case-Surveillance/United-States-COVID-19-Cases-and-Deaths-by-State-o/9mfq-cb36
library(tidyverse)
library(lubridate)
library(readr)
library(leaflet)
library(tigris)
library(rgdal)
library(leafpop)
states <- readOGR(dsn = "tl_2019_us_state", layer = "tl_2019_us_state")
covid_deaths<- read_csv("covid_deaths_usafacts.csv")
Clean_Deaths<- covid_deaths%>%
select(submission_date, state, tot_cases,new_case,tot_death,new_death)%>%
filter(new_death>=0)%>%
mutate(submission_date=as.Date(Clean_Deaths$submission_date, "%m/%d/%Y"))
my_list <- list()
loop<-for (i in unique(Clean_Deaths$state)) {
state<-Clean_Deaths%>% filter(state==i)
plot<-ggplot(state, aes(x = submission_date, y = new_death)) +
geom_line()+scale_x_date(date_breaks = "1 month",date_labels = "%b")+labs(title = i)
my_list[[i]] <- plot
}
m1 <- leaflet() %>%
addTiles() %>%
setView(lng = -120.5, lat = 44, zoom = 6)%>%
addPolygons(data = states,
fillColor = "red",
fillOpacity = 0.6,
color = "darkgrey",
weight = 1.5,
popup = popupGraph(my_list)
)
m1
I think you have abbreviations for state in Clean_Deaths$state (e.g., "NY") and you have full state names in states$NAME (e.g., "New York").
In your filter, you can convert from one to other. Your for loop can go through states$NAME which will match your data used in your map:
for (i in states$NAME) {
state<-Clean_Deaths%>% filter(state==state.abb[match(i, state.name)])
plot<-ggplot(state, aes(x = submission_date, y = new_death)) +
geom_line()+scale_x_date(date_breaks = "1 month",date_labels = "%b")+labs(title = i)
my_list[[i]] <- plot
}
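The lookup relies on base R's built-in state.name and state.abb vectors, which hold the 50 states only. The sketch below checks the mapping and shows what happens for entries outside those 50: the Census state file also contains the District of Columbia and the territories, for which match() returns NA. The `shape_names` subset and the `extra` lookup are made up for illustration, and it is an assumption that the COVID data uses the same two-letter codes:

```r
# Names as they might appear in states$NAME (made-up subset for illustration).
shape_names <- c("Ohio", "West Virginia", "District of Columbia", "Puerto Rico")

# Translate full names to the two-letter codes used in Clean_Deaths$state.
abbr <- state.abb[match(shape_names, state.name)]
abbr

# Entries that are not among the 50 states come back as NA and need handling,
# for example with a small manual lookup (USPS codes).
extra <- c("District of Columbia" = "DC", "Puerto Rico" = "PR")
abbr[is.na(abbr)] <- extra[shape_names[is.na(abbr)]]
abbr
```

Without such handling, the filter for those polygons matches no rows and the corresponding popup graph is empty.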
Here is something comparable using lapply and simplified:
my_list <- lapply(states$NAME, function(i) {
Clean_Deaths %>%
filter(state == state.abb[match(i, state.name)]) %>%
ggplot(aes(x = submission_date, y = new_death)) +
geom_line() +
scale_x_date(date_breaks = "1 month",date_labels = "%b") +
labs(title = i)
})
As an aside, your mutate before this does not need the data frame referenced in the pipe:
mutate(submission_date=as.Date(submission_date, "%m/%d/%Y"))
Let me know if this addresses your problem.
What I like to do
I would like to plot isochrones from multiple locations on a map so that I can visually find the travel time from an arbitrary town to the closest location. It should look like a 2D kernel density plot:
library(purrr)
library(ggmap)
locations <- tibble::tribble(
~city, ~lon, ~lat,
"Hamburg", 9.992246, 53.550354,
"Berlin", 13.408163, 52.518527,
"Rostock", 12.140776, 54.088581
)
data <- map2_dfr(locations$lon, locations$lat, ~ data.frame(lon = rnorm(10000, .x, 0.8),
lat = rnorm(10000, .y, 0.7)))
ger <- c(left = min(locations$lon) - 1, bottom = min(locations$lat) - 1,
right = max(locations$lon) + 1, top = max(locations$lat) + 1)
get_stamenmap(ger, zoom = 7, maptype = "toner-lite") %>%
ggmap() +
stat_density_2d(data = data, aes(x= lon, y = lat, fill = ..level.., alpha = ..level..),
geom = "polygon") +
scale_fill_distiller(palette = "Blues", direction = 1, guide = FALSE) +
scale_alpha_continuous(range = c(0.1,0.3), guide = FALSE)
What I tried
You can easily get isochrones via osrm and plot them with leaflet. However, these isochrones are independent from each other. When I plot them they overlap each other.
library(osrm)
library(leaflet)
library(purrr)
library(ggmap)
locations <- tibble::tribble(
~city, ~lon, ~lat,
"Hamburg", 9.992246, 53.550354,
"Berlin", 13.408163, 52.518527,
"Rostock", 12.140776, 54.088581
)
isochrone <- map2(locations$lon, locations$lat,
~ osrmIsochrone(loc = c(.x, .y),
breaks = seq(0, 120, 30))) %>%
do.call(what = rbind)
isochrone@data$drive_times <- factor(paste(isochrone@data$min, "bis",
isochrone@data$max, "Minuten"))
factpal <- colorFactor("Blues", isochrone@data$drive_times, reverse = TRUE)
leaflet() %>%
setView(mean(locations$lon), mean(locations$lat), zoom = 7) %>%
addProviderTiles("Stamen.TonerLite") %>%
addPolygons(fill = TRUE, stroke = TRUE, color = "black",
fillColor = ~factpal(isochrone@data$drive_times),
weight = 0.5, fillOpacity = 0.6,
data = isochrone, popup = isochrone@data$drive_times,
group = "Drive Time") %>%
addLegend("bottomright", pal = factpal, values = isochrone@data$drive_times,
title = "Fahrtzeit")
How can I merge these isochrones so that they don't overlap?
Really cool question. What you want to do is merge the shapes by ID, so all the 0-30 minute areas are one shape, all the 30-60 minute areas are another, and so on. There are ways to do this with other spatial packages, but it seems well-suited to sf, which uses dplyr-style functions.
After you create isochrone, you can convert it to a sf object, make the same type of distance label, group by ID, and call summarise. The default when you summarize sf objects is just a spatial union, so you don't need to supply a function there.
library(sf)
library(dplyr)
iso_sf <- st_as_sf(isochrone)
iso_union <- iso_sf %>%
mutate(label = paste(min, max, sep = "-")) %>%
group_by(id, label) %>%
summarise()
I didn't have leaflet handy, so here's just the default print method:
plot(iso_union["label"], pal = RColorBrewer::brewer.pal(4, "Blues"))
I'm not sure what's up with the areas that have abrupt vertical edges, but those are in your plot as well.
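The group-and-summarise step can be tried on toy geometry to see the union happen. This is a minimal sketch with made-up squares instead of real isochrones; `sq()` is a hypothetical helper, and no CRS is set so areas are plain numbers:

```r
library(sf)
library(dplyr)

# Hypothetical helper: a 2 x 2 square with its lower-left corner at (x0, 0).
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 2, 0), c(x0 + 2, 2), c(x0, 2), c(x0, 0))))
}

# Two overlapping squares share id 1; a third, separate square has id 2.
toy <- st_sf(id = c(1, 1, 2), geometry = st_sfc(sq(0), sq(1), sq(10)))

# summarise() on a grouped sf object unions the geometries within each group.
merged <- toy %>%
  group_by(id) %>%
  summarise()

nrow(merged)
st_area(merged)
```

The two squares with id 1 overlap by 2 square units, so their union has area 6 instead of 8, which shows the overlap within a group has been dissolved.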
I had a hard time using the map2 method you used because it does both a union and, I think, another set-theory-like operation to create the specific intervals. Instead, I would recommend rasterizing the layers you create and applying one opacity to that single raster, as the ggmap example does. There's an excellent blog post that I stole a lot of code from here (along with code from user:camille).
It uses a different API that requires Mapbox, but it is free. Another limitation is that it won't return isochrones of the size you want, so I recreated the example in another location where the three points are closer together to prove the method.
I also didn't bother vectorizing the process of creating the isochrone web requests, so I leave that to someone smarter.
# First be sure to get your mapbox token
library(fasterize)
library(sf)
library(mapboxapi)
library(leaflet)
#mapboxapi::mb_access_token("Go get the token and put it here",
# install = TRUE, overwrite = TRUE)
isos1 <- mb_isochrone(
location = c("-149.883234, 61.185765"),
profile = "driving",
time = c(5,10,15),
)
isos2 <- mb_isochrone(
location = c("-149.928200, 61.191227"),
profile = "driving",
time = c(5,10,15),
)
isos3 <- mb_isochrone(
location = c("-149.939484, 61.160192"),
profile = "driving",
time = c(5,10,15),
)
library(sf)
library(dplyr)
isocrones <- rbind(isos1,isos2,isos3)
iso_sf <- st_as_sf(isocrones)
iso_union <- iso_sf %>%
group_by(time) %>%
summarise()
isos_proj <- st_transform(iso_sf, 32615)
template <- raster(isos_proj, resolution = 100)
iso_surface <- fasterize(isos_proj, template, field = "time", fun = "min")
pal <- colorNumeric("viridis", isos_proj$time, na.color = "transparent")
leaflet() %>%
addTiles() %>%
addRasterImage(iso_surface, colors = pal, opacity = 0.5) %>%
addLegend(values = isos_proj$time, pal = pal,
title = "Minutes of Travel") %>%
addMarkers(lat = c(61.185765, 61.191227, 61.160192), lng = c(-149.883234, -149.928200, -149.939484))
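The effect of fun = "min" can be illustrated without any spatial packages: where surfaces from several origins overlap, each cell keeps the shortest travel time. A base-R sketch with two made-up 3 x 3 grids, where NA marks cells an origin cannot reach within the time limit:

```r
# Made-up travel times (minutes) from two origins on the same 3 x 3 grid.
from_a <- matrix(c( 5, 10, 15,
                   10, 15, NA,
                   15, NA, NA), nrow = 3, byrow = TRUE)
from_b <- matrix(c(NA, NA, 15,
                   NA, 15, 10,
                   15, 10,  5), nrow = 3, byrow = TRUE)

# Cell-wise minimum; na.rm = TRUE keeps a value where only one origin reaches.
closest <- pmin(from_a, from_b, na.rm = TRUE)
closest
```

Each cell of `closest` holds the time to the nearest origin, which is the quantity the rasterized isochrone surface is meant to show.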
I've managed to produce a map; however, I need to add one label for a state (level 2) that includes subdivisions (level 3), instead of labelling each subdivision (for this state only). In the data "newpak", rows 641-664 correspond to this state. Is there any way to place only one name above this state?
library(dplyr)
library(raster)
library(sf)
library(tidyverse)
library(ggrepel)
devtools::install_github("tidyverse/ggplot2", force = TRUE)
library(ggplot2)
pak <- getData("GADM",country="PAK",level=3)
pak <- st_as_sf(pak) %>%
mutate(
lon = map_dbl(geometry, ~st_centroid(.x)[[1]]),
lat = map_dbl(geometry, ~st_centroid(.x)[[2]]))
ggplot(pak) + geom_sf() + geom_text(aes(label = NAME_3, x = lon, y = lat), size = 2)
ind <- getData("GADM",country="IND",level=3)
ind <- st_as_sf(ind) %>%
mutate(
lon = map_dbl(geometry, ~st_centroid(.x)[[1]]),
lat = map_dbl(geometry, ~st_centroid(.x)[[2]]))
jnk <- subset(ind, OBJECTID >= 641 & OBJECTID <= 664 )
newpak <- rbind(pak, jnk)
regionalValues <- runif(165) # Simulate a value for each region between 0 and 1
ggplot(newpak) + geom_sf(aes(fill = regionalValues)) + geom_text(aes(label = NAME_3, x = lon, y = lat), size = 2)
Here's a complete solution using the sf package.
library(raster)
library(sf)
library(tidyverse)
# download PAK data and convert to sf
pak <- getData("GADM",country="PAK",level=3) %>%
st_as_sf()
# download IND data, convert to sf, filter out
# desired area, and add NAME_3 label
jnk <- getData("GADM",country="IND",level=3) %>%
st_as_sf() %>%
filter(OBJECTID %>% between(641, 664)) %>%
group_by(NAME_0) %>%
summarize() %>%
mutate(NAME_3 = "Put desired region name here")
regionalValues <- runif(142) # Simulate a value for each region between 0 and 1
# combine the two dataframes, find the center for each
# region, and then plot with ggplot
pak %>%
select(NAME_0, NAME_3, geometry) %>%
rbind(jnk) %>%
mutate(
lon = map_dbl(geometry, ~st_centroid(.x)[[1]]),
lat = map_dbl(geometry, ~st_centroid(.x)[[2]])
) %>%
ggplot() +
geom_sf(aes(fill = regionalValues)) +
geom_text(aes(label = NAME_3, x = lon, y = lat), size = 2) +
scale_fill_distiller(palette = "Spectral")
Some notes:
I used dplyr::filter (which has a method for sf objects) instead of raster::subset to get the desired subset of the IND data, because I feel it's more idiomatic tidyverse code.
To combine areas with sf you can group the different regions by a common group with group_by and then simply call summarize. This is the method I used in my solution above. There are other functions in the sf package that accomplish similar results worth looking at. They are st_combine and st_union.
Using st_centroid for the purpose of plotting the region labels is not necessarily the best method for finding a good location for region labels. I used it because it's the most convenient. You might try other methods, including manual placement of labels.
I changed the fill palette to a diverging color palette because I think it more clearly shows the difference between one region and the next. You can see some of the color palettes available with RColorBrewer::display.brewer.all()
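To make the difference between st_combine and st_union concrete, here is a small sketch on made-up geometry. Two adjacent squares stand in for neighbouring subdivisions, and `sq()` is a hypothetical helper:

```r
library(sf)

# Hypothetical helper: a 2 x 2 square with its lower-left corner at (x0, 0).
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 2, 0), c(x0 + 2, 2), c(x0, 2), c(x0, 0))))
}

# The squares share the edge x = 2, like two neighbouring districts.
parts <- st_sfc(sq(0), sq(2))

combined  <- st_combine(parts)  # collects the parts; the shared border is kept
dissolved <- st_union(parts)    # merges the parts into a single outline

st_geometry_type(combined)
st_geometry_type(dissolved)
```

For a single label over a merged state, the dissolved outline is usually the one to take the centroid of, which is what group_by() followed by summarize() produces.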