Problem joining different SpatialPolygonsDataFrame objects in R

I have a shapefile of towns in the north of Spain that I have to join into groups (municipalities, or comarcas in Spanish). I've used st_union from the sf package to join them successfully (and each one is its own SpatialPolygonsDataFrame object with a single polygon). I plot each of the municipalities individually and they look fine.
However, when I try to combine the municipalities into a single SpatialPolygonsDataFrame object with multiple polygons, I can't for the life of me manage to do it. I've tried three approaches, mostly based on this answer: https://gis.stackexchange.com/questions/155328/merging-multiple-spatialpolygondataframes-into-1-spdf-in-r and this one https://gis.stackexchange.com/questions/141469/how-to-convert-a-spatialpolygon-to-a-spatialpolygonsdataframe-and-add-a-column-t
– If I use raster::union, it throws the error:
Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length
– If I use a simple rbind, it throws the error:
Error in SpatialPolygonsDataFrame(pl, df, match.ID = FALSE) :
Object length mismatch:
pl has 7 Polygons objects, but df has 4 rows
Or something similar for 6 of the 11 municipalities.
– If I try a lapply approach (more convoluted), it seems to work, but once I plot it using leaflet, the municipalities that gave the error with raster::union or rbind don't look as they should/don't look as they do when I plot them individually.
**Municipalities 1 and 2 work fine. 3 and 4, for example, do not.**
Here's a link to the two files needed to reproduce my code below:
– Link to shape files: https://www.dropbox.com/sh/z9632hworbbchn5/AAAiyq3f_52azB4oFeU46D5Qa?dl=0
– Link to xls file that contains the mapping from towns to municipalities: https://www.dropbox.com/s/4w3fx6neo4t1l3d/listado-comarcas-gipuzkoa.xls?dl=0
And my code:
library(tidyverse)
library(magrittr)
library(sf)
library(ggplot2)
library(lwgeom)
library(readxl)
library(raster)
library(rgdal)   # for readOGR()
library(leaflet) # for the map at the end
#Read shapefile
mapa_municip <- readOGR(dsn = "UDALERRIAK_MUNICIPIOS/UDALERRIAK_MUNICIPIOS.shp")
mapa_municip <- spTransform(mapa_municip, CRS('+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0'))
mapa_municip <- st_as_sf(mapa_municip)
# Read excel that contains mapping from towns to municipalities
muni2com <- read_excel("listado-comarcas-gipuzkoa.xls",
                       sheet = 1,
                       range = "A1:C91",
                       col_names = TRUE)
comarcas <- list()
count <- 0
for (i in unique(muni2com$Comarca)[1:4]){
  count <- count + 1
  for (k in unique(muni2com$Municipios[muni2com$Comarca==i])){
    if (k == unique(muni2com$Municipios[muni2com$Comarca==i])[1]){ # if 1st case, keep this town
      temp <- mapa_municip[mapa_municip$MUNICIPIO==k,]
    }
    if (k != unique(muni2com$Municipios[muni2com$Comarca==i])[1]){ # otherwise, join w previous ones
      temp <- sf::st_union(temp, mapa_municip[mapa_municip$MUNICIPIO==k,])
    }
  }
  comarcas[[count]] <- spTransform(as(temp, "Spatial"), CRS('+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0'))
  comarcas[[count]]@data <- data.frame(comarca = i)
}
IDs <- sapply(comarcas, function(x)
  slot(slot(x, "polygons")[[1]], "ID"))

# Checking
length(unique(IDs)) == length(comarcas)
dfIDs <- data.frame(comarca = IDs)

# Making SpatialPolygons from list of polygons
comarcas2 <- SpatialPolygons(lapply(comarcas,
                                    function(x) slot(x, "polygons")[[1]]))

# Try to coerce to SpatialPolygonsDataFrame (will throw error)
p.df <- data.frame(comarca = unique(muni2com$Comarca)[1:4])
p <- SpatialPolygonsDataFrame(comarcas2, p.df)

# Extract polygon IDs
( pid <- sapply(slot(comarcas2, "polygons"), function(x) slot(x, "ID")) )

# Create dataframe with correct rownames
( p.df <- data.frame(comarca = unique(muni2com$Comarca)[1:4], row.names = pid) )

# Try coercion again and check class
comarcas3 <- SpatialPolygonsDataFrame(comarcas2, p.df)
class(comarcas3)
# Leaflet map
leaflet(options = leafletOptions(zoomControl = FALSE,
                                 zoomSnap = 0.1,
                                 zoomDelta = 1),
        data = comarcas3) %>%
  addProviderTiles(provider = "CartoDB.Positron") %>%
  htmlwidgets::onRender("function(el, x) {
    L.control.zoom({ position: 'topright' }).addTo(this)
  }") %>%
  clearShapes() %>%
  addPolygons(fillColor = "gray",
              opacity = 0.8,
              weight = 0.3,
              color = "white",
              fillOpacity = 0.95,
              smoothFactor = 0.5,
              label = ~comarca,
              highlight = highlightOptions(
                weight = 1.5,
                color = "#333333",
                bringToFront = TRUE),
              layerId = ~comarca)
**Note how if you plot comarcas[[3]] or comarcas[[4]] above instead of comarcas3, the shape of those municipalities is completely different.**
I'd really appreciate any tips you can give me; I've been at it for days and I can't solve it. I assume the problem is due to the error given by rbind, which seems to be the most informative one, but I don't know what it means. Thank you very much in advance.

Are you absolutely, positively required to use the older {sp} package workflow?
If not, it may be easier to dissolve the municipalities into comarcas using a pure {sf} based workflow: grouping by a comarca column and then summarising will do the trick.
Consider this code:
library(tidyverse)
library(sf)
library(readxl)
library(leaflet)
#Read shapefile
mapa_municip <- st_read("UDALERRIAK_MUNICIPIOS.shp") %>%
  st_transform(4326)
# Read excel that contains mapping from towns to municipalities
muni2com <- read_excel("listado-comarcas-gipuzkoa.xls",
                       sheet = 1,
                       range = "A1:C91",
                       col_names = TRUE)
# dissolving comarcas using sf / dplyr based workflow
comarcas <- mapa_municip %>%
  inner_join(muni2com, by = c("MUNICIPIO" = "Municipios")) %>%
  group_by(Comarca) %>%
  summarise() %>% # magic! :)))
  ungroup()
leaflet(comarcas) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(color = "red",
              label = ~Comarca)

Related

redlistr::getAreaEOO from degree minute data input

I have been trying to calculate the EOO area for a species using the redlistr package. In the example, the authors used raster data. However, I have observation points of the species in degree-minute format.
I created a subset of data for reference:
library(sp)      # SpatialPointsDataFrame()
library(raster)  # crs()
library(leaflet)

dt <- data.frame(lon_x = c(168.36085, 151.228745, 144.984577, 144.984287, 144.984201),
                 lat_y = c(-46.59179, -34.005291, -37.926258, -37.919514, -37.923407),
                 species = "seahorse_spp1")
coords <- cbind(dt$lon_x, dt$lat_y)
dt_spdf <- SpatialPointsDataFrame(coords, dt)
# now add a coordinate reference system to the sp dataframe
prj4string <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +units=km +no_defs"
crs(dt_spdf) <- prj4string
# unit is in meter like required
# now create EOO polygon
dt.polygon <- redlistr::makeEOO(dt_spdf)
# now visually check the points and polygon to make sure they look correct
leaflet() %>%
  addTiles() %>%
  addCircles(data = dt_spdf, ~lon_x, ~lat_y, color = "red") %>%
  addPolygons(data = dt.polygon)
# calculate EOO
redlistr::getAreaEOO(dt.polygon)
#> [1] 0.0003264353
And it keeps giving this very small, unrealistic value.
Does anyone have any idea where I went wrong?
Thank you!
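One thing worth checking (this is my own guess, not from the original post): makeEOO()/getAreaEOO() work with planar coordinates, so points left in long-lat degrees yield an "area" in square degrees, which would explain the tiny value. A minimal sketch of reprojecting first; the equal-area CRS here is an illustrative choice, not the only valid one:

library(sp)
library(redlistr)
# reproject to an equal-area CRS in metres before building the EOO polygon
dt_proj <- spTransform(dt_spdf,
                       CRS("+proj=cea +lon_0=0 +lat_ts=30 +ellps=WGS84 +units=m +no_defs"))
dt.polygon.m <- redlistr::makeEOO(dt_proj)
redlistr::getAreaEOO(dt.polygon.m) # should now be a plausible area (reported in km^2)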

Finding all nearest neighbour to all data points using sparklyr

I would like to use sparklyr to find the nearest neighbour for each point in a dataset.
I've found that sparklyr::ml_approx_nearest_neighbors() uses a key argument (a single feature vector) to find the nearest neighbours, so I guess I'd iterate over that for each point. Should I use this with lapply(), or is that inefficient?
Here's an example (modified from here) where I take the titanic dataset and attempt to find the nearest 2 neighbours from the same dataset using the first 700 data points. It returns the point itself and the next closest, as expected, but I suspect the entire pipeline reruns for each data point, making this inefficient.
Is there a better way, please?
library(sparklyr)
library(titanic)
library(dplyr)
library(magrittr)

sc <- spark_connect(method = "databricks") # create a spark connection object

# clean dataset
df_titanic <- titanic::titanic_train %>%
  dplyr::select(Survived, Pclass, Sex, Age, SibSp, Parch, Fare) %>%
  dplyr::rename_all(tolower) %>% # make the col names lower case
  dplyr::mutate(sex = ifelse(sex == 'male', 1, 0), id = 1:nrow(.)) %>% # turn sex to an integer
  dplyr::filter_all(dplyr::all_vars(!is.na(.))) # remove NAs

sdf_titanic <- sparklyr::copy_to(sc, df_titanic, overwrite = T) # copy to spark

input_cols <- c('pclass', 'sex', 'age', 'sibsp', 'parch', 'fare') # features list

## append a vectorised list of the features we're interested in
sdf_titanic_va <- ft_vector_assembler(sdf_titanic,
                                      input_cols = input_cols,
                                      output_col = 'features')

brp_lsh <- sparklyr::ft_bucketed_random_projection_lsh(
  sc,
  input_col = 'features',
  output_col = 'hash',
  bucket_length = 2,
  num_hash_tables = 3
)

brp_fit <- ml_fit(brp_lsh, sdf_titanic_va) ## fit the LSH to our data to get the hashes

id1_input <- sdf_titanic_va %>%
  dplyr::filter(id %in% 1:700) %>%
  dplyr::pull(features)

lapply(id1_input, function(x) ml_approx_nearest_neighbors(
  brp_fit,
  sdf_titanic_va,
  key = x,
  dist_col = 'dist_col',
  num_nearest_neighbors = 2
))
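One possibility worth trying (a sketch under the assumption that the fitted LSH model and the id column from above are available; not a tested drop-in): instead of looping over keys, use the model's approximate similarity self-join, which finds all pairs within a distance threshold in a single distributed pass, then keep each point's closest non-identical match:

# approximate self-join: all pairs of rows within `threshold` of each other
joined <- ml_approx_similarity_join(
  brp_fit,
  sdf_titanic_va, sdf_titanic_va,
  threshold = 5, # max distance to consider; tune for your feature scales
  dist_col = 'dist_col'
)

# for each point, keep the nearest neighbour that isn't the point itself
nearest <- joined %>%
  dplyr::filter(id_a != id_b) %>%
  dplyr::group_by(id_a) %>%
  dplyr::filter(dist_col == min(dist_col, na.rm = TRUE)) %>%
  dplyr::ungroup()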

Using a shape file to download MODIS product data for country in R

Is there any way that can be used to parse a shapefile of a country and download MODIS product data within that country using R?
I tried different approaches using the MODIStsp package (https://docs.ropensci.org/MODIStsp/) as well as the MODISTools package (https://docs.ropensci.org/MODISTools/articles/modistools-vignette.html), and they both only allow me to download MODIS product data for a defined site, not a whole country.
Here's an example of how you might achieve this.
First, download the MODIS data that you require; in this example I'm using MCD12Q1.006.
begin_year and end_year are in the format Year.Month.Day (i.e. "YYYY.MM.DD").
shape_file is the shapefile you're using; presumably its extent is the country you're after, though I'm only going off the minimal information you have provided.
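For the code below to run, shape_file, begin_year and end_year need to be defined first; these placeholder values are my own assumptions, not part of the original answer:

library(sf)
shape_file <- st_read("country_boundary.shp") # hypothetical path to your country polygon
begin_year <- "2019.01.01" # runGdal() takes dates as "YYYY.MM.DD"
end_year <- "2019.12.31"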
library(MODIS)
library(sf)        # st_buffer()
library(raster)    # stack()
library(tidyverse) # pipes, pluck(), str_extract()

tifs <- runGdal(product = "MCD12Q1", collection = "006", SDSstring = "01",
                extent = shape_file %>% st_buffer(dist = 10000),
                begin = begin_year, end = end_year,
                outDirPath = "data", job = "modis",
                MODISserverOrder = "LPDAAC") %>%
  pluck("MCD12Q1.006") %>%
  unlist()

# rename tifs to have more descriptive names
new_names <- format(as.Date(names(tifs)), "%Y") %>%
  sprintf("modis_mcd12q1_umd_%s.tif", .) %>%
  file.path(dirname(tifs), .)
file.rename(tifs, new_names)

landcover <- list.files("data/modis", "^modis_mcd12q1_umd",
                        full.names = TRUE) %>%
  stack()

# label layers with year
landcover <- names(landcover) %>%
  str_extract("(?<=modis_mcd12q1_umd_)[0-9]{4}") %>%
  paste0("y", .) %>%
  setNames(landcover, .)
Also, if you require a particular cell size, you could follow this procedure to get a 5x5 MODIS cell neighbourhood.
neighborhood_radius <- 5 * ceiling(max(res(landcover))) / 2
agg_factor <- round(2 * neighborhood_radius / res(landcover))
r <- raster(landcover) %>%
  aggregate(agg_factor)
r <- shape_file %>%
  st_transform(crs = projection(r)) %>%
  rasterize(r, field = 1) %>%
  # remove any empty cells at edges
  trim()
Here's an example using MODISTools to automate downloading the correct tiles for the country.
First, let's generate a polygon of a country to demonstrate (using Luxembourg as an example):
library(maptools)
library(sf)
data(wrld_simpl)
world = st_as_sf(wrld_simpl)
lux = world[world$NAME == 'Luxembourg',]
Now we find the location (centroid) and size of the country:
# find centroid of polygon in long-lat decimal degrees
lux.cent = st_centroid(lux)

# find width and height of country in km
lux.proj = st_transform(lux,
                        "+proj=moll +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +units=km +no_defs")
lux.km_lr = diff(st_bbox(lux.proj)[c(1,3)])
lux.km_ab = diff(st_bbox(lux.proj)[c(2,4)])
Using this info, we can download the correct Modis data (using leaf-area index, lai, as an example):
# download the MODIS tiles for the area we defined
library(MODISTools)
lux_lai <- mt_subset(product = "MOD15A2H",
                     lat = lux.cent$LAT, lon = lux.cent$LON,
                     band = "Lai_500m",
                     start = "2004-01-01", end = "2004-01-01",
                     km_lr = lux.km_lr, km_ab = lux.km_ab,
                     site_name = "Luxembourg",
                     internal = TRUE, progress = TRUE)

# convert to a spatial raster
lux.rast = mt_to_raster(df = lux_lai, reproject = TRUE)
lux.rast = raster::mask(lux.rast, lux)
plot(lux.rast)
plot(st_geometry(lux), add = T)

mutate cannot find function

I'm working through eBird code from this webpage:
https://github.com/CornellLabofOrnithology/ebird-best-practices/blob/master/03_covariates.Rmd
with the exception of using my own data. I have a .gpkg from gadm.org of Australia and my own eBird data selected for Australia. I have followed the code exactly, except that I don't use "bcr" (my dataset has no BCR codes) and I removed st_buffer(dist = 10000) from the runGdal code, because it prevented me from actually downloading the MODIS data for some reason.
EDIT: I have also used the data provided on the site and still received the same error.
I got stuck at this code:
lc_extract <- ebird_buff %>%
  mutate(pland = map2(year_lc, data, calculate_pland, lc = landcover)) %>%
  select(pland) %>%
  unnest(cols = pland)
It returns this error:
Error: Problem with `mutate()` input `pland`.
x error in evaluating the argument 'x' in selecting a method for function 'exact_extract': invalid layer names
i Input `pland` is `map2(year_lc, data, calculate_pland, lc = landcover)`.
I cannot seem to figure out how to correct it; I'm rather new to dense geospatial code like this.
There is a free dataset in the link, but I haven't tried it out yet, so it may be that my data is incompatible with the code. However, I have had a look at the Gis-data.gpkg provided, and my data from GADM seems fine.
The two code blocks preceding the one above were:
neighborhood_radius <- 5 * ceiling(max(res(landcover))) / 2
ebird_buff <- red_knot %>%
  distinct(year = format(observation_date, "%Y"),
           locality_id, latitude, longitude) %>%
  # for 2019 use 2018 landcover data
  mutate(year_lc = if_else(as.integer(year) > max_lc_year,
                           as.character(max_lc_year), year),
         year_lc = paste0("y", year_lc)) %>%
  # convert to spatial features
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326) %>%
  # transform to modis projection
  st_transform(crs = projection(landcover)) %>%
  # buffer to create neighborhood around each point
  st_buffer(dist = neighborhood_radius) %>%
  # nest by year
  nest(data = c(year, locality_id, geometry))

calculate_pland <- function(yr, regions, lc) {
  locs <- st_set_geometry(regions, NULL)
  exact_extract(lc[[yr]], regions, progress = FALSE) %>%
    map(~ count(., landcover = value)) %>%
    tibble(locs, data = .) %>%
    unnest(data)
}
This has been answered by the author of the webpage.
The solution was this code:
lc_extract <- NULL
for (yr in names(landcover)) {
  # get the buffered checklists for a given year
  regions <- ebird_buff$data[[which(yr == ebird_buff$year_lc)]]
  # get landcover values within each buffered checklist area
  ee <- exact_extract(landcover[[yr]], regions, progress = FALSE)
  # count the number of each landcover class for each checklist buffer
  ee_count <- map(ee, ~ count(., landcover = value))
  # attach the year and locality id back to the checklists
  ee_summ <- tibble(st_drop_geometry(regions), data = ee_count) %>%
    unnest(data)
  # bind to results
  lc_extract <- bind_rows(lc_extract, ee_summ)
}
Credits go to Matt Strimas-Mackey.
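As a quick sanity check (my own diagnosis from the "invalid layer names" message, not the author's), you can compare the year keys in the nested data against the raster layer names; the map2() version can only work when the two sets match exactly:

# both should contain the same "y2018"-style keys; any output here points
# to the mismatch behind the 'invalid layer names' error
setdiff(unique(ebird_buff$year_lc), names(landcover))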

dotsInPolys length mismatch using data downloaded via tidycensus

Can you help me figure out the best way to resolve the length-mismatch error thrown by dotsInPolys? I think it is because there are NAs, NULLs, or some funk in the polygon data that makes it too long. Here's code that reproduces the error. Ultimately, I want to plot multiple races using Leaflet, but I can't produce the lat/lon needed for the random dots at this point.
require(maptools)
require(tidycensus)
require(dplyr) # mutate()
require(tidyr) # spread()

person.number.divider <- 1000

census_api_key("ENTER KEY HERE", install = TRUE)

racevars <- c(White = "B02001_002", #"P005003"
              Black = "B02001_003", # Black or African American alone
              Latinx = "B03001_003"
)

nj.county <- get_acs(geography = "county", # tract
                     year = 2015,
                     variables = racevars,
                     state = "NJ", # county = "Harris County",
                     geometry = TRUE,
                     summary_var = "B02001_001")

library(sf)
st_write(nj.county, "nj.county.shp", delete_layer = TRUE)
nj <- rgdal::readOGR(dsn = "nj.county.shp") %>%
  spTransform(CRS("+proj=longlat +datum=WGS84"))

nj@data <- nj@data %>%
  tidyr::separate(NAME,
                  sep = ",",
                  into = c("county", "state")) %>%
  dplyr::select(estimat, variabl, GEOID, county) %>%
  spread(key = variabl, value = estimat) %>%
  mutate(county = trimws(county))

black.dots <- dplyr::select(nj@data, Black) / person.number.divider #%>%
black.dots <- dotsInPolys(nj, as.integer(black.dots$Black), f = "random")
# Error in dotsInPolys(nj, as.integer(black.dots$Black), f = "random") :
#   different lengths

length(nj) # 63 This seems too many, because I believe NJ has 21 counties.
length(black.dots$Black) # 21
This post (Advice on troubleshooting dotsInPolys error (maptools)) came close to helping me, but I couldn't see how to apply it to my case.
I can change the length of the nj SpatialPolygonsDataFrame by removing NAs and counties with a black population greater than 0, but then the map doesn't plot multiple counties (maybe there is something wrong with the census download?).
It looks like you might have gotten this figured out, but I wanted to share another approach that uses sf::st_sample() instead of maptools::dotsInPolys(). One advantage of this is that you don't need to convert the sf object you get from tidycensus to an sp object. (Incidentally, the length mismatch arises because get_acs() returns data in long format, one row per county per variable: 21 counties x 3 race variables = 63 polygons.)
In the following example I split the census data by race into a list of three sf objects, then perform st_sample() on each element of the list (each race). Next, I recombine the sampled points into one sf object with a new race variable for each point. Finally, I use tmap to make a map, though you could use ggplot2 or leaflet as well.
library(tidyverse)
library(tidycensus)
library(sf)
library(tmap)

person.number.divider <- 1000

racevars <- c(White = "B02001_002", #"P005003"
              Black = "B02001_003", # Black or African American alone
              Latinx = "B03001_003"
)

# get acs data with geography in "tidy" form
nj.county <- get_acs(geography = "county", # tract
                     year = 2015,
                     variables = racevars,
                     state = "NJ", # county = "Harris County",
                     geometry = TRUE,
                     summary_var = "B02001_001"
)

# split by race
county.split <- nj.county %>%
  split(.$variable)

# randomly sample points in polygons based on population
points.list <- map(county.split, ~ st_sample(., .$estimate / person.number.divider))

# combine points into sf collections and add race variable
points <- imap(points.list, ~ st_sf(tibble(race = rep(.y, length(.x))), geometry = .x)) %>%
  reduce(rbind)

# map!
tm_shape(nj.county) +
  tm_borders(col = "darkgray", lwd = 0.5) +
  tm_shape(points) +
  tm_dots(col = "race", size = 0.01, pal = "Set2")
I don't have enough rep to post the map image directly, but here it is.
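Since the original goal was a Leaflet map, here's a minimal sketch using the points object built above (untested against this exact data; the styling choices are arbitrary):

library(leaflet)
pal <- colorFactor("Set2", domain = unique(points$race))
points %>%
  st_transform(4326) %>% # leaflet expects long-lat WGS84
  leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%
  addCircleMarkers(radius = 1, stroke = FALSE,
                   fillColor = ~pal(race), fillOpacity = 0.6) %>%
  addLegend(pal = pal, values = ~race)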
