I have R code that I am using to compute Getis-Ord G statistics. I typically create my shapefile in GIS, then import it into R to use with the code. I recently needed to make an edit to my shapefile, which I did in GIS and imported into R as usual:
tract<-st_read("CBSA2022.shp")
My issue is with the loop portion of my code and the poly2nb function. Currently it is written as:
for (CBSA in CBSAs) {
temp <- tract[ tract$CBSAFP == CBSA, c("JOIN_ID", variable_of_int)]
names(temp)[2] <- 'black_pop'
#We create the weight matrices within each CBSA now
#We check that there are more than one tract in the CBSA
if ((nrow(temp) > 1)) {
q1<-poly2nb(temp, queen = queen)
if (self_include){ q1 <- include.self(q1) }
Before editing my shapefile in GIS, this worked perfectly with no errors. Now, I receive this error message:
Error in poly2nb(temp, queen = queen) : Empty geometries found
What do you think could be different about my shapefile that I now get this error? And/or how can I fix this? The only difference between this shapefile and the original, is I had to define my spatial join differently when joining my data to spatial polygons in GIS.
I haven't tried anything significant since I am not well versed in R. I did not create this code; I worked with a student (who is no longer available) to make it very user-friendly for me. I've used it numerous times with different shapefiles before my recent edit, and it has always worked. I'm just not sure why I now have empty geometries or how to fix them.
Looking at the source code for poly2nb (https://github.com/r-spatial/spdep/blob/main/R/poly2nb.R):
poly2nb <- function(pl, row.names=NULL, snap=sqrt(.Machine$double.eps),
queen=TRUE, useC=TRUE, foundInBox=NULL) {
[...]
if (inherits(pl, "sfc")) {
[...]
if (attr(pl, "n_empty") > 0L)
stop("Empty geometries found")
sf <- TRUE
}
it seems that your temp object is of class sfc; however, the n_empty attribute isn't updated. Googling around, we can find an example: https://github.com/r-spatial/sf/issues/1115. You can check n_empty for your geometries and replace (with 0) those which have a value > 0.
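In practice, a simpler fix is usually to find and drop the empty geometries before building the neighbour list. A minimal sketch, assuming temp is an sf object with one row per tract (the variable names mirror your loop):

empty <- st_is_empty(temp)            # TRUE where a tract has no geometry
sum(empty)                            # how many tracts lost their geometry in the new join
temp <- temp[!empty, ]                # keep only tracts with valid polygons
q1 <- poly2nb(temp, queen = queen)    # should no longer raise "Empty geometries found"

If sum(empty) is large, that points back to the spatial join you redefined in GIS: tracts that failed to match were probably written out with NULL geometries.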
I have a directory with a bunch of shapefiles for 50 cities (and will accumulate more). They are divided into three groups: cities' political boundaries (CityA_CD.shp, CityB_CD.shp, etc.), neighborhoods (CityA_Neighborhoods.shp, CityB_Neighborhoods.shp, etc.), and Census blocks (CityA_blocks.shp, CityB_blocks.shp, etc.). They use a common file-naming syntax, have the same set of attribute variables, and are all in the same CRS. (I transformed all of them as such using QGIS.) I need to read each group of files (political boundaries, neighborhoods, blocks) into a list of sf objects and then bind the rows to create one large sf object per group. However, I am running into consistent problems developing this workflow in R.
library(tidyverse)
library(sf)
library(mapedit)
# This first line succeeds in creating a character string of the files that match the regex pattern.
filenames <- list.files("Directory", pattern=".*_CDs.*shp", full.names=TRUE)
# This second line creates a list object from the files.
shapefile_list <- lapply(filenames, st_read)
# This third line (adopted from https://github.com/r-spatial/sf/issues/798) fails as follows.
districts <- mapedit:::combine_list_of_sf(shapefile_list)
Error: Column `District_I` can't be converted from character to numeric
# This fourth line fails in an apparently different way (also adopted from https://github.com/r-spatial/sf/issues/798).
districts <- do.call(what = sf:::rbind.sf, args = shapefile_list)
Error in CPL_get_z_range(obj, 2) : z error - expecting three columns;
The first error appears to indicate that one of my shapefiles has an incorrect variable class for the common variable District_I, but R provides no information to clue me in to which file is causing the error.
The second error seems to be looking for a z coordinate but is only finding x and y in the geometry attribute.
I have four questions on this front:
How can I have R identify which list item it is attempting to read and bind when the error halts the process?
How can I force R to ignore the incompatibility issue and coerce the variable class to character so that I can deal with the variable inconsistency (if that's what it is) in R?
How can I drop a variable entirely from the read sf objects that is causing an error (i.e. omit District_I for all read_sf calls in the process)?
More generally, what is going on and how can I solve the second error?
Thanks all as always for your help.
P.S.: I know this post isn't "reproducible" in the desired way, but I'm not sure how to make it so besides copying the contents of all my shapefiles. If I'm mistaken on this point, I'd gladly accept any wisdom on this front.
UPDATE:
I've run
filenames <- list.files("Directory", pattern=".*_CDs.*shp", full.names=TRUE)
shapefile_list <- lapply(filenames, st_read)
districts <- mapedit:::combine_list_of_sf(shapefile_list)
successfully on a subset of three of the shapefiles. So I've confirmed that a class conflict in the District_I column of one of the files is causing the hold-up when the code runs on the full batch. But again, I need the error to identify the file causing the issue so I can fix it in the file, OR I need the code to coerce District_I to character in all files (which is the class I want that variable to be in anyway).
A note, particularly regarding Pablo's recommendation:
districts <- do.call(what = dplyr::rbind_all, shapefile_list)
results in an error
Error in (function (x, id = NULL) : unused argument
followed by a long string of digits and coordinates. So,
mapedit:::combine_list_of_sf(shapefile_list)
is definitely the mechanism to read from the list and merge the files, but I still need a way to diagnose the source of the column incompatibility error across shapefiles.
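One way to pin down the offending file, sketched under the assumption that every shapefile has a District_I column, is to check its class across the already-read list:

# Report the class of District_I in each file so the mismatched one stands out
classes <- sapply(shapefile_list, function(x) class(x$District_I)[1])
data.frame(file = basename(filenames), District_I_class = classes)

Any file whose class differs from the rest is the one to fix in GIS, or to coerce in R as in the solution below.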
So after much fretting and some great guidance from Pablo (and his link to https://community.rstudio.com/t/simplest-way-to-modify-the-same-column-in-multiple-dataframes-in-a-list/13076), the following works:
library(tidyverse)
library(sf)
# Reads in all shapefiles from Directory that include the string "_CDs".
filenames <- list.files("Directory", pattern=".*_CDs.*shp", full.names=TRUE)
# Applies the function st_read from the sf package to each file saved as a character string to transform the file list to a list object.
shapefile_list <- lapply(filenames, st_read)
# Creates a function that transforms a problem variable to class character for all shapefile reads.
my_func <- function(data, my_col){
my_col <- enexpr(my_col)
output <- data %>%
mutate(!!my_col := as.character(!!my_col))
}
# Applies the new function to our list of shapefiles and specifies "District_I" as our problem variable.
districts <- map_dfr(shapefile_list, ~my_func(.x, District_I))
I have a large (266,000 elements, 1.7 GB) SpatialPolygonsDataFrame that I am trying to convert into a 90 m resolution RasterLayer (~100,000,000 cells).
The SpatialPolygonsDataFrame has 12 variables of interest to me, thus I intend to make 12 RasterLayers
At the moment, using rasterize(), each conversion takes ~2 days. So nearly a month expected for total processing time.
Can anyone suggest a faster process? I think this would be ~10-40x faster in ArcMap, but I want to do it in R to keep things consistent, and it's a fun challenge!
general code
######################################################
### Make Rasters
######################################################
##Make template
r<-raster(res=90,extent(polys_final))
##set up loop
loop_name <- colnames(as.data.frame(polys_final))
for(i in 1:length(loop_name)){
a <-rasterize(polys_final, r, field=i)
writeRaster(a, filename=paste("/Users/PhD_Soils_raster_90m/",loop_name[i],".tif",sep=""), format="GTiff")
}
I think this is a case for using GDAL, specifically the gdal_rasterize function.
You probably already have GDAL installed on your machine if you are doing a lot of spatial stuff, and you can run GDAL commands from within R using the system() command. I didn't do any tests or anything, but this should be MUCH faster than using the raster package in R.
For example, the code below creates a raster from a shapefile of rivers. This code creates an output file with a 1 value wherever a feature exists, and a 0 where no feature exists.
path_2_gdal_function <- "/Library/Frameworks/GDAL.framework/Programs/gdal_rasterize"
outRaster <- "/Users/me/Desktop/rasterized.tiff"
inVector <- "/Full/Path/To/file.shp"
theCommand <- sprintf("%s -burn 1 -a_nodata 0 -ts 1000 1000 %s %s", path_2_gdal_function, inVector, outRaster)
system(theCommand)
the -ts argument provides the size of the output raster in pixels
the -burn argument specifies what value to put in the output raster where the features exist
-a_nodata indicates which value to put where no features are found
For your case, you will want to add the -a attribute_name argument, which specifies the name of the attribute in the input vector to be burned into the output raster. Full details on the possible arguments are in the gdal_rasterize documentation.
Note that the sprintf() function is just used to format the text string that is passed to the command line by the system() function.
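For the polygon-to-raster case in the question, a hedged sketch of the same pattern (the input path and the attribute name soil_var are placeholders, and -tr 90 90 assumes the layer's CRS units are metres):

inVector  <- "/Full/Path/To/polys_final.shp"
outRaster <- "/Users/PhD_Soils_raster_90m/soil_var_90m.tif"
# -a burns the named attribute; -tr sets the output resolution in map units (an alternative to -ts, which is in pixels)
theCommand <- sprintf("%s -a soil_var -tr 90 90 -a_nodata -9999 %s %s",
                      path_2_gdal_function, inVector, outRaster)
system(theCommand)

Looping this over the 12 attribute names would replace the rasterize() loop above.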
Probably a very basic question, but I found nothing in the documentation of the Simple Features (sf) R package.
I'm looking for the native sf function to extract on the fly all the columns of an sf object without the geometries, just like SP@data with sp objects.
The following function does the job but I would prefer to use a native function :
st_data <- function(SF) { SF[, colnames(SF) != attr(SF, "sf_column"), drop = TRUE]}
A typical use is when I want to merge two sf datasets by attribute (merge does not work with two sf objects): merge(SF1, st_data(SF2)).
In that case it would be impractical to use st_geometry(SF2) <- NULL because it does not work "on the fly" and I don't want to permanently drop the geometry column; SF2[, 1:5, drop = TRUE] is impractical too because I have to look into the object to see where the geometry column is.
Using : sf_0.5-4 - R 3.4.1
We can use the st_geometry<- function and set the geometry to be NULL.
library(sf)
nc <- st_read(system.file("shape/nc.shp", package="sf"))
nc_df <- `st_geometry<-`(nc, NULL)
class(nc_df)
[1] "data.frame"
As you can see, nc_df is a dataframe now, so I think you can do the following for your example.
merge(SF1, `st_geometry<-`(SF2, NULL))
Update
As Gilles pointed out, another function, st_set_geometry, can achieve the same task. It is probably a better choice, since st_set_geometry does not require wrapping st_geometry in backticks and <- as a replacement function.
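For the merge use case in the question, that looks like the following small sketch (SF1 and SF2 stand for any two sf objects sharing a key column):

library(sf)
SF2_df <- st_set_geometry(SF2, NULL)   # plain data.frame, geometry dropped on the fly
merged <- merge(SF1, SF2_df)           # merged keeps SF1's geometry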
I have previously written a script to create a colored map of the US, with each state colored based on some simulated data. The idea is to later be able to replace the simulated data with some measure. It was written to be self-contained and originally ran just fine, but now crashes when the fortify {ggplot2} command is run.
I believe this is due to a problem with the fortify command, as it returns a fatal error and restarts R at that point. Here is the code up to the point of the fatal error:
###Load libraries
library(maptools)
library(ggplot2)
library(ggmap)
library(rgdal)
library(dplyr)
#Set working directory to where you want your files to exist (or where they already exist)
#Download, read and translate coord data for shape file of US States
if(!file.exists('tl_2014_us_state.shp')){
download.file('ftp://ftp2.census.gov/geo/tiger/TIGER2014/STATE/tl_2014_us_state.zip',
'tl_2014_us_state.zip')
files <- unzip('tl_2014_us_state.zip')
tract <- readOGR(".","tl_2014_us_state") %>% spTransform(CRS("+proj=longlat +datum=WGS84"))
} else {
tract <- readShapeSpatial("./tl_2014_us_state.shp") #%>% spTransform(CRS("+proj=longlat +datum=WGS84"))
}
# shape<-readShapeSpatial("./fao/World_Fao_Zones.shp")
#Download reference data for state names and abbreviations - a matter of convenience if there are
#states for which you have no data
if(!file.exists('states.csv')){
download.file('http://www.fonz.net/blog/wp-content/uploads/2008/04/states.csv',
'states.csv')
states <- read.csv('states.csv')
} else {
states <- read.csv('states.csv')
}
#simulated data for plotting values of some 'characteristic'
mydata <- data.frame(rnorm(51, 0, 1)) #51 "states" in the state dataset
names(mydata)[1] <- 'value' #give the simulated column of data a name
#Turn geo data into R dataframe
tract_geom<-fortify(tract,region="STUSPS") #STUSPS is the state abbreviation which will act as a key for merge
The script stops working at the line above and crashes R with a fatal error. I have tried a workaround described in another post, in which you place an explicit "id" column in the spatial dataframe, which fortify then uses as the key by default. With this modification the lines:
tract@data$id <- tract@data$STUSPS
tract_geom <- fortify(tract)
would replace tract_geom<-fortify(tract,region="STUSPS") in the previous code,
where STUSPS is the key for a later data merge.
Unfortunately, when I then fortify the tract data, the id column is not the state abbreviation as expected, but is instead a vector of characters between "0" and "55" (56 unique values). It appears that the state abbreviations (of which there are 56) are somehow being transformed into numbers and then into characters.
I am working on figuring out why this is happening and looking for a fix. If the fortify function worked with the region argument, that would be ideal, but if I can get the workaround to work, that would be great too. Any help would be greatly appreciated. I have looked at the documentation and at solutions to various similar problems and have come up short (even tried ArcGIS).
Try:
readOGR(..., stringsAsFactors=FALSE, ...)
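In the script above, that would mean reading the shapefile with factors disabled so that STUSPS stays a character vector and is not silently converted to factor codes by the id workaround. A minimal sketch of how it could slot in (same files and CRS as in the question):

tract <- readOGR(".", "tl_2014_us_state", stringsAsFactors = FALSE) %>%
  spTransform(CRS("+proj=longlat +datum=WGS84"))
tract@data$id <- tract@data$STUSPS   # now real abbreviations, not "0".."55"
tract_geom <- fortify(tract)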
I was able to solve my own question by running update.packages(). I'm not entirely sure which package was the culprit, but it could have been maptools, rgdal, or sp, as these were among the updated packages that may have influenced the problem.
In the end, after updates, the script runs in its original form with the line tract_geom<-fortify(tract,region="STUSPS") intact. Thank you to those who helped me work through this problem.
I'm trying to create a shapefile in R that I will later import to either Fusion Table or some other GIS application.
To start, I imported a blank shapefile containing all the census tracts in Canada. I have attached other data (in tabular format) to the shapefile based on the unique ID of the CTs, and I have mapped my results. At the moment, I only need the ones in Vancouver, and I would like to export a shapefile that contains only the Vancouver CTs as well as my newly attached attribute data.
Here is my code (some parts omitted due to privacy reasons):
shape <- readShapePoly('C:/TEST/blank_ct.shp') #Load blank shapefile
shape@data = data.frame(shape@data, data2[match(shape@data$CTUID, data2$CTUID),]) #data2 is my created attributes that I'm attaching to blank file
shape1 <-shape[shape$CMAUID == 933,] #selecting the Vancouver CTs
I've seen other examples using writePolyShape to create the shapefile. I tried it, and it worked to an extent: it created the .shp, .dbf, and .shx files. I'm missing the .prj file and I'm not sure how to go about creating it. Are there better methods out there for creating shapefiles?
Any help on this matter would be greatly appreciated.
Use rgdal and writeOGR. rgdal will preserve the projection information
something like
library(rgdal)
shape <- readOGR(dsn = 'C:/TEST', layer = 'blank_ct')
# do your processing
shape@data = data.frame(shape@data, data2[match(shape@data$CTUID, data2$CTUID),]) #data2 is my created attributes that I'm attaching to blank file
shape1 <-shape[shape$CMAUID == 933,]
writeOGR(shape1, dsn = 'C:/TEST', layer ='newstuff', driver = 'ESRI Shapefile')
Note that the dsn is the folder containing the .shp file, and the layer is the name of the shapefile without the .shp extension. It will read (readOGR) and write (writeOGR) all the component files (.dbf, .shp, .prj etc)
Problem solved! Thank you again for those who help!
Here is what I ended up doing:
As Mnel wrote, this line will create the shapefile.
writeOGR(shape1, dsn = 'C:/TEST', layer ='newstuff', driver = 'ESRI Shapefile')
However, when I ran this line, it came back with this error:
Can't convert columns of class: AsIs; column names: ct2,mprop,mlot,mliv
This is because my attribute data was not numeric but character. Luckily, my attribute data is all numbers, so I ran transform() to fix this problem.
shape2 <-shape1
shape2@data <- transform(shape1@data, ct2 = as.numeric(ct2),
mprop = as.numeric(mprop),
mlot = as.numeric(mlot),
mliv = as.numeric(mliv))
I tried the writeOGR() command again, but I still didn't get the .prj file that I was looking for. The problem was that I hadn't specified the coordinate system for the shapefile when importing it. Since I already know what the coordinate system is, all I had to do was define it when importing:
shape <- readShapePoly('C:/TEST/blank_ct.shp', proj4string = CRS("+proj=longlat +datum=WGS84"))
After that, I re-ran all the things I wanted to do with the shapefile, and the writeOGR line for exporting. And that's it!
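Putting the pieces together, the corrected flow looks roughly like this (same paths and column names as above):

library(maptools)
library(rgdal)
# Import with the coordinate system defined so the .prj file can be written later
shape <- readShapePoly('C:/TEST/blank_ct.shp', proj4string = CRS("+proj=longlat +datum=WGS84"))
# Attach the tabular attributes, subset to Vancouver, and convert the problem columns to numeric
shape@data <- data.frame(shape@data, data2[match(shape@data$CTUID, data2$CTUID), ])
shape1 <- shape[shape$CMAUID == 933, ]
shape1@data <- transform(shape1@data, ct2 = as.numeric(ct2), mprop = as.numeric(mprop),
                         mlot = as.numeric(mlot), mliv = as.numeric(mliv))
writeOGR(shape1, dsn = 'C:/TEST', layer = 'newstuff', driver = 'ESRI Shapefile')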