Extract data without geometries from `sf` objects like in `sp#data` - r

Probably a very basic question but I found nothing in the documentation of Simple Features R package.
I'm looking for the native sf function to extract on the fly all the columns of an sf object without the geometries. Just like SP#data with sp objects.
The following function does the job but I would prefer to use a native function :
st_data <- function(SF) { SF[, colnames(SF) != attr(SF, "sf_column"), drop = TRUE]}
A typical use is when I want to merge two sf dataset by attribute (merge does not work with two sf objects) : merge(SF1, st_data(SF2)).
In that case it would be impractical to use st_geometry(SF2) <- NULL because it does not work "on the fly" and I don't want to permanently drop the geometry column and SF2[,1:5,drop=T] is impractical too because I have to look into the object to see where the geometry column is.
Using : sf_0.5-4 - R 3.4.1

We can use the st_geometry<- function and set the geometry to be NULL.
library(sf)
nc <- st_read(system.file("shape/nc.shp", package="sf"))
nc_df <- `st_geometry<-`(nc, NULL)
class(nc_df)
[1] "data.frame"
As you can see, nc_df is a dataframe now, so I think you can do the following for your example.
merge(SF1, `st_geometry<-`(SF2, NULL))
Update
As Gilles pointed out, another function, st_set_geometry, can also achieve the same task. It is probably a better choice since using st_set_geometry does not need the use of "``" and "<-" to enclose the st_geometry function.

Related

Poly2nb - How to get rid of empty geometries in r

I have r code that I am using to compute Getis-ord Gstatistics. I typically create my shapefile in GIS, then import into r to use with the code. I recently needed to make an edit to my shapefile, which I did in GIS and imported into sas as usual:
tract<-st_read("CBSA2022.shp")
My issue is with the loop portion of my code and the poly2nb feature. Currently it is written as:
for (CBSA in CBSAs) {
temp <- tract[ tract$CBSAFP == CBSA, c("JOIN_ID", variable_of_int)]
names(temp)[2] <- 'black_pop'
#We create the weight matrices within each CBSA now
#We check that there are more than one tract in the CBSA
if ((nrow(temp) > 1)) {
q1<-poly2nb(temp, queen = queen)
if (self_include){ q1 <- include.self(q1) }
Before editing my shapefile in GIS, this worked perfectly with no errors. Now, I receive this error message:
Error in poly2nb(temp, queen = queen) : Empty geometries found
What do you think could be different about my shapefile that I now get this error? And/or how can I fix this? The only difference between this shapefile and the original, is I had to define my spatial join differently when joining my data to spatial polygons in GIS.
I haven't tried anything significant since I am not well versed in r. I did not create this code, but worked with a student (that is no longer available) to create it to be very user friendly for me to use. I've used it numerous times with different shapefiles before my recent edit, and it always worked, just not sure why I now have empty geometries or how to fix it.
Looking on the source code for poly2nb (https://github.com/r-spatial/spdep/blob/main/R/poly2nb.R):
poly2nb <- function(pl, row.names=NULL, snap=sqrt(.Machine$double.eps),
queen=TRUE, useC=TRUE, foundInBox=NULL) {
[...]
if (inherits(pl, "sfc")) {
[...]
if (attr(pl, "n_empty") > 0L)
stop("Empty geometries found")
sf <- TRUE
}
seems, that your temp object is class sfc however, the n_empty attribute isn't updated. Googling around we can find an example: https://github.com/r-spatial/sf/issues/1115. You can check n_empty for your geometries and replace (with 0) those which have value > 0.

Is there a way to delete existing variables in my R environment via the terminal? [duplicate]

I would like to remove some data from the workspace. I know the "Clear All" button will remove all data. However, I would like to remove just certain data.
For example, I have these data frames in the data section:
data
data_1
data_2
data_3
I would like to remove data_1, data_2 and data_3, while keeping data.
I tried data_1 <- data_2 <- data_3 <- NULL, which does remove the data (I think), but still keeps it in the workspace area, so it is not fully what I would like to do.
You'll find the answer by typing ?rm
rm(data_1, data_2, data_3)
A useful way to remove a whole set of named-alike objects:
rm(list = ls()[grep("^tmp", ls())])
thereby removing all objects whose name begins with the string "tmp".
Edit: Following Gsee's comment, making use of the pattern argument:
rm(list = ls(pattern = "^tmp"))
Edit: Answering Rafael comment, one way to retain only a subset of objects is to name the data you want to retain with a specific pattern. For example if you wanted to remove all objects whose name do not start with paper you would issue the following command:
rm(list = grep("^paper", ls(), value = TRUE, invert = TRUE))
Following command will do
rm(list=ls(all=TRUE))
In RStudio, ensure the Environment tab is in Grid (not List) mode.
Tick the object(s) you want to remove from the environment.
Click the broom icon.
You can use the apropos function which is used to find the objects using partial name.
rm(list = apropos("data_"))
Use the following command
remove(list=c("data_1", "data_2", "data_3"))
If you just want to remove one of a group of variables, then you can create a list and keep just the variable you need. The rm function can be used to remove all the variables apart from "data". Here is the script:
0->data
1->data_1
2->data_2
3->data_3
#check variables in workspace
ls()
rm(list=setdiff(ls(), "data"))
#check remaining variables in workspace after deletion
ls()
#note: if you just use rm(list) then R will attempt to remove the "list" variable.
list=setdiff(ls(), "data")
rm(list)
ls()
paste0("data_",seq(1,3,1))
# makes multiple data.frame names with sequential number
rm(list=paste0("data_",seq(1,3,1))
# above code removes data_1~data_3
If you're using RStudio, please consider never using the rm(list = ls()) approach!* Instead, you should build your workflow around frequently employing the Ctrl+Shift+F10 shortcut to restart your R session. This is the fastest way to both nuke the current set of user-defined variables AND to clear loaded packages, devices, etc. The reproducibility of your work will increase markedly by adopting this habit.
See this excellent thread on Rstudio community for (h/t #kierisi) for a more thorough discussion (the main gist is captured by what I've stated already).
I must admit my own first few years of R coding featured script after script starting with the rm "trick" -- I'm writing this answer as advice to anyone else who may be starting out their R careers.
*of course there are legitimate uses for this -- much like attach -- but beginning users will be much better served (IMO) crossing that bridge at a later date.
To clear all data:
click on Misc>Remove all objects.
Your good to go.
To clear the console:
click on edit>Clear console.
No need for any code.
Adding one more way, using ls() and remove()
ls() return a vector of character strings giving the names of the objects in the specified environment.
Create a list of objects you want to remove from the environment using ls() and then use remove() to remove it.
remove(list = ls()[ls() != "data"])
You can also use tidyverse
# to remove specific objects(s)
rm(list = ls() %>% str_subset("xxx"))
# or to keep specific object(s)
rm(list = setdiff(ls(), ls() %>% str_subset("xxx")))
Maybe this can help as well
remove(list = c(ls()[!ls() %in% c("what", "to", "keep", "here")] ) )

Read in a list of shapefiles and row bind them in R (preferably using tidy syntax and sf)

I have a directory with a bunch of shapefiles for 50 cities (and will accumulate more). They are divided into three groups: cities' political boundaries (CityA_CD.shp, CityB_CD.shp, etc.), neighborhoods (CityA_Neighborhoods.shp, CityB_Neighborhoods.shp, etc.), and Census blocks (CityA_blocks.shp, CityB_blocks.shp, etc.). They use common file-naming syntaxes, have the same set of attribute variables, and are all in the same CRS. (I transformed all of them as such using QGIS.) I need to write a list of each group of files (political boundaries, neighborhoods, blocks) to read as sf objects and then bind the rows to create one large sf object for each group. However I am running into consistent problems developing this workflow in R.
library(tidyverse)
library(sf)
library(mapedit)
# This first line succeeds in creating a character string of the files that match the regex pattern.
filenames <- list.files("Directory", pattern=".*_CDs.*shp", full.names=TRUE)
# This second line creates a list object from the files.
shapefile_list <- lapply(filenames, st_read)
# This third line (adopted from https://github.com/r-spatial/sf/issues/798) fails as follows.
districts <- mapedit:::combine_list_of_sf(shapefile_list)
Error: Column `District_I` cant be converted from character to numeric
# This fourth line fails in an apparently different way (also adopted from https://github.com/r-spatial/sf/issues/798).
districts <- do.call(what = sf:::rbind.sf, args = shapefile_list)
Error in CPL_get_z_range(obj, 2) : z error - expecting three columns;
The first error appears to be indicating that one of my shapefiles has an incorrect variable class for the common variable District_I but R provides no information to clue me into which file is causing the error.
The second error seems to be looking for a z coordinate but is only finding x and y in the geometry attribute.
I have four questions on this front:
How can I have R identify which list item it is attempting to read and bind is causing an error that halts the process?
How can I force R to ignore the incompatibility issue and coerce the variable class to character so that I can deal with the variable inconsistency (if that's what it is) in R?
How can I drop a variable entirely from the read sf objects that is causing an error (i.e. omit District_I for all read_sf calls in the process)?
More generally, what is going on and how can I solve the second error?
Thanks all as always for your help.
P.S.: I know this post isn't "reproducible" in the desired way, but I'm not sure how to make it so besides copying the contents of all my shapefiles. If I'm mistaken on this point, I'd gladly accept any wisdom on this front.
UPDATE:
I've run
filenames <- list.files("Directory", pattern=".*_CDs.*shp", full.names=TRUE)
shapefile_list <- lapply(filenames, st_read)
districts <- mapedit:::combine_list_of_sf(shapefile_list)
successfully on a subset of three of the shapefiles. So I've confirmed that there is some class conflict between the column District_I in one of the files causing the hold-up when running the code on the full batch. But again, I need the error to identify the file name causing the issue so I can fix it in the file OR need the code to coerce District_I to character in all files (which is the class I want that variable to be in anyway).
A note, particularly regarding Pablo's recommendation:
districts <- do.call(what = dplyr::rbind_all, shapefile_list)
results in an error
Error in (function (x, id = NULL) : unused argument
followed by a long string of digits and coordinates. So,
mapedit:::combine_list_of_sf(shapefile_list)
is definitely the mechanism to read from the list and merge the files, but I still need a way to diagnose the source of the column incompatibility error across shapefiles.
So after much fretting and some great guidance from Pablo (and his link to https://community.rstudio.com/t/simplest-way-to-modify-the-same-column-in-multiple-dataframes-in-a-list/13076), the following works:
library(tidyverse)
library(sf)
# Reads in all shapefiles from Directory that include the string "_CDs".
filenames <- list.files("Directory", pattern=".*_CDs.*shp", full.names=TRUE)
# Applies the function st_read from the sf package to each file saved as a character string to transform the file list to a list object.
shapefile_list <- lapply(filenames, st_read)
# Creates a function that transforms a problem variable to class character for all shapefile reads.
my_func <- function(data, my_col){
my_col <- enexpr(my_col)
output <- data %>%
mutate(!!my_col := as.character(!!my_col))
}
# Applies the new function to our list of shapefiles and specifies "District_I" as our problem variable.
districts <- map_dfr(shapefile_list, ~my_func(.x, District_I))

Select columns when reading in files with st_read

I am trying to read 39 json files into a common sf dataset in R.
Here is the method I've been trying:
path <- "~/directory"
file.names <- as.list(dir(path, pattern='.json', full.names=T))
geodata <- do.call(rbind, lapply(file.names, st_read))
The problem is in the last line: rbind cannot work because the files have different numbers of columns. However, they all have three columns in common, and which I care about: MOVEMENT_ID, DISPLAY_NAME and geometry. How could I select only these three columns when running st_read?
I've tried running geodata<-do.call(rbind, lapply(file.names, st_read,select=c('MOVEMENT_ID', 'DISPLAY_NAME', 'geometry'))) but, in this case, st_read does not seem to recognise the geometry column (error: 'no simple features geometry column pressent').
I've also tried to use fread in place of st_read but this doesn't work as fread is not adapted to spatial data.
Run lapply over a function that calls st_read and then does what you need to it, something like:
read_my_json = function(f){
s = st_read(f)
return(s[,c("MOVEMENT_ID","DISPLAY_NAME")]
}
(I'm pretty sure you don't have to select the geometry as well, you get that for free when selecting columns of an sf spatial object)
then do.call(rbind, lapply(file.names, read_my_json)) should work.
no extra packages need to be included and it has the big advantage in that you can test this function to see how it works on a single item before throwing a thousand at it.

How do I clear only a few specific objects from the workspace?

I would like to remove some data from the workspace. I know the "Clear All" button will remove all data. However, I would like to remove just certain data.
For example, I have these data frames in the data section:
data
data_1
data_2
data_3
I would like to remove data_1, data_2 and data_3, while keeping data.
I tried data_1 <- data_2 <- data_3 <- NULL, which does remove the data (I think), but still keeps it in the workspace area, so it is not fully what I would like to do.
You'll find the answer by typing ?rm
rm(data_1, data_2, data_3)
A useful way to remove a whole set of named-alike objects:
rm(list = ls()[grep("^tmp", ls())])
thereby removing all objects whose name begins with the string "tmp".
Edit: Following Gsee's comment, making use of the pattern argument:
rm(list = ls(pattern = "^tmp"))
Edit: Answering Rafael comment, one way to retain only a subset of objects is to name the data you want to retain with a specific pattern. For example if you wanted to remove all objects whose name do not start with paper you would issue the following command:
rm(list = grep("^paper", ls(), value = TRUE, invert = TRUE))
Following command will do
rm(list=ls(all=TRUE))
In RStudio, ensure the Environment tab is in Grid (not List) mode.
Tick the object(s) you want to remove from the environment.
Click the broom icon.
You can use the apropos function which is used to find the objects using partial name.
rm(list = apropos("data_"))
Use the following command
remove(list=c("data_1", "data_2", "data_3"))
If you just want to remove one of a group of variables, then you can create a list and keep just the variable you need. The rm function can be used to remove all the variables apart from "data". Here is the script:
0->data
1->data_1
2->data_2
3->data_3
#check variables in workspace
ls()
rm(list=setdiff(ls(), "data"))
#check remaining variables in workspace after deletion
ls()
#note: if you just use rm(list) then R will attempt to remove the "list" variable.
list=setdiff(ls(), "data")
rm(list)
ls()
paste0("data_",seq(1,3,1))
# makes multiple data.frame names with sequential number
rm(list=paste0("data_",seq(1,3,1))
# above code removes data_1~data_3
If you're using RStudio, please consider never using the rm(list = ls()) approach!* Instead, you should build your workflow around frequently employing the Ctrl+Shift+F10 shortcut to restart your R session. This is the fastest way to both nuke the current set of user-defined variables AND to clear loaded packages, devices, etc. The reproducibility of your work will increase markedly by adopting this habit.
See this excellent thread on Rstudio community for (h/t #kierisi) for a more thorough discussion (the main gist is captured by what I've stated already).
I must admit my own first few years of R coding featured script after script starting with the rm "trick" -- I'm writing this answer as advice to anyone else who may be starting out their R careers.
*of course there are legitimate uses for this -- much like attach -- but beginning users will be much better served (IMO) crossing that bridge at a later date.
To clear all data:
click on Misc>Remove all objects.
Your good to go.
To clear the console:
click on edit>Clear console.
No need for any code.
Adding one more way, using ls() and remove()
ls() return a vector of character strings giving the names of the objects in the specified environment.
Create a list of objects you want to remove from the environment using ls() and then use remove() to remove it.
remove(list = ls()[ls() != "data"])
You can also use tidyverse
# to remove specific objects(s)
rm(list = ls() %>% str_subset("xxx"))
# or to keep specific object(s)
rm(list = setdiff(ls(), ls() %>% str_subset("xxx")))
Maybe this can help as well
remove(list = c(ls()[!ls() %in% c("what", "to", "keep", "here")] ) )

Resources