Is there a way of reading shapefiles directly into R from an online source? - r

I am trying to find a way of loading shapefiles (.shp) from an online repository/folder/url directly into my global environment in R, for the purpose of making plots in ggplot2 using geom_sf. In the first instance I'm using my Google Drive to store these files but I'd ideally like to find a solution that works with any folder with a valid url and appropriate access rights.
So far I have tried a few options, the first two involving zipping the source folder on Google Drive where the shapefiles are stored and then downloading and unzipping it in some way. I have included reproducible examples using a small test shapefile:
Using utils::download.file() to retrieve the compressed folder and unzipping using either base::system('unzip..') or zip::unzip() (loosely following this thread: Downloading County Shapefile from ONS):
# Create destination data folder (if there isn't one)
if(!dir.exists('data')) dir.create('data')
# Download the zipped file/folder
download.file("https://drive.google.com/file/d/1BYTCT_VL8EummlAsH1xWCd5rC4bZHDMh/view?usp=sharing", destfile = "data/test_shp.zip")
# Unzip folder using unzip (fails)
unzip(zipfile = "data/test_shp.zip", exdir = "data/test_shp", junkpaths = TRUE)
# Unzip folder using system (also fails)
system("unzip data/test_shp.zip")
If you can't run the above code, FYI the two error messages are:
Warning message:
In unzip(zipfile = "data/test_shp.zip", exdir = "data/test_shp", :
error 1 in extracting from zip file
AND
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of data/test_shp.zip or
data/test_shp.zip.zip, and cannot find data/test_shp.zip.ZIP, period.
Worth noting here that I can't even manually unzip this folder outside R so I think there's something going wrong with the download.file() step.
Using the googledrive package:
# Load the packages used below
library(googledrive)
library(sf)
# Create destination data folder (if there isn't one)
if(!dir.exists('data')) dir.create('data')
# Specify googledrive url:
test_shp = drive_get(as_id("https://drive.google.com/file/d/1BYTCT_VL8EummlAsH1xWCd5rC4bZHDMh/view?usp=sharing"))
# Download zipped folder
drive_download(test_shp, path = "data/test_shp.zip")
# Unzip folder
zip::unzip(zipfile = "data/test_shp.zip", exdir = "data/test_shp", junkpaths = TRUE)
# Load test.shp
test_shp <- read_sf("data/test_shp/test.shp")
And that works!
...Except it's still a hacky workaround, which requires me to zip, download, unzip and then use a separate function (such as sf::read_sf or st_read) to read the data into my global environment. And, as it uses the googledrive package, it's only going to work for files stored in that system (not OneDrive, Dropbox and other URLs).
I've also tried sf::read_sf, st_read and fastshp::read.shp directly on the folder URL, but those approaches all fail, as one might expect.
So, my question: is there a workflow for reading shapefiles stored online directly into R, or should I stop looking? If there is not, but there is a way of expanding my above solution (2) beyond googledrive, I'd appreciate any tips on that too!
Note: I should also add that I have deliberately ignored any option requiring the package rgdal due to its imminent permanent retirement, and so am looking for options that are at least somewhat future-proof (I understand all packages drop off the map at some point). Thanks in advance!
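For clarity, what I'm after is something provider-agnostic along the lines of this rough sketch, assuming a direct-download URL to the zipped shapefile (for Google Drive that would be the uc?export=download&id=<file id> form rather than the view?usp=sharing link above, which I suspect returns an HTML preview page and is why download.file() fails):
library(sf)

# Hypothetical helper: download a zipped shapefile from any direct-download URL,
# unzip it into a temporary directory and read the .shp with sf
read_shp_from_url <- function(zip_url) {
  tmp_zip <- tempfile(fileext = ".zip")
  tmp_dir <- tempfile()
  dir.create(tmp_dir)
  download.file(zip_url, destfile = tmp_zip, mode = "wb") # "wb" matters on Windows
  unzip(tmp_zip, exdir = tmp_dir, junkpaths = TRUE)
  shp_file <- list.files(tmp_dir, pattern = "\\.shp$", full.names = TRUE)[1]
  read_sf(shp_file)
}
# test_shp <- read_shp_from_url("https://example.com/path/to/test_shp.zip")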

I ran into a similar problem recently, having to read in shapefiles directly from Dropbox into R.
As a result, this solution only applies to the case of Dropbox.
The first thing you will need to do is create a refreshable token for Dropbox using rdrop2, given recent changes from Dropbox that limit single token use to 4 hours. You can follow this SO post.
Once you have set up your refreshable token, identify all the files in your spatial data folder on Dropbox using:
# rdrop2 for the Dropbox calls; dplyr and stringr for the filtering step
library(rdrop2)
library(dplyr)
library(stringr)
shp_files_on_db <- drop_dir("Dropbox path/to your/spatial data/", dtoken = refreshable_token) %>%
  filter(str_detect(name, "adm2"))
My 'spatial data' folder contained two sets of shapefiles – adm1 and adm2. I used the above code to choose only those associated with adm2.
Then create a vector of the names of the shp, csv, shx, dbf, cpg files in the 'spatial data' folder, as follows:
shp_filenames <- shp_files_on_db$name
I chose to read the shapefiles into a temporary directory, avoiding the need to store the files on my disk – also useful in a Shiny implementation. I create this temporary directory as follows:
# create a new directory under tempdir
dir.create(dir1 <- file.path(tempdir(), "testdir"))
#If needed later on, you can delete this temporary directory
unlink(dir1, recursive = T)
#And test that it no longer exists
dir.exists(dir1)
Now download the Dropbox files to this temporary directory:
for (i in seq_along(shp_filenames)) {
  drop_download(paste0("Dropbox path/to your/spatial data/", shp_filenames[i]),
                dtoken = refreshable_token,
                local_path = dir1)
}
And finally, read in your shapefile as follows:
# Path to the shapefile in the temporary directory
path1_shp <- paste0(dir1, "/myfile_adm2.shp")
# Read in the shapefile using the sf package - a recommended replacement for rgdal
library(sf)
shp1a <- st_read(path1_shp)
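Wrapped up, the whole flow might look something like the sketch below (the folder path, file pattern and token are placeholders for your own values, and the Dropbox folder path is assumed to end with a trailing slash):
library(rdrop2)
library(sf)

# Hypothetical helper consolidating the steps above: list the matching files in a
# Dropbox folder, download them to a temporary directory, then read the .shp with sf
read_shp_from_dropbox <- function(db_folder, pattern, dtoken) {
  files <- drop_dir(db_folder, dtoken = dtoken)
  files <- files$name[grepl(pattern, files$name)]
  tmp_dir <- file.path(tempdir(), "shp_download")
  dir.create(tmp_dir, showWarnings = FALSE)
  for (f in files) {
    drop_download(paste0(db_folder, f), dtoken = dtoken,
                  local_path = tmp_dir, overwrite = TRUE)
  }
  shp_file <- list.files(tmp_dir, pattern = "\\.shp$", full.names = TRUE)[1]
  st_read(shp_file)
}
# shp1a <- read_shp_from_dropbox("Dropbox path/to your/spatial data/", "adm2", refreshable_token)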

Related

R waldo does not find difference that snapshot test finds in .RData file

I'm looking for a bit of support:
I have done some expect_snapshot_file() tests from the testthat package, and the files end up not matching when run in different RStudio sessions (I'm guessing that this is related to this).
I opened both files (the original and the .new file) and attempted to compare them using waldo::compare(old, new), but waldo does not find any differences in the files. Both the snapshot tests and my git system note that these files are different, but again I don't know why/where in the file. Note: it is possible to recreate the difference by setting all this up in a git folder: commit the outcomes.RData file, then rename the outcomes.new.RData file to outcomes.RData, and git will display a difference. But again, because it's a binary file, it won't tell me where...
I cannot use the snapshot_review() function because it does not seem to work with .RData files - it immediately gives me a complicated error.
So I would like to investigate where the differences are. I presume that it might be something related to the R session that I'm working in.
I have provided both files here:
https://transfer.sh/7UwTLD/outcomes.new.RData
https://transfer.sh/xgqhL7/outcomes.RData
and then compare them:
download.file("https://transfer.sh/xgqhL7/outcomes.RData", destfile = "outcomes.RData")
download.file("https://transfer.sh/7UwTLD/outcomes.new.RData", destfile = "outcomes.new.RData")
load("outcomes.RData")
old <- outcomes
load("outcomes.new.RData")
new <- outcomes
waldo::compare(old, new)
#> v No differences
Created on 2022-02-13 by the reprex package (v0.3.0)
The original saving function always stays the same:
save_file <- function(outcomes){
  path <- tempfile(fileext = ".RData")
  save(outcomes, file = path)
  path
}
expect_snapshot_file(save_file(outcomes), name = "outcomes.RData")
Is there anything else that the snapshot comparison uses "under the hood"? What other aspects could I consider? Are there any other tools I could use to compare files under Windows in a useful manner?
Many thanks!
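For what it's worth, one way to at least locate where the two files diverge is a raw byte comparison; a minimal sketch, assuming both files sit in the working directory (keep in mind that save() writes a compressed serialization whose header records the writing R version, so the bytes can differ even when the loaded objects compare identical):
# Read both snapshot files as raw bytes and find the first position where they differ
old_raw <- readBin("outcomes.RData", what = "raw", n = file.size("outcomes.RData"))
new_raw <- readBin("outcomes.new.RData", what = "raw", n = file.size("outcomes.new.RData"))

n <- min(length(old_raw), length(new_raw))
first_diff <- which(old_raw[seq_len(n)] != new_raw[seq_len(n)])[1]

length(old_raw); length(new_raw) # do the files even have the same size?
first_diff # index of the first differing byte (NA if the shorter file is a prefix of the longer)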

Unzip failing due to long name in zipped folder

I want to be able to read and edit spatial SQLite tables that are downloaded from a server. These come compressed.
These zip files contain a folder whose name encodes information about the model that has been run, and as such these folder names can sometimes be quite long.
When this folder name gets too long, unzipping the folder fails. I ultimately don't need to unzip the file, but I seem to get the same error when I use unz within readOGR.
I can't think of how to create a reproducible example, but I can give an example of a path that works and one that doesn't.
Works:
"S:\3_Projects\CRC00001\4699-12103\scenario_initialised model\performance_assessment.sqlite"
4699-12103 is the zip file name
and "scenario_initialised model" is the offending subfolder
Fails:
""S:\3_Projects\CRC00001\4699-12129\scenario_tree_canopy_7, number_of_trees_0, roads_False, compliance_75, year_2030, nrz_cover_0.6, green_roofs_0\performance_assessment.sqlite""
4699-12103 is the zip file name
and "scenario_tree_canopy_7, number_of_trees_0, roads_False, compliance_75, year_2030, nrz_cover_0.6, green_roofs_0" is the offending subfolder
The code works along these lines:
list_zips <- list.files(pattern = "\\.zip$", recursive = TRUE, include.dirs = TRUE)
# Unzip each archive into a folder named after the zip file (the path minus ".zip")
unzip(zipfile = file.path(getwd(), list_zips[i]),
      exdir = sub("\\.zip$", "", file.path(getwd(), list_zips[i])))
But I would prefer to directly be able to load the spatial file in without unzipping. Such as:
sq_path <- unzip(list_zips[i], list = TRUE)[2, 1]
temp <- unz(file.path(getwd(), list_zips[i]), sq_path)
vectorImport <- readOGR(dsn = temp, layer = "micro_climate_grid")
Any help would be appreciated! Tim
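Not an authoritative answer, but since GDAL can read datasets straight out of a zip archive via its /vsizip/ virtual file system, a sketch using sf (which also avoids the retiring rgdal/readOGR) might look like this, assuming GDAL's SQLite driver can open the layer:
library(sf)

list_zips <- list.files(pattern = "\\.zip$", recursive = TRUE)

# Path of the .sqlite file inside the archive, as reported by unzip(list = TRUE)
sq_path <- unzip(list_zips[1], list = TRUE)$Name[2]

# /vsizip/ reads directly from the archive, so the long folder name never has to exist on disk
vsi_path <- paste0("/vsizip/", file.path(getwd(), list_zips[1]), "/", sq_path)
vectorImport <- st_read(vsi_path, layer = "micro_climate_grid")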

Use R package "googledrive" to load in R a file from my googledrive

I have a file in my Google Drive that is an .xlsx. It is too big, so it is not automatically converted to a Google Sheet (that's why using the googlesheets package did not work). The file is so big that I can't even preview it by clicking on it in my Google Drive; the only way to see it is to download it as an .xlsx. While I could load it as an .xlsx file that way, I am trying instead to use the googledrive package.
So far what I have is:
library(googledrive)
drive_find(n_max = 50)
drive_download("filename_without_extension.xlsx",type = "xlsx")
but I got the following error:
'file' does not identify at least one Drive file.
Maybe it is me not specifying the path where the file lives in the Drive. For example: Work\Data\Project1\filename.xlsx
Could you give me an idea on how to load into R the file called filename.xlsx that is nested in the drive like that?
I read the documentation but couldn't figure out how to do it. Thanks in advance.
You should be able to do this by:
library(googledrive)
drive_download("~/Work/Data/Project1/filename.xlsx")
The type parameter is only for Google native spreadsheets, and does not apply to raw files.
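For example (the Google Sheet name below is a hypothetical placeholder, just to contrast the two cases):
library(googledrive)

# A raw file such as .xlsx is downloaded as-is; no type argument is needed
drive_download("~/Work/Data/Project1/filename.xlsx", overwrite = TRUE)

# type only comes into play when exporting a native Google file,
# e.g. downloading a Google Sheet as .xlsx
drive_download("my_google_sheet", type = "xlsx")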
I want to share my way.
I do it this way because I keep updating the .xlsx file. It is a query result that comes from an ERP.
So when I tried to do it by Google Drive Id, it gave me errors, because each time the ERP updates the file its Id changes.
This is my context; yours can be absolutely different. This file changes just two or three times a month. Even though it is a "big" .xlsx file (78-80K records with 19 factors), I use it for just seconds to calculate some values and then I can trash it. It makes no sense to store it (storing is more expensive than uploading).
library(googledrive)
library(googlesheets4) # watch out: it is not the CRAN version yet 0.1.1.9000
library(glue)   # used below to build the download URL
library(readxl) # used below to read the downloaded .xlsx
drive_folder_owner <- "carlos.sxxx#xxxxxx.com" # this is my account in this gDrive folder.
drive_auth(email = drive_folder_owner) # previously authorized account
googlesheets4::sheets_auth(email = drive_folder_owner) # Yes, I know, should be the same, but they are different.
d1 <- drive_find(pattern = "my_file.xlsx", type = drive_mime_type("xlsx")) # This is me finding the file created by the ERP; I shorten the search using the type
meta <- drive_get(id = d1$id)[["drive_resource"]] # Get the metadata for the file in googledrive
n_id <- glue("https://drive.google.com/open?id=", d1$id[[1]]) # here I am creating a path for reading
meta_name <- paste(getwd(), "/Files/", meta[[1]]$originalFilename, sep = "") # and a path to temporarily save it.
drive_download(file = as_id(n_id), overwrite = TRUE, path = meta_name) # Now read and save locally.
V_CMV <- data.frame(read_xlsx(meta_name)) # store it in a data frame
file.remove(meta_name) # delete it from the R server
rm(d1, n_id) # delete temporary variables

Creating zip file from folders in R

I'm trying to create a zip file from one folder using R.
The "Rcompression" package is mentioned here:
Creating zip file from folders
But I didn't find where I can download this package for a Windows system.
Any suggestions? Or other functions to create a zip file?
You can create a zip file quite easily with the function zip from the utils package. Say you have a directory testDir and you wish to zip a file (or multiple files) inside that directory:
dir('testDir')
# [1] "cats.csv" "test.csv" "txt.txt"
zip(zipfile = 'testZip', files = 'testDir/test.csv')
# adding: testDir/test.csv (deflated 68%)
The zipped file is saved in the current working directory, unless a different path is specified in the zipfile argument. We can see its size relative to the original unzipped file with
file.info(c('testZip.zip', 'testDir/test.csv'))['size']
# size
# testZip.zip 805
# testDir/test.csv 1493
You can zip the whole directory of files (if no sub-folders) with
files2zip <- dir('testDir', full.names = TRUE)
zip(zipfile = 'testZip', files = files2zip)
# updating: testDir/test.csv (deflated 68%)
# updating: testDir/cats.csv (deflated 27%)
# updating: testDir/txt.txt (stored 0%)
And unzip it to view the files,
unzip('testZip.zip', list = TRUE)
# Name Length Date
# 1 testDir/test.csv 1493 2014-05-14 20:54:00
# 2 testDir/cats.csv 116 2014-05-14 20:54:00
# 3 testDir/txt.txt 32 2014-05-08 09:37:00
Note: From ?zip, regarding the zip argument.
On Windows, the default relies on a zip program (for example that from Rtools) being in the path.
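A quick way to check whether such a program is available before calling zip() (the Rtools path in the comment is just an example of what a hit might look like):
# Returns the full path of the zip executable if one is on the PATH, "" otherwise
Sys.which("zip")
# ""                       -> no zip program found; utils::zip() will fail (exit code 127, see below)
# "C:/Rtools/bin/zip.exe"  -> utils::zip() can shell out to it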
To avoid (a) an issue with relative paths (i.e., the zip file itself containing the full folder path of the files being zipped) and (b) for loops (well, style), you may use:
my_wd <- getwd() # save your current working directory path
dest_path <- "C:/.../folder_with_files_to_be_zipped"
setwd(dest_path)
files <- list.files(dest_path)
named <- paste0(files, ".zip")
mapply(zip, zipfile = named, files = files)
setwd(my_wd) # reset working directory path
Unlike R's built-in unzip function, zip requires a zip program such as 7-Zip (Windows) or the one that is part of Rtools to be present in your system path.
For people still looking for this: there is now a "zip" package that does not depend on external executables.
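A brief sketch of how that might look, reusing the testDir example from above (zipr() stores paths relative to the files themselves, so the archive doesn't embed your full working-directory structure):
# The CRAN "zip" package bundles its own compression library, so no external zip program is needed
install.packages("zip")
zip::zipr(zipfile = "testZip.zip", files = "testDir")
zip::zip_list("testZip.zip") # inspect the archive contents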
You can install Rcompression from the omegahat repos:
install.packages('Rcompression', repos = "http://www.omegahat.org/R", type = "source")
For Windows, you will need to jump through hoops installing zlib and bzip2 and linking them appropriately.
utils::zip can be used in some cases, but there are a number of issues with it. One is that on Windows the maximum length of the string you can use at the command prompt is 8191 characters (2047 characters on some versions). If you are zipping a directory with a lot of characters in the names of directories/files, this will cause issues - for example if you zip your Firefox profile directory. Also, I found the zip command needed to be issued relative to the directory I was zipping in order to use relative directory names. Rcompression has an altNames argument which handles this.
That being said, I have always had problems getting Rcompression to run on Windows.
It's worth noting that zip() will fail silently if it cannot find a zip program.
zip returns an error code (or exit code) invisibly. That is, it will not print unless you explicitly ask it to.
You can run print(zip(output, input)) to print the exit code, which, in the case of no zip program being found, will be 127.
Alternatively you can do something along the lines of
# Exit code 0 means success; all other codes indicate failure.
# Note the extra parentheses: without them, exit_code would capture the result of
# the comparison (TRUE/FALSE) rather than the exit code itself.
if ((exit_code <- zip(output, input)) != 0) {
  stop("Zipping ", input, " failed with exit code: ", exit_code)
}
# Convert every folder into its own .zip file
d <- "C:/Users/Eric/Documents/R/win-library/3.3"
array <- list.files(d)
for (i in 1:length(array)) {
  name <- paste0(array[i], ".zip")
  zip(name, files = paste0(d, "/", array[i]))
}

Retain valid workspace reference after project transfer.

I've been working on an R project (projectA) that I want to hand over to a colleague; what would be the best way to handle workspace references in the scripts? To illustrate, let's say projectA consists of several R scripts that each read input from and write output to certain directories (dirs). All dirs are contained within my local Dropbox. The I/O part of the scripts looks as follows:
# Script 1.
# Give input and output names and dirs:
dat1Dir <- "D:/Dropbox/ProjectA/source1/"
dat1In <- "foo1.asc"
dat2Dir <- "D:/Dropbox/ProjectA/source2/"
dat2In <- "foo2.asc"
outDir <- "D:/Dropbox/ProjectA/output1/"
outName <- "fooOut1.asc"
# Read data
setwd(dat1Dir)
dat1 <- read.table(dat1In)
setwd(dat2Dir)
dat2 <- read.table(dat2In)
# do stuff with dat1 and dat2 that result in new data foo
# Write new data foo to file
setwd(outDir)
write.table(foo, outName)
# Script 2.
# Give input and output names and dirs
dat1Dir <- "D:/Dropbox/ProjectA/output1/"
dat1In <- "fooOut1.asc"
outDir <- "D:/Dropbox/ProjectA/output2/"
outName <- "fooOut2.asc"
Etc. Each script reads and writes data from/to file, and subsequent scripts read the output of previous scripts. The question is: how can I ensure that the directory strings remain valid after transfer to another user?
Let's say we copy the ProjectA folder, including subfolders, to another PC, where it is stored at, e.g., C:/Users/foo/my documents/. Ideally, I would have a function FindDir() that finds the location of the lowest common folder in the project, here "ProjectA", so that I can replace every directory string with:
dat1Dir <- paste(FindDir(), "ProjectA/source1", sep= "")
So that:
# At my own PC
dat1Dir <- paste(FindDir(), "ProjectA/source1", sep= "")
> "D:/Dropbox/ProjectA/source1/"
# At my colleagues PC
dat1Dir <- paste(FindDir(), "ProjectA/source1", sep= "")
> "C:Users/foo/my documents/ProjectA/source1/"
Or perhaps there is a different way? Our work IT infrastructure currently does not allow using a shared disk. I'll put helper functions in an 'official' R project (i.e., hosted on R-Forge), but I'd like to use scripts when many I/O parameters are required and because the code can easily be viewed and commented.
Many thanks in advance!
You should be able to do this by using relative directory paths. This is what I do for my R projects that I have in Dropbox and that I edit/run on both my Windows and OS X machines where the Dropbox folder is D:/Dropbox and /Users/robin/Dropbox respectively.
To do this, you'll need to
Set the current working directory in R (either in the first line of your script, or interactively at the console before running), using setwd('/Users/robin/Dropbox') (see the full docs for that command).
Change your paths to relative paths, which means they just contain the part of the path below the current directory: in this case 'ProjectA/source1' if you've set your current directory to your Dropbox folder, or just 'source1' if you've set it to the ProjectA folder (which is a better idea).
Then everything should just work!
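For example, a minimal sketch of Script 1 rewritten with relative paths, assuming the working directory has been set to the ProjectA folder on whichever machine the project lives:
# setwd() once per session, to wherever ProjectA sits on the current machine, e.g.
# setwd("D:/Dropbox/ProjectA")                  # your PC
# setwd("C:/Users/foo/my documents/ProjectA")   # your colleague's PC

dat1 <- read.table("source1/foo1.asc")
dat2 <- read.table("source2/foo2.asc")
# ... do stuff with dat1 and dat2 that results in the new data foo ...
write.table(foo, "output1/fooOut1.asc")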
You may also be interested in an R library that I love called ProjectTemplate - it gives you really nice functionality for making self-contained projects for this sort of work in R, and they're entirely reproducible, moveable between computers and so on. I've written an introductory blog post which may be useful.
