Download JSON files from Microsoft Azure Storage Explorer to R

I'm trying to download data from my blob container.
First of all, I've used the blob_container() function to generate a blob container object as follows:
cont <- blob_container('https://AccountName.blob.core.windows.net/BlobContainer', key = 'AccountKey')
Next, I've created a data frame to properly identify the path for each file:
list_files_blob <- list_blobs(cont, dir = "path where files are located")
Once I've collected all that information, I've used the multidownload_blob() function to copy those files to a local path:
multidownload_blob(cont, src = list_files_blob$name, dest = 'path to copy files', overwrite = TRUE)
But I get this error:
Error: 'dest' must contain one name per file in 'src'
I know there are a lot of files to transfer, but I don't want to create a directory for each file, just a single folder for all of them.
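For reference, the error suggests one way to keep everything in a single local folder: pass one destination file name per source blob, all inside the same directory. A minimal sketch, reusing the placeholder folder above and assuming this version of multidownload_blob() accepts a vector of per-file destinations (which is what the error message implies):
dest_files <- file.path('path to copy files', basename(list_files_blob$name))
multidownload_blob(cont, src = list_files_blob$name, dest = dest_files, overwrite = TRUE)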
All functions are from the AzureStor package.
My R version is 4.1.2 (2021-11-01)
"AzureStor": {
"Package": "AzureStor",
"Version": "3.7.0",
"Source": "Repository",
"Repository": "CRAN"
}
Thank you in advance.
Borja

Finally, I've found an answer.
First of all, I removed the dest parameter from the multidownload_blob() call, so all the JSON files ended up in the same folder, the working directory.
Once all the files were available, I used the following expression to create a new data frame (df) containing all the JSON files:
library(fs)
library(jsonlite)   # for fromJSON()
library(purrr)      # for map_df()
library(magrittr)   # for the %>% pipe
df <- fs::dir_ls(path = getwd(), regexp = "json") %>%
  map_df(fromJSON, .id = "ID", flatten = TRUE)
I hope this works for all of us.
Borja

Related

Is there a way of reading shapefiles directly into R from an online source?

I am trying to find a way of loading shapefiles (.shp) from an online repository/folder/url directly into my global environment in R, for the purpose of making plots in ggplot2 using geom_sf. In the first instance I'm using my Google Drive to store these files but I'd ideally like to find a solution that works with any folder with a valid url and appropriate access rights.
So far I have tried a few options, the first 2 involving zipping the source folder on Google Drive where the shapefiles are stored and then downloading and unzipping in some way. I have included reproducible examples using a small test shapefile:
Using utils::download.file() to retrieve the compressed folder and unzipping using either base::system('unzip..') or zip::unzip() (loosely following this thread: Downloading County Shapefile from ONS):
# Create destination data folder (if there isn't one)
if(!dir.exists('data')) dir.create('data')
# Download the zipped file/folder
download.file("https://drive.google.com/file/d/1BYTCT_VL8EummlAsH1xWCd5rC4bZHDMh/view?usp=sharing", destfile = "data/test_shp.zip")
# Unzip folder using unzip (fails)
unzip(zipfile = "data/test_shp.zip", exdir = "data/test_shp", junkpaths = TRUE)
# Unzip folder using system (also fails)
system("unzip data/test_shp.zip")
If you can't run the above code then FYI the 2 error messages are:
Warning message:
In unzip(zipfile = "data/test_shp.zip", exdir = "data/test_shp", :
error 1 in extracting from zip file
AND
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of data/test_shp.zip or
data/test_shp.zip.zip, and cannot find data/test_shp.zip.ZIP, period.
Worth noting here that I can't even manually unzip this folder outside R so I think there's something going wrong with the download.file() step.
Using the googledrive package:
library(googledrive)
library(sf)
# Create destination data folder (if there isn't one)
if(!dir.exists('data')) dir.create('data')
# Specify googledrive url:
test_shp = drive_get(as_id("https://drive.google.com/file/d/1BYTCT_VL8EummlAsH1xWCd5rC4bZHDMh/view?usp=sharing"))
# Download zipped folder
drive_download(test_shp, path = "data/test_shp.zip")
# Unzip folder
zip::unzip(zipfile = "data/test_shp.zip", exdir = "data/test_shp", junkpaths = TRUE)
# Load test.shp
test_shp <- read_sf("data/test_shp/test.shp")
And that works!
...Except it's still a hacky workaround, which requires me to zip, download, unzip and then use a separate function (such as sf::read_sf or st_read) to read the data into my global environment. And, as it uses the googledrive package, it's only going to work for files stored in that system (not OneDrive, Dropbox and other URLs).
I've also tried sf::read_sf, st_read and fastshp::read.shp directly on the folder URL, but those approaches all fail, as one might expect.
So, my question: is there a workflow for reading shapefiles stored online directly into R or should I stop looking? If there is not, but there is a way of expanding my above solution (2) beyond googledrive, I'd appreciate any tips on that too!
Note: I should also add that I have deliberately ignored any option requiring the package rgdal due to its imminent permanent retirement, and so am looking for options that are at least somewhat future-proof (I understand all packages drop off the map at some point). Thanks in advance!
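For what it's worth, the download-unzip-read pattern generalises beyond Google Drive whenever the URL serves the raw zip archive rather than a preview page (the 'view?usp=sharing' link above returns the latter, which is presumably why the downloaded file can't be unzipped). A minimal sketch, with https://example.com/test_shp.zip standing in as a placeholder for any direct-download URL:
library(sf)
# download the archive to a temporary file (mode = "wb" matters on Windows)
tmp_zip <- tempfile(fileext = ".zip")
download.file("https://example.com/test_shp.zip", destfile = tmp_zip, mode = "wb")
# unzip into a temporary directory and read the .shp with sf
tmp_dir <- tempfile()
unzip(zipfile = tmp_zip, exdir = tmp_dir)
test_shp <- read_sf(file.path(tmp_dir, "test.shp"))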
I ran into a similar problem recently, having to read in shapefiles directly from Dropbox into R.
As a result, this solution only applies for the case of Dropbox.
The first thing you will need to do is create a refreshable token for Dropbox using rdrop2, given recent changes from Dropbox that limit single token use to 4 hours. You can follow this SO post.
Once you have set up your refreshable token, identify all the files in your spatial data folder on Dropbox using:
library(rdrop2)
library(tidyverse)  # for filter(), str_detect() and the %>% pipe
shp_files_on_db <- drop_dir("Dropbox path/to your/spatial data/", dtoken = refreshable_token) %>%
  filter(str_detect(name, "adm2"))
My 'spatial data' folder contained two sets of shapefiles – adm1 and adm2. I used the above code to choose only those associated with adm2.
Then create a vector of the names of the shp, csv, shx, dbf, cpg files in the 'spatial data' folder, as follows:
shp_filenames <- shp_files_on_db$name
I chose to read the shapefiles into a temporary directory, avoiding the need to store the files on my disk – also useful in a Shiny implementation. I create this temporary directory as follows:
# create a new directory under tempdir
dir.create(dir1 <- file.path(tempdir(), "testdir"))
# If needed later on, you can delete this temporary directory
unlink(dir1, recursive = TRUE)
# And test that it no longer exists
dir.exists(dir1)
Now download the Dropbox files to this temporary directory:
for (i in seq_along(shp_filenames)) {
  drop_download(paste0("Dropbox path/to your/spatial data/", shp_filenames[i]),
                dtoken = refreshable_token,
                local_path = dir1)
}
And finally, read in your shapefile as follows:
# path to the shapefile in the temporary directory
path1_shp <- paste0(dir1, "/myfile_adm2.shp")
# reading in the shapefile using the sf package - a recommended replacement for rgdal
library(sf)
shp1a <- st_read(path1_shp)

Attempting to access images using R

So I am following the guide here which indicates the way to access photos is as follows:
flags <- c(
system.file("img", "flag", "au.png", package = "ggpattern"),
system.file("img", "flag", "dk.png", package = "ggpattern")
)
My goal is now to use this code for my own purposes, so I saved a few images in a folder. Here is my directory:
"C:/Users/Thom/Docs/Misc/Testy"
And within the Testy folder, there is a folder called image, holding 3 images. But the following doesn't seem to work and I don't know why...
images <- c(
system.file("image", "image1.png", package = "ggpattern"),
system.file("image", "image2.png", package = "ggpattern")
)
system.file() is for use when a file is included in a package. Basically, it looks for the file starting from where your R packages are installed (because this can vary from user to user) and returns the resolved local path to the file.
If you already know the absolute path on your local computer (i.e. "C:/Users/Thom/Docs/Misc/Testy") you can pass that path, plus the file name, straight to a read function, e.g. readBin("C:/Users/Thom/Docs/Misc/Testy/image/image1.png", what = "raw", n = 1e6)
If you want to get a little fancy, or are like me and can't ever remember which direction of slash to use on which OS, you can also do something like this, which adds in the OS-specific path separator:
readBin(file.path("C:", "Users", "Thom", "Docs", "Misc", "Testy", "image", "image1.png"), what = "raw", n = 1e6)
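Given the stated goal of feeding local images to ggpattern, a minimal sketch that skips system.file() entirely and just builds the paths directly (the Testy folder and the image1.png/image2.png names are taken from the question; adjust to the real file names):
images <- c(
  file.path("C:", "Users", "Thom", "Docs", "Misc", "Testy", "image", "image1.png"),
  file.path("C:", "Users", "Thom", "Docs", "Misc", "Testy", "image", "image2.png")
)
# or pick up every PNG in that folder at once
images <- list.files("C:/Users/Thom/Docs/Misc/Testy/image", pattern = "\\.png$", full.names = TRUE)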

Filepath error when specifying output location for write_dta

I would like to programmatically specify the location of my exports when using write.dta. I have my working directory set to a parent folder and my script is in a child folder called "Script". I want the export to be in a child folder called "Data".
setwd("~/Dropbox/Files")
file_output <- "survey"
path_out <- "./Data"
write_dta(df, paste0(file_output,".dta"), path = path_out, version = 12)
However, I keep getting an error message when R is trying to write. It says it's trying to write to the "Script" folder (where my script file is located) rather than the desired "Data" folder.
Error: Failed to open '/Users/VancityPlanner/Dropbox/Files/Scripts' for writing
If I put the full path, I still get the same error message, whether it's a child folder or the parent folder (working directory) itself, so I don't think write permissions are an issue.
If I try not specifying the filepath, I have no error messages but it saves it to my working directory, which is not where I want it.
write_dta(df, paste0(file_output,".dta"), version = 12)
Below I show where my working directory is pointing, and then I specify where I want to save the file via the path argument. Note that it takes the file path (G:/) and the dataset name appended together. I have a PC, but there's no reason why this shouldn't work on a Mac.
library(haven)
getwd()
#"C:/Users/myname/Documents"
write_dta(data = mtcars, path = "G:/mtcars.dta", version = 12)
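To get the behaviour the question asks for, an export landing in a named child folder, the same idea applies: write_dta() has no separate directory argument, so build the full destination path yourself. A minimal sketch reusing the asker's own names (df, "survey" and "./Data" come from the question; the Data folder must already exist):
library(haven)
setwd("~/Dropbox/Files")
file_output <- "survey"
path_out <- "./Data"
write_dta(df, path = file.path(path_out, paste0(file_output, ".dta")), version = 12)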

Use R package "googledrive" to load in R a file from my googledrive

I have a file in my Google Drive that is an .xlsx. It is too big, so it is not automatically converted to a Google Sheet (that's why using the googlesheets package did not work). The file is so big I can't even preview it by clicking on it in my Google Drive; the only way to see it is to download it as an .xlsx. While I could download and load it as an xlsx file manually, I am trying instead to use the googledrive package.
So far what I have is:
library(googledrive)
drive_find(n_max = 50)
drive_download("filename_without_extension.xlsx",type = "xlsx")
but I got the following error:
'file' does not identify at least one Drive file.
Maybe it is me not specifying the path where the file lives in the Drive. For example: Work\Data\Project1\filename.xlsx
Could you give me an idea on how to load in R the file called filename.xlsx that is nested in the drive like that?
I read the documentation but couldn't figure out how to do that. Thanks in advance.
You should be able to do this by:
library(googledrive)
drive_download("~/Work/Data/Project1/filename.xlsx")
The type parameter is only for Google native spreadsheets, and does not apply to raw files.
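If the next step is to get the spreadsheet into a data frame, one hedged follow-up (the Drive path is the asker's; the local file name is a placeholder) is to download to a temporary file and read it with readxl:
library(googledrive)
library(readxl)
local_xlsx <- file.path(tempdir(), "filename.xlsx")
drive_download("~/Work/Data/Project1/filename.xlsx", path = local_xlsx, overwrite = TRUE)
dat <- read_xlsx(local_xlsx)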
I want to share my way.
I do it this way because I keep updating the xlsx file; it is a query result that comes from an ERP.
So, when I tried to do it by Google Drive ID, it gave me errors, because each time the ERP updates the file its ID changes.
This is my context; yours can be absolutely different. This file changes just two or three times a month. Even though it is a "big" xlsx file (78-80K records with 19 factors), I use it for just seconds to calculate some values and then I can trash it. It does not make sense to store it (storing is more expensive than uploading).
library(googledrive)
library(googlesheets4) # watch out: it is not the CRAN version yet, 0.1.1.9000
library(glue)          # for glue()
library(readxl)        # for read_xlsx()
drive_folder_owner <- "carlos.sxxx#xxxxxx.com" # this is my account in this gDrive folder
drive_auth(email = drive_folder_owner) # previously authorized account
googlesheets4::sheets_auth(email = drive_folder_owner) # yes, I know, it should be the same, but they are different
d1 <- drive_find(pattern = "my_file.xlsx", type = drive_mime_type("xlsx")) # finding the file created by the ERP, shortening the search by type
meta <- drive_get(id = d1$id)[["drive_resource"]] # get the metadata for the file in Google Drive
n_id <- glue("https://drive.google.com/open?id=", d1$id[[1]]) # here I am creating a path for reading
meta_name <- paste0(getwd(), "/Files/", meta[[1]]$originalFilename) # and a path to temporarily save it
drive_download(file = as_id(n_id), overwrite = TRUE, path = meta_name) # now read and save locally
V_CMV <- data.frame(read_xlsx(meta_name)) # store in a data frame
file.remove(meta_name) # delete from the R server
rm(d1, n_id) # delete temporary variables

Using 'R' and "aws.s3" how to push a directory to the cloud

I have a directory with subdirectories and many files that need to be pushed to Amazon S3. I am using the 'R' tool.
Is there a clean/easy way to say "push this directory and everything in it up to S3"? I am hoping to avoid pushing things up one at a time, and manually re-building the directory structures.
If you pass file names to put_object() using their full path names and then use those path names as their object keys, then you can implicitly recreate a directory structure. Basically like this (though you may want to change the filenames when using them as object keys in some way):
library("aws.s3")
lapply(dir(full.names = TRUE, recursive = TRUE), function(filename) {
  put_object(file = filename, object = filename, bucket = "mybucket")
})
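On the note about changing the file names used as object keys: one hedged variation (with "mybucket" again as a placeholder) strips the leading "./" that dir(full.names = TRUE) adds, so the S3 keys mirror the local tree without that prefix:
library("aws.s3")
files <- dir(full.names = TRUE, recursive = TRUE)
invisible(lapply(files, function(filename) {
  key <- sub("^\\./", "", filename)  # drop the leading "./" from the object key
  put_object(file = filename, object = key, bucket = "mybucket")
}))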
There is also an experimental function s3sync() that should do this for a complete file tree (but it isn't widely tested):
s3sync()
