Save multiple HDF files from an FTP list in R, giving each a different name according to the name of the FTP link

I have a text list of HDF files I need to download from an ftp server.
This is the (example) structure of the list:
ftp://username:password#ftppath/File#1_hh_mm_ss.HDF
ftp://username:password#ftppath/File#2_hh_mm_ss.HDF
ftp://username:password#ftppath/File#3_hh_mm_ss.HDF
...
I tried to download a single file with this basic script:
url = "ftp://username:password#ftppath/File#3_hh_mm_ss.HDF"
download.file(url, destfile = "Test1.HDF")
What I would like to do is download multiple files at once (i.e., the ones in the list above) and save them automatically, giving each file the same name it has in the FTP link (i.e., File#1_hh_mm_ss.HDF, File#2_hh_mm_ss.HDF, File#3_hh_mm_ss.HDF).
Is anyone able to help?
Thanks!
EDIT:
I noticed that the list of files that I need to download also includes different FTP URLs (i.e.:
'ftp://username1:password1#ftppath/File#1_hh_mm_ss.HDF',
'ftp://username1:password1#ftppath/File#2_hh_mm_ss.HDF',
'ftp://username2:password2#ftppath/File#3_hh_mm_ss.HDF',
'ftp://username2:password2#ftppath/File#4_hh_mm_ss.HDF',
'ftp://username3:password3#ftppath/File#5_hh_mm_ss.HDF',
'ftp://username3:password3#ftppath/File#6_hh_mm_ss.HDF',
...
This makes everything more complicated.
Would it be possible, instead, to download all of the files for each FTP URL?
For example (simplified):
ftp://username1:password1#ftppath/File#1, File#2, File#3, File#4, ... .HDF #(DOWNLOAD ALL .HDF FILES in the ftp folder)
ftp://username2:password2#ftppath/File#1, File#2, File#3, File#4, ... .HDF #(DOWNLOAD ALL .HDF FILES in the ftp folder )
ftp://username3:password3#ftppath/File#1, File#2, File#3, File#4, ... .HDF #(DOWNLOAD ALL .HDF FILES in the ftp folder )
...
Thanks a lot for your help!

You can use a for loop; try this:
list_files <- list(
  "ftp://username:password#ftppath/File#1_hh_mm_ss.HDF",
  "ftp://username:password2#ftppath/File#2_hh_mm_ss.HDF",
  "ftp://username:password3#ftppath/File#3_hh_mm_ss.HDF",
  "...")
for (i in seq_along(list_files)) {
  # take the file name from the end of the ftp link
  file_name <- basename(list_files[[i]])
  # save it under that same name in the destination folder
  destfile <- paste0("path/to/created_folder/", file_name)
  # mode = "wb" keeps binary HDF files intact on Windows
  download.file(list_files[[i]], destfile = destfile, mode = "wb")
}
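For the edited case (downloading every .HDF file in each FTP folder rather than working from a fixed list of links), one option is to ask each server for a directory listing first and then loop over it. Below is a minimal sketch using RCurl::getURL() with dirlistonly = TRUE; the folder URLs are placeholders written in the standard user:password@host form, and the exact listing format can vary between FTP servers:
library(RCurl)
# one base URL per ftp folder (placeholders)
ftp_folders <- c(
  "ftp://username1:password1@ftppath/",
  "ftp://username2:password2@ftppath/",
  "ftp://username3:password3@ftppath/")
for (folder in ftp_folders) {
  # request a bare file listing of the folder
  listing <- getURL(folder, dirlistonly = TRUE)
  file_names <- strsplit(listing, "\r?\n")[[1]]
  # keep only the .HDF files
  hdf_files <- file_names[grepl("\\.HDF$", file_names, ignore.case = TRUE)]
  for (file_name in hdf_files) {
    download.file(paste0(folder, file_name),
                  destfile = file.path("path/to/created_folder", file_name),
                  mode = "wb")
  }
}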

Related

Is there a way of reading shapefiles directly into R from an online source?

I am trying to find a way of loading shapefiles (.shp) from an online repository/folder/url directly into my global environment in R, for the purpose of making plots in ggplot2 using geom_sf. In the first instance I'm using my Google Drive to store these files but I'd ideally like to find a solution that works with any folder with a valid url and appropriate access rights.
So far I have tried a few options, the first two involving zipping the source folder on Google Drive where the shapefiles are stored and then downloading and unzipping it in some way. I have included reproducible examples using a small test shapefile:
Using utils::download.file() to retrieve the compressed folder and unzipping using either base::system('unzip..') or zip::unzip() (loosely following this thread: Downloading County Shapefile from ONS):
# Create destination data folder (if there isn't one)
if(!dir.exists('data')) dir.create('data')
# Download the zipped file/folder
download.file("https://drive.google.com/file/d/1BYTCT_VL8EummlAsH1xWCd5rC4bZHDMh/view?usp=sharing", destfile = "data/test_shp.zip")
# Unzip folder using unzip (fails)
unzip(zipfile = "data/test_shp.zip", exdir = "data/test_shp", junkpaths = TRUE)
# Unzip folder using system (also fails)
system("unzip data/test_shp.zip")
If you can't run the above code then FYI the 2 error messages are:
Warning message:
In unzip(zipfile = "data/test_shp.zip", exdir = "data/test_shp", :
error 1 in extracting from zip file
AND
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of data/test_shp.zip or
data/test_shp.zip.zip, and cannot find data/test_shp.zip.ZIP, period.
Worth noting here that I can't even manually unzip this folder outside R so I think there's something going wrong with the download.file() step.
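For what it's worth, a likely cause is that a .../file/d/<id>/view?usp=sharing link returns the Google Drive preview page (HTML) rather than the file itself, so the "zip" that gets saved is really a web page. A small sketch of the same attempt using Drive's direct-download endpoint instead (assuming the file is shared publicly; the file id is taken from the sharing link above, and large files would additionally hit Drive's virus-scan confirmation page):
# Create destination data folder (if there isn't one)
if(!dir.exists('data')) dir.create('data')
# uc?export=download serves the file content rather than the preview page
direct_url <- "https://drive.google.com/uc?export=download&id=1BYTCT_VL8EummlAsH1xWCd5rC4bZHDMh"
download.file(direct_url, destfile = "data/test_shp.zip", mode = "wb")
unzip(zipfile = "data/test_shp.zip", exdir = "data/test_shp", junkpaths = TRUE)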
Using the googledrive package:
# Create destination data folder (if there isn't one)
if(!dir.exists('data')) dir.create('data')
# Specify googledrive url:
test_shp = drive_get(as_id("https://drive.google.com/file/d/1BYTCT_VL8EummlAsH1xWCd5rC4bZHDMh/view?usp=sharing"))
# Download zipped folder
drive_download(test_shp, path = "data/test_shp.zip")
# Unzip folder
zip::unzip(zipfile = "data/test_shp.zip", exdir = "data/test_shp", junkpaths = TRUE)
# Load test.shp
test_shp <- read_sf("data/test_shp/test.shp")
And that works!
...Except it's still a hacky workaround, which requires me to zip, download, unzip and then use a separate function (such as sf::read_sf or st_read) to read in the data into my global environment. And, as it's using the googledrive package it's only going to work for files stored in this system (not OneDrive, DropBox and other urls).
I've also tried sf::read_sf, st_read and fastshp::read.shp directly on the folder url but those approaches all fail as one might expect.
So, my question: is there a workflow for reading shapefiles stored online directly into R or should I stop looking? If there is not, but there is a way of expanding my above solution (2) beyond googledrive, I'd appreciate any tips on that too!
Note: I should also add that I have deliberately ignored any option requiring the package rgdal due to its imminent permanent retirement and so am looking for options that are at least somewhat future-proof (I understand all packages drop off the map at some point). Thanks in advance!
I ran into a similar problem recently, having to read in shapefiles directly from Dropbox into R.
As a result, this solution only applies to the case of Dropbox.
The first thing you will need to do is create a refreshable token for Dropbox using rdrop2, given recent changes from Dropbox that limit single token use to 4 hours. You can follow this SO post.
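As a rough sketch of that token-caching pattern (the linked post covers the refreshable-token details, so treat this as the usual rdrop2 idiom rather than a complete recipe):
library(rdrop2)
# one-off, interactive: authenticate and cache the token to disk
token <- drop_auth()
saveRDS(token, file = "dropbox_token.rds")
# in later sessions (or in a Shiny app), reload the cached token and pass it as dtoken = ...
refreshable_token <- readRDS("dropbox_token.rds")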
Once you have set up your refreshable token, identify all the files in your spatial data folder on Dropbox using:
library(tidyverse) # for %>%, filter() and str_detect()
shp_files_on_db <- drop_dir("Dropbox path/to your/spatial data/", dtoken = refreshable_token) %>%
  filter(str_detect(name, "adm2"))
My 'spatial data' folder contained two sets of shapefiles – adm1 and adm2. I used the above code to choose only those associated with adm2.
Then create a vector of the names of the shp, csv, shx, dbf, cpg files in the 'spatial data' folder, as follows:
shp_filenames<- shp_files_on_db$name
I chose to read the shapefiles into a temporary directory, avoiding the need to store the files on my disk – also useful in a Shiny implementation. I create this temporary directory as follows:
# create a new directory under tempdir
dir.create(dir1 <- file.path(tempdir(), "testdir"))
#If needed later on, you can delete this temporary directory
unlink(dir1, recursive = T)
#And test that it no longer exists
dir.exists(dir1)
Now download the Dropbox files to this temporary directory:
for (i in seq_along(shp_filenames)) {
  drop_download(paste0("Dropbox path/to your/spatial data/", shp_filenames[i]),
                dtoken = refreshable_token,
                local_path = dir1)
}
And finally, read in your shapefile as follows:
#path to the shapefile in the temporary directory
path1_shp<- paste0(dir1, "/myfile_adm2.shp")
#reading in the shapefile using the sf package - a recommended replacement for rgdal
shp1a <- st_read(path1_shp)
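For the more general case the question asks about (any URL that serves the zipped shapefile directly, not just Google Drive or Dropbox), the same temporary-directory idea works with plain download.file() plus sf. A minimal sketch, with the URL and layer file name as placeholders:
library(sf)
read_shp_from_url <- function(zip_url, shp_name) {
  tmp_zip <- tempfile(fileext = ".zip")
  tmp_dir <- file.path(tempdir(), "shp_extract")
  dir.create(tmp_dir, showWarnings = FALSE)
  # download the zipped shapefile (mode = "wb" matters on Windows)
  download.file(zip_url, destfile = tmp_zip, mode = "wb")
  unzip(tmp_zip, exdir = tmp_dir, junkpaths = TRUE)
  # read the .shp; its .shx/.dbf/.prj companions sit alongside it
  read_sf(file.path(tmp_dir, shp_name))
}
# example usage (placeholder URL and file name):
# test_shp <- read_shp_from_url("https://example.com/test_shp.zip", "test.shp")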

Read multiple CSV files from a Google Drive folder and append them into a single one in R

I have several csv files in a Google Drive folder that I would like to append into one data frame without downloading those files to my local computer.
Usually, when I read multiple files from my local computer, I use the following code, where list.files() puts all the csv files in a list and then map_df() makes one data frame from all the files in the list.
hourly.files <- list.files(path = "Folder_path_withCSV_files",
pattern = "*.csv",
full.names = T)%>%
map_df(~read_csv(., col_types = cols(.default = "c"))) #makes one dataframe
I want to do the same but in this case the files are many more and are in a shared google drive.
Using google drive:
folder_url <- "https://drive.google.com/folder/directory" #path to the files
folder <- drive_get(as_id(folder_url)) #folder id
csv_files <- drive_ls(folder, type = "csv") # lists all the csv files in the folder
Then, I tried to create a dataframe with the following code:
create.df <- map_df(~read_csv(csv_files$id, col_types = cols(.default = "c")))
but get this error below:
Error in as_mapper(.f, ...) : argument ".f" is missing, with no default
As I said, I do not want to download those files to my local computer because there are too many, and my collaborators will be modifying the csv files in the Google Drive folder constantly, so downloading them every time is something I want to avoid.
Thank you for any help.
I think you have a syntax error. Try -
library(tidyverse)
create.df <- map_df(csv_files$id, ~read_csv(., col_types = cols(.default = "c")))
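If downloading to disk really has to be avoided entirely, one possible variant (assuming googledrive >= 2.0, which provides drive_read_string()) is to pull each file's content straight into memory and hand it to read_csv():
library(googledrive)
library(tidyverse)
# read each Drive file's content as a string, then parse it with readr
create.df <- map_df(csv_files$id, ~ read_csv(drive_read_string(as_id(.x)),
                                             col_types = cols(.default = "c")))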
If you want to read files directly from your Google Drive, you can install Google Drive for desktop first, then open your shared Google Drive folder and copy its local path into your code.
The URL you are using is meant for a browser such as Chrome and will not work here:
folder_url <- "https://drive.google.com/folder/directory" #path to the files
Open the shared Google Drive folder via the Google Drive for desktop tool and use that local path in your code instead. It will work.

How to write streamlit UploadedFile to temporary directory with original filename?

Streamlit has a function that allows convenient upload of multiple files.
files = st.file_uploader('File upload', type=['txt'],accept_multiple_files=True)
Then files contains a list of UploadedFile objects, which are BytesIO-like. However, it is not clear how to get the filenames of the original files and write the files to a temporary directory. It is also not clear whether that approach would conflict with the way Streamlit operates: it basically reruns the underlying script every time an action is performed.
I am using some tools that read files based on their path given as a string. They are expected to be read from the hard drive.
You can access the name of the file with files[i].name and its content with files[i].read().
It looks like this in the end:
import os
import streamlit as st

files = st.file_uploader("File upload", type=["txt"], accept_multiple_files=True)
if len(files) == 0:
    st.error("No files were uploaded")

for i in range(len(files)):
    bytes_data = files[i].read()  # read the content of the file in binary
    print(files[i].name, bytes_data)
    with open(os.path.join("/tmp", files[i].name), "wb") as f:
        f.write(bytes_data)  # write this content elsewhere

Read in data from the same subfolder within different folders in R

I have multiple folders that all share the common directory "~/Desktop/Data/". Each folder in the Data directory is different, like so:
/Desktop
/Data
/File1/Data1/
/File2/Data1/
/File3/Data1/
The File folders are different, but they all contain a data folder with the same name. I have .dta files in each of those data subfolders that I would like to read into R.
EDIT: I should also note the contents in the File folders to be:
../Filex
/Data1 -- What I want to read from
/Data2
/Data3
/Code
with /Filex/Data1 being the main folder of interest. All File folders are structured this way.
I have consulted multiple Stack Overflow threads and so far have only figured out how to list them all if the File folders had all had the same name. However, I am unsure how to read the data into R given that these File folders are named slightly differently.
I have tried this thus far, but I get an empty set in return
files <- dir("~/Desktop/Data/*/Data/", recursive=TRUE, full.names=TRUE, pattern="\\.dta$")
For actual data, downloading files from ICPSR might help in replicating the issue.
EDIT: I am working on MAC OSX 10.15.5
Thank you so much for your assistance!
Try
files <- dir("~/Desktop/Data", pattern = "\\.dta$", full.names = TRUE, recursive = TRUE)
# to make sure /Data is there, as suggested by @Martin Gal:
files[grepl("Data/", files)]
This Regex tester and this Regex cheatsheet were very useful in arriving at the solution.
Tested under Windows:
files <- dir("c:/temp", pattern = "\\.dta$", full.names = TRUE, recursive = TRUE)
files[grepl("Data/", files)]
[1] "c:/temp/File1/Data/test2.dta" "c:/temp/File2/Data/test.dta"

Use R to iteratively download all tiff files from shared Google Drive folder

I would like to use R to:
1) create a list of all tif files in a shared google drive folder
2) loop through list of files
3) save each file to local drive
I've tried RGoogleDocs and RGoogleData; both seem to have stopped development, and neither supports downloading tif files. There is also GoogleSheets, but again, it doesn't suit my needs. Does anyone know of a way to accomplish this task?
-cherrytree
Here's part of my code (I cannot share all of it) that takes a list of URLs and makes a copy of each file on the hard drive:
if (Download == TRUE) {
  urls <- DataFrame$productimagepath
  for (url in urls) {
    # keep the original file name, saved under the Academy/ folder
    newName <- paste0("Academy/", basename(url))
    download.file(url, destfile = newName, mode = "wb")
  }
}
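As the question notes, RGoogleDocs and RGoogleData are no longer developed; these days the googledrive package can handle the list-and-loop part directly. A rough sketch, with the folder URL and local directory as placeholders (it assumes you have, or will be prompted for, Drive authorization):
library(googledrive)
# 1) list all tif files in the shared folder (placeholder folder URL)
folder <- drive_get(as_id("https://drive.google.com/drive/folders/your-folder-id"))
tif_files <- drive_ls(folder, pattern = "\\.tif$")
# 2) and 3) loop through the list and save each file to the local drive
for (i in seq_len(nrow(tif_files))) {
  drive_download(tif_files[i, ],
                 path = file.path("local_tif_folder", tif_files$name[i]),
                 overwrite = TRUE)
}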
