Change data source from FTP server to local directory - r

I am working with AIMS model developed by APEC Climate center. The model downloads data from ftp server and then calls the LoadCmip5DataFromAdss function from datasource.R to load data into the model.
#do.call("LoadCmip5DataFromAdss", parameters)
On github I found the source code for LoadCmip5DataFromAdss which gives the path of an ftp server to download data
LoadCmip5DataFromAdss <- function(dbdir, NtlCode) {
fname <- paste("cmip5_daily_", NtlCode, ".zip", sep="")
if(nchar(NtlCode)==4 && substr(NtlCode,1,2)=="US"){
adss <- "ftp://cis.apcc21.org/CMIP5DB/US/"
}else{
adss <- "ftp://cis.apcc21.org/CMIP5DB/"
}
I want to get the data from a local directory instead of downloading because that takes a lot of time. How do I do that?
Where do I find the file containing LoadCmip5DataFromAdss on my PC, because in the setup only datasource.R is given.

All that function does is copy the zip file (cmip5_daily_ + whatever you specified for NtlCode + .zip) to the directory you specified for dbdir after it downloads it then unzips it and removes the ZIP file. Here's the whole function from rSQM:
LoadCmip5DataFromAdss <- function(dbdir, NtlCode) {
fname <- paste("cmip5_daily_", NtlCode, ".zip", sep="")
if(nchar(NtlCode)==4 && substr(NtlCode,1,2)=="US"){
adss <- "ftp://cis.apcc21.org/CMIP5DB/US/"
}else{
adss <- "ftp://cis.apcc21.org/CMIP5DB/"
}
srcfname <- paste(adss, fname, sep="")
dstfname <- paste(dbdir, "/", fname, sep = "")
download.file(srcfname, dstfname, mode = "wb")
unzip(dstfname, exdir = dbdir)
unlink(dstfname, force = T)
cat("CMIP5 scenario data at",NtlCode,"is successfully loaded.\n")
}
You can just do something like:
unzip(YOUR_LOCAL_NtlCode_ZIP_FILE, exdir = WHERE_YOUR_dbdir_IS)
vs use that function.

Related

R:Check existence of "today's" file and if it doesn't exist, download it

I'm trying to check to see if a user has an up-to-date file version of London's Covid prevalence data in their working directory and if not, download it from here:
fileURL <-"https://api.coronavirus.data.gov.uk/v1/data?filters=areaType=region;areaName=London&structure=%7B%22areaType%22:%22areaType%22,%22areaName%22:%22areaName%22,%22areaCode%22:%22areaCode%22,%22date%22:%22date%22,%22newCasesBySpecimenDateRollingSum%22:%22newCasesBySpecimenDateRollingSum%22,%22newCasesBySpecimenDateRollingRate%22:%22newCasesBySpecimenDateRollingRate%22%7D&format=csv"
Pasting that URL into the browser downloads the csv file. using download.file(URL, "data_.csv") creates junk. Why?
So far I have:
library(data.table)
#Look for COVID file starting with "data_"
destfile <- list.files(pattern = "data_")
#Find file date
fileDate<-file.info(destfile)$ctime %>%as.Date()
if(!file.exists(destfile) | fileDate != today()){
res <- tryCatch(download.file(url = fileURL,
destfile = paste0("data_",today(),".csv"),
method = "auto"),
error=function(e) 1)
if(res!=1) COVIDdata<-data.table::fread(destfile) #This doesn't read the file
}
The function always downloads a file regardless of the date on it but it saves it an unreadable format. I've resorted to downloading the file every time as follows.
COVIDdata <- data.table::fread(fileURL)
The junk file that gets downloaded is this:
I think this an issue with encoding the result of download.file, one way could be to use fread to get the data then write it with fwrite:
#Look for COVID file starting with "data_"
destfile <- list.files(pattern = "data_")
#Find file date
fileDate <- file.info(destfile)$ctime %>% as.Date()
#>[1] "2020-11-06" "2020-11-06" "2020-11-06"
if(!length(destfile) | max(fileDate) != today()){
COVIDdata <- fread(fileURL)
fwrite(COVIDdata, file = paste0("data_",today(),".csv"))
}

i am try to download multi file from a URL in R download.file() files are download but all are not opeing

download.file(mainurl,filename,method="curl") method try with “wget”, “curl”, "auto"
url <- "https://exporter.nih.gov/ExPORTER_Catalog.aspx"
linksu <- getURL(url)
links <- getHTMLLinks(linksu)
filenames <- links[str_detect(links, ".zip")]
filenames_list <- as.list(filenames)
downloadzip <- function (mainurl,filename) {
filedetails <- str_c(mainurl,filename)
print(filename)
require(downloader)
download.file(mainurl,filename,method="curl")
}
# save the files into the current Dir.
l_ply(filenames,downloadzip,mainurl = "https://exporter.nih.gov/")
#downloaded csv file this is works
download.file("https://exporter.nih.gov/CSVs/final/RePORTER_PRJ_C_FY2019_053.zip",destfile ="test.zip" )
download success full all files but all are corrupted
downloadzip <- function (mainurl,filename) {
filedetails <- str_c(mainurl,filename)
print(filedetails)
require(downloader)
download.file(filedetails,filename,method="curl")
}
** this working for me **

How to rename multiple files inside a loop in R

I have downloaded one photo of each deputy. In total, I have 513 photos (but I hosted a file with 271 photos). Each photo was named with the ID of the deputy. I want to change the name of photo to the deputy's name. This means that "66179.jpg" file would be named "norma-ayub.jpg".
I have a column with the IDs ("uri") and their names ("name_lower"). I tried to run the code with "destfile" of download.file(), but it receives only a string. I couldn't find out how to work with file.rename().
And rename_r_to_R changes only the file extension.
I am a beginner in working with R.
CSV file:
https://gist.github.com/gabrielacaesar/3648cd61a02a3e407bf29b7410b92cec
Photos:
https://github.com/gabrielacaesar/studyingR/blob/master/chamber-of-deputies-17jan2019-files.zip
(It's not necessary to download the ZIP file; When running the code below, you also get the photos, but it takes some time to download them)
deputados <- fread("dep-legislatura56-14jan2019.csv")
i <- 1
while(i <= 514) {
tryCatch({
url <- deputados$uri[i]
api_content <- rawToChar(GET(url)$content)
pessoa_info <- jsonlite::fromJSON(api_content)
pessoa_foto <- pessoa_info$dados$ultimoStatus$urlFoto
download.file(pessoa_foto, basename(pessoa_foto), mode = "wb")
Sys.sleep(0.5)
}, error = function(e) return(NULL)
)
i <- i + 1
}
I downloaded the files you provided and directly read them into R or unzipped them into a new folder respectivly:
df <- data.table::fread(
"https://gist.githubusercontent.com/gabrielacaesar/3648cd61a02a3e407bf29b7410b92cec/raw/1d682d8fcdefce40ff95dbe57b05fa83a9c5e723/chamber-of-deputies-17jan2019",
sep = ",",
header = TRUE)
download.file("https://github.com/gabrielacaesar/studyingR/raw/master/chamber-of-deputies-17jan2019-files.zip",
destfile = "temp.zip")
dir.create("photos")
unzip("temp.zip", exdir = "photos")
Then I use list.files to get the file names of all photos, match them with the dataset and rename the photos. This runs very fast and the last bit will report if renaming the file was succesful.
photos <- list.files(
path = "photos",
recursive = TRUE,
full.names = TRUE
)
for (p in photos) {
id <- basename(p)
id <- gsub(".jpg$", "", id)
name <- df$name_lower[match(id, basename(df$uri))]
fname <- paste0(dirname(p), "/", name, ".jpg")
file.rename(p, fname)
# optional
cat(
"renaming",
basename(p),
"to",
name,
"succesful:",
ifelse(success, "Yes", "No"),
"\n"
)
}

Add a suffix to filenames based on subfolder names within a directory in R

I have a number of (sub)folders stored within a directory folder. Each subfolder contains 5-35 .jpg aerial photograph files that are named by flightline name and number (ie: bej-3-83). I would like to add a suffix to each of these files based upon the subfolder they are stored upon. For example if 'bej-3-83' is stored within 'T13N_10W' subfolder I would like my R script to rename 'bej-3-83' as 'bej-3-83-T13N_10W' and so forth for each file stored within each subfolder.
I can partially accomplish this process albeit still with more manual input than I'd like using this script:
folder = "C:\\...\\T23N_R14W"
files <- list.files(folder,pattern = "\\.jpg$",full.names = T)
files
sapply(files,FUN=function(eachPath){
file.rename(from=eachPath,to= sub(pattern="_clip", paste0("_T23N_R14W"),eachPath))
})
But as you can see this script uses a manual paste input of the subfolder name which isn't useful when you're trying to create a script that does what I need in one fell swoop.
I'm seeing similar questions and answers which utilize 'pushd' and 'popd' and I've attached to of those threads below as links. I'm trying to read as much as I can on these functions but so far the process to make it work has me stuck.
How to rename files in folders to foldername using batch file
Rename Files Based On Folder Name
Sincerely,
Henry
You might have to change the dir_separator to \ on windows:
make_filename <- function(file_path) {
s <- unlist(strsplit(file_path, dir_separator))
fname <- gsub('\\.jpg$', '', s[length(s)])
parent_dir <- s[(length(s) - 1)]
new_fname <- paste0(parent_dir, "_", fname, '.jpg')
path <- paste(s[-length(s)], collapse = dir_separator)
return(paste(path, new_fname, sep = dir_separator))
}
folder = './data'
dir_separator = '/'
files <- paste0(folder, dir_separator, list.files(folder, recursive = T))
sapply(files, function(x) file.rename(from = x, to = make_filename(x)))
A recursive approach.
Pass the path to the root folder containing your files and the extension of the files you want to rename, to rename_batch.
Defaults are working directory and jpeg.
library(stringr)
# An auxiliary function
rename_file <- function(str, extra){
file_name <- tools::file_path_sans_ext(str)
file_ext <- tools::file_ext(str)
return(paste0(file_name, '-', extra, '.', file_ext))
}
rename_batch <- function(path = "./",
extension = 'jpeg'){
# Separate files from folders
l <- list.files(path)
files <- l[grepl(paste0("\\." , extension), l)]
folders <- list.dirs(path, F, F)
present_folder <-
stringr::str_extract(path, '(?<=/)([^/]+)$')
# Check if there is a / at the end of path and removes it
# for consistency
path_len <- nchar(path)
last <- substr(path, path_len, path_len)
if (last == '/') {
path <- substr(path, 1, path_len - 1)
}
if (length(files) > 0) {
file_updtate <- paste0(path, '/', files)
file.rename(file_updtate, rename_file(file_updtate, present_folder))
}
if (length(folders) > 0) {
for (i in paste0(path, '/', folders)) {
cat('Renaming in:', i, '\n')
rename_batch(i)
}
}
}

Recursively ftp download, then extract gz files

I have a multiple-step file download process I would like to do within R. I have got the middle step, but not the first and third...
# STEP 1 Recursively find all the files at an ftp site
# ftp://prism.oregonstate.edu//pub/prism/pacisl/grids
all_paths <- #### a recursive listing of the ftp path contents??? ####
# STEP 2 Choose all the ones whose filename starts with "hi"
all_files <- sapply(sapply(strsplit(all_paths, "/"), rev), "[", 1)
hawaii_log <- substr(all_files, 1, 2) == "hi"
hi_paths <- all_paths[hawaii_log]
hi_files <- all_files[hawaii_log]
# STEP 3 Download & extract from gz format into a single directory
mapply(download.file, url = hi_paths, destfile = hi_files)
## and now how to extract from gz format?
For part 1, RCurl might be helpful. The getURL function retrieves one or more URLs; dirlistonly lists the contents of the directory without retrieving the file. The rest of the function creates the next level of url
library(RCurl)
getContent <- function(dirs) {
urls <- paste(dirs, "/", sep="")
fls <- strsplit(getURL(urls, dirlistonly=TRUE), "\r?\n")
ok <- sapply(fls, length) > 0
unlist(mapply(paste, urls[ok], fls[ok], sep="", SIMPLIFY=FALSE),
use.names=FALSE)
}
So starting with
dirs <- "ftp://prism.oregonstate.edu//pub/prism/pacisl/grids"
we can invoke this function and look for things that look like directories, continuing until done
fls <- character()
while (length(dirs)) {
message(length(dirs))
urls <- getContent(dirs)
isgz <- grepl("gz$", urls)
fls <- append(fls, urls[isgz])
dirs <- urls[!isgz]
}
we could then use getURL again, but this time on fls (or elements of fls, in a loop) to retrieve the actual files. Or maybe better open a url connection and use gzcon to decompress and process on the file. Along the lines of
con <- gzcon(url(fls[1], "r"))
meta <- readLines(con, 7)
data <- scan(con, integer())
I can read the contents of the ftp page if I start R with the internet2 option. I.e.
C:\Program Files\R\R-2.12\bin\x64\Rgui.exe --internet2
(The shortcut to start R on Windows can be modified to add the internet2 argument - right-click /Properties /Target, or just run that at the command line - and obvious on GNU/Linux).
The text on that page can be read like this:
download.file("ftp://prism.oregonstate.edu//pub/prism/pacisl/grids", "f.txt")
txt <- readLines("f.txt")
It's a little more work to parse out the Directory listings, then read them recursively for the underlying files.
## (something like)
dirlines <- txt[grep("Directory <A HREF=", txt)]
## split and extract text after "grids/"
split1 <- sapply(strsplit(dirlines, "grids/"), function(x) rev(x)[1])
## split and extract remaining text after "/"
sapply(strsplit(split1, "/"), function(x) x[1])
[1] "dem" "ppt" "tdmean" "tmax" "tmin"
It's about here that this stops seeming very attractive, and gets a bit laborious so I would actually recommend a different option. There would no doubt be a better solution perhaps with RCurl, and I would recommend learning to use and ftp client for you and your user. Command line ftp, anonymous logins, and mget all works pretty easily.
The internet2 option was explained for a similar ftp site here:
https://stat.ethz.ch/pipermail/r-help/2009-January/184647.html
ftp.root <- where are the files
dropbox.root <- where to put the files
#=====================================================================
# Function that downloads files from URL
#=====================================================================
fdownload <- function(sourcelink) {
targetlink <- paste(dropbox.root, substr(sourcelink, nchar(ftp.root)+1,
nchar(sourcelink)), sep = '')
# list of contents
filenames <- getURL(sourcelink, ftp.use.epsv = FALSE, dirlistonly = TRUE)
filenames <- strsplit(filenames, "\n")
filenames <- unlist(filenames)
files <- filenames[grep('\\.', filenames)]
dirs <- setdiff(filenames, files)
if (length(dirs) != 0) {
dirs <- paste(sourcelink, dirs, '/', sep = '')
}
# files
for (filename in files) {
sourcefile <- paste(sourcelink, filename, sep = '')
targetfile <- paste(targetlink, filename, sep = '')
download.file(sourcefile, targetfile)
}
# subfolders
for (dirname in dirs) {
fdownload(dirname)
}
}

Resources