R download.file(): rename the downloaded file if the filename already exists

In R, I am trying to download files off the internet using the download.file() command in a simple script (I'm a complete newbie). The files download properly. However, if a file already exists at the download destination, I'd like to rename the downloaded file with an increment, rather than overwrite it, which seems to be the default behaviour.
nse.url = "https://www1.nseindia.com/content/historical/DERIVATIVES/2016/FEB/fo04FEB2016bhav.csv.zip"
nse.folder = "D:/R/Download files from Internet/"
nse.destfile = paste0(nse.folder,"fo04FEB2016bhav.csv.zip")
download.file(nse.url,nse.destfile,mode = "wb",method = "libcurl")
The problem with respect to this specific code: if "fo04FEB2016bhav.csv.zip" already exists, how do I save the new download as, say, "fo04FEB2016bhav.csv(2).zip"?
A general answer to the problem (and not just for the code above) would be appreciated, since this bottleneck could come up in other situations too.

The function below automatically assigns the filename based on the file being downloaded. It checks the folder you are downloading to for the presence of a similarly named file; if it finds a match, it appends an increment and downloads to the new filename.
ekstroem's suggestion to fiddle with the curl settings is probably a much better approach, but I wasn't clever enough to figure out how to make that work.
download_without_overwrite <- function(url, folder)
{
  filename <- basename(url)
  base <- tools::file_path_sans_ext(filename)
  ext <- tools::file_ext(filename)

  file_exists <- grepl(base, list.files(folder), fixed = TRUE)

  if (any(file_exists))
  {
    filename <- paste0(base, " (", sum(file_exists), ")", ".", ext)
  }

  download.file(url, file.path(folder, filename), mode = "wb", method = "libcurl")
}
download_without_overwrite(
  url = "https://raw.githubusercontent.com/nutterb/redcapAPI/master/README.md",
  folder = "[path_to_folder]")

Try this:
nse.url <- "https://www1.nseindia.com/content/historical/DERIVATIVES/2016/FEB/fo04FEB2016bhav.csv.zip"
nse.folder <- "D:/R/Download files from Internet/"

# Get file name from url, with file extension
fname.x <- gsub(".*/(.*)", "\\1", nse.url)
# Get file name from url, without file extension
fname <- gsub("(.*)\\.csv.*", "\\1", fname.x)
# Get extension of file from url
xt <- gsub(".*(\\.csv.*)", "\\1", fname.x)

# How many times does the file already exist in the folder?
exist.times <- sum(grepl(fname, list.files(path = nse.folder)))

if (exist.times > 0) {
  # if it does, increment by 1
  fname.x <- paste0(fname, "(", exist.times + 1, ")", xt)
}

nse.destfile <- paste0(nse.folder, fname.x)
download.file(nse.url, nse.destfile, mode = "wb", method = "libcurl")
Issues
This approach will not work in cases where part of the file name already appears in another file's name. For example, if the url is url/test.csv.zip and the folder contains a file testABC1234blahblah.csv.zip, the code will think the file already exists and save the download as test(2).csv.zip.
You will need to change the "How many times does the file already exist in the folder?" part of the code accordingly.
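One way to tighten that check (a sketch, not from the original answer) is to escape any regex metacharacters in the base name and then only count exact matches of "fname.xt" or "fname(n).xt":
# Escape regex metacharacters, then match "fname<ext>" or "fname(<n>)<ext>" exactly
escape_regex <- function(x) gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", x)
pattern <- paste0("^", escape_regex(fname), "(\\([0-9]+\\))?", escape_regex(xt), "$")
exist.times <- sum(grepl(pattern, list.files(path = nse.folder)))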

This is not a proper answer and shouldn't be considered as such, but the comment section above was too small to write it all.
I thought the -O -n options to curl could be used, but now that I've looked at it more closely it turns out this isn't implemented yet. wget automatically increments the filename when downloading a file that already exists; however, setting method = "wget" doesn't work with download.file because you are forced to set the destination file name, and once you do that you overwrite the automatic filename increments.
I like the solution that @Benjamin provided. Alternatively, you can use
system(paste0("wget ", nse.url))
to get the file through the system (provided that you have wget installed) and let wget handle the increment.

Related

R renaming file extension

I have tried looking at File extension renaming in R and using the script there, without any luck. My question is very much the same.
I have a bunch of files with a file extension that I want to change. I have used the following code but cannot get the last step to work.
I know similar questions have been asked before, but I'm simply stuck and therefore reaching out anyway.
startingDir<-"/Users/anders/Documents/Juni 2019/DATA"
endDir<-"/Users/anders/Documents/Juni 2019/DATA/formatted"
# List of files in startingDir with the extension .zipwblibcurl that I want to replace
old_files <- list.files(startingDir, pattern = "\\.zipwblibcurl")
#View(old_files)
# Rename the file extension, making a new list in R that changes .zipwblibcurl to .zip
new_files <- gsub(".zipwblibcurl", ".zip", old_files)
#View(new_files)
# Replace the old files in startingDir. Eventually I would like to move them to endDir. For simplicity I have just tried it as in the other post, without any luck:
file.rename(old_files, new_files)
After running file.rename I get the output FALSE for every entry.
The full answer here, including the comment from @StephaneLaurent: make sure that you have full.names = TRUE inside the list.files(); otherwise the path to the file will not be captured, just the file name.
Full working snippet:
old <- list.files(startingDir,
                  pattern = "\\.zipwblibcurl",
                  full.names = TRUE)
# replace the file names
new <- gsub(".zipwblibcurl", ".zip", old)
# Rename the old file names to the new file names
file.rename(old, new)
Like @StéphaneLaurent said, it's most likely that R tries to look in the current working directory for the files and can't find them. You can correct this by adding the directories explicitly:
file.rename(paste(startingDir, old_files, sep = "/"), paste(endDir, new_files, sep = "/"))
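A slightly tidier equivalent (same idea, using base R's file.path() to build OS-portable paths from the question's startingDir and endDir):
file.rename(file.path(startingDir, old_files), file.path(endDir, new_files))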

Unzip failing due to long name in zipped folder

I want to be able to read and edit spatial SQLite tables that are downloaded from a server. These come compressed.
These zip files contain a folder whose name holds information about the model that was run, and as such the names can sometimes be quite long.
When this folder name gets too long, unzipping the folder fails. I ultimately don't need to unzip the file, but I seem to get the same error when I use unz within readOGR.
I can't think of how to create a reproducible example, but I can give an example of a path that works and one that doesn't.
Works:
"S:\3_Projects\CRC00001\4699-12103\scenario_initialised model\performance_assessment.sqlite"
4699-12103 is the zip file name
and "scenario_initialised model" is the offending subfolder
Fails:
""S:\3_Projects\CRC00001\4699-12129\scenario_tree_canopy_7, number_of_trees_0, roads_False, compliance_75, year_2030, nrz_cover_0.6, green_roofs_0\performance_assessment.sqlite""
4699-12103 is the zip file name
and "scenario_tree_canopy_7, number_of_trees_0, roads_False, compliance_75, year_2030, nrz_cover_0.6, green_roofs_0" is the offending subfolder
The code works in a similar fashion to this:
list_zips <- list.files(pattern = "\\.zip$", recursive = TRUE, include.dirs = TRUE)
zip_path <- paste(getwd(), "/", list_zips[i], sep = "")
unzip(zipfile = zip_path,
      exdir = substr(zip_path, 1, nchar(zip_path) - 4))
But I would prefer to be able to load the spatial file directly, without unzipping. Such as:
sq_path <- unzip(list_zips[i], list = TRUE)[2, 1]
temp <- unz(paste(getwd(), "/", list_zips[i], sep = ""), sq_path)
vectorImport <- readOGR(dsn = temp, layer = "micro_climate_grid")
Any help would be appreciated! Tim
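A possible direction, purely as an untested sketch: extract only the .sqlite member to a short temporary path with junkpaths = TRUE, so the long internal folder name never touches the file system, then point readOGR at the extracted file.
library(rgdal)
# junkpaths = TRUE drops the long internal folder name on extraction
sq_path <- unzip(list_zips[i], list = TRUE)$Name[2]
tmp <- tempdir()
unzip(list_zips[i], files = sq_path, exdir = tmp, junkpaths = TRUE)
vectorImport <- readOGR(dsn = file.path(tmp, basename(sq_path)),
                        layer = "micro_climate_grid")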

How to give a specific file location as an argument to writeBin() in R?

The file downloaded by writeBin() is saved at the current location, but I want to save it at another location, either in a subdirectory of the current location or somewhere else.
writeBin(downlaod$response$content, "Inventory.csv") is the line of code. Suppose I want to save Inventory.csv at "current_directory/download_folder".
Here I am trying to download the CSV file using the following script:
url <- "https://lgloz050.lss.emc.com:58443/APG/"
dn_url <- "https://lgloz050.lss.emc.com:58443/APG/lookup/Report%20Library/Amazon%20S3/Inventory/Accounts/report.csv"
session <- html_session(url)
form <- html_form(session)[[1]]
fl_fm <- set_values(form,
                    j_username = "***",
                    j_password = "***")
main_page <- submit_form(session, fl_fm)
downlaod <- jump_to(main_page, dn_url)
writeBin(downlaod$response$content, "Inventory.csv" )
Can I use writeBin() for this? If not, is there an alternative to writeBin(), or some other way to download a CSV file from an https site that requires a login?
Thanks in advance for suggestions!!!
You can access the help file of a function using ?writeBin or help(writeBin). There you see that the argument you are looking for is con:
writeBin(object, con, size = NA_integer_,
endian = .Platform$endian, useBytes = FALSE)
con: A connection object or a character string naming a file or a raw vector.
Now you can just supply any location to the con argument, pretty much like you already did:
writeBin(object = downlaod$response$content,
         con = "./download_folder/Inventory.csv")
The only thing you have to keep in mind is how R resolves the path. An absolute path looks like this:
/home/user/current_directory/download_folder/Inventory.csv
on a Linux machine, or like this:
C:/user/Documents/current_directory/download_folder/Inventory.csv
on Windows.
You can also use paths relative to your current working directory (as I assumed above) by substituting current_directory/ with ./:
./download_folder/Inventory.csv
Or even go one folder up from your directory:
../current_directory/download_folder/Inventory.csv
Or two:
../../current_directory/download_folder/Inventory.csv
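Note that writeBin() will not create missing folders; if download_folder might not exist yet, it can be worth creating it first (a small addition of mine, using base R):
# Create the target folder if needed, then write the content into it
dir.create("./download_folder", showWarnings = FALSE, recursive = TRUE)
writeBin(downlaod$response$content, "./download_folder/Inventory.csv")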

data.table fread with a zip file in another directory with spaces in the name

I am trying to read a csv inside a zip file using the command fread("unzip -cq file.zip"), which works perfectly when the file is in my working directory.
But when I try the command with the path of the file specified, without changing the directory, say fread("unzip -cq C:/Users/My user/file.zip"), I get the following error: unzip: cannot find either C:/Users/My or C:/Users/My.zip.
This happens because there are spaces in my path, but what would be the workaround?
The only option I have thought of is to change to the directory where each file is located and read it from there, but this is not ideal.
I use shQuote for this, like...
fread_zip = function(fp, silent = FALSE){
  qfp = shQuote(fp)
  patt = "unzip -cq %s"
  thecall = sprintf(patt, qfp)
  if (!silent) cat("The call:", thecall, sep = "\n")
  fread(thecall)
}
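For instance (a hypothetical call, assuming data.table is loaded and the zip holds a single csv):
library(data.table)
dt <- fread_zip("C:/Users/My user/file.zip")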
Defining a pattern and then substituting in with sprintf can keep things readable and easier to manage. For example, I have a similar wrapper for .tar.gz files (which apparently need to be unzipped twice with a | pipe between the steps).
If your zip contains multiple csvs, fread isn't set up to read them all (though there's an open issue). My workaround for that case currently looks like...
library(magrittr)
fread_zips = function(fp,
                      unzip_dir = file.path(dirname(fp),
                                            sprintf("csvtemp_%s", sub(".zip", "", basename(fp)))),
                      silent = FALSE,
                      do_cleanup = TRUE){
  # only tested on windows
  # fp should be the path to mycsvs.zip
  # unzip_dir should be used only for CSVs from inside the zip
  dir.create(unzip_dir, showWarnings = FALSE)
  # unzip
  unzip(fp, overwrite = TRUE, exdir = unzip_dir)
  # list files, read separately
  # not looking recursively, since csvs should be only one level deep
  fns = list.files(unzip_dir)
  if (!all(tools::file_ext(fns) == "csv")) stop("fp should contain only CSVs")
  res = lapply(fns %>% setNames(file.path(unzip_dir, .), .), fread)
  if (do_cleanup) unlink(unzip_dir, recursive = TRUE)
  res
}
So, because we're not passing a command-line call directly to fread, there's no need for shQuote here. I wrote and used this function yesterday, so there are probably still some oversights or bugs.
The magrittr %>% pipe part could be written as setNames(file.path(unzip_dir, fns), fns) instead.
Try assigning the location to a variable and using paste0 to build the call, like below:
myVar <- "C:/Users/Myuser/"
fread(paste0("unzip -cq ", myVar, "file.zip"))

How do I get the absolute path of an input file in R

I am using Rscript to plot some figures from a given CSV file in some directory, which is not necessarily my current working directory. I can call it as follows:
./script.r ../some_directory/inputfile.csv
Now I want to output my figures in the same directory (../some_directory), but I have no idea how to do that. I tried to get the absolute path of the input file, because from that I could construct the output path, but I couldn't find out how.
normalizePath() #Converts file paths to canonical user-understandable form
or
library(tools)
file_path_as_absolute()
The question is very old but it still lacks a working solution. So here is my answer:
Use normalizePath(dirname(f)).
The example below lists all the files and directories in the current directory.
dir <- "."
allFiles <- list.files(dir)
for (f in allFiles) {
  print(paste(normalizePath(dirname(f)), f, sep = .Platform$file.sep))
}
Where:
normalizePath(dirname(f)) gives the absolute path of the parent directory, so the individual file names have to be appended to the path.
.Platform is used to keep the code OS-portable.
file.sep gives "the file separator used on your platform: "/" on both Unix-alikes and on Windows (but not on the former port to Classic Mac OS)".
Warning: this may cause problems if not used with caution. For instance, say the path is A/B/a_file and the working directory is A. Then the code below:
dir <- "B"
allFiles <- list.files(dir)
for(f in allFiles){
print(paste(normalizePath(dirname(f)), fsep = .Platform$file.sep, f, sep = ""))
}
would give:
> A/a_file
however, it should be:
> A/B/a_file
Here is the solution:
args <- commandArgs(TRUE)
results_file <- args[1]
output_path <- dirname(normalizePath(results_file))
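Output files can then be built from output_path with file.path(); for example (the figure name here is hypothetical):
# Write a figure next to the input file
png(file.path(output_path, "figure1.png"))
plot(1:10)  # whatever plotting the script does
dev.off()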
To get the absolute path(s) from file(s):
Why not combine the base R function file.path() with the answer that @Marius gave? This appears marginally simpler, works with a vector of files (files), and takes care of system-specific separators:
file.path(normalizePath(dirname(files)), files)
And wrapped inside a function (abspath):
abspath <- function(files) file.path(normalizePath(dirname(files)), files)
For instance:
> setwd("~/test")
> list.files()
[1] "file1.txt" "file2.txt"
And then:
> files <- list.files()
> abspath(files)
[1] "/home/myself/test/file1.txt" "/home/myself/test/file2.txt"
I see that people gave pieces of the solution, but not all of it.
I have used this:
outputFile <- paste(normalizePath(dirname(inputFile)), "\\", "my_file.ext", sep = "")
Hope it helps.
fs::path_abs() is my preferred way. It avoids the backslashes of normalizePath().
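A minimal illustration (assuming the fs package is installed):
library(fs)
path_abs("file1.txt")
# e.g. "/home/myself/test/file1.txt", continuing the example above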
