I'm a new R user.
I am trying to download 7,000 files (.nc format) from an FTP server (for which I have a username and password). On the website, each file is a link to download. I would like to download all the .nc files.
I would be grateful to anyone who can show me how to run this job in R. Below is an example of what I have tried using RCurl and a loop; it tells me it cannot download all the files.
library(RCurl)
url<- "ftp://ftp.my.link.fr/1234/"
userpwd <- "user:password"
destination <- "/Users/ME/Documents"
filenames <- getURL(url, userpwd = userpwd,
ftp.use.epsv = FALSE, dirlistonly = TRUE)
for(i in seq_along(url)){
download.file(url[i], destination[i], mode="wb")
}
How can I do that?
The first thing you'd see is that the files in your directory, i.e. the object filenames, are listed as one long string. To obtain a character vector of all the file names, you can try:
files <- unlist(strsplit(filenames, '\n'))
From here on, it's simply a matter of looping through all the files in the directory. I recommend you use the curl package rather than RCurl to download the files, as it's easier to supply the auth info for every download request.
library(curl)
h <- new_handle()
handle_setopt(h, userpwd = "user:pwd")
and then
lapply(files, function(filename) {
  curl_download(paste(url, filename, sep = ""), destfile = filename, handle = h)
})
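Putting the pieces together with the variables from the question, a complete sketch could look like the following (untested; it assumes the directory listing returns plain file names separated by newlines, and reuses the question's url, userpwd and destination):

library(RCurl)
library(curl)

url <- "ftp://ftp.my.link.fr/1234/"
userpwd <- "user:password"
destination <- "/Users/ME/Documents"

# list the remote directory; the result is one long string of file names
filenames <- getURL(url, userpwd = userpwd, ftp.use.epsv = FALSE, dirlistonly = TRUE)
files <- unlist(strsplit(filenames, "\n"))

# one handle carries the credentials for every request
h <- new_handle()
handle_setopt(h, userpwd = userpwd)

# download each file into the destination folder
for (filename in files) {
  curl_download(paste0(url, filename),
                destfile = file.path(destination, filename),
                handle = h)
}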
Related
I want to download a csv file from an FTP server into R (ideally, I'd have the file as a data frame in R).
I get an error message when trying to download the csv file from the FTP server to R (running locally on my Mac).
url = "ftp://ftppath/www_logs/testfolder/"
download.file(URL,"test.csv", credentials = "xxx:yyyy")
The last call leads to:
Error in download.file(URL, "test.csv", credentials = "xxxyyy", :
unused arguments (credentials = "xxx:yyyy")
I think you get this error message because the function download.file() has no argument named credentials.
I would try to pass the credentials as discussed here:
url = "ftp://username:password#ftppath/www_logs/testfolder/test.csv"
download.file(url, destfile = "test.csv")
If you want to load the file into an R data.frame, you could try something like this:
library(RCurl)
url <- "ftp://ftppath/www_logs/testfolder/test.csv"
text_data <- getURL(url, userpwd = "username:password", connecttimeout = 60)
df <- read.csv(text = text_data)
I have a list containing hundreds of URLs that link directly to .xlsx files for download:
list <- c("https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=main.weeklyReport.Excel&web_report_id=980",
"https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=main.weeklyReport.Excel&web_report_id=981",
"https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=main.weeklyReport.Excel&web_report_id=990")
To download everything in the list, I created a loop:
for (url in list) {
download.file(url, destfile = "Rapex-Publication.xlsx", mode="wb")
}
However, it only downloads the first file and not the rest. My guess is that the loop keeps overwriting the same destfile. What would I have to do to circumvent this issue?
Try something along the lines of:
for (i in 1:length(list)) {
  download.file(list[i], destfile = paste0("Rapex-Publication-", i, ".xlsx"), mode = "wb")
}
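A possible variation (not part of the original answer): since each URL carries a web_report_id query parameter, you could use that id instead of a running index, so the file names stay meaningful:

for (url in list) {
  # extract the id at the end of the query string, e.g. "980"
  id <- sub(".*web_report_id=", "", url)
  download.file(url, destfile = paste0("Rapex-Publication-", id, ".xlsx"), mode = "wb")
}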
I am trying to use download.file to download a zip file from a URL and then push the data in each of the contained files into a MySQL database. I am getting stuck at the first step, where I use download.file to fetch the zip file.
I have tried the following, but to no avail:
myURL = paste("https://onedrive.live.com/download.aspx?cid=D700ACC18C0F37E6&resid=D700ACC18C0F37E6%2118670&ithint=%2Ezip",sep = "")
download.file(url=myURL,destfile=zippedFile, method='auto')
myURL = paste("https://onedrive.live.com/download.aspx?cid=D700ACC18C0F37E6&resid=D700ACC18C0F37E6%2118670&ithint=%2Ezip",sep = "")
download.file(url=myURL,destfile=zippedFile, method='curl')
Please suggest where I am going wrong. Also, some pointers on how to take one file at a time from the zip folder and push it into a DB would be most helpful.
What finally worked on AWS was using the downloader package:
https://cran.r-project.org/web/packages/downloader/downloader.pdf
It has features to support https. Hope it will help someone.
You can try this:
myURL = paste("https://onedrive.live.com/download.aspx?cid=D700ACC18C0F37E6&resid=D700ACC18C0F37E6%2118670&ithint=%2Ezip",sep = "")
dir = "zippedFile.zip"
download.file(myURL, dir, mode="wb")
destfile -- a character string with the name where the downloaded file is saved. Tilde-expansion is performed.
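For the second part of the question (pushing the contents into a MySQL database), one minimal sketch, not from either answer, would be to unzip the archive and append each contained file to a table via DBI. It assumes the zip contains csv files; the driver, connection details and table name below are placeholders you'd replace with your own:

library(DBI)

# unzip into a temporary directory and get the paths of the extracted files
csv_files <- unzip("zippedFile.zip", exdir = tempdir())

# connect to MySQL (host, credentials and dbname are placeholders)
con <- dbConnect(RMariaDB::MariaDB(),
                 host = "localhost", user = "user",
                 password = "password", dbname = "mydb")

# read each extracted file and append it to one table
for (f in csv_files) {
  df <- read.csv(f)
  dbWriteTable(con, "my_table", df, append = TRUE)
}

dbDisconnect(con)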
In R, I am trying to download files off the internet using the download.file() command in a simple script (I am a complete newbie). The files download properly. However, if a file already exists in the download destination, I'd like to rename the downloaded file with an increment, rather than overwrite it, which seems to be the default behaviour.
nse.url = "https://www1.nseindia.com/content/historical/DERIVATIVES/2016/FEB/fo04FEB2016bhav.csv.zip"
nse.folder = "D:/R/Download files from Internet/"
nse.destfile = paste0(nse.folder,"fo04FEB2016bhav.csv.zip")
download.file(nse.url,nse.destfile,mode = "wb",method = "libcurl")
Problem with respect to this specific code: if "fo04FEB2016bhav.csv.zip" already exists, how do I get, say, "fo04FEB2016bhav.csv(2).zip"?
A general answer to the problem (and not just for the code mentioned above) would be appreciated, as such a bottleneck could come up in other situations too.
The function below will automatically assign the filename based on the file being downloaded. It will check the folder you are downloading to for the presence of a similarly named file. If it finds a match, it will add an incrementation and download to the new filename.
ekstroem's suggestion to fiddle with the curl settings is probably a much better approach, but I wasn't clever enough to figure out how to make that work.
download_without_overwrite <- function(url, folder) {
  filename <- basename(url)
  base <- tools::file_path_sans_ext(filename)
  ext <- tools::file_ext(filename)

  file_exists <- grepl(base, list.files(folder), fixed = TRUE)

  if (any(file_exists)) {
    filename <- paste0(base, " (", sum(file_exists), ")", ".", ext)
  }

  download.file(url, file.path(folder, filename), mode = "wb", method = "libcurl")
}
download_without_overwrite(
  url = "https://raw.githubusercontent.com/nutterb/redcapAPI/master/README.md",
  folder = "[path_to_folder]"
)
Try this:
nse.url = "https://www1.nseindia.com/content/historical/DERIVATIVES/2016/FEB/fo04FEB2016bhav.csv.zip"
nse.folder = "D:/R/Download files from Internet/"
# Get file name from url, with file extension
fname.x <- gsub(".*/(.*)", "\\1", nse.url)

# Get file name from url, without file extension
fname <- gsub("(.*)\\.csv.*", "\\1", fname.x)

# Get extension of file from url
xt <- gsub(".*(\\.csv.*)", "\\1", fname.x)

# How many times does the file exist in the folder
exist.times <- sum(grepl(fname, list.files(path = nse.folder)))

if (exist.times) {
  # if it does, increment by 1
  fname.x <- paste0(fname, "(", exist.times + 1, ")", xt)
}
nse.destfile = paste0(nse.folder, fname.x)
download.file(nse.url, nse.destfile, mode = "wb",method = "libcurl")
Issues
This approach will not work in cases where part of the file name already exists. For example, if you have url/test.csv.zip and the folder contains a file testABC1234blahblah.csv.zip, it will think the file already exists and save the download as test(2).csv.zip.
You will need to change the "# How many times does the file exist in the folder" part of the code accordingly, as sketched below.
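One possible tightening (my own suggestion, not part of the original answer) is to count only exact matches of the base name, optionally followed by "(n)", instead of any file name that merely contains it. This would replace the exist.times line above:

existing <- list.files(path = nse.folder)

# a file counts as a match only if it is exactly the original name,
# or the base name followed by "(" (e.g. "fo04FEB2016bhav(2).csv.zip")
is_match <- existing == fname.x | startsWith(existing, paste0(fname, "("))
exist.times <- sum(is_match)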
This is not a proper answer and shouldn't be considered as such, but the comment section above was too small to write it all.
I thought the -O -n options to curl could be used, but now that I've looked at it more closely it turns out that this isn't implemented yet. wget, on the other hand, automatically increments the file name when downloading a file that already exists. However, setting method = "wget" doesn't work with download.file, because you are forced to set the destination file name, and once you do that you overwrite the automatic file increments.
I like the solution that @Benjamin provided. Alternatively, you can use
system(paste0("wget ", nse.url))
to get the file through the system (provided that you have wget installed) and let wget handle the increment.
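If you go that route, a slightly more robust variant (my own sketch, assuming wget is on your PATH) is to quote the arguments and point wget's -P option at the download folder, letting wget append ".1", ".2", ... when the name is already taken:

# download into nse.folder and let wget handle duplicate names
system2("wget", args = c("-P", shQuote(nse.folder), shQuote(nse.url)))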
I see that many examples for downloading binary files with RCurl look like this:
library("RCurl")
curl = getCurlHandle()
bfile <- getBinaryURL(
  "http://www.example.com/bfile.zip",
  curl = curl,
  progressfunction = function(down, up) { print(down) },
  noprogress = FALSE
)
writeBin(bfile, "bfile.zip")
rm(curl, bfile)
If the download is very large, I suppose it would be better to write it to the storage medium as it arrives, instead of fetching it all into memory.
In the RCurl documentation there are some examples of getting files in chunks and manipulating them as they are downloaded, but they all seem to refer to text chunks.
Can you give a working example?
UPDATE
A user suggested using R's native download.file with the mode = 'wb' option for binary files.
In many cases the native function is a viable alternative, but there are a number of use cases where it does not fit (https, cookies, forms etc.), and this is the reason why RCurl exists.
This is the working example:
library(RCurl)
#
f = CFILE("bfile.zip", mode="wb")
curlPerform(url = "http://www.example.com/bfile.zip", writedata = f@ref)
close(f)
It downloads straight to file. The return value is the status of the request (0 if no errors occur) rather than the downloaded data.
The mention of CFILE in the RCurl manual is a bit terse. Hopefully future versions will include more details/examples.
For your convenience the same code is packaged as a function (and with a progress bar):
bdown <- function(url, file) {
  library('RCurl')
  f <- CFILE(file, mode = "wb")
  a <- curlPerform(url = url, writedata = f@ref, noprogress = FALSE)
  close(f)
  return(a)
}
## ...and now just give remote and local paths
ret = bdown("http://www.example.com/bfile.zip", "path/to/bfile.zip")
Um.. use mode = 'wb' :) Run this and follow along with my comments.
# create a temporary file and a temporary directory on your local disk
tf <- tempfile()
td <- tempdir()
# run the download file function, download as binary.. save the result to the temporary file
download.file(
"http://sourceforge.net/projects/peazip/files/4.8/peazip_portable-4.8.WINDOWS.zip/download",
tf ,
mode = 'wb'
)
# unzip the files to the temporary directory
files <- unzip( tf , exdir = td )
# here are your files
files