I had problems downloading zipped files from an FTP server. I have now solved the problem, and because I haven't found a solution to it here, I'm sharing my approach.
First I tried it with
download.file()
But there was the problem that my password ended with an "@". That's why the approach of submitting the user and password within the URL wasn't working; the double @ was apparently confusing R.
url <- ftp://user:password@@url
You'll find the solution below.
Maybe someone has some improvements.
Maybe it's useful for someone,
Florian
Here is my solution:
library(RCurl)
url<- "ftp://adress/"
filenames <- getURL(url, userpwd = "USER:PASSWORD",
                    ftp.use.epsv = FALSE, dirlistonly = TRUE)  # read the file names from the FTP server
destnames <- filenames <- strsplit(filenames, "\r*\n")[[1]]    # destination names = original file names
con <- getCurlHandle(ftp.use.epsv = FALSE, userpwd = "USER:PASSWORD")
mapply(function(x, y) writeBin(getBinaryURL(x, curl = con, dirlistonly = FALSE), y),
       x = paste(url, filenames, sep = ""),                    # full URL of each file
       y = paste("C:\\temp\\", destnames, sep = ""))           # write all zipped files into one directory
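If the downloaded archives then need to be unpacked as well, a small follow-up step could look like this (not part of the original post; it assumes the same C:\temp\ directory used above):
# unzip every downloaded archive into the same directory
lapply(paste("C:\\temp\\", destnames, sep = ""), unzip, exdir = "C:\\temp")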
Hopefully it's useful for somebody!
Regards,
Florian
If you have no particular reason to stay with RCurl, you can use this wget-based method, run through the shell:
URL <- "ftp.server.ca"
USR <- "aUserName"
MDP <- "myPassword"
OUT <- "output.file"
cmd <- paste("wget -m --ftp-user=",USR," --ftp-password=",MDP, " ftp://", URL," -O ", OUT, sep="")
system(cmd)
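One caveat worth adding (not from the original answer): if the password contains characters the shell treats specially, it is safer to wrap each piece in shQuote() before building the command. A minimal sketch reusing the URL, USR, MDP and OUT variables from above:
# quote each value so characters like "@", "#" or "$" in the password reach wget literally
cmd <- paste0("wget -m --ftp-user=", shQuote(USR),
              " --ftp-password=", shQuote(MDP),
              " ", shQuote(paste0("ftp://", URL)),
              " -O ", shQuote(OUT))
system(cmd)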
How to download single file from specific branch of GitHub private repo using R?
It can easily be done for the default branch, e.g.:
require(httr)
github_path = "https://api.github.com/repos/{user}/{repo}/contents/{path_to}/{file}"
github_pat = Sys.getenv("GITHUB_PAT")
req <- content(GET(github_path,
                   add_headers(Authorization = paste("token", github_pat))),
               as = "parsed")
tmp <- tempfile()
r1 <- GET(req$download_url, write_disk(tmp))
...but I can't figure out how to do that for specific branch.
I tried to include the branch name in github_path, but it didn't work (Error in handle_url(handle, url, ...)).
Since it is easy with classic curl, e.g.:
curl -s -O https://{PAT}@raw.githubusercontent.com/{user}/{repo}/{branch}/{path_to}/{file}
...I tried to do it like:
tmp <- tempfile()
curl::curl_download("https://{PAT}#raw.githubusercontent.com/{user}/{repo}/{branch}/{path_to}/{file}", tmp)
But that didn't work either.
What am I missing?
Thanks!
You can use curl in R like this to include the auth header and the path to the desired file:
library(curl)
h <- new_handle(verbose = TRUE)
handle_setheaders(h,
                  "Authorization" = "token ghp_XXXXXXX")
con <- curl("https://raw.githubusercontent.com/username/repo/branch/path/file.R", handle = h)
readLines(con)
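A complementary route (not shown in the original answer): the GitHub contents API accepts a ref query parameter, so the httr approach from the question should also work for a non-default branch by appending ?ref={branch} to the API URL. A sketch, with {user}, {repo}, {branch} and {path_to}/{file} as placeholders:
library(httr)
# same idea as the question's code, but pinned to a branch via ?ref=
github_path <- "https://api.github.com/repos/{user}/{repo}/contents/{path_to}/{file}?ref={branch}"
github_pat  <- Sys.getenv("GITHUB_PAT")
req <- content(GET(github_path,
                   add_headers(Authorization = paste("token", github_pat))),
               as = "parsed")
tmp <- tempfile()
GET(req$download_url, write_disk(tmp))   # download_url already points at the requested branch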
I have made the following attempts.
This first one results in a downloaded file in an obscure location on my hard drive, but it is damaged and cannot be opened.
library(googledrive)
temp <- tempfile(fileext = ".zip")
dl <- drive_download(
  as_id("1yNq-CgafF-4gmi96jOII_DgNxxJkDYRB"), path = temp, overwrite = TRUE)
out <- unzip(temp, exdir = tempdir())
bank <- read.csv(out[14], sep = ",")
This next attempt has an issue with the file that gets downloaded into my R environment.
temp <- tempfile(fileext = ".zip")
download.file("https://drive.google.com/uc?authuser=0&id=1yNq-CgafF-4gmi96jOII_DgNxxJkDYRB&export=download",
temp)
out <- unzip(temp, exdir = tempdir())
camp_data <- read.csv(out[14], sep = ";")
str(camp_data)
I have also tried using the googledrive library, but have had no luck accessing the file, as there is little in-depth documentation on the matter. This attempt fails to access the shared drive.
camp = shared_drive_get("Campbell")
drive_get(c("Campbell Data.csv", "Campbell Data"), shared_drive = camp)
Any tips are appreciated, thank you in advance.
You could try this:
First, add your Google file id as a variable like this:
id <- "add_here_your_google_file_id"
After that, you can use the following code to read the CSV:
library(utils)
read.csv(sprintf("https://docs.google.com/uc?id=%s&export=download", id))
I have a number of files named
FileA2014-03-05-10-24-12
FileB2014-03-06-10-25-12
Where the part "2014-03-05-10-24-12" means "Year/Day/Month/Hours/Minutes/Seconds/". These files reside on a ftp-server. I would like to use R to connect to the ftp-server and download whatever file is newest based on date.
I have started trying to list the content, using RCurl and dirlistonly. Next step will be to try to parse and find the newest file. Not quite there yet...
library(RCurl)
getURL("ftpserver/",verbose=TRUE,dirlistonly = TRUE)
This should work:
library(RCurl)
url <- "ftp://yourServer"
userpwd <- "yourUser:yourPass"
filenames <- getURL(url, userpwd = userpwd,
                    ftp.use.epsv = FALSE, dirlistonly = TRUE)
# split the directory listing into individual file names
filenames <- strsplit(filenames, "\r*\n")[[1]]
# extract the timestamp from each file name and parse it
times <- lapply(strsplit(filenames, "[-.]"), function(x) {
    time <- paste(c(substr(x[1], nchar(x[1]) - 3, nchar(x[1])), x[2:6]),
                  collapse = "-")
    as.POSIXct(time, format = "%Y-%m-%d-%H-%M-%S", tz = "GMT")
})
ind <- which.max(unlist(times))   # index of the newest file
dat <- try(getURL(paste(url, filenames[ind], sep = ""), userpwd = userpwd))
So dat now contains the newest file.
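Since getURL returns the file contents as a single character string, a CSV like the ones in the example could then be parsed directly from memory or written back to disk (a small usage sketch, not part of the original answer):
newest <- read.csv(text = dat)      # parse the downloaded CSV without touching the disk
writeLines(dat, filenames[ind])     # or save it locally under its original name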
To make it reproducible, anyone else can use this vector in place of the listing and splitting steps above:
filenames<-c("FileA2014-03-05-10-24-12.csv","FileB2014-03-06-10-25-12.csv")
I would like to download and install pandoc on a windows 7 machine, by running a command in R. Is that possible?
(I know I can do this manually, but when I show this to students, the more steps I can organize within an R code chunk, the better.)
What about simply downloading the most recent version of the installer and starting that from R:
a) Identify the most recent version of Pandoc and grab the URL with the help of the XML package:
library(XML)
page <- readLines('http://code.google.com/p/pandoc/downloads/list', warn = FALSE)
pagetree <- htmlTreeParse(page, error=function(...){}, useInternalNodes = TRUE, encoding='UTF-8')
url <- xpathSApply(pagetree, '//tr[2]//td[1]//a ', xmlAttrs)[1]
url <- paste('http', url, sep = ':')
b) Or apply some regexp magic thanks to @G.Grothendieck instead (no need for the XML package this way):
page <- readLines('http://code.google.com/p/pandoc/downloads/list', warn = FALSE)
pat <- "//pandoc.googlecode.com/files/pandoc-[0-9.]+-setup.exe"
line <- grep(pat, page, value = TRUE); m <- regexpr(pat, line)
url <- paste('http', regmatches(line, m), sep = ':')
c) Or simply check the most recent version manually if you'd feel like that:
url <- 'http://pandoc.googlecode.com/files/pandoc-1.10.1-setup.exe'
Download the file as binary:
t <- tempfile(fileext = '.exe')
download.file(url, t, mode = 'wb')
And simply run it from R:
system(t)
Remove the needless file after installation:
unlink(t)
PS: sorry, only tested on Windows XP
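A present-day footnote, not part of the original answer: Pandoc's downloads have long since moved from Google Code to GitHub releases, so step (a) can be redone against the GitHub API. A sketch, assuming the jsonlite package is installed and that the Windows installer is the release asset whose name ends in ".msi":
library(jsonlite)
# ask the GitHub API for the latest pandoc release and pick a Windows .msi asset
rel <- fromJSON("https://api.github.com/repos/jgm/pandoc/releases/latest")
url <- rel$assets$browser_download_url[grepl("\\.msi$", rel$assets$name)][1]
t <- tempfile(fileext = ".msi")
download.file(url, t, mode = 'wb')                     # binary mode, as above
system2("msiexec", c("/i", shQuote(t, type = "cmd")))  # run the installer and wait for it
unlink(t)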
I have a multiple-step file download process I would like to do within R. I have got the middle step, but not the first and third...
# STEP 1 Recursively find all the files at an ftp site
# ftp://prism.oregonstate.edu//pub/prism/pacisl/grids
all_paths <- #### a recursive listing of the ftp path contents??? ####
# STEP 2 Choose all the ones whose filename starts with "hi"
all_files <- sapply(sapply(strsplit(all_paths, "/"), rev), "[", 1)
hawaii_log <- substr(all_files, 1, 2) == "hi"
hi_paths <- all_paths[hawaii_log]
hi_files <- all_files[hawaii_log]
# STEP 3 Download & extract from gz format into a single directory
mapply(download.file, url = hi_paths, destfile = hi_files)
## and now how to extract from gz format?
For part 1, RCurl might be helpful. The getURL function retrieves one or more URLs; dirlistonly lists the contents of the directory without retrieving the files themselves. The rest of the function below builds the next level of URLs:
library(RCurl)
getContent <- function(dirs) {
    urls <- paste(dirs, "/", sep = "")
    fls <- strsplit(getURL(urls, dirlistonly = TRUE), "\r?\n")
    ok <- sapply(fls, length) > 0
    unlist(mapply(paste, urls[ok], fls[ok], sep = "", SIMPLIFY = FALSE),
           use.names = FALSE)
}
So starting with
dirs <- "ftp://prism.oregonstate.edu//pub/prism/pacisl/grids"
we can invoke this function and look for things that look like directories, continuing until done
fls <- character()
while (length(dirs)) {
    message(length(dirs))
    urls <- getContent(dirs)
    isgz <- grepl("gz$", urls)
    fls <- append(fls, urls[isgz])
    dirs <- urls[!isgz]
}
We could then use getURL again, this time on fls (or on elements of fls, in a loop) to retrieve the actual files. Or, maybe better, open a URL connection and use gzcon to decompress and process the file on the fly, along the lines of:
con <- gzcon(url(fls[1], "r"))
meta <- readLines(con, 7)
data <- scan(con, integer())
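To complete the asker's Step 3 by actually saving and extracting the files, one possibility (not from the original answer) is to download each .gz in binary mode and decompress it with gunzip() from the R.utils package, which is assumed to be installed:
library(R.utils)
hi_paths <- fls[grepl("^hi", basename(fls))]   # keep only the files whose name starts with "hi"
for (f in hi_paths) {
  dest <- basename(f)
  download.file(f, dest, mode = "wb")          # binary download of the .gz file
  gunzip(dest, overwrite = TRUE)               # writes the decompressed file and removes the .gz
}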
I can read the contents of the ftp page if I start R with the internet2 option. I.e.
C:\Program Files\R\R-2.12\bin\x64\Rgui.exe --internet2
(The shortcut used to start R on Windows can be modified to add the internet2 argument via right-click / Properties / Target, or you can just run that line at the command line, which is the obvious route on GNU/Linux.)
The text on that page can be read like this:
download.file("ftp://prism.oregonstate.edu//pub/prism/pacisl/grids", "f.txt")
txt <- readLines("f.txt")
It's a little more work to parse out the Directory listings, then read them recursively for the underlying files.
## (something like)
dirlines <- txt[grep("Directory <A HREF=", txt)]
## split and extract text after "grids/"
split1 <- sapply(strsplit(dirlines, "grids/"), function(x) rev(x)[1])
## split and extract remaining text after "/"
sapply(strsplit(split1, "/"), function(x) x[1])
[1] "dem" "ppt" "tdmean" "tmax" "tmin"
It's about here that this stops seeming very attractive and gets a bit laborious, so I would actually recommend a different option. There is no doubt a better solution, perhaps with RCurl, but I would recommend learning to use an FTP client for you and your users. Command-line ftp, anonymous logins, and mget all work pretty easily.
The internet2 option was explained for a similar ftp site here:
https://stat.ethz.ch/pipermail/r-help/2009-January/184647.html
library(RCurl)
ftp.root     <- "ftp://where/the/files/are/"    # where the files are (placeholder)
dropbox.root <- "C:/where/to/put/the/files/"    # where to put the files (placeholder)
#=====================================================================
# Function that downloads files from URL
#=====================================================================
fdownload <- function(sourcelink) {

  targetlink <- paste(dropbox.root, substr(sourcelink, nchar(ftp.root) + 1,
                      nchar(sourcelink)), sep = '')

  # list of contents
  filenames <- getURL(sourcelink, ftp.use.epsv = FALSE, dirlistonly = TRUE)
  filenames <- strsplit(filenames, "\n")
  filenames <- unlist(filenames)

  files <- filenames[grep('\\.', filenames)]
  dirs <- setdiff(filenames, files)
  if (length(dirs) != 0) {
    dirs <- paste(sourcelink, dirs, '/', sep = '')
  }

  # files
  for (filename in files) {
    sourcefile <- paste(sourcelink, filename, sep = '')
    targetfile <- paste(targetlink, filename, sep = '')
    download.file(sourcefile, targetfile)
  }

  # subfolders
  for (dirname in dirs) {
    fdownload(dirname)
  }
}
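A short usage sketch with the placeholder paths defined above: point the function at the FTP root and let it recurse. Note that download.file() will not create directories, so the target sub-folders under dropbox.root have to exist already.
# mirror everything under ftp.root into dropbox.root
fdownload(ftp.root)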