Import File from FTP to R

I want to download a CSV file from an FTP server into R (ideally as a data frame in R).
I get an error message when trying to download the CSV file from the FTP server to R (which runs locally on my Mac).
URL <- "ftp://ftppath/www_logs/testfolder/"
download.file(URL, "test.csv", credentials = "xxx:yyyy")
The last call leads to:
Error in download.file(URL, "test.csv", credentials = "xxxyyy", :
unused arguments (credentials = "xxx:yyyy")

I think you get this error message because the function download.file() has no argument named credentials.
I would try to pass the credentials inside the URL, as discussed here:
url = "ftp://username:password@ftppath/www_logs/testfolder/test.csv"
download.file(url, destfile = "test.csv")
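If the username or password contains reserved characters such as `@` or `:`, a URL with embedded credentials can be misparsed. A minimal sketch, assuming the server accepts percent-encoded credentials (libcurl decodes them before logging in); the helper name `ftp_url_with_credentials` and the sample password are hypothetical:

```r
# Hypothetical helper: build "ftp://user:password@host/path", percent-encoding
# reserved characters (such as "@" and ":") in the credentials.
ftp_url_with_credentials <- function(user, password, host_and_path) {
  user_enc <- utils::URLencode(user, reserved = TRUE)
  pass_enc <- utils::URLencode(password, reserved = TRUE)
  paste0("ftp://", user_enc, ":", pass_enc, "@", host_and_path)
}

url <- ftp_url_with_credentials("username", "p@ss:word",
                                "ftppath/www_logs/testfolder/test.csv")
print(url)
# download.file(url, destfile = "test.csv", mode = "wb")
```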
If you want to load the file into an R data.frame, you could try something like this:
library(RCurl)
url <- "ftp://ftppath/www_logs/testfolder/test.csv"
text_data <- getURL(url, userpwd = "username:password", connecttimeout = 60)
df <- read.csv(text = text_data)
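The same idea without RCurl: a sketch that fetches the file into memory with the curl package and parses it with read.csv(text = ...), so nothing is written to disk. The function names are hypothetical, the FTP URL and credentials are placeholders, and only the parsing step runs offline here.

```r
# Parse CSV content held in a single character string into a data frame,
# the same way read.csv(text = text_data) is used above.
csv_text_to_df <- function(text) {
  read.csv(text = text, stringsAsFactors = FALSE)
}

# Hypothetical helper: fetch an FTP file into memory with the curl package
# (needs network access and valid credentials), then parse it.
fetch_ftp_csv <- function(url, userpwd) {
  h <- curl::new_handle(userpwd = userpwd, connecttimeout = 60)
  res <- curl::curl_fetch_memory(url, handle = h)
  csv_text_to_df(rawToChar(res$content))
}

# Offline check of the parsing step:
df <- csv_text_to_df("id,value\n1,a\n2,b\n")
str(df)

# Usage against a real server (placeholders):
# df <- fetch_ftp_csv("ftp://ftppath/www_logs/testfolder/test.csv", "username:password")
```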

Related

Downloading files in R

I'm trying to download a spreadsheet from the Australian Bureau of Statistics using download.file, but the file I get back is corrupted, and when I try to open it with readxl my R session crashes.
target = "http://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206001_key_aggregates.xls&5206.0&Time%20Series%20Spreadsheet&24FF946FB10A10CDCA258192001DAC4B&0&Jun%202017&06.09.2017&Latest"
dest = 'downloaded_file.xlsx'
download.file(url = target, destfile = dest)
Any pointers would be great.
It looks like that file is an xls file, not the newer xlsx format. Remove the 'x' at the end of the file name so readxl knows to use the right format. Note also that xls is a binary format, so you should write the file in binary mode (mode = 'wb'); on Windows, the default text mode converts line endings and corrupts binary files.
target = "http://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206001_key_aggregates.xls&5206.0&Time%20Series%20Spreadsheet&24FF946FB10A10CDCA258192001DAC4B&0&Jun%202017&06.09.2017&Latest"
dest = 'downloaded_file.xls'
download.file(url = target, destfile = dest, mode='wb')
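To confirm which format the server actually returned before calling readxl, you can look at the file's first bytes: legacy .xls workbooks are OLE2 compound documents starting with `D0 CF 11 E0 A1 B1 1A E1`, while .xlsx files are ZIP archives starting with `PK\003\004`. A minimal sketch; the helper name `excel_format` is hypothetical:

```r
# Hypothetical helper: guess the Excel format of a file from its magic bytes.
excel_format <- function(path) {
  magic <- readBin(path, what = "raw", n = 8L)
  ole2 <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))  # legacy .xls
  zip  <- as.raw(c(0x50, 0x4B, 0x03, 0x04))                          # .xlsx (ZIP)
  if (length(magic) >= 8L && identical(magic[1:8], ole2)) return("xls")
  if (length(magic) >= 4L && identical(magic[1:4], zip)) return("xlsx")
  "unknown"  # e.g. an HTML error page instead of a spreadsheet
}

# After the download above:
# excel_format(dest)
# readxl::read_xls(dest)
```

An "unknown" result usually means the server sent something other than a spreadsheet, which is worth checking before handing the file to readxl.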

Can't read csv into Spark using spark_read_csv()

I'm trying to use sparklyr to read a csv file into R. I can read the .csv into R just fine using read.csv(), but when I try to use spark_read_csv() it breaks down.
accidents <- spark_read_csv(sc, name = 'accidents', path = '/home/rstudio/R/Shiny/accident_all.csv')
However, when I attempt to execute this code I receive the following error:
Error in as.hexmode(xx) : 'x' cannot be coerced to class "hexmode"
I haven't found much by Googling that error. Can anyone shed some light onto what is going on here?
Yes, local .csv files can easily be read into a Spark DataFrame using spark_read_csv(). I have a .csv file in my Documents directory and read it with the following snippet (I think there is no need to use the file:// prefix):
Sys.setenv(SPARK_HOME = "C:/Spark/spark-2.0.1-bin-hadoop2.7/")
library(sparklyr)
library(dplyr)
# SparkR, data.table and dtplyr are not needed here; loading SparkR alongside sparklyr can mask dplyr verbs
sc <- spark_connect(master = "local", spark_home = "C:/Spark/spark-2.0.1-bin-hadoop2.7/", version = "2.0.1")
# Read a local CSV (plain path, no file:// prefix) into a Spark DataFrame
Credit_tbl <- spark_read_csv(sc, name = "credit_data", path = "C:/Users/USER_NAME/Documents/Credit.csv", header = TRUE, delimiter = ",")
You can inspect the resulting DataFrame simply by printing the object Credit_tbl.

R: Downloading multiple files from FTP using RCurl

I'm a new R user.
I am trying to download 7,000 files (.nc format) from an FTP server (for which I have a username and password). On the website, each file is a link to download. I would like to download all the .nc files.
I'd be grateful to anyone who can help me run this job in R. Below is an example of what I have tried, using RCurl and a loop; it does not download the files:
library(RCurl)
url <- "ftp://ftp.my.link.fr/1234/"
userpwd <- "user:password"
destination <- "/Users/ME/Documents"
filenames <- getURL(url, userpwd="user:password",
ftp.use.epsv = FALSE, dirlistonly = TRUE)
for(i in seq_along(url)){
download.file(url[i], destination[i], mode="wb")
}
how can I do that?
The first thing you'll notice is that the files in your directory, i.e. the object filenames, are listed as one long string. To obtain a character vector of all file names, you may try:
files <- unlist(strsplit(filenames, "\r?\n"))  # handles both \n and \r\n line endings
From here on, it's simply a matter of looping through all the files in the directory. I recommend using the curl package rather than RCurl to download the files, as it's easier to supply authentication details for every download request.
library(curl)
h <- new_handle()
handle_setopt(h, userpwd = "user:pwd")
and then
lapply(files, function(filename){
curl_download(paste0(url, filename), destfile = file.path(destination, filename), handle = h)
})
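For about 7,000 files it can help to keep only the .nc entries, write them into the destination folder, and carry on past individual failures so they can be retried. A sketch under those assumptions; `plan_nc_downloads` and `download_all` are hypothetical helper names, and the download step needs the curl package and network access:

```r
# Hypothetical helper: turn a raw FTP directory listing into source URLs and
# local destination paths for the .nc files only.
plan_nc_downloads <- function(listing, base_url, dest_dir) {
  files <- unlist(strsplit(listing, "\r?\n"))
  files <- files[grepl("\\.nc$", files)]
  data.frame(
    url = paste0(base_url, files),
    destfile = file.path(dest_dir, files),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: download every planned file with one reusable handle,
# continue past errors, and return the URLs that failed so they can be retried.
download_all <- function(plan, userpwd) {
  h <- curl::new_handle(userpwd = userpwd)
  ok <- vapply(seq_len(nrow(plan)), function(i) {
    tryCatch({
      curl::curl_download(plan$url[i], destfile = plan$destfile[i], handle = h)
      TRUE
    }, error = function(e) FALSE)
  }, logical(1))
  plan$url[!ok]
}

# plan <- plan_nc_downloads(filenames, url, destination)
# failed <- download_all(plan, "user:password")
```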

Minimize code for base64 decoding of an RDS object

I have an R object (saved as an .rds file) as a base64-encoded string:
encoded <- "H4sIAAAAAAAABoVQywrCMBBc8zi0ICiCPyHme7yVrU0hkEZIA/XofwvWTUwQc/Ewy2SyOzvJpQUABpwxYJwoP1DZEHYECQKaeD5nwnHqa+1LTgCCpfGPIB1Oes5eReSD8VVf42+LKr3bmOdBZV3XZ214tTgXQ5bFdsCAKmBv9Y8yenKsDPbKuKC9Q6tmbUevRxKPGa+MP2mlcYO+0/6YFOrLrksT6RkyI36nSInJsCx6A7sXoh15AQAA"
I need to load this object in R. Following this SO question ("Base64 encoding a .Rda file"), I came to the following code:
library("base64enc")
conb64 <- file('obj.b64', 'w+b')
write(encoded, conb64)
close(conb64)
base64decode(file='obj.b64', output = 'obj.rds')
myobj <- readRDS('obj.rds')
This works fine but I would like to minimize the code and ideally manage without creating disk files, something like myobj <- readRDS(base64decode(encoded)). Is there any way to remove at least some operations?
It seems to me that there is a bug in the base64enc package. It can be reproduced by simply executing base64decode(what = 'anything', output = 'any.name'), which gives an error:
Error in base64decode(what="anything", output = "any.name") :
argument "file" is missing, with no default
Apparently this happens because base64decode() uses file as an argument name but also calls the file() function. When I changed the function's source (renaming the argument file to filename), everything worked correctly, and decoded <- base64decode(encoded, what = 'raw') gave the correct binary RDS data. Until the bug is fixed, one can use the function of the same name from the caTools package: decoded <- base64decode(z = encoded, what = 'raw'). However, I could not get the readRDS() function to read the result:
library('caTools')
decoded <- base64decode(encoded, what = 'raw')
con1 <- rawConnection(object = decoded, open = 'rb')
myobj <- readRDS(con1)
# Error in readRDS(con1) : unknown input format
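The "unknown input format" error is consistent with the data being gzip-compressed: the string starts with `H4sI`, which is the base64 encoding of the gzip header bytes `1f 8b 08`, and readRDS() only decompresses automatically when given a file name, not a connection. A sketch that wraps the raw connection in gzcon() so the object is read entirely in memory; the helper names are hypothetical, and `decode_rds_base64()` assumes the base64enc package, whose base64decode() returns a raw vector when no output is given:

```r
# Read an RDS object from a raw vector without touching the disk. saveRDS()
# gzip-compresses by default, and readRDS() on a connection does not
# decompress, so the raw connection is wrapped in gzcon().
read_rds_raw <- function(bytes) {
  con <- gzcon(rawConnection(bytes, open = "rb"))
  on.exit(close(con))
  readRDS(con)
}

# Hypothetical one-liner for the question (requires the base64enc package):
decode_rds_base64 <- function(encoded) {
  read_rds_raw(base64enc::base64decode(encoded))
}

# Offline round trip using only base R:
obj <- list(a = 1:3, b = "x")
tmp <- tempfile(fileext = ".rds")
saveRDS(obj, tmp)
bytes <- readBin(tmp, what = "raw", n = file.info(tmp)$size)
unlink(tmp)
restored <- read_rds_raw(bytes)

# myobj <- decode_rds_base64(encoded)
```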

Trying to use fread() on a .csv file but getting internal error "ch>eof"

I am getting an error from fread:
Internal error: ch>eof when detecting eol
when trying to read a CSV file downloaded from an https server, using R 3.2.0. I found something related on GitHub, https://github.com/Rdatatable/data.table/blob/master/src/fread.c, but I don't know how I could use it, if at all. Thanks for any help.
Added info: the data was downloaded from here:
fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
then I used
download.file(fileURL, "Idaho2006.csv", method = "internal")
The problem is that download.file doesn't work with https under method = "internal" unless you're on Windows and set an option. Since fread uses download.file when you pass it a URL rather than a local file, it fails too. You have to download the file yourself and then read it from the local copy.
If you're on Linux, or already have wget or curl installed, use method = "wget" or method = "curl" instead.
If you're on Windows, have neither, and don't want to install them, call setInternet2(use = TRUE) before download.file; see:
http://www.inside-r.org/r-doc/utils/setInternet2
For example:
fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
tempf <- tempfile()
download.file(fileURL, tempf, method = "curl")
DT <- fread(tempf)
unlink(tempf)
Or
fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
tempf <- tempfile()
setInternet2(use = TRUE)  # Windows only, older R versions
download.file(fileURL, tempf)
DT <- fread(tempf)
unlink(tempf)
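Rather than hard-coding a method, a script can pick whichever https-capable method is available. A sketch, assuming R >= 3.2.0 (where capabilities("libcurl") exists) and the data.table package for the final read; the helper names are hypothetical:

```r
# Hypothetical helper: choose a download.file() method that can handle https.
choose_https_method <- function() {
  if (isTRUE(unname(capabilities("libcurl")))) return("libcurl")  # built into R
  if (nzchar(Sys.which("curl"))) return("curl")                   # external curl binary
  if (nzchar(Sys.which("wget"))) return("wget")                   # external wget binary
  NA_character_
}

# Download to a temporary file, read it with fread(), and clean up afterwards.
fread_via_tempfile <- function(url, ...) {
  method <- choose_https_method()
  if (is.na(method)) stop("no https-capable download method found")
  tempf <- tempfile(fileext = ".csv")
  on.exit(unlink(tempf))
  utils::download.file(url, tempf, method = method, mode = "wb", quiet = TRUE)
  data.table::fread(tempf, ...)
}

print(choose_https_method())
# DT <- fread_via_tempfile(fileURL)
```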
fread() now uses the curl package to download files, and this seems to work just fine at the moment:
library(data.table) # v1.9.6+
fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
fread(fileURL, showProgress = FALSE)
In my experience, the easiest way to fix this problem is to remove the s from https and to drop the method argument, which you don't need. I'm on Windows, and the following code works for me (note that it downloads over unencrypted HTTP):
fileURL <- "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
download.file(fileURL, "Idaho2006.csv")
