Portable method to download files within R script? - r

I currently use download.file function to download file, I run my scripts in Linux and so I use this command:
download.file(url="https://example.com/zipfile.zip", destfile= "zipfile.zip", method="wget")
This comand does not work in Windows because wget is not there by default. If I use method=auto I get this error, which means download.file does not support https protocol.
Error in download.file(url="https://example.com/zipfile.zip", destfile= "zipfile.zip", method="auto") : unsupported URL scheme
How do I make this file download part works on all major OSes: Windows, Linux, Mac?
From the accepted answer, I looked at the documentation of RCurl and followed this example:
u = "http://www.omegahat.org/RCurl/data.gz"
if(url.exists(u)) {
content = getBinaryURL(u)
if(require(Rcompression)) {
x = gunzip(content)
read.csv(textConnection(x))
} else {
tmp = tempfile()
write(content, file = tmp)
read.csv(gzfile(tmp))
}

1 Method curl or libcurl:
# try capabilities see if it is supported on your build.
capabilities("libcurl")
download.file(url="https://example.com/zipfile.zip", destfile= "zipfile.zip", method="libcurl")
2 Maybe Try RCurl package
library(RCurl)

Related

downloading csv file works via libcurl yet does not via curl method

OS: Win 7 64 bit
RStudio Version 1.1.463
As per Getting and Cleaning Data course, I attempted to download a csv file with method = curl:
fileUrl <- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"
download.file(fileUrl, destfile = "./cameras.csv", method = "curl")
Error in download.file(fileUrl, destfile = "./cameras.csv", method =
"curl") : 'curl' call had nonzero exit status
However, method = libcurl resulted a successful download:
download.file(fileUrl, destfile = "./cameras.csv", method = "libcurl")
trying URL
'https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD'
downloaded 9443 bytes
changing from *http***s** to http produced exactly the same results for curl and libcurl, respectively.
Is there anyway to make this download work via method = curl as per the course?
Thanks
As you can see from ?download.file:
For methods "wget" and "curl" a system call is made to the tool given
by method, and the respective program must be installed on your system
and be in the search path for executables. They will block all other
activity on the R process until they complete: this may make a GUI
unresponsive.
Therefore, you should install curlfirst.
See this How do I install and use curl on Windows? to learn how.
Best!
I believe there were a few issues here:
Followed the steps in the link quoted by #JonnyCrunch
a) Reinstalled Git for windows;
b) added C:\Program Files\Git\mingw64\bin\ to the 'PATH' variable;
c) Disabled Use Internet Explorer library/proxy for HTTP in RStudio in: Tools > Options > Packages
d) Attempted steps in 'e)' below and added data.baltimorecity.gov
website to exclusions as per Kaspersky anti-virus' prompt;
e) Then in RStudio:
options(download.file.method = "curl")
download.file(fileUrl, destfile="./data/cameras.csv")
Success!
Thank you

Reading an online xlsx file into R

I am trying to download spreadsheets from AQR data library into R directly.
I have this link: http://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx which prompts a download. However, when trying the following code:
> url1<-"http://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx"
> download.file(url1,destfile="example.xlsx")
I get this error
trying URL 'http://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx'
Error in download.file(url1, destfile = "example.xlsx") : cannot open URL 'http://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx'
https://www.aqr.com/library/data-sets/value-and-momentum-everywhere-portfolios-monthly is the page from which I am trying to download data(under full set data link).
Could you provide some guidance?
It looks like that link redirects to https, which download.file does not support by default. If you have wget or curl installed you can use
download.file("https://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx",
"example.xlsx",
method = "wget")
or
download.file("https://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx",
"example.xlsx",
method = "curl")
These and other options are discussed at Download a file from HTTPS using download.file()
I'm not quite sure what is causing the problem for you, but the following worked for me:
library(XLConnect)
##
con <- "http://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx"
download.file(con,"xlsxFile.xlsx",mode="wb")
##
newWB <- loadWorkbook(
file="xlsxFile.xlsx",
create=F)
##
R> getSheets(newWB)
[1] "VME Portfolios" "Definitions" "Data Sources" "Disclosures"
and here's a screenshot of the downloaded file:

R Gist script gives error in RGui console but works fine in RStudio console - Windows 8 R3.1.2(64 bit) [duplicate]

Is there some way to source an R script from the web?
e.g. source('http://github.com/project/R/file.r')
Reason: I currently have a project that I'd like to make available for use but isn't ready to be packaged yet. So it would be great to give people a single file to source from the web (that will then source all the individual function files).
On closer inspection, the problem appears to be https. How would I source this file?
https://raw.github.com/hadley/stringr/master/R/c.r
You can use the source_url in the devtools library
library(devtools)
source_url('https://raw.github.com/hadley/stringr/master/R/c.r')
This is a wrapper for the RCurl method by #ROLO
Yes you can, try running this R tutorial:
source("http://www.mayin.org/ajayshah/KB/R/tutorial.R")
(Source)
Https is only supported on Windows, when R is started with the --internet2 command line option (see FAQ):
> source("https://pastebin.com/raw.php?i=zdBYP5Ft")
> test()
[1] "passed"
Without this option, or on linux, you will get the error "unsupported URL scheme". In that case resort to the solution suggested by #ulidtko, or:
Here is a way to do it using RCurl, which also supports https:
library(RCurl)
eval( expr =
parse( text = getURL("http://www.mayin.org/ajayshah/KB/R/tutorial.R",
ssl.verifypeer=FALSE) ))
(You can remove the ssl.verifypeer if the ssl certificate is valid)
Yes, it is possible and worked for me right away.
R> source("http://pastebin.com/raw.php?i=zdBYP5Ft")
R> test()
[1] "passed"
Regarding the HTTPS part, it isn't supported by internal R code. However, R can use external utilities like wget or curl to fetch https:// URLs. One will need to write additional code to be able to source the files.
Sample code might be like this:
wget.and.source <- function(url) {
fname <- tempfile()
download.file(url, fname, method="wget")
source(fname)
unlink(fname)
}
There is a Windows-only solution too: start R with --internet2 commandline option. This will switch all the internet code in R to using IE, and consequently HTTPS will work.
Windows:
If Internet Explorer is configured to access the web using your organization's proxy, you can direct R to use these IE settings instead of the default R settings. This change can be made once by the following steps:
Save your work and close all R sessions you may have open.
Edit the following file. (Note: Your exact path will differ based on your R installation)
C:\Program Files\R\R-2.15.2\etc\Rprofile.site
Open this "Rprofile.site" file in Notepad and add the following line on a new line at the end of the file:
utils::setInternet2(TRUE)
You may now open a new R session and retry your "source" command.
Linux alikes:
Use G. Grothendieck's suggestion. At the command prompt within R type:
source(pipe(paste("wget -O -", "https://github.com/enter/your/url/here.r")))
You may get an error saying:
cannot verify certificate - - - - Self-signed certificate encountered.
At this point it is up to you to decide whether you trust the person issuing the self-signed certificate and proceed or to stop.
If you decide to proceed, you can connect insecurely as follows:
source(pipe(paste("wget -O -", "https://github.com/enter/your/url.r", "--no-check-certificate")))
For more details, see the following:
See section 2.19
CRAN R Documentation 2.19
wget documentation section 2.8 for "no-check-certificate"
Similar questions here:
Stackoverflow setInternet2 discussion
Stackoverflow Proxy configuration discussion
The methods here were giving me the following error from github:
OpenSSL: error:14077458:SSL routines:SSL23_GET_SERVER_HELLO:reason(1112)
I used the following function to resolve it:
github.download = function(url) {
fname <- tempfile()
system(sprintf("curl -3 %s > %s", url, fname))
return(fname)
}
source(github.download('http://github.com/project/R/file.r'))
Hope that helps!
This is working for me on windows:
library(RCurl)
# load functions and scripts from github ----------------------------
fn1 <- getURL("https://raw.githubusercontent.com/SanjitNarwekar/Advanced-R-Programming/master/fn_factorial_loop.R", ssl.verifypeer = FALSE)
eval(parse(text = fn1))

Create an remote directory using SFTP / RCurl

Is it possible to create a directory on an SFTP site using the RCurl package? I found the sftp_create_dirs function, but I could not find an example how to use it.
I tried to set the ftp.create.missing.dirs option to TRUE, as in
library(RCurl)
opts <- list(ftp.create.missing.dirs=TRUE,
ssh.public.keyfile = mypubkey, ssh.private.keyfile = myprivatekey)
ftpUpload("myfile.txt", "sftp://me#www.host.org/newdir/myfile.txt", .opts=opts)
This works if newdir exists, but fails if it does not exists.
Any hint appreciated!

Sourcing R script over HTTPS

Is there some way to source an R script from the web?
e.g. source('http://github.com/project/R/file.r')
Reason: I currently have a project that I'd like to make available for use but isn't ready to be packaged yet. So it would be great to give people a single file to source from the web (that will then source all the individual function files).
On closer inspection, the problem appears to be https. How would I source this file?
https://raw.github.com/hadley/stringr/master/R/c.r
You can use the source_url in the devtools library
library(devtools)
source_url('https://raw.github.com/hadley/stringr/master/R/c.r')
This is a wrapper for the RCurl method by #ROLO
Yes you can, try running this R tutorial:
source("http://www.mayin.org/ajayshah/KB/R/tutorial.R")
(Source)
Https is only supported on Windows, when R is started with the --internet2 command line option (see FAQ):
> source("https://pastebin.com/raw.php?i=zdBYP5Ft")
> test()
[1] "passed"
Without this option, or on linux, you will get the error "unsupported URL scheme". In that case resort to the solution suggested by #ulidtko, or:
Here is a way to do it using RCurl, which also supports https:
library(RCurl)
eval( expr =
parse( text = getURL("http://www.mayin.org/ajayshah/KB/R/tutorial.R",
ssl.verifypeer=FALSE) ))
(You can remove the ssl.verifypeer if the ssl certificate is valid)
Yes, it is possible and worked for me right away.
R> source("http://pastebin.com/raw.php?i=zdBYP5Ft")
R> test()
[1] "passed"
Regarding the HTTPS part, it isn't supported by internal R code. However, R can use external utilities like wget or curl to fetch https:// URLs. One will need to write additional code to be able to source the files.
Sample code might be like this:
wget.and.source <- function(url) {
fname <- tempfile()
download.file(url, fname, method="wget")
source(fname)
unlink(fname)
}
There is a Windows-only solution too: start R with --internet2 commandline option. This will switch all the internet code in R to using IE, and consequently HTTPS will work.
Windows:
If Internet Explorer is configured to access the web using your organization's proxy, you can direct R to use these IE settings instead of the default R settings. This change can be made once by the following steps:
Save your work and close all R sessions you may have open.
Edit the following file. (Note: Your exact path will differ based on your R installation)
C:\Program Files\R\R-2.15.2\etc\Rprofile.site
Open this "Rprofile.site" file in Notepad and add the following line on a new line at the end of the file:
utils::setInternet2(TRUE)
You may now open a new R session and retry your "source" command.
Linux alikes:
Use G. Grothendieck's suggestion. At the command prompt within R type:
source(pipe(paste("wget -O -", "https://github.com/enter/your/url/here.r")))
You may get an error saying:
cannot verify certificate - - - - Self-signed certificate encountered.
At this point it is up to you to decide whether you trust the person issuing the self-signed certificate and proceed or to stop.
If you decide to proceed, you can connect insecurely as follows:
source(pipe(paste("wget -O -", "https://github.com/enter/your/url.r", "--no-check-certificate")))
For more details, see the following:
See section 2.19
CRAN R Documentation 2.19
wget documentation section 2.8 for "no-check-certificate"
Similar questions here:
Stackoverflow setInternet2 discussion
Stackoverflow Proxy configuration discussion
The methods here were giving me the following error from github:
OpenSSL: error:14077458:SSL routines:SSL23_GET_SERVER_HELLO:reason(1112)
I used the following function to resolve it:
github.download = function(url) {
fname <- tempfile()
system(sprintf("curl -3 %s > %s", url, fname))
return(fname)
}
source(github.download('http://github.com/project/R/file.r'))
Hope that helps!
This is working for me on windows:
library(RCurl)
# load functions and scripts from github ----------------------------
fn1 <- getURL("https://raw.githubusercontent.com/SanjitNarwekar/Advanced-R-Programming/master/fn_factorial_loop.R", ssl.verifypeer = FALSE)
eval(parse(text = fn1))

Resources