R: download data securely using TLS/SSL - r

Official Statements
In the past the base R download.file() was unable to work with HTTPS protocols and it was necessary to use RCurl. Since R 3.3.0:
All builds have support for https: URLs in the default methods for download.file(), url() and code making use of them. Unfortunately that cannot guarantee that any particular https: URL can be accessed. ... Different access methods may allow different protocols or use private certificate bundles ...
The download.file() help still says:
Contributed package 'RCurl' provides more comprehensive facilities to download from URLs.
which (by the way includes cookies and headers management).
Based on RCurl FAQ (look for "When I try to interact with a URL via https, I get an error"), HTTPS URLs can be managed with:
getURL(url, cainfo="CA bundle")
where CA bundle is the path to a certificate authority bundle file. One such a bundle is available from the curl site itself:
https://curl.haxx.se/ca/cacert.pem
Current status
Tests are based on Windows platforms
For many HTTPS websites download.file() works as stated:
download.file(url="https://www.google.com", destfile="google.html")
download.file(url="https://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")
As regards RCurl, using the cacert.pem bundle, downloaded above, one might get an error:
library(RCurl)
getURL("https://www.google.com", cainfo = "cacert.pem")
# Error in function (type, msg, asError = TRUE) :
# SSL certificate problem: unable to get local issuer certificate
In this instance, simply removing the reference to the certificate bundle solves the problem:
getURL("https://www.google.com") # works
getURL("https://www.google.com", ssl.verifypeer=TRUE) # works
ssl.verifypeer = TRUE is used to be sure that success is not due to getURL() suppressing security. The argument is documented in RCurl FAQ.
However, in other instances, the connection fails:
getURL("https://curl.haxx.se/ca/cacert.pem")
# Error in function (type, msg, asError = TRUE) :
# error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
And similarly, using the previously downloaded bundle:
getURL("https://curl.haxx.se/ca/cacert.pem", cainfo = "cacert.pem")
# Error in function (type, msg, asError = TRUE) :
# error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
The same error happens even when suppressing the security:
getURL("https://curl.haxx.se/ca/cacert.pem", ssl.verifypeer=FALSE)
# same error as above
Questions
How to use HTTPS properly in RCurl?
As regards mere file downloads (no headers, cookies, etc.), is there any benefit in using RCurl instead of download.file()?
Is RCurl become obsolete and should we opt for curl?
Update
The issue persists as of
R version 3.4.1 (2017-06-30) under Windows 10.

openssl bundled with RCurl is a bit old currently, which does not support the TLS v1.2
Yes, curl package is OK
Or you can use httr package which is a wrapper for the curl package
> library("httr")
> GET("https://curl.haxx.se/ca/cacert.pem",config(sslversion=6,ssl_verifypeer=1))
Response [https://curl.haxx.se/ca/cacert.pem]
Date: 2017-08-16 17:07
Status: 200
Content-Type: application/x-pem-file
Size: 256 kB
<BINARY BODY>

Related

Is getURL deprecated? [duplicate]

ETA: Per https://github.com/hiratake55/RForcecom/issues/42, it looks like the author of the rforcecom package has updated rforcecom to use httr instead of RCurl (as of today, to be uploaded to CRAN tomorrow, 7/1/16), so my particular issue will be solved at that point. However, the general case (implementing TLS 1.1 / 1.2 in RCurl) may still be worth pursuing for other packages. Or everyone may just switch to the more recent curl package instead of RCurl.
Background: I've been using the rforcecom package to communicate with Salesforce for several months. Salesforce recently disabled support for TLS v1.0 and is requiring TLS v1.1 or higher in their Sandboxes; this update will take place for Production environments in March 2017.
rforcecom uses RCurl to communicate with salesforce.com servers. Generally the curlPerform method is used, which is implemented something like this (this example is from rforcecom.login.R):
h <- basicHeaderGatherer()
t <- basicTextGatherer()
URL <- paste(loginURL, rforcecom.api.getSoapEndpoint(apiVersion), sep="")
httpHeader <- c("SOAPAction"="login","Content-Type"="text/xml")
curlPerform(url=URL, httpheader=httpHeader, postfields=soapBody, headerfunction = h$update, writefunction = t$update, ssl.verifypeer=F)
This has been working for me for a while, as I mentioned. Now that Salesforce has disabled TLS v1.0 on Sandboxes, though, it fails with the following error:
UNSUPPORTED_CLIENT: TLS 1.0 has been disabled in this organization. Please use TLS 1.1 or higher when connecting to Salesforce using https.
I've modified and sourced (without implementing it in the local copy of the package, as I'm not experienced enough to do this) a change to my local copy of the login module for RForcecom, and I've discovered through experimentation that I can specify any of the existing enumerated values for SSLVERSION successfully by adding sslversion=SSLVERSION_TLSv1, sslversion=SSLVERSION_SSLv3, etc. to the curlPerform options where it is called. However, all of these give me the same error as above. When I attempt to use one of the options that is implemented in libcurl but not in RCurl (SSLVERSION_TLSv1.1, SSLVERSION_TLSv1.2), I get the following error:
Error in merge(list(...), .opts) : object 'SSLVERSION_TLSv1.1' not found
or:
Error in merge(list(...), .opts) : object 'SSLVERSION_TLSv1.2' not found
I've verified with curlVersion() that my libcurl version is 7.40.0, which according to https://curl.haxx.se/libcurl/c/CURLOPT_SSLVERSION.html does support those options. However, I'm unable to get RCurl to recognize them.
At this point what I'm looking for is a way to get RCurl to use TLS v1.1 or TLS v1.2, and I would very much appreciate any assistance I can get with that. I apologize for any problems/issues with my question as this is my first time asking one myself, I've previously always been able to muddle through by reading other people's questions and answers.
RCurl is an interface to libcurl and what it supports depends on the latter. It is possible that your libcurl was built with an older version of OpenSSL which does not have support for TLS v1.1 or v.1.2. Your can determine your SSL version from R like this:
RCurl::curlVersion()$ssl_version
I think that by default (e.g. ssl option CURL_SSLVERSION_DEFAULT), during the SSL handshake, the server and client would agree on the newest version that they both support. To make it work you would have to update OpenSSL to a newer version, recompile libcurl with it and rebuild RCurl so that it registers the updates.
That said, you can enforce a particular ssl version not defined in RCurl by passing the integer value of the required option yourself. The numbers you are looking for can be deduced from the C enum defined in the curl header file on GitHub:
enum {
CURL_SSLVERSION_DEFAULT, // 0
CURL_SSLVERSION_TLSv1, /* TLS 1.x */ // 1
CURL_SSLVERSION_SSLv2, // 2
CURL_SSLVERSION_SSLv3, // 3
CURL_SSLVERSION_TLSv1_0, // 4
CURL_SSLVERSION_TLSv1_1, // 5
CURL_SSLVERSION_TLSv1_2, // 6
CURL_SSLVERSION_TLSv1_3, // 7
CURL_SSLVERSION_LAST /* never use, keep last */ // 8
};
For example:
# the data of you post request
nameValueList = list(data1 = "data1", data2 = "data2")
CURL_SSLVERSION_TLSv1_1 <- 5L
CURL_SSLVERSION_TLSv1_2 <- 6L
# TLS 1.1
opts <- RCurl::curlOptions(verbose = TRUE,
sslversion = CURL_SSLVERSION_TLSv1_1, ...)
# TLS 1.2
opts <- RCurl::curlOptions(verbose = TRUE,
sslversion = CURL_SSLVERSION_TLSv1_2, ...)
# finally, POST the data
RCurl::postForm(URL, .params = nameValueList, .opts = opts)
In general, this may not be a good practice because the authors of cURL could decide to change the values (although I believe chances are low) those options have.

TLS v1.1 / TLS v1.2 support in RCurl

ETA: Per https://github.com/hiratake55/RForcecom/issues/42, it looks like the author of the rforcecom package has updated rforcecom to use httr instead of RCurl (as of today, to be uploaded to CRAN tomorrow, 7/1/16), so my particular issue will be solved at that point. However, the general case (implementing TLS 1.1 / 1.2 in RCurl) may still be worth pursuing for other packages. Or everyone may just switch to the more recent curl package instead of RCurl.
Background: I've been using the rforcecom package to communicate with Salesforce for several months. Salesforce recently disabled support for TLS v1.0 and is requiring TLS v1.1 or higher in their Sandboxes; this update will take place for Production environments in March 2017.
rforcecom uses RCurl to communicate with salesforce.com servers. Generally the curlPerform method is used, which is implemented something like this (this example is from rforcecom.login.R):
h <- basicHeaderGatherer()
t <- basicTextGatherer()
URL <- paste(loginURL, rforcecom.api.getSoapEndpoint(apiVersion), sep="")
httpHeader <- c("SOAPAction"="login","Content-Type"="text/xml")
curlPerform(url=URL, httpheader=httpHeader, postfields=soapBody, headerfunction = h$update, writefunction = t$update, ssl.verifypeer=F)
This has been working for me for a while, as I mentioned. Now that Salesforce has disabled TLS v1.0 on Sandboxes, though, it fails with the following error:
UNSUPPORTED_CLIENT: TLS 1.0 has been disabled in this organization. Please use TLS 1.1 or higher when connecting to Salesforce using https.
I've modified and sourced (without implementing it in the local copy of the package, as I'm not experienced enough to do this) a change to my local copy of the login module for RForcecom, and I've discovered through experimentation that I can specify any of the existing enumerated values for SSLVERSION successfully by adding sslversion=SSLVERSION_TLSv1, sslversion=SSLVERSION_SSLv3, etc. to the curlPerform options where it is called. However, all of these give me the same error as above. When I attempt to use one of the options that is implemented in libcurl but not in RCurl (SSLVERSION_TLSv1.1, SSLVERSION_TLSv1.2), I get the following error:
Error in merge(list(...), .opts) : object 'SSLVERSION_TLSv1.1' not found
or:
Error in merge(list(...), .opts) : object 'SSLVERSION_TLSv1.2' not found
I've verified with curlVersion() that my libcurl version is 7.40.0, which according to https://curl.haxx.se/libcurl/c/CURLOPT_SSLVERSION.html does support those options. However, I'm unable to get RCurl to recognize them.
At this point what I'm looking for is a way to get RCurl to use TLS v1.1 or TLS v1.2, and I would very much appreciate any assistance I can get with that. I apologize for any problems/issues with my question as this is my first time asking one myself, I've previously always been able to muddle through by reading other people's questions and answers.
RCurl is an interface to libcurl and what it supports depends on the latter. It is possible that your libcurl was built with an older version of OpenSSL which does not have support for TLS v1.1 or v.1.2. Your can determine your SSL version from R like this:
RCurl::curlVersion()$ssl_version
I think that by default (e.g. ssl option CURL_SSLVERSION_DEFAULT), during the SSL handshake, the server and client would agree on the newest version that they both support. To make it work you would have to update OpenSSL to a newer version, recompile libcurl with it and rebuild RCurl so that it registers the updates.
That said, you can enforce a particular ssl version not defined in RCurl by passing the integer value of the required option yourself. The numbers you are looking for can be deduced from the C enum defined in the curl header file on GitHub:
enum {
CURL_SSLVERSION_DEFAULT, // 0
CURL_SSLVERSION_TLSv1, /* TLS 1.x */ // 1
CURL_SSLVERSION_SSLv2, // 2
CURL_SSLVERSION_SSLv3, // 3
CURL_SSLVERSION_TLSv1_0, // 4
CURL_SSLVERSION_TLSv1_1, // 5
CURL_SSLVERSION_TLSv1_2, // 6
CURL_SSLVERSION_TLSv1_3, // 7
CURL_SSLVERSION_LAST /* never use, keep last */ // 8
};
For example:
# the data of you post request
nameValueList = list(data1 = "data1", data2 = "data2")
CURL_SSLVERSION_TLSv1_1 <- 5L
CURL_SSLVERSION_TLSv1_2 <- 6L
# TLS 1.1
opts <- RCurl::curlOptions(verbose = TRUE,
sslversion = CURL_SSLVERSION_TLSv1_1, ...)
# TLS 1.2
opts <- RCurl::curlOptions(verbose = TRUE,
sslversion = CURL_SSLVERSION_TLSv1_2, ...)
# finally, POST the data
RCurl::postForm(URL, .params = nameValueList, .opts = opts)
In general, this may not be a good practice because the authors of cURL could decide to change the values (although I believe chances are low) those options have.

RCurl getURL SSL error

Related questions:
RCurl errors when fetching ssl endpoint
R: Specify SSL version in Rcurl getURL statement
I am looking at the following:
url = https://www.veilingbiljet.nl/resultaten-ajax.asp?order=datum&direction=D&page=1&field=0&regio=39
Then,
getURL(url)
gives the following error:
error:1411809D:SSL routines:SSL_CHECK_SERVERHELLO_TLSEXT:tls invalid ecpointformat list
Adding the followinf curl option, suggested in one of the related questions,
getURL(url, ssl.verifypeer = TRUE,sslversion=3L)
returns
Unknown SSL protocol error in connection to www.veilingbiljet.nl:443
Any help would be greatly appreciated.
The RCurl package is unmaintained and defunct. Use the curl package or httr instead:
library(curl)
con <- curl("https://www.veilingbiljet.nl/resultaten-ajax.asp?order=datum&direction=D&page=1&field=0&regio=39")
readLines(con)
Or:
library(httr)
req <- GET("https://www.veilingbiljet.nl/resultaten-ajax.asp?order=datum&direction=D&page=1&field=0&regio=39")
stop_for_status(req)
content(req)
error:1411809D:SSL routines:SSL_CHECK_SERVERHELLO_TLSEXT:tls invalid ecpointformat list
This is a known bug in old OpenSSL versions and was fixed about 4 years ago. Thus upgrading your local OpenSSL should help.
If this is not possible you might disable the use of the affected ciphers by setting the ssl.cipher.list option of RCurl to HIGH:!ECDH.
Unknown SSL protocol error in connection to www.veilingbiljet.nl:443
The server does not support SSL 3.0 so trying to enforce this version will fail.

oauth Handshake error using twitteR in R

I have been trying to download some tweets from using the tweeteR package in R
The code for my oauth credentials is
cred<-AuthFactory$new(consumerKey=consumerKey,
consumerSecret=consumerSecret,requestURL=reqURL,accessURL=accessURL,authURL=authURL)
When i try to run the following for handshake
cred$handshake(cainfo=system.file("CurlSSL","cacert.pem",package="RCurl"))
I am getting this error
Error in function (type, msg, asError = TRUE) :
Could not resolve host: api.twitter.com; No data record of requested type
I am running the code in a windows machine. (I have included the code for downloading cacert.pem)
I worked out the solution for this, for people who are behind a proxy they have to set the proxy options in RCurl too(setting the proxy for the R is not enough). This command works
options( RCurlOptions = list(verbose = TRUE,proxy = "host:port"))

R: Specify SSL version in Rcurl getURL statement

Aim
I am trying to write a statement to connect to a sharepoint online list to retrieve the data for use in R.
Background
This is my first time using RCurl/curl/libcurl and I've attempted to read the documentation but it's beyond me and doesn't have any relevant examples.
Using
a<-getURL("https://example.sharepoint.com/_vti_bin/listdata.svc/Glossary")
Resulted in an error
Error in function (type, msg, asError = TRUE) :
Unknown SSL protocol error in connection to example.sharepoint.com:443
A combination of google and deciphering of cryptic libcurl documentation identified that the issue was due to the SSL type. I used curl in shell to verify and I got a positive response from the sharepoint server as a result (well, except the need to add the user name and password)
curl https://example.sharepoint.com/_vti_bin/listdata.svc/Glossary -v -SSLv3
Problem
I cannot work out how to include the -SSLv3 argument in my getURL() function after RTFMs
Requested solution
Amend this statement to utilise SSLv3
a<-getURL("https://example.sharepoint.com/_vti_bin/listdata.svc/Glossary")
For me this works:
a<-getURL("https://example.sharepoint.com/_vti_bin/listdata.svc/Glossary", ssl.verifypeer=TRUE, sslversion=4L)

Resources