R - Error when using getURL from RCurl after site was changed

I have been using getURL from RCurl (in R) to read from https://fantasy.premierleague.com/drf/bootstrap-static
Example code:
library(RCurl)
print(getURL("https://fantasy.premierleague.com/drf/bootstrap-static"))
This worked without problems until a few days ago, but now I get the error:
Error in function (type, msg, asError = TRUE) :
error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
I have upgraded to the latest R (3.4.0) and RCurl package (RCurl_1.95-4.8).
I have a workaround (using GET from httr), but can anyone help me get it working with getURL?
I believe the server change is that it now supports only TLS 1.2. I tried the following fix but now get a new error. Could this be related to needing a newer OpenSSL?
CURL_SSLVERSION_TLSv1_2 <- 6L  # libcurl's enum value for TLS 1.2
opt <- RCurl::curlOptions(verbose = TRUE, sslversion = CURL_SSLVERSION_TLSv1_2)
print(RCurl::getURL("https://fantasy.premierleague.com/drf/bootstrap-static", .opts = opt))
The new error is:
Unsupported SSL protocol version
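One way to test the "newer OpenSSL" hypothesis is to compare the TLS library that each R HTTP package was built against. The sketch below assumes the curl package is installed (RCurl is checked only if present); `ssl_backends()` is a hypothetical helper name. As a rule of thumb, OpenSSL added TLS 1.2 in version 1.0.1, so an RCurl entry reporting an older OpenSSL cannot negotiate TLS 1.2 regardless of the options passed.

```r
# Sketch: report the SSL/TLS backend each R HTTP package is linked against.
# ssl_backends() is a hypothetical helper, not part of either package.
ssl_backends <- function() {
  # curl::curl_version() describes the libcurl used by the curl package
  out <- c(curl = curl::curl_version()$ssl_version)
  if (requireNamespace("RCurl", quietly = TRUE)) {
    # RCurl::curlVersion() describes the libcurl that RCurl was built with
    out <- c(out, RCurl = RCurl::curlVersion()$ssl_version)
  }
  out
}

print(ssl_backends())
```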

I think if you remove the getURL part of the code and pass the URL straight to fromJSON() (from jsonlite), it should work:
library(jsonlite)
url <- "https://fantasy.premierleague.com/drf/bootstrap-static"
json <- fromJSON(url)
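If fromJSON() cannot fetch the URL on its own, the httr workaround mentioned in the question can be combined with jsonlite parsing. A minimal sketch, assuming httr and jsonlite are installed and that the endpoint returns UTF-8 JSON; `fetch_json()` is a hypothetical helper name.

```r
library(httr)
library(jsonlite)

# Hypothetical helper: download with httr (libcurl via the curl package),
# stop on HTTP errors, then parse the body text with jsonlite.
fetch_json <- function(url) {
  resp <- GET(url)
  stop_for_status(resp)
  txt <- content(resp, as = "text", encoding = "UTF-8")
  fromJSON(txt)
}

# Usage (requires network access; the endpoint may no longer exist):
# json <- fetch_json("https://fantasy.premierleague.com/drf/bootstrap-static")
```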

Related

Accessing Twitter streaming API - authentication does not work in R

I am using the following code to connect to the Twitter streaming API to download tweets:
#install.packages("streamR")
#install.packages("ROAuth")
library(ROAuth)
library(streamR)
#create your OAuth credential
credential <- OAuthFactory$new(consumerKey='**CONSUMER KEY**',
consumerSecret='**CONSUMER SECRET KEY**',
requestURL='https://api.twitter.com/oauth/request_token',
accessURL='https://api.twitter.com/oauth/access_token',
authURL='https://api.twitter.com/oauth/authorize')
#authentication process
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")
credential$handshake(cainfo="cacert.pem")
It throws this error:
Error in function (type, msg, asError = TRUE) :
error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
This code worked perfectly a couple of years ago. Can someone please tell me what I need to change?
P.S. I am using the latest versions of R and RStudio.
Thanks!!
The only thing that has changed on the Twitter side in “a couple of years” is that we now enforce TLS 1.2 for connections. The error you’re seeing is an SSL/TLS issue: the server is rejecting the protocol version your client offers. I’m not sure how to resolve that with RStudio, but I’d recommend looking at the libraries and settings related to that area.
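A quick way to check whether your R setup can complete a TLS 1.2 handshake with a host is to request TLS 1.2 explicitly through the curl package. A sketch, assuming the curl package is installed; `tls12_ok()` is a hypothetical helper, and 6L is libcurl's value for CURL_SSLVERSION_TLSv1_2 (recent libcurl versions treat it as a minimum rather than an exact version). Any HTTP response, even a 4xx status, counts as success here, because curl_fetch_memory() does not raise an error on HTTP status codes.

```r
# Hypothetical helper: TRUE if a request asking for TLS 1.2 succeeds.
tls12_ok <- function(url) {
  h <- curl::new_handle(sslversion = 6L)  # CURL_SSLVERSION_TLSv1_2
  tryCatch({
    curl::curl_fetch_memory(url, handle = h)
    TRUE
  }, error = function(e) {
    # DNS, proxy and TLS failures all end up here
    message("TLS 1.2 request failed: ", conditionMessage(e))
    FALSE
  })
}

# Usage (requires network access):
# tls12_ok("https://api.twitter.com/oauth/request_token")
```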

xml2 gives an SSL error when connecting to a governmental webpage

I'm receiving an SSL certificate error when I try to use xml2::read_html() on a Brazilian government webpage.
When I try to access it with
page = xml2::read_html("https://www.gov.br/planalto/pt-br/acompanhe-o-planalto/discursos")
I receive the following error:
Error in open.connection(x, "rb") :
SSL certificate problem: unable to get local issuer certificate
I found another SO question with 3 possible solutions:
httr::set_config(httr::config(ssl_verifypeer = 0L)) #1
httr::set_config(httr::config(ssl_verifypeer = FALSE)) #2
Sys.setenv(LIBCURL_BUILD="winssl") #3
None of them solved my problem. I then tried running the code in a Kaggle Notebook and received the same error message, so the problem isn't specific to my PC.
From https://curl.haxx.se/docs/sslcerts.html:
Certificate Verification
...
Tell libcurl to not verify the peer. With libcurl you disable this with curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, FALSE);
With the curl command line tool, you disable this with -k/--insecure.
So, from the command line (or terminal) the following does work:
curl -k https://www.gov.br/planalto/pt-br/acompanhe-o-planalto/discursos
The following also works as a workaround (using the curl package):
url <- "https://www.gov.br/planalto/pt-br/acompanhe-o-planalto/discursos"
h <- curl::new_handle()
curl::handle_setopt(h, ssl_verifyhost = 0, ssl_verifypeer = 0)
curl::curl_download(url = url, destfile = "file_test.html", handle = h)
I couldn't find a way to set the insecure option within the xml2 package options, which would be the right answer to this question.
Curiously, the following also "works", but only for downloading the HTML file; parsing it directly fails:
curl::handle_setopt(h, ssl_verifyhost = 0, ssl_verifypeer = 0)
xml2::download_html(url, handle = h)  # works
xml2::read_xml(url, handle = h)  # doesn't work
xml2::read_html(url, handle = h)  # doesn't work
Edit:
Actually, according to the linked option list, option 181
#> 181 ssl_verifypeer CURLOPT_SSL_VERIFYPEER integer
should be the one you tried, which didn't work. This might be a bug, since it is the same option that works from the command line.
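Since read_html() does not seem to use the handle, another workaround is to fetch the page into memory with the insecure handle and pass the raw bytes to read_html(), which accepts a raw vector. A sketch, assuming curl and xml2 are installed; `read_html_insecure()` is a hypothetical helper. Disabling verification removes protection against man-in-the-middle attacks, so pointing `cainfo` at a bundle that contains the site's missing issuer certificate is preferable when possible.

```r
library(curl)
library(xml2)

# Hypothetical helper: download with certificate checks disabled (insecure!),
# then parse the bytes in memory instead of letting xml2 open the URL itself.
read_html_insecure <- function(url) {
  h <- new_handle(ssl_verifyhost = 0L, ssl_verifypeer = 0L)
  res <- curl_fetch_memory(url, handle = h)
  if (res$status_code >= 400) stop("HTTP error ", res$status_code)
  read_html(res$content)  # res$content is a raw vector
}

# Usage (requires network access):
# page <- read_html_insecure("https://www.gov.br/planalto/pt-br/acompanhe-o-planalto/discursos")
```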

R: download data securely using TLS/SSL

Official Statements
In the past, base R's download.file() could not handle HTTPS, and it was necessary to use RCurl. Since R 3.3.0:
All builds have support for https: URLs in the default methods for download.file(), url() and code making use of them. Unfortunately that cannot guarantee that any particular https: URL can be accessed. ... Different access methods may allow different protocols or use private certificate bundles ...
The download.file() help still says:
Contributed package 'RCurl' provides more comprehensive facilities to download from URLs.
This, by the way, includes cookie and header management.
According to the RCurl FAQ (look for "When I try to interact with a URL via https, I get an error"), HTTPS URLs can be handled with:
getURL(url, cainfo="CA bundle")
where CA bundle is the path to a certificate authority bundle file. One such bundle is available from the curl site itself:
https://curl.haxx.se/ca/cacert.pem
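The curl package accepts the same CA-bundle setting as a handle option (`cainfo`, mirroring libcurl's CURLOPT_CAINFO). A sketch, assuming the curl package is installed and that its libcurl backend honours `cainfo` (Windows builds using Schannel may behave differently); `fetch_with_bundle()` is a hypothetical helper.

```r
library(curl)

# Hypothetical helper: fetch a URL as text, verifying the server against
# an explicit certificate authority bundle file.
fetch_with_bundle <- function(url, bundle = "cacert.pem") {
  stopifnot(file.exists(bundle))
  h <- new_handle(cainfo = normalizePath(bundle), ssl_verifypeer = 1L)
  res <- curl_fetch_memory(url, handle = h)
  rawToChar(res$content)
}

# Usage (requires network access):
# curl_download("https://curl.haxx.se/ca/cacert.pem", "cacert.pem")
# html <- fetch_with_bundle("https://www.google.com")
```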
Current status
Tests were run on Windows.
For many HTTPS websites download.file() works as stated:
download.file(url="https://www.google.com", destfile="google.html")
download.file(url="https://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")
With RCurl, using the cacert.pem bundle downloaded above, one may get an error:
library(RCurl)
getURL("https://www.google.com", cainfo = "cacert.pem")
# Error in function (type, msg, asError = TRUE) :
# SSL certificate problem: unable to get local issuer certificate
In this instance, simply removing the reference to the certificate bundle solves the problem:
getURL("https://www.google.com") # works
getURL("https://www.google.com", ssl.verifypeer=TRUE) # works
ssl.verifypeer = TRUE is passed to make sure that success is not due to getURL() skipping certificate verification. The argument is documented in the RCurl FAQ.
However, in other instances, the connection fails:
getURL("https://curl.haxx.se/ca/cacert.pem")
# Error in function (type, msg, asError = TRUE) :
# error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
And similarly, using the previously downloaded bundle:
getURL("https://curl.haxx.se/ca/cacert.pem", cainfo = "cacert.pem")
# Error in function (type, msg, asError = TRUE) :
# error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
The same error happens even when suppressing the security:
getURL("https://curl.haxx.se/ca/cacert.pem", ssl.verifypeer=FALSE)
# same error as above
Questions
How to use HTTPS properly in RCurl?
As regards mere file downloads (no headers, cookies, etc.), is there any benefit in using RCurl instead of download.file()?
Has RCurl become obsolete, and should we switch to curl?
Update
The issue persists as of R version 3.4.1 (2017-06-30) under Windows 10.
The OpenSSL bundled with RCurl is currently rather old and does not support TLS 1.2.
Yes, the curl package is fine.
Alternatively, you can use the httr package, which is a wrapper around the curl package:
> library("httr")
> GET("https://curl.haxx.se/ca/cacert.pem",config(sslversion=6,ssl_verifypeer=1))
Response [https://curl.haxx.se/ca/cacert.pem]
Date: 2017-08-16 17:07
Status: 200
Content-Type: application/x-pem-file
Size: 256 kB
<BINARY BODY>
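The same request can be written with a named constant instead of the bare 6, and with httr's write_disk() to save the bundle to a file. A sketch, assuming httr is installed; `download_pem()` is a hypothetical helper and 6L is libcurl's CURL_SSLVERSION_TLSv1_2 value.

```r
library(httr)

CURL_SSLVERSION_TLSv1_2 <- 6L  # libcurl enum value for TLS 1.2

# Hypothetical helper: download a file over TLS 1.2 with peer verification on,
# writing the body straight to `dest`.
download_pem <- function(url, dest) {
  resp <- GET(url,
              config(sslversion = CURL_SSLVERSION_TLSv1_2, ssl_verifypeer = 1L),
              write_disk(dest, overwrite = TRUE))
  stop_for_status(resp)
  invisible(dest)
}

# Usage (requires network access):
# download_pem("https://curl.haxx.se/ca/cacert.pem", "cacert.pem")
```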

RCurl getURL SSL error

Related questions:
RCurl errors when fetching ssl endpoint
R: Specify SSL version in Rcurl getURL statement
I am looking at the following:
url <- "https://www.veilingbiljet.nl/resultaten-ajax.asp?order=datum&direction=D&page=1&field=0&regio=39"
Then,
getURL(url)
gives the following error:
error:1411809D:SSL routines:SSL_CHECK_SERVERHELLO_TLSEXT:tls invalid ecpointformat list
Adding the following curl option, suggested in one of the related questions,
getURL(url, ssl.verifypeer = TRUE, sslversion = 3L)
returns
Unknown SSL protocol error in connection to www.veilingbiljet.nl:443
Any help would be greatly appreciated.
The RCurl package is unmaintained and defunct. Use the curl package or httr instead:
library(curl)
con <- curl("https://www.veilingbiljet.nl/resultaten-ajax.asp?order=datum&direction=D&page=1&field=0&regio=39")
readLines(con)
Or:
library(httr)
req <- GET("https://www.veilingbiljet.nl/resultaten-ajax.asp?order=datum&direction=D&page=1&field=0&regio=39")
stop_for_status(req)
content(req)
error:1411809D:SSL routines:SSL_CHECK_SERVERHELLO_TLSEXT:tls invalid ecpointformat list
This is a known bug in old OpenSSL versions and was fixed about 4 years ago. Thus upgrading your local OpenSSL should help.
If this is not possible, you might disable the affected ciphers by setting the ssl.cipher.list option of RCurl to HIGH:!ECDH.
Unknown SSL protocol error in connection to www.veilingbiljet.nl:443
The server does not support SSL 3.0 so trying to enforce this version will fail.
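The cipher-list suggestion can be tried with RCurl's ssl.cipher.list option, or with the curl package's equivalent `ssl_cipher_list` handle option (both map to libcurl's CURLOPT_SSL_CIPHER_LIST). A sketch, assuming an OpenSSL-based libcurl; the "HIGH:!ECDH" string uses OpenSSL cipher-list syntax and may be rejected by other backends such as Schannel. The helper names are hypothetical.

```r
url <- "https://www.veilingbiljet.nl/resultaten-ajax.asp?order=datum&direction=D&page=1&field=0&regio=39"
cipher_list <- "HIGH:!ECDH"  # OpenSSL syntax: HIGH ciphers, excluding ECDH key exchange

# Hypothetical helper, RCurl route (option name as given in the answer).
get_no_ecdh_rcurl <- function(url, ciphers = cipher_list) {
  RCurl::getURL(url, ssl.cipher.list = ciphers)
}

# Hypothetical helper, curl package route (same underlying libcurl option).
get_no_ecdh_curl <- function(url, ciphers = cipher_list) {
  h <- curl::new_handle(ssl_cipher_list = ciphers)
  res <- curl::curl_fetch_memory(url, handle = h)
  rawToChar(res$content)
}

# Usage (requires network access):
# txt <- get_no_ecdh_curl(url)
```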

oauth Handshake error using twitteR in R

I have been trying to download some tweets using the twitteR package in R.
The code for my OAuth credentials is:
cred <- OAuthFactory$new(consumerKey = consumerKey,
consumerSecret = consumerSecret, requestURL = reqURL, accessURL = accessURL, authURL = authURL)
When I try to run the following for the handshake:
cred$handshake(cainfo=system.file("CurlSSL","cacert.pem",package="RCurl"))
I get this error:
Error in function (type, msg, asError = TRUE) :
Could not resolve host: api.twitter.com; No data record of requested type
I am running the code in a windows machine. (I have included the code for downloading cacert.pem)
I worked out the solution: if you are behind a proxy, you also have to set the proxy options in RCurl (setting the proxy for R itself is not enough). This command works:
options( RCurlOptions = list(verbose = TRUE,proxy = "host:port"))
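If other packages in the same session also need the proxy, it can be supplied to each of them, since RCurl, curl and httr keep largely separate libcurl settings. A sketch; `proxy.example.com` and port 8080 are hypothetical placeholders for your proxy's host and port.

```r
proxy_host <- "proxy.example.com"  # hypothetical placeholder
proxy_port <- 8080L                # hypothetical placeholder
proxy <- paste0(proxy_host, ":", proxy_port)

# RCurl: default options picked up by getURL() and the ROAuth handshake
options(RCurlOptions = list(verbose = TRUE, proxy = proxy))

# curl package: per-handle option
h <- curl::new_handle(proxy = proxy)

# httr: global config applied to subsequent GET()/POST() calls
httr::set_config(httr::use_proxy(url = proxy_host, port = proxy_port))
```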
