xml2 gives an SSL error when connecting to a governmental webpage - r

I get an SSL certificate error when I call xml2::read_html() on a Brazilian government webpage.
When I try to access
page = xml2::read_html("https://www.gov.br/planalto/pt-br/acompanhe-o-planalto/discursos")
I receive the following error:
Error in open.connection(x, "rb") :
SSL certificate problem: unable to get local issuer certificate
I found another SO question with 3 possible solutions:
httr::set_config(config(ssl_verifypeer = 0L)) #1
httr::set_config(config(ssl_verifypeer = FALSE)) #2
Sys.setenv(LIBCURL_BUILD="winssl") #3
None of them solved my problem. I then ran the code on a Kaggle Notebook and received the same error message, so the problem isn't specific to my PC.

From https://curl.haxx.se/docs/sslcerts.html:
Certificate Verification
...
Tell libcurl to not verify the peer. With libcurl you disable this with curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, FALSE);
With the curl command line tool, you disable this with -k/--insecure.
So, from the command line (or terminal) the following does work:
curl -k https://www.gov.br/planalto/pt-br/acompanhe-o-planalto/discursos
The following workaround also works (using the curl package):
url <- "https://www.gov.br/planalto/pt-br/acompanhe-o-planalto/discursos"
h <- curl::new_handle()  # the handle must exist before options are set on it
curl::handle_setopt(h, ssl_verifyhost = 0, ssl_verifypeer = 0)
curl::curl_download(url = url, destfile = "file_test.html", handle = h)
I couldn't find a way to set the insecure option within the xml2 package options, which would be the right answer to this question.
Funnily, though, the following also "works", but only for downloading the HTML file; parsing the URL directly still fails.
curl::handle_setopt(h, ssl_verifyhost = 0, ssl_verifypeer=0)
xml2::download_html(url, handle = h)
xml2::read_xml(url, handle = h) # doesn't work
xml2::read_html(url, handle = h) # doesn't work
Edit:
Actually, according to the info here, option 181
#> 181 ssl_verifypeer CURLOPT_SSL_VERIFYPEER integer
should be the one you already tried without success. This might be a bug, since the same option works from the command line.
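One way to combine the two observations above is to fetch the page bytes through a curl handle and then hand the raw content to xml2::read_html(), which also accepts a raw vector. This is only a sketch: the helper name read_html_insecure is made up for illustration, and it assumes the handle options that worked for curl_download() work the same way for curl_fetch_memory(). It disables certificate verification, so it should be used only for sites you trust.

```r
library(curl)
library(xml2)

# Hypothetical helper (not part of xml2 or curl): download the page with
# certificate verification switched off, then parse the bytes in memory.
read_html_insecure <- function(url) {
  h <- curl::new_handle()
  curl::handle_setopt(h, ssl_verifyhost = 0, ssl_verifypeer = 0)
  res <- curl::curl_fetch_memory(url, handle = h)
  # res$content is a raw vector; read_html() parses it without opening
  # its own connection, so the failing SSL check is never reached.
  xml2::read_html(res$content)
}

# Usage (needs network access):
# page <- read_html_insecure("https://www.gov.br/planalto/pt-br/acompanhe-o-planalto/discursos")
```

The download and the parsing are kept as separate steps because the handle argument did not help when passed to read_html() directly.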

Related

API Request - Could not load PEM Certificate? OpenSSL Error ,HTTR R, CURL, [duplicate]

I'm trying to make an API request and am passing my SSL cert to the config() parameter of GET. This worked for a few weeks, but then I had to reinstall R. I did a clean install: deleted all folders, then installed R, RTools and RStudio. In this new instance of R the same script no longer works. I've uninstalled and reinstalled httr, curl and openssl, still with no luck (I've also reinstalled R multiple times).
This is the error I get:
Error in curl::curl_fetch_memory(url, handle = handle) :
could not load PEM client certificate, OpenSSL error error:02001002:system library:fopen:No such file or directory, (no key found, wrong pass phrase, or wrong file format?)
This is the get request code:
conn <- GET(url = "testurl",
            add_headers(header2),
            config(sslcert = "my_cert.pem", sslkey = "my_key.pem"),
            content_type_json(),
            accept_json())
where my_cert.pem and my_key.pem were extracted with the openssl package:
cert <- openssl::read_p12(file = "Data/certificate.pfx", password = "somepassword")
my_cert.pem <- write_pem(cert$cert)
my_key.pem <- write_pem(cert$key)
Any help with this would be much appreciated.
Thank you
Following the answer provided by @MrFlick, I was able to resolve the issue.
Writing the files to disk before the GET() request, like so:
write_pem(cert$cert, "my_cert.pem")
write_pem(cert$key, "my_key.pem")
resulted in a successful response, and I no longer received the error:
Error in curl::curl_fetch_memory(url, handle = handle) : could not load PEM client certificate, OpenSSL error error:02001002:system library:fopen:No such file or directory, (no key found, wrong pass phrase, or wrong file format?)
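The difference between the failing and the working code is the path argument: without it, write_pem() returns the PEM text as a string and creates no file, so config(sslcert = "my_cert.pem") points at something that does not exist on disk. A small sketch of that behaviour, using a throwaway RSA key generated on the spot (an assumption for illustration, not the certificate from the question):

```r
library(openssl)

# Throwaway key, generated only for illustration.
key <- rsa_keygen(2048)

# Without a path, write_pem() returns the PEM text; nothing is written to disk.
pem_text <- write_pem(key)

# With a path, the PEM text is written to that file, which is what
# curl needs when it is given a file name in sslcert / sslkey.
key_path <- file.path(tempdir(), "my_key.pem")
write_pem(key, key_path)

file.exists(key_path)
```

The resulting file path can then be passed to config(sslkey = key_path), and the certificate is handled the same way.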

downloading csv file works via libcurl yet does not via curl method

OS: Win 7 64 bit
RStudio Version 1.1.463
As per the "Getting and Cleaning Data" course, I attempted to download a CSV file with method = "curl":
fileUrl <- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"
download.file(fileUrl, destfile = "./cameras.csv", method = "curl")
Error in download.file(fileUrl, destfile = "./cameras.csv", method =
"curl") : 'curl' call had nonzero exit status
However, method = "libcurl" resulted in a successful download:
download.file(fileUrl, destfile = "./cameras.csv", method = "libcurl")
trying URL
'https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD'
downloaded 9443 bytes
Changing from https to http produced exactly the same results for curl and libcurl, respectively.
Is there any way to make this download work via method = "curl" as per the course?
Thanks
As you can see from ?download.file:
For methods "wget" and "curl" a system call is made to the tool given
by method, and the respective program must be installed on your system
and be in the search path for executables. They will block all other
activity on the R process until they complete: this may make a GUI
unresponsive.
Therefore, you should install curl first.
See "How do I install and use curl on Windows?" to learn how.
Best!
I believe there were a few issues here. I followed the steps in the link quoted by @JonnyCrunch:
a) Reinstalled Git for Windows;
b) Added C:\Program Files\Git\mingw64\bin\ to the 'PATH' variable;
c) Disabled "Use Internet Explorer library/proxy for HTTP" in RStudio under Tools > Options > Packages;
d) Attempted the steps in e) below and added the data.baltimorecity.gov website to the exclusions when prompted by Kaspersky anti-virus;
e) Then in RStudio:
options(download.file.method = "curl")
download.file(fileUrl, destfile="./data/cameras.csv")
Success!
Thank you
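Since method = "curl" depends on a curl executable being on the search path, it can help to check for it before calling download.file(). A minimal sketch, assuming the R build supports "libcurl" as a fallback; pick_download_method is a made-up helper name:

```r
# Hypothetical helper: use the external curl tool when it is on the PATH,
# otherwise fall back to R's built-in libcurl method.
pick_download_method <- function() {
  # Sys.which() returns "" when the executable cannot be found.
  if (nzchar(Sys.which("curl"))) "curl" else "libcurl"
}

method <- pick_download_method()

# Usage (needs network access):
# fileUrl <- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"
# download.file(fileUrl, destfile = "./cameras.csv", method = method)
```

If Sys.which("curl") returns an empty string, the 'curl' call had nonzero exit status error above is the expected outcome of method = "curl".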

oauth Handshake error using twitteR in R

I have been trying to download some tweets using the twitteR package in R.
The code for my oauth credentials is
cred<-AuthFactory$new(consumerKey=consumerKey,
consumerSecret=consumerSecret,requestURL=reqURL,accessURL=accessURL,authURL=authURL)
When I try to run the following for the handshake
cred$handshake(cainfo=system.file("CurlSSL","cacert.pem",package="RCurl"))
I am getting this error
Error in function (type, msg, asError = TRUE) :
Could not resolve host: api.twitter.com; No data record of requested type
I am running the code on a Windows machine. (I have included the code for downloading cacert.pem.)
I worked out the solution for this. People who are behind a proxy have to set the proxy options in RCurl too; setting the proxy for R itself is not enough. This command works:
options( RCurlOptions = list(verbose = TRUE,proxy = "host:port"))
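Note that the call above replaces any RCurlOptions that were already set (for example a cainfo entry). A small sketch of a helper that merges the proxy into the existing options instead; set_rcurl_proxy is a made-up name and "host:port" remains a placeholder for your own proxy address:

```r
# Hypothetical helper: add or update the proxy entry in RCurlOptions
# without discarding options that were set earlier.
set_rcurl_proxy <- function(proxy, verbose = FALSE) {
  current <- getOption("RCurlOptions", default = list())
  updated <- modifyList(current, list(proxy = proxy, verbose = verbose))
  options(RCurlOptions = updated)
  invisible(updated)
}

# Usage, with a placeholder address:
# set_rcurl_proxy("host:port", verbose = TRUE)
```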

RGoogleDocs (or RCurl) giving SSL certificate problem

I was using one of my favorite R packages today to read data from a Google spreadsheet, and it would not work. The problem occurs on all my machines (I use Windows) and appears to be new. I am using version 0.4-1 of RGoogleDocs.
library(RGoogleDocs)
ps <-readline(prompt="get the password in ")
sheets.con = getGoogleDocsConnection(getGoogleAuth("fxxxh@gmail.com", ps, service ="wise"))
ts2=getWorksheets("OnCall",sheets.con)
And this is what I get after running the last line.
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
I did some reading and came across some information that was interesting but, to me at least, not useful:
When I try to interact with a URL via https, I get an error of the form
Curl: SSL certificate problem, verify that the CA cert is OK
I understood the big-picture message but did not know how to implement the solution in my script. I inserted the following line before getWorksheets:
x = getURLContent("https://www.google.com", ssl.verifypeer = FALSE)
That did not work so I tried
ts2=getWorksheets("OnCall",sheets.con,ssl.verifypeer = FALSE)
That also did not work.
Interestingly enough, the following line works
getDocs(sheets.con,folders = FALSE)
What do you suggest I try to get it working again? Thanks.
I no longer have this problem. I do not remember exactly when I overcame it or who helped me get here, but here is a typical session that works.
library(RGoogleDocs)
if(exists("ps")) print("got password, keep going") else ps <-readline(prompt="get the password in ") #conditional password asking
#WARNING: ssl.verifypeer = FALSE would prevent curl from detecting a 'man in the middle' attack
options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"), ssl.verifypeer = FALSE))
sheets.con = getGoogleDocsConnection(getGoogleAuth("fjh@gmail.com", ps, service ="wise"))
ts2=getWorksheets("name of workbook here",sheets.con)
names(ts2)
sheet.1 <-sheetAsMatrix(ts2$"Sheet 1",header=TRUE, as.data.frame=TRUE, trim=TRUE) #Get one sheet
other <-sheetAsMatrix(ts2$"whatever name of tab",header=TRUE, as.data.frame=TRUE, trim=TRUE) #Get other sheet
Does it help you?
Maybe you don't have the certificate bundle installed. I installed it on OS X. You can also find the bundle on the curl site.
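If the missing certificate bundle is the cause, an alternative to switching verification off is to point RCurl at a bundle and leave ssl.verifypeer enabled. A minimal sketch, assuming the cacert.pem file shipped with the RCurl package (the same file referenced in the twitteR question above) is present and recent enough for the server in question:

```r
# Locate the CA bundle shipped with RCurl; system.file() returns ""
# when the package or the file is not installed.
ca_bundle <- system.file("CurlSSL", "cacert.pem", package = "RCurl")

if (nzchar(ca_bundle)) {
  # cainfo takes a bundle file; verification stays switched on.
  options(RCurlOptions = list(cainfo = ca_bundle, ssl.verifypeer = TRUE))
}
```

This keeps the protection against man-in-the-middle attacks that ssl.verifypeer = FALSE gives up.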
