R: Specify SSL version in RCurl getURL statement

Aim
I am trying to write a statement that connects to a SharePoint Online list and retrieves its data for use in R.
Background
This is my first time using RCurl/curl/libcurl. I've attempted to read the documentation, but it's beyond me and doesn't have any relevant examples.
Using
a<-getURL("https://example.sharepoint.com/_vti_bin/listdata.svc/Glossary")
Resulted in an error
Error in function (type, msg, asError = TRUE) :
Unknown SSL protocol error in connection to example.sharepoint.com:443
A combination of Googling and deciphering the cryptic libcurl documentation identified the SSL version as the cause. I used curl in a shell to verify this and got a positive response from the SharePoint server (apart from needing to add a user name and password):
curl https://example.sharepoint.com/_vti_bin/listdata.svc/Glossary -v -SSLv3
Problem
I cannot work out how to pass the -SSLv3 argument to my getURL() call, even after reading the manuals.
Requested solution
Amend this statement to utilise SSLv3
a<-getURL("https://example.sharepoint.com/_vti_bin/listdata.svc/Glossary")

For me this works:
a<-getURL("https://example.sharepoint.com/_vti_bin/listdata.svc/Glossary", ssl.verifypeer=TRUE, sslversion=4L)
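For reference, the integer given as sslversion is passed through to libcurl, which interprets it as one of its CURL_SSLVERSION_* constants. A minimal sketch of that mapping as listed in libcurl's headers (the vector name ssl_versions is only for illustration). Note that under this mapping 4L selects TLS 1.0, while SSLv3 itself is 3L, and that recent libcurl builds and servers may refuse SSLv3 altogether:

```r
# libcurl's CURL_SSLVERSION_* constants, in header order.
ssl_versions <- c(
  DEFAULT = 0L,  # let libcurl negotiate
  TLSv1   = 1L,  # any TLS 1.x
  SSLv2   = 2L,
  SSLv3   = 3L,
  TLSv1_0 = 4L,
  TLSv1_1 = 5L,
  TLSv1_2 = 6L,
  TLSv1_3 = 7L
)

# Which protocol does sslversion = 4L ask for?
names(ssl_versions)[ssl_versions == 4L]

# Hypothetical usage with RCurl (not run here):
# a <- getURL(url, ssl.verifypeer = TRUE, sslversion = ssl_versions[["TLSv1_2"]])
```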

Related

How can I read a json file from a remote server in R?

I have a collection of JSON files on my local machine that I currently read in using the command
file <- tbl_df(ndjson::stream_in("path/to/file.json"))
I have copied these files to a Linux server (using WinSCP), and I want to stream them into my R session just as I did above with ndjson. While searching for ways to do this, I came across a method using RCurl that looked like this:
file <- scp(host = "hostname", "path/to/file.json", "pass", "user")
but that returned an error
Error in function (type, msg, asError = TRUE) : Authentication failure
Either way, I want to avoid putting my passphrase in my R script, as others will see it. I also came across a method suggesting this:
d <- read.table(pipe('ssh -l user host "cat path/to/file.json"'))
however this command returned the error
no lines available in input
and I believe read.table would cause me issues anyway. Does anyone know a way I could read newline-delimited JSON files from a remote server into an R session? Thank you in advance! Let me know if I can make my question clearer.
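One possible direction, sketched under several assumptions: the ssh pipe idea can be kept, but the stream handed to a parser that understands newline-delimited JSON and accepts a connection, such as jsonlite::stream_in(). This assumes key-based SSH authentication is already set up (so no passphrase appears in the script), that the jsonlite package is installed, and that the remote path contains no spaces or shell metacharacters. The helper names remote_cat_command and read_remote_ndjson are made up for illustration:

```r
# Build the ssh command that prints the remote file to stdout.
# Assumes user, host and path contain no spaces or shell metacharacters.
remote_cat_command <- function(user, host, path) {
  sprintf('ssh -l %s %s "cat %s"', user, host, path)
}

# Stream newline-delimited JSON from the remote file into a data frame.
# jsonlite::stream_in() opens the unopened pipe itself and closes it afterwards.
read_remote_ndjson <- function(user, host, path) {
  con <- pipe(remote_cat_command(user, host, path))
  jsonlite::stream_in(con, verbose = FALSE)
}

remote_cat_command("user", "host", "path/to/file.json")

# Hypothetical usage (needs network access and SSH keys, so not run here):
# d <- read_remote_ndjson("user", "host", "path/to/file.json")
```

If the result has to go through ndjson::stream_in(), which takes a file path, an alternative would be to copy the file to a tempfile() first and read it from there.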

Using package RCurl ftpUpload with @ symbol in the username

I have been using the RCurl package to pull data into memory from an SFTP server and am trying to upload the transformed data to a different SFTP server. The problem I am having is that the username assigned on the new server has an @ sign in it. When I try to run the code below (sensitive info removed):
ftpUpload(what = file,
to = "sftp://user@school.edu:password@site.net/incoming/subfolder/data.csv")
The following error appears:
Error in function (type, msg, asError = TRUE) :
Failed to connect to school.edu port 22: Timed out
The @ sign is creating an issue where the file is attempting to upload to the wrong location (school.edu as opposed to site.net). Unfortunately, I’m unable to alter the username, as I’m told the site automatically generates usernames and will always use an @ sign. I really don’t know that much about SFTP, so any help would be appreciated, even if that means working outside of R for a solution.
Perhaps a safer way to pass the username and password is via the userpwd= parameter. For example:
ftpUpload(what = file,
to = "sftp://site.net/incoming/subfolder/data.csv",
userpwd="user@school.edu:password")
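If the credentials do have to live in the URL itself, another option is to percent-encode the reserved character in the username so that it is no longer read as a URL delimiter. A minimal sketch using base R's URLencode(); the helper name build_sftp_url is made up for illustration, and whether a given server accepts the encoded form is an assumption to verify:

```r
# Percent-encode user and password so reserved characters
# (for example "@" becomes %40, "#" becomes %23) cannot be mistaken
# for URL delimiters, then assemble the sftp:// URL.
build_sftp_url <- function(user, password, host, path) {
  sprintf("sftp://%s:%s@%s/%s",
          utils::URLencode(user, reserved = TRUE),
          utils::URLencode(password, reserved = TRUE),
          host, path)
}

build_sftp_url("user@school.edu", "password",
               "site.net", "incoming/subfolder/data.csv")
# "sftp://user%40school.edu:password@site.net/incoming/subfolder/data.csv"
```

Passing the credentials separately through userpwd avoids the parsing question entirely, which is why it is probably the more robust choice.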

R: download data securely using TLS/SSL

Official Statements
In the past, base R's download.file() could not handle HTTPS URLs, so it was necessary to use RCurl. Since R 3.3.0:
All builds have support for https: URLs in the default methods for download.file(), url() and code making use of them. Unfortunately that cannot guarantee that any particular https: URL can be accessed. ... Different access methods may allow different protocols or use private certificate bundles ...
The download.file() help still says:
Contributed package 'RCurl' provides more comprehensive facilities to download from URLs.
(which, by the way, includes cookie and header management).
According to the RCurl FAQ (look for "When I try to interact with a URL via https, I get an error"), HTTPS URLs can be handled with:
getURL(url, cainfo="CA bundle")
where CA bundle is the path to a certificate authority bundle file. One such bundle is available from the curl site itself:
https://curl.haxx.se/ca/cacert.pem
Current status
The tests below were run on Windows.
For many HTTPS websites download.file() works as stated:
download.file(url="https://www.google.com", destfile="google.html")
download.file(url="https://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")
As regards RCurl, using the cacert.pem bundle, downloaded above, one might get an error:
library(RCurl)
getURL("https://www.google.com", cainfo = "cacert.pem")
# Error in function (type, msg, asError = TRUE) :
# SSL certificate problem: unable to get local issuer certificate
In this instance, simply removing the reference to the certificate bundle solves the problem:
getURL("https://www.google.com") # works
getURL("https://www.google.com", ssl.verifypeer=TRUE) # works
ssl.verifypeer = TRUE is set explicitly to make sure that the success is not due to getURL() skipping certificate verification. The argument is documented in the RCurl FAQ.
However, in other instances, the connection fails:
getURL("https://curl.haxx.se/ca/cacert.pem")
# Error in function (type, msg, asError = TRUE) :
# error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
And similarly, using the previously downloaded bundle:
getURL("https://curl.haxx.se/ca/cacert.pem", cainfo = "cacert.pem")
# Error in function (type, msg, asError = TRUE) :
# error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
The same error happens even when certificate verification is disabled:
getURL("https://curl.haxx.se/ca/cacert.pem", ssl.verifypeer=FALSE)
# same error as above
Questions
How to use HTTPS properly in RCurl?
As regards mere file downloads (no headers, cookies, etc.), is there any benefit in using RCurl instead of download.file()?
Has RCurl become obsolete, and should we opt for curl instead?
Update
The issue persists as of
R version 3.4.1 (2017-06-30) under Windows 10.
The OpenSSL build bundled with RCurl is currently rather old and does not support TLS v1.2.
Yes, the curl package is fine.
Alternatively, you can use the httr package, which is a wrapper around the curl package:
> library("httr")
> GET("https://curl.haxx.se/ca/cacert.pem",config(sslversion=6,ssl_verifypeer=1))
Response [https://curl.haxx.se/ca/cacert.pem]
Date: 2017-08-16 17:07
Status: 200
Content-Type: application/x-pem-file
Size: 256 kB
<BINARY BODY>
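For comparison, the same request can be sketched with the curl package directly. This is a minimal sketch that assumes the curl package is installed; the helper name fetch_with_curl is made up, and sslversion = 6L mirrors the httr call above (libcurl's constant for TLS 1.2):

```r
# Fetch a URL with the curl package, keeping peer verification on.
# Nothing is downloaded until the function is called.
fetch_with_curl <- function(url, sslversion = 6L) {
  h <- curl::new_handle(ssl_verifypeer = TRUE, sslversion = sslversion)
  res <- curl::curl_fetch_memory(url, handle = h)
  list(status = res$status_code, body = rawToChar(res$content))
}

# Hypothetical usage (needs network access, so not run here):
# pem <- fetch_with_curl("https://curl.haxx.se/ca/cacert.pem")
# pem$status
```

For a plain file download, curl::curl_download(url, destfile) is the closer counterpart to download.file().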

R & RCurl: Error 54 in libcurl

I am trying to get some data in JSON format using RCurl.
I have to use POST to send the username and password, like so:
postForm(url1, user=x$USERNAME, pass=x$PASSWORD)
I get the following error:
Error in function (type, msg, asError = TRUE) :
SSL read: error:00000000:lib(0):func(0):reason(0), errno 54
If I looked up error number 54 correctly on the libcurl site, it is:
CURLE_SSL_ENGINE_SETFAILED (54)
Failed setting the selected SSL crypto engine as default!
If this is the correct error how would I select the SSL engine?
Sorry, but the conclusion about this error is wrong. The output you see comes from lib/ssluse.c in libcurl's source code, and the "errno" mentioned there is not the libcurl error code but the value of the actual errno variable at that time. I'm not sure how much knowing it helps, but on my system 54 equals EXFULL.
You should instead check the return code of the libcurl function that fails in order to get the proper libcurl error code, though I guess you might see CURLE_RECV_ERROR simply because libcurl hit this SSL problem when receiving data.
Unfortunately, that error string from OpenSSL is not very helpful, and I can't tell why you got it!
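One way to see from R which libcurl error was actually reported is to catch the condition and look at its class. A minimal sketch, assuming (as I understand RCurl's behaviour) that RCurl signals errors whose class names mirror libcurl's CURLE_* codes without the prefix; the helper name error_class is made up here:

```r
# Evaluate an expression and return the primary class of the error it
# raises, or NA if it completes without error.
error_class <- function(expr) {
  tryCatch({
    expr               # forced here, inside tryCatch, thanks to lazy evaluation
    NA_character_
  }, error = function(e) class(e)[1])
}

error_class(stop("boom"))  # an ordinary R error: "simpleError"
error_class(1 + 1)         # no error: NA

# Hypothetical usage with the call from the question (not run here):
# error_class(postForm(url1, user = x$USERNAME, pass = x$PASSWORD))
```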

"Object Moved" error in using the RCurl getURL function in order to access an ASP Webpage

I am trying to use the getURL function of the RCurl package to access an ASP web page as follows:
my_url <- "http://www.my_site.org/my_site/main.asp?ID=11&REFID=33"
webpage <- getURL(my_url)
but I get an Object Moved redirection error message like:
"<head><title>Object moved</title></head>\n<body><h1>Object Moved</h1>
This object may be found here.</body>\n"
I followed various suggestions, such as using the curlEscape URL-encoding function or setting the CURLOPT_FOLLOWLOCATION and CURLOPT_SSL_VERIFYHOST parameters via the curlSetOpt function, as listed in the "php ssl curl : object moved error" link, but the latter two were not recognized as valid RCurl options.
Any suggestions how to overcome the issue?
Use the followlocation curl option:
getURL(u,.opts=curlOptions(followlocation=TRUE))
with added cookiefile goodness (it's supposed to be a file that doesn't exist, but I'm not sure how you can be sure of that):
w=getURL(u,.opts=curlOptions(followlocation=TRUE,cookiefile="nosuchfile"))
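On why the CURLOPT_ names were rejected: RCurl does not use the spelling from libcurl's C API. As far as I can tell, its option names are the libcurl names with the CURLOPT_ prefix dropped, lowercased, and with underscores written as dots; RCurl's listCurlOptions() prints the names it accepts and is the authoritative list. A small base R sketch of that conversion (the helper name to_rcurl_option is hypothetical):

```r
# Convert a libcurl C option name to the spelling RCurl appears to expect.
to_rcurl_option <- function(x) {
  x <- sub("^CURLOPT_", "", x)       # drop the C prefix
  x <- tolower(x)                    # RCurl names are lowercase
  gsub("_", ".", x, fixed = TRUE)    # underscores become dots
}

to_rcurl_option(c("CURLOPT_FOLLOWLOCATION", "CURLOPT_SSL_VERIFYHOST"))
# "followlocation" "ssl.verifyhost"
```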
