In R, how do you authenticate with an NTLM proxy when using the curl package?

The R 4.2.0 update made libcurl the default method for the download.file function (and removed the wininet method for some types of URL schemes). The same is true for the download.packages function.
My workplace uses a proxy.pac file in conjunction with NTLM authentication for the proxy. As a result, download.file no longer works with this type of proxy unless you explicitly choose the wininet method (and that only works for some URL schemes).
One consequence of this combination of settings is that users have to manually change the default package repository to an internal mirror of CRAN to use the package installer at all in RStudio.
I've managed to use the curl package to download files through the proxy:
base_directory <- "<save_path>"
handle <- curl::new_handle()
url <- "url_page_to_download"
handle_opts <- list(
  proxy = curl::ie_get_proxy_for_url(url), # works out the relevant proxy string for the URL (or URL stub) you are using
  proxyauth = 8, # value of CURLAUTH_NTLM from curl_symbols()
  proxyuserpwd = ":" # set to ":" so that Windows does the negotiation; this is the same as sending no username and password
)
curl::handle_setopt(handle, .list = handle_opts)
data_file_location <- file.path(base_directory, "state.txt")
curl::curl_download(url, destfile = data_file_location, quiet = FALSE, handle = handle)
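The value 8 used for proxyauth is a bit flag rather than an arbitrary code. As a minimal sketch in base R, assuming the bit positions that libcurl's curl.h gives the CURLAUTH_* constants (Basic is bit 0, Digest bit 1, Negotiate bit 2, NTLM bit 3), the values can be reconstructed and combined as follows. The variable names are only illustrative, and it is worth confirming the numbers against curl_symbols() on your own installation:

```r
# Bit flags for libcurl authentication schemes (assumed from curl.h;
# verify with curl::curl_symbols() on your installation).
CURLAUTH_BASIC     <- bitwShiftL(1L, 0L)  # 1
CURLAUTH_DIGEST    <- bitwShiftL(1L, 1L)  # 2
CURLAUTH_NEGOTIATE <- bitwShiftL(1L, 2L)  # 4
CURLAUTH_NTLM      <- bitwShiftL(1L, 3L)  # 8

# Several schemes can be offered at once by OR-ing the flags together.
ntlm_or_negotiate <- bitwOr(CURLAUTH_NTLM, CURLAUTH_NEGOTIATE)  # 12

# Option list in the same shape as the one passed to handle_setopt().
proxy_auth_opts <- list(
  proxyauth    = CURLAUTH_NTLM,
  proxyuserpwd = ":"  # empty credentials, so Windows does the negotiation
)
```

Naming the constant makes the option list easier to read than a bare 8, and the OR-ed form may be useful if the proxy also offers Negotiate.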
I would like to find a way to do this that uses the libcurl method of the download.packages and download.file functions, which involves setting some environment variables (from what I can tell), but I can't find the set of parameters to place in the environment variables to get it to work.
How can I replicate this with the built in functions?

Related

Is getURL deprecated? [duplicate]

ETA: Per https://github.com/hiratake55/RForcecom/issues/42, it looks like the author of the rforcecom package has updated rforcecom to use httr instead of RCurl (as of today, to be uploaded to CRAN tomorrow, 7/1/16), so my particular issue will be solved at that point. However, the general case (implementing TLS 1.1 / 1.2 in RCurl) may still be worth pursuing for other packages. Or everyone may just switch to the more recent curl package instead of RCurl.
Background: I've been using the rforcecom package to communicate with Salesforce for several months. Salesforce recently disabled support for TLS v1.0 and is requiring TLS v1.1 or higher in their Sandboxes; this update will take place for Production environments in March 2017.
rforcecom uses RCurl to communicate with salesforce.com servers. Generally the curlPerform method is used, which is implemented something like this (this example is from rforcecom.login.R):
h <- basicHeaderGatherer()
t <- basicTextGatherer()
URL <- paste(loginURL, rforcecom.api.getSoapEndpoint(apiVersion), sep="")
httpHeader <- c("SOAPAction"="login","Content-Type"="text/xml")
curlPerform(url=URL, httpheader=httpHeader, postfields=soapBody, headerfunction = h$update, writefunction = t$update, ssl.verifypeer=F)
This has been working for me for a while, as I mentioned. Now that Salesforce has disabled TLS v1.0 on Sandboxes, though, it fails with the following error:
UNSUPPORTED_CLIENT: TLS 1.0 has been disabled in this organization. Please use TLS 1.1 or higher when connecting to Salesforce using https.
I've modified and sourced (without implementing it in the local copy of the package, as I'm not experienced enough to do this) a change to my local copy of the login module for RForcecom, and I've discovered through experimentation that I can specify any of the existing enumerated values for SSLVERSION successfully by adding sslversion=SSLVERSION_TLSv1, sslversion=SSLVERSION_SSLv3, etc. to the curlPerform options where it is called. However, all of these give me the same error as above. When I attempt to use one of the options that is implemented in libcurl but not in RCurl (SSLVERSION_TLSv1.1, SSLVERSION_TLSv1.2), I get the following error:
Error in merge(list(...), .opts) : object 'SSLVERSION_TLSv1.1' not found
or:
Error in merge(list(...), .opts) : object 'SSLVERSION_TLSv1.2' not found
I've verified with curlVersion() that my libcurl version is 7.40.0, which according to https://curl.haxx.se/libcurl/c/CURLOPT_SSLVERSION.html does support those options. However, I'm unable to get RCurl to recognize them.
At this point what I'm looking for is a way to get RCurl to use TLS v1.1 or TLS v1.2, and I would very much appreciate any assistance I can get with that. I apologize for any problems with my question, as this is my first time asking one myself; previously I've always been able to muddle through by reading other people's questions and answers.
RCurl is an interface to libcurl, and what it supports depends on the latter. It is possible that your libcurl was built with an older version of OpenSSL that does not support TLS v1.1 or v1.2. You can determine your SSL version from R like this:
RCurl::curlVersion()$ssl_version
I think that by default (i.e. with the SSL option CURL_SSLVERSION_DEFAULT), during the SSL handshake the server and client agree on the newest version that they both support. To make it work you would have to update OpenSSL to a newer version, recompile libcurl against it, and rebuild RCurl so that it picks up the update.
That said, you can enforce a particular ssl version not defined in RCurl by passing the integer value of the required option yourself. The numbers you are looking for can be deduced from the C enum defined in the curl header file on GitHub:
enum {
CURL_SSLVERSION_DEFAULT, // 0
CURL_SSLVERSION_TLSv1, /* TLS 1.x */ // 1
CURL_SSLVERSION_SSLv2, // 2
CURL_SSLVERSION_SSLv3, // 3
CURL_SSLVERSION_TLSv1_0, // 4
CURL_SSLVERSION_TLSv1_1, // 5
CURL_SSLVERSION_TLSv1_2, // 6
CURL_SSLVERSION_TLSv1_3, // 7
CURL_SSLVERSION_LAST /* never use, keep last */ // 8
};
For example:
# the data of your POST request
nameValueList = list(data1 = "data1", data2 = "data2")
CURL_SSLVERSION_TLSv1_1 <- 5L
CURL_SSLVERSION_TLSv1_2 <- 6L
# TLS 1.1
opts <- RCurl::curlOptions(verbose = TRUE,
sslversion = CURL_SSLVERSION_TLSv1_1, ...)
# TLS 1.2
opts <- RCurl::curlOptions(verbose = TRUE,
sslversion = CURL_SSLVERSION_TLSv1_2, ...)
# finally, POST the data
RCurl::postForm(URL, .params = nameValueList, .opts = opts)
In general, this may not be good practice, because the authors of cURL could decide to change the values those options have (although I believe the chances are low).
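To avoid scattering magic numbers through a script, the enum quoted above can be mirrored once as a named integer vector. This is only a sketch in base R: it assumes the enumerator order shown in the header excerpt, and the names ssl_version_names and ssl_versions are made up for illustration.

```r
# Enumerator names in the order they appear in the header quoted above.
ssl_version_names <- c(
  "CURL_SSLVERSION_DEFAULT", "CURL_SSLVERSION_TLSv1",
  "CURL_SSLVERSION_SSLv2",   "CURL_SSLVERSION_SSLv3",
  "CURL_SSLVERSION_TLSv1_0", "CURL_SSLVERSION_TLSv1_1",
  "CURL_SSLVERSION_TLSv1_2", "CURL_SSLVERSION_TLSv1_3"
)

# A C enum without explicit values counts up from 0.
ssl_versions <- setNames(seq_along(ssl_version_names) - 1L, ssl_version_names)

# Look up the integer to pass as the sslversion option.
tls12 <- ssl_versions[["CURL_SSLVERSION_TLSv1_2"]]
```

The looked-up value can then be passed as sslversion = tls12, so that if the header ever changes there is a single place to update.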

R: download data securely using TLS/SSL

Official Statements
In the past the base R download.file() was unable to work with HTTPS protocols and it was necessary to use RCurl. Since R 3.3.0:
All builds have support for https: URLs in the default methods for download.file(), url() and code making use of them. Unfortunately that cannot guarantee that any particular https: URL can be accessed. ... Different access methods may allow different protocols or use private certificate bundles ...
The download.file() help still says:
Contributed package 'RCurl' provides more comprehensive facilities to download from URLs.
(which, by the way, include cookie and header management).
Based on RCurl FAQ (look for "When I try to interact with a URL via https, I get an error"), HTTPS URLs can be managed with:
getURL(url, cainfo="CA bundle")
where CA bundle is the path to a certificate authority bundle file. One such bundle is available from the curl site itself:
https://curl.haxx.se/ca/cacert.pem
Current status
Tests are based on Windows platforms
For many HTTPS websites download.file() works as stated:
download.file(url="https://www.google.com", destfile="google.html")
download.file(url="https://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")
As regards RCurl, using the cacert.pem bundle, downloaded above, one might get an error:
library(RCurl)
getURL("https://www.google.com", cainfo = "cacert.pem")
# Error in function (type, msg, asError = TRUE) :
# SSL certificate problem: unable to get local issuer certificate
In this instance, simply removing the reference to the certificate bundle solves the problem:
getURL("https://www.google.com") # works
getURL("https://www.google.com", ssl.verifypeer=TRUE) # works
ssl.verifypeer = TRUE is used to be sure that success is not due to getURL() suppressing security. The argument is documented in RCurl FAQ.
However, in other instances, the connection fails:
getURL("https://curl.haxx.se/ca/cacert.pem")
# Error in function (type, msg, asError = TRUE) :
# error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
And similarly, using the previously downloaded bundle:
getURL("https://curl.haxx.se/ca/cacert.pem", cainfo = "cacert.pem")
# Error in function (type, msg, asError = TRUE) :
# error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
The same error happens even when suppressing the security:
getURL("https://curl.haxx.se/ca/cacert.pem", ssl.verifypeer=FALSE)
# same error as above
Questions
How to use HTTPS properly in RCurl?
As regards mere file downloads (no headers, cookies, etc.), is there any benefit in using RCurl instead of download.file()?
Has RCurl become obsolete, and should we opt for curl?
Update
The issue persists as of
R version 3.4.1 (2017-06-30) under Windows 10.
The OpenSSL bundled with RCurl is currently rather old and does not support TLS v1.2.
Yes, the curl package is OK.
Or you can use the httr package, which is a wrapper for the curl package:
> library("httr")
> GET("https://curl.haxx.se/ca/cacert.pem",config(sslversion=6,ssl_verifypeer=1))
Response [https://curl.haxx.se/ca/cacert.pem]
Date: 2017-08-16 17:07
Status: 200
Content-Type: application/x-pem-file
Size: 256 kB
<BINARY BODY>
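For comparison, roughly the same request can be made with the curl package directly. This is a sketch rather than a tested recipe: fetch_with_tls12 is a hypothetical helper name, and the option values simply mirror the httr call above (sslversion 6 being CURL_SSLVERSION_TLSv1_2).

```r
# Hypothetical helper: fetch a URL with the curl package, asking for TLS 1.2.
fetch_with_tls12 <- function(url) {
  handle <- curl::new_handle(
    sslversion     = 6L,  # CURL_SSLVERSION_TLSv1_2
    ssl_verifypeer = 1L   # keep certificate verification switched on
  )
  # Returns a list that includes status_code, headers and the raw content.
  curl::curl_fetch_memory(url, handle = handle)
}

# Usage (needs the curl package and network access):
# res <- fetch_with_tls12("https://curl.haxx.se/ca/cacert.pem")
# res$status_code
```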

Proxy setting for R

I am having trouble connecting R to the internet in my office. Maybe this is due to the LAN settings. I have tried almost every approach I came across on the web (see below), but still in vain.
Method 1: Invoking R using --internet2
Method 2: Invoking R by setting ~/Rgui.exe http_proxy=http://999.99.99.99:8080/ http_proxy_user=ask
Method 3: Setting setInternet2(TRUE)
Method4:
curl <- getCurlHandle()
curlSetOpt(.opts = list(proxy = '999.99.99.99:8080'), curl = curl)
Res <- getURL('http://www.cricinfo.com', curl = curl)
With all of the above methods I am able to install packages directly from CRAN and also to download files using the download.file command.
But using the getURL (RCurl), readHTMLTable (XML), and htmlTreeParse (XML) commands, I am unable to extract web data. I get a ~<HEAD>\n<TITLE>Access Denied</TITLE>\n</HEAD>~ error.
How to set LAN proxy settings for XML package in R?
On Mac OS, I found the best solution here. Quoting the author, two simple steps are:
1) Open Terminal and do the following:
export http_proxy=http://staff-proxy.ul.ie:8080
export HTTP_PROXY=http://staff-proxy.ul.ie:8080
2) Run R and do the following:
Sys.setenv(http_proxy="http://staff-proxy.ul.ie:8080")
double-check this with:
Sys.getenv("http_proxy")
I am behind university proxy, and this solution worked perfectly. The major issue is to export the items in Terminal before running R, both in upper- and lower-case.
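The R side of this can be wrapped in a small helper that sets the lower- and upper-case variables together for the current session. This is only a sketch; set_proxy_env is a hypothetical name, and it does not replace exporting the variables in Terminal before R starts.

```r
# Hypothetical helper: set the proxy variables in both cases for this session.
set_proxy_env <- function(proxy_url) {
  Sys.setenv(
    http_proxy  = proxy_url, HTTP_PROXY  = proxy_url,
    https_proxy = proxy_url, HTTPS_PROXY = proxy_url
  )
  # Return what R now sees, so the result can be checked at a glance.
  Sys.getenv(c("http_proxy", "HTTP_PROXY", "https_proxy", "HTTPS_PROXY"))
}

proxy_vars <- set_proxy_env("http://staff-proxy.ul.ie:8080")
```

Variables set this way only affect the running R session and the processes it starts.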
For RStudio you just have to do this:
First, open RStudio as usual and select from the top menu:
Tools-Global Options-Packages
Uncheck the option: Use Internet Explorer library/proxy for HTTP
Then close RStudio. Furthermore, you have to:
Find the .Renviron file on your computer; most probably you will find it here: C:\Users\your user name\Documents. Note that if it does not exist, you can create it by running this command in RStudio:
file.edit('~/.Renviron')
Add these two lines to the beginning of the file:
options(internet.info = 0)
http_proxy="http://user_id:password@your_proxy:your_port"
And that's it..??!!!
The problem is with your curl options – the RCurl package doesn't seem to use internet2.dll.
You need to specify the port separately, and will probably need to give your user login details as network credentials, e.g.,
opts <- list(
proxy = "999.999.999.999",
proxyusername = "mydomain\\myusername",
proxypassword = "mypassword",
proxyport = 8080
)
getURL("http://stackoverflow.com", .opts = opts)
Remember to escape any backslashes in your password. You may also need to wrap the URL in a call to curlEscape.
I had the same problem at my office and solved it by adding the proxy to the target of the R shortcut: right-click the R icon, choose Properties, and in the Target field add
"C:\Program Files\R\your_R_version\bin\Rgui.exe" http_proxy=http://user_id:password@your_proxy:your_port/
Be sure to use the directory where R is installed on your machine. That works for me. Hope this helps.
This post pertains to R proxy issues on *nix. You should know that R has many libraries/methods to fetch data over internet.
For 'curl', 'libcurl', 'wget' etc, just do the following:
Open a terminal. Type the following command:
sudo gedit /etc/R/Renviron.site
Enter the following lines:
http_proxy='http://username:password@abc.com:port/'
https_proxy='https://username:password@xyz.com:port/'
Replace username, password, abc.com, xyz.com and port with these settings specific to your network.
Quit R and launch again.
This should solve your problem with 'libcurl' and 'curl' method. However, I have not tried it with 'httr'. One way to do that with 'httr' only for that session is as follows:
library(httr)
set_config(use_proxy(url="abc.com",port=8080, username="username", password="password"))
You need to substitute the settings specific to your network in the relevant fields.
Inspired by all the related responses on the internet, I've finally found a way to correctly configure the proxy for R and RStudio.
There are several steps to follow; perhaps some of them are unnecessary, but the combination works!
Add environment variables http_proxy and https_proxy with proxy details.
variable name: http_proxy
variable value: https://user_id:password@your_proxy:your_port/
variable name: https_proxy
variable value: https://user_id:password@your_proxy:your_port
If you start R from a desktop icon, you can add the --internet2 flag to the target line (right click -> Properties)
e.g. "C:\Program Files\R\R-2.8.1\bin\Rgui.exe" --internet2
For RStudio you just have to do this:
First, open RStudio as usual and select from the top menu:
Tools-Global Options-Packages
Uncheck the option: Use Internet Explorer library/proxy for HTTP
Find the .Renviron file on your computer; most probably you will find it here: C:\Users\your user name\Documents.
Note that if it does not exist, you can create it by running this command in R:
file.edit('~/.Renviron')
Then add these six lines to the beginning of the file:
options(internet.info = 0)
http_proxy = https://user_id:password@your_proxy:your_port
http_proxy_user = user_id:password
https_proxy = https://user_id:password@your_proxy:your_port
https_proxy_user = user_id:password
ftp_proxy = user_id:password@your_proxy:your_port
Restart R. Type the following commands in R to check that the configuration above works:
Sys.getenv("http_proxy")
Sys.getenv("http_proxy_user")
Sys.getenv("https_proxy")
Sys.getenv("https_proxy_user")
Sys.getenv("ftp_proxy")
Now you can install the packages as you want by using the command like:
install.packages("mlr",method="libcurl")
It's important to add method="libcurl", otherwise it won't work.
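Before restarting R, the NAME=value lines can be tried out on a throwaway file with readRenviron(). The sketch below uses placeholder credentials and a made-up proxy host, and only demonstrates that such lines are read into the session's environment; it does not test the proxy itself.

```r
# Write a throwaway .Renviron-style file; the proxy details are placeholders.
renviron_path <- tempfile(fileext = ".Renviron")
writeLines(
  c("http_proxy=http://user_id:password@proxy.example.com:8080/",
    "https_proxy=http://user_id:password@proxy.example.com:8080/"),
  renviron_path
)

# readRenviron() loads NAME=value lines into the current session.
readRenviron(renviron_path)
loaded <- Sys.getenv(c("http_proxy", "https_proxy"))
```

Once the values look right, the same lines can be copied into the real ~/.Renviron, which R reads at startup.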
On Windows 7 I solved this by going into my environment settings (try this link for how) and adding user variables http_proxy and https_proxy with my proxy details.
If you start R from a desktop icon, you can add the --internet2 flag to the target line (right click -> Properties) e.g.
"C:\Program Files\R\R-2.8.1\bin\Rgui.exe" --internet2
Simplest way to get everything working in RStudio under Windows 10:
Open up Internet Explorer, select Internet Options:
Open editor for Environment variables:
Add a variable HTTP_PROXY in form:
HTTP_PROXY=http://username:password@localhost:port/
Example:
HTTP_PROXY=http://John:JohnPassword@localhost:8080/
RStudio should work:
Tried all of these and also the solutions using netsh, winhttp etc.
Geek On Acid's answer helped me download packages from the server but none of these solutions worked for using the package I wanted to run (twitteR package).
The best solution is to use software that lets you configure a system-wide proxy.
FreeCap (free) and Proxifier (trial) worked perfectly for me at my company.
Please note that you need to remove proxy settings from your browser and any other apps that you have configured to use proxy as these tools provide system-wide proxy for all network traffic from your computer.
Find your R home with R.home("home")
Add following lines to Renviron.site in your R home
http_proxy=http://proxy.dom.com/
http_proxy_user=user:passwd
https_proxy=https://proxy.dom.com/
https_proxy_user=user:passwd
Open R -> R reads Renviron.site in its home -> it should work :)
My solution on a Windows 7 (32bit). R version 3.0.2
Sys.setenv(http_proxy="http://proxy.*_add_your_proxy_here_*:8080")
setInternet2(TRUE)
updateR(2)

How do I tell the R interpreter how to use the proxy server?

I'm trying to get R (running on Windows) to download some packages from the Internet, but the download fails because I can't get it to correctly use the necessary proxy server. The output text when I try the Windows menu option Packages > Install package(s)... and select a CRAN mirror is:
> utils:::menuInstallPkgs()
--- Please select a CRAN mirror for use in this session ---
Warning: unable to access index for repository http://cran.opensourceresources.org/bin/windows/contrib/2.12
Warning: unable to access index for repository http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/2.12
Error in install.packages(NULL, .libPaths()[1L], dependencies = NA, type = type) :
no packages were specified
In addition: Warning message:
In open.connection(con, "r") :
cannot open: HTTP status was '407 Proxy Authentication Required'
I know the address and port of the proxy, and I also know the address of the automatic configuration script. I don't know what the authentication is called, but when using the proxy (in a browser and some other applications), I enter a username and password in a dialog window that pops up.
To set the proxy, I tried each of the following:
Sys.setenv(http_proxy="http://proxy.example.com:8080")
Sys.setenv("http_proxy"="http://proxy.example.com:8080")
Sys.setenv(HTTP_PROXY="http://proxy.example.com:8080")
Sys.setenv("HTTP_PROXY"="http://proxy.example.com:8080")
For authentication, I similarly tried setting the http_proxy_user environment variable to:
ask
user:passwd
Leaving it untouched
Am I using the right commands in the right way?
You have two options:
Use --internet2 or setInternet2(TRUE) and set the proxy details in the Control Panel, under Internet Options
Do not use --internet2 (or call setInternet2(FALSE)), and specify the environment variables instead
EDIT: One catch is that you cannot switch between options 1 and 2 once you have tried one in a session. That is, if you run setInternet2(TRUE) and then try to use it, e.g. with install.packages('reshape2'), and this fails, you cannot then call setInternet2(FALSE). You have to restart the R session.
As of R version 3.2.0, the setInternet2 function can set internet connection settings and change them within the same R session. No need to restart.
When using option 2, one way (which is nice and compact) to specify the username and password is http_proxy="http://user:password@proxy.example.com:8080/"
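If the username or password contains characters such as @ or :, a compact URL of the form scheme://user:password@host:port/ becomes ambiguous. One way around this, sketched below, is to percent-encode the credentials before building the string. build_proxy_url is a hypothetical helper, and the approach assumes that whatever reads the variable decodes percent-encoded credentials (libcurl does this for proxy URLs).

```r
# Hypothetical helper: build scheme://user:password@host:port/ with the
# user name and password percent-encoded.
build_proxy_url <- function(user, password, host, port, scheme = "http") {
  sprintf(
    "%s://%s:%s@%s:%d/",
    scheme,
    utils::URLencode(user, reserved = TRUE),
    utils::URLencode(password, reserved = TRUE),
    host,
    as.integer(port)
  )
}

# Example with a password containing "@" and ":" (placeholder values).
proxy_url <- build_proxy_url("user", "p@ss:word", "proxy.example.com", 8080)
Sys.setenv(http_proxy = proxy_url)
```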
In the past, I have had most luck with option 2
If you want internet2 to be used every time you use R, you could add the following line to the Rprofile.site file, which is located in R.x.x\etc\Rprofile.site
utils::setInternet2(TRUE)
I solved my problem by editing the .Renviron file, as documented in Proxy setting for R.
EDITED
The solutions based on setInternet2 do not work with recent R versions because setInternet2 has been declared defunct.
I'm using 4.2.1 (on Windows 11 Pro), and I never had any problems in previous versions.
So to solve the problem you need to modify some config files, in order to fix the proxy issue not only for package installation but also, in general, for access to remote resources (i.e. boundary maps in my case).
The question "Proxy setting for R" collects a lot of solutions. I found that this one solved both my problems (package installation and remote resources) by explaining step by step how to edit the .Renviron file.
Other solutions based on customizing the Renviron.site file did not work for me.
install.packages("RCurl")
that will solve your problem.
