How to set a proxy for the XML package in R

How do I set up a proxy for the XML package?
The RCurl package works fine if I set:
options(RCurlOptions = list(proxy = "111.22.33.44", proxyport = 0000))
But it doesn't work for the XML package's functions.
Setting R to use Internet Explorer's proxy settings doesn't help either:
setInternet2(TRUE)
I have also added setInternet2(TRUE) to my .Rprofile, but it still fails to pick up the proxy. So how do I ensure the proxy is set globally, or rather, how do I make it work for XML package functions such as readHTMLTable?

Since RCurl is working,
try downloading the HTML page via getURL and then parsing it with readHTMLTable:
require(RCurl)
require(XML)
url <- "yourURL.com"
doc_raw <- getURL(url)        # fetch the page through RCurl, which honours your proxy options
tab <- readHTMLTable(doc_raw) # parse the already-downloaded HTML; no second network request is made
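If you want the proxy applied per call rather than through options(), you can also hand the same settings straight to getURL via its .opts argument. A minimal sketch; the address is the placeholder from the question and the port (8080) is hypothetical:
library(RCurl)
library(XML)
# per-call proxy settings; address and port are placeholders
opts <- list(proxy = "111.22.33.44", proxyport = 8080)
doc_raw <- getURL("http://example.com/tables.html", .opts = opts)
tab <- readHTMLTable(doc_raw)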

Related

In R how do you authenticate with an NTLM proxy when using the curl package

The R 4.2.0 update made libcurl the default method for the download.file function (and removed the wininet method for some URL schemes). The same is true for the download.packages function.
My workplace uses a proxy.pac file in conjunction with NTLM authentication for the proxy. As a result, the changes to download.file make it fail with this type of proxy unless you specifically choose the wininet method (which only works for some URL schemes).
One consequence of this combination of settings is that users have to manually change the default package repository to an internal CRAN mirror to use the package installer in RStudio at all.
I've managed to use the curl library to download files through the proxy:
library(curl)

base_directory <- "<save_path>"
url <- "url_page_to_download"
handle <- curl::new_handle()
handle_opts <- list(
  proxy = curl::ie_get_proxy_for_url(url), # figures out the relevant proxy string for the url (or url stub) you are using
  proxyauth = 8,                           # value of CURLAUTH_NTLM from curl_symbols()
  proxyuserpwd = ":"                       # specifically ":" to force Windows to do the negotiation; equivalent to sending no username and password
)
curl::handle_setopt(handle, .list = handle_opts)
data_file_location <- paste0(base_directory, "state.txt")
curl::curl_download(url, destfile = data_file_location, quiet = FALSE, handle = handle)
I would like to find a way to do this using the libcurl method of the download.packages and download.file functions, which (from what I can tell) involves setting some environment variables, but I can't find the combination of environment variables that makes it work.
How can I replicate this with the built-in functions?
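For what it's worth, libcurl (and therefore method = "libcurl") reads the standard proxy environment variables, so one avenue is to set those from R before downloading. A minimal sketch with a hypothetical proxy address; note that the NTLM negotiation may still not happen automatically this way:
# placeholder proxy address; adjust to what ie_get_proxy_for_url() returned for you
Sys.setenv(http_proxy  = "http://proxy.example.com:8080",
           https_proxy = "http://proxy.example.com:8080")
download.file("https://example.com/state.txt",  # placeholder URL
              destfile = "state.txt", method = "libcurl")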

Attempting to download files from SFTP using R

I'm trying to introduce R in the workplace to save a bit of time on all the data churning we do.
A lot of the files we receive are sent to us via SFTP, as they contain sensitive information.
I've looked around on Stack Overflow and Google, but nothing seems to work for me. I tried the RCurl library, following an example I found online, but it doesn't let me include the port (22) as part of the login details.
library(RCurl)
protocol <- "sftp"
server <- "hostname"
userpwd <- "user:password"
tsfrFilename <- "/Reports/Excelfile.xlsx" # leading slash so the pasted URL is well-formed
outptFilename <- "~/Test.xlsx"
url <- paste0(protocol, "://", server, tsfrFilename)
data <- getURL(url = url, userpwd = userpwd)
I end up getting the error:
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
embedded nul in string:
Any help would be greatly appreciated as this will save us loads of time!
Thanks,
Shan
Looks like a similar situation here: Using R to download SAS file from ftp-server
I'm no expert in R, but there it looks like getBinaryURL() worked instead of getURL() in the example given.
Hope that helps,
M
Note that there are two packages, RCurl and curl. With RCurl, I successfully used key files to connect via SFTP:
opts <- list(
  ssh.public.keyfile = pubkey,      # file name of the public key
  ssh.private.keyfile = privatekey, # file name of the private key
  keypasswd = keypasswd             # optional passphrase for the private key
)
RCurl::getURL(url = uri, .opts = opts, curl = RCurl::getCurlHandle())
For this to work, you need to create the key files first, e.g. via PuTTY (puttygen) or similar.
I too was having problems specifying the port when using the getURI() and getURL() functions.
To specify the port, you simply pass it as port = #### instead of port(####). For example:
data <- getURI(url = url,
               userpwd = userpwd,
               port = 22)
Now, as @MarkThomas pointed out, whenever you get an encoding error, try getBinaryURL() instead of getURI(). In most cases, this will let you download SAS files as well as .csv files encoded in UTF-8 or LATIN1!
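To round that out: getBinaryURL() returns a raw vector, so you have to write it to disk yourself. A minimal sketch, reusing the placeholder credentials from the question:
library(RCurl)
# fetch the file as raw bytes, which sidesteps the embedded-nul problem
bin <- getBinaryURL(url = "sftp://hostname/Reports/Excelfile.xlsx",
                    userpwd = "user:password", port = 22)
# write the raw vector out as a binary file
writeBin(bin, con = "~/Test.xlsx")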

Reading an online xlsx file into R

I am trying to download spreadsheets from the AQR data library directly into R.
I have this link: http://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx which prompts a download. However, when I try the following code:
> url1<-"http://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx"
> download.file(url1,destfile="example.xlsx")
I get this error:
trying URL 'http://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx'
Error in download.file(url1, destfile = "example.xlsx") : cannot open URL 'http://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx'
https://www.aqr.com/library/data-sets/value-and-momentum-everywhere-portfolios-monthly is the page from which I am trying to download the data (under the full set data link).
Could you provide some guidance?
It looks like that link redirects to https, which download.file does not support by default. If you have wget or curl installed, you can use
download.file("https://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx",
              "example.xlsx",
              method = "wget")
or
download.file("https://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx",
              "example.xlsx",
              method = "curl")
These and other options are discussed at Download a file from HTTPS using download.file()
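On R 3.2.0 and later you can also request the libcurl method directly, which supports https without external programs; a hedged sketch:
download.file("https://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx",
              "example.xlsx",
              method = "libcurl",
              mode = "wb")  # binary mode so the xlsx is not corrupted on Windows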
I'm not quite sure what is causing the problem for you, but the following worked for me:
library(XLConnect)
##
con <- "http://www.aqr.com/~/media/files/data-sets/value-and-momentum-everywhere-portfolios-monthly.xlsx"
download.file(con, "xlsxFile.xlsx", mode = "wb") # binary mode so the xlsx is not corrupted
##
newWB <- loadWorkbook(
  file = "xlsxFile.xlsx",
  create = FALSE)
##
R> getSheets(newWB)
[1] "VME Portfolios" "Definitions" "Data Sources" "Disclosures"

Create a remote directory using SFTP / RCurl

Is it possible to create a directory on an SFTP site using the RCurl package? I found the sftp_create_dirs function, but I could not find an example of how to use it.
I tried setting the ftp.create.missing.dirs option to TRUE, as in
library(RCurl)
opts <- list(ftp.create.missing.dirs = TRUE,
             ssh.public.keyfile = mypubkey, ssh.private.keyfile = myprivatekey)
ftpUpload("myfile.txt", "sftp://me@www.host.org/newdir/myfile.txt", .opts = opts)
This works if newdir exists, but fails if it does not.
Any hint appreciated!
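One workaround worth trying: libcurl can run SFTP quote commands (CURLOPT_QUOTE) before a transfer, and RCurl passes a quote option through to it. A hedged sketch, assuming the same key files as above, that creates the directory explicitly before uploading:
library(RCurl)
# run an SFTP mkdir on the server before the upload itself
opts <- list(quote = "mkdir /newdir",
             ssh.public.keyfile = mypubkey, ssh.private.keyfile = myprivatekey)
curlPerform(url = "sftp://me@www.host.org/", .opts = opts)
# the upload into the new directory should now succeed
ftpUpload("myfile.txt", "sftp://me@www.host.org/newdir/myfile.txt",
          .opts = list(ssh.public.keyfile = mypubkey, ssh.private.keyfile = myprivatekey))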

Connect to the Twitter Streaming API using R

I just started playing around with the Twitter Streaming API; using the command line, I redirect the raw JSON responses to a file with the command below:
curl https://stream.twitter.com/1/statuses/sample.json -u USER:PASSWORD -o "somefile.txt"
Is it possible to stay completely within R and use RCurl to do the same thing? Instead of just saving the output to a file, I would like to parse each response as it is returned. I have parsed Twitter search results in the past, but I would like to do this as each response is received, essentially applying a function to each JSON response.
Thanks in advance.
EDIT: Here is the code that I have tried in R (I am on Windows, unfortunately). I need to include the reference to the .pem file to avoid an error. However, the code just "runs" and I cannot seem to see what is returned. I have tried print, cat, etc.
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")
getURL("https://stream.twitter.com/1/statuses/sample.json",
userpwd="USER:PWD",
cainfo = "cacert.pem")
I was able to figure out the basics; hopefully this helps.
#==============================================================================
# Streaming twitter using RCurl
#==============================================================================
library(RCurl)
library(rjson)
# set the working directory
setwd("C:\\")
#### redirects output to a file
WRITE_TO_FILE <- function(x) {
  if (nchar(x) > 0) {
    write.table(x, file = "Twitter Stream Capture.txt", append = TRUE, row.names = FALSE, col.names = FALSE)
  }
}
### windows users will need to get this certificate to authenticate
download.file(url = "http://curl.haxx.se/ca/cacert.pem", destfile = "cacert.pem")
### write the raw JSON data from the Twitter firehose to a text file
getURL("https://stream.twitter.com/1/statuses/sample.json",
       userpwd = "USER:PASSWORD",
       cainfo = "cacert.pem",
       write = WRITE_TO_FILE)
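If the goal is to apply a function to each response rather than just dumping the stream, the same write callback can parse as it goes. A minimal sketch, assuming each chunk holds one or more newline-delimited JSON objects (partial chunks are simply skipped by the tryCatch):
library(RCurl)
library(rjson)
# parse each streamed chunk and print the tweet text
PARSE_TWEET <- function(x) {
  for (line in strsplit(x, "\r\n")[[1]]) {
    if (nchar(line) > 0) {
      tweet <- tryCatch(fromJSON(line), error = function(e) NULL) # skip partial JSON
      if (!is.null(tweet$text)) cat(tweet$text, "\n")
    }
  }
}
getURL("https://stream.twitter.com/1/statuses/sample.json",
       userpwd = "USER:PASSWORD",
       cainfo = "cacert.pem",
       write = PARSE_TWEET)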
Try the twitteR package, which wraps the Twitter API for R.
install.packages('twitteR')
library(twitteR)
I think this is what you need.
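Note that twitteR talks to the REST (search) API rather than the streaming API; a hedged sketch of typical usage, with placeholder credentials:
library(twitteR)
# OAuth credentials from your Twitter developer app (placeholders)
setup_twitter_oauth("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")
tweets <- searchTwitter("#rstats", n = 25)
sapply(tweets, function(t) t$getText())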
