R tools to read/write files between local and remote machines

I want to work on a cluster from my local Rstudio. So far, I'm using the following code to read a file:
to_read <- read.table(
pipe('ssh cluster_A "cat /path_on_cluster/file.txt"'),
header = TRUE)
The autologin on cluster_A works fine, as ssh picks up my ~/.ssh/id_rsa key directly.
There are two issues:
It doesn't work with fread, so reading can be quite slow.
I haven't found a way to write files, only to read them.
I was hoping to use scp as a workaround for these issues, with something like this:
library(RCurl)
scp(host = "cluster_A",
path = "/path_on_cluster/file.txt",
keypasswd = NA, user = "user_name", rsa = TRUE,
key = "~/.ssh/id_rsa.pub")
But I can't find a way to make it work, as scp from R fails with the error "Protocol "scp" not supported or disabled in libcurl". I couldn't find a good answer through a Google search.
If anyone has a method to use fread and write.table (or similar) between my local machine and a remote cluster, that would be very helpful!
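One possible approach, sketched under assumptions rather than tested against cluster_A: fread() accepts a shell command through its cmd argument, and a write-mode pipe() can stream write.table() output into a remote cat. The helper names ssh_cmd, read_remote and write_remote are hypothetical, and the remote paths are assumed to contain no spaces or shell metacharacters.

```r
# Hypothetical helper: build an ssh command that runs `remote_cmd` on `host`.
ssh_cmd <- function(host, remote_cmd) {
  paste("ssh", host, shQuote(remote_cmd))
}

# Read: fread() runs the command given in `cmd` and parses its output.
# Requires the data.table package to be installed.
read_remote <- function(host, path, ...) {
  data.table::fread(cmd = ssh_cmd(host, paste("cat", path)), ...)
}

# Write: open a write-mode pipe whose remote end redirects stdin to a file.
write_remote <- function(x, host, path, ...) {
  con <- pipe(ssh_cmd(host, paste("cat >", path)), open = "w")
  on.exit(close(con))
  write.table(x, con, ...)
}

# Example calls (need a reachable host with key-based login):
# to_read <- read_remote("cluster_A", "/path_on_cluster/file.txt")
# write_remote(to_read, "cluster_A", "/path_on_cluster/copy.txt", row.names = FALSE)
```

Both helpers rely on the same key-based login that already works for the pipe() call in the question.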

Related

How can I read a json file from a remote server in R?

So I have a collection of JSON files located on my local machine that I am currently reading in using the command
file <- tbl_df(ndjson::stream_in("path/to/file.json"))
I have copied these files to a linux server (using WinSCP) and I want to stream them in to my R session just like I did in the above code with ndjson. When searching for ways to do this I came across one method using RCurl that looked like this
file <- scp(host = "hostname", "path/to/file.json", "pass", "user")
but that returned an error
Error in function (type, msg, asError = TRUE) : Authentication failure
but either way I want to avoid copying my passphrase into my R script, as others will see this script. I also came across a method suggesting this
d <- read.table(pipe('ssh -l user host "cat path/to/file.json"'))
however this command returned the error
no lines available in input
and I believe read.table would cause me issues anyway. Does anyone know a way I could read newline-delimited JSON files from a remote server into an R session? Thank you in advance! Let me know if I can make my question clearer.
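A sketch of one way to do this, swapping ndjson for jsonlite, because jsonlite::stream_in() reads newline-delimited JSON from a connection and can therefore consume the ssh pipe directly. It assumes key-based ssh login is already set up, so no passphrase appears in the script. The function name stream_ndjson_from_cmd is hypothetical, and the user, host and path values are placeholders.

```r
# Hypothetical helper: run a shell command and parse its output as NDJSON.
# Requires the jsonlite package to be installed.
stream_ndjson_from_cmd <- function(cmd) {
  # stream_in() opens the unopened pipe itself and closes it when done
  jsonlite::stream_in(pipe(cmd), verbose = FALSE)
}

# Remote use (placeholder user, host and path; needs key-based ssh login):
# d <- stream_ndjson_from_cmd('ssh -l user host "cat path/to/file.json"')
```

The result is a data frame, so it can be wrapped in tbl_df() as in the local version.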

Use R's file.copy() to copy to a network (shared) drive

I need to be able to copy files from my local drive to a network drive using R. The syntax I'm using is:
orig_dir <- "C:/Users/bshelto1/Documents/testtoremove1/pcr"
target_dir <- "//nasgwss0075pn/payment_intergrity/Fraud Business Intelligence/Operations Documents/Tableau Server/Source Data/PCR Dashboards/tableau_data"
file.copy(paste0(orig_dir, "/total_pcr_tins.csv"), target_dir)
There is no error message. It just prints "TRUE", like it worked.
When I change target_dir to another local folder, file.copy() works fine. I didn't foresee this being an issue, since I can write to and download from the network drive in R without any problems.
What am I missing?
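Since file.copy() reports TRUE here, a diagnostic sketch may help narrow things down: copy the file to an explicit destination path, then confirm that it exists there and that its checksum matches the source. This is only a check, not a fix, and copy_and_verify is a hypothetical helper name.

```r
# Hypothetical helper: copy `src` into `target_dir` and verify the result.
copy_and_verify <- function(src, target_dir) {
  dest <- file.path(target_dir, basename(src))
  copied <- file.copy(src, dest, overwrite = TRUE)
  # Compare MD5 checksums of the source and the copy
  same <- copied && file.exists(dest) &&
    unname(tools::md5sum(src)) == unname(tools::md5sum(dest))
  list(copied = copied,
       dest = normalizePath(dest, mustWork = FALSE),
       verified = isTRUE(same))
}

# res <- copy_and_verify(paste0(orig_dir, "/total_pcr_tins.csv"), target_dir)
```

The dest element shows the resolved path R actually wrote to, which can be compared with the folder being inspected.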

How to read a file that is located on a linux server using R

I have a CSV file that I want to work on. I tried to read it using this code:
d = read.table( pipe( 'ssh don@140.184.134.189 "cat cluster.csv"' ), header = T )
But I get no result, only this message:
"error in read table"
I am never asked for my password.
Also, how do you run an R script fes.r that is located on the same server?
You can first try this, continuing along the lines you are on:
> d <- read.table(pipe('ssh -l don 140.184.134.189 "cat cluster.csv"'))
don@140.184.134.189 password: # type password here
If you don't get prompted for a password, then there is likely a configuration problem with your ssh. Please note that ssh has to be installed and in your $PATH (meaning R can invoke it from anywhere it is running).
If this option doesn't work, then you can try using scp from the RCurl package.
Try the following:
x = scp("140.184.134.189", "cluster.csv", "PASSPHRASE", user="don")
Here you should replace "PASSPHRASE" with the password of your local SSH key.
One other thing to check is whether "cluster.csv" is really the correct path to your file on the remote server. But it seems that you are not even getting this far, so fix the ssh problem first.
Hat tip to this Stack Overflow post for inspiration.
You could take a different approach and install RStudio Server on your remote Linux machine.
You can avoid the password problem by setting up an ssh key pair, and adding your public key to the ~/.ssh/authorized_keys file on the server.
You can see how to run an R script from command line here: Run R script from command line
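To make the last two points concrete, here is a rough sketch of launching the script over ssh from a local R session. It assumes key-based login is already working and that Rscript is on the remote PATH. The helpers remote_rscript_args and run_remote_script are hypothetical, and the user, host and script names are taken from the question.

```r
# Build the argument vector for: ssh user@host 'Rscript <script>'
remote_rscript_args <- function(user, host, script) {
  c(paste0(user, "@", host), shQuote(paste("Rscript", script)))
}

# Run the script remotely and capture its console output as a character vector
run_remote_script <- function(user, host, script) {
  system2("ssh", remote_rscript_args(user, host, script), stdout = TRUE)
}

# out <- run_remote_script("don", "140.184.134.189", "fes.r")
```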

Copying file to sharepoint library in R

I am trying to copy files from a network drive location to a sharepoint library in R. The sharepoint library location requires user authentication and I was wondering how I can copy these files and pass authentication in code. A simple file.copy does not work. I was attempting to use the getURL() function from RCurl library but that hasn't worked either. I was wondering how I can accomplish this task - copying files while passing authentication.
Here are some code snippets that I have tried so far:
library(RCurl)
from <- "filename"
to <- "\\\\sharepoint.company.com\\Directory" #First attempt with just sharepoint location
to <- "file://sharepoint.company.com/Directory" #Another attempt with different format
h = getCurlHandle(header = TRUE, userpwd = "username:password")
getURL(to, verbose = TRUE, curl = h)
status <- file.copy(from, to)
Thank you!
Not the most elegant solution but if you're looking to save into a single library on SharePoint, you can first map that library as a drive on your local machine.
Simply use setwd() to point to whatever drive letter you mapped the library to. You can then treat that Sharepoint library as if it were any other shared drive location, reading and writing files from/to it.
I just use the following function to copy files to SharePoint. The only issue will be the file that was transferred will remain checked-out until the File is manually Checked-in for others to use.
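saveToSharePoint <- function(fileName)
{
# Build the local path and the target URL with paste0(), so that no space is inserted before the file name
src <- paste0("/home/username/FolderNameWhereTheFileToTransferExists/", fileName)
dest <- paste0("teamsites.OrganizationName.com/sites/PageTitle/Documents/UserDocumentation/FolderNameWhereTheFileNeedsToBeCopied/", fileName)
# Quote both so that file names containing spaces survive the shell
cmd <- paste("curl --max-time 7200 --connect-timeout 7200 --ntlm --user", "username:password",
"--upload-file", shQuote(src), shQuote(dest))
system(cmd)
}
saveToSharePoint("SomeFileName.Ext")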
If you have SharePoint online, you can navigate to that library and click on the "Sync to Computer" button (has an icon with arrows and a computer). Then you can have this as a oneDrive and write directly to it.

Proxy setting for R

I am having trouble connecting R to the internet at my office. This may be due to the LAN settings. I have tried almost every method I came across on the web (see below), but in vain.
Method1: Invoking R using --internet2
Method2: Invoking R by setting ~/Rgui.exe http_proxy=http://999.99.99.99:8080/ http_proxy_user=ask
Method3: Calling setInternet2(TRUE)
Method4:
curl <- getCurlHandle()
curlSetOpt(.opts = list(proxy = '999.99.99.99:8080'), curl = curl)
Res <- getURL('http://www.cricinfo.com', curl = curl)
With all of the above methods I am able to install packages directly from CRAN and to download files using the download.file command.
But using the getURL (RCurl), readHTMLTable (XML) and htmlTreeParse (XML) commands I am unable to extract web data. I get an ~<HEAD>\n<TITLE>Access Denied</TITLE>\n</HEAD>~ error.
How to set LAN proxy settings for XML package in R?
On Mac OS, I found the best solution here. Quoting the author, two simple steps are:
1) Open Terminal and do the following:
export http_proxy=http://staff-proxy.ul.ie:8080
export HTTP_PROXY=http://staff-proxy.ul.ie:8080
2) Run R and do the following:
Sys.setenv(http_proxy="http://staff-proxy.ul.ie:8080")
double-check this with:
Sys.getenv("http_proxy")
I am behind university proxy, and this solution worked perfectly. The major issue is to export the items in Terminal before running R, both in upper- and lower-case.
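The R-side step can be wrapped in a small helper, sketched here for the current session only. The source sets just http_proxy; setting the https and upper-case variants as well is an assumption on my part, and set_session_proxy is a hypothetical name.

```r
# Hypothetical helper: set the proxy variables for the current R session only
set_session_proxy <- function(proxy_url) {
  Sys.setenv(http_proxy = proxy_url, HTTP_PROXY = proxy_url,
             https_proxy = proxy_url, HTTPS_PROXY = proxy_url)
  # Return the values R now sees, so they can be checked
  invisible(Sys.getenv(c("http_proxy", "https_proxy")))
}

set_session_proxy("http://staff-proxy.ul.ie:8080")
Sys.getenv("http_proxy")
```

These settings disappear when the session ends, so the exports in Terminal are still needed for anything launched outside R.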
For RStudio, you just have to do this:
First, open RStudio as usual and select from the top menu:
Tools > Global Options > Packages
Uncheck the option: Use Internet Explorer library/proxy for HTTP
Then close RStudio. After that:
Find the .Renviron file on your computer; you will most probably find it in C:\Users\your user name\Documents. If it does not exist, you can create it by running this command in RStudio:
file.edit('~/.Renviron')
Add this line at the top of the file:
http_proxy="http://user_id:password@your_proxy:your_port"
The options() call is R code rather than a NAME=value entry, so put it in ~/.Rprofile instead of .Renviron:
options(internet.info = 0)
And that's it..??!!!
The problem is with your curl options – the RCurl package doesn't seem to use internet2.dll.
You need to specify the port separately, and will probably need to give your user login details as network credentials, e.g.,
opts <- list(
proxy = "999.999.999.999",
proxyusername = "mydomain\\myusername",
proxypassword = "mypassword",
proxyport = 8080
)
getURL("http://stackoverflow.com", .opts = opts)
Remember to escape any backslashes in your password. You may also need to wrap the URL in a call to curlEscape.
I had the same problem at my office and solved it by adding the proxy to the target of the R shortcut: right-click the R icon, choose Properties, and in the Target field add
"C:\Program Files\R\your_R_version\bin\Rgui.exe" http_proxy=http://user_id:password@your_proxy:your_port/
Be sure to use the directory where R is installed on your machine. That works for me. Hope this helps.
This post pertains to R proxy issues on *nix. You should know that R has many libraries/methods to fetch data over internet.
For 'curl', 'libcurl', 'wget' etc, just do the following:
Open a terminal. Type the following command:
sudo gedit /etc/R/Renviron.site
Enter the following lines:
http_proxy='http://username:password@abc.com:port/'
https_proxy='https://username:password@xyz.com:port/'
Replace username, password, abc.com, xyz.com and port with these settings specific to your network.
Quit R and launch again.
This should solve your problem with 'libcurl' and 'curl' method. However, I have not tried it with 'httr'. One way to do that with 'httr' only for that session is as follows:
library(httr)
set_config(use_proxy(url="abc.com",port=8080, username="username", password="password"))
You need to substitute the settings specific to your network in the relevant fields.
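The username:password@host:port form used above breaks when the credentials themselves contain characters such as "@" or ":". A minimal sketch of assembling the URL with percent-encoded credentials follows. build_proxy_url is a hypothetical helper, and whether a particular proxy accepts percent-encoded credentials is an assumption that needs checking on your network.

```r
# Hypothetical helper: assemble a proxy URL, percent-encoding the credentials
build_proxy_url <- function(user, password, host, port, scheme = "http") {
  # reserved = TRUE also encodes characters such as "@" and ":"
  enc <- function(x) utils::URLencode(x, reserved = TRUE)
  sprintf("%s://%s:%s@%s:%s/", scheme, enc(user), enc(password), host, port)
}

# Placeholder values; substitute the settings specific to your network
proxy <- build_proxy_url("username", "p@ss:word", "abc.com", 8080)
Sys.setenv(http_proxy = proxy)
```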
Inspired by all the responses related on the internet, finally I've found the solution to correctly configure the Proxy for R and Rstudio.
There are several steps to follow, perhaps some of the steps are useless, but the combination works!
Add environment variables http_proxy and https_proxy with proxy details.
variable name: http_proxy
variable value: https://user_id:password@your_proxy:your_port/
variable name: https_proxy
variable value: https://user_id:password@your_proxy:your_port
If you start R from a desktop icon, you can add the --internet2 flag to the target line (right click -> Properties)
e.g. "C:\Program Files\R\R-2.8.1\bin\Rgui.exe" --internet2
For RStudio, you just have to do this:
First, open RStudio as usual and select from the top menu:
Tools > Global Options > Packages
Uncheck the option: Use Internet Explorer library/proxy for HTTP
Find the .Renviron file on your computer; you will most probably find it in C:\Users\your user name\Documents.
Note that if it does not exist, you can create it by running this command in R:
file.edit('~/.Renviron')
Then add these five lines at the top of the file:
http_proxy=https://user_id:password@your_proxy:your_port
http_proxy_user=user_id:password
https_proxy=https://user_id:password@your_proxy:your_port
https_proxy_user=user_id:password
ftp_proxy=user_id:password@your_proxy:your_port
The options() call is R code rather than a NAME=value entry, so put it in ~/.Rprofile instead of .Renviron:
options(internet.info = 0)
Restart R. Type the following commands in R to assure that the configuration above works well:
Sys.getenv("http_proxy")
Sys.getenv("http_proxy_user")
Sys.getenv("https_proxy")
Sys.getenv("https_proxy_user")
Sys.getenv("ftp_proxy")
Now you can install the packages as you want by using the command like:
install.packages("mlr",method="libcurl")
It's important to add method="libcurl", otherwise it won't work.
On Windows 7 I solved this by going into my environment settings (try this link for how) and adding user variables http_proxy and https_proxy with my proxy details.
If you start R from a desktop icon, you can add the --internet2 flag to the target line (right click -> Properties) e.g.
"C:\Program Files\R\R-2.8.1\bin\Rgui.exe" --internet2
Simplest way to get everything working in RStudio under Windows 10:
Open up Internet Explorer, select Internet Options:
Open editor for Environment variables:
Add a variable HTTP_PROXY in form:
HTTP_PROXY=http://username:password@localhost:port/
Example:
HTTP_PROXY=http://John:JohnPassword@localhost:8080/
RStudio should work:
Tried all of these and also the solutions using netsh, winhttp etc.
Geek On Acid's answer helped me download packages from the server but none of these solutions worked for using the package I wanted to run (twitteR package).
The best solution is to use software that lets you configure a system-wide proxy.
FreeCap (free) and Proxifier (trial) worked perfectly for me at my company.
Please note that you need to remove proxy settings from your browser and any other apps that you have configured to use proxy as these tools provide system-wide proxy for all network traffic from your computer.
Find your R home with R.home("home")
Add following lines to Renviron.site in your R home
http_proxy=http://proxy.dom.com/
http_proxy_user=user:passwd
https_proxy=https://proxy.dom.com/
https_proxy_user=user:passwd
Open R -> R reads Renviron.site in its home -> it should work :)
My solution on a Windows 7 (32bit). R version 3.0.2
Sys.setenv(http_proxy="http://proxy.*_add_your_proxy_here_*:8080")
setInternet2(TRUE)
updateR(2)
