How to easily execute R commands on a remote server?

I use Excel + R on Windows on a rather slow desktop. I have full admin access to a very fast Ubuntu-based server. I am wondering: how can I execute commands on the server remotely?
What I can do is save the needed variables with saveRDS, load them on the server with readRDS, execute the commands on the server, and then save the results and load them back on Windows.
But this is all very interactive and manual, and can hardly be done on a regular basis.
Is there any way to do this directly from R, like:
Connect to the server via e.g. ssh,
Transfer the needed objects (which can be specified manually),
Execute the given code on the server and wait for the result,
Get the result.
I could run the whole of R remotely, but that would cause network-related problems: most R commands I run from within Excel are very fast and data-hungry. I just need to execute some specific commands remotely, not all of them.

Here is my setup.
Copy your code and data over using scp. (I use GitHub, so I clone my code from GitHub. This has the benefit of making sure that my work is reproducible.)
(optional) Use sshfs to mount the remote folder on your local machine. This lets you edit the remote files with your local text editor instead of over an ssh command line.
Put everything you want to run in an R script on the remote server, then run it via ssh in R batch mode.
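The last step can also be scripted instead of typed by hand. This is a sketch in Python, assuming key-based ssh login and R on the server's PATH; the host, directory and script names are hypothetical. It builds the ssh argument list and quotes each remote part with shlex.quote, so pass the working directory relative to the remote home directory (quoting disables `~` expansion).

```python
import shlex
import subprocess
from typing import List


def batch_over_ssh(host: str, workdir: str, script: str) -> List[str]:
    """Build an ssh command that runs `script` with R CMD BATCH in `workdir`.

    R CMD BATCH writes the session transcript next to the script
    (analysis.R -> analysis.Rout); --no-save skips writing .RData.
    """
    remote = (f"cd {shlex.quote(workdir)} && "
              f"R CMD BATCH --no-save {shlex.quote(script)}")
    return ["ssh", host, remote]


def run_batch(host: str, workdir: str, script: str) -> None:
    """Run the job and wait for it; check=True raises if ssh reports failure."""
    subprocess.run(batch_over_ssh(host, workdir, script), check=True)
```

For example, `batch_over_ssh("user@fast-server", "project", "analysis.R")` produces `["ssh", "user@fast-server", "cd project && R CMD BATCH --no-save analysis.R"]`, which can be printed and checked before calling `run_batch`.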

There are a few options. The simplest is to exchange SSH keys so you don't have to enter SSH/SCP passwords manually every time. After this, you can write a simple R script that will:
Save the necessary variables into a data file,
Use scp to upload the data file to the Ubuntu server,
Use ssh to run a remote script that processes the data (which you have just uploaded) and stores the result in another data file,
Again use scp to transfer the results back to your workstation.
You can use R's system function to run scp and ssh with the necessary options.
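The answer drives these steps from R via system(); the same upload/run/download sequence is sketched here in Python for illustration. It assumes SSH keys are already exchanged, a hypothetical remote R script that reads input.rds and writes output.rds in its own directory, and paths without spaces (scp passes the remote part to the remote shell). The saveRDS/readRDS steps stay on the R side.

```python
import shlex
import subprocess
from typing import List


def roundtrip_commands(host: str, remote_dir: str, script: str,
                       local_in: str, local_out: str) -> List[List[str]]:
    """Build the upload / run / download commands for one remote R job.

    Hypothetical layout: `script` in `remote_dir` reads input.rds and
    writes output.rds next to it.
    """
    return [
        # upload the data file saved locally with saveRDS
        ["scp", local_in, f"{host}:{remote_dir}/input.rds"],
        # run the processing script; ssh returns when Rscript exits
        ["ssh", host,
         f"cd {shlex.quote(remote_dir)} && Rscript {shlex.quote(script)}"],
        # copy the result back, to be loaded with readRDS
        ["scp", f"{host}:{remote_dir}/output.rds", local_out],
    ]


def run_all(commands: List[List[str]], dry_run: bool = False) -> None:
    """Run the commands in order; check=True stops at the first failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Running with `dry_run=True` first shows exactly what would be executed, which is a cheap way to check host names and paths.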
Another option is to set up a cluster worker on the remote machine; then you can export the data with clusterExport and evaluate expressions with clusterEvalQ and clusterApply.

There are a few more options:
1) You can do this directly from R by using Rserve. See: https://rforge.net/
Keep in mind that Rserve can accept connections from R clients; see, for example, how to connect to Rserve with an R client.
2) You can set up a cluster on your Linux machine and then use it from your Windows client. The simplest is to use snow, https://cran.r-project.org/package=snow; also see foreach and the many other cluster packages.

Related

How to run Python files with Airflow using a Samba connection

I have the following problem. I have a data pipeline at work that transforms raw data and loads it into a cloud database, for various projects. There are Python scripts for the project-based transformations, but everything must be done manually (defining the transformer's project-based inputs, running the transformer, loading the data).
I want to automate this process with Airflow, so I created the above steps as tasks in Python. The Airflow instance runs on a computer that must reach a network drive where the raw data and the transformer scripts are located. The required connection type is Samba.
I managed to connect to the drive and create a SambaHook object:
samba_file_share: Final[object] = SambaHook(connection_id, file_share_name)
In one task, I need to call and run the transformer script. With a previous solution (without Samba) I used Popen, which worked fine. However, I must use Samba now, and I face the following problem.
I get the path of the transformer script by reading the root folder of the file share from the Samba object and joining the transformer's path to it:
samba_file_share._join_path(transformer_path)
If I print this out, the path is correct, and the network is available. But if I feed it to Popen (as a string, byte string or path-like object), I get the error "No such file or directory".
Can anyone help with this? How can I feed it to Popen to run the script, or should I use something other than Popen? The Samba documentation is very incomplete, and I could not find anything there so far.
This automated Airflow solution works perfectly if I run it from a machine that can access the network drive directly.
However, that is only for development; in production it must run on another machine that has no direct access to the drive. There I must use Samba to connect, and that breaks everything.
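Presumably the joined path names a file on the SMB share, which the production machine's operating system cannot see, so Popen has nothing local to execute. One possible workaround, sketched under assumptions: copy the script through the Samba connection into a local temporary file and run that copy with the local Python interpreter. `open_remote` is a hypothetical callable that opens a share path for binary reading (the Samba hook may offer something along these lines; check your provider version). This only helps if the script does not itself read further files by share path.

```python
import os
import shutil
import subprocess
import sys
import tempfile
from typing import BinaryIO, Callable, List, Optional


def run_script_from_share(
    open_remote: Callable[[str], BinaryIO],
    remote_path: str,
    args: Optional[List[str]] = None,
) -> "subprocess.CompletedProcess[str]":
    """Copy a Python script off the share and run the local copy.

    `open_remote` is hypothetical: any callable returning a binary
    file-like object for `remote_path`. subprocess only sees local files.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        local_path = os.path.join(tmpdir, "transformer.py")
        # Stream the script's bytes from the share into the temp file.
        with open_remote(remote_path) as src, open(local_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Run it with the current interpreter; check=True raises
        # CalledProcessError if the script exits with a non-zero status.
        return subprocess.run(
            [sys.executable, local_path, *(args or [])],
            capture_output=True,
            text=True,
            check=True,
        )
```

Passing a plain local opener such as `lambda p: open(p, "rb")` makes it easy to test the task logic on the development machine before wiring in the Samba connection.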

Is it possible to run a Unix script using Oozie outside a Hadoop cluster?

We have written a Unix batch script, and it is hosted on a Unix server outside the Hadoop cluster. Is it possible to run that script via Oozie?
If so, how can this be achieved?
What is the script doing? If the script just needs to run regularly, you could just as well use a cron job or something similar.
Besides this, Oozie has an SSH action for running commands on remote hosts:
https://oozie.apache.org/docs/3.2.0-incubating/DG_SshActionExtension.html
Maybe you can work something out with that: log in to the remote host, run the script, wait for completion, and continue from there.

Example SFTP batch script to upload from an AS400 server to a Unix SFTP server

Suppose I have a file called helloworld.txt on an AS400, and I want to write a script that automates the daily upload of this source file from the AS400 server to a Unix SFTP server, say sftp://exampleunixsftp.com.
Does anyone have such a script? Is OpenSSH the only tool that can be used on the AS400 to accomplish this, or are there other methods? If LFTP could be installed on the AS400, that would be an easy solution, but since it's only available for Unix/Linux/Windows/Mac, I don't have this option. I read Scott Klement's article at:
http://systeminetwork.com/article/ssh-scp-and-sftp-tools-openssh
I just need a basic example script to get this done. I'd appreciate it.
The MidrangeWiki topic on SSH has some information and examples that may be helpful.
sftp is intended primarily as an interactive utility. scp performs the same function non-interactively.
The command, once authentication is set up, would simply be:
scp helloworld.txt username@exampleunixsftp.com:<destination path>
LFTP can be installed, with some preparation, in the PASE environment. See Open Source Binaries (currently offline; use the Google cache).
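If sftp itself is required rather than scp, OpenSSH's `sftp -b batchfile` runs it non-interactively: it reads commands from the file and aborts if a transfer such as `put` fails. Because it cannot prompt for a password, key-based authentication must already be set up. A sketch in Python for illustration; the batch file path, host and remote directory are hypothetical, and scheduling the daily run is left to the system's job scheduler.

```python
from typing import List


def write_sftp_batch(batch_path: str, local_file: str, remote_dir: str,
                     host: str) -> List[str]:
    """Write an sftp batch file that uploads `local_file` into `remote_dir`
    and return the command that executes it.

    Paths are assumed to contain no spaces; sftp -b stops at the first
    failing put and needs non-interactive (key-based) login.
    """
    with open(batch_path, "w") as f:
        f.write(f"put {local_file} {remote_dir}\n")
    return ["sftp", "-b", batch_path, host]
```

The returned list can be handed to `subprocess.run(..., check=True)`, which raises if sftp exits with a failure status.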

Deleting multiple files on a remote box via sh

Requirement
Several files on remote machines need to be deleted via sh. The names of the files to be deleted are known.
Approach
1) A script was written with ftp (which requires credentials) and the delete command. The file names were passed as an array and iterated with a for loop, with the ftp + delete commands enclosed in the loop. The files were not deleted by this approach.
2) Another approach was to pass temp.ftp (which contains the delete commands) to the ftp command and then rm the temp.ftp file, e.g. ftp <
Request
I need pointers on deleting multiple files on a remote machine via a shell script.
I recommend using ssh instead of ftp for interfacing with the remote unix machine.
SSH allows you to run remote commands easily and securely.
Read this article for more info.
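With ssh, all the deletions can be done in one remote call instead of one ftp session per file. A sketch in Python, assuming key-based login; the host and file names are hypothetical. Each name is quoted for the remote shell, `-f` makes rm ignore files that are already gone, and `--` keeps names starting with `-` from being read as options.

```python
import shlex
import subprocess
from typing import List, Sequence


def remote_rm_command(host: str, paths: Sequence[str]) -> List[str]:
    """Build a single ssh call that deletes every path in `paths` on `host`."""
    if not paths:
        raise ValueError("no files to delete")
    # Quote each name so spaces or shell metacharacters survive the remote shell.
    quoted = " ".join(shlex.quote(p) for p in paths)
    return ["ssh", host, f"rm -f -- {quoted}"]


def delete_remote_files(host: str, paths: Sequence[str]) -> None:
    """Run the deletion; raises CalledProcessError if ssh or rm fails."""
    subprocess.run(remote_rm_command(host, paths), check=True)
```

For several machines, the same call can simply be repeated in a loop over host names.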

Pass commands to a running R runtime

Is there a way to pass commands (from a shell) to an already running R runtime/R GUI, without copy and paste?
So far I only know how to call R from a shell with the -f or -e options, but in both cases a new R runtime processes the R script or R command I pass to it.
I would rather have an open R runtime waiting for commands passed to it via whatever connection is possible.
What you ask for cannot be done. R is single-threaded and has a single REPL (read-eval-print loop), which is attached to a single input, e.g. the console in the GUI, or stdin if you pipe into R. But never two.
Unless you use something else, e.g. the most excellent Rserve, which (when hosted on an OS other than Windows) can handle multiple concurrent requests over TCP/IP. You may, however, have to write your own custom connection. Examples for Java, C++ and R exist in the Rserve documentation.
You can use Rterm (under C:\Program Files\R\R-2.10.1\bin in Windows for R version 2.10.1). Or you can start R from the shell by typing "R" (if the shell does not recognize the command, you need to modify your PATH).
You could try simply saving the workspace from one session and manually loading it into the other one (or some variation on this theme, like saving only the objects you share between the two sessions with saveRDS or similar). That requires some extra load and save commands, but you could automate this further by adding some lines to your .Rprofile file, which is executed at the start of every R session. Here is some more detailed information about R on startup. But I guess it all depends heavily on what you are doing inside the R sessions.
