How to set up an AWS cluster to work with OpenCPU? - r

I have two EC2 machines: master and slave. SSH keys are generated for the user ubuntu and saved to ~/.ssh/authorized_keys on both machines, so I can use the cluster from the master node as the ubuntu user like this:
library(doSNOW)
cluster_options <- rep(list(
  list(host = "ec2-xx-xx-xx-xx.compute-1.amazonaws.com",
       snowlib = "/usr/local/lib/R/site-library")), 2)
cl <- makeCluster(cluster_options, type = "SOCK")
clusterApply(cl, 1:2, get("+"), 3)
stopCluster(cl)
But when I call it via OpenCPU, I get a permission denied message.
Currently I'm thinking about two possible solutions:
Add SSH keys for the opencpu user. But I have no idea how to do that, as opencpu is a non-interactive user.
Make the slaves accessible from the master without any SSH keys.
I'd prefer the first way and definitely need help there, but the second way is also OK.

I finally ended up with a solution. It has several aspects:
Host-based authentication should be configured between the two EC2 nodes. A good tutorial can be found here: https://en.wikibooks.org/wiki/OpenSSH/Cookbook/Host-based_Authentication
OpenCPU should be installed on both nodes.
SSH keys should be generated for the www-data user (the R process is executed as this user). A delicate aspect here is that www-data is a non-interactive user, so we need to make it interactive (edit /etc/passwd), generate an SSH keypair for www-data, add the public key to the server node, and make www-data non-interactive again (see the sketch below).
Not so elegant, but it works :)
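
For reference, step 3 might look roughly like this on the master node -- a sketch only, assuming www-data's home is /var/www as on stock Debian/Ubuntu, and using usermod as a shorthand for editing /etc/passwd:
# temporarily give www-data a login shell (reverted below)
sudo usermod -s /bin/bash www-data
sudo mkdir -p /var/www/.ssh && sudo chown www-data:www-data /var/www/.ssh
sudo -u www-data ssh-keygen -t rsa -f /var/www/.ssh/id_rsa -N ""
# append /var/www/.ssh/id_rsa.pub to www-data's authorized_keys on the slave node,
# then make www-data non-interactive again
sudo usermod -s /usr/sbin/nologin www-data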

Related

Preserve environment variables when spawning shiny processes within a container

I have a running Docker container with the Shiny server from a slightly modified rocker/shiny image.
The default shiny-server.conf file sets shiny as the user the worker processes run under:
# Define the user we should use when spawning R Shiny processes
run_as shiny;
This means the server itself runs as root by default, but the worker processes for the Shiny apps run as the user shiny.
The apps themselves use a data warehouse connection to the SQL server, initialized via RODBC. Since we did not want to put the entire connection-details string (including the DB host and password) into the codebase, we read it from the environment variables with which the container is created, using the following routine:
library(RODBC)  # provides odbcDriverConnect()

HOST <- Sys.getenv("host")
DB   <- Sys.getenv("db")
UID  <- Sys.getenv("uid")
PWD  <- Sys.getenv("pwd")
conn <- paste0("driver={ODBC Driver 17 for SQL Server};server=", HOST,
               ";database=", DB, ";uid=", UID, ";pwd=", PWD)
dbhandle <- odbcDriverConnect(conn)
The problem is that those env variables are empty when the worker process is spawned within the container as the user shiny.
If I run the same code in the interactive R console (as either the root or the shiny user), I get the env variables as expected.
Any input would be much appreciated. Please note I do not intend to use Docker secrets, as I am not running the app within a Docker Swarm cluster, just on a standalone RancherOS host.
EDIT:
While a .Renviron file might be a viable alternative for solving this particular problem, it would entail putting the variables into the codebase, which is what we are trying to avoid here.
As suggested by Ralf Stubner, I added the following to the shiny-server.sh start script, which is the Docker container's CMD:
env > /home/shiny/.Renviron
chown shiny.shiny /home/shiny/.Renviron
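
For context, the resulting start script might look like this sketch (the final exec line mirrors the stock rocker/shiny layout and may differ in a modified image):
#!/bin/sh
# make the container's environment visible to the worker processes:
# they run as the shiny user and read /home/shiny/.Renviron on R startup
env > /home/shiny/.Renviron
chown shiny.shiny /home/shiny/.Renviron
exec shiny-server 2>&1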

R - Connect via ssh and execute a command

I would like to connect via SSH to certain equipment in a network.
The requisites are:
It must run a command and capture the output of the SSH session in R (or in bash or any other programming language, but I would prefer R).
It must enter a plain-text password (this equipment hasn't been accessed before and can't be set up with an RSA keypair), so the ssh.utils package doesn't meet this requirement.
sshpass can't be used, as I have noticed it doesn't work for some devices I tested.
I've read all these posts but I can't find an effective way to do it: link 1, link 2, link 3, link 4
I know the requirements are hard to accomplish, but thank you for your effort!
EDIT:
Sorry if I didn't make myself clear. I work locally in R and I want to connect to more than 3000 devices across my network via SSH. It is Ubiquiti equipment, and the only open ports are 80 and 22.
If SSH doesn't work, I will use the RSelenium package for R and extract the info from port 80. But first I will try SSH on port 22, as it is much more efficient than opening an emulated browser.
The big problem with all this Ubiquiti equipment is that it requires a password to log in. That's why requisite No. 2 is needed. When I have to access a server I know, I spend the time setting up an RSA keypair so that I don't have to enter a password every time I connect to that server, but it's impossible (or at least impossible for me) to configure all 3000+ Ubiquiti devices with such keypairs.
That's also why I don't use SNMP, for example: this equipment may or may not have it activated, or the SNMP configuration may be wrong. I have to use something that's activated by default and uniformly, and only ports 80 and 22 are open, and I know all the equipment's usernames and passwords.
And sshpass is a UNIX/Linux utility, as this link explains, that works for servers but doesn't work for Ubiquiti equipment, as far as I've tested. So I can't use it.
The command whose output I need is mca-status. Simply entering it at the console prints some stats I'd like to collect from the Ubiquiti equipment.
Correct me, please, if I am wrong in something I've posted. Thanks.
I think you have this wrong. I also have no idea what you are trying to say in point 2, and no idea what point 3 is supposed to say.
Now: ssh is an authentication mechanism allowing you (trusted) access to another machine and the ability to run a command. This can be as simple as
edd@max:~$ ssh bud Rscript -e '2+2'
[1] 4
edd@max:~$
where I invoke R (or rather, Rscript) on the machine 'bud' (my desktop) from a session on the machine 'max' (my server). That command could be anything, including something which writes to temporary or permanent files. You can then retrieve those files via scp.
Authentication is handled independently -- on Unix we often use ssh-agent, which runs in the background and against which you authenticate on login.
I finally solved it using the rPython package and Python's paramiko module, as there was no way to do it purely in R.
library(rPython)

# IP holds the address of the target device
python.exec(python.code = c(
  "import paramiko",
  "ssh = paramiko.SSHClient()",
  "ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())",
  sprintf('ssh.connect("%s", username="USER", password="PASSWORD")', IP),
  'stdin, stdout, stderr = ssh.exec_command("mca-status")',
  "stats = stdout.readlines()"))
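
To scale this to the full device list, one could wrap the call in a function and loop over the addresses. A sketch under the same assumptions (run_mca_status is a hypothetical wrapper; device_ips, USER, and PASSWORD are placeholders):
library(rPython)

# hypothetical wrapper: run mca-status on one device and return the output lines
run_mca_status <- function(ip, user, pass) {
  python.exec(python.code = c(
    "import paramiko",
    "ssh = paramiko.SSHClient()",
    "ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())",
    sprintf('ssh.connect("%s", username="%s", password="%s")', ip, user, pass),
    'stdin, stdout, stderr = ssh.exec_command("mca-status")',
    "stats = stdout.readlines()",
    "ssh.close()"))
  python.get("stats")  # pull the Python variable back into R
}

results <- lapply(device_ips, run_mca_status, user = "USER", pass = "PASSWORD")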

Using Erlang SSH Application to execute commands on remote UNIX Servers

I have always used the os:cmd/1 method to call operating system routines. Now, I know that Erlang has an ssh application. I would like to know how I can use this module to SSH into a Solaris server, run a command, and collect the reply. I believe such an operation would be handled asynchronously. I need an example of doing this with the ssh application built into Erlang.
Now, at times we set up SSH keys between servers to avoid the password prompt, especially when a script executes tasks on remote servers. I intend to write many Erlang programs or escripts that will interact with many remote servers in our environment. I need a complete example and explanation of how SSH with and/or without a password prompt can be handled using the Erlang ssh application. NOTE: In the screenshot above (not reproduced here), the two servers had SSH keys set up, so there is no password prompt when ssh is initiated from either of them.
The correct native Erlang API for this is not the ssh module's shell, which only implements a user-interactive shell over SSH; instead, use ssh_connection. Take a look at ssh_connection:exec/4.
To be more complete: use ssh:connect to establish a connection, then use the connection reference it returns to run commands with ssh_connection:exec/4.
I didn't try it myself and can't provide a tested example, but the documentation seems to be a good starting point.
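A rough, untested sketch of that flow, assuming key-based authentication is already in place (host, user, and command are placeholders):
%% in a shell or module; the ssh application must be running
ssh:start(),
{ok, Conn} = ssh:connect("solaris-host", 22,
                         [{user, "me"}, {silently_accept_hosts, true}]),
{ok, Chan} = ssh_connection:session_channel(Conn, 5000),
success = ssh_connection:exec(Conn, Chan, "uname -a", 5000),
%% the output arrives asynchronously as messages to the calling process
receive
    {ssh_cm, Conn, {data, Chan, 0, Data}} -> io:format("~s", [Data])
after 5000 ->
    timeout
end,
ssh:close(Conn).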

Using snow (and snowfall) with AWS for parallel processing in R

In relation to my earlier, similar SO question, I tried using snow/snowfall on AWS for parallel computing.
What I did was:
In the sfInit() function, I provided the public DNS to the socketHosts parameter, like so:
sfInit(parallel = TRUE, socketHosts = list("ec2-00-00-00-000.compute-1.amazonaws.com"))
The error returned was Permission denied (publickey)
I then followed the instructions (correctly, I presume!) on http://www.imbi.uni-freiburg.de/parallel/ in the 'Passwordless Secure Shell (SSH) login' section.
I simply appended the contents of the .pem file that I created on AWS to ~/.ssh/authorized_keys on the AWS instance I want to connect to from my master AWS instance, and on the master instance as well.
Is there anything I am missing?
I would be very grateful if users could share their experiences using snow on AWS.
Thank you very much for your suggestions.
UPDATE:
I just wanted to update the solution I found to my specific problem:
I used StarCluster to set up my AWS cluster: StarCluster
Installed the snowfall package on all the nodes of the cluster
From the master node, issued the following commands:
hostslist <- list("ec2-xxx-xx-xxx-xxx.compute-1.amazonaws.com",
                  "ec2-xx-xx-xxx-xxx.compute-1.amazonaws.com")
sfInit(parallel = TRUE, cpus = 2, type = "SOCK", socketHosts = hostslist)
l <- sfLapply(1:2, function(x) system("ifconfig", intern = TRUE))
lapply(l, function(x) x[2])
sfStop()
The IP information confirmed that the AWS nodes were being utilized.
This doesn't look that bad, but the .pem file is wrong. It is sometimes not that simple, though, and many people have to fight with these issues. You can find a lot of tips in this post:
https://forums.aws.amazon.com/message.jspa?messageID=241341
Or search Google for other posts.
In my experience, most people have problems with these steps:
Can you log onto the machines via ssh? (ssh ec2-00-00-00-000.compute-1.amazonaws.com). Try to use the public DNS, not the public IP, to connect.
You should check your "Security groups" in AWS to verify that port 22 is open for all machines (see the sketch after this list).
If you plan to start more than 10 worker machines, you should work on an MPI installation on your machines (much better performance!).
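If port 22 turns out to be closed, opening it can be done from the AWS console or, for example, with today's AWS CLI -- a sketch, with the security group ID and source CIDR as placeholders:
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 22 --cidr 203.0.113.0/24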
Markus from cloudnumbers.com :-)
I believe @Anatoliy is correct: you're using an X.509 certificate. For the precise steps to add the SSH keys, look at the "Types of credentials" section of the EC2 Starters Guide.
To upload your own SSH keys, take a look at this page from Alestic.
It is a little confusing at first, but you'll want to keep straight which are your access keys, your certificates, and your key pairs; the key pairs may appear in text files marked DSA or RSA.

Erlang: starting a remote node programmatically

I am aware that nodes can be started from the shell. What I am looking for is a way to start a remote node from within a module. I have searched, but have found nothing.
Any help is appreciated.
There's a pool(3) facility:
pool can be used to run a set of Erlang nodes as a pool of computational processors. It is organized as a master and a set of slave nodes.
pool:start/1,2 starts a new pool. The file .hosts.erlang is read to find host names where the pool nodes can be started. The slave nodes are started with slave:start/2,3, passing along Name and, if provided, Args. Name is used as the first part of the node names, and Args is used to specify command-line arguments.
With pool you get a load distribution facility for free.
The master node may be started this way:
erl -sname poolmaster -rsh ssh
The -rsh flag here specifies an alternative to rsh for starting a slave node on a remote host; we use SSH here. Make sure your boxes have working SSH keys and that you can authenticate to the remote hosts using these keys.
If there are no hosts in the file .hosts.erlang, then no slave nodes are started, and you can use slave:start/2,3 to start slave nodes manually, passing arguments if needed.
You could, for example, start a remote node like this:
Arg = "-mnesia_dir " ++ M,   %% M: the remote node's mnesia directory
slave:start(H, Name, Arg).   %% H: host name; Name: node name
Ensure epmd(1) is up and running on the remote boxes in order to start Erlang nodes.
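Putting it together, a minimal pool session might look like this sketch (untested; it assumes .hosts.erlang lists your remote hosts and the SSH keys above are in place):
%% on the master, started as: erl -sname poolmaster -rsh ssh
pool:start(worker),          %% reads .hosts.erlang and boots the slave nodes
Nodes = pool:get_nodes(),    %% the master plus the slaves that came up
Node = pool:get_node(),      %% picks the node with the lowest expected load
rpc:call(Node, os, cmd, ["hostname"]).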
Hope that helps.
A bit more low-level than pool is the slave(3) module; pool builds upon the functionality in slave.
Use slave:start to start a new slave.
You should probably also specify -rsh ssh on the command line.
So use pool if you need the kind of functionality it offers; if you need something different, you can build it yourself out of slave.
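For completeness, a minimal slave(3) sketch (untested; the host, node name, and cookie are placeholders, and the master is assumed to share the cookie):
%% master started as: erl -sname master -rsh ssh -setcookie mycookie
{ok, Node} = slave:start(remotehost, worker, "-setcookie mycookie"),
rpc:call(Node, erlang, node, []).   %% confirm the remote node is up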
