OK so I have paramiko v2.2.1 and I am trying to log in to a machine and restart a service. The service script basically starts a process via nohup. However, if I let paramiko disconnect as soon as the command is done, the started process terminates with a SIGPIPE when it writes to stdout.
If I start the service by ssh'ing into the box and starting it manually, there is no issue and it runs in the background fine. It also seems to work just fine if I add a sleep of 10 seconds before disconnecting (close) paramiko.
The service is started by an init.d script with a line like this:
env LD_LIBRARY_PATH=$bin_path nohup $bin_path/ServerLoop.sh \
"$bin_path/Service service args" "$@" &
Where ServerLoop.sh simply calls the service forever in a loop like this so it will never die:
SERVER=$1
shift
ARGS=$@
logger $ARGS
while [ 1 ]; do
$SERVER $ARGS
logger "$SERVER terminated with exit code: $STATUS. Server has been restarted"
sleep 1
done
I have noticed that when I start the service by ssh'ing into the box, a nohup.out file is written to the root directory. However, when I run through paramiko, no nohup.out is written anywhere on the system. This is after I manually ssh into the box and start the service:
root@ts4700:/mnt/mc.fw/bin# find / -name "nohup*"
/usr/bin/nohup
/usr/share/man/man1/nohup.1.gz
/nohup.out
And this is after I run through paramiko:
root@ts4700:/mnt/mc.fw/bin# find / -name "nohup*"
/usr/bin/nohup
/usr/share/man/man1/nohup.1.gz
As I understand it, nohup will only redirect the output to nohup.out "if standard output is a terminal" (from the manual); otherwise it assumes the output is already going to a file and does not redirect. Hence I tried the following:
In [43]: import paramiko
In [44]: paramiko.__version__
Out[44]: '2.2.1'
In [45]: ssh = paramiko.SSHClient()
In [46]: ssh.set_missing_host_key_policy(AutoAddPolicy())
In [47]: ssh.connect(ip, username='root', password=not_for_so_sorry, look_for_keys=False, allow_agent=False)
In [48]: stdin, stdout, stderr = ssh.exec_command("tty")
In [49]: stdout.read()
Out[49]: 'not a tty\n'
So I am thinking that nohup is not redirecting to nohup.out when I run it through paramiko because stdout is not a terminal. I don't know why adding a sleep(10) would fix this though, as the service is quite verbose when run on the command line.
I have also noticed that if the service is started from a manual ssh, its tty in the ps ax output is still set to the ssh tty. However, if the process is started by paramiko, its tty in the ps ax output is set to "?". Since both processes are run through nohup, I would have expected this to be the same.
If the problem is indeed that nohup is not redirecting the output to nohup.out because of the tty, is there a way to force this to happen, or a better way to run this sort of command via paramiko?
Thanks all, any help with this would be great :)
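A minimal sketch of one commonly suggested workaround, assuming the process dies because its stdout is a pipe that closes together with the SSH channel: redirect all three standard streams on the remote side so nothing stays attached to the channel. The helper names, the service command and the log path are hypothetical, and `client` is assumed to be an already connected paramiko.SSHClient.

```python
import shlex


def detached_command(command, log_path="/dev/null"):
    """Wrap a shell command so it holds no stream tied to the SSH channel.

    stdin is read from /dev/null and stdout/stderr go to log_path, so a
    later write to stdout cannot hit a closed pipe.
    """
    return "nohup {} > {} 2>&1 < /dev/null &".format(command, shlex.quote(log_path))


def run_detached(client, command, log_path="/dev/null"):
    """Start `command` through an already connected paramiko.SSHClient."""
    stdin, stdout, stderr = client.exec_command(detached_command(command, log_path))
    # Wait for the launching shell (not the service itself) to exit
    # before the caller closes the connection.
    return stdout.channel.recv_exit_status()


# Hypothetical service command and log file, shown only to illustrate the shape.
print(detached_command("/etc/init.d/myservice restart", "/tmp/myservice.log"))
```

With the redirects in place the tty check in nohup no longer matters, because the output already goes to a file. Whether this is enough on a given system is not something the sketch can guarantee.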
I use a cron task to regularly run Rscript. Unfortunately, I need to do this on a small AWS instance, and the process may hang, piling up more and more processes until the whole system is lagging.
I would like to write a cron task to kill all R processes lasting longer than one minute. I adapted another answer on Stack Overflow that I think would solve the problem. I came up with:
if [[ "$(uname)" = "Linux" ]];then killall --older-than 1m "/usr/lib/R/bin/exec/R --slave --no-restore --file=/home/ubuntu/script.R";fi
I copied the task directly from htop, but it does not work as I expect. I get the No such file or directory error but I've checked it a few times.
I need to kill all R processes that have lasted longer than a minute. How can I do this?
You may want to avoid killing processes from another user and try SIGKILL (kill -9) after SIGTERM (kill -15). Here is a script you could execute every minute with a CRON job:
#!/bin/bash
PROCESS="R"
MAXTIME=`date -d '00:01:00' +'%s'`
function killpids()
{
    PIDS=`pgrep -u "${USER}" -x "${PROCESS}"`
    # Loop over all matching PIDs
    for pid in ${PIDS}; do
        # Retrieve duration of the process
        TIME=`ps -o time:1= -p "${pid}" |
            egrep -o "[0-9]{0,2}:?[0-9]{0,2}:[0-9]{2}$"`
        # Convert TIME to timestamp
        TTIME=`date -d "${TIME}" +'%s'`
        # Check if the process should be killed
        if [ "${TTIME}" -gt "${MAXTIME}" ]; then
            kill ${1} "${pid}"
        fi
    done
}
# Leave a chance to kill processes properly (SIGTERM)
killpids "-15"
sleep 5
# Now kill remaining processes (SIGKILL)
killpids "-9"
Why involve an additional process every minute with cron?
Would it not be easier to start R with timeout from coreutils? The processes will then be killed automatically after the time you chose.
timeout [option] duration command [arg]…
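The same idea can be sketched from a wrapper script instead of coreutils. This is only an illustration, assuming Python 3 is available on the instance; `run_with_timeout` is a hypothetical helper name, and the sleeping child stands in for the real Rscript call.

```python
import subprocess
import sys


def run_with_timeout(argv, seconds):
    """Run argv; return its exit code, or None if it was killed on timeout."""
    try:
        return subprocess.run(argv, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before re-raising.
        return None


if __name__ == "__main__":
    # Stand-in for ["Rscript", "/home/ubuntu/script.R"]: a child that sleeps 5 s.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, 1))
```

As with timeout, the limit is attached to each run, so no separate cleanup job is needed.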
I think the best option is to do this with R itself. I am no expert, but it seems the future package will allow executing a function in a separate thread. You could run the actual task in a separate thread, and in the main thread sleep for 60 seconds and then stop().
Previous Update
user1747036's answer which recommends timeout is a better alternative.
My original answer
This question is more appropriate for superuser, but here are a few things wrong with
if [[ "$(uname)" = "Linux" ]];then
killall --older-than 1m \
"/usr/lib/R/bin/exec/R --slave --no-restore --file=/home/ubuntu/script.R";
fi
The name argument is either the name of the image or the path to it. You have included its parameters as well.
If -s signal is not specified, killall sends SIGTERM, which your process may ignore. Are you able to kill a long-running script with this on the command line? You may need SIGKILL / -9.
More at http://linux.die.net/man/1/killall
My workflow is to send commands from an emacs buffer to an R session in emacs via the ESS package.
a=0;
system("ssh remotehost ls")
a = a+1;
When I run the three lines above in rapid succession (i.e. submit them to the R buffer), the value of a at the end is 0. When I run them slowly, a is 1.
I've only had this issue running an ssh command via system. In all other cases, the commands queue up and all run sequentially.
My colleagues have the exact same issue with their R/vim setup. But we don't have the same issue in RStudio.
Any suggestions here would be great.
ssh eats up any stdin during the system() command. If you paste it line by line then ssh terminates before you submit a=a+1 and thus it gets passed to R instead of ssh. Use system("ssh .. < /dev/null") or system(..., input="") if you don't want terminal input to be eaten by the subprocess.
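The effect can be reproduced without ssh. In this small sketch a child process that reads all of its stdin plays the role of ssh; the names `READER` and `chars_consumed` are made up for the illustration. Giving the child /dev/null as stdin leaves nothing for it to swallow.

```python
import subprocess
import sys

# A child that reads everything available on stdin, much as ssh does by default.
READER = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]


def chars_consumed(pending_input=None):
    """Return how many characters the child read from its stdin.

    pending_input=None connects the child to /dev/null (like `< /dev/null`);
    otherwise the text is what was waiting on stdin when the child started.
    """
    if pending_input is None:
        result = subprocess.run(READER, stdin=subprocess.DEVNULL,
                                capture_output=True, text=True)
    else:
        result = subprocess.run(READER, input=pending_input,
                                capture_output=True, text=True)
    return int(result.stdout)


if __name__ == "__main__":
    print(chars_consumed("a = a+1;\n"))  # the queued line is eaten by the child
    print(chars_consumed())              # nothing to eat with /dev/null
```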
Working in R 2.14.1, on Windows 7
Using the package parallel in R, I'm trying to take advantage of cores outside of my local machine available on my network, where all remote hosts I am connecting to are identical Windows machines.
The basic form of the commands to make the connection is as follows.
library(parallel)
#assume 8 cores per machine
cl<-makePSOCKcluster(c(rep("localhost", 8), rep("otherhost", 8)))
Of course, trying to debug these things can be pretty tricky, but here is where I'm at with it.
If I specify the manual = TRUE flag as below
cl<-makePSOCKcluster(c(rep("localhost", 8), rep("otherhost", 8)), manual=TRUE)
there are no problems connecting to the remote host, and running a parallel process. The computers have identical setups to the one that I am working on. Yet, when this manual flag is not set, the connection command hangs.
This seems to indicate to me that since the manual flag bypasses ssh to make the connection to the host, that ssh is the problem when manual=FALSE.
It is not guaranteed at the moment that the remote computers have ssh on them. The question is, given that I have all the pertinent Windows login information for my remote hosts, and that I cannot change the settings on the remote computers, how would I connect to cores on remote machines with the package parallel in R without specifying manual = TRUE?
Alternatively, if ssh must be installed for this to happen, let's assume all computers have ssh on them. How would I connect to cores on the remote machines without circumventing ssh?
If you need any more information please let me know, I appreciate the time.
UPDATE 1
8-26-14
Thanks to Steve Weston for his insights. I will provide an update with the exact tools and setup I use to get my system working when it's up and running.
Feel free to comment or post if you have anything else to add as to what may be the best route to go in remote connecting to a windows machine from a windows machine via makePSOCKcluster, where the manual flag is set to FALSE.
When creating a PSOCK cluster with manual=FALSE, the only way to start a worker on a remote machine is with "ssh", "rsh", or something command-line compatible, such as "plink" from PuTTY. The reason is that makePSOCKcluster starts the remote workers using the "system" function to execute commands of the form:
ssh -l user otherhost '/usr/lib/R/bin/Rscript' -e 'parallel:::.slaveRSOCK()' MASTER=myhost PORT=10187 OUT=/dev/null TIMEOUT=2592000 METHODS=TRUE XDR=TRUE
You can confirm this by looking at the source code for the newPSOCKnode function in the file snowSOCK.R from the parallel package.
For this to work, the ssh-compatible command must be available on the local machine and a corresponding ssh daemon must be running on each of the remote machines, otherwise makePSOCKcluster will simply hang. I've found that installing a good, working ssh daemon is the difficult part on Windows.
Unfortunately, manual=TRUE is generally the easiest way to create a PSOCK cluster on multiple Windows machines.
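To make the shape of that launch command concrete, here is a rough sketch that assembles it from its parts. It only mirrors the example command shown above and is not the parallel package's actual code; the function name and parameters are hypothetical.

```python
import shlex


def worker_launch_command(user, host, master, port,
                          rscript="/usr/lib/R/bin/Rscript"):
    """Build an ssh command line of the form quoted in the answer above."""
    argv = [
        "ssh", "-l", user, host,
        rscript, "-e", "parallel:::.slaveRSOCK()",
        "MASTER={}".format(master),
        "PORT={}".format(port),
        "OUT=/dev/null",
        "TIMEOUT=2592000",
        "METHODS=TRUE",
        "XDR=TRUE",
    ]
    # shlex.join (Python 3.8+) quotes only the arguments that need it.
    return shlex.join(argv)


print(worker_launch_command("user", "otherhost", "myhost", 10187))
```

Running the printed command by hand from a terminal is a quick way to see whether the ssh step itself is what hangs.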
Hello everyone, I had the same problem and I managed to solve it. It is June 2018 as I write this answer, my OS is Windows 10 and the R version is 3.2.2. It is surprising to see this problem still exists after 4 years. I hope it can be fixed in a future release.
Before you move on, please make sure you can access the server from cmd using ssh. I didn't put any password in my code because I have the private key; you don't need to do that, and you will see the reason later.
Fixing The problem
File directory
Since the function makePSOCKcluster works when the workers are started manually, my first attempt was to set manual=TRUE and look at the output. Here is my result:
machineAddresses <-list(list(host='192.168.1.220',user='jeff'))
cl <- makePSOCKcluster(machineAddresses,manual = TRUE)
> Manually start worker on 192.168.1.220 with
"C:/PROGRA~1/R/R-32~1.2/bin/x64/Rscript" -e
"parallel:::.slaveRSOCK()" MASTER=DESKTOP-U5JA32O PORT=11756
OUT=/dev/null TIMEOUT=2592000 METHODS=TRUE XDR=TRUE
Ok, here is the first problem. The Rscript location is incorrect (it should be the location of Rscript on the server). Generally, R is installed under C:\Program Files; on my server it is C:\Program Files\R\R-3.2.2\bin. So we need to correct this by adding another option to tell this stupid code where Rscript is:
machineAddresses <-list(list(host='192.168.1.220',
user='jeff',rscript="C:/Program Files/R/R-3.3.2/bin/Rscript"))
CMD problem
Once you fix the directory problem, you will find that the code still hangs forever. Then we need to check if we can manually access the server in R, my code is:
system("ssh jeff@192.168.1.220")
> GetConsoleMode on STD_INPUT_HANDLE failed with 6
I honestly don't know what this error means, but we just need to fix it. Inspired by @Steve Weston, I decided to use PuTTY, so I installed it and changed my code to:
machineAddresses <-list(list(host='192.168.1.220',user='jeff',rscript="C:/Program Files/R/R-3.3.2/bin/Rscript",rshcmd="plink -pw qwer"))
The option -pw means the password. Because I'm a newbie to PuTTY, I don't know how to let the private key automatically work in PuTTY. Therefore, I use the easiest way to deal with that: put your password! The above code is equivalent to the following in cmd:
plink -pw qwer jeff@192.168.1.220 Rscript -e parallel:::.slaveRSOCK() MASTER=DESKTOP-U5JA32O PORT=11063 OUT=/dev/null TIMEOUT=2592000 METHODS=TRUE XDR=TRUE
And this is exactly what we would do if we created the workers manually. For those who are new like me, you need to add the PuTTY directory to PATH in your environment variables to run plink. Here is my final code:
machineAddresses <-list(list(host='192.168.1.220',user='jeff',rscript="C:/Program Files/R/R-3.3.2/bin/Rscript",rshcmd="plink -pw qwer"))
cl <- makePSOCKcluster(machineAddresses,manual = F)
It ran with no problem at all. In summary, the function makePSOCKcluster makes two mistakes:
It assumes the wrong R directory on the server. (At least it should assume the same directory as on my local computer, but it didn't! I don't know where that strange directory comes from.)
It uses the ssh command to start the connection, which does not work from within R. It works well in cmd, but not in R. I don't know the reason.
If you are still not able to use makePSOCKcluster, here is one trick: Try to connect to the server in R using system function first. It can give you some error code, that may instruct you where the problem is. Here is my debugging code:
system("plink -pw qwer jeff@192.168.1.220 Rscript -e parallel:::.slaveRSOCK() MASTER=DESKTOP-U5JA32O PORT=11063 OUT=/dev/null TIMEOUT=2592000 METHODS=TRUE XDR=TRUE")
On my hosting account I run a chat server in Node.js. All works fine; however, my hosting times out processes every 12 hours. Apparently a daemonized process will not time out, so I tried to daemonize it as follows:
Using Forever.js: running forever start chat.js. Running forever list confirms it runs, and ps -ef shows ? in the TTY column.
I tried nohup node chat.js. In ps -ef the TTY column shows pts/0 and PPID is 1.
I tried to disconnect stdin, stdout, and stderr, and make it ignore the hangup signal (SIGHUP), so nohup ./myscript 0<&- &> my.admin.log.file & with no luck. In ps -ef the TTY column is pts/0 and PPID is anything but 1.
I tried (nohup ./myscript 0<&- &>my.admin.log.file &) with no luck again. In ps -ef the TTY column is pts/0 and PPID is 1.
After all this, the process still times out in about 12 hours.
Now I have tried (nohup ./myscript 0<&- &>my.admin.log.file &) & and am waiting, but I am not getting my hopes up and need someone's help.
The hosting guys claim that daemon processes do not time out, but how can I make sure my process is a daemon? Nothing I tried seems to work, even though, with my limited understanding, ps -ef seems to suggest the process is daemonized.
What should I do to daemonize the process without moving to a much more expensive hosting plan? Can I argue with the hosting provider that this process is a daemon after all and they just got it wrong somewhere?
Upstart is a really easy way to daemonize processes
http://upstart.ubuntu.com/
There's some info on using it with node and monit, which will restart Node for you if it crashes
http://howtonode.org/deploying-node-upstart-monit
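On the question of checking whether a process is really detached from the login terminal, the following is a small sketch, assuming Python 3 on a POSIX system. It starts a child in its own session with no standard streams attached and then checks that the child leads that session, which is the state a process is in right after setsid(). The function names are hypothetical, and this says nothing about what criteria the hosting provider actually applies.

```python
import os
import subprocess
import sys


def start_detached(argv):
    """Start argv in a new session with stdin/stdout/stderr on /dev/null."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # the child calls setsid() before exec
    )


def is_session_leader(pid):
    """True if pid is the leader of its own session."""
    return os.getsid(pid) == pid


if __name__ == "__main__":
    # Short-lived stand-in for the real server process.
    child = start_detached([sys.executable, "-c", "import time; time.sleep(2)"])
    print(child.pid, is_session_leader(child.pid))
    child.wait()
```

A process started this way should show ? in the TTY column of ps -ef, unlike the pts/0 entries seen with plain nohup.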