I have a shell script that starts an ssh session to a remote host and pipes the output to another, local script, like so:
#!/bin/sh
ssh user@host 'while true ; do get-info ; sleep 1 ; done' | awk -f parse-info.awk
It works fine. I run it under the 'supervise' program from djb's daemontools. The only problem is shutting down the daemon. If I terminate the process for this shell script, the ssh and awk processes continue running as orphans. Normally I would solve this problem with exec to replace the supervising shell process, but the two processes run in their own subshells and can't replace the shell process.
What I would like to do is have the supervising shell script 'forward' any signals it receives to at least one of the child processes, so that I can break the pipe and shut down cleanly. Is there an easy way to do this?
Inter-process communication.
You should be looking at pipes, etc.
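One hedged, untested sketch of that forwarding idea: run the pipeline in the background so the supervising shell stays free to catch signals, then kill the awk end from a trap; ssh then exits on SIGPIPE at its next write.
#!/bin/sh
# Untested sketch: background the pipeline and forward signals to it.
ssh user@host 'while true ; do get-info ; sleep 1 ; done' | awk -f parse-info.awk &
pipe_pid=$!   # PID of the awk end of the pipeline

forward() {
    kill -TERM "$pipe_pid" 2>/dev/null   # breaks the pipe; ssh dies on SIGPIPE
    wait "$pipe_pid" 2>/dev/null
    exit 0
}

trap forward INT TERM
wait "$pipe_pid"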
Below is a simple example of what I'm trying to accomplish. I'm trying to force an ssh script to not wait for all child processes to exit before returning. The purpose is to launch a daemon process on a remote host via ssh.
test.sh
#!/bin/bash
(
sleep 2
echo "done"
) &
When I run the script on the console it returns immediately, with "done" appearing 2 seconds later.
When I run the script via ssh, the ssh command does not return immediately. It appears to wait until all child processes have terminated before ssh exits.
ssh example
$ ssh mike@127.0.0.1 /home/mike/test.sh
(2 seconds)
done
standard terminal example
$ ./test.sh
$
(2 seconds)
done
How can I make ssh return when the parent/main process has terminated?
EDIT:
I'm aware of the -f option to ssh, which runs the command in the background, but it leaves the ssh process and connection open on the source host. For my purposes this is unsuitable.
ssh mike@127.0.0.1 /home/mike/test.sh
When you run ssh in this fashion, the remote ssh server will create a set of pipes (or socketpairs) which become the standard input, output, and error for the process which you requested it to run, in this case the script process. The ssh server doesn't end the session based on when the script process exits. Instead, it ends the session when it reads an end-of-file indication on the script process's standard output and standard error.
In your case, the script process creates a child process which inherits the script's standard input, output, and error. A pipe (or socketpair) only returns EOF when all possible writers have exited or closed their end of the pipe. As long as the child process is running and has a copy of the standard output/error file descriptors, the ssh server won't read an EOF indication on those descriptors and it won't close the session.
You can get around this by redirecting standard output and standard error in the command that you pass to the remote server:
ssh mike@127.0.0.1 '/home/mike/test.sh > /dev/null 2>&1'
(note the quotes are important)
This avoids passing the standard output and standard error created by the ssh server to the script process or the subprocesses that it creates.
Alternately, you could add a redirection to the script:
#!/bin/bash
(
exec > /dev/null 2>&1
sleep 2
echo "done"
) &
This causes the script's child process to close its copies of the original standard output and standard error.
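With either change in place, the ssh session should end as soon as test.sh itself exits. The expected behaviour (a sketch, not a captured transcript) would be:
$ ssh mike@127.0.0.1 '/home/mike/test.sh > /dev/null 2>&1'
$
The "done" output is discarded on the remote side by the redirection, so nothing appears two seconds later.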
I use a cron task to regularly run an Rscript. Unfortunately, I need to do this on a small AWS instance, and the process may hang, piling more and more processes on top of each other until the whole system lags.
I would like to write a cron task to kill all R processes that have been running for longer than one minute. I found another answer on Stack Overflow that I've adapted and that I think should solve the problem. I came up with:
if [[ "$(uname)" = "Linux" ]];then killall --older-than 1m "/usr/lib/R/bin/exec/R --slave --no-restore --file=/home/ubuntu/script.R";fi
I copied the command line directly from htop, but it does not work as I expect: I get a "No such file or directory" error even though I've checked it a few times.
I need to kill all R processes that have lasted longer than a minute. How can I do this?
You may want to avoid killing processes from another user and try SIGKILL (kill -9) after SIGTERM (kill -15). Here is a script you could execute every minute with a CRON job:
#!/bin/bash

PROCESS="R"
MAXTIME=`date -d '00:01:00' +'%s'`

function killpids()
{
    PIDS=`pgrep -u "${USER}" -x "${PROCESS}"`

    # Loop over all matching PIDs
    for pid in ${PIDS}; do
        # Retrieve duration of the process
        TIME=`ps -o time:1= -p "${pid}" |
              egrep -o "[0-9]{0,2}:?[0-9]{0,2}:[0-9]{2}$"`

        # Convert TIME to timestamp
        TTIME=`date -d "${TIME}" +'%s'`

        # Check if the process should be killed
        if [ "${TTIME}" -gt "${MAXTIME}" ]; then
            kill ${1} "${pid}"
        fi
    done
}

# Leave a chance to kill processes properly (SIGTERM)
killpids "-15"
sleep 5

# Now kill remaining processes (SIGKILL)
killpids "-9"
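For completeness, a hypothetical crontab entry to run this every minute (the script path is an assumption, not part of the answer):
* * * * * /bin/bash /home/ubuntu/kill-long-running-r.sh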
Why involve an additional process every minute with cron?
Would it not be easier to start R with timeout from coreutils? The process will then be killed automatically after the time you chose.
timeout [option] duration command [arg]…
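For example, a hedged sketch of a crontab line using timeout (the every-ten-minutes schedule is an assumption; the script path is taken from the question):
*/10 * * * * timeout --kill-after=10s 1m Rscript /home/ubuntu/script.R
Here timeout sends SIGTERM after one minute and escalates to SIGKILL ten seconds later if the R process has not exited.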
I think the best option is to do this with R itself. I am no expert, but it seems the future package will allow executing a function in a separate thread. You could run the actual task in a separate thread, and in the main thread sleep for 60 seconds and then stop().
Previous Update
user1747036's answer, which recommends timeout, is a better alternative.
My original answer
This question is more appropriate for superuser, but here are a few things wrong with
if [[ "$(uname)" = "Linux" ]];then
killall --older-than 1m \
"/usr/lib/R/bin/exec/R --slave --no-restore --file=/home/ubuntu/script.R";
fi
The name argument is either the name of the executable or the path to it; you have included its command-line arguments as well.
If -s signal is not specified, killall sends SIGTERM, which your process may ignore. Are you able to kill a long-running script with this on the command line? You may need SIGKILL / -9.
More at http://linux.die.net/man/1/killall
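Putting those points together, a corrected attempt (an untested sketch) would match only the executable path and escalate the signal if needed:
if [[ "$(uname)" = "Linux" ]]; then
    killall --older-than 1m /usr/lib/R/bin/exec/R          # SIGTERM first
    sleep 5
    killall --older-than 1m -s KILL /usr/lib/R/bin/exec/R  # SIGKILL for anything that ignored it
fi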
I often have to relaunch a server to see if my changes are fine. I keep this server running in a shell so I have quick access to the current logs. So here is what I type in my shell: ^C!!⏎. That is: send SIGINT, then re-run the last entry in history.
So what I would like is to type, say, ^R and have the same result.
(Note: I use zsh)
I tried the following:
relaunch-function() {
    kill -INT %% && !!
}
zle -N relaunch-widget relaunch-function
bindkey "^R" relaunch-widget
But it seems that while my server is running, ^R is not passed to the shell but to the server, which does nothing with it. So I can't see a generic solution, though testing the return value and the process name should be feasible.
As long as the job is running in the foreground, keys will not be passed to the shell. So setting a key binding for killing a foreground process and starting it again won't work.
But you could start your server in an endless loop, so that it restarts automatically. Assuming the name of the command is run_server, you can start it like this in the shell:
(TRAPINT(){};while sleep .5; do run_server; done)
The surrounding parentheses start a sub-shell, TRAPINT(){} disables SIGINT for this shell. The while loop will keep restarting run_server until sleep exits with an exit status that is not zero. That can be achieved by interrupting sleep with ^C. (Without setting TRAPINT, interrupting run_server could also interrupt the loop)
So if you want to restart your server, just press ^C and wait for 0.5 seconds. If you want to stop your server without restarting, press ^C twice in 0.5 seconds.
To save some typing you can create a function for that:
doloop() {(
    TRAPINT(){}
    while sleep .5
    do
        echo running \"$@\"
        eval $@
    done
)}
Then call it with doloop run_server. Note: You still need the additional surrounding () as functions do not open a sub-shell by themselves.
eval allows shell constructs to be used, for example doloop LANG=C locale. In some cases you may need single quotes:
$ doloop echo $RANDOM
running "echo 242"
242
running "echo 242"
242
running "echo 242"
242
^C
$ doloop 'echo $RANDOM'
running "echo $RANDOM"
10988
running "echo $RANDOM"
27551
running "echo $RANDOM"
8910
^C
From the rsync manual I see that, by using the --rsync-path option, it is possible to specify what program is to be run on the remote machine to start up rsync. In particular, the program could be a wrapper script which calls the actual rsync command in the middle, but which does some actions before and/or after the rsync invocation. One possible interesting use would be to acquire/release a lock (e.g., a flock), so that the operations of rsync at the remote end could be coordinated with another process at the far end which is contending for write access to the same files. There could be multiple rsync processes simultaneously holding the shared lock (I am aware of the potential for starvation but am not concerned about that right now). The 'writer' process I'm dealing with would just be changing a few hard links, so it would not block the rsync processes for any significant length of time.
I have looked at other co-ordination approaches, e.g., implementing a custom remote locking protocol between the client and server, but they all involve more development work and/or are unsatisfactory for other reasons, which is why I am interested in the wrapper/(f)lock approach.
My questions are:
1) Is this a reasonable way to solve the problem of co-ordinating rsync 'readers' with another, 'writer' process accessing the same directory?
2) Can you also put a wrapper around rsync when using the inetd (or xinetd) daemon approach to running rsync, by adding a line something like the following to /etc/inetd.conf (as per the rsyncd.conf man page):
rsync stream tcp nowait root /usr/bin/rsync rsyncd --daemon
but replacing /usr/bin/rsync with the path to your rsync-lookalike wrapper, which in this case would be a C/C++ program that seizes a lock, forks off rsync, waits for rsync to complete, then releases the lock?
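For concreteness, the modified line might look something like this, where /usr/local/sbin/rsync-lock-wrapper is just a placeholder path for the wrapper:
rsync stream tcp nowait root /usr/local/sbin/rsync-lock-wrapper rsyncd --daemon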
Thanks,
Tom
One potential catch with the wrapper approach: the remote process seems to be called with extra arguments, which are appended to whatever command line you specify with --rsync-path. So if you need to pass arguments of your own, something like the following style is needed.
#! /bin/sh
lock_target=$1                                         # our own argument
shift                                                  # the rest is what rsync appended
if ! lockfile ${lock_target}.lock ; then exit 1 ; fi   # acquire the lock or bail out
trap "rm -f ${lock_target}.lock" EXIT HUP TERM INT     # release the lock on exit
/usr/bin/rsync "$@"                                    # run the real rsync with the appended args
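For illustration, a client-side invocation of this wrapper might look like the following (the wrapper path, lock target, and hosts are placeholders):
rsync -a --rsync-path="/usr/local/bin/rsync-lock-wrapper /var/lock/shared-data" user@remotehost:/remote/src/ /local/dest/
The wrapper consumes the lock target as $1, and the remaining "$@" is the --server command line that rsync appends.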
Thanks to the question and the comments. Armed with your ideas I solved it (for me) using --rsync-path, but without any wrapper scripts on the remote host, simply by putting the whole payload script into --rsync-path, with a few tricks.
This particular example uses rsync to pull data from the remote host while holding a flock on the remote host; e.g., the remote host dumps data periodically while also holding the flock, so dump and pull must not be interleaved.
Points to note
rsync will append its arguments to the end of whatever command you specify in --rsync-path, so the command needs to cope with that; for that I rely on bash features on both the pulling and the remote host.
any pre- and post-processing on the remote host must not write to STDOUT, because that will corrupt the rsync protocol and rsync will bail out. Any error output should go to STDERR, where it will turn up on the pulling host as rsync STDERR output. This is why '1>&2' appears in all the error handling.
this probably relies on the remote command spawned by rsync being run by bash, because I think good old sh does not support arrays. It works for me between RHEL7 boxes. A possible workaround is proposed at the end.
With that in mind, here is my simplified, concept-only rehash (I have not run this particular script; my full solution has extra layers that would distract from the main point).
The script on the pulling host:
#!/bin/bash

function rsync_wrap() {
    {
        flock --exclusive --timeout ${LOCK_TIMEOUT} 100 || {
            echo "Failed to lock: ${LOCK_TIMEOUT}" 1>&2
            return 1
        }
        # call real rsync with original arguments
        rsync "$@"
        exit_code=$?
        if [ ${exit_code} -eq 0 ]; then
            : # Do clean up when successful, e.g.
            # rm -f "${LOCK_FILE}"
            # rm -rf /eg/purge/data
        else
            : # Do clean up when failed
        fi
        # Note, return is important, do not let it fall out
        return ${exit_code}
    } 100<"${LOCK_FILE}"
    echo "Failed to open lock file: ${LOCK_FILE}" 1>&2
    return 1
}

# Define vars
LOCK_FILE=/var/somedir/name.lock; # or /dev/shm/name.lock
LOCK_TIMEOUT=600; # in seconds

# Build remote command, define vars and functions inside the command
remote_cmd="
# this approach deals with crazy chars in variables and function code
$( declare -p LOCK_FILE )
$( declare -p LOCK_TIMEOUT )
$( declare -f rsync_wrap )
rsync_wrap "

local_cmd=(
    rsync
    -a
    --rsync-path="${remote_cmd}"
    # I want to handle network timeouts in SSH, not in rsync,
    # because rsync does not know that waiting for the lock is expected
    -e "ssh -o BatchMode=yes -o ServerAliveCountMax=3 -o ServerAliveInterval=30 ${IDENTITY_FILE:+ -i '${IDENTITY_FILE}'}"
    /remote/source/path
    /local/destination/path/
)

# Do it
"${local_cmd[@]}"
If the remote side executes --rsync-path with something other than bash, then maybe the whole remote command could be wrapped in something like:
local_cmd="bash -c '${local_cmd//\'/\'\\\'\'}'"
As per the comments to the original post, it is indeed feasible to use the wrapper approach to implement (f)locks around rsync at the server end.
I am writing a file-syncing application where I collect events from the filesystem whenever a file is modified, and then later copy it over to a remote share via rsync over ssh. In my setup I have a slot which is connected to a QTimer. Every 5 seconds I pick a file from a sqlite db for synchronization and call QProcess::start with the following parameters:
/usr/bin/rsync -a /aufs/another-test-folder/testfile286.txt --rsh="ssh -p 8023" user@myserver.de:/home/neox/another-test-folder/testfile286.txt --rsync-path="mkdir -p /home/neox/another-test-folder && rsync"
I have at most 2 rsync processes running in parallel. This results in a process tree:
MyApp
 \_ rsync
 |   \_ ssh
 \_ rsync
     \_ ssh
The problem is that sometimes the application hangs, and ps says that the ssh processes have become zombies. First I tried to kill MyApp with SIGKILL, but no luck. Then I moved on to kill rsync and ssh, but still no luck. The whole tree hangs. And if I try to start the daemon from another console, or even try to ssh into another box, I can't. My guess is that ssh is stuck blocking on some I/O resource. Any idea how to solve this?
P.S. This happens randomly and not often.