OK so I have paramiko v2.2.1 and I am trying to log in to a machine and restart a service. Inside the service script it basically starts a process via nohup. However, if I allow paramiko to disconnect as soon as it is done, the process that was started terminates with a PIPE signal when it writes to stdout.
If I start the service by ssh'ing into the box and starting it manually there is no issue and it runs in the background fine. Also, if I add a long sleep(10) before disconnecting (close) paramiko it also seems to work just fine.
The service is started from an init.d script with a line like this:
env LD_LIBRARY_PATH=$bin_path nohup $bin_path/ServerLoop.sh \
"$bin_path/Service service args" "$@" &
Where ServerLoop.sh simply calls the service forever in a loop like this so it will never die:
SERVER=$1
shift
ARGS=$@
logger $ARGS
while [ 1 ]; do
$SERVER $ARGS
STATUS=$?
logger "$SERVER terminated with exit code: $STATUS. Server has been restarted"
sleep 1
done
I have noticed that when I start the service by ssh'ing into the box I get a nohup.out file written to the root directory. However, when I start it through paramiko, no nohup.out is written anywhere on the system. I.e. this is after I manually ssh into the box and start the service:
root@ts4700:/mnt/mc.fw/bin# find / -name "nohup*"
/usr/bin/nohup
/usr/share/man/man1/nohup.1.gz
/nohup.out
And this is after I run through paramiko:
root@ts4700:/mnt/mc.fw/bin# find / -name "nohup*"
/usr/bin/nohup
/usr/share/man/man1/nohup.1.gz
As I understand it, nohup will only redirect the output to nohup.out "if standard output is a terminal" (from the manual); otherwise it assumes the output is already being saved to a file and does not redirect. Hence I tried the following:
In [43]: import paramiko
In [44]: paramiko.__version__
Out[44]: '2.2.1'
In [45]: ssh = paramiko.SSHClient()
In [46]: ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
In [47]: ssh.connect(ip, username='root', password=not_for_so_sorry, look_for_keys=False, allow_agent=False)
In [48]: stdin, stdout, stderr = ssh.exec_command("tty")
In [49]: stdout.read()
Out[49]: 'not a tty\n'
So I am thinking that nohup is not redirecting to nohup.out when I run it through paramiko because tty does not report a terminal. I don't know why adding a sleep(10) would fix this, though, as the service is quite verbose when run on the command line.
I have also noticed that if the service is started from a manual ssh, its tty in the ps ax output is still set to the ssh tty; however, if the process is started by paramiko, its tty in the ps ax output is set to "?". Since both processes are run through nohup I would have expected this to be the same.
If the problem is that nohup is indeed not redirecting the output to nohup.out because of the missing tty, is there a way to force this to happen, or a better way to run this sort of command via paramiko?
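For reference, the workaround I am currently considering is to do the redirection myself so nohup never has to decide where stdout goes; something along these lines (the service path, log file, host and credentials are just placeholders):

import paramiko

ip = "10.0.0.1"        # placeholder address
password = "secret"    # placeholder credentials

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(ip, username='root', password=password,
            look_for_keys=False, allow_agent=False)

# Redirect stdout/stderr to a file and detach stdin explicitly, so the
# started process never writes to the soon-to-be-closed SSH channel.
cmd = "nohup /etc/init.d/myservice restart > /tmp/myservice.out 2>&1 < /dev/null &"
stdin, stdout, stderr = ssh.exec_command(cmd)

# Wait for the remote shell to finish launching the command before closing.
stdout.channel.recv_exit_status()
ssh.close()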
Thanks all, any help with this would be great :)
From the rsync manual documentation I see that by using the --rsync-path option, it is possible to specify what program is to be run on the remote machine to start up rsync. In particular, the program could be a wrapper script which calls the actual rsync command in the middle, but which performs some actions before and/or after the rsync invocation. One possible interesting use would be to acquire/release a lock (e.g., a flock), so that the operations of rsync at the remote end could be co-ordinated with another process at the far end which is contending for write access to the same files. There could be multiple rsync processes simultaneously holding the shared lock (I am aware of the potential for starvation but am not concerned about that right now). The 'writer' process I'm dealing with would just be changing a few hard-links, so it would not block the rsync processes for any significant length of time.
I have looked at other co-ordination approaches, e.g., implementing a custom remote locking protocol between the client and server, but they all involve more development work and/or are unsatisfactory for other reasons, which is why I am interested in the wrapper/(f)lock approach.
My questions are:
1) Is this a reasonable way to solve the problem of co-ordinating rsync 'readers' with another 'writer' process accessing the same directory?
2) Can you also put a wrapper around rsync when using the inetd (or xinetd) daemon approach to running rsync, by adding a line something like the following to /etc/inetd.conf (as per the rsyncd.conf man page):
rsync stream tcp nowait root /usr/bin/rsync rsyncd --daemon
but replacing /usr/bin/rsync with the path to your rsync-lookalike wrapper, which in this case would be a C/C++ program that seizes a lock, forks off rsync, waits for rsync to complete, then releases the lock?
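For concreteness, here is roughly the sort of wrapper I have in mind, sketched in Python rather than C/C++ just to keep it short (the lock file path is made up, and a real version would need proper error handling):

#!/usr/bin/env python
# Sketch of an inetd-invoked wrapper: take a shared lock, then hand the
# inherited socket (stdin/stdout) straight to the real rsync daemon.
import fcntl
import os
import sys

LOCK_PATH = "/var/lock/rsync-data.lock"   # made-up path

lock_fd = os.open(LOCK_PATH, os.O_CREAT | os.O_RDWR, 0o644)
fcntl.flock(lock_fd, fcntl.LOCK_SH)        # rsync 'readers' share the lock

pid = os.fork()
if pid == 0:
    # Child: become rsync in daemon mode; it inherits the socket on
    # stdin/stdout exactly as if inetd had started it directly.
    os.execv("/usr/bin/rsync", ["rsyncd", "--daemon"])

status = os.waitpid(pid, 0)[1]
fcntl.flock(lock_fd, fcntl.LOCK_UN)        # release the lock
os.close(lock_fd)
sys.exit(os.WEXITSTATUS(status))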
Thanks,
Tom
One potential catch with the wrapper approach: the remote process seems to be called with extra arguments, which are appended to whatever command line you specify with --rsync-path. So if you need to pass arguments of your own, something like the following style is needed.
#! /bin/sh
lock_target=$1
shift
if ! lockfile ${lock_target}.lock ; then exit 1 ; fi
trap "rm -f ${lock_target}.lock" EXIT HUP TERM INT
/usr/bin/rsync "$@"
Thanks to the question and the comments: armed with your ideas I solved it (for me) using --rsync-path, but without any wrapper scripts on the remote host, simply by putting the whole payload script into --rsync-path, with a few tricks.
This particular example uses rsync to pull data from a remote host while holding a flock on the remote host; e.g. the remote host dumps data periodically while also holding the flock, so dump and pull must not be interleaved.
Points to note
rsync will append its arguments to the end of whatever command you specify in --rsync-path, so the command needs to cope with that; for that I rely on bash shell features on both the pulling and the remote hosts.
any pre- and post-processing on the remote host must not write to STDOUT, because that would corrupt the rsync protocol and rsync would bail. Any error output should go to STDERR, and it will turn up on the pulling host as rsync STDERR output. This is why '1>&2' appears in all the error handling.
this probably relies on the remote command spawned by rsync being run by bash, because I think good old sh does not support arrays. This works for me between RHEL7 boxes. A possible workaround is proposed at the end.
With that in mind, here is my simplified, concept-only rehash (I've not run this particular script; my full solution has extra layers that would distract from the main point).
The script on the pulling host:
#!/bin/bash
function rsync_wrap() {
    {
        flock --exclusive --timeout ${LOCK_TIMEOUT} 100 || {
            echo "Failed to lock: ${LOCK_TIMEOUT}" 1>&2
            return 1
        }
        # call real rsync with original arguments
        rsync "$@"
        exit_code=$?
        if [ ${exit_code} -eq 0 ]; then
            : # Do clean up when successful, e.g.
            # rm -f "${LOCK_FILE}"
            # rm -rf /eg/purge/data
        else
            : # Do clean up when failed
        fi
        # Note, the return is important, do not let execution fall out
        return ${exit_code}
    } 100<"${LOCK_FILE}"
    echo "Failed to open lock file: ${LOCK_FILE}" 1>&2
    return 1
}
# Define vars
LOCK_FILE=/var/somedir/name.lock; # or /dev/shm/name.lock
LOCK_TIMEOUT=600; #in seconds
# Build remote command, define vars and functions inside the command
remote_cmd="
# this approach deals with crazy chars in variables and function code
$( declare -p LOCK_FILE )
$( declare -p LOCK_TIMEOUT )
$( declare -f rsync_wrap )
rsync_wrap "
local_cmd=(
rsync
-a
--rsync-path="${remote_cmd}"
# I want to handle network timeouts in SSH, not in rsync,
# because rsync does not know that waiting for lock is expected
-e "ssh -o BatchMode=yes -o ServerAliveCountMax=3 -o ServerAliveInterval=30 ${IDENTITY_FILE:+ -i '${IDENTITY_FILE}'}"
/remote/source/path
/local/destination/path/
)
# Do it
"${local_cmd[#]}"
If the remote side executes --rsync-path in something other than bash, then maybe the whole remote command could be wrapped in something like:
remote_cmd="bash -c '${remote_cmd//\'/\'\\\'\'}'"
As per the comments to the original post, it is indeed feasible to use the wrapper approach to implement (f)locks around rsync at the server end.
Below is my code for spawning an FCGI script for nginx.
spawn-fcgi -d /home/ubuntu/workspace -f /home/ubuntu/workspace/index.py -a 127.0.0.1 -p 9001
Now, let's say I want to make changes to the index.py script and reload it without bringing down the system. How do I reload the spawned program so that new connections use the updated program while the existing ones finish? For now I am killing the spawned process and running the command again. I am hoping for something more graceful.
I tried this by the way.
sudo kill -1 `sudo lsof -t -i:9001`
I have recently made something similar for node.js.
The idea is to have index.py as a very simple bootstrap script (which doesn't actually change much over time). It should catch SIGHUP, and reload/reread the application files (which are expected to change frequently).
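In Python terms, the bootstrap could be as small as something like this (a minimal sketch only; the app module name and its handle() function are placeholders for whatever index.py actually delegates to, and the FCGI server plumbing is omitted):

#!/usr/bin/env python
# Minimal bootstrap sketch: keep this file stable and reload the real
# application module (placeholder name: app) when SIGHUP arrives.
import importlib
import signal

import app  # the frequently changing application code

def reload_app(signum, frame):
    # Re-read the application module so subsequent requests run the new code.
    importlib.reload(app)  # on Python 2 this would be the built-in reload()

signal.signal(signal.SIGHUP, reload_app)

def handle_request(request):
    # Delegate every request to whichever version of app is currently loaded.
    return app.handle(request)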
I need to identify a daemon process that is writing to a log file periodically. The problem is that I don't have any idea which process is doing the job, and I need to show some progress to the client by tomorrow. Does anybody have any clue?
I have already sorted out the daemon processes running in the system with the help of the PPID. Any help would be appreciated.
Also, I think it is possible (rarely) for a daemon not to have a PPID of 1. How can we find it in that case?
Try the fuser command on your log file, which will display the PIDs of processes using it.
Example:
$ fuser file.log
file.log: 3065
lsof gives a list of open files with the processes.
So lsof | grep <filename> should help you.
You can use auditctl.
# sudo apt-get install auditd
# sudo /sbin/auditctl -w /path/to/file -p war -k hosts-file
-w watch /path/to/file
-p war watch for write, attribute change or read events
-k hosts-file is a search key.
# sudo /sbin/ausearch -f /path/to/file | more
Gives output such as
type=UNKNOWN[1327] msg=audit(1459766547.822:130): proctitle=2F7573722F7362696E2F61706163686532002D6B007374617274
type=PATH msg=audit(1459766547.822:130): item=0 name="/path/to/file" inode=141561 dev=08:00 mode=0100444 ouid=33 ogid=33 rdev=00:00 nametype=NORMAL
type=CWD msg=audit(1459766547.822:130): cwd="/"
type=SYSCALL msg=audit(1459766547.822:130): arch=c000003e syscall=2 success=yes exit=41 a0=7f3c23034cd0 a1=80000 a2=1b6 a3=8 items=1 ppid=24452 pid=6797 auid=4294967295 uid=33 gid=33 euid=33 suid=33 fsuid=33 egid=33 sgid=33 fsgid=33 tty=(none) ses=4294967295 comm="apache2" exe="/usr/sbin/apache2" key="hosts-file"
I am writing a file syncing application where I collect events from the filesystem whenever a file is modified, and then later I copy it over to a remote share via rsync over ssh. In my setup I have a slot which is connected to a QTimer. Every 5 seconds I pick a file from a sqlite db for synchronization and call QProcess::start with the following parameters:
/usr/bin/rsync -a /aufs/another-test-folder/testfile286.txt --rsh="ssh -p 8023" user@myserver.de:/home/neox/another-test-folder/testfile286.txt --rsync-path="mkdir -p /home/neox/another-test-folder && rsync"
I have at most 2 rsync processes running in parallel. This results in a process tree:
MyApp
\_rsync
| \_ssh
|_rsync
\_ssh
The problem is that sometimes the application hangs and ps says that the ssh processes have gone zombie. First I tried to kill MyApp with SIGKILL, but no luck. Then I moved on to killing rsync and ssh, but still no luck. The whole tree hangs. And if I try to start the daemon from another console, or even try to ssh to another box, I can't. My idea here is that somewhere ssh blocks some IO resources. Any idea how to solve this?
P.S. This happens randomly and not often.