Using file locks with rsync

From the rsync manual I see that, by using the --rsync-path option, it is possible to specify what program is to be run on the remote machine to start up rsync. In particular, the program could be a wrapper script which calls the actual rsync command in the middle, but which performs some actions before and/or after the rsync invocation. One interesting use would be to acquire/release a lock (e.g., a flock), so that the operations of rsync at the remote end could be co-ordinated with another process at the far end which is contending for write access to the same files. There could be multiple rsync processes simultaneously holding the shared lock (I am aware of the potential for starvation but am not concerned about that right now). The 'writer' process I'm dealing with would just be changing a few hard links, so it would not block the rsync processes for any significant length of time.
I have looked at other co-ordination approaches, e.g., implementing a custom remote locking protocol between the client and server, but they all involve more development work and/or are unsatisfactory for other reasons, which is why I am interested in the wrapper/(f)lock approach.
My questions are:
1) Is this a reasonable way to solve the problem of co-ordinating rsync 'readers' with another, 'writer' process accessing the same directory?
2) Can you also put a wrapper around rsync when using the inetd (or xinetd) daemon approach to running rsync, by adding a line something like the following to /etc/inetd.conf (as per the rsyncd.conf man page):
rsync stream tcp nowait root /usr/bin/rsync rsyncd --daemon
but replacing /usr/bin/rsync with the path to your rsync-lookalike wrapper, which in this case would be a C/C++ program that seizes a lock, forks off rsync, waits for rsync to complete, then releases the lock?
Thanks,
Tom
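For illustration only (a sketch, not from the original post; the lock-file path and wrapper location are hypothetical), the intended coordination could look like this: the 'writer' takes a brief exclusive lock while swapping its hard links, e.g. flock --exclusive /var/data/.sync.lock -c 'ln -f new/current live/current', while the wrapper named via --rsync-path holds the shared side for the duration of the transfer:
#!/bin/sh
# Shared-lock wrapper for --rsync-path: hold the lock for the whole transfer,
# then pass any appended arguments straight through to the real rsync binary.
exec flock --shared /var/data/.sync.lock /usr/bin/rsync "$@"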

One potential catch with the wrapper approach: the remote process seems to be called with extra arguments, which are appended to whatever command line you specify with --rsync-path. So if you need to pass your own arguments to the wrapper, something like the following style is needed.
#! /bin/sh
lock_target=$1
shift
if ! lockfile ${lock_target}.lock ; then exit 1 ; fi
trap "rm -f ${lock_target}.lock" EXIT HUP TERM INT
/usr/bin/rsync "$#"
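For example (hypothetical wrapper path and host names), the client would pass the lock target as the wrapper's first argument, and rsync's own arguments are then appended after it:
rsync -a --rsync-path='/usr/local/bin/rsync-lock-wrapper /var/data/shared' remotehost:/var/data/shared/ /local/copy/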

Thanks to the question and the comments. Armed with your ideas I solved it (for me) using --rsync-path, but without any wrapper scripts on the remote host, simply by putting the whole payload script into --rsync-path, with a few tricks.
This particular example uses rsync to pull data from a remote host while holding a flock on that host; the remote host dumps data periodically while also holding the flock, so dump and pull must not be interleaved.
Points to note:
- rsync will append its arguments to the end of whatever command you specify in --rsync-path, so the command needs to cope with that; for that I rely on bash shell features on both the pulling and remote hosts (see the illustration after this list).
- Any pre- and post-processing on the remote host must not write to STDOUT, because that would corrupt the rsync protocol and rsync will bail out. Any error output should go to STDERR, and it will turn up on the pulling host as rsync's STDERR output. This is why '1>&2' appears in all the error handling.
- This probably relies on the remote command spawned by rsync being run by bash, because I think good old sh does not support arrays. This works for me between RHEL7 boxes. A possible workaround is proposed at the end.
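For illustration only (the exact flags vary with the rsync version and options used), on a pull the remote end gets invoked roughly as:
<whatever you put in --rsync-path> --server --sender -logDtpre.iLsfxC . /remote/source/path
which is why the remote command built below simply ends with a bare call to rsync_wrap, so that the appended arguments become that function's "$@".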
With that in mind, here is my simplified, concept-only rehash (I've not run this particular script; my full solution has extra layers that would distract from the main point).
The script on the pulling host:
#!/bin/bash
function rsync_wrap() {
    {
        flock --exclusive --timeout ${LOCK_TIMEOUT} 100 || {
            echo "Failed to lock ${LOCK_FILE} within ${LOCK_TIMEOUT} seconds" 1>&2
            return 1
        }
        # call real rsync with original arguments
        rsync "$@"
        exit_code=$?
        if [ ${exit_code} -eq 0 ]; then
            : # Do clean up when successful, e.g.
            # rm -f "${LOCK_FILE}"
            # rm -rf /eg/purge/data
        else
            : # Do clean up when failed
        fi
        # Note, the return is important, do not let control fall out of the block
        return ${exit_code}
    } 100<"${LOCK_FILE}"
    # Only reached if the lock file could not be opened
    echo "Failed to open lock file: ${LOCK_FILE}" 1>&2
    return 1
}
# Define vars
LOCK_FILE=/var/somedir/name.lock; # or /dev/shm/name.lock
LOCK_TIMEOUT=600; #in seconds
# Build remote command, define vars and functions inside the command
remote_cmd="
# this approach deals with crazy chars in variables and function code
$( declare -p LOCK_FILE )
$( declare -p LOCK_TIMEOUT )
$( declare -f rsync_wrap )
rsync_wrap "
local_cmd=(
    rsync
    -a
    --rsync-path="${remote_cmd}"
    # I want to handle network timeouts in SSH, not in rsync,
    # because rsync does not know that waiting for the lock is expected
    -e "ssh -o BatchMode=yes -o ServerAliveCountMax=3 -o ServerAliveInterval=30 ${IDENTITY_FILE:+ -i '${IDENTITY_FILE}'}"
    remotehost:/remote/source/path
    /local/destination/path/
)
# Do it
"${local_cmd[@]}"
If the remote side executes the --rsync-path command in something other than bash, then maybe the whole remote command could be wrapped in something like:
remote_cmd="bash -c '${remote_cmd//\'/\'\\\'\'}'"

As per the comments on the original post, it is indeed feasible to use the wrapper approach to implement (f)locks around rsync at the server end.
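For the inetd/daemon case asked about in the question, a minimal sketch of such a wrapper might look like the following (untested; the question proposes a compiled C/C++ program, and a shell equivalent using flock(1) is shown here only to illustrate the idea; the wrapper and lock-file paths are hypothetical):
#!/bin/sh
# Stand-in for /usr/bin/rsync in /etc/inetd.conf, e.g.:
#   rsync stream tcp nowait root /usr/local/sbin/rsync-locking-wrapper rsyncd --daemon
# Takes a shared lock, then execs the real rsync with whatever arguments inetd supplied
# (the lock is released automatically when the rsync process exits).
exec flock --shared /var/data/.sync.lock /usr/bin/rsync "$@"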

Related

open screen session on many remote hosts executing complex command, don't exit afterward

I have a long list of remote hosts and I want to run a shell command on all of them. The command takes a very long time, so I want to run the command inside screen on the remote machine, disconnecting immediately from each, and I want the terminal output on the remote to be preserved after the command exits. There is a "tag" that should be supplied to each command as an argument. I tried to do this with parallel, something like this:
$ cat servers.txt
user1@server1.example.com/tag1
user2@server2.example.com/tag2
# ...
$ cat run.sh
grep -v '^#' servers.txt |
parallel ssh -tt '{//}' \
'tag={/}; exec screen slow_command --option1 --option2 $tag other args'
This doesn't work: all of the remote processes are launched, but they are not detached (so the ssh sessions remain live and I don't get my local shell back), and once each command finishes, its screen exits immediately and the output is lost.
How do I fix this script? Note: if this is easier to do with tmux and/or some other marshalling program besides parallel, I'm happy to hear answers that explain how to do it that way.
Something like this:
grep -v '^#' servers.txt |
parallel -q --colsep / ssh {1} "screen -d -m bash -c 'echo do stuff \"{2}\";sleep 1000000'"
The final sleep makes sure the screen does not die. You will have 1000000 seconds to attach to it and kill it.
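For example (hosts as in servers.txt above), you can later check on a session or reattach to it, and kill it from inside screen with Ctrl-A K:
ssh user1@server1.example.com 'screen -ls'   # list the detached sessions
ssh -t user1@server1.example.com screen -r   # reattach interactively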
There is an awful lot of quoting there - especially if do stuff is complex.
It may be easier to make a function that computes tag on the remote machine. You need GNU Parallel 20200522 for this:
env_parallel --session
f() {
    sshlogin="$1"
    # TODO given $sshlogin compute $tag (e.g. a table lookup)
    do_stuff() {
        echo "do stuff $tag"
        sleep 1000000
    }
    export -f do_stuff
    screen -d -m bash -c do_stuff "$@"
}
env_parallel --nonall --slf servers_without_tag f '$PARALLEL_SSHLOGIN'
env_parallel --end-session

Socat: run script in bidirectional tunnel

I am running a tunnel like this:
socat TCP-LISTEN:9090,fork TCP:192.168.1.3:9090
I would like to run a script to execute code with the strings passing through the tunnel.
The script does not change the strings, only processes strings independently but allows passage without changing between both ends.
Is this possible?
You should be able to even alter the communication using this approach:
Prepare a helper script helper.sh which gets executed for each connection:
#!/bin/bash
./inFilter.sh | socat - TCP:192.168.1.3:9090 | ./outFilter.sh
And start listening by using:
socat TCP-LISTEN:9090,fork EXEC:"./helper.sh"
The scripts inFilter.sh and outFilter.sh are processing the client and the server parts of the communication.
Example inFilter.sh:
#!/bin/bash
while read -r l ; do echo "IN(${l})" ; done
Example outFilter.sh:
#!/bin/bash
while read -r l ; do echo "OUT(${l})" ; done
This method should work for line-based text communication (as almost everything is line-buffered).
To make it work for binary protocols, wrapping all the processes with stdbuf -i0 -o0 might help, and getting rid of the shell is probably a good idea in that case.
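For example, an unbuffered variant of helper.sh could look like this (a sketch along the lines suggested above; the filter scripts themselves would also need to handle binary data rather than lines):
#!/bin/bash
# Unbuffered variant of helper.sh for non-line-based protocols
stdbuf -i0 -o0 ./inFilter.sh | stdbuf -i0 -o0 socat - TCP:192.168.1.3:9090 | stdbuf -i0 -o0 ./outFilter.sh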
Good luck!

inotify and rsync on large number of files

I am using inotify to watch a directory and sync files between servers using rsync. Syncing works perfectly, and memory usage is mostly not an issue. However, recently a large number of files were added (350k) and this has impacted performance, specifically on CPU. Now when rsync runs, CPU usage spikes to 90%/100% and rsync takes long to complete, there are 650k files being watched/synced.
Is there any way to speed up rsync and only rsync the directory that has been changed? Or alternatively to set up multiple inotifywaits on separate directories. Script being used is below.
UPDATE: I have added the --update flag and usage seems mostly unchanged
#! /bin/bash
EVENTS="CREATE,DELETE,MODIFY,MOVED_FROM,MOVED_TO"
inotifywait -e "$EVENTS" -m -r --format '%:e %f' /var/www/ --exclude '/var/www/.*cache.*' | (
    WAITING="";
    while true; do
        LINE="";
        read -t 1 LINE;
        if test -z "$LINE"; then
            if test ! -z "$WAITING"; then
                echo "CHANGE";
                WAITING="";
                rsync --update -alvzr --exclude '*cache*' --exclude '*.git*' /var/www/* root@secondwebserver:/var/www/
            fi;
        else
            WAITING=1;
        fi;
    done)
I ended up removing the compression option (z) and upping the WAITING var to 10 (seconds). This seems to have helped; rsync still spikes the CPU load, but it is shorter lived. Credit goes to an answer on Unix & Linux Stack Exchange.
You're using rsync to synchronize the root directory of a large tree, so I'm not surprised at the performance loss.
One possible solution is to only synchronize the changed files/directories, instead of the whole root directory.
For instance, suppose file1, file2 and file3 lie under from/dir. When changes are made to these 3 files, use
rsync --update -alvzr from/dir/file1 from/dir/file2 from/dir/file3 to/dir
rather than
rsync --update -alvzr from/dir/* to/dir
But this has a potential catch: rsync won't create directories automatically if the target folders don't exist. However, you can use ssh to execute a remote command and create the directories yourself.
You may need to set up SSH public-key authentication as well, but according to the rsync command line you pasted, I assume you've already done that.
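As a sketch of both points combined (reusing the paths from the question; not a drop-in replacement for the script above), inotifywait's %w%f format gives the full path of each changed file, and rsync's --relative (-R) option makes it create the missing parent directories on the destination:
#!/bin/bash
# Sync only the directory containing each changed file.
# Note: without --delete, deletions are not propagated, and a busy tree
# may still want the batching logic from the original script.
inotifywait -m -r -e CREATE,DELETE,MODIFY,MOVED_FROM,MOVED_TO --format '%w%f' \
    --exclude '/var/www/.*cache.*' /var/www/ |
while read -r changed; do
    dir=$(dirname "$changed")
    rsync -alR --update --exclude '*cache*' --exclude '*.git*' "$dir" root@secondwebserver:/
done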
reference:
rsync - create all missing parent directories?
rsync: how can I configure it to create target directory on server?
How to use SSH to run a shell script on a remote machine?
SSH error when executing a remote command: "stdin: is not a tty"

Alternative ways to issue multiple commands on a remote machine using SSH?

It appears that in this question, the answer was to separate statements with semicolons. However, that could become cumbersome, I would think, if we get into complex scripting with multiple if statements and complex quoted strings.
I imagine another alternative would be to simply issue multiple SSH commands one after the other, but again that would be cumbersome, plus I'm not set up for public/private key authentication so this would be asking for passwords a bunch of times.
What I'd ideally like is something much closer to the interactive shell experience: at one point in the script you ssh into the_remote_server and it prompts for the password, which you type in (interactively), and from that point on, until your script issues the "exit" command, all commands in the script are interpreted on the remote machine.
Of course this doesn't work:
ssh user@host.com
cd some/dir/on/remote/machine
tar -xzf my_tarball.tgz
cd some/other/dir/on/remote
cp -R some_directory somewhere_else
exit
Is there another alternative? I suppose I could take that part right out of my script and stick it into a script on the remote host. Meh. Now I'm maintaining two scripts. Plus I want a little configuration file to hold defaults and other stuff and I don't want to be maintaining that in two places either.
Is there another solution?
Use a heredoc.
ssh user@host.com << EOF
cd some/dir/on/remote/machine
tar -xzf my_tarball.tgz
cd some/other/dir/on/remote
cp -R some_directory somewhere_else
EOF
Use heredoc syntax, like
ssh user@host.com <<EOD
cd some/dir/on/remote/machine
...
EOD
or pipe, like
echo "ls -al" | ssh user#host.com

moving from one to another server in shell script

Here is the scenario,
$hostname
server1
I have the below script on server1:
#!/bin/ksh
echo "Enter server name:"
read server
rsh -n ${server} -l mquser "/opt/hd/ca/scripts/envscripts.ksh"
qdisplay
# script ends.
In the above script I am logging into another server, say server2, and executing the script "envscripts.ksh", which sets a few aliases (such as the alias "qdisplay") defined in it.
I am able to log in to server2 successfully, but I am unable to use the alias set by the script "envscripts.ksh".
Getting the below error:
-bash: qdisplay: command not found
Can someone please point out what needs to be corrected here?
Thanks,
Vignesh
The other responses and comments are correct. Your rsh command needs to execute both the ksh script and the subsequent command in the same invocation. However, I thought I'd offer an additional suggestion.
It appears that you are writing custom instrumentation for WebSphere MQ. Your approach is to remote shell to the WMQ server and execute a command to display queue attributes (probably depth).
The objective of writing your own instrumentation is admirable, however attempting to do it as remote shell is not an optimal approach. It requires you to maintain a library of scripts on each MQ server and in some cases to maintain these scripts in different languages.
I would suggest that a MUCH better approach is to use the MQSC client available in SupportPac MO72. This allows you to write the scripts once, and then execute them from a central server. Since the MQSC commands are all done via MQ client, the same script handles Windows, UNIX, Linux, iSeries, etc.
For example, you could write a script that remotely queried queue depths and printed a list of all queues with depth > 0. You could then either execute this script directly against a given queue manager or write a script to iterate through a list of queue managers and collect the same report for the entire network. Since the scripts are all running on the one central server, you do not have to worry about getting $PATH right, differences in commands like tr or grep, where ksh or perl are installed, etc., etc.
Ten years ago I wrote the scripts you are working on when my WMQ network was small. When the network got bigger, these platform differences ate me alive and I was unable to keep the automation up and running. When I switched to using WMQ client and had only one set of scripts I was able to keep it maintained with far less time and effort.
The following script assumes that the QMgr name is the same as the host name except in UPPER CASE. You could instead pass QMgr name, hostname, port and channel on the command line to make the script useful where QMgr names do not match the host name.
#!/usr/bin/perl -w
#-------------------------------------------------------------------------------
# mqsc.pl
#
# Wrapper for M072 SupportPac mqsc executable
# Supply parm file name on command line and host names via STDIN.
# Program attempts to connect to hostname on SYSTEM.AUTO.SVRCONN and port 1414
# redirecting parm file into mqsc.
#
# Intended usage is...
#
# mqsc.pl parmfile.mqsc
# host1
# host2
#
# -- or --
#
# mqsc.pl parmfile.mqsc < nodelist
#
# -- or --
#
# cat nodelist | mqsc.pl parmfile.mqsc
#
#-------------------------------------------------------------------------------
use strict;
$SIG{ALRM} = sub { die "timeout" };
$ENV{PATH} =~ s/:$//;
my $File = shift;
die "No mqsc parm file name supplied!" unless $File;
die "File '$File' does not exist!\n" unless -e $File;
while (<>) {
    my @Results;
    chomp;
    next if /^\s*[#*]/;    # Allow comments using # or *
    s/^\s+//;              # Delete leading whitespace
    s/\s+$//;              # Delete trailing whitespace
    # Do not accept hosts with embedded spaces in the name
    die "ERROR: Invalid host name '$_'\n" if /\s/;
    # Silently skip blank lines
    next unless ($_);
    my $QMgrName = uc($_);
    #----------------------------------------------------------------------------
    # Run the parm file in
    eval {
        alarm(10);
        @Results = `mqsc -E -l -h $_ -p detmsg=1,prompt="",width=512 -c SYSTEM.AUTO.SVRCONN < $File 2>&1 | grep -v "^MQSC Ended"`;
    };
    if ($@) {
        if ($@ =~ /timeout/) {
            print "Timed out connecting to $_\n";
        } else {
            print "Unexpected error connecting to $_: $!\n";
        }
    }
    alarm(0);
    if (@Results) {
        print join("\t", @Results, "\n");
    }
}
exit;
The parmfile.mqsc is any valid MQSC script. One that gathers all the queue depths looks like this:
DISPLAY QL(*) CURDEPTH
I think the real problem is that the r(o)sh cmd only executes the remote envscripts.ksh file and that your script is then trying to execute qdisplay on your local machine.
You need to 'glue' the two commands together so they are both executed remotely.
EDITED per comment from Gilles (He is correct)
rosh -n ${server} -l mquser ". /opt/hd/ca/scripts/envscripts.ksh ; qdisplay"
I hope this helps.
P.S. as you appear to be a new user, if you get an answer that helps you please remember to mark it as accepted, or give it a + (or -) as a useful answer
