Can rsync wait for a file to be created and then copy it right away? - rsync

I want to copy a file from A to B using rsync, but the file does not exist on A yet. I run the rsync command on B. Can I make rsync wait until the file is created on A and then copy it right away?

Let's assume we start with a command running on hostB like
hostB$ rsync hostA:remotepath localpath
If hostA has a normal shell, we can make rsync wait until the file exists by tweaking the helper command it normally runs on hostA. Depending on the environment, something like this might work:
hostB$ rsync --rsync-path='
while [ ! -f remotefile ]; do sleep 1; done;
sleep 5;
rsync' hostA:remotepath localpath
The while loop polls once a second until remotefile exists.
Then sleep 5 allows a few seconds for the contents to settle.
rsync is the normal remote command. It must come last and must not be followed by a newline, semicolon, or comment, because rsync appends its own arguments to the end of this string.
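The loop above waits forever if the file never shows up. A bounded variant could look like the sketch below. The name wait_for_file and its two arguments are hypothetical, not part of rsync, and the message goes to stderr because anything written to stdout on the remote side would end up in rsync's protocol stream.

```shell
# Hypothetical helper: wait until file $1 exists, giving up after about $2 seconds.
wait_for_file() {
    file=$1
    limit=$2
    waited=0
    while [ ! -f "$file" ]; do
        if [ "$waited" -ge "$limit" ]; then
            # Diagnostics go to stderr so stdout stays clean.
            echo "timed out waiting for $file" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

The function body could be inlined into the --rsync-path string in place of the plain while loop, as long as rsync remains the last word of the string.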

Related

rsync : how to copy only latest file from target to source

We have a main Linux server, say M, where we have files like below (for 2 months, and new files arriving daily)
Folder1
PROCESS1_20211117.txt.gz
PROCESS1_20211118.txt.gz
..
..
PROCESS1_20220114.txt.gz
PROCESS1_20220115.txt.gz
We want to copy only the latest file on our processing server, say P.
So as of now, we were using the below command, on our processing server.
rsync --ignore-existing -azvh -rpgoDe ssh user@M:${TargetServerPath}/${PROCSS_NAME}_*txt.gz ${SourceServerPath}
This process worked fine until now, but from now, in the processing server, we can keep files only up to 3 days. However, in our main server, we can keep files for 2 months.
So when we remove older files from the processing server, the rsync command copies all files from main server to the processing server.
How can I change rsync command to copy only latest file from Main server?
*Note: the example above is only for one file. We have multiple files on which we have to use the same command. Hence we cannot hardcode any filename.
What I tried:
There are multiple solutions, but all of them seem to cover copying the latest file from the server I am running rsync on, not from the remote server.
I also tried running the command below to get the latest file from the main server, but I am not allowed to pass variables to SSH in my company. So the command below works if I pass an individual path/file name, but it does not work with variables.
ssh M 'ls -1 ${TargetServerPath}/${PROCSS_NAME}_*txt.gz|tail -1'
Would really appreciate any suggestions on how to implement this solution.
OS: Linux 3.10.0-1160.31.1.el7.x86_64
Quoting for ssh is confusing: the command is parsed once by your local shell and again by the remote shell, so it has to be quoted for both.
The printf %q trick helps here: use it to quote the parts that come from variables.
file=$(
ssh M "ls -1 $(printf "%q" "${TargetServerPath}/${PROCSS_NAME}")_*.txt.gz" |
tail -1
)
rsync --ignore-existing -azvh -rpgoDe ssh user@M:"$file" "${SourceServerPath}"
It may be nicer to run tail -n1 on the remote side, so that only the one filename we need is transferred instead of the whole listing. Invoke an explicit shell and pass the variables as shell arguments:
file=$(ssh M "$(printf "%q " bash -c \
'ls -1 "$1"_*.txt.gz | tail -n1' \
'_' "${TargetServerPath}/${PROCSS_NAME}"
)")
Overall, I recommend writing a function and using declare -f:
sshqfunc() { echo "bash -c $(printf "%q" "$(declare -f "$1"); $1 \"\$@\"")"; };
work() {
ls -1 "$1"_*txt.gz | tail -1
}
tmp=$(ssh M "$(sshqfunc work)" _ "${TargetServerPath}/${PROCSS_NAME}")
You can also use declare -p to transfer variables to the remote side, then run your command inside single quotes:
ssh M "
$(declare -p TargetServerPath PROCSS_NAME);
"'
ls -1 ${TargetServerPath}/${PROCSS_NAME}_*txt.gz | tail -1
'
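All of the variants above parse the output of ls on the remote side. Since the date stamp is part of the file name, the remote function could also rely on the shell's sorted glob expansion instead. A minimal sketch, assuming the names always end in a YYYYMMDD stamp so that the last match in sort order is the newest; latest_match is a hypothetical name:

```shell
# Hypothetical helper: print the last file matching "$1"_*txt.gz.
# Glob results are sorted, so with YYYYMMDD stamps the last one is the newest.
latest_match() {
    latest=
    for f in "$1"_*txt.gz; do
        # If nothing matches, the pattern stays literal; -e filters that out.
        if [ -e "$f" ]; then
            latest=$f
        fi
    done
    if [ -n "$latest" ]; then
        printf '%s\n' "$latest"
    else
        return 1
    fi
}
```

It could be shipped to the remote host with the declare -f approach shown earlier, in place of work.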

Using file locks with rsync

From the rsync manual documentation I see that by using the option rsync-path, it is possible to specify what program is to be run on the remote machine to start up rsync. In particular, the program could be a wrapper script which calls the actual rsync command in the middle, but which does some actions before and/or after the rsync invocation. One possible interesting use would be to acquire/release a lock (e.g., a flock), so that the operations of rsync at the remote end could be co-ordinated with another process at the far end which is contending for write access to the same files. There could be multiple rsync processes simultaneously holding the shared lock (I am aware of potential for starvation but am not concerned about that right now). The 'writer' process I'm dealing with would just be changing a few hard-links, so it would not block the rsync process for any significant length of time.
I have looked at other co-ordination approaches, e.g., implementing a custom remote locking protocol between the client and server, but they all involve more development work and/or are unsatisfactory for other reasons, which is why I am interested in the wrapper/(f)lock approach.
My questions are:
1) Is this a reasonable way to solve the problem of co-ordinating rsync 'readers' with another, 'writer' process accessing the same directory?
2) Can you also put a wrapper around rsync when using the inetd (or xinetd) daemon approach to running rsync, by adding a line something like the following to /etc/inetd.conf (as per the rsyncd.conf man page):
rsync stream tcp nowait root /usr/bin/rsync rsyncd --daemon
but replacing /usr/bin/rsync with the path to your rsync-lookalike wrapper, which in this case would be a C/C++ program that seizes a lock, forks off rsync, waits for rsync to complete, then releases the lock?
Thanks,
Tom
One potential catch with the wrapper approach: the remote process seems to be called with extra arguments, which are appended to whatever command line you specify with --rsync-path. So if you need to pass arguments of your own, something in the following style is needed:
#! /bin/sh
lock_target=$1
shift
if ! lockfile ${lock_target}.lock ; then exit 1 ; fi
trap "rm -f ${lock_target}.lock" EXIT HUP TERM INT
/usr/bin/rsync "$@"
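The question asks for a shared lock, so that several rsync readers can run at once while the writer takes the lock exclusively. A minimal sketch of that idea, assuming the flock utility from util-linux is installed on the remote host; with_shared_lock is a hypothetical name and the 10 second timeout is arbitrary:

```shell
# Hypothetical helper: run a command while holding a shared lock on file $1.
with_shared_lock() {
    lock_file=$1
    shift
    (
        # -s: shared lock, -w 10: give up after 10 seconds; 9 is the fd opened below.
        flock -s -w 10 9 || { echo "could not lock $lock_file" >&2; exit 1; }
        "$@"
    ) 9>>"$lock_file"
}
```

A wrapper script could then end with something like with_shared_lock "${lock_target}.lock" /usr/bin/rsync "$@", while the writer process takes the same lock file exclusively.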
Thanks to the question and the comments. Armed with your ideas I solved it (for me) using --rsync-path but without any wrapper scripts on the remote host, simply by putting all of the payload script into --rsync-path, with a few tricks.
This particular example uses rsync to pull data from remote host while holding a flock on the remote host, e.g. remote host dumps data periodically while also holding a flock, so dump and pull must not be interleaved.
Points to note
rsync will append its arguments to the end of whatever command you specify in "--rsync-path", so command needs to cope with that, and for that I rely on bash shell features on both pulling and remote hosts.
any pre and post processing on remote host must not write to STDOUT because that will corrupt rsync protocol and rsync will bail. Any error output should go to STDERR and it will turn up on pulling host as rsync STDERR output. This is why '1>&2' in all the error handling.
this probably relies on remote command spawned by rsync to run by bash because I think the good old sh does not support arrays. This works for me between RHEL7 boxes. Possible work around proposed at the end.
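The second point can be checked locally before involving rsync at all. In this sketch pre_step is a hypothetical stand-in for whatever runs before the real rsync on the remote host; capturing its stdout shows that nothing would leak into the protocol stream.

```shell
# Hypothetical pre-processing step: every diagnostic goes to stderr (fd 2).
pre_step() {
    echo "waiting for lock" >&2
    # ... real work would go here, also without writing to stdout ...
}

# Capture stdout only; it should be empty if the step is protocol-safe.
protocol_bytes=$(pre_step 2>/dev/null)
if [ -z "$protocol_bytes" ]; then
    echo "stdout is clean" >&2
fi
```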
With that in mind, here is my simplified concept only rehash (I've not run this particular script, my full solution has extra layers that distract attention from the main point).
The script on the pulling host:
#!/bin/bash
function rsync_wrap() {
{
flock --exclusive --timeout ${LOCK_TIMEOUT} 100 || {
echo "Failed to lock: ${LOCK_TIMEOUT}" 1>&2
return 1
}
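# call real rsync with original arguments
rsync "$@"
exit_code=$?
if [ ${exit_code} -eq 0 ]; then
# Do clean up when success
# rm -f "${LOCK_FILE}"
# rm -rf /eg/purge/data
: # no-op so that the branch is not empty (bash rejects an empty branch)
else
# Do clean up when failed
: # no-op so that the branch is not empty
fi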
# Note, return is important, do not let it fall out
return ${exit_code}
} 100<"${LOCK_FILE}"
echo "Failed to open lock file: ${LOCK_FILE}" 1>&2
return 1
}
# Define vars
LOCK_FILE=/var/somedir/name.lock; # or /dev/shm/name.lock
LOCK_TIMEOUT=600; #in seconds
# Build remote command, define vars and functions inside the command
remote_cmd="
# this approach deals with crazy chars in variables and function code
$( declare -p LOCK_FILE )
$( declare -p LOCK_TIMEOUT )
$( declare -f rsync_wrap )
rsync_wrap "
local_cmd=(
rsync
-a
--rsync-path="${remote_cmd}"
# I want to handle network timeouts in SSH, not in rsync,
# because rsync does not know that waiting for lock is expected
-e "ssh -o BatchMode=yes -o ServerAliveCountMax=3 -o ServerAliveInterval=30 ${IDENTITY_FILE:+ -i '${IDENTITY_FILE}'}"
/remote/source/path
/local/destination/path/
)
# Do it
"${local_cmd[@]}"
If the remote side executes --rsync-path in something other than bash, then maybe the whole remote command could be wrapped, before it is placed into local_cmd, in something like:
remote_cmd="bash -c '${remote_cmd//\'/\'\\\'\'}'"
As per comments to the original post, it is indeed feasible to use wrapper approach to implement (f)locks around rsync at the server end.

How do I use the nohup command without getting nohup.out?

I have a problem with the nohup command.
When I run my job, I have a lot of data. The output nohup.out becomes too large and my process slows down. How can I run this command without getting nohup.out?
The nohup command only writes to nohup.out if the output would otherwise go to the terminal. If you have redirected the output of the command somewhere else - including /dev/null - that's where it goes instead.
nohup command >/dev/null 2>&1 # doesn't create nohup.out
Note that the >/dev/null 2>&1 sequence can be abbreviated to just >&/dev/null in most (but not all) shells.
If you're using nohup, that probably means you want to run the command in the background by putting another & on the end of the whole thing:
nohup command >/dev/null 2>&1 & # runs in background, still doesn't create nohup.out
On Linux, running a job with nohup automatically closes its input as well. On other systems, notably BSD and macOS, that is not the case, so when running in the background, you might want to close input manually. While closing input has no effect on the creation or not of nohup.out, it avoids another problem: if a background process tries to read anything from standard input, it will pause, waiting for you to bring it back to the foreground and type something. So the extra-safe version looks like this:
nohup command </dev/null >/dev/null 2>&1 & # completely detached from terminal
Note, however, that this does not prevent the command from accessing the terminal directly, nor does it remove it from your shell's process group. If you want to do the latter, and you are running bash, ksh, or zsh, you can do so by running disown with no argument as the next command. That will mean the background process is no longer associated with a shell "job" and will not have any signals forwarded to it from the shell. (A disowned process gets no signals forwarded to it automatically by its parent shell - but without nohup, it will still receive a HUP signal sent via other means, such as a manual kill command. A nohup'ed process ignores any and all HUP signals, no matter how they are sent.)
Explanation:
In Unixy systems, every source of input or target of output has a number associated with it called a "file descriptor", or "fd" for short. Every running program ("process") has its own set of these, and when a new process starts up it has three of them already open: "standard input", which is fd 0, is open for the process to read from, while "standard output" (fd 1) and "standard error" (fd 2) are open for it to write to. If you just run a command in a terminal window, then by default, anything you type goes to its standard input, while both its standard output and standard error get sent to that window.
But you can ask the shell to change where any or all of those file descriptors point before launching the command; that's what the redirection (<, <<, >, >>) and pipe (|) operators do.
The pipe is the simplest of these... command1 | command2 arranges for the standard output of command1 to feed directly into the standard input of command2. This is a very handy arrangement that has led to a particular design pattern in UNIX tools (and explains the existence of standard error, which allows a program to send messages to the user even though its output is going into the next program in the pipeline). But you can only pipe standard output to standard input; you can't send any other file descriptors to a pipe without some juggling.
The redirection operators are friendlier in that they let you specify which file descriptor to redirect. So 0<infile reads standard input from the file named infile, while 2>>logfile appends standard error to the end of the file named logfile. If you don't specify a number, then input redirection defaults to fd 0 (< is the same as 0<), while output redirection defaults to fd 1 (> is the same as 1>).
Also, you can combine file descriptors together: 2>&1 means "send standard error wherever standard output is going". That means that you get a single stream of output that includes both standard out and standard error intermixed with no way to separate them anymore, but it also means that you can include standard error in a pipe.
So the sequence >/dev/null 2>&1 means "send standard output to /dev/null" (which is a special device that just throws away whatever you write to it) "and then send standard error to wherever standard output is going" (which we just made sure was /dev/null). Basically, "throw away whatever this command writes to either file descriptor".
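The order of the two redirections matters, which a small experiment can show. This is only an illustrative sketch; emit and the scratch directory are made-up names.

```shell
# emit writes one line to stdout and one line to stderr.
emit() {
    echo out
    echo err >&2
}
scratch=$(mktemp -d)

# stdout to the file first, then stderr to the same place: the file gets both lines.
emit >"$scratch/both.txt" 2>&1

# Reversed order: stderr is pointed at the old stdout (captured here by $(...))
# before stdout moves to the file, so only "out" lands in the file.
leaked=$(emit 2>&1 >"$scratch/only_out.txt")
echo "leaked: $leaked"   # prints "leaked: err"
```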
When nohup detects that neither its standard error nor output is attached to a terminal, it doesn't bother to create nohup.out, but assumes that the output is already redirected where the user wants it to go.
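The shell offers roughly the same test through [ -t fd ], which is true when the given file descriptor is attached to a terminal. A small sketch (describe_fd is a hypothetical name) that can be used to see what a command would observe under different redirections:

```shell
# Report whether file descriptor $1 is attached to a terminal.
describe_fd() {
    if [ -t "$1" ]; then
        echo terminal
    else
        echo redirected
    fi
}

# With stderr sent to /dev/null, fd 2 is no longer a terminal.
describe_fd 2 2>/dev/null   # prints "redirected"
```

Run interactively without any redirection, describe_fd 1 would be expected to report terminal instead.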
The /dev/null device works for input, too; if you run a command with </dev/null, then any attempt by that command to read from standard input will instantly encounter end-of-file. Note that the merge syntax won't have the same effect here; it only works to point a file descriptor to another one that's open in the same direction (input or output). The shell will let you do >/dev/null <&1, but that winds up creating a process with an input file descriptor open on an output stream, so instead of just hitting end-of-file, any read attempt will trigger a fatal "invalid file descriptor" error.
nohup some_command > /dev/null 2>&1&
That's all you need to do!
Have you tried redirecting all three I/O streams:
nohup ./yourprogram > foo.out 2> foo.err < /dev/null &
You might want to use the detach program. You use it like nohup but it doesn't produce an output log unless you tell it to. Here is the man page:
NAME
detach - run a command after detaching from the terminal
SYNOPSIS
detach [options] [--] command [args]
Forks a new process, detaches it from the terminal, and executes
command with the specified arguments.
OPTIONS
detach recognizes a couple of options, which are discussed below. The
special option -- is used to signal that the rest of the arguments are
the command and args to be passed to it.
-e file
Connect file to the standard error of the command.
-f Run in the foreground (do not fork).
-i file
Connect file to the standard input of the command.
-o file
Connect file to the standard output of the command.
-p file
Write the pid of the detached process to file.
EXAMPLE
detach xterm
Start an xterm that will not be closed when the current shell exits.
AUTHOR
detach was written by Robbert Haarman. See http://inglorion.net/ for
contact information.
Note I have no affiliation with the author of the program. I'm only a satisfied user of the program.
Following command will let you run something in the background without getting nohup.out:
nohup command |tee &
In this way, you will be able to get console output while running script on the remote server:
sudo bash -c "nohup /opt/viptel/viptel_bin/log.sh $* &> /dev/null" &
Redirecting the output of sudo causes sudo to reask for the password, thus an awkward mechanism is needed to do this variant.
If you have a Bash shell on the Mac or Linux machine in front of you, try the steps below to see how redirection works in practice:
Create a 2 line script called zz.sh
#!/bin/bash
echo "Hello. This is a proper command"
junk_errorcommand
The echo command's output goes into STDOUT filestream (file descriptor 1).
The error command's output goes into STDERR filestream (file descriptor 2)
Currently, simply executing the script sends both STDOUT and STDERR to the screen.
./zz.sh
Now start with the standard redirection:
./zz.sh > zfile.txt
In the above, the echo output (STDOUT) goes into zfile.txt, whereas the error message (STDERR) is displayed on the screen.
The above is the same as:
./zz.sh 1> zfile.txt
Now you can try the opposite and redirect the error message (STDERR) into the file. The STDOUT from the echo command goes to the screen.
./zz.sh 2> zfile.txt
Combining the above two, you get:
./zz.sh 1> zfile.txt 2>&1
Explanation:
FIRST, send STDOUT (1) to zfile.txt.
THEN, send STDERR (2) to wherever STDOUT (1) now points (using the &1 reference).
Therefore, both 1 and 2 go into the same file (zfile.txt).
Finally, you can wrap the whole thing in nohup command & to run it in the background:
nohup ./zz.sh 1> zfile.txt 2>&1 &
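You can run the command below. Put the redirections before the final &, because a & in the middle would end the command there and the redirections would no longer apply to it.
nohup <your command> > <outputfile> 2>&1 &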
e.g.
I have a nohup command inside script
./Runjob.sh > sparkConcuurent.out 2>&1

KSH: Block two process from running at the same time

I have two processes that run at random times, and I want to ensure they never run at the same time because of a reader-writer problem. My idea is that whenever a process runs it creates a LOCK file, and both processes check whether a LOCK exists. If the LOCK exists, the process sleeps for a bit, wakes up, and checks again. Here is a small piece of it:
if [[ ! -f ${INPUT_DIR}/LOCK ]]
then
#Create LOCK file
cat /dev/null > ${INPUT_DIR}/LOCK
retcode=${?}
if [[ ${retcode} -ne 0 ]]
then
echo `date` "Error in creating LOCK file by processA.sh - Error code: " ${retcode} >> ${CORE_LOG}
exit
fi
echo `date` "LOCK turns on by processA.sh" >> ${CORE_LOG}
...
rm ${INPUT_DIR}/LOCK
fi
However, this does not quite stop the two processes from running at the same time. On rare occasions both processes get past the first if that checks whether the LOCK exists (if both are invoked at the same time and no LOCK exists yet, both will very likely pass the first if statement). Both then try to create a LOCK file, and cat /dev/null > ${INPUT_DIR}/LOCK does not generate an error even when LOCK already exists. Is there a solution to this?
For the main versions of Unix, the preferred solution is to use a lock directory. I would assume this is true for Linux, but I haven't had to test it recently.
Creating a directory is an atomic operation, and only one of the processes will succeed, assuming that you use a fixed name such as /bin/mkdir /tmp/myProjWorkSpace/LOCK (the parent directory must already exist). Do not add -p here: mkdir -p also succeeds when the directory already exists, which would defeat the lock. If you need to have information embedded in your lock, then you need a file, and you need separate subdirectories per process, possibly adding the process ID (.$$) to the directory name.
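A minimal sketch of the lock-directory idea with a retry loop. LOCK_DIR, acquire_lock and release_lock are hypothetical names, and the retry interval is arbitrary; the only part that matters is that plain mkdir fails when the directory already exists.

```shell
# Hypothetical lock location; the parent directory must already exist.
LOCK_DIR=${LOCK_DIR:-/tmp/myProjWorkSpace.LOCK}

# Try to take the lock, retrying once per second, at most $1 attempts.
acquire_lock() {
    attempts=0
    # mkdir (without -p) fails if the directory exists, so only one caller wins.
    until mkdir "$LOCK_DIR" 2>/dev/null; do
        attempts=$((attempts + 1))
        if [ "$attempts" -ge "$1" ]; then
            return 1
        fi
        sleep 1
    done
}

# Give the lock back by removing the directory.
release_lock() {
    rmdir "$LOCK_DIR"
}
```

Each process would call acquire_lock before touching the shared files and release_lock afterwards, ideally also from a trap on EXIT so that a crash does not leave the lock behind.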
I hope this helps.

moving from one to another server in shell script

Here is the scenario,
$ hostname
server1
I have the below script in server1,
#!/bin/ksh
echo "Enter server name:"
read server
rsh -n ${server} -l mquser "/opt/hd/ca/scripts/envscripts.ksh"
qdisplay
# script ends.
In the above script I log in to another server, say server2, and execute the script "envscripts.ksh", which sets a few aliases (including the alias "qdisplay").
I am able to log in to server2 successfully, but I am unable to use the alias set by "envscripts.ksh".
I get the error below:
-bash: qdisplay: command not found
Can someone please point out what needs to be corrected here?
Thanks,
Vignesh
The other responses and comments are correct. Your rsh command needs to execute both the ksh script and the subsequent command in the same invocation. However, I thought I'd offer an additional suggestion.
It appears that you are writing custom instrumentation for WebSphere MQ. Your approach is to remote shell to the WMQ server and execute a command to display queue attributes (probably depth).
The objective of writing your own instrumentation is admirable, however attempting to do it as remote shell is not an optimal approach. It requires you to maintain a library of scripts on each MQ server and in some cases to maintain these scripts in different languages.
I would suggest that a MUCH better approach is to use the MQSC client available in SupportPac MO72. This allows you to write the scripts once, and then execute them from a central server. Since the MQSC commands are all done via MQ client, the same script handles Windows, UNIX, Linux, iSeries, etc.
For example, you could write a script that remotely queried queue depths and printed a list of all queues with depth > 0. You could then either execute this script directly against a given queue manager or write a script to iterate through a list of queue managers and collect the same report for the entire network. Since the scripts are all running on the one central server, you do not have to worry about getting $PATH right, differences in commands like tr or grep, where ksh or perl are installed, etc., etc.
Ten years ago I wrote the scripts you are working on when my WMQ network was small. When the network got bigger, these platform differences ate me alive and I was unable to keep the automation up and running. When I switched to using WMQ client and had only one set of scripts I was able to keep it maintained with far less time and effort.
The following script assumes that the QMgr name is the same as the host name except in UPPER CASE. You could instead pass QMgr name, hostname, port and channel on the command line to make the script useful where QMgr names do not match the host name.
#!/usr/bin/perl -w
#-------------------------------------------------------------------------------
# mqsc.pl
#
# Wrapper for MO72 SupportPac mqsc executable
# Supply parm file name on command line and host names via STDIN.
# Program attempts to connect to hostname on SYSTEM.AUTO.SVRCONN and port 1414
# redirecting parm file into mqsc.
#
# Intended usage is...
#
# mqsc.pl parmfile.mqsc
# host1
# host2
#
# -- or --
#
# mqsc.pl parmfile.mqsc < nodelist
#
# -- or --
#
# cat nodelist | mqsc.pl parmfile.mqsc
#
#-------------------------------------------------------------------------------
use strict;
$SIG{ALRM} = sub { die "timeout" };
$ENV{PATH} =~ s/:$//;
my $File = shift;
die "No mqsc parm file name supplied!" unless $File;
die "File '$File' does not exist!\n" unless -e $File;
while (<STDIN>) {
my @Results;
chomp;
next if /^\s*[#*]/; # Allow comments using # or *
s/^\s+//; # Delete leading whitespace
s/\s+$//; # Delete trailing whitespace
# Do not accept hosts with embedded spaces in the name
die "ERROR: Invalid host name '$_'\n" if /\s/;
# Silently skip blank lines
next unless ($_);
my $QMgrName = uc($_);
#----------------------------------------------------------------------------
# Run the parm file in
eval {
alarm(10);
@Results = `mqsc -E -l -h $_ -p detmsg=1,prompt="",width=512 -c SYSTEM.AUTO.SVRCONN < $File 2>&1 | grep -v "^MQSC Ended"`;
};
if ($@) {
if ($@ =~ /timeout/) {
print "Timed out connecting to $_\n";
} else {
print "Unexpected error connecting to $_: $@\n";
}
}
alarm(0);
if (@Results) {
print join("\t", @Results, "\n");
}
}
exit;
The parmfile.mqsc is any valid MQSC script. One that gathers all the queue depths looks like this:
DISPLAY QL(*) CURDEPTH
I think the real problem is that the r(o)sh cmd only executes the remote envscripts.ksh file and that your script is then trying to execute qdisplay on your local machine.
You need to 'glue' the two commands together so they are both executed remotely.
EDITED per comment from Gilles (He is correct)
rosh -n ${server} -l mquser ". /opt/hd/ca/scripts/envscripts.ksh ; qdisplay"
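The leading dot matters as much as the semicolon. The following local sketch shows the difference between running a settings script and sourcing it; the temporary file and the DEMO_SETTING variable are made up for the illustration and stand in for envscripts.ksh and its aliases.

```shell
# A stand-in for envscripts.ksh that only sets one variable (hypothetical).
env_script=$(mktemp)
echo 'DEMO_SETTING=from_env_script' >"$env_script"
unset DEMO_SETTING

# Running the script starts a child shell; its settings vanish when it exits.
sh "$env_script"
after_run=${DEMO_SETTING:-unset}

# Sourcing it with "." runs it in the current shell, so the setting stays.
. "$env_script"
after_source=${DEMO_SETTING:-unset}

echo "after running: $after_run, after sourcing: $after_source"
rm -f "$env_script"
```

This prints "after running: unset, after sourcing: from_env_script", which is why the remote command sources the script before calling qdisplay in the same shell.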
I hope this helps.
P.S. as you appear to be a new user, if you get an answer that helps you please remember to mark it as accepted, or give it a + (or -) as a useful answer
