Rsync takes longer time “receiving incremental file list” - rsync

I am using rysnc to copy files from remote host to local machine using a cron job. Every time I need the rsync to copy new files only from remote host. But its getting struck at this line "receiving incremental file list" for very long time. Below is the command I am using. Is there any other way I can fasten up this rsync process?
rsync -avz --inplace --progress --delete -ahe ssh remoteuser#remotehost:/home/bin/dir1/data /home/bin/dir1

Have you tried with --delete-before, --delete-after or --delay-updates?
Some options require rsync to know the full file list, so these options
disable the incremental recursion mode. These include: --delete-before, --delete-after, --prune-empty-dirs, and --delay-updates. Because of this, the default delete mode when you specify --delete is now --delete-during when both ends of the connection are at least 3.0.0 (use --del or --delete-during to request this improved deletion mode explicitly). See also the --delete-delay option that is a better choice than using --delete-after.
(from: http://linux.die.net/man/1/rsync)

Related

rsync delete files on sending side after transfer

I want to download a large amount of data from a remote machine.
I'd like the data to be erased on the remote machine everytime a file finishes downloading.
How do I do that? Is there a flag for rsync to do this?
You need to pass the --remove-source-files option to the rsync command. It tells rsync to remove from the sending side the files (meaning non-directories) that are a part of the transfer and have been successfully duplicated on the receiving side. Do not pass the --delete option to rsync command as it delete extraneous files from destination directory.
Delete source after successful transfer using rsync.
The syntax is:
rsync --remove-source-files -options /path/to/src/ /path/to/dest
rsync --remove-source-files -options /path/to/src/ computerB:/path/to/dest
rsync --remove-source-files -av /path/to/src/*.avi computerB:/path/to/dest
Reference : http://www.cyberciti.biz/faq/linux-unix-bsd-appleosx-rsync-delete-file-after-transfer/

inotify and rsync on large number of files

I am using inotify to watch a directory and sync files between servers using rsync. Syncing works perfectly, and memory usage is mostly not an issue. However, recently a large number of files were added (350k) and this has impacted performance, specifically on CPU. Now when rsync runs, CPU usage spikes to 90%/100% and rsync takes long to complete, there are 650k files being watched/synced.
Is there any way to speed up rsync and only rsync the directory that has been changed? Or alternatively to set up multiple inotifywaits on separate directories. Script being used is below.
UPDATE: I have added the --update flag and usage seems mostly unchanged
#! /bin/bash
EVENTS="CREATE,DELETE,MODIFY,MOVED_FROM,MOVED_TO"
inotifywait -e "$EVENTS" -m -r --format '%:e %f' /var/www/ --exclude '/var/www/.*cache.*' | (
WAITING="";
while true; do
LINE="";
read -t 1 LINE;
if test -z "$LINE"; then
if test ! -z "$WAITING"; then
echo "CHANGE";
WAITING="";
rsync --update -alvzr --exclude '*cache*' --exclude '*.git*' /var/www/* root#secondwebserver:/var/www/
fi;
else
WAITING=1;
fi;
done)
I ended up removing the compression option (z) and upping the WAITING var to 10 (seconds). This seems to have helped, rsync still spikes CPU load but it is shorter lived. Credit goes to an answer on unix stackexchange
You're using rsync to synchronize the root directory of a large tree, so I'm not surprised at the performance loss.
One possible solution is to only synchronize the changed files/directories, instead of the whole root directory.
For instance, file1, file2 and file3 lay under from/dir. When changes are made to these 3 files, use
rsync --update -alvzr from/dir/file1 from/dir/file2 from/dir/file3 to/dir
rather than
rsync --update -alvzr from/dir/* to/dir
But this has a potential bug: rsync won't create directories automatically if target folders don't exist. However, you can use ssh to execute remote command and create directories by yourself.
You may need to set SSH public-key authentication as well, but according to the rsync command line you paste, I assume you've already done this.
reference:
rsync - create all missing parent directories?
rsync: how can I configure it to create target directory on server?
How to use SSH to run a shell script on a remote machine?
SSH error when executing a remote command: "stdin: is not a tty"

How can I configure rsync to create target directory on remote server?

I would like to rsync from local computer to server. On a directory that does not exist, and I want rsync to create that directory on the server first.
How can I do that?
If you have more than the last leaf directory to be created, you can either run a separate ssh ... mkdir -p first, or use the --rsync-path trick as explained here :
rsync -a --rsync-path="mkdir -p /tmp/x/y/z/ && rsync" $source user#remote:/tmp/x/y/z/
Or use the --relative option as suggested by Tony. In that case, you only specify the root of the destination, which must exist, and not the directory structure of the source, which will be created:
rsync -a --relative /new/x/y/z/ user#remote:/pre_existing/dir/
This way, you will end up with /pre_existing/dir/new/x/y/z/
And if you want to have "y/z/" created, but not inside "new/x/", you can add ./ where you want --relativeto begin:
rsync -a --relative /new/x/./y/z/ user#remote:/pre_existing/dir/
would create /pre_existing/dir/y/z/.
From the rsync manual page (man rsync):
--mkpath create the destination's path component
--mkpath was added in rsync 3.2.3 (6 Aug 2020).
Assuming you are using ssh to connect rsync, what about to send a ssh command before:
ssh user#server mkdir -p existingdir/newdir
if it already exists, nothing happens
The -R, --relative option will do this.
For example: if you want to backup /var/named/chroot and create the same directory structure on the remote server then -R will do just that.
this worked for me:
rsync /dev/null node:existing-dir/new-dir/
I do get this message :
skipping non-regular file "null"
but I don't have to worry about having an empty directory hanging around.
I don't think you can do it with one rsync command, but you can 'pre-create' the extra directory first like this:
rsync --recursive emptydir/ destination/newdir
where 'emptydir' is a local empty directory (which you might have to create as a temporary directory first).
It's a bit of a hack, but it works for me.
cheers
Chris
This answer uses bits of other answers, but hopefully it'll be a bit clearer as to the circumstances. You never specified what you were rsyncing - a single directory entry or multiple files.
So let's assume you are moving a source directory entry across, and not just moving the files contained in it.
Let's say you have a directory locally called data/myappdata/ and you have a load of subdirectories underneath this.
You have data/ on your target machine but no data/myappdata/ - this is easy enough:
rsync -rvv /path/to/data/myappdata/ user#host:/remote/path/to/data/myappdata
You can even use a different name for the remote directory:
rsync -rvv --recursive /path/to/data/myappdata user#host:/remote/path/to/data/newdirname
If you're just moving some files and not moving the directory entry that contains them then you would do:
rsync -rvv /path/to/data/myappdata/*.txt user#host:/remote/path/to/data/myappdata/
and it will create the myappdata directory for you on the remote machine to place your files in. Again, the data/ directory must exist on the remote machine.
Incidentally, my use of -rvv flag is to get doubly verbose output so it is clear about what it does, as well as the necessary recursive behaviour.
Just to show you what I get when using rsync (3.0.9 on Ubuntu 12.04)
$ rsync -rvv *.txt user#remote.machine:/tmp/newdir/
opening connection using: ssh -l user remote.machine rsync --server -vvre.iLsf . /tmp/newdir/
user#remote.machine's password:
sending incremental file list
created directory /tmp/newdir
delta-transmission enabled
bar.txt
foo.txt
total: matches=0 hash_hits=0 false_alarms=0 data=0
Hope this clears this up a little bit.
eg:
from: /xxx/a/b/c/d/e/1.html
to: user#remote:/pre_existing/dir/b/c/d/e/1.html
rsync:
cd /xxx/a/ && rsync -auvR b/c/d/e/ user#remote:/pre_existing/dir/
rsync source.pdf user1#192.168.56.100:~/not-created/target.pdf
If the target file is fully specified, the directory ~/not-created is not created.
rsync source.pdf user1#192.168.56.100:~/will-be-created/
But the target is specified with only a directory, the directory ~/will-be-created is created. / must be followed to let rsync know will-be-created is a directory.
use rsync twice~
1: tranfer a temp file, make sure remote relative directories has been created.
tempfile=/Users/temp/Dir0/Dir1/Dir2/temp.txt
# Dir0/Dir1/Dir2/ is directory that wanted.
rsync -aq /Users/temp/ rsync://remote
2: then you can specify the remote directory for transfer files/directory
tempfile|dir=/Users/XX/data|/Users/XX/data/
rsync -avc /Users/XX/data rsync://remote/Dir0/Dir1/Dir2
# Tips: [SRC] with/without '/' is different
This creates the dir tree /usr/local/bin in the destination and then syncs all containing files and folders recursively:
rsync --archive --include="/usr" --include="/usr/local" --include="/usr/local/bin" --include="/usr/local/bin/**" --exclude="*" user#remote:/ /home/user
Compared to mkdir -p, the dir tree even has the same perms as the source.
If you are using a version or rsync that doesn't have 'mkpath', then --files-from can help. Suppose you need to create 'mysubdir' in the target directory
Create 'filelist.txt' to contain
mysubdir/dummy
mkdir -p source_dir/mysubdir/
touch source_dir/mysubdir/dummy
rsync --files-from='filelist.txt' source_dir target_dir
rsync will copy mysubdir/dummy to target_dir, creating mysubdir in the process. Tested with rsync 3.1.3 on Raspberry Pi OS (debian).

Using local settings through SSH

Is it possible to have an SSH session use all your local configuration files (.bash_profile, .vimrc, etc..) on login? That way you would have the same configuration for, say, editing files in vim in the remote session.
I just came across two alternatives to just doing a git clone of your dotfiles. I take no credit for either of these and can't say I've used either extensively so I don't know if there are pitfalls to either of these.
sshrc
sshrc is a tool (actually just a big bash function) that copies over local rc-files without permanently writing them to the remove user's $HOME - the idea being that might be a shared admin account that other people use. Appears to be customizable for different remote hosts as well.
.ssh/config and LocalCommand
This blog post suggests a way to automatically run a command when you login to a remote host. It tars and pipes a set of files to the remote, then un-tars them on the remote's $HOME:
Your local ~/.ssh/config would look like this:
Host *
PermitLocalCommand yes
LocalCommand tar c -C${HOME} .bashrc .bash_profile .exports .aliases .inputrc .vimrc .screenrc \
| ssh -o PermitLocalCommand=no %n "tar mx -C${HOME}"
You could modify the above to only run the command on certain hosts (instead of the * wildcard) or customize for different hosts as well. There might be a fair amount of duplication per host with this method - although you could package the whole tar c ... | ssh .. "tar mx .." into a script maybe.
Note the above looks like it clobbers the same files on the remote when you connect, so use with caution.
Use a dotfiles.git repo
What I do is keep all my config files in a dotfiles.git on a central server.
You can set it up so that when you ssh into a remote machine, you automatically pull the latest version of the dotfiles. I do something like this:
ssh myhost
cd ~/dotfiles
git pull --rebase
cd ~
ln -sf dotfiles/$username/linux/.* .
Note:
To put that in a shell script, you can automate the process of executing commands on a remote machine by piping to ssh.
The "$username" is there so that you can share your config files with other people you're working with.
The "ln -sf" creates symbolic links to all your dotfiles, overwriting any local ones, such that ~/.emacs is linked to the version controlled file ~/dotfiles/$username/.emacs.
The use of a "linux" subdirectory is just to allow for configuration changes across platforms. I also have a mac directory under dotfiles/$username/mac. Most of the files in the /mac directory are symlinked from the linux directory as it's very similar, but there are some exceptions.
Finally, note that you can make this even more sophisticated with hostnames and the like rather than just a generic 'linux'. With a dotfiles.git, you can also raid dotfiles from your friends, which is awesome -- everyone has their own set of little tricks and hacks.
No, because it's not SSH using your config files, but the remote shell.
I suggest keeping your config files in Subversion or some other VCS. Here's how I do it.
Well, no, because as Andy Lester says, the remote machine is the one doing the work, and it has no access back to your local machine to get .vimrc ...
On the other hand, you could use sshfs to mount the remote file system locally and edit the files locally. This doesn't require you to install anything on the remote machine. Not sure how efficient it is, maybe not great for editing big files over slow links.
Or Komodo IDE has a neat "Open >> Remote File" option which lets you edit files on remote
machines by scping them back and forth automatically.
I do this kind of things every day. I have about 15 bash rc files and .vimrc, a few vim plugin scripts, .screenrc and some other rc files. I have a sync script (written in bash) which uses the cool rsync command to sync all these files to remote servers. Every time I update some files on my main server, I would call the script to sync them to remote servers.
Setting up a svn/git/hg repository on the main server also works for me but my remote servers need to be repeatedly reinstalled for testing. So I find it's more convenient to use rsync.
A few years ago I also used the rdist tool which can also meet the requirement for most of the time. But now I prefer rsync as it supports incremental sync which is very efficient.
ssh can be configured to pass certain environment variables through to the other (remote side). And since most shells will check some environment variables for additional settings to apply, you can hack that into applying some local settings remotely. But its a bit complicated and most administrators turn off the ssh environment variable pass-through in the sshd config anyways.
You could always just copy the files to the machine before connecting with ssh:
#!/bin/bash
scp ~/.bash_profile ~/.vimrc user#host:
ssh user#host
This works best if you are using keys to login and no one else logs in as that user.
Here's a simple bash script I've used for this purpose. It syncs over some folders I like to have copied over using rsync and then adds the ~/bin folder to the remote machines .bashrc if it's not there already. It works best if you have have copied your ssh keys to each server. I use this approach instead of a "dotfiles repo" as lots of the servers I connect to don't have git on them.
So to use it, you'd do something like this:
./bin_sync_to_machine.sh server1
bin_sync_to_machine.sh
function show_help()
{
echo ""
echo "usage: SERVER {SERVER2 SERVER3 etc...}"
echo ""
exit
}
if [ "$1" == "help" ]
then
show_help
fi
if [ -z "$1" ]
then
show_help
fi
# Sync ~/bin and some dot files to remote server using rsync
for SERVER in $*; do
rsync -avrz --progress ~/bin/ -e ssh $SERVER:~/bin
rsync -avrz --progress ~/.vim/ -e ssh $SERVER:~/.vim
rsync -avrz --progress ~/.vimrc -e ssh $SERVER:~/.vimrc
rsync -avrz --progress ~/.aliases $SERVER:~/.aliases
rsync -avrz --progress ~/.aliases $SERVER:~/.bash_aliases
# Ensure remote server has ~/bin in the path
ssh $SERVER '~/bin/path_add_to_path.sh'
done
path_add_to_path.sh
pathadd() {
if [ -d "$1" ] && [[ ":$PATH:" != *":$1:"* ]]; then
PATH="${PATH:+"$PATH:"}$1"
fi
}
# Add to current path if running in a shell
pathadd ~/bin
# Add to ~/.bashrc
if ! grep -q PATH:~/bin ~/.bashrc; then
echo "PATH=\$PATH:~/bin" >> ~/.bashrc
fi
if ! grep -q source ~/.aliases ~/.bashrc; then
echo "source ~/.aliases" >> ~/.bashrc
fi
I wrote an extremely simple tool for this that will allow you to natively transport your .vimrc file whenever you ssh, by using SSHd built-in config options in a non-standard way.
No additional svn,scp,copy/paste, etc required.
It is simple, lightweight, and works by default on all server configurations I have tested so far.
https://github.com/gWOLF3/viSSHous
I think that https://github.com/fsquillace/kyrat does what you need.
I wrote it long time ago before sshrc was born and it has more benefits compared to sshrc:
It does not require dependencies on xxd for both hosts (which can be unavailable on remote host)
Kyrat uses a more efficient encoding algorithm
It is just ~20 lines of code (really easy to understand!)
No need of root access or any installations to the remote host
For instance:
$> echo "alias q=exit" > ~/.config/kyrat/bashrc
$> kyrat myuser#myserver.com
myserver.com $> q
exit

Using rsync to delete a single file

File foo.txt exists on the remote machine at: /home/user/foo.txt
It doesn't exist on the local machine.
I want to delete foo.txt using rsync.
I do not know (and assume for the purposes of this question that I cannot find out) what other files are in /home/user on either the local or remote machines, so I can't just sync the whole directory.
What rsync command can I use to delete foo.txt on the remote machine?
Try this:
rsync -rv --delete --include=foo.txt '--exclude=*' /home/user/ user#remote:/home/user/
(highly recommend running with --dry-run first to test it) Although it seems like it would be easier to use ssh...
ssh user#remote "rm /home/user/foo.txt"
That's a bit trivial, but if, like me, you came to this page looking for a way to delete the content of a directory from remote server using rsync, this is how I did it:
Create an empty mock folder:
mkdir mock
Sync with it:
rsync -arv --delete --dry-run ~/mock/ remote_server:~/dir_to_clean/
Remove --dry-run from the line above to actually do the thing.
As suggested above, use --dry-run to test prior. --delete deletes files on the remote location per the rsync man page.
rsync -rv --delete user#hostname.local:full/path/to/foo.txt
Comment below stating this will list only is incorrect. To list only use --list-only and remove --delete.
Just came across the same problem, needed to use rsync to delete a remote file, as only rsync and no other SSH commands were allowed. The --remove-source-files option (formerly known as --remove-sender-files) did exactly that:
rsync -avPn --remove-source-files remote:/home/user/foo.txt .
rm foo.txt
As always, remove the -n option to really execute this.

Resources