rsync running at the same time on two machines - rsync

I would like to set up rsync to run on two machines.
Machine A rsync to Machine B
Machine B rsync to Machine A
Would there be any risk in running rsync on both machines at the same time, syncing the same files?

If you're very careful you can do it:
Ensure that you use --times so that the time stamps always match, and --update so that it will not overwrite newer versions.
Use --temp-dir with a directory outside the synced tree so that rsync does not find any temp files during its scan. The temporary directory should be on the same file-system as the destination, however, or else atomic moves won't be possible.
Do not use --inplace, any variant of --delete, or --checksum.
Even then, I would always run it first one way and then the other, or use a tool like Syncthing.
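A minimal sketch of the "first one way, then the other" approach, using the options recommended above. The function name, the host `machine-b`, and all paths are placeholders, not something from the question. Note that --temp-dir refers to a directory on the receiving side, which must already exist there.

```shell
# Hypothetical helper: sync SRC into DEST without overwriting newer files.
# TMP must already exist on the receiving side, outside the synced tree,
# and on the same file system as DEST so the final rename stays atomic.
sync_one_way() {
    src=$1 dest=$2 tmp=$3
    # --archive implies --times; --update skips files that are newer on the receiver.
    rsync --archive --update --temp-dir="$tmp" "$src"/ "$dest"/
}

# Example (assumed host and paths), run from machine A, one direction at a time:
# sync_one_way /srv/data machine-b:/srv/data /srv/.rsync-tmp   # A -> B
# sync_one_way machine-b:/srv/data /srv/data /srv/.rsync-tmp   # B -> A
```

Running the two directions back to back from a single machine means the two transfers never overlap.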

Related

Any way to make rsync based transfers faster using a staging directory

I rsync between two systems using ssh. The command line that I use is something like:
rsync -avzh -e "ssh" ./builds/2023_02_02 akamaicdn:builds/2023_02_02
The akamaicdn is accessed using ssh and the corresponding identity, host name etc are specified in ~/.ssh/config file.
Most of the time the destination dir doesn't exist, so it is a full upload even after rsync's optimizations.
But the content uploaded the next day has a lot in common with the previous day's, as these are build dirs that share much of their content.
Is there any way to tell the remote rsync to scan a set of previous dirs when it is determining which parts of a file have to be uploaded?
I am open to other optimizations if you can think of any.

rsync doesn't copy *only* modifications

I'm using rsync to back up my files. I chose rsync because it (should) use modification times to determine whether changes have been made and files need to be updated.
I started my backup (from my computer system (debian) to a portable external hard drive) with this command:
rsync -avz --update --delete --stats --progress --exclude-from=/home/user/scripts/ExclusionRSync --backup --backup-dir=/media/user/hdd/backups/deleted-files /home/user/ /media/user/hdd/backups/backup_user
It worked well and took a lot of time. I believed the second run would be very quick (since I hadn't modified any files). Unfortunately, the 2nd, 3rd, 4th, ... runs took as long as the first one. I still see all my files being copied even though they are already on my portable hard drive.
I don't understand why rsync doesn't copy only modifications (rsync is known to be efficient and to copy only changes, and I specifically pass the --update option).
A side effect of this problem is that all files are moved to my backup dir (deleted-files) as soon as they are transferred. Indeed, rsync deletes the previous file before copying the same file again during each update...
I found the solution in an answer on Serverfault.SE. The FAT filesystem was messing with timestamps:
FAT doesn't track modification times on files as precisely as, say
ext3 (FAT is only precise to within a 2 second window). This leads to
particularly nasty behavior with rsync as it will sometimes decide
that the original file is newer or older than the backup file by
enough that it needs to re-copy the data or at least re-check the
hashes. All in all, it makes for very poor performance on backups. If
you must stick with FAT, look into rsync's --size-only and
--modify-window flags as workarounds.
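As a rough sketch of the --modify-window workaround mentioned in the quote: the wrapper name and the paths in the usage line are placeholders. It deliberately copies only contents and modification times (--recursive --times), on the assumption that a FAT destination cannot store owners, permissions, or symlinks anyway.

```shell
# Hypothetical wrapper for backing up onto a FAT-formatted drive.
backup_to_fat() {
    src=$1 dest=$2
    # --modify-window=1 treats modification times that differ by at most
    # one second as equal, which absorbs FAT's 2-second timestamp resolution.
    rsync --recursive --times --modify-window=1 "$src"/ "$dest"/
}

# Example (placeholder paths):
# backup_to_fat /home/user /media/user/hdd/backups/backup_user
```

With the window in place, files whose size matches and whose timestamps differ only by rounding are skipped instead of being copied again.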

synchronization over http: rsync versus normal upload

I'm running file synchronization over HTTP. Both sides implement rsync. When synchronizing, for uploading I have two choices:
use a simple post request if:
the file to be uploaded doesn't exist on the remote side.
the file exists and is bigger than a certain value M.
else : perform rsync over get requests.
My question is: how can I determine the best value of M?
I'm certain that, for certain file sizes, a simple upload is faster than performing the rsync steps, especially for multiple files.
Thanks
If you're using rsync correctly, I'd bet that it's always faster, especially with multiple files.
Rsync is built specifically to check for differences between directory trees and update the target directory incrementally.
The following is a one-liner to keep in mind whenever you need to sync two directory trees.
rsync -av --delete /path/to/src /path/to/target
(also works over SSH, if necessary.)
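Just keep in mind that rsync is picky about trailing slashes on directory paths: a trailing slash on the source means "copy the contents of this directory", while no trailing slash means "copy the directory itself".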
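To see the trailing-slash behaviour for yourself, something like the following can be run against a scratch directory. The function and directory names are made up for the demo.

```shell
# Hypothetical demo: run both forms of the source path inside WORK.
trailing_slash_demo() {
    work=$1
    mkdir -p "$work/src" "$work/with_slash" "$work/without_slash"
    echo hello > "$work/src/file.txt"
    # Trailing slash on the source: the contents of src land directly in the target.
    rsync -a "$work/src/" "$work/with_slash"
    # No trailing slash: the directory itself is copied, creating target/src.
    rsync -a "$work/src" "$work/without_slash"
}

# Example:
# trailing_slash_demo "$(mktemp -d)"
# Result: with_slash/file.txt and without_slash/src/file.txt
```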

Transferring millions of images -- RSync not good enough

We've got a folder, 130GB in size, with millions of tiny (5-20k) image files, and we need to move it from our old server (EC2) to our new server (Hetzner, Germany).
Our SQL files SCP'd over really quickly -- 20-30mb/s at least -- and the first ~5gb or so of images transferred pretty quickly, too.
Then we went home for the day, and coming back in this morning, our images have slowed to only ~5kb/s in transfer. RSync seems to slow down as it hits the middle of the workload. I've looked into alternatives, like gigasync (which doesn't seem to work), but everyone seems to agree rsync is the best option.
We have so many files, doing ls -al takes over an hour, and all my attempts at using python to batch up our transfer into smaller parts have eaten all available RAM without successfully completing.
How can I transfer all these files at a reasonable speed, using readily available tools and some light scripting?
I don't know if it will be significantly faster, but maybe a
cd /folder/with/data && tar czf - . | ssh target 'cd /target/folder && tar xzf -'
will do the trick.
If you can, maybe restructure your file arrangement. In similar situations, I group the files by project, or just in batches of 1000, so that a single folder doesn't have too many entries at once.
But I can imagine that the necessity of rsync (which I otherwise like very well, too) to keep a list of transferred files is responsible for the slowness. If the rsync process occupies so much RAM that it has to swap, all is lost.
So another option could be to rsync folder by folder.
It's likely that the performance issue isn't with rsync itself, but a result of having that many files in a single directory. Very few file systems perform well with a single huge folder like that. You might consider refactoring that storage to use a hierarchy of subdirectories.
Since it sounds like you're doing essentially a one-time transfer, though, you could try something along the lines of a tar cf - -C <directory> . | ssh <newhost> tar xf - -C <newdirectory> - that might eliminate some of the extra per-file communication rsync does and the extra round-trip delays, but I don't think that will make a significant improvement...
Also, note that, if ls -al is taking an hour, then by the time you get near the end of the transfer, creating each new file is likely to take a significant amount of time (seconds or even minutes), since it first has to check every entry in the directory to see if it's in fact creating a new file or overwriting an old one.
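One possible way to split a flat folder into a hierarchy of subdirectories, as both answers suggest, is to bucket files by the first two characters of their names. This is only a sketch: the function name and the bucketing rule are assumptions, it expects file names without newlines, and it leaves names of two characters or fewer where they are.

```shell
# Hypothetical helper: move each regular file in DIR into DIR/<first two chars of its name>/.
shard_by_prefix() {
    dir=$1
    # find streams entries one by one instead of sorting the whole listing like ls does.
    find "$dir" -maxdepth 1 -type f | while IFS= read -r path; do
        name=${path##*/}
        rest=${name#??}            # name without its first two characters
        bucket=${name%"$rest"}     # the first two characters
        # Skip names too short to bucket.
        if [ -z "$bucket" ] || [ "$bucket" = "$name" ]; then
            continue
        fi
        mkdir -p "$dir/$bucket"
        mv "$path" "$dir/$bucket/$name"
    done
}

# Example (placeholder path):
# shard_by_prefix /folder/with/data
```

Once the files are bucketed, each subdirectory can be transferred with its own rsync run, which keeps every individual file list small.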

How to test compress rate of rsync with two big local files?

I want to test whether rsync will work for syncing some huge DVD images containing installers, in order to see what speedup, if any, I can obtain from using rsync.
I would like to run the test locally, how can I convince rsync to just evaluate how much data would be required in order to sync the two files?
PS. I am fully aware that I should try to sync small and uncompressed files, but this is outside the question in this case.
Just use
rsync -avz --log-file="/Users/username/rsync.log" /home/test /home/testlocation
Then check the log file for size and speed.
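One caveat for a purely local test: when both paths are local, rsync defaults to copying whole files and skips its delta algorithm, so the numbers would not reflect a network sync. A sketch that forces the delta algorithm and prints the statistics could look like this; the function name and paths are placeholders, and it works on a scratch copy so the old image is left untouched.

```shell
# Hypothetical helper: show how much data rsync would send to turn OLD into NEW.
delta_stats() {
    new=$1 old=$2
    scratch=$(mktemp -d) || return 1
    # Work on a copy so the original old image is not overwritten.
    cp "$old" "$scratch/image"
    # --no-whole-file forces the delta algorithm even for a local copy.
    # --ignore-times makes rsync process the file even if size and mtime match.
    # --stats prints "Literal data" (bytes sent) and "Matched data" (bytes reused).
    rsync --no-whole-file --ignore-times --stats "$new" "$scratch/image"
    rm -rf "$scratch"
}

# Example (placeholder paths):
# delta_stats /path/to/new.iso /path/to/old.iso
```

A large "Matched data" figure relative to the file size suggests rsync would save bandwidth on these images; a large "Literal data" figure suggests it would not.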

Resources