Source file size increase during rsync

I back up a directory with rsync. Before starting the rsync I checked the directory size with du -s, which reported ~1 TB.
Then I started the rsync, and during the sync I watched the size of the backup directory to estimate the end time. When the backup grew much larger than 1 TB, I got curious: the size of many files in the source directory seems to increase. I ran du -s on a source file before and after the rsync process copied it:
```
## du on source file before it was rsynced
# du -s file.dat
2       file.dat
## du on source file after it was rsynced
# du -s file.dat
4096    file.dat
```
The rsync command:
```
rsync -av -s --relative --stats --human-readable --delete --log-file someDir/rsync.log sourceDir destinationDir/
```
The file system on both sides (source and destination) is BeeGFS 6.16 on RHEL 7.4, kernel 3.10.0-693.
Any ideas what is happening here?

file.dat may be a sparse file. Use the --sparse option:
```
-S, --sparse
    Try to handle sparse files efficiently so they take up less
    space on the destination. Conflicts with --inplace because it's
    not possible to overwrite data in a sparse fashion.
```
Wikipedia about sparse files:
a sparse file is a type of computer file that attempts to use file system space more efficiently when the file itself is partially empty. This is achieved by writing brief information (metadata) representing the empty blocks to disk instead of the actual "empty" space which makes up the block, using less disk space.
A sparse file can be created as follows:
```
$ dd if=/dev/zero of=file.dat bs=1 count=0 seek=1M
```
Now let's examine and copy it:
```
$ ls -l file.dat
.... 1048576 Nov  1 20:59 file.dat
$ rsync file.dat file.dat.rs1
$ rsync --sparse file.dat file.dat.rs2
$ du -sh file.dat*
0       file.dat
1.0M    file.dat.rs1
0       file.dat.rs2
```
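To check whether a given source file is sparse without copying it, compare its apparent size against its allocated blocks. A minimal sketch using GNU coreutils (the file name is illustrative):

```shell
# create a 1 MiB file that is entirely a hole (no data blocks written)
truncate -s 1M sparse.dat

# apparent length in bytes vs. space actually allocated (blocks * 512)
apparent=$(stat -c %s sparse.dat)
allocated=$(( $(stat -c %b sparse.dat) * 512 ))
echo "apparent=$apparent allocated=$allocated"

# a file is sparse when it occupies fewer bytes on disk than its length
if [ "$allocated" -lt "$apparent" ]; then
    echo "sparse.dat is sparse"
fi
```

Run against file.dat on the BeeGFS source, the same comparison should tell you whether --sparse is the right fix before re-running the whole backup.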

Related

Issues using rsync to migrate files to new server

I am trying to copy a directory full of directories and small files to a new server for an app migration. rsync is always my go-to tool for this type of migration, but this time it is not working as expected.
The directory has 174,412 files and is 136G in size. Based on this, I created a 256G disk for them on the new server.
The issue is that when I rsync'd the files over, the new partition ran out of space before all the files were copied.
I did some tests with a bigger destination disk on my test machine, and when the copy finishes, the total size on the new disk is 272G:
```
time sudo rsync -avh /mnt/dotcms/* /data2/

sent 291.61G bytes  received 2.85M bytes  51.75M bytes/sec
total size is 291.52G  speedup is 1.00

df -h /data2
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/data2vg-data2lv  425G  272G  154G  64% /data2
```
The source is on a NAS and the new target is an XFS file system, so at first I thought it might be a block-size issue. But then I used the cp command, and it copied exactly the source's size.
```
time sudo cp -av /mnt/dotcms/* /data

df -h /data2
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/data2vg-data2lv  425G  136G  290G  32% /data2
```
Why is rsync increasing the space used?
According to the documentation, dotcms makes use of hard links, so you need to give rsync the -H option to preserve them. Note that GNU cp -av preserves hard links, so it doesn't have this problem.
Other rsync options you should consider using include:
```
-H, --hard-links            preserve hard links
-A, --acls                  preserve ACLs (implies --perms)
-X, --xattrs                preserve extended attributes
-S, --sparse                turn sequences of nulls into sparse blocks
--delete                    delete extraneous files from destination dirs
```
This assumes you are running as root and that the destination is supposed to have the same users/groups as the source. If the users and groups are not the same, then @Cyrus's alternative command line using --numeric-ids may be more appropriate.
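The space doubling is easy to reproduce locally: two hard links to the same inode store the data once, and a copy that breaks the links stores it twice. A small sketch (directory names are made up; plain cp -R is used here precisely because, unlike cp -a, it does not preserve hard links):

```shell
# one 1 MiB file plus a hard link to it: the data exists once on disk
mkdir -p src
dd if=/dev/zero of=src/a bs=1M count=1 status=none
ln src/a src/b

stat -c %h src/a   # link count of the inode is now 2
du -sk src         # ~1 MiB: du counts each inode once

# a copy that does not preserve hard links materialises the data twice
cp -R src copy
stat -c %h copy/a  # link count 1: copy/a and copy/b are independent files
du -sk copy        # ~2 MiB
```

With rsync, adding -H keeps copy/a and copy/b as links to one inode, matching the source's disk usage.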

How to download ports OpenBSD

How do I extract the ports, src and sys trees to /usr/ports, /usr/src and /usr/src/sys in OpenBSD? Which commands do the download and the untar?
Check the PortsFetch section of the FAQ:
Once you have decided which flavor of the ports tree you want, you can get it from different sources. The table below gives an overview of where you can find the different flavors, and in which form. An 'o' marks availability and '-' means it is not available through that specific source.
Look for a file named ports.tar.gz on the mirrors.
```
$ cd /tmp
$ ftp https://ftp.openbsd.org/pub/OpenBSD/$(uname -r)/{ports.tar.gz,SHA256.sig}
$ signify -Cp /etc/signify/openbsd-$(uname -r | cut -c 1,3)-base.pub -x SHA256.sig ports.tar.gz
```
You want to untar this file in the /usr directory, which will create /usr/ports and all the directories under it.
```
# cd /usr
# tar xzf /tmp/ports.tar.gz
```

tar command not processing

I want to restore a previously backed-up directory (the spect folder in /opt).
Architecture (Solaris 10):
```
root@sms01sxdg /opt> ls -ltr
total 22
[...]
drwxr-xr-x   2 specadm  nms   1024 Dec 24 13:40 spect
root@sms01sxdg /opt> df -kh
Filesystem            size  used  avail capacity  Mounted on
/dev/md/dsk/d0        9.8G  4.2G  5.6G    43%    /
[...]
/dev/md/dsk/d30       7.9G   94M  7.7G     2%    /opt/spect
```
I previously backed up the folder with the tar command: tar cvf spect.tar spect.
It worked successfully, and when I run tar -tf spect.tar it lists the sub-folders and files inside.
But when I try to restore the backup it doesn't work; more precisely, it returns nothing and no files are extracted.
```
root@sms01sxdg /opt> tar -xvf /export/specbackup_db/spect.tar .
root@sms01sxdg /opt> ls -l spect/
total 0
```
I suspect that the folder I backed up is a mount point and that this is the cause of the problem. But the mount point is still mounted.
I have performed this kind of command many times, but this is the first time I have encountered this behaviour.
Try again after removing the dot at the end of the command. You can restrict which files are extracted from a tar archive by listing their paths after the archive name, e.g. tar -xvf tarfile.tar file1 file2 extracts only file1 and file2. Since your archive contains no member named ., nothing matches and nothing is extracted.
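The key point is that extraction arguments must match member names exactly as stored in the archive. A quick illustration (file names are made up; the error handling on the failing extract is just to keep the script going):

```shell
# build an archive the same way the question does
mkdir -p spect && touch spect/file1
tar cf spect.tar spect

tar tf spect.tar    # members are stored as "spect/" and "spect/file1"
rm -r spect

# "." matches no member name, so nothing comes out
tar xf spect.tar . 2>/dev/null || echo "no member named . -- nothing extracted"

# naming the member as stored works
tar xf spect.tar spect
ls spect            # file1 is back
```

The same applies on Solaris tar: it simply extracts nothing when the argument matches no member, which is exactly the silent behaviour seen above.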

Why does du report a different file size from ls on this single file?

I am trying to find the size of a huge directory on my Unix system with du -k, but it gives me odd results. I narrowed it down to checking the size of a single tif file.
ls -l shows:
```
-rw-rw-rw-   1 dsp.ts5  datafeed  83239394 Jun 10  2013 V001.tif
```
That is approximately 83 MB. But du -k V001.tif reports:
```
108914  V001.tif
```
That is about 108 MB! Why do the two commands return different results?
du -k returns the number of 1K blocks allocated to the file; ls -l shows its length in bytes.
It's important to understand the flags you pass to your programs. I suspect you're using du -sk without understanding why you need the -s or the -k. In particular, -s makes no sense when you give it a single file, because -s summarizes the results of many files or directories.
man du and man ls will tell you all about the options that you can use.
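Both numbers can be read from a single stat call, which makes the difference concrete. A small sketch (the file name is illustrative):

```shell
# a 5-byte file: apparent length vs. space actually allocated on disk
printf 'hello' > small.txt

stat -c 'bytes=%s blocks=%b blocksize=%B' small.txt
du -k small.txt   # at least one filesystem block, so more than ls suggests
ls -l small.txt   # exactly 5 bytes

# allocation is always a whole number of blocks, hence du >= ls here;
# for V001.tif the extra ~25 MB most likely comes from indirect blocks
# or preallocation on that particular filesystem
```

The general rule: ls -l reports the logical length, du reports what the filesystem actually set aside, and the two can differ in either direction (sparse files make du smaller, block overhead makes it larger).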

inotify and rsync on large number of files

I am using inotify to watch a directory and sync files between servers with rsync. Syncing works perfectly, and memory usage is mostly not an issue. However, a large number of files (350k) was recently added, and this has hurt performance, specifically CPU: when rsync runs, CPU usage now spikes to 90–100% and rsync takes a long time to complete. In total, 650k files are being watched/synced.
Is there any way to speed up rsync and only sync the directory that has changed? Or, alternatively, to set up multiple inotifywait watches on separate directories? The script being used is below.
UPDATE: I have added the --update flag and usage seems mostly unchanged.
```
#!/bin/bash
EVENTS="CREATE,DELETE,MODIFY,MOVED_FROM,MOVED_TO"
inotifywait -e "$EVENTS" -m -r --format '%:e %f' /var/www/ --exclude '/var/www/.*cache.*' | (
  WAITING=""
  while true; do
    LINE=""
    read -t 1 LINE
    if test -z "$LINE"; then
      if test ! -z "$WAITING"; then
        echo "CHANGE"
        WAITING=""
        rsync --update -alvzr --exclude '*cache*' --exclude '*.git*' /var/www/* root@secondwebserver:/var/www/
      fi
    else
      WAITING=1
    fi
  done
)
```
I ended up removing the compression option (z) and raising the read timeout (-t) to 10 seconds. This seems to have helped: rsync still spikes the CPU load, but the spikes are shorter-lived. Credit goes to an answer on Unix Stack Exchange.
You're using rsync to synchronize the root directory of a large tree, so I'm not surprised at the performance loss.
One possible solution is to only synchronize the changed files/directories, instead of the whole root directory.
For instance, suppose file1, file2 and file3 live under from/dir. When changes are made to these 3 files, use
```
rsync --update -alvzr from/dir/file1 from/dir/file2 from/dir/file3 to/dir
```
rather than
```
rsync --update -alvzr from/dir/* to/dir
```
But this has a potential pitfall: rsync won't create directories on the target automatically if they don't exist. However, you can use ssh to execute a remote command and create the directories yourself.
You may also need to set up SSH public-key authentication, but judging by the rsync command line you pasted, I assume you've already done this.
References:
rsync - create all missing parent directories?
rsync: how can I configure it to create target directory on server?
How to use SSH to run a shell script on a remote machine?
SSH error when executing a remote command: "stdin: is not a tty"
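To act on this suggestion without hand-listing files, the paths that inotifywait emits can be reduced to their unique parent directories before calling rsync once per directory. A sketch of just the path handling (the paths are examples; the rsync invocation itself stays as in the question):

```shell
# a batch of changed paths, as inotifywait --format '%w%f' would emit them
changed='/var/www/site1/a.php
/var/www/site1/b.php
/var/www/site2/c.php'

# reduce to unique parent directories, so one rsync run covers each
# directory that actually changed instead of the whole tree
dirs=$(printf '%s\n' "$changed" | xargs -n1 dirname | sort -u)
echo "$dirs"
```

Each directory in $dirs can then be handed to rsync; using --relative (-R) on those paths also makes rsync recreate the missing parent directories on the destination, which addresses the pitfall mentioned above.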
