Why isn't rsync faster at copying a modified file locally? - rsync

$ dd if=/dev/urandom of=1 bs=1048576 count=3
3+0 records in
3+0 records out
3145728 bytes transferred in 0.263337 secs (11945641 bytes/sec)
$ rsync -avz 1 2
building file list ... done
1
sent 3147373 bytes received 42 bytes 6294830.00 bytes/sec
total size is 3145728 speedup is 1.00
$ dd if=/dev/urandom of=new_prefix bs=1048576 count=3
3+0 records in
3+0 records out
3145728 bytes transferred in 0.276985 secs (11357037 bytes/sec)
$ cat 1 >> new_prefix
$ rsync -avz new_prefix 2
building file list ... done
new_prefix
sent 6294646 bytes received 42 bytes 4196458.67 bytes/sec
total size is 6291456 speedup is 1.00
Why am I not getting any speed-up when adding a prefix to the file? AFAIK, rsync shouldn't only yield speedups for in-place modifications.

So what you're doing is:
Use rsync to copy a local file 1 to 2.
Make a new file new_prefix that is the same as 1 but has some more data inserted at the start.
Copy new_prefix on top of 2.
Think about what rsync has to do to execute step 3.
Remember there is no OS interface to say "insert data at the start of a file": the only option is to rewrite the entire file 2. So rsync has to read the entire new_prefix file and then write out the entire file 2. The I/O is the limiting factor, and there's no magic way around it.
If file 2 were remote, rsync's delta-transfer algorithm could make use of the similarity to send less network traffic and would probably show a speedup.
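Note also that when both paths are local, rsync defaults to -W/--whole-file and does not even attempt its delta-transfer algorithm. You can force it with --no-whole-file, but the saving only shows up as reduced network traffic on a remote copy whose destination already holds the old data; a rough sketch (the localhost: destination is just for illustration):
## force delta transfer on a local copy; file 2 is still rewritten in full on disk
$ rsync -av --no-whole-file new_prefix 2
## against a remote copy that already contains the old data, the "sent" byte
## count drops, because matching blocks are referenced instead of retransmitted
$ rsync -avz new_prefix localhost:2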

Related

Asterisk EAGI audio while running AMD or another Asterisk app via "EXEC"

Is it possible to use "AMD" to detect silence in an EAGI script and receive the audio on fd 3 at the same time?
Is this scenario supported, or am I doing something wrong?
A simple demonstration bash script, which is run as EAGI(/home/agi/eagi.sh) from Asterisk:
#!/bin/bash
log=/tmp/eagi.log
# Read all variables sent by Asterisk to array
declare -a array
while read -e ARG && [ "$ARG" ] ; do
array=(` echo $ARG | sed -e 's/://'`)
export ${array[0]}=${array[1]}
echo $ARG | sed -e 's/://' >>$log
done
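# EAGI delivers the caller's incoming audio on file descriptor 3;
# capture it in the background with dd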
/usr/bin/dd if=/dev/fd/3 of=/tmp/eagi.tmp.out &>>$log &
### or just sleep 10 ###
sleep 1
echo "EXEC AMD"
read line # blocks until silence is detected by AMD
echo $line >>$log
sleep 1
### ###
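# SIGUSR1 makes GNU dd print its transfer statistics before it is killed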
kill -USR1 %1; sleep 0.1; kill %1
ls -lh /tmp/eagi.tmp.out >>$log
echo "EXEC HANGUP "
read line
echo $line >>$log
exit
What it does is start capturing the audio data from fd 3 via dd, started as a background process. When I have just sleep 10 instead of the echo "EXEC AMD", dd has recorded the full audio file after the 10 seconds.
However, with "AMD", dd stops receiving data on fd 3 as soon as "AMD" is executed (confirmed also via strace) and only continues after "AMD" finishes. So while "AMD" is running, no audio is recorded.
Output in the logfile looks like this:
Working (with just sleep):
1522+501 records in
1897+0 records out
971264 bytes (971 kB, 948 KiB) copied, 10.0023 s, 97.1 kB/s
-rw-r--r-- 1 asterisk asterisk 958K Sep 24 10:16 /tmp/eagi.tmp.out
Non-working (with "AMD", which detected silence after 6 seconds; dd was running the whole time, but only the 1 second before and the 1 second after "AMD" ended up recorded in the file):
322+101 records in
397+0 records out
203264 bytes (203 kB, 198 KiB) copied, 8.06516 s, 25.2 kB/s
-rw-r--r-- 1 asterisk asterisk 208K Sep 24 10:13 /tmp/eagi.tmp.out
So is this some kind of bug in Asterisk, or just unsupported usage? I didn't find much info about EAGI in the Asterisk documentation, so I'm not sure what is supported and what isn't. The Asterisk version is 16.2.1 on Debian 10, the test call was made via a webphone in the Chrome browser, and the audio passed on fd 3 was 48 kHz, 16-bit, mono. (Maybe with some other audio format/codec, both fd 3 and "AMD" would work at the same time?)
EDIT2: Removed info about my complicated setup and added simple reproducible example.
EDIT3: During further debugging I used "EXEC Background" to play a short audio file to the caller, and during this, too, no audio was recorded. So the issue seems to affect not only "EXEC AMD" but also "EXEC Background", and probably other Asterisk applications invoked via "EXEC" as well.

Issues using rsync to migrate files to new server

I am trying to copy a directory full of directories and small files to a new server for an app migration. rsync is always my go-to tool for this type of migration, but this time it is not working as expected.
The directory has 174,412 files and is 136G in size. Based on this I created a 256G disk for them on the new server.
The issue is that when I rsync'd the files over to the new server, the new partition ran out of space before all the files were copied.
I did some tests with a bigger destination disk on my test machine, and when the copy finishes, the total size on the new disk is 272G:
time sudo rsync -avh /mnt/dotcms/* /data2/
sent 291.61G bytes received 2.85M bytes 51.75M bytes/sec
total size is 291.52G speedup is 1.00
df -h /data2
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/data2vg-data2lv 425G 272G 154G 64% /data2
The source is on a NAS and the new target is an XFS file system, so at first I thought it might be a block-size issue. But then I used the cp command and it copied over at exactly the source size.
time sudo cp -av /mnt/dotcms/* /data
df -h /data2
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/data2vg-data2lv 425G 136G 290G 32% /data2
Why is rsync increasing the space used?
According to the documentation, dotCMS makes use of hard links, so you need to give rsync the -H option to preserve them. Note that GNU cp -av preserves hard links, so it doesn't have this problem.
Other rsync options you should consider using include:
-H, --hard-links : preserve hard links
-A, --acls : preserve ACLs (implies --perms)
-X, --xattrs : preserve extended attributes
-S, --sparse : turn sequences of nulls into sparse blocks
--delete : delete extraneous files from destination dirs
This assumes you are running as root and that the destination is supposed to have the same users/groups as the source. If the users and groups are not the same, then Cyrus's alternative command line using --numeric-ids may be more appropriate.
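To make the difference concrete, here is a minimal sketch (paths and sizes made up for illustration):
$ mkdir src && dd if=/dev/zero of=src/a bs=1M count=100
$ ln src/a src/b
## without -H, the two hard links arrive as two independent 100M files (~200M used);
## with -H, the link is preserved and the data is stored only once (~100M used)
$ rsync -a src/ no_hardlinks/
$ rsync -aH src/ hardlinks/
$ du -sh no_hardlinks hardlinks
## applied to the migration above (a trailing slash instead of /* also picks up
## any top-level dotfiles):
$ sudo rsync -aHAXS /mnt/dotcms/ /data2/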

Rsync overwrite files without write permission

I'm trying to sync directories within the same machine, basically copying files from one directory to another.
Under certain circumstances, the write permission of the destination files is removed to protect them. However, rsync seems to ignore the lack of write permission and overwrites all the files in the destination anyway. Any idea why?
Commands used (all have the same problem):
$ rsync -azv --delete source/ destination/
$ rsync -azv source/ destination/
version:
rsync version 2.6.9 protocol version 29
destination file permission: -r--r--r--
source file permission: -rwxrwxrwx
destination file owner: same owner (not root, though)
output:
building file list ... done
sent 101 bytes received 26 bytes 254.00 bytes/sec
total size is 1412 speedup is 11.12
resulting destination file: -rwxrwxrwx
OS:
both macOS (latest) and Red Hat Linux
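For what it's worth, the behaviour is easy to reproduce locally; a minimal sketch (file names are made up):
$ mkdir -p source destination
$ echo new > source/file && chmod 777 source/file
$ echo old > destination/file && chmod 444 destination/file
$ rsync -av source/ destination/
$ ls -l destination/file
## the file is replaced and ends up -rwxrwxrwx: by default rsync writes the new
## data to a temporary file and renames it over the destination, so only write
## permission on the (writable) directory matters, not on the file itself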

Source file size increase during rsync

I back up a directory with rsync. Before I started the rsync I looked at the directory size with du -s, which reported a directory size of ~1 TB.
Then I started the rsync, and during the sync I looked at the size of the backup directory to get an estimated end time. When the backup grew much larger than 1 TB I got curious. It seems that the size of many files in the source directory increases. I ran du -s on a file in the source before and after the rsync process copied that file:
## du on source file **before** it was rsynced
# du -s file.dat
2 file.dat
## du on source file **after** it was rsynced
# du -s file.dat
4096 file.dat
The rsync command:
rsync -av -s --relative --stats --human-readable --delete --log-file someDir/rsync.log sourceDir destinationDir/
The file system on both sides (source and destination) is BeeGFS 6.16 on RHEL 7.4, kernel 3.10.0-693.
Any ideas what is happening here?
file.dat may be a sparse file. Use the --sparse option:
-S, --sparse
Try to handle sparse files efficiently so they take up less
space on the destination. Conflicts with --inplace because it’s
not possible to overwrite data in a sparse fashion.
Wikipedia about sparse files:
a sparse file is a type of computer file that attempts to use file system space more efficiently when the file itself is partially empty. This is achieved by writing brief information (metadata) representing the empty blocks to disk instead of the actual "empty" space which makes up the block, using less disk space.
A sparse file can be created as follows:
$ dd if=/dev/zero of=file.dat bs=1 count=0 seek=1M
Now let's examine and copy it:
$ ls -l file.dat
.... 1048576 Nov 1 20:59 file.dat
$ rsync file.dat file.dat.rs1
$ rsync --sparse file.dat file.dat.rs2
$ du -sh file.dat*
0 file.dat
1.0M file.dat.rs1
0 file.dat.rs2
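If sparse files are indeed the cause, the backup command from the question only needs -S added (a sketch, with the paths exactly as in the question):
$ rsync -avS -s --relative --stats --human-readable --delete --log-file someDir/rsync.log sourceDir destinationDir/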

Wget - how to stop downloading the remaining files when there is no connection

When using wget with the recursive option (-r), if page.html contains 100 different download links to 100 different files and the connection between the PC and the device is interrupted (after the download of 20 files), the program keeps trying to download the other 80 files, wasting a lot of time.
I tried these options, but nothing helped:
-T 1
-t 1
--dns-timeout=1
--connect-timeout=1
--read-timeout=1
Is there a way to stop wget after N minutes of no connection?
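For reference, the options above combined into a single recursive invocation look roughly like this (the URL is a placeholder):
$ wget -r -T 1 -t 1 --dns-timeout=1 --connect-timeout=1 --read-timeout=1 http://device.example/page.html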
