Could using --partial (-P) in rsync result in corrupted files? - rsync

I'm using a script which runs once an hour to rsync some files.
Sometimes the script is interrupted before it completes, e.g. because the client or server shut down.
For most files this is not an issue, as next time the script runs it will copy those.
However, some large files take a long time over the LAN and may be interrupted before they complete. rsync then has to start those files from scratch the next time, and if it is interrupted again, and again after that, the files will never finish copying.
I'm therefore thinking of adding the --partial flag as described here:
Resuming rsync partial (-P/--partial) on a interrupted transfer
https://unix.stackexchange.com/questions/2445/resume-transfer-of-a-single-file-by-rsync
I have tested with "--partial" and it does work, i.e. the operation continues from the last transferred file fragment.
My concern is whether this increases the risk of corrupted files.
https://lists.samba.org/archive/rsync/2002-August/003526.html
https://lists.samba.org/archive/rsync/2002-August/003525.html
Or to put it another way: even if "--partial" does create some corruption, will rsync find and "correct" those corrupted blocks the next time it runs?
If so, I'm OK to use "--partial", since any corruption will simply be corrected on the next run.
Thanks.
PS: I don't want to use "-c" (--checksum) as it creates too much hard disk activity.

From the rsync documentation on --checksum:
Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking a whole-file checksum that is generated as the file is transferred, but that automatic after-the-transfer verification has nothing to do with this option’s before-the-transfer "Does this file need to be updated?" check.
So, yes, rsync will correct any partially transferred files, no matter how corrupted they are (in the worst case, all the data will be transferred again).

Related

Is there any portable or at least vendor-specific way to detect if a Unix socket is orphaned?

When a process creates a Unix domain socket and exits abnormally, it leaves the socket file behind. On the next run, the program may have trouble because the file already exists.
Is there any way to detect if a socket file is orphaned? The best way should be POSIX and available on any UNIX brand, but something Linux/FreeBSD/Solaris/whatever-specific is of use as well.
I'm not asking how to
make /tmp get cleared on reboot (sometimes apps crash without a reboot);
use a GUI or command-line tool to check it manually;
remove a list of files before running the program, or put an unlink() before bind().
Well, looks like I was just one step from the answer.
There is nothing like SO_REUSEADDR for Unix domain sockets, and I do believe that's for a good reason.
There is a way to guard the socket file with a lock file, which is a (relatively) clean and sane approach.
Using /tmp/socket.lock to guard /tmp/socket, we have to:
open it with O_RDONLY | O_CREAT
flock it with LOCK_EX | LOCK_NB
and never do anything else with the guard. If flock succeeds on the next run, then no process holds the lock file and hence no process is using the socket, so we are OK to remove the stale socket file.
Of course, we assume that every program using the socket uses the protocol as well.
Details are at Victor Gadov's GitHub, copied here due to the fragile nature of links on the Internet.
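In the meantime, here is a minimal C sketch of the guard protocol described above, assuming the /tmp/socket and /tmp/socket.lock paths from this answer (error handling trimmed to keep it short):

#include <sys/file.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* 1. Open (creating if necessary) the guard file and never touch it again. */
    int guard = open("/tmp/socket.lock", O_RDONLY | O_CREAT, 0600);
    if (guard < 0) { perror("open lock"); return 1; }

    /* 2. Take an exclusive, non-blocking lock. If this fails, another
       process is alive and owns the socket: leave it alone. */
    if (flock(guard, LOCK_EX | LOCK_NB) < 0) {
        fprintf(stderr, "socket is in use by another process\n");
        return 1;
    }

    /* 3. The lock succeeded, so any existing socket file is orphaned
       and can safely be removed before we bind our own. */
    unlink("/tmp/socket");

    int sock = socket(AF_UNIX, SOCK_STREAM, 0);
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, "/tmp/socket", sizeof addr.sun_path - 1);
    if (bind(sock, (struct sockaddr *)&addr, sizeof addr) < 0) {
        perror("bind");
        return 1;
    }

    /* ... listen()/accept() as usual. The flock is released automatically
       when the process exits, even if it crashes. */
    return 0;
}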

Transfer function that saves progress?

Does a function exist similar to scp where if the connection is lost, then the progress is saved, and resuming the process picks up where it left off? I am trying to scp a large file, and my VPN connection keeps cutting out.
Use rsync --partial. It will keep partially transferred files, which you can then resume with the same invocation. From the rsync man page:
--partial
By default, rsync will delete any partially transferred file if the transfer is interrupted. In some circumstances it is more desirable to keep partially transferred files. Using the --partial option tells rsync to keep the partial file which should make a subsequent transfer of the rest of the file much faster.
Try something like rsync -aivz --partial user@host:/path/to/file ~/destination/folder/
Explanation of the other switches:
a — "archive mode": make transfer recursive; preserve symlinks, permissions, timestamps, group, owner; and (where possible) preserve special and device files
i — "itemize changes": shows you exactly what is getting changed (it will be a string of + signs, like +++++++, if you're copying a file anew)
v — "verbose": list files as they're transferred
z — "compress": compress file data during transfer
Those are just the ones I usually use to transfer files. You can see a list of all options by looking at the rsync man page.

MPI one-sided file I/O

I have some questions on performing File I/Os using MPI.
A set of files is distributed across different processes.
I want the processes to be able to read the files owned by the other processes.
For example, in one-sided communication, each process exposes a window visible to the other processes. I need exactly the same functionality. (Create 'windows' for all files and share them so that any process can read any file from any offset.)
Is that possible in MPI? I have read a lot of MPI documentation, but couldn't find exactly this.
The simple answer is that you can't do that automatically with MPI.
You can convince yourself by noting that MPI_File_open() is a collective call taking an intra-communicator as its first argument and returning a file handle to the opened file as its last argument. All processes in this communicator open the file, and therefore all of them must be able to see it. So unless a process sees a file, it cannot get an MPI_File handle to access it.
Now, that doesn't mean there's no solution. A possibility could be to do by hand exactly what you described (a C sketch follows the list), namely:
Each MPI process individually opens the file it sees and is responsible for; then
Each of these processes reads its local file into a buffer;
These individual buffers are all exposed, using either one global MPI_Win memory window or several individual ones, ready for one-sided read access; and finally
All read accesses to data that was previously stored in these individual local files are now done through MPI_Get() calls using the memory window(s).
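A rough C sketch of these four steps, under made-up assumptions (a 1 MiB buffer per rank and file names of the form local_file_<rank>.dat); it is not a complete program for any particular use case:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Steps 1-2: each rank reads the file it owns into a local buffer
       (fixed buffer size and file naming scheme are assumptions). */
    const MPI_Aint bufsize = 1 << 20;
    char *buf = calloc(bufsize, 1);
    char fname[64];
    snprintf(fname, sizeof fname, "local_file_%d.dat", rank);
    FILE *fp = fopen(fname, "rb");
    if (fp) { size_t n = fread(buf, 1, bufsize, fp); (void)n; fclose(fp); }

    /* Step 3: expose the buffer in an RMA window visible to all ranks. */
    MPI_Win win;
    MPI_Win_create(buf, bufsize, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Step 4: any rank can now read any other rank's file content.
       Here rank 0 fetches 128 bytes from offset 0 of rank 1's file. */
    if (rank == 0 && nprocs > 1) {
        char remote[128];
        MPI_Win_lock(MPI_LOCK_SHARED, 1, 0, win);
        MPI_Get(remote, sizeof remote, MPI_CHAR,
                1 /* target rank */, 0 /* displacement */,
                sizeof remote, MPI_CHAR, win);
        MPI_Win_unlock(1, win);
    }

    MPI_Win_free(&win);
    free(buf);
    MPI_Finalize();
    return 0;
}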
The true limitation of this approach is that it requires fully reading all of the individual files, so you need sufficient memory per node to store each of them. I'm well aware that this is a very big caveat that could make the solution completely impractical; however, if the memory is sufficient, it is an easy approach.
Another even simpler solution would be to store the files on a shared file system, or to copy them all to every local file system. I imagine this isn't an option, since the question wouldn't have been asked otherwise...
Finally, as a last resort, a possibility I see would be to dedicate an MPI process (or an OpenMP thread of an MPI process) per node to serve its files. This process would just act as a "file server", answering "read" requests coming from the other MPI processes, reading the requested data from the file and sending it back via MPI. It's a bit lengthy to write, but it should work; a sketch of the request/serve loop follows.
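A rough C sketch of that last idea, with rank 0 acting as the file server; the file name, tags and message layout are purely assumptions of this sketch:

#define _XOPEN_SOURCE 700
#include <mpi.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

enum { TAG_REQ = 1, TAG_DATA = 2, TAG_STOP = 3 };

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (rank == 0) {                      /* the "file server" rank */
        int fd = open("served_file.dat", O_RDONLY);   /* assumed file */
        int stopped = 0;
        while (stopped < nprocs - 1) {
            long req[2];                  /* req[0] = offset, req[1] = length */
            MPI_Status st;
            MPI_Recv(req, 2, MPI_LONG, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) { stopped++; continue; }

            /* Read the requested chunk from the local file and send it back. */
            char *chunk = malloc(req[1]);
            ssize_t n = pread(fd, chunk, req[1], req[0]);
            MPI_Send(chunk, n > 0 ? (int)n : 0, MPI_CHAR,
                     st.MPI_SOURCE, TAG_DATA, MPI_COMM_WORLD);
            free(chunk);
        }
        if (fd >= 0) close(fd);
    } else {                              /* a client rank */
        long req[2] = { 0, 4096 };        /* ask for 4 KiB from offset 0 */
        char buf[4096];
        MPI_Send(req, 2, MPI_LONG, 0, TAG_REQ, MPI_COMM_WORLD);
        MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, TAG_DATA,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* Tell the server this client is done. */
        MPI_Send(req, 2, MPI_LONG, 0, TAG_STOP, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}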

Does deleting a file on a webserver cancel its transmission?

I have a huge file on a server, e.g. a movie. Someone starts to download that file. The download is not immediate, because the network has a certain maximum transmission rate. While the server is in the process of sending the file, I enter the command to delete it.
What is the expected behavior?
Is the transmission cancelled?
Is the transmission completed first?
And if it is completed first, what if another request to download that file comes in before the delete command is carried out? Is that request queued behind the delete command, or is it carried out in parallel with other commands, so that it begins before the delete takes effect and effectively keeps blocking it?
On my desktop computer I cannot delete a file that is in use. Do web servers differ?
If the platform is Windows you can't delete the file.
If the platform is Unix- or Linux-based, you can delete the file; however, it remains in existence while it is open, which includes while it is being transmitted.
I'm not aware of any operating system where you are notified that a file you have open has been deleted, which is the only mechanism that could possibly cause transmission to be cancelled.
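A small C demonstration of the Unix behaviour described above (the file name is just an example): the data of an open file stays readable after unlink(); only the directory entry disappears.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Create a file and keep a descriptor open on it. */
    int fd = open("demo.txt", O_RDWR | O_CREAT | O_TRUNC, 0600);
    ssize_t w = write(fd, "still here\n", 11);
    (void)w;

    /* "Delete" it: the name is gone, but the inode lives on while fd is open. */
    unlink("demo.txt");

    /* The open descriptor can still read everything that was written. */
    char buf[32];
    lseek(fd, 0, SEEK_SET);
    ssize_t n = read(fd, buf, sizeof buf);
    printf("read %zd bytes after unlink\n", n);   /* prints 11 */

    close(fd);   /* only now is the disk space actually released */
    return 0;
}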

NFS sync vs async

I'm using NFS to allow two servers to communicate via simple text files. However, sometimes the server reading the text files seems to read incomplete files and then crashes because of it. When I go to look at the "incomplete" file that made it crash, the file is complete. Is it possible that the reading server is seeing these files before they are completely written over NFS? I use Linux's mv to move them from the local machine to the NFS mount only when they are completely written, so there "should" never be an incomplete state on the NFS.
Could this problem have something to do with sync vs async? Right now I'm using async. From my understanding, async just means that the write call returns and your program can continue running while the write happens at a later time, whereas sync means that your process waits for the write to go through before it moves on. Would changing to sync fix this? Or is there a better way to handle this? I know the two servers could communicate via a database, but I'm actually doing this to keep database usage down. Thanks!
mv across file systems translates into cp+rm and is certainly not atomic, even without NFS involved. You should first copy the file to a temporary name on the target file system and then rename it to the correct name. For instance, instead of:
$ mv myfile.txt /mnt/targetfs/myfile.txt
do:
$ mv myfile.txt /mnt/targetfs/.myfile.txt.tmp
$ mv /mnt/targetfs/.myfile.txt.tmp /mnt/targetfs/myfile.txt
(This assumes that the process reading the file ignores it while it does not have the correct name.)
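The same pattern sketched in C, using the example paths above; rename() within one file system is atomic, so the final name only ever refers to a complete file (error handling omitted):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int src = open("myfile.txt", O_RDONLY);
    int dst = open("/mnt/targetfs/.myfile.txt.tmp",
                   O_WRONLY | O_CREAT | O_TRUNC, 0644);

    /* Copy the data into the temporary file on the target file system. */
    char buf[65536];
    ssize_t n;
    while ((n = read(src, buf, sizeof buf)) > 0) {
        ssize_t w = write(dst, buf, n);
        (void)w;
    }

    /* Make sure the bytes have reached the server before the rename. */
    fsync(dst);
    close(dst);
    close(src);

    /* Atomic within the target file system: readers see either no file
       or the complete file, never a half-written one. */
    rename("/mnt/targetfs/.myfile.txt.tmp", "/mnt/targetfs/myfile.txt");
    return 0;
}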
