Consider the following algorithm for copying a file:
1. Open the source file for exclusive access. (Assume that this blocks until no other process has the file open. Once this call returns, other processes that attempt to open the file block until the file has been closed again by the current process.)
2. Open the destination file for exclusive access.
3. Copy data from source to destination.
4. Close the destination file.
5. Close the source file.
What can possibly go wrong?
A competing thread or process could be using a different algorithm and open the destination file first, then sit there waiting for the source file to become available. Both processes would be waiting for each other: a classic deadlock.
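A minimal sketch of one way to avoid this, assuming a Unix-like system and cooperating processes that all take advisory flock() locks (the function name and the lock-ordering convention below are illustrative, not part of the original algorithm): always acquire the two locks in a single global order, for example sorted by path, so two copies running in opposite directions can never each hold the lock the other is waiting for.

```python
import fcntl
import shutil

def copy_with_exclusive_access(source_path, destination_path):
    # Open both files first; open() alone takes no lock here.
    src = open(source_path, "rb")
    dst = open(destination_path, "a+b")  # create if missing, don't truncate yet
    try:
        # Always lock in one global order (here: sorted by path) so that two
        # processes copying in opposite directions cannot deadlock each other.
        for _path, f in sorted([(source_path, src), (destination_path, dst)],
                               key=lambda pair: pair[0]):
            fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # blocks until available

        dst.seek(0)
        dst.truncate()
        shutil.copyfileobj(src, dst)
    finally:
        dst.close()  # closing the files releases the advisory locks
        src.close()
```

Note that flock() locks are advisory, so this only prevents the deadlock among processes that follow the same convention.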
BizTalk SFTP receive port not picking up files larger than 1 GB (in my case I can receive up to 5 GB files). Even when it does pick up the file it is very slow, and before the whole file has been dropped into the folder the orchestration starts unzipping the zip file and throws an error: cannot unzip as the file is being used by another process. Any help?
What you are seeing is not a problem with BizTalk Server or the SFTP Adapter.
This is happening because the SFTP Server is allowing the download to start before the file is completely written. This can be because the SFTP Server is not honoring a write lock or the actual source app is doing multiple open-write-close cycles while emitting the data.
So, this is actually not your problem and not a problem you can solve as the client.
Either the SFTP Server needs to block the download, or a temporary location/filename must be used until the file is complete.
This is not an uncommon problem, but must be fixed on the server side.
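For the server-side fix mentioned above, the usual pattern is to write the data under a temporary name and rename it into place only once it is complete. A minimal sketch, assuming a POSIX filesystem where rename() within the same filesystem is atomic; the function name and the ".tmp" suffix are just illustrative conventions:

```python
import os

def publish_file(data: bytes, final_path: str) -> None:
    tmp_path = final_path + ".tmp"       # consumers only watch for the final name
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())             # make sure the bytes are actually on disk
    os.rename(tmp_path, final_path)      # atomic swap into the visible name
```

A downloader (or a BizTalk receive location) that only looks for the final name never sees a half-written file.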
I have a huge file on a server, e.g. a movie. Someone starts to down load that file. The download is not immediate, because the network has a certain maximum transmission rate. While the server is in the process of sending the file, I enter the command to delete the file.
What is the expected behavior?
Is the transmission cancelled?
Is the transmission completed first?
And if it is completed first, what if another request to download that file comes in before the delete command is carried out? Is that request queued behind the delete command, or is it carried out in parallel with other commands so that it begins before the delete takes effect, effectively continuing to block it?
On my desktop computer I cannot delete a file that is in use. Do web servers differ?
If the platform is Windows, you can't delete the file while it is open.
If the platform is Unix- or Linux-based, you can delete the file; however, it remains in existence as long as it is open, which includes while it is being transmitted.
I'm not aware of any operating system that notifies a process when a file it has open is deleted, which is the only mechanism that could possibly cause the transmission to be cancelled.
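This is easy to see from user space on Linux. A small sketch (the file name is arbitrary) showing that an unlinked file stays readable through an already-open descriptor:

```python
import os

path = "demo.bin"
with open(path, "wb") as f:
    f.write(b"movie data" * 1000)

fd = os.open(path, os.O_RDONLY)
os.unlink(path)                      # the directory entry is gone immediately...
print(os.path.exists(path))          # False
print(os.read(fd, 10))               # ...but the open descriptor still reads data
os.close(fd)                         # only now can the disk space be reclaimed
```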
I'm using a script which runs once an hour to rsync some files.
Sometimes the script is interrupted before it completes, e.g. because the client or server shut down.
For most files this is not an issue, as next time the script runs it will copy those.
However, some large files take a long time over the LAN, and the script may be interrupted before they complete. This means that rsync has to start those files from scratch the next time, and if it is interrupted again on that run (and so on), those files will never finish copying.
I'm therefore thinking of adding the --partial flag as described here:
Resuming rsync partial (-P/--partial) on an interrupted transfer
https://unix.stackexchange.com/questions/2445/resume-transfer-of-a-single-file-by-rsync
I have tested with "--partial" and it does work, i.e. the operation continues from the last transferred file fragment.
My concern is whether this increases the risk of corrupted files?
https://lists.samba.org/archive/rsync/2002-August/003526.html
https://lists.samba.org/archive/rsync/2002-August/003525.html
Or to put it another way, even if "--partial" does leave some corruption behind, will the next rsync run find and "correct" those corrupted blocks?
If that's the case then I'm OK to use "--partial", since any corruption will simply be corrected the next time the script runs.
Thanks.
PS: I don't want to use "-c" (--checksum) as it creates too much hard disk activity.
From the rsync documentation on --checksum:
Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking a whole-file checksum that is generated as the file is transferred, but that automatic after-the-transfer verification has nothing to do with this option’s before-the-transfer "Does this file need to be updated?" check.
So, yes, rsync will correct any partially transferred files, no matter how corrupted they are (in the worst case, all the data will be transferred again).
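For reference, this is roughly what the hourly job could look like with --partial added. This is only a sketch: the host name and paths are made up, and --timeout is just one way to make a stalled transfer fail cleanly so the partial file is kept for the next run.

```python
import subprocess

cmd = [
    "rsync",
    "--archive",          # preserve permissions, times, etc.
    "--partial",          # keep partially transferred files so they can be resumed
    "--timeout=300",      # give up on a stalled connection instead of hanging
    "user@server:/data/big-files/",   # hypothetical source
    "/backup/big-files/",             # hypothetical destination
]
# A non-zero exit code just means "try again next hour".
subprocess.run(cmd, check=False)
```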
I want to know what happens in the kernel when an open() system call is invoked. How does it return a file descriptor for a file?
The kernel internally creates a structure containing additional information about the file you just opened. This structure holds information such as the inode number, the name of the file on the file system, its size, its associated superblock, and so on.
In fact, within the kernel it is the VFS (Virtual File System) that handles I/O operations on a file, whether it is local (on your hard disk) or remote (located on an FTP server, for instance, as ftpfs does).
Every file system on GNU/Linux implements the same mechanisms for opening, reading, writing, and closing files. This means developers don't have to worry about what kind of file they are accessing: no matter what kind of file you are interacting with, the same open(), read(), ... APIs can be used. (IBM has a great article on the VFS if you want more detail.)
Finally, each file descriptor returned by, say, open() is local to your process, so the first file you open will typically be associated with file descriptor 3 (0, 1, and 2 being stdin, stdout, and stderr), and so on. On many GNU/Linux distributions you can find out which file descriptors are bound to a process via /proc/{pid_of_your_process}/fd.
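A tiny illustration of both points, assuming Linux and any readable file (the path below is arbitrary): the first descriptor a fresh process gets is normally 3, and /proc/<pid>/fd lists every descriptor the process currently holds.

```python
import os

fd = os.open("/etc/hostname", os.O_RDONLY)   # any readable file will do
print(fd)                                    # usually 3; 0, 1, 2 are stdin/stdout/stderr

pid = os.getpid()
for entry in sorted(os.listdir(f"/proc/{pid}/fd"), key=int):
    try:
        print(entry, "->", os.readlink(f"/proc/{pid}/fd/{entry}"))
    except FileNotFoundError:
        pass  # the descriptor used to list the directory itself is already closed
os.close(fd)
```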
If you really want to dive deep, you can browse the source for many Unix variants. For Linux, check out http://lxr.linux.no/#linux+v3.9/fs/open.c -- search for SYSCALL_DEFINE3(open, to get to the actual open() syscall.
The kernel:
looks up the file (hard drive, USB, named pipes, standard streams, ...)
if everything went well, records internally that you have the file open
returns you a descriptor
when you close() it or the process exits, releases its information about your open()
BizTalk 2010 CU4, Windows Server 2008, no antivirus
I'm having an issue where the BizTalk file adapter intermittently picks up the exact same file twice. This happens on receive locations pointing to either remote UNC paths or local folders, across 2 different receive locations in 2 different applications.
The receive locations have all default settings. I've tried the 'rename files' option both ticked and unticked with no resolution to the issue. The file masks are of the form \H3OR*.txt.
The pickup time difference between the duplicate 'unparsed interchanges' is never greater than 1 second; 2 ms is common. Looking at the unparsed interchanges of the duplicates, the ReceivedFileName context property is exactly the same. Roughly 1 in 8 received files is duplicated.
The receive location does have credentials to the unc path and it does delete files after it's done with them.
Restarting both the receive location and the BizTalk host has no effect.
Let me know if you need any more info.
thanks.
Sometimes the problem lies elsewhere. Are you sure the upstream process, which creates these files in the first place, isn't duplicating them, i.e. sending the same file twice in quick succession?
You could test this by creating another send port that subscribes to these files and writes them out to a folder, appending %MessageID% to %SourceFileName%.
If you end up with 2 files with the same %SourceFileName% but different %MessageID%, written 1 or more seconds apart, that proves the problem is upstream.