What are alternatives to saving a file with a really long filename? - unix

I have an unarchiver that takes an archive name and a directory name, and dumps all files from that archive into that directory. There are no other command-line options. However, the archive I need to extract contains a file whose name is roughly 500 characters long, and the program fails when it hits that file (most file systems limit filenames to 255 bytes). What alternatives do I have, short of changing the source code and recompiling the unarchiver?
Ideally, I would mount something as a directory that takes the files the unarchiver writes and dumps them elsewhere, possibly even as one big file. This something should not report write failures, even if a write really did fail. Is this possible?
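One possible workaround, sketched below under stated assumptions (the archive is a zip, the fuse-zip tool is installed, and all names are illustrative): skip the unarchiver entirely, mount the archive read-only as a filesystem, and copy the entries out yourself, truncating any name that exceeds the limit.

    mkdir /tmp/zipmnt outdir
    fuse-zip -r archive.zip /tmp/zipmnt      # mount the zip read-only
    find /tmp/zipmnt -type f | while IFS= read -r f; do
        name=$(basename "$f")
        cp "$f" "outdir/${name:0:255}"       # keep at most 255 characters
    done                                     # (counts characters, not bytes,
                                             # and truncated names may collide)
    fusermount -u /tmp/zipmnt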

Related

Replacing static files that are under heavy read load

Let us assume we have a static file server (Nginx + Linux) that serves 10 files. The files are read almost as fast as the server can process requests. However, some of the files need to be replaced with new versions while the filename and URL remain unchanged. How can the files be replaced safely, without fear that some reads will fail or serve a mix of two versions?
I understand this is a rather basic operating-system matter and has something to do with renames, symlinks, and file sizes. However, I have failed to find a clear reference or a good discussion, and I hope we can build one here.
Use rsync. Typically I choose rsync -av src dst, but YMMV.
What is terrific about rsync is that, in addition to having essentially zero cost when little or nothing has changed, it uses atomic rename. During the transfer, a ".fooNNNNN" temp file grows; once the transfer completes, rsync closes the temp file and renames it on top of "foo". Web clients therefore see either all of the old file or all of the new one. Note that range downloads (say, a restart after an error) are not atomic, which exposes such clients to corruption, especially if bytes were inserted near the beginning of the file: the SHA1 would not validate, and the client would have to restart the download from scratch. BTW, if these are large files, tell nginx to use zero-copy sendfile().
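For anyone who wants the same guarantee without rsync, a minimal sketch of the underlying trick (the paths are illustrative): stage the new version on the same filesystem, then rename it over the old name. Since mv within one filesystem is a single rename(2) call, a reader sees either the complete old file or the complete new one.

    cp new-version.html /srv/www/.foo.html.tmp    # stage on the SAME filesystem
    mv /srv/www/.foo.html.tmp /srv/www/foo.html   # atomic rename(2); readers already
                                                  # holding the old file keep its inode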

Best method for dealing with Unix compressed files (.Z) in IDL?

I'm working on some code in IDL that retrieves Unix-compressed (.Z) data files over FTP. I know IDL can work with .gz compressed files via the /compress keyword; however, it doesn't seem able to handle .Z compression.
What are my options for working with these files? The files come from another institution, so I have no control over the compression being used. Downloading and decompressing the files manually before running the code is an absolute last resort: I don't always know in advance which files I need from the FTP site, so the code grabs the ones it needs, based on its parameters, at run time.
I'm currently running on Windows 7 but once the code is finished it will be used on a Unix system as well (computer cluster).
You can use SPAWN, as you note in your comment (assuming you can find an equivalent of the Unix uncompress command that runs on Windows), or for higher speed you can use an external C function via CALL_EXTERNAL to do the decompression. By coincidence, I posted an answer on Stack Exchange the other day with just such a C function to decompress .Z files here.
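As a hedged note on the SPAWN route: GNU gzip can decompress .Z (LZW) files itself, so on systems where uncompress is missing, the command IDL would spawn might be as simple as this (the filename is illustrative):

    gzip -d data.Z        # replaces data.Z with the decompressed file "data"
    zcat data.Z > data    # or stream it, leaving the original in place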

Determine file compression type

I backed up a large number of files to S3 from a PC before switching to a Mac several months ago. Now I'm trying to open the files, and I've realized they were all compressed by the S3 GUI tool I used, so I cannot open them.
I can't remember what program I used to upload the files, and standard decompression commands from the command line are not working, e.g.,
unzip
bunzip2
tar -zxvf
How can I determine what the compression type is of the file? Alternatively, what other decompression techniques can I try?
PS - I know the files are not corrupted, because I tested downloading and opening them when I originally uploaded them to S3.
You can use Universal Extractor (open source) to determine compression types.
Here is a link: http://legroom.net/software/uniextract/
The small downside is that it looks at the extension first, but I managed to change the extension myself for an unknown file and it works almost always, e.g. .rar or .exe, etc.
EDIT:
I found a huge list of archive programs; maybe one of them will work? It's ridiculously big:
http://www.maximumcompression.com/data/summary_mf.php
http://www.maximumcompression.com/index.html
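For a purely command-line first step that works on the Mac itself, the file utility identifies archives by their magic bytes rather than by extension (the filename below is hypothetical):

    file mystery-backup
    # e.g. "mystery-backup: gzip compressed data, from Unix"
    head -c 8 mystery-backup | xxd    # or inspect the magic bytes yourself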

How do I download efficiently with rsync?

A couple of questions related to one theme: downloading efficiently with Rsync.
Currently, I move files from an 'upload' folder onto a local server using rsync. Files to be moved are often dumped there, and I run rsync regularly so the files don't build up. I use '--remove-source-files' to remove files that have been transferred.
1) The '--delete' options that remove destination files let you choose when the files are removed. Something similar would be handy for '--remove-source-files', since it seems that, by default, rsync only removes the source files after all files have been transferred, rather than after each file. Other than writing a script to make rsync transfer the files one by one (see the sketch after this list), is there a better way to do this?
2) On the same problem: if a large (single) file is transferred, it can only be deleted after the whole thing has been successfully moved. It strikes me that I might be able to use 'split' to break the file into smaller chunks, so that each chunk can be deleted as the file downloads; is there a better way to do this?
Thanks.
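For reference, a minimal sketch of the one-file-at-a-time script mentioned in (1); the paths and remote spec are placeholders. Each file gets its own rsync invocation, so it is removed as soon as its own transfer succeeds:

    find /srv/upload -type f -print0 |
    while IFS= read -r -d '' f; do
        rsync -a --remove-source-files "$f" backup@server:/data/incoming/
    done
    # NOTE: this flattens the source tree at the destination; add
    # --relative (-R) if the directory structure should be preserved.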

How do the UNIX commands mv and rm work with open files?

If I am reading a file stored on an NTFS filesystem, and I try to move/rename that file while it is still being read, I am prevented from doing so. If I try this on a UNIX filesystem such as EXT3, it succeeds, and the process doing the reading is unaffected. I can even rm the file and reading processes are unaffected. How does this work? Could somebody explain to me why this behaviour is supported under UNIX filesystems but not NTFS? I have a vague feeling it has to do with hard links and inodes, but I would appreciate a good explanation.
Unix filesystems use reference counting and a two-layer architecture for finding files.
The filename refers to something called an inode, for information node or index node. The inode stores (a pointer to) the file contents as well as some metadata, such as the file's type (ordinary, directory, device, etc.) and who owns it.
Multiple filenames can refer to the same inode; they are then called hard links. In addition, a file descriptor (fd) refers to an inode. An fd is the type of object a process gets when it opens a file.
A file in a Unix filesystem only disappears when the last reference to it is gone, that is, when there are no more names (hard links) and no more fds referencing it. So rm does not actually remove a file; it removes a reference to a file.
This filesystem design may seem confusing and it sometimes poses problems (especially with NFS), but it has the benefit that locking is unnecessary for many applications. Many Unix programs also use it to their advantage by opening a temporary file and deleting it immediately afterwards: as soon as they terminate, even if they crash, the temporary file is gone.
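A minimal shell sketch of that open-then-delete trick (the /proc line is Linux-specific): the data remains reachable through the descriptor even though the name is gone.

    tmp=$(mktemp)
    exec 3<>"$tmp"            # open fd 3 read/write on the temp file
    rm "$tmp"                 # unlink the name; the inode survives via fd 3
    echo "still alive" >&3    # writes through the fd still succeed
    cat /proc/$$/fd/3         # Linux-only: reopens the inode, prints "still alive"
    exec 3>&-                 # closing the last reference frees the space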
On Unix, a filename is simply a link to the actual file (inode). Opening a file also creates a (temporary) link to the actual file. When all links to a file have disappeared (via rm and close()), the file is removed.
On NTFS, the filename logically is the file: there is no indirection layer from the filename to the file's metainfo; they are the same object. If you open it, it is in use and can't be removed, just as the actual file (inode) on Unix can't be removed while it is in use.
Unix: Filename ➜ FileInfo ➜ File Data
NTFS: FileName + FileInfo ➜ File Data
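To see the Unix side concretely, a small demonstration (filenames are arbitrary): a process that already holds a descriptor is unaffected when the name is renamed or even removed.

    seq 1 1000000 > big.txt
    exec 4< big.txt          # take a descriptor on the file first
    mv big.txt renamed.txt   # rename changes only the name, not the inode
    rm renamed.txt           # removing the last name doesn't stop the reader
    wc -l <&4                # still prints 1000000; the inode is freed at close
    exec 4<&-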
