How to durably rename a file in POSIX?

What's the correct way to durably rename a file in a POSIX file system? Specifically wondering about fsyncs on the directories. (If this depends on the OS/FS, I'm asking about Linux and ext3/ext4).
Note: there are other questions on StackOverflow about durable renames, but AFAICT they don't address fsync-ing the directories (which is what matters to me - I'm not even modifying file data).
I currently have (in Python):
import os

dstdirfd = os.open(dstdirpath, os.O_DIRECTORY | os.O_RDONLY)
os.rename(srcdirpath + '/' + filename, dstdirpath + '/' + filename)
os.fsync(dstdirfd)
os.close(dstdirfd)
Specific questions:
Does this also implicitly fsync the source directory? Or might I end up with the file showing up in both directories after a power cycle (meaning I'd have to check the hard-link count and manually perform recovery), i.e. is it impossible to guarantee a durably atomic move operation?
If I fsync the source directory instead of the destination directory, will that also implicitly fsync the destination directory?
Are there any useful related testing/debugging/learning tools (fault injectors, introspection tools, mock filesystems, etc.)?
Thanks in advance.

Unfortunately Dave’s answer is wrong.
Not all POSIX systems even have durable storage. And if they do, it is still “allowed” to be hosed after a system crash. For those systems a no-op fsync() makes sense, and such an fsync() is explicitly allowed under POSIX. It is also legal for the file to be recoverable in the old directory, the new directory, both, or any other location. POSIX makes no guarantees for system crashes or file system recoveries.
The real question should be:
How to do a durable rename on systems which support that through the POSIX API?
You need to fsync() both the source and the destination directory, because the minimum those fsync()s are supposed to do is persist what the source or destination directory should look like.
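In Python, following this advice might look like the sketch below (the durable_rename helper and its cleanup are my own additions; the substance, an fsync() on both directory descriptors, is what this answer prescribes):

import os

def durable_rename(srcdirpath, dstdirpath, filename):
    # Open both directories so each can be fsync()ed after the rename.
    srcdirfd = os.open(srcdirpath, os.O_DIRECTORY | os.O_RDONLY)
    dstdirfd = os.open(dstdirpath, os.O_DIRECTORY | os.O_RDONLY)
    try:
        os.rename(os.path.join(srcdirpath, filename),
                  os.path.join(dstdirpath, filename))
        # Persist the new directory entry first, then the removal of the old one.
        os.fsync(dstdirfd)
        os.fsync(srcdirfd)
    finally:
        os.close(srcdirfd)
        os.close(dstdirfd)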
Does a fsync(destdirfd) also implicitly fsync the source directory?
POSIX in general: no, nothing implies that
ext3/4: I’m not sure whether the changes to the source and destination directories end up in the same transaction in the journal. If they do, they both get committed together.
Or might I end up with the file showing up in both directories after a power cycle (“crash”), i.e. it's impossible to guarantee a durably atomic move operation?
POSIX in general: no guarantees, but you’re supposed to fsync() both directories, which might not be atomic-durable
ext3/4: how many fsync()s you minimally need depends on the mount options. E.g. if mounted with “dirsync” you don’t need either of those two fsync()s. At most you need both fsync()s, but I’m almost sure one is enough (atomic-durable then).
If I fsync the source directory instead of the destination directory, will that also implicitly fsync the destination directory?
POSIX: no
ext3/4: I really believe both end up in the same transaction, so it doesn’t matter which of them you fsync()
older ext3 kernels: (if they aren’t in the same transaction) some not-so-optimal implementations did way too much syncing on fsync(); I bet they committed every transaction that came before. And yes, a normal implementation would first link the file into the destination and then remove it from the source, so fsync(srcdirfd) would trigger the fsync() of the destination as well.
ext4/latest ext3: if they aren’t in the same transaction, you might be able to completely sync them independently (so do both)
Are there any useful related testing/debugging/learning tools (fault injectors, introspection tools, mock filesystems, etc.)?
For a real crash, no. By the way, a real crash goes beyond the viewpoint of the kernel. The hardware might reorder writes (and fail to write everything), corrupting the filesystem. Ext4 is better prepared against this, because it enables write barriers (a mount option) by default (ext3 does not) and can detect corruption with journal checksums (also a mount option).
And for learning: find out if both changes are somehow linked in the journal! :-P

POSIX defines that the rename function must be atomic.
So if you rename(A, B), under no circumstances should you ever see a state with the file in both directories or neither directory. There will always be exactly one, no matter what you do with fsync() or whether the system crashes.
But that doesn't solve the problem of making sure the rename() operation is durable. POSIX answers this question:
If _POSIX_SYNCHRONIZED_IO is defined, the fsync() function shall force all currently queued I/O operations associated with the file indicated by file descriptor fildes to the synchronized I/O completion state. All I/O operations shall be completed as defined for synchronized I/O file integrity completion.
So if you fsync() a directory, pending rename operations must be transferred to disk by the time this returns. fsync() of either directory should be sufficient because atomicity of the rename() operation would require that both directories' changes be synced atomically.
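Applied to the common single-directory case, this reasoning yields the well-known atomic-update pattern; here is a sketch (the helper name and the ".tmp" suffix are illustrative, not prescribed by POSIX):

import os

def atomic_durable_replace(dirpath, filename, data):
    # Write to a temporary file, flush it to disk, then rename it over
    # the target. Readers see either the old or the new file, never a mix.
    tmppath = os.path.join(dirpath, filename + ".tmp")
    with open(tmppath, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())    # persist the file contents
    os.rename(tmppath, os.path.join(dirpath, filename))
    dirfd = os.open(dirpath, os.O_DIRECTORY | os.O_RDONLY)
    try:
        os.fsync(dirfd)         # persist the rename itself
    finally:
        os.close(dirfd)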
Finally, in contrast to the claim in the blog post mentioned in another answer, the rationale for this explains the following:
The fsync() function is intended to force a physical write of data from the buffer cache, and to assure that after a system crash or other failure that all data up to the time of the fsync() call is recorded on the disk. Since the concepts of "buffer cache", "system crash", "physical write", and "non-volatile storage" are not defined here, the wording has to be more abstract.
A system that claimed to be POSIX compliant and that considered it correct behavior (i.e. not a bug or hardware failure) to complete an fsync() and not persist those changes across a system crash would have to be deliberately misrepresenting itself with respect to the spec.
(updated with additional info re: Linux-specific vs. portable behavior)

The answer to your question is going to depend a lot on the specific OS being used, the type of filesystem being used and whether the source and dest are on the same device or not.
I'd start by reading the rename(2) man page on the platform you're using.

It sounds to me like you're trying to do the job of the filesystem. If you move a file, the kernel and filesystem are responsible for the atomicity of the operation and for fault recovery, not your code.
Anyway, this article seems to address your questions regarding fsync:
http://blogs.gnome.org/alexl/2009/03/16/ext4-vs-fsync-my-take/

Related

Encrypt file and hide the code from "info body"

Is there any way to protect Tcl code from "info body" after the file is encrypted and executed in the tool?
Obfuscate the code with the compiler from the Tcl Dev Kit; when the output of that is loaded in, it creates procedures whose contents cannot be inspected (by virtue of setting a special flag that turns off inspection). It also turns off a number of other related tools, such as the bytecode disassembler. (Curiously, this actually comes with a small performance penalty relative to standard Tcl; the special bytecode loader library is actually slower than Tcl's built-in bytecode compiler.)
That said, if you are genuinely worried about someone looking at your code, the only way to go is to not give users the code at all, but rather to host it as a service that they then just use remotely (with the clients not being subject to the same degree of protections).
And if you're not that worried, merely packing the code into a starkit (or other single-file distribution mechanism; there's a few options) is enough to stop all but the most determined of users, even with no further steps to conceal things.

How does QFileSystemWatcher determines if a file is modified?

I am trying to watch a log file using QFileSystemWatcher, but the fileChanged signal is not consistently emitted every time the log file is modified. Any idea how QFileSystemWatcher determines whether a file has been modified (on Windows)?
QFileSystemWatcher's performance is entirely dependent on what the underlying platform provides. There are in general absolutely no guarantees that if one process is writing to a file, some other process will see these changes immediately. The behavior of QFileSystemWatcher may be informing you of that fact. The log writing process might elect to flush the file. Depending on the platform, the semantics of a flush might be such that when flush() returns, other processes are guaranteed to be able to see the changes made to the file prior to flush(). If so, then you'd expect QFileSystemWatcher to notify you of the changes.
As the platforms get new features, QFileSystemWatcher may lag in its implementation of new filesystem notification APIs. You'd need to read its source to figure out if it supports everything your platform of choice provides in this respect.
You need to qualify QFileSystemWatcher's behavior on each platform you intend to support. You may find that explicitly polling the file information periodically works better in some cases; again, the choice between polling and QFileSystemWatcher should be made on a platform-by-platform basis, as polling might incur unnecessary overhead if the watcher works well on a given platform. A sketch of such a polling fallback follows below.
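A minimal polling fallback might look like this (plain Python rather than Qt; the function name and the one-second default interval are assumptions for the example):

import os
import time

def watch_file(path, on_change, interval=1.0):
    # Poll size and mtime as a fallback when native change
    # notifications are unreliable on a given platform.
    last = None
    while True:
        st = os.stat(path)
        sig = (st.st_size, st.st_mtime)
        if last is not None and sig != last:
            on_change(path)
        last = sig
        time.sleep(interval)

For example, watch_file("app.log", lambda p: print(p, "changed")) reports every detected modification, at the cost of the polling overhead mentioned above.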

When is it safe to copy a SQLite data file that is currently open?

I have an application which uses the QSQLITE driver with a QSqlDatabse on a file on the local filesystem. I want to write a backup function which will save a snapshot of the database.
Simply copying the file seems like an obvious, easy way to do it but I'm not sure when it is safe to do so.
The application modifies the database at well-defined points. Each time, a new QSqlQuery object is created, used, and immediately destroyed. Explicitly locking/flushing is an acceptable solution, but the Qt API doesn't seem to expose this.
I can't find any documentation on when Qt commits the database to disk. I imagine the QSqlDatabase destructor would do it, but even then I also don't know if (on Windows or Linux) copying the file is guaranteed to result in the most-recent changes being copied (as opposed to, say, only those changes which have been finalised in the filesystem journal). Can someone confirm or deny this? Does it make any difference if the writing filehandle is closed before the copy is executed?
Maybe the only safe way is to do an online copy but I am already using the Qt API and don't know how this would interact.
Any advice would be appreciated.
It's trivial to copy a SQLite database file, but less trivial to do so in a way that won't corrupt it. The following will give you a nice clean backup that's sure to be in a proper state, since writing to the database half-way through your copying process is impossible.
QSqlQuery qry(db);
// Take a write lock so no other connection can modify the database
// while the file is being copied.
qry.exec("BEGIN IMMEDIATE;");
QFile::copy(databaseName, destination);
// Nothing was changed, so release the lock with a rollback.
qry.exec("ROLLBACK;");
After a BEGIN IMMEDIATE, no other database connection will be able to write to the database or do a BEGIN IMMEDIATE or BEGIN EXCLUSIVE.
This has very little to do with Qt. It is database related. This procedure will work with any ACID compliant database, and SQLite is one of these.
From http://www.sqlite.org/transactional.html
SQLite is Transactional
A transactional database is one in which all changes and queries
appear to be Atomic, Consistent, Isolated, and Durable (ACID). SQLite
implements serializable transactions that are atomic, consistent,
isolated, and durable, even if the transaction is interrupted by a
program crash, an operating system crash, or a power failure to the
computer.
This does not mean you can copy the file and it will be consistent. You should probably use block level snapshots for this before you copy. If you are using Linux, read this,
http://tldp.org/HOWTO/LVM-HOWTO/snapshotintro.html
The procedure would then be,
snapshot
copy DB from snapshot to backup device
remove snapshot volume
A snapshot is a global "freeze" of the file system, which is consistent thanks to ACID. A file copy is a linear operation, which cannot be guaranteed to be consistent without halting all DB operations for the duration of the copy. This means a straight copy is not safe for online databases (in general).
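For the "online copy" route the question mentions, SQLite also ships a dedicated backup API that copies a live database consistently under its own locking; Python's sqlite3 module exposes it as Connection.backup() (Python 3.7+). A sketch, with the file names as assumptions:

import sqlite3

src = sqlite3.connect("app.db")           # the live database
dst = sqlite3.connect("app-backup.db")    # the snapshot file
with dst:
    src.backup(dst)   # pages are copied consistently, even while in use
dst.close()
src.close()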

how same key is used across processes to communicate each other using shared memory

I have learned that two processes must use the same key to communicate via shared memory. In the sample code I've seen, the key is hard-coded into both programs (sender and receiver). My doubt is: in real applications, how do two unrelated processes agree on the same key?
I've read about the ftok() function, but it takes a file path as an argument. How would that work in a real scenario like the following?
Suppose the user issues a print-to-file command from Firefox, and some other program like Ghostscript is going to produce the PS/PDF file (assuming it uses shared memory). How would Firefox and Ghostscript use shared memory here?
Two processes unknown to each other would need to use a defined (and shared) protocol in order to use shared memory together. And that protocol would need to include the information about how to get to the shared memory (e.g., an integer value for a shmget call). Basically, it would need to define a "hard coded" identifier or some method for discovering it.
Without some kind of protocol defining this information (including what is in the memory), it would not be possible for one process to even deduce what was in a memory location that was set up by another process.
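As an illustration of such a protocol (using POSIX named shared memory by analogy, rather than SysV ftok()/shmget()), here is a sketch with Python's multiprocessing.shared_memory, where the "hard coded" identifier is simply a well-known segment name; the name and size are assumptions for the example:

from multiprocessing import shared_memory

# Process A: creates the segment under a name both sides agreed on.
shm = shared_memory.SharedMemory(name="demo_protocol", create=True, size=64)
shm.buf[:5] = b"hello"

# Process B: attaches to the same segment by the agreed name.
peer = shared_memory.SharedMemory(name="demo_protocol")
print(bytes(peer.buf[:5]))   # b'hello'

peer.close()
shm.close()
shm.unlink()   # the creator removes the segment when done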

Encfs on top of ZFS

I want to switch to ZFS, but still want to encrypt my data. Native Encryption for ZFS was added in "ZFS Pool Version Number 30", but I'm using ZFS on FreeBSD with Version 28. My question is how would encfs (fuse encryption) affect ZFS specific features like data Integrity and deduplication?
encfs has a limit on file/directory name length which is shorter than that of ZFS. I don't remember the exact number, but it is probably less than 255 characters per object. If you happen to have any files/directories with names exceeding that limit, you'll get an I/O error while copying to a mounted encfs resource and the offending file/directory will not be created; that's all.
I do not use deduplication (too little RAM unfortunately) but since encfs uses ECB mode for the encryption of file names, then naturally similar file names are seen as such on the encrypted side (file attributes are unchanged too), which is fortunate for tools like rsync. Unfortunately for deduplication, encfs uses data signing (hmac) for initialization vectors which renders copies of the same content completely different. It is probably feasible to find a way to block this behaviour, but then data integrity depends on it, so I wouldn't recommend that.
If you need device-level encryption, take a look at cryptsetup. For this you would need to migrate your pool from /dev/disk/by-id/ata-* to /dev/disk/by-id/dm-name-* devices. That would not prohibit using deduplication, only incur a slight performance penalty. But you would have to operate on decrypted data for backup, which may not be desirable.
Currently I am using both methods (that is cryptsetup and encfs). It may seem a little redundant, but I find it necessary to avoid both decrypting data for backup as well as storing encryption parameters in plain text in the .encfs.xml file which bothers my paranoid sense of security ;)
