How does QFileSystemWatcher determine if a file is modified? - qt

I am trying to watch a log file using QFileSystemWatcher, but the fileChanged signal is not consistently emitted every time the log file is modified. Any idea how QFileSystemWatcher determines if a file is modified (on Windows)?

QFileSystemWatcher's behavior is entirely dependent on what the underlying platform provides. In general there are absolutely no guarantees that if one process is writing to a file, some other process will see these changes immediately; the behavior of QFileSystemWatcher may simply be informing you of that fact. The log-writing process might elect to flush the file. Depending on the platform, the semantics of a flush might be such that when flush() returns, other processes are guaranteed to be able to see the changes made to the file prior to flush(). If so, then you'd expect QFileSystemWatcher to notify you of the changes.
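For illustration, this is roughly what an explicit flush looks like from the writer's side (a Python sketch, assuming the log writer is under your control; the file name is made up, and whether this actually produces a change notification still depends on the platform):

import os

with open("app.log", "a") as log:
    log.write("something happened\n")
    log.flush()               # push the data out of the process's user-space buffers
    os.fsync(log.fileno())    # optionally also ask the OS to write it out to disk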
As the platforms get new features, QFileSystemWatcher may lag in its implementation of new filesystem notification APIs. You'd need to read its source to figure out if it supports everything your platform of choice provides in this respect.
You need to qualify QFileSystemWatcher's behavior on each platform you intend to support. You may find that explicitly polling the file's information periodically works better in some cases - again, the choice between polling and QFileSystemWatcher should be made on a platform-by-platform basis, as polling might incur unnecessary overhead if the watcher works fine on a given platform.
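As a rough illustration of the polling fallback, here is a minimal sketch (plain Python, since the idea itself is Qt-independent; in a Qt application the same check would typically be driven by a QTimer, and the path and interval below are made up):

import os, time

def watch_by_polling(path, interval=1.0):
    last = None
    while True:
        try:
            st = os.stat(path)
            state = (st.st_mtime, st.st_size)   # compare modification time and size
        except FileNotFoundError:
            state = None
        if last is not None and state != last:
            print("file changed:", path)
        last = state
        time.sleep(interval)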

Related

How to avoid the forge model derivative queue

I want to use the forge viewer as a preview tool in my web app for generated data.
The problem I have is that the Model Derivative API is sometimes slow and sometimes fast.
I read that this happens because the files are placed in a queue and processed sequentially.
In my opinion, this can be solved by:
Having the extraction.update webhook also tell me where I am in the queue, so I can give my users better progress information, or stop the process when the queue is too long.
Being able to have a private queue. I have no problem paying more credits if necessary.
Being able to generate svf2 files on my own server.
But I don't know if any of these options are possible. Or if there is another workaround.
Yes, that could be useful. I logged that request in our system: DERI-7940
Might be considered later on, but no plans currently
I'm not aware of any plans for that
We're always working on making the translation service better, but unfortunately, I cannot tell when it will meet your requirements - including the implementation of the webhook feature you mentioned.
SVF2 is specifically for very large models - is that what you are working with? If not, then I'm quite certain that translating to SVF would be faster.

Qt: Catch external changes on an SQLite database

I'm developing a program using an SQLite database I access via QSqlDatabase. I'd like to handle the (hopefully rare) case when some changes are made to the database which are not caused by the program while it's running (e.g. the user could remove write access, move or delete the file, or modify it manually).
I tried to use a QFileSystemWatcher. I let it watch the database file, and in all functions writing something to it, I blocked its signals, so that only "external" changes would trigger the changed signal.
The problem is that the check of the QFileSystemWatcher and/or the actual writing to disk by QSqlDatabase::commit() does not seem to happen at the exact moment I call commit(). So what actually happens is: first the QFileSystemWatcher's signals are blocked, then I change some stuff, then I unblock them, and only then does it report the file as changed.
I then tried to set a bool variable (m_writeInProgress) to true each time a function requests a change. The "changed" slot then checks if a write action has been requested and, if so, sets m_writeInProgress to false again and exits. This way, it would only handle "external" changes.
The problem is still that if the change happens at the exact moment the actual writing is going on, it isn't caught.
So possibly, using a QFileSystemWatcher is the wrong way to implement this.
How could this be done in a safe way?
Thanks for all help!
Edit:
I found a way to solve a part of the problem. Taking an exclusive lock on the database file prevents other connections from changing it. It's quite simple, I just have to execute
PRAGMA locking_mode = EXCLUSIVE
BEGIN EXCLUSIVE
COMMIT
and handle the error that emerges if another instance of my program tries to access the database.
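For illustration, the same sequence through Python's built-in sqlite3 module looks roughly like this (in the actual program the statements would go through QSqlQuery instead; the file name is made up):

import sqlite3

conn = sqlite3.connect("app.db", isolation_level=None)   # autocommit, so the explicit BEGIN works
conn.execute("PRAGMA locking_mode = EXCLUSIVE")
try:
    conn.execute("BEGIN EXCLUSIVE")   # takes the exclusive lock on the database file
    conn.execute("COMMIT")            # in exclusive locking mode the lock is kept until the connection closes
except sqlite3.OperationalError:
    print("database is already locked by another instance")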
What's left is to know if the user (accidentally) deleted the file during runtime ...
First of all, there's no SQLite support for this: SQLite only supports monitoring changes made over a database connection within your direct control. Whatever happens in a separate process concurrently with your process, or when your process is not running, is by design completely out of your control.
The canonical solution to this problem is to encrypt the database with a key specific to your application (and perhaps user, etc.). Then no third-party process can modify the database using SQLite. Of course any process can still corrupt your database, or get rid of it - that's unavoidable. You can detect corruption trivially by using cryptographic signatures, perhaps even error-correcting codes so as to be able to restore the data should a certain amount of corruption happen. You don't need notifications of someone moving or deleting the database file: you will know when you attempt to open the database and a "file not found" error is given back to you.
Of course all of the above requires a custom VFS implementation. That's very much par for the course.
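To make the signature idea concrete, a minimal sketch of detecting out-of-band modification could look like this (Python for brevity; the key handling is deliberately naive and the file name is made up - a real application would protect the key and the stored digest properly):

import hashlib, hmac

SECRET_KEY = b"application-specific-secret"   # hypothetical; derive/store this securely in practice

def db_signature(path):
    mac = hmac.new(SECRET_KEY, digestmod=hashlib.sha256)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            mac.update(chunk)
    return mac.hexdigest()

# On clean shutdown, store db_signature("app.db") somewhere the application controls.
# On startup, recompute it: a mismatch (or a FileNotFoundError) means the file was
# modified, replaced or removed outside the program.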

Event Driven Architecture - Service Contract Design

I'm having difficulty conceptualising a requirement I have into something that will fit into our nascent SOA/EDA.
We have a component I'll call the Data Downloader. This is a facade for an external data provider that has both high latency and a cost associated with every request. I want to take this component and create a re-usable service out of it with a clear contract definition. It is up to me to decide how that contract should work, however its responsibilities are two-fold:
Maintain the parameter list (called a Download Definition) for an upcoming scheduled download
Manage the technical details of the communication to the external service
Basically, it manages the 'how' of the communication. The 'what' and the 'when' are the responsibilities of two other components:
The 'what' is managed by 'Clients', who are responsible for determining the parameters for the download.
The 'when' is managed by a dedicated scheduling component. Because of the cost associated with the downloads we'd like to batch the requests intraday.
Hopefully this sequence diagram explains the responsibilities of the services:
Because each of the responsibilities are split out in three different components, we get all sorts of potential race conditions with async messaging. For instance when the Scheduler tells the Downloader to do its work, because the 'Append to Download Definition' command is asynchronous, there is no guarantee that the pending requests from Client A have actually been serviced. But this all screams high-coupling to me; why should the Scheduler necessarily know about any 'prerequisite' client requests that need to have been actioned before it can invoke a download?
Some potential solutions we've toyed with:
Make the 'Append to Download Definition' command a blocking request/response operation. But this then breaks the performance and scalability benefits of having an EDA.
Build something in the Downloader to ensure that it only runs when there are no pending commands in its incoming request queue. But that then introduces a dependency on the underlying messaging infrastructure which I don't like either.
This makes me think I'm approaching the problem in a completely backward way. Or is this just a classic case of someone trying to fit a synchronous RPC requirement into an async event-driven architecture?
The thing I like most about EDA and SOA is that they almost completely eliminate the notion of a race condition. As long as your events are associated with some association key (e.g. downloadId), the problem you describe can be addressed with several solutions of different complexities - depending on your needs. I'm not sure I totally understand the described use case, but I will try my best.
Off the top of my head:
The DataDownloader maintains a list of received Download Definitions and a list of triggered downloads. When a definition is received, it is checked against the triggers list to see if the associated download has already been triggered, and if it has, the download is executed. When a TriggerDownloadCommand is received, the definitions list is checked for a definition with the associated downloadId (a rough sketch of this idea follows below).
For more complex situations, consider using the Saga pattern, which is implemented by some third-party messaging infrastructures. With some simple configuration, it will handle both messages and initiate the actual download when the required condition is satisfied. This is more appropriate for distributed systems, where an in-memory collection is out of the question.
You can also configure your scheduler (or the trigger command handler) to retry when an error is signaled (e.g. by an exception), in order to avoid that race condition, and ultimately give up after a specified timeout.
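Here is the sketch mentioned above for the first option, in plain Python with made-up class and method names (a real system would hook these handlers into its messaging framework and use durable storage rather than in-memory collections):

class DataDownloader:
    def __init__(self):
        self.definitions = {}   # downloadId -> accumulated Download Definition parameters
        self.triggered = set()  # downloadIds whose trigger command has already arrived

    def on_append_to_download_definition(self, download_id, params):
        self.definitions.setdefault(download_id, []).append(params)
        if download_id in self.triggered:
            self.execute(download_id)       # the trigger arrived first; run now that data exists

    def on_trigger_download(self, download_id):
        if download_id in self.definitions:
            self.execute(download_id)
        else:
            self.triggered.add(download_id)  # remember the trigger until a definition arrives

    def execute(self, download_id):
        print("downloading", download_id, "with", self.definitions.pop(download_id, []))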
Does this help?

Designing an asynchronous task library for ASP.NET

The ASP.NET runtime is meant for short work loads that can be run in parallel. I need to be able to schedule periodic events and background tasks that may or may not run for much longer periods.
Given the above I have the following problems to deal with:
The AppDomain can shutdown due to changes (Web.config, bin, App_Code, etc.)
IIS recycles the AppPool on a regular basis (daily)
IIS itself might restart, or for that matter the server might crash
I'm not convinced that running this code inside ASP.NET is the wrong thing to do, because it would allow for a simpler programming model. But doing so would require that an external service periodically makes requests to the app so that the application is kept running, and that all background tasks are programmed with the utmost care. They would have to be able to pause and resume their work in the event of an unexpected error.
My current line of thinking goes something like this:
If all jobs are registered in the database, it should be possible to use the database as a bookkeeping mechanism. In the case of an error, the database would contain all state necessary to resume the operation at the next opportunity given.
I'd really appreciate some feedback/advice on this matter. I've been considering running a Windows service and using some RPC solution as well, but it doesn't have the same appeal to me. Instead I'd have a lot of deployment issues and would need to synchronize tasks and code across several applications. Due to my business needs this is less than optimal.
This is a shot in the dark since I don't know what database you use, but I'd recommend you consider dialog timers and activation. Assuming that most of the jobs have to do some data manipulation, and it is likely that all of them do only data manipulation, leveraging activation and timers gives an extremely reliable job-scheduling solution, entirely embedded in the database (no need for an external process/service, no dependencies outside the database bounds like msdb), and it is a solution that ensures scheduled jobs survive restarts, failover events and even disaster-recovery restores. Simply put, once a job is scheduled it will run even if the database is restored one week later on a different machine.
Have a look at Asynchronous procedure execution for a related example.
And if this is too radical, at least have a look at Using Tables as Queues since storing the scheduled items in the database often falls under the 'pending queue' case.
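To make the 'tables as queues' idea concrete, here is a deliberately simplified sketch in Python with SQLite (the articles above target SQL Server; the table and column names here are invented). The point is that a job is claimed with a single atomic UPDATE, so two workers never pick up the same row:

import sqlite3

conn = sqlite3.connect("jobs.db", isolation_level=None)   # autocommit; each statement is atomic
conn.execute("CREATE TABLE IF NOT EXISTS jobs ("
             "id INTEGER PRIMARY KEY, payload TEXT, status TEXT DEFAULT 'pending')")

def claim_next_job(worker_id):
    # Mark the oldest pending job as claimed by this worker; SQLite serializes writers,
    # so only one worker can win the UPDATE for a given row.
    conn.execute("UPDATE jobs SET status = ? "
                 "WHERE id = (SELECT id FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1) "
                 "AND status = 'pending'", ("claimed:" + worker_id,))
    # Fetch whatever this worker has claimed (assumes one job in flight per worker).
    return conn.execute("SELECT id, payload FROM jobs WHERE status = ?",
                        ("claimed:" + worker_id,)).fetchone()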
I recommend that you have a look at Quartz.Net. It is open source and it will give you some ideas.
Using the database as a state-keeping mechanism is a completely valid idea. How complex it will be depends on how far you want to take it. In many cases you will end up pairing your database logic with a Windows service to achieve the desired result.
FWIW, it is typically not a good practice to manually use the thread pool inside an ASP.Net application, though (contrary to what you may read) it actually works quite nicely other than the huge caveat that you can't guarantee it will work.
So if you needed a background thread that examined the state of some object every 30 seconds and you didn't care if it fired every 30 seconds or 29 seconds or 2 minutes (such as in a long app pool recycle), an ASP.Net-spawned thread is a quick and very dirty solution.
Asynchronously fired callbacks (such as on the ASP.Net Cache object) can also perform a sort of "behind the scenes" role.
I have faced similar challenges and ultimately opted for a Windows service that uses a combination of building blocks for maximum flexibility. Namely, I use:
1) WCF with implementation-specific types OR
2) Types that are meant to transport and manage objects that wrap a job OR
3) Completely generic, serializable objects contained in a custom wrapper. Since they are just a binary payload, this allows any object to be passed to the service. Once in the service, the wrapper defines what should happen to the object (e.g. invoke a method, gather a result, and optionally make that result available for return).
Ultimately, the web site is responsible for querying the service about its state. This querying can be as simple as polling or can use asynchronous callbacks with WCF (though I believe this also uses some sort of polling behind the scenes).
I'll tell you what I have done.
I created a class called Atzenta that has a timer (1-2 second trigger).
I also created a table in my temporary database that keeps the jobs. The table knows the job ID, other parameters, the priority, the job status and messages.
I can add or delete a job through this class. When there is no action to be done, the timer is stopped. When I add a job, the timer starts again. (The timer is a thread by itself that can do parallel work.) I use System.Timers and not other timers for this.
The jobs can have different priorities.
Now let's say that I place a job in this table using the Atzenta class. The next time the timer triggers, it runs the query on this table, finds the first available job and just runs it. No other job runs until this one has ended.
All synchronization and flags are handled through the table. In the table I have flags for every job that show whether it is |waiting to run|requested to run|running|paused|finished|killed|
All jobs are already known functions or classes (e.g. the creation of statistics).
For stop and start, I use global.asax and Application_Start / Application_End to start and pause the object that keeps the tasks. For example, when I am doing a job and I get Application_End, either I wait for it to finish and then stop the app, or I stop the action, notify the table, and start again on Application_Start.
So I say Atzenta.RunTheJob(Jobs.StatisticUpdate, ProductID); and then I add this job to the table, open the timer, and then on trigger this job is run and I update the statistics for the given product ID.
I use a table in a database to synchronize the many pools that run the same web app, and in fact it works that way. With a common table the synchronization of the jobs is easy, and you avoid having two pools run the same job at the same time.
In my back office I have a simple table view to see the status of all jobs.

How to durably rename a file in POSIX?

What's the correct way to durably rename a file in a POSIX file system? Specifically wondering about fsyncs on the directories. (If this depends on the OS/FS, I'm asking about Linux and ext3/ext4).
Note: there are other questions on StackOverflow about durable renames, but AFAICT they don't address fsync-ing the directories (which is what matters to me - I'm not even modifying file data).
I currently have (in Python):
import os
# Open the destination directory so it can be fsync'ed after the rename.
dstdirfd = os.open(dstdirpath, os.O_DIRECTORY | os.O_RDONLY)
os.rename(srcdirpath + '/' + filename, dstdirpath + '/' + filename)
os.fsync(dstdirfd)
Specific questions:
Does this also implicitly fsync the source directory? Or might I end up with the file showing up in both directories after a power cycle (meaning I'd have to check the hard link count and manually perform recovery), i.e. it's impossible to guarantee a durably atomic move operation?
If I fsync the source directory instead of the destination directory, will that also implicitly fsync the destination directory?
Are there any useful related testing/debugging/learning tools (fault injectors, introspection tools, mock filesystems, etc.)?
Thanks in advance.
Unfortunately Dave’s answer is wrong.
Not all POSIX systems even have durable storage. And if they do, it is still “allowed” to be hosed after a system crash. For those systems a no-op fsync() makes sense, and such an fsync() is explicitly allowed under POSIX. It is also legal for the file to be recoverable in the old directory, the new directory, both, or any other location. POSIX makes no guarantees for system crashes or file system recoveries.
The real question should be:
How to do a durable rename on systems which support that through the POSIX API?
You need to do an fsync() on both the source and the destination directory, because the minimum those fsync()s are supposed to do is persist how the source or destination directory should look.
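Building on the snippet from the question, a sketch of "fsync() both directories" in Python could look like this (with the same caveats about what POSIX actually guarantees):

import os

def durable_rename(srcdir, dstdir, name):
    srcfd = os.open(srcdir, os.O_DIRECTORY | os.O_RDONLY)
    dstfd = os.open(dstdir, os.O_DIRECTORY | os.O_RDONLY)
    try:
        os.rename(os.path.join(srcdir, name), os.path.join(dstdir, name))
        os.fsync(dstfd)   # persist the new directory entry
        os.fsync(srcfd)   # persist the removal of the old entry
    finally:
        os.close(srcfd)
        os.close(dstfd)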
Does a fsync(destdirfd) also implicitly fsync the source directory?
POSIX in general: no, nothing implies that
ext3/4: I’m not sure whether the changes to the source and the destination dir end up in the same transaction in the journal. If they do, they both get committed together.
Or might I end up with the file showing up in both directories after a power cycle (“crash”), i.e. it's impossible to guarantee a durably atomic move operation?
POSIX in general: no guarantees, but you’re supposed to fsync() both directories, which might not be atomic-durable
ext3/4: how many fsync() calls you minimally need depends on the mount options. E.g. if mounted with “dirsync” you don’t need either of those two fsync()s. At most you need both fsync()s, but I’m almost sure one is enough (atomic-durable then).
If I fsync the source directory instead of the destination directory, will that also implicitly fsync the destination directory?
POSIX: no
ext3/4: I really believe both end up in the same transaction, so it doesn’t matter which of them you fsync()
older kernels ext3: (if they aren’t in the same transaction) some not-so-optimal implementations did way too much syncing on fsync(); I bet it committed every transaction that came before. And yes, a normal implementation would first link the file to the destination and then remove it from the source. So the fsync(srcdirfd) would trigger the fsync() of the destination as well.
ext4/latest ext3: if they aren’t in the same transaction, you might be able to completely sync them independently (so do both)
Are there any useful related testing/debugging/learning tools (fault injectors, introspection tools, mock filesystems, etc.)?
For a real crash, no. By the way, a real crash goes beyond the viewpoint of the kernel. The hardware might reorder writes (and fail to write everything), corrupting the filesystem. Ext4 is better prepared against this, because it enables write barriers (a mount option) by default (ext3 does not) and can detect corruption with journal checksums (also a mount option).
And for learning: find out if both changes are somehow linked in the journal! :-P
POSIX defines that the rename function must be atomic.
So if you rename(A, B), under no circumstances should you ever see a state with the file in both directories or neither directory. There will always be exactly one, no matter what you do with fsync() or whether the system crashes.
But that doesn't solve the problem of making sure the rename() operation is durable. POSIX answers this question:
If _POSIX_SYNCHRONIZED_IO is defined, the fsync() function shall force all currently queued I/O operations associated with the file indicated by file descriptor fildes to the synchronized I/O completion state. All I/O operations shall be completed as defined for synchronized I/O file integrity completion.
So if you fsync() a directory, pending rename operations must be transferred to disk by the time this returns. fsync() of either directory should be sufficient because atomicity of the rename() operation would require that both directories' changes be synced atomically.
Finally, in contrast to the claim in the blog post mentioned in another answer, the rationale for this explains the following:
The fsync() function is intended to force a physical write of data from the buffer cache, and to assure that after a system crash or other failure that all data up to the time of the fsync() call is recorded on the disk. Since the concepts of "buffer cache", "system crash", "physical write", and "non-volatile storage" are not defined here, the wording has to be more abstract.
A system that claimed to be POSIX compliant and that considered it correct behavior (i.e. not a bug or hardware failure) to complete an fsync() and not persist those changes across a system crash would have to be deliberately misrepresenting itself with respect to the spec.
(updated with additional info re: Linux-specific vs. portable behavior)
The answer to your question is going to depend a lot on the specific OS being used, the type of filesystem being used and whether the source and dest are on the same device or not.
I'd start by reading the rename(2) man page on the platform you're using.
It sounds to me like you're trying to do the job of the filesystem. If you move a file the kernel and file-system are responsible for atomic operation and fault-recovery, not your code.
Anyway, this article seems to address your questions regarding fsync:
http://blogs.gnome.org/alexl/2009/03/16/ext4-vs-fsync-my-take/

Resources