How can a filter implementing IStream know when it won't receive any further IStream commands - directshow

I've written a filter that implements IStream (the COM interface, not the C++ standard library class). That's all working well, but my problem is that I'm not sure when (if ever) I can be sure that no further IStream commands will be sent, so that the stream behind IStream can be closed.
The simplest place to close the stream would be in Stop() on my filter but this is too early.
According to the MSDN docs, the filter graph manager calls Stop() on the filters in the graph in upstream order, so my filter will be stopped before an upstream mux filter, which will typically use IStream to do any end-of-streaming fixup (e.g. the GDCL mp4 mux filter). I've verified in the debugger that Stop() on my filter is called and exits before Stop() is called on upstream filters (which could potentially result in further IStream calls to my filter).
The stock Microsoft file writer filter seems able to work this out. During streaming, the sink file written by the file writer can't be renamed or moved, as you'd expect, but the file can be moved once streaming has stopped. How is the Microsoft file writer detecting that it's safe to close the file? Is it getting some sort of extra callback once all filters in the graph have stopped, or listening for the graph's state change to stopped with a plug-in distributor? Does it close the file when the IStream interface is released and the reference count falls to zero?

It's been a while since I asked this question and I still haven't worked out how the MS file writer works out when to close its output file.
Here are some possible solutions, some better than others:
Don't close the output stream until the filter is destroyed or removed from the graph. Clearly the MS file writer does not do this. The internal analyzer file writer filter in GraphStudioNext uses this approach (cpp file, h file).
Set a timer in Stop() on the downstream filter which periodically checks whether the upstream filter is still active. As soon as the upstream filter is no longer active, Stop() has finished and there should not be any further IStream calls, so the output stream can be closed. This should work but isn't guaranteed to close the output stream before the Stop() call on the graph has returned. UPDATE - It's probably not safe to assume that a filter which is stopped will not generate further IStream calls. According to the File Writer filter documentation, "... It supports IStream to allow reading and writing the file header AFTER the graph is stopped." [my emphasis]
Close the stream when the last reference to the IStream interface is released. Likely to go wrong if there are any reference-counting bugs on the IStream interface, and the upstream filter may hang onto its IStream reference until the pin is disconnected and/or the filter is destroyed.
Insert an extra unconnected dummy filter in the graph whose only purpose is to wait in its own Stop() function until the upstream filter has stopped, then notify the downstream filter so it can close its output stream. Seems like a dirty hack with possible side effects. Relies on the relative order of the Stop() calls among the different renderers in the graph.
In the downstream filter, respond to some other callback that happens after Stop() is called on the upstream filter but before Stop() on the graph returns. Would be ideal but I haven't found any mechanism to do this.
UPDATE 2: another possible idea. On a timer callback, QueryInterface the containing graph and close the file output stream once GetState() on the graph returns State_Stopped, as this doesn't seem to happen until all filters have returned from Stop() and all streaming should have finished.
UPDATE 3: This appears to be the best solution, using CreateTimerQueueTimer called with the flag WT_EXECUTELONGFUNCTION and TryEnterCriticalSection on a dedicated CRITICAL_SECTION in the callback to prevent re-entrancy and thread bloat (sketched below). Though it doesn't guarantee the output stream closes before Stop() on the graph returns, it should close the file soon afterwards (potentially very soon if a fine-grained timer is used). It requires some care to avoid deadlocks and race conditions: the timer callback should not cache an AddRef'd filter graph interface, it shouldn't hold the filter lock when calling IMediaControl::GetState() on the filter graph, and the rest of the code must ensure the timer callback has definitely terminated before streaming restarts or the filter is paused, disconnected, removed from the graph, etc. It could even be that the MS file writer uses this technique too and the output file closes so soon after Stop() that it's not easily detectable.
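For illustration, here is a minimal sketch of what such a timer callback might look like. CMyWriterFilter, m_pGraph, m_csTimer, CloseOutputStream() and CancelStatePollTimer() are placeholder names (not from any SDK), and the 20 ms period is arbitrary:

// Timer callback: poll the graph state without holding the filter lock.
static VOID CALLBACK PollGraphState(PVOID lpParam, BOOLEAN /*timerFired*/)
{
    CMyWriterFilter* pFilter = static_cast<CMyWriterFilter*>(lpParam);

    // Skip this tick if a previous callback is still running
    // (prevents re-entrancy and thread bloat from the timer queue).
    if (!TryEnterCriticalSection(&pFilter->m_csTimer))
        return;

    IMediaControl* pMC = NULL;
    // m_pGraph is the IFilterGraph* received in JoinFilterGraph();
    // it is not AddRef'd or cached beyond graph membership.
    if (pFilter->m_pGraph &&
        SUCCEEDED(pFilter->m_pGraph->QueryInterface(IID_IMediaControl, (void**)&pMC)))
    {
        OAFilterState fs;
        // S_OK (not VFW_S_STATE_INTERMEDIATE) and State_Stopped mean all
        // filters have returned from Stop(), so no further IStream calls
        // are expected and the output file can be closed.
        if (pMC->GetState(0, &fs) == S_OK && fs == State_Stopped)
        {
            pFilter->CloseOutputStream();
            pFilter->CancelStatePollTimer();   // stop polling once closed
        }
        pMC->Release();
    }

    LeaveCriticalSection(&pFilter->m_csTimer);
}

// Arming the timer (e.g. from Stop()), polling every 20 ms:
// HANDLE hTimer = NULL;
// CreateTimerQueueTimer(&hTimer, NULL, PollGraphState, this,
//                       20, 20, WT_EXECUTELONGFUNCTION);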

Related

Efficiently handling HTTP uploads of many large files in Go

There is probably an answer within reach, but most of the search results are "handling large file uploads" where the user does not know what they're doing, or "handling many uploads" where the answer is consistently just an explanation of how to work with multipart requests and/or Flash uploader widgets.
I haven't had time to sift through Go's HTTP implementation yet, but when does the application have the first chance to see the incoming body? Not until it has been completely received?
If I were to [poorly] decide to use HTTP to transfer a large amount of data and posted a single request with several 10-gigabyte parts, would I have to wait for the whole thing to be received before processing it, or can I process it iteratively through the body's io.Reader?
This is only tangentially related, but I also haven't been able to get a clear answer about whether I can choose to forcibly close the connection in the middle, and whether, even if I do close it, the data will just keep arriving on the port.
Thanks so much.
An application's handler is called after the headers are parsed and before the request body is read. The handler can read the request body as soon as the handler is called. The server does not buffer the entire request body.
An application can read file uploads without buffering the entire request by getting a multipart reader and iterating through the parts.
An application can replace the request body with a MaxBytesReader to force close the connection after a specified limit is breached.
The above comments are about the net/http server included in the standard library. The comments may not apply to other servers.
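For illustration, here is a minimal sketch of that streaming approach using only the standard library (the /upload route, the 10 GiB cap and the destination directory are arbitrary examples):

package main

import (
	"io"
	"net/http"
	"os"
	"path/filepath"
)

func uploadHandler(w http.ResponseWriter, r *http.Request) {
	// Optional cap on the total request size; the connection is closed
	// once the limit is exceeded.
	r.Body = http.MaxBytesReader(w, r.Body, 10<<30) // 10 GiB, arbitrary

	// MultipartReader streams parts as they arrive instead of buffering
	// the whole body (unlike ParseMultipartForm).
	mr, err := r.MultipartReader()
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	for {
		part, err := mr.NextPart()
		if err == io.EOF {
			break // no more parts
		}
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		if part.FileName() == "" {
			continue // skip non-file form fields in this sketch
		}

		// Process each part as it streams in; here it is copied to disk.
		dst, err := os.Create(filepath.Join(os.TempDir(), part.FileName()))
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		_, err = io.Copy(dst, part) // reads the part incrementally
		dst.Close()
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/upload", uploadHandler)
	http.ListenAndServe(":8080", nil)
}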
While I haven't done this with GB-size files, my strategy with file processing (mostly stuff I read from and write to S3) is to use https://golang.org/pkg/os/exec/ with a command-line utility that handles chunking in a way you like. Then read and process by tailing the file, as explained here: Reading log files as they're updated in Go
In my situation, network utilities can download the data far faster than my code can process it, so it makes sense to send it to disk and pick it up as fast as I can; that way I'm not holding a connection open while I process.
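A rough sketch of that pattern (the curl command, the path and the polling interval are placeholders; the external tool and error handling would be whatever fits your setup):

package main

import (
	"io"
	"log"
	"os"
	"os/exec"
	"time"
)

func process(b []byte) {
	// Placeholder for the actual per-chunk processing.
	_ = b
}

func main() {
	path := "/tmp/big.dat"

	// Let an external utility do the (possibly chunked/parallel) download.
	cmd := exec.Command("curl", "-o", path, "https://example.com/big.dat")
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}
	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	// Wait until the file shows up, then tail it.
	var f *os.File
	for {
		var err error
		if f, err = os.Open(path); err == nil {
			break
		}
		time.Sleep(100 * time.Millisecond)
	}
	defer f.Close()

	buf := make([]byte, 1<<20)
	downloaderDone := false
	for {
		n, err := f.Read(buf)
		if n > 0 {
			process(buf[:n]) // consume data as it lands on disk
		}
		if err == io.EOF {
			if downloaderDone {
				return // downloader exited and the file is drained
			}
			select {
			case <-done:
				downloaderDone = true // drain what's left, then stop
			default:
				time.Sleep(100 * time.Millisecond) // wait for more data
			}
			continue
		}
		if err != nil {
			log.Fatal(err)
		}
	}
}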

Erlang: difference between using gen_server:cast/2 and standard message passing

I was working through a problem and noticed some code where a previous programmer was passing messages using the standard convention of PID ! Message. I have been using gen_server:cast/2. I was wondering if somebody could explain to me the critical differences and considerations when choosing between the two?
There are a few minor differences:
Obviously, the gen_server handles casts in handle_cast and "normal" messages in handle_info.
A cast never fails; it always returns ok. Sending a message with ! fails with badarg if you are sending a message to an atom that is currently not registered by a process. (Sending a message to a pid never causes an error, even if the process is dead.)
If the gen_server is running on a remote node that is not currently connected to the local node, then gen_server:cast will spawn a background process to establish the connection and send the message, and return immediately, while ! only returns once the connection is established. (See the code for gen_server:do_send.)
As for when to choose one or the other, it's mostly a matter of taste. I'd say that if the message could be thought of as an asynchronous API function for the gen_server, then it should use cast, and have a specific API function in the gen_server callback module. That is, instead of calling gen_server:cast directly, like this:
gen_server:cast(foo_proc, {some_message, 42})
make a function call:
foo_proc:some_message(42)
and implement that function like the direct cast above. That encapsulates the specific protocol of the gen_server inside its own module.
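For illustration, a minimal sketch of such a wrapper module (foo_proc is just the made-up name from the snippet above, and the handle_cast body is a placeholder):

-module(foo_proc).
-behaviour(gen_server).

-export([start_link/0, some_message/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, foo_proc}, ?MODULE, [], []).

%% Public API: callers never see the message tuple itself.
some_message(N) ->
    gen_server:cast(foo_proc, {some_message, N}).

init([]) ->
    {ok, #{}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

%% Asynchronous API messages arrive here...
handle_cast({some_message, N}, State) ->
    io:format("got some_message ~p~n", [N]),
    {noreply, State}.

%% ...while "plain" messages (e.g. monitor 'DOWN' events) arrive here.
handle_info({'DOWN', _Ref, process, _Pid, _Reason}, State) ->
    {noreply, State};
handle_info(_Other, State) ->
    {noreply, State}.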
In my mind, "plain" messages would be used for events, as opposed to API calls. An example would be monitor messages, {'DOWN', Ref, process, Id, Reason}, and events of a similar kind that might happen in your system.
In addition to legoscia's post, I would say that it is easier to trace a dedicated function API than messages, especially in a production environment.

App using QTreeView and QStandardItemModel does not catch up

I'm working on a program (notifyfs) which takes care of caching directory entries and watching the underlying filesystem for changes. The cache is stored in shared memory; (gui) clients can make use of the cache very easily.
Communication between the server (notifyfs) and clients can go via a socket or via the shared memory itself, by sharing a mutex and condition variable.
When a client wants to load a directory it does the following:
a. select a "view", which is a data struct in shared memory, which consists of a shared mutex, conditionvariable and a small queue (array), to communicate add/remove/change events with the client.
b. the client populates his/her model with what it already finds in the shared memory
c. send a message to the server with a reference to the view, and an indication to the path it wants to load its contents. This maybe a path, but if possible the parent entry.
d. server receives the message (does some checks), sets a watch on the directory, and synces the directory. When the directory has not yet been in the cache this means that every entry it detects is stored in the cache. While doing so it signals the view (the data in shared memory) an entry is added, and it stores this event in the array/queue.
e. the gui client has a special thread watching this view in shared memory constantly for changes using the pthread_cond_wait call. This thread is a special io thread, which can send three signals: entry added, entry removed and entry changed. The right parameters it reads from the array queue: a reference to the entry, and what the action is. These three signals are connected to three slots in my model, which is based upon a QStandardItemModel.
This works perfectly, and it's very fast. When testing it I had a lot of debugging output. After removing that to test without this extra slow io, it looks like the QTreeView can't keep up with the changes. When loading a directory it shows only two thirds of it, and when going on to load another directory, this gets less and less.
I've connected the different signals from the special thread to the model using Qt::QueuedConnection.
Adding a row at a certain row is done using the insertRow(row, list) call, where row is of course the row, and list is a QList of items.
I've been looking into this issue for some time now, and saw that all the changes are detected by the special io thread and that the signals are received by the model. Only the signal to the QTreeView is somehow not received. I've been thinking: do I have to set the connection between the model's signals and the receiving slots of the treeview to "Qt::QueuedConnection" as well? Maybe something else?
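For reference, the wiring is roughly like this (IoWatcher, DirModel and the slot names are simplified placeholders, not my actual code):

// The io thread only emits signals; all model changes run in the GUI
// thread because the connections are queued.
#include <QObject>
#include <QStandardItemModel>
#include <QList>
#include <QString>

class IoWatcher : public QObject {
    Q_OBJECT
signals:
    void entryAdded(int row, const QString &name);
    void entryRemoved(int row);
    void entryChanged(int row, const QString &name);
};

class DirModel : public QStandardItemModel {
    Q_OBJECT
public slots:
    void onEntryAdded(int row, const QString &name) {
        QList<QStandardItem*> items;
        items << new QStandardItem(name);
        insertRow(row, items);                 // runs in the GUI thread
    }
    void onEntryRemoved(int row) { removeRow(row); }
    void onEntryChanged(int row, const QString &name) {
        if (QStandardItem *it = item(row)) it->setText(name);
    }
};

// Wiring, done in the GUI thread:
// connect(watcher, SIGNAL(entryAdded(int,QString)),
//         model,   SLOT(onEntryAdded(int,QString)), Qt::QueuedConnection);
// connect(watcher, SIGNAL(entryRemoved(int)),
//         model,   SLOT(onEntryRemoved(int)), Qt::QueuedConnection);
// connect(watcher, SIGNAL(entryChanged(int,QString)),
//         model,   SLOT(onEntryChanged(int,QString)), Qt::QueuedConnection);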
It was suggested in the reactions to put the model in a separate thread. This was tempting, but it is not the right way to solve this. The model and the view should be in the same thread.
I solved this issue by doing as much of the work of providing the model with data as possible in the special io thread. I moved some functions which populate the model to this special io thread, and used the standard calls to insert or remove a row. That worked.
Thanks to everyone giving suggestions,
Stef Bon

Handle netty asynchronous writes and closes

I'm responsible for the maintenance and evolution of an already developed SMTP proxy for about a month now.
During some research about other aspects of this application I found the link below that states that channel writes are not synchronous and subsequent closes must be done using a listener.
Netty Channel closed detection
Questions:
Is this valid for version 3.6.3 (Final) of the Netty API?
Considering that this is valid, is there any problem with storing the ChannelFuture returned by a channel's "write" operation for future usage?
The reason for question "2" is:
I've made an initial analysis of the application's code and there are lots of places with writes, closes and writes with a subsequent close (one command right after the other).
The only way I could see to handle all these operations, given the asynchronous nature of this aspect of Netty, involves storing the ChannelFuture returned by every write operation so I can use it to schedule close operations.
Basically I would create a helper with methods "write" and "close" and hold a Map<Channel, ChannelFuture>. Whenever a write is called, I put a new record in this map with the channel itself and the ChannelFuture returned by the channel's "write" operation.
When a "close" helper method is invoked, I firstly try to find the channel on this map. If I can't find it, there is no pending write operation, so I can close the channel right away. Otherwise, I have a pending write, so I use the stored ChannelFuture to register a listener scheduling the channel's closing.

How to force the current message to be suspended and be retried later on from within a custom BizTalk **send** pipeline component?

Here is my scenario. BizTalk needs to transfer a file from a shared/central document library. First BizTalk receives an incoming message with a reference/path to this document in the library. Then it simply needs to read it out from this library and send it (potentially through different adapters). This is in essence, a scenario not so remote from the ClaimCheck EAI pattern.
Some ways to implement a claim check have been documented, notably BizTalk ESB Toolkit Claim Check, and BizTalk 2009: Dealing with Extremely Large Messages, Part I & Part II. These implementations do, however, assume that the send pipeline can immediately read the stream that has been "checked in".
That is not my case: the document will take some time before it is available in the shared library, and I cannot delay the initially received message. That leaves me with 2 options: either introduce some delay via an orchestration, or ensure the send port will retry later on if the document is not there yet.
(A delay can only be introduced via an orchestration; there are no time-based subscriptions in BizTalk. Right?)
Since this is a message-only flow, I figured I could skip the orchestration. I have seen ways of implementing "Custom Retry Logic in Message Only Solution Using Pipeline", but what I need is not only a way to control the retry behavior (as performed by the adapter) but also to trigger it right from within the pipeline…
Every attempt I have made so far has just ended up with a suspended message that won't be automatically retried, even though the send adapter has retry configured… If this is indeed possible, then where/how should I do it?
Oh right… and there is queuing… but unfortunately neither on premises nor in the cloud ;)
OK I may be pushing the limits… but just out of curiosity…
Many thanks for your help and suggestions!
I'm puzzled as to how this could be done without an Orch. The only way I can think of would be along the lines of:
The receive port for the initial messages just 'eats' the messages, e.g. by subscribing them to a dummy send port with the Null Adapter, ignoring them totally.
You monitor the shared document library with a receive port, looking for any new document there.
Any located documents are picked up by a send port subscription and sent downstream.
An orchestration based approach would be along the lines of:
The Orch is triggered by the receipt of the initial notification of an 'upcoming' new file in the library. If your initial notification is request-response (e.g. an exposed web service), you can immediately and synchronously issue the response.
Another receive port is used to monitor the availability of, and retrieve, the file from the shared library, correlated to the original notification message (e.g. by filename or another key).
A mechanism to handle the retry if the document isn't available, and potentially an eventual timeout, e.g. if the document never makes it to the shared library.
And on success, a send port to then send the document downstream
Placing the Delay shape in the Orch will offer more scalability than e.g. using Thread.Sleep() or similar in custom adapter or pipeline code, since BTS just calculates and stamps the 'awaken' timestamp on the SQL record and can then dehydrate the orch, freeing up the thread.
The 'is the file there yet?' check can be done with a retry loop, delaying after each failed check, with a parallel branch with a timeout e.g. after an hour or so.
The polling interval can be controlled in the receive location, so I do not understand what you mean by "there are no time-based subscriptions in BizTalk". You also have a schedule window.
One way to introduce a delay is to send that initial message to an internal web service, which will simply post the message back to BizTalk after a specified time interval.
There are also loopback adapters, which simply post the message back into the MessageBox. This can be amended to add a delay.
