Media Foundation: multi-input MFT and topology connection order

The problem
I'm writing a custom MFT with two inputs and one output (it merges two video streams into one).
My MFT requires media types to be set on its inputs before it can provide an output type.
I've set up my topology by connecting two source nodes (they take different streams from an aggregate media source) to my transform node, and then an EVR to my single output.
When I start the media session, I see that the topology invokes SetInputType on the first input, and it succeeds.
But then it immediately tries to get an output type: first by calling GetOutputCurrentType on my MFT, which returns MF_E_TRANSFORM_TYPE_NOT_SET because it cannot provide one yet, and then by calling GetOutputAvailableType, which I made return MF_E_TRANSFORM_TYPE_NOT_SET as per the documentation (which says "You must set the input types before setting the output types"; I also tried returning some partial media types, with the same result).
And here's the problem: after that, the topology seems to give up on my MFT: it never calls SetInputType on the second input.
The question
How can I force the topology to set all input types before dealing with the output?

Read this: Multiple input
Under Windows 7, it doesn't work.
You can provide a custom media session, as I do in the MFNode project.


How can I capture a webcam and append to a file?

My application needs to record video interviews with the ability to pause and resume, and have these multiple segments captured to the file.
I'm using DirectShow.NET to capture the camera stream to a preview window AND an AVI file, and it works, except that whenever I start recording a new segment, I overwrite the AVI file instead of appending to it. The relevant code is:
captureGraphBuilder.SetOutputFileName( ref mediaSubType, Filename, out muxFilter, out fileWriterFilter )
How can I create a capture graph so that the capture is appended to a file instead of overwriting it?
Most media file formats, and AVI specifically, are not designed for appending and do not allow it. When you record, you populate the media file AND then you finalize it on completion. You typically don't have the option to "unfinalize" it and resume recording.
The overwriting you are seeing is a side effect of how the writer filter is implemented. There is no append vs. overwrite mode you can easily switch to.
Your options are basically the following (in order from less to more development effort):
1.) Record a new media file each time, then run an external tool (like FFmpeg) that is capable of concatenating media and producing a new continuous file out of the segments.
2.) Implement a DirectShow filter inserted into the pipeline (in two instances, one for video and one for audio) that implements pause/resume behavior. While paused, the filter discards incoming media data; once resumed, it passes data through again, modifying the time stamps so that the stream appears continuous. The capture graph stays in the running state through all segments and pauses.
3.) Implement a custom multiplexer and/or writer filter that is capable of reading an existing file and appending new media, so that the file is finalized again on completion with the old and new segments forming one continuous recording.
Item #3 above is technically possible to implement, but I don't think such an implementation exists: workarounds are always easier. #2 is more or less the intended way to address this task, but since you are doing C# development with DirectShow.NET, I anticipate that it will be somewhat difficult to approach the problem from this angle. #1 is relatively easy to do, and the only cost involved is the dependency on an external tool.
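The time stamp handling in option #2 can be sketched independently of DirectShow. The following is a minimal illustration, not a filter: `PauseResumeRestamper` and its methods are hypothetical names, times are plain 64-bit integers in 100 ns units (the same unit as REFERENCE_TIME), and it assumes samples arrive in order with a known duration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a DirectShow class): discards samples while paused
// and shifts later time stamps so the kept samples form a continuous stream.
class PauseResumeRestamper {
public:
    void Pause() { paused_ = true; }

    void Resume() {
        if (paused_) {
            paused_ = false;
            resuming_ = true;  // the next sample defines the new offset
        }
    }

    // Returns false if the sample must be discarded (we are paused).
    // Otherwise writes the adjusted start time to *adjustedStart.
    bool Process(std::int64_t start, std::int64_t duration,
                 std::int64_t* adjustedStart) {
        if (paused_) {
            return false;
        }
        if (resuming_) {
            // First sample after a pause: make it begin exactly where the
            // last kept sample ended.
            offset_ = start - nextExpected_;
            resuming_ = false;
        }
        *adjustedStart = start - offset_;
        nextExpected_ = *adjustedStart + duration;
        return true;
    }

private:
    bool paused_ = false;
    bool resuming_ = false;
    std::int64_t offset_ = 0;        // total time removed so far
    std::int64_t nextExpected_ = 0;  // end time of the last kept sample
};
```

In a real filter the same offset would have to be applied consistently to the audio and video instances, otherwise the two streams drift apart after the first pause.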

Is it OK for a DirectShow filter to seek the filters upstream from itself?

Normally seek commands are executed on a filter graph, get called on the renderers in the graph and calls are passed upstream by filters until a filter that can handle the seek does the actual seek operation.
Could an individual filter seek the upstream filters connected to one or more of its input pins in the same way, without affecting the downstream portion of the graph in unexpected ways? I wouldn't expect any graph state changes to be caused by calling IMediaSeeking.SetPositions upstream.
I'm assuming that all upstream filters are connected to the rest of the graph via this filter only.
Obviously the filter would need to be prepared to handle the resulting BeginFlush, EndFlush and NewSegment calls coming from upstream appropriately and distinguish samples that arrived before and after the seek operation. It would also need to set new sample times on its output samples so that the output samples had consistent sample presentation times. Any other issues?
It is perfectly feasible to do what you require. I used this approach to build video and audio mixer filters for a video editor. A full description of the code is available from the BBC White Papers 129 and 138 available from
A rather ancient version of the code can be found on if you search for AAFEditPack. The code is written in Delphi using DSPack to get access to the DirectShow headers. I did this because it makes it easier to handle com object lifetimes - by implementing smart pointers by default. It should be fairly straightforward to transfer the ideas to a C++ implementation if that is what you use.
The filters keep lists of the sub-graphs (a section of a graph, but running in the same FilterGraph as the mixers). The filters implement a custom version of TBCPosPassThru which knows about the output pins of the sub-graph for each media clip. It handles passing on the seek commands to get each clip ready for replay when its point in the timeline is reached. The mixers handle the BeginFlush, EndFlush, NewSegment and EndOfStream calls for each sub-graph so they are kept happy. The editor uses only one FilterGraph that houses both video and audio graphs. Seeking commands are made by the graph on both the video and audio renderers, and these commands are passed upstream to the mixers, which implement them.
Sub-graphs that are not currently active are blocked by the mixer holding references to the samples they have delivered. This does not cause any problems for the FilterGraph because, as Roman R says, downstream filters only care about getting a consecutive stream of samples and do not know about what happens upstream.
Some key points you need to make sure of to avoid wasted debugging time are:
Your decoder filters need to be able to cue to the exact media frame or audio time. This is not as easy as you might expect, especially with compressed formats such as MPEG-2, which was designed for transmission and has no frame index in the files. If you do not do this, the filter may wait indefinitely for a NewSegment call with the correct media times.
Your sub graphs need to present a NewSegment time equal to the value you asked for in your seek command before delivering samples. Some decoders may seek to the nearest key frame, which is a bit unhelpful and some are a bit arbitrary about the timings of their NewSegment and the following samples.
The start and stop times of each clip need to be within the duration of the file. It's probably not a good idea to police this in the DirectShow filter, because you would probably want to construct a timeline without needing to run the filter first. I did this in the component that manages the FilterGraph.
If you want to add sections from the same source file consecutively in the timeline, and have effects that span the transition, you need to have two instances of the sub-graph for that file, and if you have more than one transition for the same source file, your list needs to alternate the graphs for successive clips. This is because each sub-graph should only play monotonically: making lots of SetPositions calls would waste CPU cycles and would not work well with compressed files.
The filter's output pins define the entire seeking behaviour of the graph. The output sample time stamps (IMediaSample.SetTime) are set by the filter, so you need to get them correct without any missing time stamps. You can also set the MediaTime (IMediaSample.SetMediaTime) values if you like, although you have to be careful to get them correct or the graph may drop samples or stall.
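Two of the points above, range-checking clips outside the filter and alternating between two sub-graph instances of the same source, can be sketched as plain timeline bookkeeping. This is only an illustration under simplifying assumptions: `Clip`, `ClipIsValid` and `AssignAlternatingInstances` are hypothetical names, not taken from the AAFEditPack code, and the sketch alternates instances for every successive clip of a file, whether or not a transition actually spans them.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical timeline entry; times are in 100 ns units within the source.
struct Clip {
    std::string sourceFile;
    std::int64_t start;
    std::int64_t stop;
    int instance;  // which of the two sub-graph instances (0 or 1) plays it
};

// Checks that the clip lies within the source duration, so a bad clip can be
// rejected when the timeline is built rather than inside the filter.
bool ClipIsValid(const Clip& c, std::int64_t sourceDuration) {
    return c.start >= 0 && c.start < c.stop && c.stop <= sourceDuration;
}

// Alternates between instance 0 and 1 for successive clips of the same file,
// so neither sub-graph has to seek back while it is still playing a clip.
void AssignAlternatingInstances(std::vector<Clip>& timeline) {
    std::map<std::string, int> nextInstance;  // defaults to 0 per file
    for (Clip& c : timeline) {
        int& n = nextInstance[c.sourceFile];
        c.instance = n;
        n = 1 - n;
    }
}
```

Files that appear only once in the timeline always get instance 0, so the second sub-graph only needs to be built for sources that are reused.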
Good luck with your development. If you need any more information please contact me through StackOverflow or

my YUY2 output does not work with Video Renderer filter

I have a basic avstream driver (based on the avshws sample).
When testing the YUY2 output I get different results based on which renderer I use:
Video Renderer: blank image
VMR-7: scrambled image (due to the renderer using a buffer with a larger stride)
VMR-9: perfect render
I don't know why the basic Video Renderer (used by AMCap) won't work. I have examined the graph of a webcam outputting the same format, and I cannot see any difference apart from the renderer output.
I'm assuming you're writing your own filter based on avshws. I'm not familiar with this particular sample but generally you need to ensure two things:
Make sure your filter checks any media types proposed are acceptable. In the DirectShow baseclasses the video renderer calls the output pin IPin::QueryAccept which calls whichever base class member you're using e.g. CBasePin.CheckMediaType or CTransformFilter.CheckTransform
Make sure you call IMediaSample::GetMediaType on every output sample and respond appropriately e.g. calling CTransformFilter.SetMediaType and changing the format/stride of your output. It's too late to negotiate at this point - you've accepted the change already and if you really can't continue you have to abort streaming, return an error HRESULT and notify EC_ERRORABORT or EC_ERRORABORTEX. Unless it's buggy the downstream filter should have called your output pin's QueryAccept and received S_OK before it sends a sample with a media type change attached (I've seen occasional filters that add duplicate media types to the first sample without asking).
See Handling Format Changes from the Video Renderer
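To illustrate what "changing the stride of your output" means in practice, here is a minimal sketch of a stride-aware copy for a packed YUY2 frame (2 bytes per pixel). `CopyYuy2Rows` is a hypothetical helper; how the destination stride is obtained from the media type the renderer proposes is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies a YUY2 image row by row so that source and
// destination buffers may use different strides (bytes from one row start
// to the next). Only the visible pixels of each row are copied.
void CopyYuy2Rows(const std::uint8_t* src, std::ptrdiff_t srcStride,
                  std::uint8_t* dst, std::ptrdiff_t dstStride,
                  int widthPixels, int heightRows) {
    const std::size_t rowBytes = static_cast<std::size_t>(widthPixels) * 2;
    for (int y = 0; y < heightRows; ++y) {
        std::memcpy(dst + y * dstStride, src + y * srcStride, rowBytes);
    }
}
```

Copying the whole frame in one block, as if both buffers had the same stride, is what produces the scrambled image described for VMR-7 in the question.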
I have figured out the problem. I was missing one line to update the remaining bytes in the stream pointer structure:
Leading->OffsetOut.Remaining = 0;
This caused certain filters to drop my samples (AVI/MJPEG Decompressor, Dump) which meant that certain graph configurations would simply not render anything.

How to render a byte array from socket/application using DirectShow?

I have an application. I will have a situation, wherein I will receive a big array of encoded bytes. I have to decode them and render it. For decoding, I am using a custom decoder class. After the decode, how can I construct a DirectShow graph which will receive input data from the decoder? Please give some direction/samples on this.
Have a look at the PushSource sample in the DirectShow SDK. This sample shows you how to create a source filter that can be rendered. It is all about setting the output media type of your filter correctly so that the rest of the graph can be rendered. The sample also shows you how to feed media samples to the rest of the media pipeline. In your case what do you decode to? The PushSource sample outputs RGB24 IIRC.
Also, it sounds like you're decoding in the same filter as you're receiving the bytes in? Typically in DirectShow you would write a source filter that receives bytes from the network and outputs samples in the encoded format. You would then connect this filter to a custom decoder filter, which outputs either RGB24 or some other raw media format that is understood by DirectShow. Similarly for audio, you could output, say, PCM.
I have used the same approach (CSource, CSourceStream). That is correct, the DoBufferProcessingLoop calls FillBuffer. My general approach has been to use the producer-consumer pattern. The network-reading thread populates the queue with samples, and in my overridden DoBufferProcessingLoop I check whether the queue has any data, calling FillBuffer if there is data. You can of course try other methods such as waiting on events (frame availability). To see the approach I used you can download the source code of an example RTSP source filter at and see if that suits you. Best thing I would say is to just try stuff and learn as you go along.
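The producer-consumer hand-off described above can be sketched without any DirectShow types. `FrameQueue` is a hypothetical class, not taken from the RTSP source filter mentioned in the answer: the network thread would call `Push`, and the streaming thread would call `Pop` from its processing loop before filling a media sample.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical thread-safe queue of frames (encoded or decoded).
class FrameQueue {
public:
    using Frame = std::vector<std::uint8_t>;

    // Called by the producer (e.g. the network-reading thread).
    void Push(Frame frame) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            frames_.push_back(std::move(frame));
        }
        available_.notify_one();
    }

    // Called by the consumer. Waits up to `timeout` for a frame and returns
    // false if none arrived, so the caller can check for stop requests.
    bool Pop(Frame& out, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!available_.wait_for(lock, timeout,
                                 [this] { return !frames_.empty(); })) {
            return false;
        }
        out = std::move(frames_.front());
        frames_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable available_;
    std::deque<Frame> frames_;
};
```

The timeout matters in a source filter: a consumer that blocks forever on an empty queue cannot react when the graph is stopped.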

change recording file programmatically in directshow

I made a console application, using DirectShow, that records from a live source (currently a webcam, later a TV capture card), adds the current date and time as an overlay, and then saves audio and video as .asf.
Now I want the output file to change every 60 minutes without stopping the graph. I must not lose a single second of the live stream.
The graph is something like this one:
I took a look at the GMFBridge but I have some compiling problem with their examples.
I am wondering if there is a way to split what exits from the overlay filter and the audio source, connect them to another ASF writer (paused), and then switch between the writers every 60 minutes.
The paused asf filter's file name must change (pp.asf, pp2.asf, pp4.asf ...). Something like this:
with pp1 paused. I found some people on the internet who say that the ASF writer deletes the current file if the graph does not go into the stopped state.
Well, I have a product that does exactly what you described (it's used for broadcast compliance recording purposes), and I found that the only way to do that is this:
create a dshow graph that will be used only to capture the audio and video
then, at the end of the graph, insert samplegrabber filters, both for audio and video
then, use IWMWriter to create and save a WMV file, using samples fetched from the samplegrabber filters
when the time comes, close one IWMWriter and create another one.
That way, you won't lose single frame when switching the output files.
Of course, there is also the question of queuing and storing the samples (while switching the writers) and properly re-aligning the audio/video timestamps, but from my research, that's the only 'normal' way to do it, and I have used it in practice.
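The switching and timestamp re-alignment can be sketched as pure arithmetic on sample times. `SegmentRotator` and `Routed` are hypothetical names, and the sketch assumes non-negative sample times in 100 ns units; a real implementation would probably also delay the switch until the next video key frame and queue samples while the new writer starts up.

```cpp
#include <cassert>
#include <cstdint>

// Result of routing one sample: which output file it belongs to and its
// time stamp relative to the start of that file.
struct Routed {
    int fileIndex;
    std::int64_t timeInFile;
};

// Hypothetical helper: maps a continuous capture time onto fixed-length
// output files. Assumes sampleTime >= 0.
class SegmentRotator {
public:
    explicit SegmentRotator(std::int64_t segmentLength)
        : segmentLength_(segmentLength) {}

    Routed Route(std::int64_t sampleTime) const {
        Routed r;
        r.fileIndex = static_cast<int>(sampleTime / segmentLength_);
        r.timeInFile = sampleTime % segmentLength_;
        return r;
    }

private:
    std::int64_t segmentLength_;
};
```

For 60-minute files the segment length would be 36,000,000,000 units (3600 seconds times 10,000,000 units per second). Routing audio and video through the same rotator keeps both streams switching at the same capture time.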
The solution is to write a custom DShow filter with, in your case, two input pins: one for the audio stream and the other for the video stream. Inside that filter (it doesn't have to be inside from an architectural point of view, because you can also use callbacks, for example, and do the job somewhere else) you create the ASF files. While switching files, A/V data would be stored in a cache (e.g. a big enough circular buffer). You can also monitor and correct A/V sync in that filter. For writing ASF files I would recommend the Windows Media Format SDK. You can also add output pins if you want to pass A/V data further downstream for preview, parallel streaming, etc.
GMFBridge is a viable but complicated solution. A more direct approach I have implemented in the past is querying your ASF Writer for the IWMWriterAdvanced2 interface and setting a custom sink. Within that interface you have methods to remove and add sinks to your ASF writer. The sink that is connected automatically will write to the file that you specified. One way to write wherever you want to is
1.) remove all default sinks:
2.) register a custom sink:
The custom sink can be a class that implements IWMWriterSink, which requires implementing callback methods that are called, e.g., when the ASF header is written (OnHeader(/* [in] */ INSSBuffer *pHeader);) and when a data packet is written (OnDataUnit(/* [in] */ INSSBuffer *pDataUnit);) - in your implementation you can then write them wherever you want, for example by offering additional methods on this class where you can specify the file name you want to write to.
Note that this solution does not quite get you where you want to be if you need to write out the header information in each of the 60-minute files - after the initial header you will only get ASF packet data. A workaround for that could be to re-write the initial header before any packet data of each file; however, this will produce an unindexed (non-seekable) ASF file.
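The header re-writing workaround can be sketched as follows. This is a portable mock, not a COM implementation: `HeaderReplayingSink` and `StartNewFile` are hypothetical names, plain byte vectors stand in for INSSBuffer, in-memory buffers stand in for files, and the remaining IWMWriterSink methods are omitted. It assumes `OnHeader` is called before the first data packet.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical mock of a custom sink: caches the header it receives once
// and replays it at the start of every new output "file".
class HeaderReplayingSink {
public:
    HeaderReplayingSink() : files_(1) {}

    // Mirrors the role of IWMWriterSink::OnHeader.
    void OnHeader(const Bytes& header) {
        header_ = header;
        Append(header);
    }

    // Mirrors the role of IWMWriterSink::OnDataUnit.
    void OnDataUnit(const Bytes& packet) { Append(packet); }

    // Switches output: the new file starts with a copy of the cached header.
    void StartNewFile() { files_.push_back(header_); }

    const std::vector<Bytes>& Files() const { return files_; }

private:
    void Append(const Bytes& data) {
        Bytes& current = files_.back();
        current.insert(current.end(), data.begin(), data.end());
    }

    Bytes header_;
    std::vector<Bytes> files_;
};
```

As the answer notes, files produced this way carry a copy of the initial header rather than one written for that file, so they would not be indexed.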
