IMediaSample (DirectShow) to IDirect3DSurface9/IMFSample (Media Foundation)

I am working on a custom video player that mixes DirectShow and Media Foundation. Basically, I'm using DS to grab VOB frames (unsupported by MF). I am able to get a sample from DirectShow but am stuck on passing it to the renderer. In MF, I get an IDirect3DSurface9 (from the IMFSample) and present it on the back buffer using the IDirect3DDevice9.
Using DirectShow, I'm getting an IMediaSample as my data buffer object. I don't know how to convert it and pass it on as an IMFSample. I found others getting bitmap info from the sample and using GDI+ to render, but my video data may not always be RGB. I wish to get an IDirect3DSurface9 or maybe an IMFSample from the IMediaSample and pass it on for rendering, so that I do not have to bother with color space conversion.
I'm new to this. Please correct me if I'm going wrong.

The IMediaSample you get from the upstream decoder in DirectShow is nothing but a wrapper over a memory-backed buffer. There is not, and cannot be, any D3D surface behind it (unless you take care of it on your own and provide a custom allocator, in which case you would not have the question in the first place). Hence, you have to memory-copy the data from this buffer into the MF sample buffer.
That brings you to the next question: you want the buffer formats (media types) to match, so that you can copy without conversion. One way, and there may be a few others, is to establish the MF pipeline first and find out exactly which pixel format the buffers on the video hardware offer. Then make sure the DirectShow pipeline delivers this pixel format and media type, by initializing the grabber accordingly, by inserting color space conversion filters, or via a color space conversion DMO/MFT.
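Once the formats match, the copy itself is mostly a matter of honoring two different row strides. The sketch below is plain C++ with no Windows headers; CopyPlane and its parameters are hypothetical names. In a real pipeline the source pointer would presumably come from IMediaSample::GetPointer and the destination pointer and pitch from locking the target (for example IDirect3DSurface9::LockRect or IMF2DBuffer::Lock2D). Media Foundation's MFCopyImage does the same job for a single plane.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy `rows` scanlines of `rowBytes` useful bytes each from a source buffer
// with stride `srcStride` into a destination with pitch `dstPitch`.
// Assumption: both strides are positive (top-down images).
void CopyPlane(std::uint8_t* dst, std::size_t dstPitch,
               const std::uint8_t* src, std::size_t srcStride,
               std::size_t rowBytes, std::size_t rows)
{
    assert(rowBytes <= dstPitch && rowBytes <= srcStride);

    // Fast path: both buffers are tightly packed, so one copy is enough.
    if (dstPitch == rowBytes && srcStride == rowBytes) {
        std::memcpy(dst, src, rowBytes * rows);
        return;
    }

    // General path: copy row by row and skip the padding on either side.
    for (std::size_t y = 0; y < rows; ++y) {
        std::memcpy(dst + y * dstPitch, src + y * srcStride, rowBytes);
    }
}
```

For a packed format such as YUY2, rowBytes would be width * 2. Planar formats such as NV12 need one call per plane.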


Why would VkImageView format differ from the underlying VkImage format?

VkImageCreateInfo has the following member:
VkFormat format;
And VkImageViewCreateInfo has the same member.
What I don't understand is why you would ever have a different format in the VkImageView than in the VkImage it was created from.
I understand some formats are compatible with one another, but I don't know why you would use one of the alternate formats.
The canonical use case and primary original motivation (in D3D10, where this idea originated) is using a single image as either R8G8B8A8_UNORM or R8G8B8A8_SRGB -- either because it holds different content at different times, or because sometimes you want to operate in sRGB-space without linearization.
More generally, it's useful sometimes to have different "types" of content in an image object at different times -- this gives engines a limited form of memory aliasing, and was introduced to graphics APIs several years before full-featured memory aliasing was a thing.
Like a lot of Vulkan, the API is designed to expose what the hardware can do. Memory layout (image) and the interpretation of that memory as data (image view) are different concepts in the hardware, and so the API exposes that. The API exposes it simply because that's how the hardware works and Vulkan is designed to be a thin abstraction; just because the API can do it doesn't mean you need to use it ;)
As you say, in most cases it's not really that useful ...
I think there are some cases where it could be more efficient. For example, getting a compute shader to generate integer data for some types of image processing can be more energy efficient than either float computation or manually normalizing integer data to create unorm data. Using aliasing, the compute shader can directly write e.g. uint8 integers and a fragment shader can read the same data as unorm8 data.
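To make the "same memory, different interpretation" point concrete, here is a small CPU-side model in plain C++. It is not Vulkan API code; the three helper names are hypothetical and simply mirror the UINT, UNORM and SRGB suffixes of VkFormat. In Vulkan itself, the image has to be created with VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT before a view with a different (compatible) format is allowed.

```cpp
#include <cmath>
#include <cstdint>

// One stored 8-bit channel value, read three different ways.

// *_UINT view: the shader sees the raw integer.
inline std::uint32_t ReadAsUint(std::uint8_t texel)
{
    return texel;
}

// *_UNORM view: the integer is normalized to [0, 1].
inline float ReadAsUnorm(std::uint8_t texel)
{
    return texel / 255.0f;
}

// *_SRGB view: normalized, then converted from sRGB encoding to linear.
inline float ReadAsSrgb(std::uint8_t texel)
{
    const float c = texel / 255.0f;
    return c <= 0.04045f ? c / 12.92f
                         : std::pow((c + 0.055f) / 1.055f, 2.4f);
}
```

The byte 128 reads as 128 through the UINT view, roughly 0.502 through the UNORM view and roughly 0.216 through the SRGB view, although the memory never changes.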

Is it OK for a DirectShow filter to seek the filters upstream from itself?

Normally seek commands are executed on a filter graph, get called on the renderers in the graph and calls are passed upstream by filters until a filter that can handle the seek does the actual seek operation.
Could an individual filter seek the upstream filters connected to one or more of its input pins in the same way without it affecting the downstream portion of the graph in unexpected ways? I wouldn't expect any graph state changes to be caused by calling IMediaSeeking.SetPositions upstream.
I'm assuming that all upstream filters are connected to the rest of the graph via this filter only.
Obviously the filter would need to be prepared to handle the resulting BeginFlush, EndFlush and NewSegment calls coming from upstream appropriately and distinguish samples that arrived before and after the seek operation. It would also need to set new sample times on its output samples so that the output samples had consistent sample presentation times. Any other issues?
It is perfectly feasible to do what you require. I used this approach to build video and audio mixer filters for a video editor. A full description of the code is available from the BBC White Papers 129 and 138 available from
A rather ancient version of the code can be found on if you search for AAFEditPack. The code is written in Delphi using DSPack to get access to the DirectShow headers. I did this because it makes it easier to handle com object lifetimes - by implementing smart pointers by default. It should be fairly straightforward to transfer the ideas to a C++ implementation if that is what you use.
The filters keep lists of the sub-graphs (a section of a graph but running in the same FilterGraph as the mixers). The filters implement a custom version of TBCPosPassThru which knows about the output pins of the sub-graph for each media clip. It handles passing on the seek commands to get each clip ready for replay when its point in the timeline is reached. The mixers handle the BeginFlush, EndFlush, NewSegment and EndOfStream calls for each sub-graph so they are kept happy. The editor uses only one FilterGraph that houses both video and audio graphs. Seeking commands are made by the graph on both the video and audio renderers, and these commands are passed upstream to the mixers, which implement them.
Sub-graphs that are not currently active are blocked by the mixer holding references to the samples they have delivered. This does not cause any problems for the FilterGraph because, as Roman R says, downstream filters only care about getting a consecutive stream of samples and do not know about what happens upstream.
Some key points you need to make sure of to avoid wasted debugging time are:
Your decoder filters need to be able to cue to the exact media frame or audio time. This is not as easy as you might expect, especially with compressed formats such as MPEG-2, which was designed for transmission and has no frame index in the files. If you do not do this, the filter may wait indefinitely to get a NewSegment call with the correct media times.
Your sub graphs need to present a NewSegment time equal to the value you asked for in your seek command before delivering samples. Some decoders may seek to the nearest key frame, which is a bit unhelpful and some are a bit arbitrary about the timings of their NewSegment and the following samples.
The start and stop times of each clip need to be within the duration of the file. It's probably not a good idea to police this in the DirectShow filter because you would probably want to construct a timeline without needing to run the filter first. I did this in the component that manages the FilterGraph.
If you want to add sections from the same source file consecutively in the timeline, and have effects that span the transition, you need to have two instances of the sub-graph for that file, and if you have more than one transition for the same source file, your list needs to alternate the graphs for successive clips. This is because each sub-graph should only play monotonically: issuing lots of SetPositions calls would waste CPU cycles and would not work well with compressed files.
The filter's output pins define the entire seeking behaviour of the graph. The output sample time stamps (IMediaSample.SetTime) are implemented by the filter, so you need to get them correct without any missing time stamps. You can also set the MediaTime (IMediaSample.SetMediaTime) values if you like, although you have to be careful to get them correct or the graph may drop samples or stall.
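The time stamp bookkeeping in the last point can be sketched roughly as follows. This is not the AAFEditPack code: ClipPlacement, SampleTimes and RebaseSample are hypothetical names, and the sketch assumes that each sub-graph delivers sample times relative to its own NewSegment start (so the first frame after a seek starts near 0) and that the mixer's output times are relative to the segment start it announced downstream.

```cpp
#include <algorithm>
#include <cstdint>

using RefTime = std::int64_t;             // 100 ns units, like REFERENCE_TIME
constexpr RefTime kOneSecond = 10000000;  // 10^7 units of 100 ns

// Hypothetical description of where a clip sits on the editor timeline.
struct ClipPlacement {
    RefTime timelineStart;  // timeline position of the clip's first frame
    RefTime duration;       // length of the clip on the timeline
};

struct SampleTimes {
    RefTime start;
    RefTime stop;
};

// Map a sample delivered by a clip's sub-graph onto the output stream.
// Returns false when the sample lies entirely outside the clip, in which
// case the mixer would discard it instead of delivering it downstream.
bool RebaseSample(const ClipPlacement& clip, RefTime outputSegmentStart,
                  const SampleTimes& in, SampleTimes& out)
{
    if (in.stop <= 0 || in.start >= clip.duration) {
        return false;
    }
    // Trim samples that straddle the clip boundaries.
    const RefTime start = std::max<RefTime>(in.start, 0);
    const RefTime stop = std::min(in.stop, clip.duration);

    // Shift to timeline time, then make it relative to the output segment.
    out.start = clip.timelineStart + start - outputSegmentStart;
    out.stop = clip.timelineStart + stop - outputSegmentStart;
    return true;
}
```

The resulting values are what the mixer would pass to SetTime on its output sample, so that consecutive clips produce one gap-free sequence of presentation times.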
Good luck with your development. If you need any more information please contact me through StackOverflow or

My YUY2 output does not work with the Video Renderer filter

I have a basic avstream driver (based on the avshws sample).
When testing the YUY2 output I get different results based on which renderer I use:
Video Renderer: blank image
VMR-7: scrambled image (due to the renderer using a buffer with a larger stride)
VMR-9: perfect render
I don't know why the basic video renderer (used by amcap) won't work. I have examined the graph of a webcam outputting the same format and I cannot see any difference apart from the renderer output.
I'm assuming you're writing your own filter based on avshws. I'm not familiar with this particular sample but generally you need to ensure two things:
Make sure your filter checks that any proposed media type is acceptable. In the DirectShow base classes the video renderer calls the output pin's IPin::QueryAccept, which calls whichever base class member you're using, e.g. CBasePin::CheckMediaType or CTransformFilter::CheckTransform.
Make sure you call IMediaSample::GetMediaType on every output sample and respond appropriately, e.g. by calling CTransformFilter::SetMediaType and changing the format/stride of your output. It's too late to negotiate at this point: you've accepted the change already, and if you really can't continue you have to abort streaming, return an error HRESULT and notify EC_ERRORABORT or EC_ERRORABORTEX. Unless it's buggy, the downstream filter should have called your output pin's QueryAccept and received S_OK before it sends a sample with a media type change attached (I've seen occasional filters that add duplicate media types to the first sample without asking).
See Handling Format Changes from the Video Renderer
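The "scrambled image" with VMR-7 in the question is the typical symptom of writing tightly packed rows into a buffer with a larger stride. As a rough sketch of what "changing the format/stride of your output" means in practice: as far as I recall from the MSDN topic above, the biWidth in a renderer-supplied media type can describe the surface stride in pixels rather than the picture width. The helper names below are hypothetical, and the code is plain C++ with no DirectShow headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte stride of one row for an uncompressed format, rounded up to a
// DWORD boundary (the usual BITMAPINFOHEADER convention).
std::size_t StrideBytes(std::int32_t biWidth, std::uint16_t biBitCount)
{
    assert(biWidth > 0);
    const std::size_t bits = static_cast<std::size_t>(biWidth) * biBitCount;
    return ((bits + 31) & ~static_cast<std::size_t>(31)) >> 3;
}

// Fill a YUY2 frame with one color, touching only the visible pixels of
// each row and leaving the stride padding alone.
// YUY2 packs two pixels into four bytes: Y0 U Y1 V.
void FillYuy2(std::uint8_t* dst, std::size_t strideBytes,
              std::int32_t width, std::int32_t height,
              std::uint8_t y, std::uint8_t u, std::uint8_t v)
{
    assert(width % 2 == 0);
    assert(static_cast<std::size_t>(width) * 2 <= strideBytes);
    for (std::int32_t row = 0; row < height; ++row) {
        std::uint8_t* p = dst + static_cast<std::size_t>(row) * strideBytes;
        for (std::int32_t x = 0; x < width; x += 2) {
            p[0] = y;
            p[1] = u;
            p[2] = y;
            p[3] = v;
            p += 4;
        }
    }
}
```

With a 640-pixel-wide picture and a media type whose biWidth is 1024, each row would start 2048 bytes after the previous one instead of 1280.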
I have figured out the problem. I was missing one line to update the remaining bytes in the stream pointer structure:
Leading->OffsetOut.Remaining = 0;
This caused certain filters to drop my samples (AVI/MJPEG Decompressor, Dump) which meant that certain graph configurations would simply not render anything.

Different approaches on getting captured video frames in DirectShow

I was using a callback mechanism to grab the webcam frames in my media application. It worked, but was slow due to certain additional buffer functions that were performed within the callback itself.
Now I am trying the other way to get frames. That is, call a method and grab the frame (instead of callback). I used a sample in CodeProject which makes use of IVMRWindowlessControl9::GetCurrentImage.
I encountered the following issues.
With a Microsoft webcam, the preview didn't render (only a black screen) on Windows 7, but the same camera rendered the preview on XP.
My question here is: do the VMR-specific functions depend on the camera drivers on different platforms? Otherwise, how could this difference happen?
Wherever the sample application worked, I observed that the biBitCount member of the resulting BITMAPINFOHEADER structure is 32.
Is this a value set by application or a driver setting for VMR operations? How is this configured?
Finally, which is the best method to grab the webcam frames? A callback approach? Or a Direct approach?
Thanks in advance,
IVMRWindowlessControl9::GetCurrentImage is intended for occasional snapshots, not for regular image grabbing.
Quote from MSDN:
This method can be called at any time, no matter what state the filter
is in, whether running, stopped or paused. However, frequent calls to
this method will degrade video playback performance.
This method reads back from video memory, which is slow in the first place. It also converts to the RGB color space (slow again), because this format is the most suitable for non-streaming apps and causes fewer compatibility issues.
All in all, you can use it for periodic image grabbing, but this is not what you are supposed to do. To capture at streaming rate you need to use a filter in the pipeline, or the Sample Grabber with a callback.
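If the callback approach was slow because of the extra work done inside the callback, one common pattern is to make the callback only copy the frame and return, and to let the application pick up the most recent frame whenever it wants one. The sketch below is plain C++; LatestFrame is a hypothetical helper, and in a Sample Grabber setup Store would presumably be called from ISampleGrabberCB::BufferCB with the buffer pointer and length it receives.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Holds only the most recent frame. Older frames are overwritten, so a
// slow consumer drops frames instead of stalling the streaming thread.
class LatestFrame {
public:
    // Streaming thread: copy the frame and return as quickly as possible.
    void Store(const std::uint8_t* data, std::size_t size,
               std::int64_t timestamp)
    {
        assert(data != nullptr || size == 0);
        std::lock_guard<std::mutex> lock(mutex_);
        buffer_.assign(data, data + size);
        timestamp_ = timestamp;
        fresh_ = true;
    }

    // Application thread: fetch the newest frame, if one arrived since the
    // last call. Returns false when there is nothing new.
    bool Take(std::vector<std::uint8_t>& out, std::int64_t& timestamp)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!fresh_) {
            return false;
        }
        out.swap(buffer_);  // hands over the frame without another copy
        timestamp = timestamp_;
        fresh_ = false;
        return true;
    }

private:
    std::mutex mutex_;
    std::vector<std::uint8_t> buffer_;
    std::int64_t timestamp_ = 0;
    bool fresh_ = false;
};
```

This gives the "call a method and grab the frame" usage from the question while the frames still arrive through the streaming pipeline, so no read-back from video memory is involved. Any heavy processing then happens on the caller's thread after Take returns.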

Handling Dynamic Format Changes in DirectShow

I just have simple graph:
SourceFilter ---> CustomTransformFilter --> VideoRendererFilter
In my CustomTransformFilter I change video properties dynamically, i.e. I rescale the video to new dimensions.
Input Video[1024,720]-->|CustomTransformFilter|--->Output Video[640,480]
But my renderer still sees the video in its original size ([1024,720], not the rescaled [640,480]).
As a result I get corrupted images at the video renderer, since the renderer tries to draw the new image based on the old dimensions.
How can I fix it?
Best Wishes
As I understand from Davies' answer:
Given: The graph is active, but the filters in question do not support dynamic
pin reconnections
Possible mechanisms for changing the format: (MSDN DirectShow Doc)
a. QueryAccept (Downstream)
b. QueryAccept (Upstream)
c. ReceiveConnection
Davies suggests ReceiveConnection.
ReceiveConnection: used when an output pin proposes a format change to its downstream peer, and the new format requires a larger buffer (MSDN DirectShow Doc).
The gmfbridge example is "too complex" for me to figure out how to use "ReceiveConnection".
I am a novice at DirectShow.
Does anyone have a simple code example that uses the ReceiveConnection mechanism to respond to a dynamic format change?
The normal way to do a dynamic type change in DirectShow is to attach a Media Type to the sample that you deliver. This won't work with the video renderer, since it is allocating the samples. You need to request a change in type before you get the sample from the allocator.
You do this using ReceiveConnection. You must make sure that there are no buffers outstanding on that allocator, and then you can call IPin::ReceiveConnection (without disconnecting first). There is an example of this in the gmfbridge code at, in BridgeSourceOutput::SwitchTo().
