Controlling trimming behavior of DirectShow's AVIMux?

When the DirectShow AVIMux is provided with two streams of data (e.g. audio and video) and one stream starts a bit before the other, is there any way to control how the AVIMux behaves? Namely, if the AVIMux gets a few video frames before the audio starts, it will actually omit those video frames from the output AVI. This contrasts with what it does when audio is missing at the end, where it includes the video frames anyway.
My sources for the audio and video are live streams (commercial capture filters I can't really improve/control), and I'd like to keep the video frames even though the audio starts a bit later.
Is there a nice way to do this? I can imagine wrapping the two filters into a custom filter with its own graph and inserting silence as necessary, but it would be awesome to not have to go to all of that trouble.

The question rests on a seemingly incorrect assumption about the multiplexer dropping frames. The multiplexer looks at the time stamps of the video and audio data. If "a few frames before..." means that the time stamps are negative and the data is preroll data, then it is dropped and excluded from the resulting file. Otherwise it is included regardless of the actual order in which the data arrives at the multiplexer's inputs, and the corresponding audio silence will be present at the beginning of the audio track.
That is, make sure the data is correctly time stamped and the multiplexer will get it written.
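If you want to see exactly what the mux sees, a minimal diagnostic sketch (assuming the DirectShow base classes from streams.h; the helper name is illustrative) that could sit in a pass-through transform filter right in front of the AVIMux input pin might look like this:

    #include <streams.h>

    // Inspect the time stamps on a sample about to reach the multiplexer.
    HRESULT InspectSampleTiming(IMediaSample *pSample)
    {
        REFERENCE_TIME rtStart = 0, rtStop = 0;   // 100 ns units
        HRESULT hr = pSample->GetTime(&rtStart, &rtStop);
        if (FAILED(hr))
            return hr;                            // e.g. VFW_E_SAMPLE_TIME_NOT_SET

        // Preroll or negatively stamped data is what gets excluded from the file.
        if (pSample->IsPreroll() == S_OK || rtStart < 0)
        {
            // Either accept that such samples are discarded, or re-stamp the
            // whole stream so that it starts at or after zero.
        }
        return S_OK;
    }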

tl;dr - For my use case, a frame I process not being present in the final AVI is a showstopper, and the AVI mux/demux process is complicated enough that I'm better off just assuming some small number of frames may be dropped at the beginning. So I'll likely settle on pushing a number of special frames at the beginning (identified with a GUID/counter pair encoded in the pixels) before I start processing frames. I can then look for these special frames after writing the AVI to verify that the frame where processing begins is actually present.
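A minimal sketch of that marker-frame idea, assuming uncompressed RGB32 frames so the pixel bytes survive untouched (the names and the exact byte layout here are illustrative, not from the original post):

    #include <windows.h>
    #include <cstring>

    // Encode a 16-byte GUID plus a 4-byte counter into the first 20 bytes
    // of the frame's pixel data.
    void WriteMarker(BYTE *pPixels, const GUID &id, DWORD counter)
    {
        memcpy(pPixels, &id, sizeof(GUID));                       // bytes 0..15
        memcpy(pPixels + sizeof(GUID), &counter, sizeof(DWORD));  // bytes 16..19
    }

    // When reading the written AVI back, test each frame for the marker prefix.
    bool ReadMarker(const BYTE *pPixels, const GUID &id, DWORD *pCounter)
    {
        if (memcmp(pPixels, &id, sizeof(GUID)) != 0)
            return false;                                         // not a marker frame
        memcpy(pCounter, pPixels + sizeof(GUID), sizeof(DWORD));
        return true;
    }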
Everything I've seen leads me to believe that what I originally asked for is effectively not possible. Based on file size, I think the video frames technically are written to the AVI file, but for most intents and purposes they might as well not be.
That is, AVI players like VirtualDub and VLC, and even the DirectShow AVI Splitter, ignore/drop any video frames present before the audio starts. So I imagine you'd have to parse the AVI file with some other library to extract the pre-audio frames.
The reason I care about this is because I write a parallel data structure with an entry for each frame in the AVI file, and I need to know which data goes with which frame. If frames are dropped from the AVI, I can't match the frames and the data.
I had success with creating custom transform filters placed after the video/audio capture filters. These filters look at the time stamps and drop video frames until an audio start time is established and the video frames fall after that time. The filters downstream then know that they can rely on the video frames they process being written. The drawback is that the audio filter actually delivers samples a bit delayed, so when audio starts at 100ms, I don't find out until I'm already handling the video frame at 250ms, meaning I've dropped 250ms of video data to ensure I know when video frames will have accompanying audio. Combine that with different AVI tools behaving differently when video starts more than one video sample duration after the audio starts, and my confidence in trying to control the AVIMux/Splitter starts to wane.
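A rough sketch of that gating, assuming the DirectShow base classes; the shared state and helper names are illustrative. The audio-side filter records the first audio time stamp it sees, and the video-side filter drops samples until that time is known and reached (returning S_FALSE from a base-class Transform() drops the sample):

    #include <streams.h>

    struct SharedStart
    {
        CCritSec       lock;
        REFERENCE_TIME rtAudioStart = -1;   // -1 = not yet known
    };

    // Called from the audio transform: remember the first audio time stamp.
    HRESULT OnAudioSample(SharedStart &state, IMediaSample *pSample)
    {
        REFERENCE_TIME rtStart = 0, rtStop = 0;
        if (SUCCEEDED(pSample->GetTime(&rtStart, &rtStop)))
        {
            CAutoLock guard(&state.lock);
            if (state.rtAudioStart < 0)
                state.rtAudioStart = rtStart;
        }
        return S_OK;
    }

    // Called from the video transform: S_OK = deliver downstream, S_FALSE = drop.
    HRESULT OnVideoSample(SharedStart &state, IMediaSample *pSample)
    {
        REFERENCE_TIME rtStart = 0, rtStop = 0;
        if (FAILED(pSample->GetTime(&rtStart, &rtStop)))
            return S_FALSE;

        CAutoLock guard(&state.lock);
        if (state.rtAudioStart < 0 || rtStart < state.rtAudioStart)
            return S_FALSE;                 // drop until audio has started
        return S_OK;
    }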
All of that leads me to just accept that the AVIMux and AVI Splitter are complicated enough to not make it worth trying to control them exactly.

Related

Muxing non-synchronised streams to Haali

I have 2 input streams of data that are being passed to a Haali Muxer (mp4 format).
Currently I stream these to Haali directly in a DirectShow graph without a clock. I wondered if I should be trying to write these to the muxer synchronised, or whether it happily accepts a stream of audio data that stops before the video data stream stops. (I have issues with the output file not playing audio after seeking, and I'm not sure why this could occur)
I can't find much in the way of documentation for muxing with the Haali muxer. Does anyone know the best place to look for info on this filter?
To have the streams multiplexed into a single MP4 file you need a single instance of a multiplexer (Haali, GDCL, a commercial one, a wrapper over the mp4v2 library or over a Media Foundation sink, etc.) with two (or more) input pins connected to the respective sources, which in turn will be written as tracks.
The filter graph clock does not matter. The clock is for presentation; file writers accept incoming data and write it as soon as possible anyway. It is more accurate to remove the clock, as you seem to already be doing, but having the standard clock would not make a difference.
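For reference, a minimal sketch of running without a clock, as the question describes (assuming pGraph is an already-built IGraphBuilder*):

    #include <dshow.h>

    // Remove the reference clock so the graph streams as fast as data is produced.
    HRESULT RemoveGraphClock(IGraphBuilder *pGraph)
    {
        IMediaFilter *pMediaFilter = nullptr;
        HRESULT hr = pGraph->QueryInterface(IID_IMediaFilter,
                                            reinterpret_cast<void**>(&pMediaFilter));
        if (FAILED(hr))
            return hr;

        hr = pMediaFilter->SetSyncSource(NULL);   // NULL clock = no presentation pacing
        pMediaFilter->Release();
        return hr;
    }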
Data is synchronized using the time stamps on the individual media samples that make up the media streams. The multiplexer builds an internal queue for every stream and then consumes data from the streams to build a single file, in such a way that the original stream data is interleaved. If one stream supplies too much data, that is, if its data is available too early while another stream supplies data slowly, the multiplexer blocks further data reception on that stream by not returning from the respective processing call (IPin::Receive), expecting that during this wait the slow stream provides additional input. Ultimately, what the multiplexer looks at when matching data from different streams is the data time stamps.
To obtain synchronized data in the resulting MP4 file you thus need to make sure the payload data is properly time stamped. The multiplexer will take care of the rest.
This also means that the time stamps should be monotonically increasing within a stream, and that key frames/splice points are indicated accordingly. Otherwise some multiplexers may fail immediately, while others will produce an output file that has playback issues (especially with seeking).
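A minimal sketch of that stamping, assuming the DirectShow base classes and an illustrative helper that runs just before a sample is delivered to the multiplexer's input pin:

    #include <streams.h>

    // Give each sample monotonically increasing time stamps and flag key
    // frames as sync points before handing it to the mux.
    HRESULT StampAndDeliver(IMediaSample *pSample, CBaseOutputPin *pPin,
                            REFERENCE_TIME rtStart, REFERENCE_TIME rtStop,
                            bool isKeyFrame)
    {
        HRESULT hr = pSample->SetTime(&rtStart, &rtStop);
        if (FAILED(hr))
            return hr;

        pSample->SetSyncPoint(isKeyFrame ? TRUE : FALSE);  // splice point indication

        return pPin->Deliver(pSample);  // may block while the mux waits on a slower stream
    }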

Is it OK for a DirectShow filter to seek the filters upstream from itself?

Normally seek commands are executed on the filter graph, which calls them on the renderers in the graph, and the calls are passed upstream from filter to filter until a filter that can handle the seek performs the actual seek operation.
Could an individual filter seek the upstream filters connected to one or more of its input pins in the same way, without affecting the downstream portion of the graph in unexpected ways? I would expect that no graph state changes would be caused by calling IMediaSeeking.SetPositions upstream.
I'm assuming that all upstream filters are connected to the rest of the graph via this filter only.
Obviously the filter would need to be prepared to handle the resulting BeginFlush, EndFlush and NewSegment calls coming from upstream appropriately, and to distinguish samples that arrived before and after the seek operation. It would also need to set new times on its output samples so that they have consistent presentation times. Any other issues?
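Mechanically, issuing the seek from inside a filter looks roughly like the sketch below (assuming pUpstreamPin is the output pin connected to one of the filter's input pins; whether this is safe for the rest of the graph is exactly what the question is about):

    #include <dshow.h>

    // Ask the upstream chain to reposition to rtPosition (100 ns units).
    HRESULT SeekUpstream(IPin *pUpstreamPin, REFERENCE_TIME rtPosition)
    {
        IMediaSeeking *pSeeking = nullptr;
        HRESULT hr = pUpstreamPin->QueryInterface(IID_IMediaSeeking,
                                                  reinterpret_cast<void**>(&pSeeking));
        if (FAILED(hr))
            return hr;

        // Expect BeginFlush/EndFlush and a NewSegment on the input pin as a result.
        hr = pSeeking->SetPositions(&rtPosition, AM_SEEKING_AbsolutePositioning,
                                    NULL, AM_SEEKING_NoPositioning);
        pSeeking->Release();
        return hr;
    }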
It is perfectly feasible to do what you require. I used this approach to build video and audio mixer filters for a video editor. A full description of the code is available in BBC White Papers 129 and 138, available from http://www.bbc.co.uk/rd
A rather ancient version of the code can be found on www.SourceForge.net if you search for AAFEditPack. The code is written in Delphi using DSPack to get access to the DirectShow headers. I did this because it makes it easier to handle COM object lifetimes - by implementing smart pointers by default. It should be fairly straightforward to transfer the ideas to a C++ implementation if that is what you use.
The filters keep lists of the sub-graphs (a section of a graph, but running in the same FilterGraph as the mixers). The filters implement a custom version of TBCPosPassThru which knows about the output pins of the sub-graph for each media clip. It handles passing on the seek commands to get each clip ready for replay when its point in the timeline is reached. The mixers handle the BeginFlush, EndFlush, NewSegment and EndOfStream calls for each sub-graph so they are kept happy. The editor uses only one FilterGraph that houses both video and audio graphs. Seeking commands are made by the graph on both the video and audio renderers, and these commands are passed upstream to the mixers, which implement them.
Sub-graphs that are not currently active are blocked by the mixer holding references to the samples they have delivered. This does not cause any problems for the FilterGraph because, as Roman R says, downstream filters only care about getting a consecutive stream of samples and do not know about what happens upstream.
Some key points you need to make sure of to avoid wasted debugging time are:
Your decoder filters need to be able to cue to the exact media frame or audio time. This is not as easy to do as you might expect, especially with compressed formats such as MPEG-2, which was designed for transmission and has no frame index in the files. If you do not do this, the filter may wait indefinitely for a NewSegment call with the correct media times.
Your sub-graphs need to present a NewSegment time equal to the value you asked for in your seek command before delivering samples. Some decoders may seek to the nearest key frame, which is a bit unhelpful, and some are a bit arbitrary about the timings of their NewSegment and the following samples.
The start and stop times of each clip need to be within the duration of the file. It's probably not a good idea to police this in the DirectShow filter, because you would probably want to construct a timeline without needing to run the filter first. I did this in the component that manages the FilterGraph.
If you want to add sections from the same source file consecutively in the timeline, and have effects that span the transition, you need two instances of the sub-graph for that file; and if you have more than one transition for the same source file, your list needs to alternate the graphs for successive clips. This is because each sub-graph should only play monotonically: making lots of SetPositions calls would waste CPU cycles and would not work well with compressed files.
The filter's output pins define the entire seeking behaviour of the graph. The output sample time stamps (IMediaSample.SetTime) are implemented by the filter, so you need to get them correct without any missing time stamps, and you can also set the MediaTime (IMediaSample.SetMediaTime) values if you like, although you have to be careful to get them correct or the graph may drop samples or stall (see the sketch below).
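A minimal sketch of that last point, assuming a transform-style filter and illustrative parameter names:

    #include <streams.h>

    // Stamp an output sample with gap-free presentation times; media times
    // (frame/sample numbers) are optional but must also be consecutive.
    HRESULT StampOutputSample(IMediaSample *pOut,
                              REFERENCE_TIME rtStart, REFERENCE_TIME rtStop,
                              LONGLONG llMediaStart, LONGLONG llMediaStop)
    {
        HRESULT hr = pOut->SetTime(&rtStart, &rtStop);      // presentation times
        if (FAILED(hr))
            return hr;

        // Optional: get these wrong and the graph may drop samples or stall.
        return pOut->SetMediaTime(&llMediaStart, &llMediaStop);
    }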
Good luck with your development. If you need any more information please contact me through StackOverflow or DTSMedia.co.uk

High Resolution Capture and Encoding

I'm using two custom push filters to inject audio and video (uncompressed RGB) into a DirectShow graph. I'm making a video capture application, so I'd like to encode the frames as they come in and store them in a file.
Up until now, I've used the ASF Writer to encode the input to a WMV file, but it appears the renderer is too slow to process high resolution input (such as 1920x1200x32). At least, FillBuffer() seems to only be able to process around 6-15 FPS, which obviously isn't fast enough.
I've tried increasing the cBuffers count in DecideBufferSize(), but that only pushes the problem to a later point, of course.
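For what it's worth, the cBuffers change mentioned above happens during the output pin's allocator negotiation and looks roughly like this (a sketch assuming the DirectShow base classes; the standalone helper name and the numbers are illustrative, and extra buffers only add head-room rather than making a slow encoder faster):

    #include <streams.h>

    // Request a deeper queue of frame buffers from the downstream allocator.
    HRESULT RequestDeeperQueue(IMemAllocator *pAlloc,
                               ALLOCATOR_PROPERTIES *pRequest,
                               long cbFrame)                 // bytes per RGB frame
    {
        pRequest->cBuffers = 30;                             // ~1 second at 30 fps
        pRequest->cbBuffer = cbFrame;
        if (pRequest->cbAlign == 0)
            pRequest->cbAlign = 1;

        ALLOCATOR_PROPERTIES actual = {};
        HRESULT hr = pAlloc->SetProperties(pRequest, &actual);
        if (FAILED(hr))
            return hr;

        return (actual.cbBuffer < cbFrame) ? E_FAIL : S_OK;  // allocator refused the size
    }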
What are my options to speed up the process? What's the right way to do live high res encoding via DirectShow? I eventually want to end up with a WMV video, but maybe that has to be a post-processing step.
You have great answers posted here to your question: High resolution capture and encoding too slow. The task is too complex for the CPU in your system, which is simply not fast enough to perform realtime video encoding in the configuration you have set up.

What does `scan` mean in CSS Media Queries?

What exactly do the scan media feature's values progressive and interlace do, in simple terms? And are these the only values available for the scan feature?
They have to do with the output method of the screen of the device.
The specification describes scan as "the scanning process of television output devices."
progressive and interlace are the only two possible values.
Progressive Scan
Progressive (or noninterlaced scanning) is a way of displaying, storing, or transmitting moving images in which all the lines of each frame are drawn in sequence.
Interlaced Scan
Interlaced video is a technique of doubling the perceived frame rate of a video signal without consuming extra bandwidth. Since the interlaced signal contains the two fields of a video frame shot at two different times, it enhances motion perception to the viewer and reduces flicker by taking advantage of the persistence of vision effect.
It is used in style sheets aimed at television output. If interlaced and progressive video interest you further, both techniques are well documented elsewhere.

Detecting format of raw burned data which was burned as CDDA

Given a file containing raw data burned to a CD/DVD, I'm trying to figure out which format was used (i.e. ISO, UDF, etc.).
For example, I can detect the ISO9660 format because in the beginning of one of the 2048-byte sectors I can find the string "CD001".
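A minimal sketch of that ISO9660 check (the primary volume descriptor sits at sector 16, i.e. byte offset 16 * 2048 into the image, and bytes 1..5 of that sector hold "CD001"):

    #include <cstdio>
    #include <cstring>

    // Returns true if the image file carries an ISO9660 primary volume descriptor.
    bool LooksLikeIso9660(const char *path)
    {
        std::FILE *f = std::fopen(path, "rb");
        if (!f)
            return false;

        char sector[2048] = {};
        bool ok = std::fseek(f, 16L * 2048L, SEEK_SET) == 0 &&
                  std::fread(sector, 1, sizeof(sector), f) == sizeof(sector) &&
                  std::memcmp(sector + 1, "CD001", 5) == 0;
        std::fclose(f);
        return ok;
    }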
However, when it comes to CDDA (Red Book) data, it seems to be much harder. I tried looking for the synchronization bytes, but haven't had much luck so far. Another complication in this case is that each track is written to a separate file. With ISO, for example, everything is on a single track; with audio I get a separate file for each track.
I'm aware that this format is abused by many companies, so I'm just looking for reasonably good detection.
The specific case is to detect audio CD (in CDDA format) created by Windows Media Player.
Any suggestions?
