Is it OK for a DirectShow filter to seek the filters upstream from itself? - directshow

Normally seek commands are executed on a filter graph, get called on the renderers in the graph and calls are passed upstream by filters until a filter that can handle the seek does the actual seek operation.
Could an individual filter seek the upstream filters connected to one or more of its input pins in the same way, without affecting the downstream portion of the graph in unexpected ways? I wouldn't expect any graph state changes to be caused by calling IMediaSeeking.SetPositions upstream.
I'm assuming that all upstream filters are connected to the rest of the graph via this filter only.
Obviously the filter would need to be prepared to handle the resulting BeginFlush, EndFlush and NewSegment calls coming from upstream appropriately and distinguish samples that arrived before and after the seek operation. It would also need to set new sample times on its output samples so that the output samples had consistent sample presentation times. Any other issues?

It is perfectly feasible to do what you require. I used this approach to build video and audio mixer filters for a video editor. A full description of the code is available from the BBC White Papers 129 and 138 available from http://www.bbc.co.uk/rd
A rather ancient version of the code can be found on www.SourceForge.net if you search for AAFEditPack. The code is written in Delphi using DSPack to get access to the DirectShow headers. I did this because it makes it easier to handle com object lifetimes - by implementing smart pointers by default. It should be fairly straightforward to transfer the ideas to a C++ implementation if that is what you use.
The filters keep lists of the sub-graphs (a section of a graph but running in the same FilterGraph as the mixers). The filters implement a custom version of TBCPosPassThru which knows about the output pins of the sub-graph for each media clip. It handles passing on the seek commands to get each clip ready for replay when its point in the timeline is reached. The mixers handle the BeginFlush, EndFlush, NewSegment and EndOfStream calls for each sub-graph so they are kept happy. The editor uses only one FilterGraph that houses both video and audio graphs. Seeking commands are made by the graph on both the video and audio renderers and these commands are passed upstream to the mixers, which implement them.
Sub-graphs that are not currently active are blocked by the mixer holding references to the samples they have delivered. This does not cause any problems for the FilterGraph because, as Roman R says, downstream filters only care about getting a consecutive stream of samples and do not know about what happens upstream.
Some key points you need to make sure of to avoid wasted debugging time are:
Your decoder filters need to be able to cue to the exact media frame or audio time. This is not as easy to do as you might expect, especially with compressed formats such as MPEG-2, which was designed for transmission and has no frame index in the files. If you do not do this, the filter may wait indefinitely to get a NewSegment call with the correct media times.
Your sub graphs need to present a NewSegment time equal to the value you asked for in your seek command before delivering samples. Some decoders may seek to the nearest key frame, which is a bit unhelpful and some are a bit arbitrary about the timings of their NewSegment and the following samples.
The start and stop times of each clip need to be within the duration of the file. It's probably not a good idea to police this in the DirectShow filter because you would probably want to construct a timeline without needing to run the filter first. I did this in the component that manages the FilterGraph.
If you want to add sections from the same source file consecutively in the timeline, and have effects that span the transition, you need to have two instances of the sub-graph for that file. If you have more than one transition for the same source file, your list needs to alternate the graphs for successive clips. This is because each sub-graph should only play monotonically: issuing lots of SetPositions calls would waste CPU cycles and would not work well with compressed files.
The filter's output pins define the entire seeking behaviour of the graph. The output sample time stamps (IMediaSample.SetTime) are set by the filter, so you need to get them correct, with no missing time stamps. You can also set the MediaTime (IMediaSample.SetMediaTime) values if you like, although you have to be careful to get them correct or the graph may drop samples or stall.
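As a rough illustration of the time-stamp bookkeeping in the last point, here is a minimal sketch of re-stamping samples from one clip onto the output timeline. It is written in Python purely to show the arithmetic and is not taken from the AAFEditPack code. The names Clip and restamp are hypothetical, and it assumes that upstream sample times are relative to the start value received in NewSegment, in 100 ns units.

```python
from dataclasses import dataclass

UNITS = 10_000_000  # DirectShow reference time: 100 ns units per second


@dataclass
class Clip:
    timeline_start: int  # where the clip begins on the output timeline
    source_start: int    # seek position requested in the source file
    source_stop: int     # stop position requested in the source file


def restamp(clip, segment_start, sample_start, sample_stop):
    """Map upstream sample times onto the output timeline.

    Assumes sample times are relative to segment_start (the start value
    received in NewSegment). Returns None for samples outside the clip.
    """
    # Absolute position of the sample within the source file.
    src_start = segment_start + sample_start
    src_stop = segment_start + sample_stop
    if src_stop <= clip.source_start or src_start >= clip.source_stop:
        return None  # e.g. frames between a key frame and the seek point
    # Trim to the clip boundaries so output stamps stay contiguous.
    src_start = max(src_start, clip.source_start)
    src_stop = min(src_stop, clip.source_stop)
    offset = clip.timeline_start - clip.source_start
    return src_start + offset, src_stop + offset
```

A decoder that seeks to an earlier key frame would announce a segment_start before source_start; in this sketch those early samples simply map to None and would be dropped by the mixer.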
Good luck with your development. If you need any more information please contact me through StackOverflow or DTSMedia.co.uk

Related

Disassemble 68xx code without entry point vector

I am trying to disassemble code from an old radio containing a 68xx (68HC12-like) microcontroller. The problem is that I don't have access to the interrupt vectors at the top of the ROM, so I don't know where to start looking. I only have the code below the top. Are there any suggestions on where or how I can find meaningful routines in the code data?
You can't really disassemble reliably without knowing where the reset vector points. What you can do, however, is try to narrow down the possible reset addresses by eliminating all those other addresses that cannot possibly be a starting point.
So, given that any address in the memory map that contains a valid opcode is a potential reset point, you need to either eliminate it, or keep it for further analysis.
For the 68HC11 case, you could try to guess somewhat the entry point by looking for LDS instructions with legitimate operand value (i.e., pointing at or near the top of available RAM -- if multiple RAM banks, then to any of them).
It may help a bit if you know the device's full memory map, i.e., if external memory is used, its mapping and possible mapped peripherals (e.g., LCD). Do you also know CONFIG register contents?
The LDS instruction is usually either the very first instruction, or close thereafter (so look back a few instructions when you feel you have finally singled out your reset address). The problem here is some data may, by chance, appear as LDS instructions so you could end up with multiple potentially valid entry points. Only one of them is valid, of course.
You can eliminate further by disassembling a few instructions starting from each of these LDS instructions until you either hit an illegal opcode (i.e. obviously not a valid code sequence but an accidental data arrangement that looks like opcodes), or you see a series of instructions that are commonly used in 68HC11 initialization. These involve (usually) initialization of any one or more of the registers BPROT, OPTION, SCI, INIT ($103D in most parts, but for some $3D), etc.
You could write a relatively small script (e.g., in Lua) to do the basic scanning of the memory map and produce a (hopefully small) set of potential reset points to be examined further with a true disassembler for hints like the ones I mentioned.
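A minimal sketch of such a scanning script follows, written in Python rather than Lua. It only looks for the immediate form of LDS whose operand falls inside a RAM range. The function name, the ROM base address and the RAM ranges are assumptions for illustration; 0x8E is the 68HC11 opcode for LDS immediate, so check the opcode map for your part (the 68HC12 encoding differs) and pass a different value if needed.

```python
def find_lds_candidates(rom, base_addr, ram_ranges, opcode=0x8E):
    """Return (address, operand) pairs for LDS #imm16 pointing into RAM.

    rom        -- bytes of the dumped image
    base_addr  -- address at which rom[0] is mapped (assumed known)
    ram_ranges -- list of inclusive (low, high) RAM address ranges
    opcode     -- LDS immediate opcode; 0x8E on the 68HC11
    """
    hits = []
    for i in range(len(rom) - 2):
        if rom[i] != opcode:
            continue
        operand = (rom[i + 1] << 8) | rom[i + 2]  # operands are big-endian
        if any(lo <= operand <= hi for lo, hi in ram_ranges):
            hits.append((base_addr + i, operand))
    return hits


if __name__ == "__main__":
    # Tiny made-up image: one plausible LDS #$00FF and one unlikely LDS #$1234.
    image = bytes([0x01, 0x8E, 0x00, 0xFF, 0x8E, 0x12, 0x34, 0x39])
    for addr, sp in find_lds_candidates(image, 0xE000, [(0x0000, 0x00FF)]):
        print(f"candidate at ${addr:04X}: LDS #${sp:04X}")
```

Each hit is only a candidate; it still needs to be checked with a real disassembler as described above.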
Now, once you have the reset vector figured out the job becomes somewhat easier but you still need to figure out where any interrupt handlers are located. For this your hint is an RTI instruction and whatever preceding code that normally should acknowledge the specific interrupt it handles.
Hope this helps.

IMediaControl::Run followed by IMediaControl::Stop followed by IMediaControl::Run doesn't switch on certain onboard cameras

I have a DirectShow webcam application. I make use of Sample Grabber to get the buffer callbacks and IVideoWindow to control the display co-ordinates for the Preview. I have Preview and Capture Streams which I run as below.
g_pBuild->RenderStream(&PIN_CATEGORY_CAPTURE, &MEDIATYPE_Video, cam, g_pGrabberF, pNullRenderer2);
g_pBuild->RenderStream(&PIN_CATEGORY_PREVIEW, &MEDIATYPE_Video, cam, NULL, NULL);
On certain onboard cameras, IMediaControl::Run followed by IMediaControl::Stop followed by IMediaControl::Run doesn't switch on the camera.
External USB cameras work properly here. How can I diagnose this further? Any pointers would help.
Maybe it's specific to a certain hardware issue in the unit.
Do a quick test by adding a sleep of 1 sec between calls.
If it does help, then you need to find a way to know whether the unit is in an idle state or not.
There are two important parts of the question which you did not provide:
Filter graph topologies
HRESULTs of the method calls
A problem you might be having is that one of the filters in the topology does not handle state transitions well and fails somewhere between states. Supposedly your second Run meets it still trying to complete Stop. You might get an HRESULT there which indicates the issue (better for you) or the filter fails silently.
The filter graph itself is an unlikely source of the bug. Chances are high that it does everything flawlessly; however, since it internally distributes the calls between filters, one of the filters is letting you down.

Different approaches on getting captured video frames in DirectShow

I was using a callback mechanism to grab the webcam frames in my media application. It worked, but was slow due to certain additional buffer functions that were performed within the callback itself.
Now I am trying the other way to get frames. That is, call a method and grab the frame (instead of callback). I used a sample in CodeProject which makes use of IVMRWindowlessControl9::GetCurrentImage.
I encountered the following issues.
In a Microsoft webcam, the Preview didn't render (only black screen) on Windows 7. But the same camera rendered Preview on XP.
Here my doubt is, will the VMR specific functionalities be dependent on camera drivers on different platforms? Otherwise, how could this difference happen?
Wherever the sample application worked, I observed that the biBitCount member of the resulting BITMAPINFOHEADER structure is 32.
Is this a value set by application or a driver setting for VMR operations? How is this configured?
Finally, which is the best method to grab the webcam frames? A callback approach? Or a Direct approach?
Thanks in advance,
IVMRWindowlessControl9::GetCurrentImage is intended for occasional snapshots, not for regular image grabbing.
Quote from MSDN:
This method can be called at any time, no matter what state the filter
is in, whether running, stopped or paused. However, frequent calls to
this method will degrade video playback performance.
This method reads back from video memory, which is slow in the first place. It also converts to the RGB color space (slow again), because this format is the most suitable for non-streaming apps and gives fewer compatibility issues.
All in all, you can use it for periodic image grabbing, however this is not what you are supposed to do. To capture at streaming rate you need to use a filter in the pipeline, or Sample Grabber with a callback.

Generating silent audio track

I'm using a simple DirectShow graph to convert some videos to WMV format, which is working fine. I'm now trying to use a filter based on the Synth Filter sample to supply a silent audio track to the videos and I'm running into some problems.
Essentially, I don't know how to stop the graph when this filter (the synth filter) is connected. I guess because it just provides samples forever until somebody tells it to stop, the usual approach of calling IMediaEvent::WaitForCompletion on the filter graph doesn't work (the graph never stops). What I want it to do of course is stop as soon as the video source filter is finished.
I've tried tracking the position of the graph with IMediaSeeking::GetPositions and then manually stopping the graph when this exceeds the duration of the source file, but the accuracy of the stop time with this approach isn't great.
Can anyone think of a better way to do this? Do I need to have another filter that monitors the output from the video source and also has a pointer to the audio source so it can stop it as soon as the video source delivers EndOfStream? Is there no way to accomplish this from purely application-side code?
I've done something not too different myself in the past. I added support for IMediaSeeking to the silence generator filter, and then you need to make sure that you set start and stop times for the conversion (even if it's just 0 and duration), so that the silence generator can generate the right amount of audio and then send EOS.
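To illustrate the "right amount of audio" part, here is a small Python sketch of the arithmetic only; it is not DirectShow code. It assumes start and stop are given in 100 ns reference-time units and that the output is 16-bit signed PCM, where silence is all zero bytes. The function name and default parameters are made up for the example.

```python
UNITS = 10_000_000  # 100 ns reference-time units per second


def silence_buffers(start, stop, sample_rate=44100, channels=2,
                    bytes_per_sample=2, buffer_frames=4410):
    """Yield zero-filled PCM buffers covering [start, stop), then finish.

    Assumes 16-bit signed PCM, so a zero byte pattern is silence.
    """
    total_frames = (stop - start) * sample_rate // UNITS
    frame_size = channels * bytes_per_sample
    sent = 0
    while sent < total_frames:
        n = min(buffer_frames, total_frames - sent)  # last buffer may be short
        yield bytes(n * frame_size)
        sent += n
    # The generator ending here is the point at which a real filter
    # would deliver EndOfStream downstream.
```

With the stop time set to the video duration, the generator runs out exactly when the requested range has been covered, which is the point at which the graph can complete normally.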
G

Intelligent Voice Recording: Request for Ideas

Say you have a conference room and meetings take place at arbitrary impromptu times. You would like to keep an audio record of all meetings. In order to make it as easy to use as possible, no action would be required on the part of meeting attenders, they just know that when they have a meeting in a specific room they will have a record of it.
Obviously just recording nonstop would be inefficient as it would be a waste of data storage and a pain to sift through.
I figure there are two basic ways to go about it.
Recording simply starts and stops according to sound level thresholds.
Recording is continuous, but split into X minute blocks. Blocks found to contain no content are discarded.
I like the second way better because I feel there is less risk for losing data because of late starts, or triggers failing.
I would like to implement in Python, and on Windows if possible.
Implementation suggestions?
Bonus considerations that probably deserve their own questions:
best audio format and compression for this purpose
any way of determining how many speakers are present, assuming identification is unrealistic
This is one of those projects where the path is going to be defined more by what's on hand for ready reuse.
You'll probably find it easier to record continuously and save the data off in chunks (for example, hour-long pieces).
Format is going to be dependent on what you have in the form of recording tools and audio processing libraries. You may even find that you use two: one format, like PCM-encoded WAV, for recording and processing, and compressed MP3 for storage.
Once you have an audio stream, you'll need to access it in a PCM form (list of amplitude values). A simple averaging approach will probably be good enough to detect when there is a conversation. Typical tuning attributes:
* Average energy level to trigger
* Amount of time you need to be at the energy level or below to identify stop and start (I recommend two different values)
* Size of analysis window for averaging
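The three tuning attributes above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the input is a list of integer PCM amplitudes, the energy measure is RMS per window, and the function names and parameter values are invented for the example.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of PCM amplitudes."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def find_speech_segments(pcm, window, threshold, start_windows, stop_windows):
    """Return (start_index, end_index) sample ranges judged to contain sound.

    window        -- analysis window size in samples
    threshold     -- RMS level that counts as "loud"
    start_windows -- consecutive loud windows needed to start a segment
    stop_windows  -- consecutive quiet windows needed to end a segment
    """
    segments = []
    active = False
    loud = quiet = 0
    seg_start = 0
    for pos in range(0, len(pcm) - window + 1, window):
        if rms(pcm[pos:pos + window]) >= threshold:
            loud += 1
            quiet = 0
        else:
            quiet += 1
            loud = 0
        if not active and loud >= start_windows:
            active = True
            seg_start = pos - (loud - 1) * window  # back up to first loud window
        elif active and quiet >= stop_windows:
            active = False
            segments.append((seg_start, pos - (quiet - 1) * window))
    if active:
        segments.append((seg_start, len(pcm)))  # still talking at end of data
    return segments
```

Using separate start and stop counts gives the two different time values recommended above; a recorded block that yields no segments could simply be discarded.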
As for number of participants, unless you find a library that does this, I don't see an easy solution. I've used speech recognition engines before and also done a reasonable amount of audio processing and I haven't seen any 'easy' ways to do this. If you were to look, search out universities doing speech analysis research. You may find some prototypes you can modify to give your software some clues.
I think you'll have difficulty doing this entirely in Python. You're talking about doing frequency/amplitude analysis of MP3 files. You would have to open up the file and look for a volume threshold, then cut out the portions that go below that threshold. Figuring out how many speakers are present would require very advanced signal processing.
A cursory Google search turned up nothing for me. You might have better luck looking for an off-the-shelf solution.
As an aside- there may be legal complications to having a recorder running 24/7 without letting people know.
