SoundMixer.computeSpectrum with microphone - apache-flex

Flex has the SoundMixer.computeSpectrum function that lets you compute an FFT from the currently playing sound. What I'd like to do is compute an FFT without playing the sound. Since Flash 10.1 lets us access the microphone bytes directly, it seems like we should be able to compute the FFT directly off of what the user is speaking.

Unfortunately this doesn't work as far as I know. As stated on the Adobe help pages:
"The SoundMixer.computeSpectrum() method lets an application read the raw sound data for the waveform that is currently being played. If more than one SoundChannel object is currently playing the SoundMixer.computeSpectrum() method shows the combined sound data of every SoundChannel object mixed together."
This implies two drawbacks:
It only works on the output (SoundChannel).
It only works on the mix of all outputs.
If you don't need to hear the output channel at all, you could try turning its volume down to zero or near zero, though I don't know whether that would work.
For myself, I don't see any other option at the moment than to implement the FFT on my own to compute a spectrum from the microphone data.

I'm not sure if there's a way to pass that data, but if all else fails, you can always compute the FFT yourself.
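As a rough illustration of computing the FFT yourself, here is a minimal radix-2 Cooley-Tukey sketch. It is written in C++ purely to show the algorithm, not as Flex code; the function names fft and magnitudeSpectrum are made up for this example, and it assumes the microphone samples have already been read into an array of doubles whose length is a power of two.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// In-place recursive radix-2 Cooley-Tukey FFT.
// Assumption: the length of `a` is a power of two.
void fft(std::vector<std::complex<double>>& a) {
    const std::size_t n = a.size();
    assert(n != 0 && (n & (n - 1)) == 0);
    if (n == 1) return;

    // Split into even-indexed and odd-indexed samples.
    std::vector<std::complex<double>> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    fft(even);
    fft(odd);

    // Combine the two half-size transforms with the twiddle factors.
    const double pi = std::acos(-1.0);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const std::complex<double> t =
            std::polar(1.0, -2.0 * pi * static_cast<double>(k) / static_cast<double>(n)) * odd[k];
        a[k] = even[k] + t;
        a[k + n / 2] = even[k] - t;
    }
}

// Magnitudes of bins 0..n/2 for a block of real-valued samples.
std::vector<double> magnitudeSpectrum(const std::vector<double>& samples) {
    std::vector<std::complex<double>> a(samples.begin(), samples.end());
    fft(a);
    std::vector<double> mags(a.size() / 2 + 1);
    for (std::size_t k = 0; k < mags.size(); ++k) mags[k] = std::abs(a[k]);
    return mags;
}
```

In practice you would probably apply a window function to each block before the transform, and an iterative FFT would be faster than this recursive one.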

Related

Is it OK for a DirectShow filter to seek the filters upstream from itself?

Normally seek commands are executed on a filter graph, get called on the renderers in the graph and calls are passed upstream by filters until a filter that can handle the seek does the actual seek operation.
Could an individual filter seek the upstream filters connected to one or more of its input pins in the same way without it affecting the downstream portion of the graph in unexpected ways? I wouldn't expect any graph state changes to be caused by calling IMediaSeeking.SetPositions upstream.
I'm assuming that all upstream filters are connected to the rest of the graph via this filter only.
Obviously the filter would need to be prepared to handle the resulting BeginFlush, EndFlush and NewSegment calls coming from upstream appropriately and distinguish samples that arrived before and after the seek operation. It would also need to set new sample times on its output samples so that the output samples had consistent sample presentation times. Any other issues?
It is perfectly feasible to do what you require. I used this approach to build video and audio mixer filters for a video editor. A full description of the code is given in BBC White Papers 129 and 138, available from http://www.bbc.co.uk/rd
A rather ancient version of the code can be found on www.SourceForge.net if you search for AAFEditPack. The code is written in Delphi using DSPack to get access to the DirectShow headers. I did this because it makes it easier to handle com object lifetimes - by implementing smart pointers by default. It should be fairly straightforward to transfer the ideas to a C++ implementation if that is what you use.
The filters keep lists of the sub-graphs (a section of a graph but running in the same FilterGraph as the mixers). The filters implement a custom version of TBCPosPassThru which knows about the output pins of the sub-graph for each media clip. It handles passing on the seek commands to get each clip ready for replay when its point in the timeline is reached. The mixers handle the BeginFlush, EndFlush, NewSegment and EndOfStream calls for each sub-graph so they are kept happy. The editor uses only one FilterGraph that houses both video and audio graphs. Seeking commands are made by the graph on both the video and audio renderers, and these commands are passed upstream to the mixers, which implement them.
Sub-graphs that are not currently active are blocked by the mixer holding references to the samples they have delivered. This does not cause any problems for the FilterGraph because, as Roman R says, downstream filters only care about getting a consecutive stream of samples and do not know about what happens upstream.
Some key points you need to make sure of to avoid wasted debugging time are:
Your decoder filters need to be able to cue to the exact media frame or audio time. This is not as easy as you might expect, especially with compressed formats such as MPEG-2, which was designed for transmission and has no frame index in the files. If you do not do this, the filter may wait indefinitely to get a NewSegment call with the correct media times.
Your sub graphs need to present a NewSegment time equal to the value you asked for in your seek command before delivering samples. Some decoders may seek to the nearest key frame, which is a bit unhelpful and some are a bit arbitrary about the timings of their NewSegment and the following samples.
The start and stop times of each clip need to be within the duration of the file. It's probably not a good idea to police this in the DirectShow filter because you would probably want to construct a timeline without needing to run the filter first. I did this in the component that manages the FilterGraph.
If you want to add sections from the same source file consecutively in the timeline, and have effects that span the transition, you need to have two instances of the sub-graph for that file, and if you have more than one transition for the same source file, your list needs to alternate the graphs for successive clips. This is because each sub-graph should only play monotonically: issuing lots of SetPositions calls would waste CPU cycles and would not work well with compressed files.
The filter's output pins define the entire seeking behaviour of the graph. The output sample time stamps (IMediaSample.SetTime) are implemented by the filter, so you need to get them correct without any missing time stamps. You can also set the MediaTime (IMediaSample.SetMediaTime) values if you like, although you have to be careful to get them correct or the graph may drop samples or stall.
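To make the last point a little more concrete, here is a small sketch of one way to generate gap-free output time stamps. It is not taken from AAFEditPack; the OutputClock class and its members are hypothetical, RefTime merely stands in for DirectShow's REFERENCE_TIME (100 ns units), and it assumes a fixed frame duration with output times restarting at zero after each seek.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for DirectShow's REFERENCE_TIME (64-bit, 100 ns units).
using RefTime = std::int64_t;

// Start/stop pair of the kind that would be passed to IMediaSample::SetTime.
struct StampedSpan {
    RefTime start;
    RefTime stop;
};

// Hypothetical helper: hands out consecutive, gap-free output time stamps.
class OutputClock {
public:
    explicit OutputClock(RefTime frameDuration) : frameDuration_(frameDuration) {
        assert(frameDuration > 0);
    }

    // Assumption: after a seek, output sample times restart at zero
    // relative to the new segment.
    void onSeek() { frameIndex_ = 0; }

    // Derive times from a frame counter rather than accumulating durations,
    // so each sample starts exactly where the previous one stopped.
    StampedSpan next() {
        StampedSpan s{frameIndex_ * frameDuration_, (frameIndex_ + 1) * frameDuration_};
        ++frameIndex_;
        return s;
    }

private:
    RefTime frameDuration_;
    RefTime frameIndex_ = 0;
};
```

The mixer would call next() for every output sample regardless of which sub-graph supplied the input, which is what keeps the downstream time line continuous across clip boundaries.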
Good luck with your development. If you need any more information please contact me through StackOverflow or DTSMedia.co.uk

Using OpenCL for multiple devices (multiple GPU)

Hello fellow StackOverflow Users,
I have this problem: I have one very big image which I want to work on. My first idea is to divide the big image into a couple of sub-images and then send these sub-images to different GPUs. I don't use the Image object, because I don't work with the RGB values; I only use the brightness value to manipulate the image.
My questions are:
Can I use one context with many command queues for every device, or should I use one context with one command queue for each device?
Can anyone give me an example or ideas on how I can dynamically change the inputMem data (sub-image data) when setting up the kernel arguments to send to each device? (I only know how to send the same input data.)
For example, if I have more sub-images than GPUs, how can I distribute the sub-images to the GPUs?
Or is there maybe another, smarter approach?
I'd appreciate any help and ideas.
Thank you very much.
Use 1 context, and many queues. The simple method is one queue per device.
Create 1 program, and a kernel for each device (created from the same program). Then create different buffers (one per device) and set each kernel with each buffer. Now you have different kernels, and you can queue them in parallel with different arguments.
To distribute the jobs, simply use the event system: check whether a GPU is idle and queue the next job there.
I can provide a more detailed example with code, but as a general sketch that should be the way to follow.
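The "queue the next job on whichever GPU is idle" idea can be sketched without any OpenCL calls. The following is only a simulation of the scheduling decision: assignTiles, tileCost and deviceSpeed are hypothetical names, and the costs and speeds are made-up numbers standing in for what you would learn from completion events at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// For each tile, in submission order, returns the index of the device that runs it.
// Each tile goes to whichever device becomes idle first (ties go to the lower index).
std::vector<std::size_t> assignTiles(const std::vector<double>& tileCost,
                                     const std::vector<double>& deviceSpeed) {
    assert(!deviceSpeed.empty());
    using Slot = std::pair<double, std::size_t>;  // (time device becomes idle, device index)
    std::priority_queue<Slot, std::vector<Slot>, std::greater<Slot>> idleAt;
    for (std::size_t d = 0; d < deviceSpeed.size(); ++d) {
        assert(deviceSpeed[d] > 0.0);
        idleAt.push(Slot(0.0, d));
    }

    std::vector<std::size_t> owner;
    owner.reserve(tileCost.size());
    for (double cost : tileCost) {
        const Slot s = idleAt.top();  // the device that frees up earliest
        idleAt.pop();
        owner.push_back(s.second);
        idleAt.push(Slot(s.first + cost / deviceSpeed[s.second], s.second));
    }
    return owner;
}
```

With real devices you would not predict the finish times. Instead you would react to the event of each enqueued kernel completing and then enqueue the next sub-image on that device's queue, so faster GPUs naturally take more tiles.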
The AMD APP SDK has a few samples on multi-GPU handling. You should look at these two samples:
SimpleMultiDevice: shows how to create multiple command queues on a single context, along with some performance results.
BinomialOptionMultiGPU: look at the loadBalancing method. It divides the buffer based on the compute units and max clock frequency of the available GPUs.
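A static split along those lines might look like the sketch below. This is not the SDK's loadBalancing code; splitRows is a hypothetical helper, and it assumes you have already computed one weight per GPU, for example compute units multiplied by clock frequency as reported by clGetDeviceInfo (CL_DEVICE_MAX_COMPUTE_UNITS and CL_DEVICE_MAX_CLOCK_FREQUENCY).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits totalRows image rows among devices in proportion to their weights.
// Returns the number of rows for each device; the counts always sum to totalRows.
std::vector<std::size_t> splitRows(std::size_t totalRows, const std::vector<double>& weight) {
    assert(!weight.empty());
    double total = 0.0;
    for (double w : weight) {
        assert(w > 0.0);
        total += w;
    }

    std::vector<std::size_t> rows(weight.size());
    double cumulative = 0.0;
    std::size_t prevEnd = 0;
    for (std::size_t i = 0; i < weight.size(); ++i) {
        cumulative += weight[i];
        // Use cumulative boundaries so rounding never loses or duplicates a row;
        // the last device takes whatever remains.
        const std::size_t end = (i + 1 == weight.size())
            ? totalRows
            : static_cast<std::size_t>(static_cast<double>(totalRows) * (cumulative / total));
        rows[i] = end - prevEnd;
        prevEnd = end;
    }
    return rows;
}
```

Each device then gets its own buffer holding its slice of rows, and its own kernel object with that buffer set as the argument.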

IMediaControl::Run followed by IMediaControl::Stop followed by IMediaControl::Run doesn't switch on certain onboard cameras

I have a DirectShow webcam application. I make use of Sample Grabber to get the buffer callbacks and IVideoWindow to control the display co-ordinates for the Preview. I have Preview and Capture Streams which I run as below.
g_pBuild->RenderStream(&PIN_CATEGORY_CAPTURE, &MEDIATYPE_Video, cam, g_pGrabberF, pNullRenderer2);
g_pBuild->RenderStream(&PIN_CATEGORY_PREVIEW, &MEDIATYPE_Video, cam, NULL, NULL);
On certain onboard cameras, IMediaControl::Run followed by IMediaControl::Stop followed by IMediaControl::Run doesn't switch on the camera.
External USB cameras work properly here. How can I diagnose this further? Any pointers, please help.
Maybe it's specific to a hardware issue in the unit.
Do a quick test by adding a sleep of 1 second between the calls.
If that helps, then you need to find a way to know whether the unit is idle or not.
There are two important parts of the question which you did not provide:
Filter graph topologies
HRESULTs of the method calls
A problem you might be having is that one of the filters in the topology does not handle state transitions well and fails somewhere between states. Presumably your second Run reaches it while it is still trying to complete Stop. You might get an HRESULT there which indicates the issue (better for you), or the filter fails silently.
The filter graph itself is an unlikely source of the bug. Chances are high that it does everything flawlessly; however, since it internally distributes the calls between filters, one of the filters is letting you down.

How to adjust a sound clip's volume in real time using DShow.h and strmiids.lib with C++

I am trying to figure out how to set, in real time, the volume that my sound clips play at in my C++ program, and do things like make the volume of the sound increase as 2 objects move closer to one another. Right now, I am using "DShow.h" as well as "strmiids.lib", and I am using the interface provided by the following data member pointers:
IGraphBuilder* m_graphBuilder;
IMediaControl* m_mediaControl;
IMediaEvent* m_mediaEvent;
IMediaSeeking* m_mediaSeeking;
Using the interface provided by these, is there a way to alter the volume of the media stream playing?
Have a look at the IBasicAudio interface.
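IBasicAudio::put_Volume takes a value from -10000 (silence) to 0 (full volume) in hundredths of a decibel, so the distance has to be mapped onto that logarithmic scale. Here is a minimal sketch of one possible mapping; distanceToVolume and its parameters are hypothetical, and the linear fall-off of gain with distance is just an assumption you would tune for your program.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Maps the distance between two objects to a value in the range accepted by
// IBasicAudio::put_Volume: 0 is full volume, -10000 is silence (1/100 dB units).
// Assumption: linear gain falls from 1 at fullVolumeDistance to 0 at silentDistance.
long distanceToVolume(double distance, double fullVolumeDistance, double silentDistance) {
    assert(0.0 <= fullVolumeDistance && fullVolumeDistance < silentDistance);
    if (distance <= fullVolumeDistance) return 0;
    if (distance >= silentDistance) return -10000;

    const double gain = (silentDistance - distance) / (silentDistance - fullVolumeDistance);
    // 20 * log10(gain) is the attenuation in dB; multiply by 100 for hundredths of a dB.
    const double hundredthsOfDb = 2000.0 * std::log10(gain);
    return std::max(-10000L, std::lround(hundredthsOfDb));
}
```

You would obtain the interface once with something like m_graphBuilder->QueryInterface(IID_IBasicAudio, (void**)&m_basicAudio), where m_basicAudio is a new IBasicAudio* member, and then call m_basicAudio->put_Volume(distanceToVolume(...)) whenever the objects move.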

I want to convert a sound from Mic to binary and match it from the database

I want to convert sound from the mic to binary and match it against a database (a kind of voice identification program), but I have no idea how to get the sound from the mic directly so that I can convert it to binary. Is this possible at all? Please guide me.
See this:
http://www.dotnetspider.com/resources/4967-How-record-voice-from-microphone.aspx
You're not going to be able to identify voices by doing a binary comparison on sound data. The binary of a particular sound will not be identical to an imitation of that sound unless it is literally the same file, because of minor variations in just about everything. You'll need to do some signal processing to do a fuzzy comparison of the data. You can read about signal processing on Wikipedia.
You will probably find it easier to use a third party library to process the sound for you. Something like this might be a good start.
You're looking at two very distinct problems here.
The first is pretty technical: Getting sound from the microphone into a digital waveform. How you do this exactly depends on the OS and API you're using (on Windows, you're probably looking at DirectX audio or, if available, ASIO). Typically, this is how you'd proceed:
Set up a recording buffer for the microphone, with suitable parameters (number of channels, physical input on the sound card, sample rate, bit depth, buffer size)
Start the recording. This usually involves pointing the sound library to a callback function to process the recorded buffer.
In the callback, read the buffer, convert it to a suitable format, and append it to the audio file of your choice. (You could also record to RAM only, but longer recordings may exceed the available memory.)
Store the recorded audio in a suitable database field (some kind of binary blob)
This is the easy part though; the harder part is matching a chunk of audio data against other chunks. A naïve approach would be to try and find exact matches, but that won't help you much, because the chance that you find one is practically zero - recording equipment, even the best, introduces a bit of random noise, and recording setups vary slightly whether you want to or not, so even if you'd have someone say something twice, perfectly identical, you'd still see differences in the recorded audio.
What you need to do, then, is find certain typical characteristics of the waveform. Things you could look for are:
Overall amplitude shape
Base frequencies
Selected harmonics (formants)
Extracting these is non-trivial and involves pretty severe math; and then you'll have to condense them into some sort of fingerprint, and find a way to compare them with some fuzziness (so that a near-match is good enough, rather than requiring exact matches). Finding the right parameters and comparison algorithms isn't easy, and it takes a lot of tweaking and testing; your best bet is to go find a library that does this for you.
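To give an idea of what "comparing with some fuzziness" can mean, here is a minimal sketch that treats each fingerprint as a vector of numbers and compares two of them by cosine similarity. This is only one of many possible measures, not a voice identification method in itself; the function names and the threshold value are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity of two equal-length feature vectors: 1 means same direction,
// 0 means orthogonal. Returns 0 if either vector is all zeros.
double cosineSimilarity(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA == 0.0 || normB == 0.0) return 0.0;
    return dot / (std::sqrt(normA) * std::sqrt(normB));
}

// A near-match is good enough: accept anything at or above the threshold.
bool isNearMatch(const std::vector<double>& a, const std::vector<double>& b, double threshold) {
    return cosineSimilarity(a, b) >= threshold;
}
```

The hard part remains choosing what goes into the vectors (amplitude envelope, base frequencies, formants) and finding a threshold that separates speakers without rejecting the same speaker on a different day.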
