Muxing non-synchronised streams to Haali - DirectShow

I have two input streams of data that are being passed to a Haali muxer (MP4 format).
Currently I stream these to Haali directly in a DirectShow graph without a clock. I wonder whether I should be writing them to the muxer synchronised, or whether it happily accepts an audio stream that stops before the video stream does. (I have an issue with the output file not playing audio after seeking, and I'm not sure why this occurs.)
I can't find much in the way of documentation on muxing with the Haali muxer; does anyone know the best place to look for information on this filter?

To have the streams multiplexed into a single MP4 file you need a single instance of a multiplexer (Haali, GDCL, a commercial one, a wrapper over the mp4v2 library or over a Media Foundation sink, etc.) with two (or more) input pins connected to the respective sources, which in turn will be written as tracks.
The filter graph clock does not matter. The clock is for presentation; file writers accept incoming data and write it as soon as possible anyway. Removing the clock, as you already seem to be doing, is marginally more accurate, but keeping the standard clock makes no real difference.
Data is synchronized using the time stamps on individual media samples, the units that make up the media streams. The multiplexer builds an internal queue for every stream and then consumes data from the queues to build a single file in which the original stream data is interleaved. If one stream supplies too much data, that is, its data arrives too early while another stream supplies data slowly, the multiplexer blocks further reception on that particular stream by not returning from the respective processing call (IMemInputPin::Receive), expecting the slow stream to provide additional input during this wait. Ultimately, what the multiplexer looks at when matching data from different streams is the data time stamps.
Thus, to obtain synchronized data in the resulting MP4 file you need to make sure the payload data is properly time stamped. The multiplexer will take care of the rest.
This also means that time stamps should be monotonically increasing within a stream, and that key frames/splice points are indicated accordingly. Otherwise some multiplexers may fail immediately, while others will produce an output file that has playback issues (especially around seeking).
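For illustration, a minimal sketch of such time stamping in a push source pin derived from CSourceStream might look like the following (member names such as m_rtPosition, m_rtFrameDuration and m_bKeyFrame are hypothetical):

// Minimal sketch: stamp outgoing samples so the multiplexer can interleave
// the streams; the same applies to the audio pin.
HRESULT CMyVideoPin::FillBuffer(IMediaSample *pSample)
{
    // ... copy the frame payload into pSample's buffer here ...

    // Time stamps (100 ns units) must increase monotonically within the stream;
    // they are what the multiplexer matches across its input pins.
    REFERENCE_TIME rtStart = m_rtPosition;
    REFERENCE_TIME rtStop  = rtStart + m_rtFrameDuration;   // e.g. 400000 for 25 fps
    pSample->SetTime(&rtStart, &rtStop);
    m_rtPosition = rtStop;

    // Indicate key frames so that splice/seek points end up in the output index.
    pSample->SetSyncPoint(m_bKeyFrame ? TRUE : FALSE);

    return S_OK;
}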

Related

MediaRecorder API chunks as independent videos

I'm trying to build a simple app that streams video from the camera in the browser to a remote server.
For camera access from the browser I've found a wonderful WebRTC API: getUserMedia.
Now, for streaming it to the server, IIUC the best way would be to use some of the WebRTC APIs for transport and then a server-side library to deal with it.
However, at first I went with a slightly different approach:
I used MediaRecorder on the stream from the camera and set the timeslice for MediaRecorder.start() to a few hundred ms, e.g. 200. I had some assumptions about MediaRecorder which turned out not to match what I observed:
If there is only 1 chunk uploaded to the server -> it opens just fine.
If there are multiple chunks -> none of them loads correctly; they give the error: Could not determine type of stream. But if I use ffmpeg to concat all the chunks, the resulting file is fine. The same happens if I concatenate the blobs from MediaRecorder.ondataavailable on the client.
Thus the question:
Can the chunks in theory be independent video files? Or is that not what MediaRecorder was designed for? If not, why do we even have the option to pass a timeslice parameter to its start() method?
Bonus question
If we set timeslice comparatively small, e.g. 10 ms -> lots of the data blobs sent to MediaRecorder.ondataavailable are of size 0. Where can we find some sort of guarantee/spec on the minimal timeslice we can use so that the data blobs are meaningful?
The documentation says the following:
If timeslice is not undefined, then once a minimum of timeslice milliseconds of data have been collected, or some minimum time slice imposed by the UA, whichever is greater, start gathering data into a new Blob blob, and queue a task, using the DOM manipulation task source, that fires a blob event named dataavailable at recorder with blob.
So my guess is that this is somehow related to some data blobs being of size 0. What does "some minimum time slice imposed by the UA" mean?
PS
Happy to provide code if needed, but the question is not about specific code; it is about understanding the assumptions behind the MediaRecorder API and why they are there.
The timeslice parameter does not let you create independent media chunks; instead, it gives you an opportunity to save data (e.g. to the filesystem, or by uploading it to a server) on a regular basis, rather than holding potentially large media content in memory. Each chunk is only meaningful as a continuation of the previous ones (only the first one carries the container headers), which is why concatenating all chunks yields a playable file while later chunks on their own do not.

Qt: API to write raw QAudioInput data to file just like QAudioRecorder

I'm trying to monitor an audio input and record the audio to a file, but only when a level threshold is exceeded. There seem to be two main options for recording in Qt: QAudioRecorder and QAudioInput.
Long story short: I'm trying to find the API that can take raw audio sample data read from QAudioInput and record it to a file just like QAudioRecorder does, but strangely it doesn't seem to exist. To give an example, the setup would be something like the following QAudioRecorder code (but instead of specifying an input device with setAudioInput() you would pass it sampled bytes):
QAudioEncoderSettings audioSettings;
QAudioRecorder *recorder = new QAudioRecorder;
audioSettings.setCodec("audio/PCM");
audioSettings.setQuality(QMultimedia::HighQuality);
recorder->setEncodingSettings(audioSettings);
recorder->setContainerFormat("wav");
recorder->setOutputLocation(QUrl::fromLocalFile("/tmp/test.wav"));
recorder->setAudioInput("default");
recorder->record();
I'm using QAudioInput because I need access to the raw samples. The problem with QAudioInput is that Qt does not seem to provide an easy way to take the raw samples I get out of QAudioInput and pipe them into a file, encoding them along the way. QAudioRecorder does this nicely, but you can't feed raw samples into QAudioRecorder; you just tell it which device to record from and how you want it stored.
Note that I tried using QAudioInput and QAudioRecorder together - QAudioInput for raw access and QAudioRecorder whenever I need to record - but there are two main issues with that: A) only one of them can be reading a given device at a time; B) I want to record the data at and just before the moment the threshold is exceeded, not only after it is exceeded.
I ended up using QAudioRecorder+QAudioProbe. There are a couple of limitations though:
Firstly, the attached QAudioProbe only works while QAudioRecorder is actually recording, so I had to write a wrapper over QAudioRecorder that switches recording on/off by switching the output location between the actual file and /dev/null.
Second, as I stated, "I want to record the data at and just before the threshold is exceeded and not just after the threshold is exceeded". Well, I had to compromise on that. The probe is used to detect the recording condition, but there is no way to stuff the data from the probe back into the recorder. I suppose you could record to a buffer file while idle and somehow prepend part of that data, but the complexity wasn't worth it for me.
As an aside, there was another issue with QAudioRecorder that motivated the wrapper: I found that QAudioRecorder::stop() sometimes hangs indefinitely. To get around this, I had to heap-allocate the recorder, delete it, and initialise a new one for each new recording.
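For reference, a minimal sketch of the probe arrangement could look like the following (the class name LevelGate, the threshold kThreshold and the output paths are hypothetical; the wrapper that recreates a hung recorder is omitted):

// Record to /dev/null until a level threshold is exceeded, then switch the
// output to the real file. The probe only delivers buffers while the
// recorder is recording, hence the /dev/null trick.
#include <QAudioBuffer>
#include <QAudioProbe>
#include <QAudioRecorder>
#include <QObject>
#include <QUrl>

class LevelGate : public QObject
{
    Q_OBJECT
public:
    explicit LevelGate(QAudioRecorder *recorder, QObject *parent = nullptr)
        : QObject(parent), m_recorder(recorder), m_triggered(false)
    {
        m_probe.setSource(recorder);
        connect(&m_probe, &QAudioProbe::audioBufferProbed,
                this, &LevelGate::onBuffer);
        recorder->setOutputLocation(QUrl::fromLocalFile("/dev/null"));
        recorder->record();
    }

private slots:
    void onBuffer(const QAudioBuffer &buffer)
    {
        if (m_triggered)
            return;

        // Crude peak detection, assuming signed 16-bit samples.
        const qint16 *data = buffer.constData<qint16>();
        int peak = 0;
        for (int i = 0; i < buffer.sampleCount(); ++i)
            peak = qMax(peak, qAbs(int(data[i])));

        if (peak > kThreshold) {
            m_triggered = true;
            // Switch the output from /dev/null to the real file and restart.
            m_recorder->stop();
            m_recorder->setOutputLocation(QUrl::fromLocalFile("/tmp/real.wav"));
            m_recorder->record();
        }
    }

private:
    static const int kThreshold = 2000;   // hypothetical trigger level
    QAudioRecorder *m_recorder;
    QAudioProbe m_probe;
    bool m_triggered;
};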

Controlling trimming behavior of DirectShow's AVIMux?

When the DirectShow AVIMux is provided with two streams of data (e.g. audio and video) and one stream starts a bit before the other, is there any way to control how the AVIMux behaves? Namely, if the AVIMux gets a few video frames before the audio starts, it will omit those video frames from the output AVI. This contrasts with what it does when audio is missing at the end, where it includes the video frames anyway.
My sources for the audio and video are live streams (commercial capture filters I can't really improve/control), and I'd like to keep the video frames even though the audio starts a bit later.
Is there a nice way to do this? I can imagine wrapping the two filters into a custom filter with its own graph and inserting silence as necessary, but it would be awesome to not have to go to all of that trouble.
The question rests on a seemingly incorrect assumption about the multiplexer dropping frames. The multiplexer looks at the video and audio time stamps. If "a few frames before..." means that the time stamps are negative and the data is preroll data, then it is dropped and excluded from the resulting file. Otherwise it is included regardless of the actual order in which data arrives at the multiplexer's inputs, and corresponding silence will be present at the beginning of the audio track.
That is, make sure the data is correctly time stamped and the multiplexer will get it written.
tl;dr - For my use case, a frame I process not being present in the final AVI is a showstopper, and the AVI mux/demux process is complicated enough that I'm better off just assuming some small number of frames may be dropped at the beginning. So I'll likely settle on pushing a number of special frames at the beginning (identified by a GUID/counter pair encoded in the pixels) before I start processing frames. I can then look for these special frames after writing the AVI to confirm that the frame where processing begins is present.
Everything I've seen leads me to believe that what I originally asked for is effectively not possible. Based on file size, I think the video frames are technically written to the AVI file, but for most intents and purposes they might as well not be.
That is, AVI players like VirtualDub and VLC, and even the DirectShow AVI Splitter, ignore/drop any video frames present before the audio starts. So I imagine you'd have to parse the AVI file with some other library to extract the pre-audio frames.
The reason I care about this is that I write a parallel data structure with an entry for each frame in the AVI file, and I need to know which data goes with which frame. If frames are dropped from the AVI, I can't match the frames and the data.
I had success creating custom transform filters placed after the video/audio capture filters. These filters look at the time stamps and drop video frames until an audio start time is established and the video frames fall after that time. The downstream filters then know they can rely on the video frames they process being written. The drawback is that the audio filter delivers samples with a slight delay, so when the audio starts at 100 ms I don't find out until I'm already handling the video frame at 250 ms, meaning I've dropped 250 ms of video data to ensure I know when video frames will have accompanying audio. Combine that with different AVI tools behaving differently when the video starts more than one video sample duration after the audio, and my confidence in trying to control the AVIMux/Splitter starts to wane.
All of that leads me to just accept that the AVIMux and AVI Splitter are complicated enough that it isn't worth trying to control them exactly.
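For what it's worth, a minimal sketch of the gating transform filter described above, based on the DirectShow base classes, might look like this (the class name, the GUID placeholder and the mechanism that communicates the audio start time are all hypothetical):

// Drops video samples until the audio start time is known and the sample
// lies at or after it; everything else passes through unchanged.
#include <streams.h>

class CVideoGate : public CTransInPlaceFilter
{
public:
    CVideoGate(LPUNKNOWN pUnk, HRESULT *phr)
        : CTransInPlaceFilter(NAME("Video Gate"), pUnk, GUID_NULL, phr)
        , m_rtAudioStart(-1)
    {
    }

    // Called by the matching audio-side filter once it has seen its first sample.
    void SetAudioStart(REFERENCE_TIME rtStart)
    {
        CAutoLock lock(&m_csGate);
        m_rtAudioStart = rtStart;
    }

    HRESULT CheckInputType(const CMediaType *) override { return S_OK; }

    HRESULT Transform(IMediaSample *pSample) override
    {
        REFERENCE_TIME rtStart = 0, rtStop = 0;
        if (FAILED(pSample->GetTime(&rtStart, &rtStop)))
            return S_FALSE;                  // no time stamp: drop

        CAutoLock lock(&m_csGate);
        if (m_rtAudioStart < 0 || rtStart < m_rtAudioStart)
            return S_FALSE;                  // S_FALSE tells the base class to drop the sample

        return S_OK;                         // deliver downstream unchanged
    }

private:
    CCritSec       m_csGate;
    REFERENCE_TIME m_rtAudioStart;           // -1 until the audio start is known
};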

Why does GetDeliveryBuffer block with an INTERLEAVE_CAPTURE mode AVI Mux?

I'm trying to use a customized filter to receive video and audio data from an RTSP stream and deliver samples downstream in the graph.
It seems this filter was modified from the SDK source.cpp sample (CSource) and implements two output pins, for audio and video.
When the filter is connected directly to an AVI Mux filter in INTERLEAVE_NONE mode, it works fine. However, when the interleave mode of the AVI Mux is set to INTERLEAVE_CAPTURE, the video output pin hangs in this filter's GetDeliveryBuffer call (in DoBufferProcessingLoop) after several samples have been sent, while the audio output pin still works well.
Moreover, when I inserted an Infinite Pin Tee filter into one of the paths between the AVI Mux and this source filter, the graph arbitrarily went into the stopped state after some samples had been sent (one to three samples or so). And when I put an empty trans-in-place filter that does nothing after the infinite tee, the graph went back to the first case: it never stops, but hangs in GetDeliveryBuffer.
(Here is an image that shows the connections I've described.)
So here are my questions:
1: What could be the reasons that the video output pin hangs in GetDeliveryBuffer?
My guess is that the AVI Mux holds on to these sample buffers and does not release them until it has enough of them for interleaving, but even when I set the number of video buffers to 30 in DecideBufferSize it still hangs. If that is indeed the reason, how do I decide the buffer size of a pin feeding a downstream AVI muxer? Creating more than 50 buffers on a video pin is likely not guaranteed to work, because the memory cannot be promised. :(
2: Why does the graph go into the stopped state when the infinite pin tee is inserted? And why can a no-op filter overcome this?
Any answer or suggestion is appreciated, or I hope someone can at least point me in the right direction. Thanks.
A blocked GetDeliveryBuffer means the allocator you are requesting a buffer from does not [yet] have anything for you: all media samples are outstanding and have not yet been returned to the allocator.
An obvious workaround is to request more buffers at the pin connection and memory allocator negotiation stage. This, however, just postpones the issue, which can appear again later for the same reason.
A typical issue with the topology in question is related to threading. A multiplexer filter with two inputs has to match the input streams to produce a joint file. Quite often at run time it will be holding media samples on one leg while expecting more media samples to arrive on the other leg, on another thread. It assumes that the upstream branches providing media samples run independently, so that a lock on one leg does not lock the other. This is why the multiplexer can freely both block IMemInputPin::Receive calls and/or hold media samples inside. In the topology above it is not clear how exactly the source filter does its threading. The fact that it has two pins makes me assume it might have threading issues and is not taking into account that there might be a lock downstream at the multiplexer.
Presumably the source filter is yours and you have the source code for it. You want to make sure the audio pin is sending media samples on a separate thread, for example through an asynchronous queue.
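A minimal sketch of that, assuming the pins derive from CSourceStream and following the pattern used by the SDK's Infinite Pin Tee sample (member names such as m_pOutputQueue are hypothetical):

// Route delivery through a COutputQueue with its own worker thread, so a
// muxer blocking one pin's Receive does not stall the other pin's producer.
#include <streams.h>

HRESULT CMyAudioPin::Active()
{
    HRESULT hr = S_OK;
    // bAuto = FALSE, bQueue = TRUE forces a queue with a dedicated thread.
    m_pOutputQueue = new COutputQueue(GetConnected(), &hr, FALSE, TRUE);
    if (FAILED(hr))
    {
        delete m_pOutputQueue;
        m_pOutputQueue = NULL;
        return hr;
    }
    return CSourceStream::Active();
}

HRESULT CMyAudioPin::Inactive()
{
    HRESULT hr = CSourceStream::Inactive();  // stop the producer thread first
    delete m_pOutputQueue;                   // destructor drains and stops the queue thread
    m_pOutputQueue = NULL;
    return hr;
}

// Override of CBaseOutputPin::Deliver, called from DoBufferProcessingLoop.
HRESULT CMyAudioPin::Deliver(IMediaSample *pSample)
{
    if (m_pOutputQueue == NULL)
        return NOERROR;

    // The queue releases the sample after delivering it, so add a reference
    // to balance the release done by the calling loop.
    pSample->AddRef();
    return m_pOutputQueue->Receive(pSample);
}

// EndOfStream, BeginFlush and EndFlush should likewise be forwarded through
// the queue (COutputQueue::EOS / BeginFlush / EndFlush).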

How to use streaming audio data from microphone for ASR in Qt

I'm working on a speech recognition project and my program can recognize words from audio files. Now I need to work with the audio stream coming from the microphone. I'm using QAudio to get sound data from the mic, and QAudio has a function to start the process. This start(*QBuffer) function writes the data into a QBuffer (a QIODevice that wraps a QByteArray). When I'm not dealing with a continuous stream, I can stop recording from the mic any time I want, copy the whole data from the QBuffer into a QByteArray, and do whatever I want with the data. But with a continuous stream the QBuffer's size grows over time and reaches 100 MB in 15 minutes.
So I need to use some kind of circular buffer, but I can't figure out how to do that, especially with this start(*QBuffer) function. I also want to avoid cutting the streaming sound at a point where the speech continues.
What is the basic way to handle streaming audio data for speech recognition?
Is it possible to change the start(*QBuffer) call into something like start(*QByteArray) and make the function overwrite that QByteArray to build a circular buffer?
Thanks in advance
Boost offers a circular buffer:
http://www.boost.org/doc/libs/1_37_0/libs/circular_buffer/doc/circular_buffer.html#briefexample
It should meet your needs.
Alain
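As a rough illustration of combining the two (assuming Qt 5 and hypothetical names such as MicRingBuffer and kHistoryBytes): QAudioInput is used in pull mode, and everything it delivers is copied into a fixed-capacity boost::circular_buffer, so memory stays bounded no matter how long the stream runs.

// Keep only the most recent kHistoryBytes of microphone audio in memory.
#include <QAudioFormat>
#include <QAudioInput>
#include <QIODevice>
#include <QObject>
#include <boost/circular_buffer.hpp>

class MicRingBuffer : public QObject
{
    Q_OBJECT
public:
    explicit MicRingBuffer(const QAudioFormat &format, QObject *parent = nullptr)
        : QObject(parent)
        , m_input(format)
        , m_history(kHistoryBytes)                   // fixed capacity
    {
        QIODevice *dev = m_input.start();            // pull mode: read from the returned device
        connect(dev, &QIODevice::readyRead, this, [this, dev]() {
            const QByteArray chunk = dev->readAll();
            for (char c : chunk)
                m_history.push_back(c);              // overwrites the oldest data when full
            // Hand m_history (or a copy) to the recognizer here, e.g. when a
            // level threshold is exceeded or a pause is detected.
        });
    }

private:
    // Roughly 30 s of 16-bit mono audio at 16 kHz; tune for your format.
    static const int kHistoryBytes = 16000 * 2 * 30;

    QAudioInput m_input;
    boost::circular_buffer<char> m_history;
};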

Resources