Contradiction in the Media Foundation AAC specification regarding LATM/LAOS streams - ms-media-foundation

Reading the official specification for the Microsoft Media Foundation AAC decoder, I came across the following section in the middle of the document:
The AAC decoder does not support any of the following:
Main profile, Sample-Rate Scalable (SRS) profile, or Long Term Prediction (LTP) profile.
Audio data interchange format (ADIF).
LATM/LAOS transport streams.
Coupling channel elements (CCEs). The decoder will skip audio frames with CCEs.
AAC-LC with a 960-sample frame size. Only 1024-sample frames are supported.
But in the beginning of the same page the following is stated:
Starting in Windows 8, the AAC decoder also supports decoding MPEG-4 audio transport streams with a multiplex layer (LATM) and synchronization layer (LOAS). It can also convert an LATM/LOAS stream to ADTS.
I assume, and am fairly sure, that LATM/LOAS transport streams are supported and that the sentence saying they are not was simply not removed when support was added. But is this assumption correct?
What I am afraid of is that I am missing some nuance here and that both statements are true, just in different cases.
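One way to probe this empirically would be to offer the decoder an input media type that declares an LOAS/LATM payload and see whether it is accepted. Below is a minimal sketch, assuming pDecoder is an instance of the Microsoft AAC decoder MFT (obtained, for example, via MFTEnumEx); a real stream may additionally require MF_MT_USER_DATA with the AudioSpecificConfig, which is omitted here.

```cpp
#include <mfapi.h>
#include <mftransform.h>

// Minimal sketch (assumption: pDecoder is the Microsoft AAC decoder MFT).
// MF_MT_AAC_PAYLOAD_TYPE = 3 declares an LOAS/LATM payload; if SetInputType
// succeeds, the decoder at least claims to accept such a stream.
HRESULT TryLoasInputType(IMFTransform* pDecoder, UINT32 sampleRate, UINT32 channels)
{
    IMFMediaType* pType = nullptr;
    HRESULT hr = MFCreateMediaType(&pType);
    if (FAILED(hr)) return hr;

    pType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio);
    pType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_AAC);
    pType->SetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, sampleRate);
    pType->SetUINT32(MF_MT_AUDIO_NUM_CHANNELS, channels);
    pType->SetUINT32(MF_MT_AAC_PAYLOAD_TYPE, 3); // 0 = raw, 1 = ADTS, 3 = LOAS/LATM

    // Note: real streams may also need MF_MT_USER_DATA (HEAACWAVEINFO tail
    // plus AudioSpecificConfig); omitted in this sketch.
    hr = pDecoder->SetInputType(0, pType, 0);
    pType->Release();
    return hr;
}
```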

Related

AMDh264Encoder returning MF_E_ATTRIBUTENOTFOUND when checking MFSampleExtension_CleanPoint

When receiving an output from IMFTransform::ProcessOutput, calling GetUINT32(MFSampleExtension_CleanPoint) on the sample fails with MF_E_ATTRIBUTENOTFOUND, but only while using the AMDh264Encoder (NV12 in, H.264 out). As a result, no keyframes are marked in the final output video, so it is corrupted.
What causes retrieving the MFSampleExtension_CleanPoint attribute to fail with MF_E_ATTRIBUTENOTFOUND only on the AMDh264Encoder?
Video encoder MFTs are supplied by hardware vendors. AMD provides "AMDh264Encoder" for its hardware and ships it with its video drivers.
For this reason implementations from different vendors have slight differences; in this case, AMD decided not to set this attribute on produced media samples.
You should handle this gracefully and treat the attribute as optional.
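For illustration, here is a minimal sketch of treating the attribute as optional when examining samples coming out of ProcessOutput (assuming pSample is a valid IMFSample*; the helper name is made up):

```cpp
#include <mfapi.h>
#include <mferror.h>

// Minimal sketch (assumption: pSample was obtained from
// IMFTransform::ProcessOutput). A missing MFSampleExtension_CleanPoint is
// treated as "not known to be a key frame" instead of a hard error.
bool IsCleanPoint(IMFSample* pSample)
{
    UINT32 cleanPoint = 0;
    HRESULT hr = pSample->GetUINT32(MFSampleExtension_CleanPoint, &cleanPoint);
    if (hr == MF_E_ATTRIBUTENOTFOUND)
        return false; // attribute is optional; the encoder simply did not set it
    return SUCCEEDED(hr) && cleanPoint != 0;
}
```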

Way to encode 8-bit per sample audio into 2-bits per sample

If ADPCM can store 16-bit-per-sample audio in 4 bits per sample, is there a way to store 8-bit-per-sample audio in 2 bits per sample?
The G.726 standard supersedes G.721 and G.723 into a single standard, and adds 2-bit ADPCM to the 3-, 4-, and 5-bit modes from the older standards. These are all very simple and fast to encode/decode. There appears to be no file format for the 2-bit version, but there is a widely re-used open source Sun library to encode/decode the formats; SpanDSP is just one library that includes the Sun code. These take 16-bit samples as input, but it is trivial to convert 8-bit to 16-bit.
If you want to hear the 2-bit mode you may have to write your own converter that calls into the library.
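For reference, a minimal sketch of the 8-bit to 16-bit expansion mentioned above, assuming the 8-bit input is unsigned PCM as in classic 8-bit WAV files:

```cpp
#include <cstdint>
#include <vector>

// Minimal sketch (assumption: unsigned 8-bit PCM input). Produces the signed
// 16-bit samples that G.726 encoders such as the Sun/SpanDSP code expect.
std::vector<int16_t> Expand8To16(const std::vector<uint8_t>& in)
{
    std::vector<int16_t> out;
    out.reserve(in.size());
    for (uint8_t s : in)
    {
        // Re-center around zero, then scale the 8-bit magnitude up to 16 bits.
        out.push_back(static_cast<int16_t>((static_cast<int>(s) - 128) << 8));
    }
    return out;
}
```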
There are also ADPCM specifications from long ago, like "ADPCM Creative Technology", that support low bit rates and sample sizes.
See also the SoX documentation about various old compression schemes.
The number of bits per sample is not strictly related to the dynamic range or number of bits in the output. For example, the Direct Stream Digital format (https://en.wikipedia.org/wiki/Direct_Stream_Digital) used in Super Audio CD achieves high quality with only 1 bit per sample, but at a 2.8224 MHz sample rate.
As far as I know, the ADPCM compression standard needs 4 bits per sample even if the original uncompressed audio has 8-bit samples. Hence there is no way to encode audio at 2 bits per sample with ADPCM.
EDIT: I am specifically referring to G.726, which is one of the widely supported speech compression standards in WAV. Personally, I am not aware of a freely available G.727 codec. FFmpeg is one of the libraries with extensive support for audio codecs. You can see the audio codec list supported by them at https://www.ffmpeg.org/general.html#Audio-Codecs. In the list, I do see support for other ADPCM formats, which may be worth exploring.

Streaming Video On Lossy Network

Currently I have a GStreamer stream being sent over a wireless network. I have a hardware encoder that converts raw, uncompressed video into an MPEG-2 transport stream with H.264 encoding. From there, I pass the data to a GStreamer pipeline that sends the stream out over RTP. Everything works and I'm seeing video; however, I was wondering if there is a way to limit the effects of packet loss by tuning certain parameters on the encoder.
The two main parameters I'm looking at are the GOP size and the I-frame rate. Both are summarized in the documentation for the encoder (a Sensoray 2253) as follows:
V4L2_CID_MPEG_VIDEO_GOP_SIZE: Integer range 0 to 30. The default setting of 0 means to use the codec default GOP size. Capture only.
V4L2_CID_MPEG_VIDEO_H264_I_PERIOD: Integer range 0 to 100. Only for H.264 encoding. Default setting of 0 will encode first frame as IDR only, otherwise encode IDR at first frame of every Nth GOP.
Basically, I'm trying to give the decoder as good a chance as possible to produce smooth video playback, even given the fact that the network may drop packets. Will increasing the I-frame rate do this? Namely, since an I-frame doesn't depend on data from previous or future frames, will sending the "full" image more often help? What would be the "ideal" settings for the two parameters above given that the data is being sent across a lossy network? Note that I can accept a slight (~10%) increase in bandwidth if it means the video is smoother than it is now.
I also understand that this is highly decoder dependent, so for the sake of argument let's say that my main decoder on the client side is VLC.
Thanks in advance for all the help.
Increasing the number of I-frames will help the decoder recover more quickly. You may also want to look at limiting the bandwidth of the stream, since it's then more likely to get the data through. You'll need to watch the data size, though, because your video quality can suffer greatly: I-frames are considerably larger than P- or B-frames, and the encoder will continue to target the specified bitrate.
If you had some control over playback (even locally capturing the stream and retransmitting to VLC), you could add FEC, which would correct lost packets.
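As a concrete illustration, here is a minimal sketch of pushing these two controls to the capture device through the V4L2 extended-controls API; the device path and the specific values are assumptions rather than recommendations:

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>
#include <cstdio>
#include <cstring>

// Minimal sketch (assumptions: /dev/video0 is the Sensoray capture device and
// the driver exposes both controls). Sets the GOP size and the H.264 IDR
// period through VIDIOC_S_EXT_CTRLS.
int main()
{
    int fd = open("/dev/video0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    v4l2_ext_control ctrls[2];
    std::memset(ctrls, 0, sizeof(ctrls));
    ctrls[0].id    = V4L2_CID_MPEG_VIDEO_GOP_SIZE;
    ctrls[0].value = 15;  // shorter GOP -> more frequent intra refresh
    ctrls[1].id    = V4L2_CID_MPEG_VIDEO_H264_I_PERIOD;
    ctrls[1].value = 1;   // IDR at the first frame of every GOP

    v4l2_ext_controls req;
    std::memset(&req, 0, sizeof(req));
    req.ctrl_class = V4L2_CTRL_CLASS_MPEG;
    req.count      = 2;
    req.controls   = ctrls;

    if (ioctl(fd, VIDIOC_S_EXT_CTRLS, &req) < 0)
        perror("VIDIOC_S_EXT_CTRLS");

    close(fd);
    return 0;
}
```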

GDCL Mpeg-4 Multiplexor - Filter's can't agree on connection

I am attempting to publish some MP4 files with the GDCL Mpeg-4 Multiplexor, but it's not accepting the input from my camera (QuickCam Orbit/Sphere AF).
I see that the subtype is set to MEDIASUBTYPE_NULL.
I can't seem to figure out a set of filters that will successfully link the pins. What do I need to do to adapt from my capture pin to the multiplexor?
The GDCL Mpeg-4 Multiplexor multiplexes compressed data, and your camera captures raw (uncompressed) video. You need to insert a compressor in between in order to deliver MPEG-4 compatible video into the multiplexer: that is, an MPEG-4 Part 2 or MPEG-4 Part 10 (AKA H.264) video compressor. The multiplexer filter itself does not do data compression/encoding.
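For illustration, a minimal sketch of wiring capture pin -> compressor -> multiplexor with ICaptureGraphBuilder2::RenderStream, assuming the graph builder, camera source, an installed MPEG-4/H.264 compressor filter, and the GDCL multiplexor have already been created and added to the graph:

```cpp
#include <dshow.h>

// Minimal sketch (assumptions: all filters below are already in the graph
// managed by pBuild). RenderStream inserts the compressor between the
// capture pin and the multiplexor.
HRESULT ConnectCaptureToMux(ICaptureGraphBuilder2* pBuild,
                            IBaseFilter* pCapture,   // camera source filter
                            IBaseFilter* pEncoder,   // MPEG-4 Part 2 or H.264 compressor
                            IBaseFilter* pMux)       // GDCL MPEG-4 multiplexor
{
    return pBuild->RenderStream(&PIN_CATEGORY_CAPTURE, &MEDIATYPE_Video,
                                pCapture, pEncoder, pMux);
}
```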

How to play IMFMediaSample in media foundation?

I am able to extract samples out of a video using the ReadSample method. Now how can I play the data present in those samples? Or how can I play an IMFSample?
An IMFSample is a block of data, such as a video frame or a chunk of an audio sequence. It is too tiny a piece of data to be played alone. The API addresses more sophisticated playback scenarios, where playback is a session in which one or more streams are streamed in sync.
Be sure to check Getting Started with MFPlay on MSDN to see how playback is set up with Media Foundation.
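For illustration, a minimal MFPlay sketch that plays a media file into an existing window (assumes COM is already initialized; error handling and the player callback are omitted):

```cpp
#include <mfplay.h>

// Minimal sketch (assumptions: hwndVideo is an existing window and url points
// to a playable media file). MFPlay drives a complete playback session,
// instead of trying to render individual IMFSamples by hand.
HRESULT PlayFile(PCWSTR url, HWND hwndVideo)
{
    IMFPMediaPlayer* pPlayer = nullptr;
    HRESULT hr = MFPCreateMediaPlayer(url,
                                      TRUE,             // start playback immediately
                                      MFP_OPTION_NONE,  // creation options
                                      nullptr,          // no event callback
                                      hwndVideo,        // window for video output
                                      &pPlayer);
    // Release pPlayer when playback is finished.
    return hr;
}
```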
