iOS 13: AVAudioRecorder is not respecting AVLinearPCMBitDepthKey (16)

I am using AVAudioRecorder to record voice, with the settings below:
- (NSMutableDictionary *)recordSetting {
    NSMutableDictionary *recSetting = [[NSMutableDictionary alloc] init];
    // General Audio Format Settings
    recSetting[AVFormatIDKey] = @(kAudioFormatLinearPCM);
    recSetting[AVSampleRateKey] = @16000;
    recSetting[AVNumberOfChannelsKey] = @1;
    // Linear PCM Format Settings
    recSetting[AVLinearPCMBitDepthKey] = @16;
    recSetting[AVLinearPCMIsBigEndianKey] = @YES;
    recSetting[AVLinearPCMIsFloatKey] = @YES;
    // Encoder Settings
    recSetting[AVEncoderAudioQualityKey] = @(AVAudioQualityMedium);
    recSetting[AVEncoderBitRateKey] = @128000;
    return recSetting;
}
It works fine on iOS 12: I can record a 16-bit WAV file. After updating to iOS 13, I can only record a 32-bit file, even though I changed nothing and still set AVLinearPCMBitDepthKey to 16. So it seems AVLinearPCMBitDepthKey has no effect.
Can anyone suggest a solution or workaround? I still need a 16-bit WAV file after recording.
I would appreciate any help.

Quick answer: You need to set either AVLinearPCMBitDepthKey or AVEncoderBitRateKey, but not both.
Explanation: Bit Depth is only meaningful in reference to a PCM digital signal. Non-PCM formats, such as lossy compression formats, do not have associated bit depths, and use Bit Rate as the number of bits that are conveyed or processed per unit of time, which is commonly used to describe audio stream quality.
According to Apple's Linear PCM Format Settings, if you want to record uncompressed PCM in a WAV format, you need to set only the BitDepth through AVLinearPCMBitDepthKey and not the BitRate through AVEncoderBitRateKey. However, if you wish to use an audio encoder, such as kAudioFormatMPEG4AAC, then you can set AVEncoderBitRateKey as part of the Encoder Settings.
Basically, the quality of a PCM stream is represented by two attributes: Sample Rate and Bit Depth.
The connection between Bit Rate and Bit Depth in a PCM digital signal is given by the following formula:
Bit Rate = Sample Rate × Bit Depth × Number of Channels
Sample Rate is the number of samples per unit of time, usually per second. A sample is a measurement of the signal waveform's amplitude at a point in time; sample rate is measured in samples per second.
Bit Depth is the number of bits of information in each sample, in bits per sample.
Bit Rate is the number of bits (data) per unit of time, usually per second. It describes the quality of an encoded audio stream and is measured in bits per second or kilobits per second (kbps), but again, this is less relevant for a PCM stream.
So, as in your example, if you record 16,000 samples/second at 16-bit depth with 1 channel, the resulting Bit Rate is 16,000 × 16 × 1 = 256,000 bits/second, not the 128,000 set in your code.
Therefore, if you want to record raw PCM audio as in your code example, you need to set only the BitDepth through AVLinearPCMBitDepthKey, without setting the BitRate through AVEncoderBitRateKey.
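Putting that together, a minimal sketch of the adjusted settings. Note that AVLinearPCMIsFloatKey = NO (also pointed out in the answer below) and little-endian byte order are my assumptions for a standard 16-bit integer WAV, not part of the original question:
- (NSDictionary *)recordSetting {
    // Uncompressed 16-bit integer PCM, 16 kHz mono (WAV-compatible).
    return @{
        AVFormatIDKey : @(kAudioFormatLinearPCM),
        AVSampleRateKey : @16000,
        AVNumberOfChannelsKey : @1,
        AVLinearPCMBitDepthKey : @16,
        AVLinearPCMIsBigEndianKey : @NO,  // assumption: WAV stores little-endian samples
        AVLinearPCMIsFloatKey : @NO,      // integer samples; see the answer below
        // No AVEncoderBitRateKey / AVEncoderAudioQualityKey for raw PCM.
    };
}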
More on this topic here: Understanding Sample Rate, Bit Depth, and Bit Rate; Bit Rate; Audio Bit Depth and Sampling Rate

You need to set AVLinearPCMIsFloatKey to NO. With it set to YES, the recorder produces floating-point samples, which are 32 bits wide, so the requested 16-bit depth is ignored; that is why you get a 32-bit file.

Related

What's the difference between G.729 codec variations?

What is the difference between
G.729
G.729A
G.729AB
And if I have codecs set up for G.729 in Asterisk, does this mean that G.729A and G.729AB will work?
Thanks
Asterisk supports G.729A only, so you have no other options.
G.729B is not compatible with G.729A; G.729AB means both variants are supported by the switch.
G.729A is a reduced-complexity extension of G.729: it needs less CPU at the cost of slightly lower speech quality, and it remains compatible with G.729.
Some of its features are:
Sampling frequency 8 kHz/16-bit (80 samples for 10 ms frames)
Fixed bit rate (8 kbit/s 10 ms frames)
Fixed frame size (10 bytes for 10 ms frame)
Algorithmic delay is 15 ms per frame, with 5 ms look-ahead delay
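Those numbers are self-consistent; a quick sketch of the arithmetic (the function name is mine, for illustration):
// Frame size follows from the fixed bit rate:
// 8000 bits/s * 0.010 s = 80 bits = 10 bytes per 10 ms frame.
unsigned frameBytes(unsigned bitRateBps, unsigned frameMs) {
    return bitRateBps * frameMs / (8 * 1000);  // frameBytes(8000, 10) == 10
}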
G.729b:
Not compatible with G.729 or G.729a
Has a silence compression method built on a voice activity detection (VAD) module, which detects voice activity in the signal.
It includes a discontinuous transmission (DTX) module, which decides when to update the background noise parameters for non-speech (noise) frames.
Uses 2-byte Silence Insertion Descriptor (SID) frames transmitted to initiate comfort noise generation (CNG). If transmission is stopped, and the link goes quiet because of no speech, the receiving side might assume that the link has been cut. By inserting comfort noise, analog hiss is simulated digitally during silence to assure the receiver that the link is active and operational.
Read more in its Wikipedia article.

Way to encode 8-bit-per-sample audio into 2 bits per sample

If ADPCM can compress 16-bit-per-sample audio to 4 bits per sample, is there a way to compress 8-bit-per-sample audio to 2 bits per sample?
The G.726 standard supersedes G.721 and G.723 into a single standard, and adds 2-bit ADPCM to the 3-, 4-, and 5-bit modes from the older standards. These are all very simple and fast to encode/decode. There appears to be no file format for the 2-bit version, but there is a widely reused open-source Sun library to encode/decode the formats; SpanDSP is just one library that includes the Sun code. These take 16-bit samples as input, but it is trivial to convert 8-bit to 16-bit.
If you want to hear the 2-bit mode you may have to write your own converter that calls into the library.
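For reference, a minimal sketch of that trivial 8-bit to 16-bit widening (assuming unsigned 8-bit PCM, the usual WAV convention; the function name is mine):
#include <stddef.h>
#include <stdint.h>

// Widen unsigned 8-bit PCM (silence at 128) to signed 16-bit PCM
// (silence at 0), ready to feed to a G.726 encoder such as SpanDSP's.
void pcm8_to_pcm16(const uint8_t *in, int16_t *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = (int16_t)((in[i] - 128) << 8);
}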
There are also ADPCM specifications from long ago, like "ADPCM Creative Technology", that support low bit rates and sample sizes.
See also the SoX documentation about various old compression schemes.
The number of bits per sample is not strictly related to the dynamic range or number of bits in the output. For example, the Direct Stream Digital format (https://en.wikipedia.org/wiki/Direct_Stream_Digital) used in Super Audio CD achieves high quality with only 1 bit per sample, but at a 2.8224 MHz sample rate.
As far as I know, the ADPCM compression standard needs 4 bits per sample even if the original uncompressed audio has 8-bit samples. Hence there is NO way to encode audio at 2 bits per sample with ADPCM.
EDIT: I am specifically referring to G.726, which is one of the most widely supported speech compression standards in WAV. Personally, I am not aware of a freely available G.727 codec. FFmpeg is one of the libraries with extensive support for audio codecs. You can see the audio codec list they support at https://www.ffmpeg.org/general.html#Audio-Codecs. In that list, I do see support for other ADPCM formats, which may be worth exploring.

Streaming Video On Lossy Network

Currently I have a GStreamer stream being sent over a wireless network. I have a hardware encoder that converts raw, uncompressed video into an MPEG-2 Transport Stream with H.264 encoding. From there, I pass the data to a GStreamer pipeline that sends the stream out over RTP. Everything works and I'm seeing video; however, I was wondering if there is a way to limit the effects of packet loss by tuning certain parameters on the encoder.
The two main parameters I'm looking at are the GOP Size and the I frame rate. Both are summarized in the documentation for the encoder (a Sensoray 2253) as follows:
V4L2_CID_MPEG_VIDEO_GOP_SIZE:
Integer range 0 to 30. The default setting of 0 means to use the codec default
GOP size. Capture only.
V4L2_CID_MPEG_VIDEO_H264_I_PERIOD:
Integer range 0 to 100. Only for H.264 encoding. Default setting of 0 will
encode first frame as IDR only, otherwise encode IDR at first frame of
every Nth GOP.
Basically, I'm trying to give the decoder as good a chance as possible to produce smooth video playback, even though the network may drop packets. Will increasing the I-frame rate do this? That is, since an I-frame doesn't depend on data from previous or future frames, will sending the "full" image more often help? What would be the "ideal" settings for the two parameters above, given that the data is being sent across a lossy network? Note that I can accept a slight (~10%) increase in bandwidth if it means the video is smoother than it is now.
I also understand that this is highly decoder dependent, so for the sake of argument let's say that my main decoder on the client side is VLC.
Thanks in advance for all the help.
Increasing the number of I-frames will help the decoder recover more quickly. You may also want to look at limiting the bandwidth of the stream, since that makes it more likely the data gets through. Watch the data size, though: I-frames are considerably larger than P- or B-frames, and since the encoder will keep targeting the specified bitrate, your video quality can suffer greatly.
If you had some control over playback (even locally capturing the stream and retransmitting to VLC) you could add FEC which would correct lost packets.
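In case it helps, a minimal sketch of how the two controls quoted in the question could be set through the standard V4L2 control interface (the file descriptor and values are illustrative, not recommendations):
#include <linux/videodev2.h>
#include <sys/ioctl.h>

// Shorten the GOP and force more frequent IDR frames so the decoder
// can resynchronize sooner after packet loss.
static int set_ctrl(int fd, unsigned int id, int value) {
    struct v4l2_control ctrl = { .id = id, .value = value };
    return ioctl(fd, VIDIOC_S_CTRL, &ctrl);
}

// Usage (fd is an open handle to the capture device):
//   set_ctrl(fd, V4L2_CID_MPEG_VIDEO_GOP_SIZE, 15);
//   set_ctrl(fd, V4L2_CID_MPEG_VIDEO_H264_I_PERIOD, 1);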

Which codec is best, and what should its parameter values be?

I'm a beginner in the field of audio codecs and finding it hard to understand how sampling rate, bit rate, and other parameters affect the encoding/decoding (audio format), the audio quality, and the file size.
I read that constant bit rate is better than variable bit rate, but how do I know what bit rate would be right to encode the file as small as possible without compromising quality? I'm specifically focusing on audio codecs at present.
I have heard about Opus, SILK, G.722, and Speex, but I don't know which one I should use to get better quality and a smaller file size. Also, what parameters should I set for these codecs so they work effectively for me?
Can anyone guide me on this?
Thanks in advance
If you think of the original analog music as a sound wave, then converting it to digital means approximating that wave as digital bits. The sampling rate is how many points on that wave you capture per unit time, so the higher the sampling rate, the closer you are to the original sound. A lower sampling rate means higher compression but lower audio quality.
Similarly, the bit rate is effectively 'how much' information you're encoding at each point, so again, a lower bit rate means higher compression but lower audio quality.
Compression algorithms generally use psychoacoustics to try to determine what information can be lost with the least audible difference. Some sections of a track can tolerate more loss than others, so using a variable bit rate lets you achieve higher compression without a 'big' audible drop in quality.
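To make the size trade-off concrete, a small sketch (the function name is mine, for illustration): for a constant-bit-rate encode, file size is simply bit rate times duration.
// Estimated payload size in bytes for a constant-bit-rate stream.
// e.g. estimatedBytes(128000, 180) -> 2,880,000 bytes (~2.9 MB for 3 min at 128 kbps).
long long estimatedBytes(long long bitRateBps, long long seconds) {
    return bitRateBps * seconds / 8;
}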
It's well explained here: Link
I don't know the details of those codecs but generally what you should use and what parameters you should pass depends on what you're trying to achieve and for what purpose. For portable use where audio quality might not be paramount you might want to pass lower values to achieve smaller file sizes - for audiophile speakers you probably want to pass the maximum.

How to calculate sampleTime and sampleDuration for an Ogg file

I have created an Ogg decoder in Media Foundation.
I have read some packets as a sample (compressed data); now I need to know the sample's time and duration.
I know the AvgBytesPerSec and SamplesPerSec and so on, but those parameters apply to the uncompressed data.
So how can I get an IMFSample's time and duration from the compressed data?
Before answering, I'll assume you know a few things:
How to read the Vorbis setup packets (1st and 3rd in the stream):
Sample Rate
Decoding parameters (specifically the block sizes and modes)
How to read Vorbis audio packet headers:
Validation bit
Mode selection
How to calculate the current timestamp for uncompressed PCM data based on sample number.
How to calculate the duration of a buffer of uncompressed PCM data based on sample count.
The Vorbis Specification should help with the first two. Since you are not decoding the audio, you can safely discard the time, floor, residue, and mapping configuration after you've read it in (technically you can discard the codebooks as well, but only after you've read the floor configuration in).
Granule Position and Sample Position are interchangable terms in Vorbis.
To calculate the number of samples in a packet, add the current packet's block size to the previous packet's block size, then divide by 4. There are two exceptions to this: the first audio packet is empty (0 samples), and the last audio packet's sample count is calculated by subtracting the second-to-last page's granule position from the last page's granule position. (A short sketch of this arithmetic follows the list below.)
To calculate the last sample position of a packet, use the following logic:
The first audio packet in the stream is 0.
The last full audio packet in a page is the page's granule position (including the last page)
Packets in the middle of a page are calculated from the page's granule position. Start at the granule position of the last full audio packet in the page, then subtract the number of samples in each packet after the one you are calculating for.
If you need the initial position of a packet, use the granule position of the previous packet.
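Putting the pieces together, here's a minimal sketch of the arithmetic (plain C; the names are mine, and the times are expressed in Media Foundation's 100-nanosecond units, which is what IMFSample expects):
#include <stdint.h>

// Samples produced by overlap-adding two adjacent Vorbis blocks,
// per the rule described above: (prev + cur) / 4.
uint64_t samples_in_packet(uint32_t prev_blocksize, uint32_t cur_blocksize) {
    return ((uint64_t)prev_blocksize + cur_blocksize) / 4;
}

// Convert a sample count or position to 100 ns units for
// IMFSample::SetSampleTime / SetSampleDuration.
int64_t samples_to_hns(uint64_t samples, uint32_t sample_rate) {
    return (int64_t)(samples * 10000000ULL / sample_rate);
}

// Usage:
//   sample_time     = samples_to_hns(first_sample_of_packet, rate);
//   sample_duration = samples_to_hns(samples_in_packet(prev, cur), rate);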
If you need an example of how this is all done, you might try reading through this one (public domain, C). If that doesn't help, I have a from-scratch implementation in C# that I can link to.
