How to calculate sample time and sample duration when decoding an Ogg file

I have created an Ogg decoder in Media Foundation.
I have read some packets as a sample (compressed data), and now I need to know the sample's time and the sample's duration.
I know the AvgBytesPerSec and SamplesPerSec and so on, but those parameters apply to uncompressed data.
So how can I get an IMFSample's time and duration from the compressed data?

Before answering, I'll assume you already know a few things:
How to read the Vorbis setup packets (1st and 3rd in the stream):
Sample Rate
Decoding parameters (specifically the block sizes and modes)
How to read Vorbis audio packet headers:
Validation bit
Mode selection
How to calculate the current timestamp for uncompressed PCM data based on sample number.
How to calculate the duration of a buffer of uncompressed PCM data based on sample count.
The Vorbis Specification should help with the first two. Since you are not decoding the audio, you can safely discard the time, floor, residue, and mapping configuration after you've read it in (technically you can discard the codebooks as well, but only after you've read the floor configuration in).
Granule Position and Sample Position are interchangeable terms in Vorbis.
To calculate the number of samples in a packet, add the current packet's block size to the previous packet's block size, then divide by 4. There are two exceptions to this: the first audio packet is empty (0 samples), and the last audio packet's sample count is calculated by subtracting the second-to-last page's granule position from the last page's granule position.
To calculate the last sample position of a packet, use the following logic:
The first audio packet in the stream is 0.
The last full audio packet in a page is the page's granule position (including the last page)
Packets in the middle of a page are calculated from the page's granule position. Start at the granule position of the last full audio packet in the page, then subtract the number of samples in each packet after the one you are calculating for.
If you need the initial position of a packet, use the granule position of the previous packet.
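If it helps, here is a minimal sketch of how these rules might translate into code on the Media Foundation side. It assumes you already track the previous and current block sizes and the packet's starting granule position; the helper names are mine, and 10,000,000 is simply the number of 100-nanosecond units per second that IMFSample times use.

#include <cstdint>

// Samples contributed by one Vorbis audio packet (hypothetical helper).
// prevBlockSize / curBlockSize come from the modes selected in the packet headers.
// Remember the two exceptions above: the first audio packet contributes 0 samples,
// and the last one is trimmed using the final page's granule position.
uint64_t SamplesInPacket(uint32_t prevBlockSize, uint32_t curBlockSize)
{
    return (static_cast<uint64_t>(prevBlockSize) + curBlockSize) / 4;
}

// Convert a sample position or sample count to Media Foundation's 100-ns units.
int64_t SamplesToMfTime(uint64_t samples, uint32_t sampleRate)
{
    return static_cast<int64_t>(samples * 10000000ULL / sampleRate);
}

// Usage sketch, where startGranule is the granule position of the previous packet:
//   pSample->SetSampleTime(SamplesToMfTime(startGranule, sampleRate));
//   pSample->SetSampleDuration(SamplesToMfTime(SamplesInPacket(prevBs, curBs), sampleRate));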
If you need an example of how this is all done, you might try reading through this one (public domain, C). If that doesn't help, I have a from-scratch implementation in C# that I can link to.

Related

get part of mp3 using http Range header (with seconds)

I have MP3 files, and I have the time in seconds.milliseconds of the part I want.
How can I get that part using the Range header in an HTTP call?
This is the file:
https://mirrors.quranicaudio.com/muqri/alghamidi/mp3/001.mp3
and the time I want is from second 3.500 to second 6.526.
You can't with the only interoperable range unit, which is "bytes".
That said, if the MP3 has a fixed encoding rate you might be able to compute an approximation of the byte offset.
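As a rough illustration of that approximation (everything here is an assumption: a constant bitrate of 128 kbit/s and no ID3 tag at the start of the file; a real file's bitrate should be read from its frame headers):

#include <cstdint>
#include <cstdio>

int main()
{
    const double bitrateBps = 128000.0;   // assumed constant bitrate, in bits per second
    const double startSec = 3.500;
    const double endSec   = 6.526;

    // byte offset ~= seconds * bitrate / 8; a leading ID3 tag would shift both offsets
    const uint64_t startByte = static_cast<uint64_t>(startSec * bitrateBps / 8.0);
    const uint64_t endByte   = static_cast<uint64_t>(endSec   * bitrateBps / 8.0);

    // Would be requested as: Range: bytes=<startByte>-<endByte>
    std::printf("Range: bytes=%llu-%llu\n",
                static_cast<unsigned long long>(startByte),
                static_cast<unsigned long long>(endByte));
    return 0;
}

The offsets will not land exactly on MP3 frame boundaries, so in practice you would fetch a little extra on each side and trim to the nearest frame.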

Change time value of a packet in a pcap file manually

I want to change the time value of a packet in a pcap file. When we open a pcap in Wireshark, we see a timestamp value in the 2nd column, after the packet serial numbers.
I want to change the time value of the packets. I am able to do this, but I run into the following problem.
Let's say the current time value shown for the packet in Wireshark is 0.960727.
I want to add 100000 to this timestamp.
The new timestamp for the packet then becomes 0.1060727, which ideally should be 1.060727.
If you open any pcap file in Wireshark, you will never find a time value with more than 6 digits after the decimal point.
But when I add this value, I get 7 digits after the decimal point.
Could anyone please let me know how I can make the time value 1.060727 instead of 0.1060727?
Thank you for your suggestions here.
Regards,
Som
It's not completely clear to me what you're trying to do, but I'm going to take a guess that you are attempting to manually modify the timestamp of a single packet by editing the binary capture file. Assuming the file format is a .pcap file, then I suppose you're attempting to add 100000 microseconds to the timestamp of one particular packet?
Assuming this is the case, you need to locate the packet header's ts_sec and ts_usec values and add 0x000186a0 microseconds to the current ts_usec value; but if the result exceeds 0x000f423f (i.e., it is greater than or equal to 1 second), then you should add 0x00000001 to the ts_sec value and subtract 0x000f4240 from the newly computed ts_usec value.
One important thing to keep in mind is whether the .pcap file is written in big-endian or little-endian format. This is determined by the so-called magic number (0xa1b2c3d4 implying big-endian and 0xd4c3b2a1 implying little-endian). Make sure you perform the addition/subtraction using the correct byte order of the ts_usec and ts_sec values, and make sure you write the bytes back to the ts_usec and ts_sec fields in the expected byte order of the file; otherwise your resulting timestamp will not be correct.
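For illustration, here is a minimal sketch of just that carry arithmetic; file I/O and byte-order conversion are left out, and it assumes the two 32-bit fields have already been read into host byte order.

#include <cstdint>

// Add delta_usec microseconds (e.g. 100000 = 0x000186a0) to a classic pcap
// per-packet timestamp, carrying any overflow of the microseconds field
// into the seconds field.
void AddMicroseconds(uint32_t& ts_sec, uint32_t& ts_usec, uint32_t delta_usec)
{
    ts_usec += delta_usec;
    if (ts_usec >= 1000000u)          // 0x000f4240 microseconds = 1 second
    {
        ts_sec  += ts_usec / 1000000u;
        ts_usec %= 1000000u;
    }
    // Write both fields back to the file in its original byte order.
}

With the numbers from the question: 960727 + 100000 = 1060727 microseconds, which carries one second into ts_sec and leaves ts_usec = 60727, i.e. a displayed time of 1.060727 rather than 0.1060727.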
If this is not what you're attempting to do, then please clarify exactly what it is that you are attempting to do.

Does TCP scale to fast networks?

It seems the maximum TCP receive window size is 1GB (when scaling is used). So then the largest RTT that would still make it possible to fill a 100Gb pipe with one connection is 40ms (because 2 * 40E-3 * 100E9 / 8 = 1GB). That would limit that sort of communication speed to a distance in the region of 10,000 kilometres.
Another scaling problem seems to be that 32-bit sequence numbers don't offer protection against duplicated packets delayed by more than about 400ms (because they wrap around in that amount of time). They also limit the window size to 2GB (because they need to be split between the sender and receiver window).
Three questions:
I am aware of TCP timestamps that can help solve the problem of sequence numbers, but I would like to know if that is a feature that just happens to help but was really designed for some other purpose. Also, I don't understand what it is that timestamps achieve that could not be done simply by increasing the number of bits used for sequence numbers.
I don't understand why the maximum receive window is just 1GB as opposed to 2GB that would presumably be trivially possible with the current headers.
Finally, I would like to know if TCP already scales well enough to be used over the sort of links that are supposedly coming soon.
Many thanks.
The TCP features you're talking about were specified in RFC 1323 in the early 1990s. The limitations you're encountering are justified by discussion text in the RFC:
The sequence number appears in the middle of the TCP segment header and could not have been lengthened without an incompatible change.
Using timestamps allows for the protocol to simultaneously measure round-trip time and protect against wrapped sequence numbers. Making the sequence number bigger would not provide any information about round-trip time.
You need the timestamps in order to measure round-trip time accurately. Measuring round-trip time without timestamps is a sampling problem, and the sampling becomes unsolvable due to aliasing if you get more than 1 error per window.
A 1 GB receive window is the largest that can be kept in sync across the connection. The RFC explains it about as well as can be done:
TCP determines if a data segment is "old" or "new" by testing
whether its sequence number is within 2**31 bytes of the left edge
of the window, and if it is not, discarding the data as "old". To
insure that new data is never mistakenly considered old and vice-
versa, the left edge of the sender's window has to be at most
2**31 away from the right edge of the receiver's window.
Similarly with the sender's right edge and receiver's left edge.
Since the right and left edges of either the sender's or
receiver's window differ by the window size, and since the sender
and receiver windows can be out of phase by at most the window
size, the above constraints imply that 2 * the max window size
must be less than 2**31, or
max window < 2**30
As Jonathon mentioned earlier, these limitations are per-TCP connection. It's tough to think of a scenario where a single application could reach the limits of a single TCP connection, and tougher to think of one where the application couldn't open additional connection(s) if needed.
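To put numbers on this, here is a small back-of-the-envelope sketch of the bandwidth-delay product and the window-scale shift (from RFC 1323, since updated by RFC 7323) it would require; the 100 Gbit/s and 80 ms (2 * 40 ms) figures are the ones from the question.

#include <cstdio>

int main()
{
    const double bandwidthBps = 100e9;   // 100 Gbit/s
    const double rttSeconds   = 0.080;   // 80 ms round trip

    // Bytes that must be in flight to keep the pipe full.
    const double bdpBytes = bandwidthBps * rttSeconds / 8.0;   // = 1e9 bytes

    // The advertised 16-bit window is shifted left by the window-scale option;
    // the shift is capped at 14, so the maximum window is 65535 << 14, about 1 GiB.
    int shift = 0;
    while (shift < 14 && 65535.0 * (1 << shift) < bdpBytes)
        ++shift;

    std::printf("BDP = %.0f bytes, window-scale shift needed = %d\n", bdpBytes, shift);
    return 0;
}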

What is the difference between the delay and the jitter in the context of real time applications?

According to Wikipedia, jitter is the undesired deviation from true periodicity of an assumed periodic signal; according to a paper on QoS that I am reading, jitter is referred to as delay variation. Is there a definition of jitter in the context of real-time applications? Are there applications that are sensitive to jitter but not sensitive to delay? If, for example, a streaming application uses some kind of buffer to store packets before showing them to the user, is it possible that this application is not sensitive to delay but is sensitive to jitter?
Delay: the amount of time data (a signal) takes to reach the destination. A higher delay generally means congestion or some sort of break in the communication link.
Jitter: the variation in delay. This happens when a system is not in a deterministic state; e.g. video streaming suffers from jitter a lot because the amount of data transferred is quite large, so there is no way of saying how long it might take to transfer.
If your application is sensitive to jitter it is definitely sensitive to delay.
In the Real-time Transport Protocol (RTP, RFC 3550), the header contains a timestamp field. Its value usually comes from a monotonically incremented counter, and the frequency of the increments is the clock rate. This clock rate must be the same for every participant that wants to do anything with the timestamp field. The counters have different base offsets, because the start times differ, or for security reasons, etc. All in all, we say the clocks are not synchronized.
To show this with an example, let snd_timestamp and rcv_timestamp refer to the most recent packet's sender timestamp from the RTP header field and the receiver timestamp generated by the receiver using the same clock rate.
The wrong conclusion is that
delay_in_timestamp_unit = rcv_timestamp - snd_timestamp
Since the receiver and sender clocks have different base offsets (and they do), this does not give you the delay; it also does not account for the wrap-around of the 32-bit unsigned integer.
But monitoring the packet delivery time is necessary if we want a proper playout adaptation algorithm or if we want to detect and avoid congestion.
Also note that even if we had synchronized clocks, delay_in_timestamp_unit might not exactly represent the pure network delay, because components on the sender and/or receiver side may retain packets after the timestamp is added and/or before it is examined. So if you calculate a 2-second delay between the participants, but you know your network delay is around 100 ms, then your packets suffer additional delays on the sender and/or receiver side. But that additional delay is (or at least you hope it is) constant, so the only delay that changes over time is, hopefully, the network delay. So you should not say "if the packet delay > 500 ms then we have congestion", because you have no idea what the actual network delay is if you use only a single packet's sender and receiver timestamps.
But the difference between the delays of two consecutive packets might give you some information about whether something is wrong in the network or not.
diff_delay = delay_t0 - delay_t1
If diff_delay equals 0, the delay is the same; if it is greater than 0, the newly arrived packet needed more time than the previous one; and if it is smaller than 0, it needed less time.
From that relative information, based on two consecutive delays, you can say something.
How do you determine the difference between two delays if the clocks are not synchronized?
Suppose you stored the previous timestamps in rcv_timestamp_t1 and snd_timestamp_t1:
diff_delay = (rcv_timestamp_t0 - snd_timestamp_t0) - (rcv_timestamp_t1 - snd_timestamp_t1)
but that would be a problem without knowing the base offsets of the sender and the receiver, so reorder it:
diff_delay = (rcv_timestamp_t0 - rcv_timestamp_t1) - (snd_timestamp_t0 - snd_timestamp_t1)
Here you can subtract the rcv timestamps from each other, which eliminates the receiver's base offset, and likewise for the snd timestamps; subtracting snd_diff from rcv_diff then gives you the difference between the delays of two consecutive packets, in units of the clock rate.
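A small sketch of that bookkeeping (the function name is mine; both timestamps are assumed to use the same clock rate, and doing the subtractions in unsigned 32-bit arithmetic keeps the result correct across timestamp wrap-around):

#include <cstdint>

// Difference between the delays of two consecutive packets, in clock-rate units.
// t1 is the previous packet, t0 the current one.
int32_t DiffDelay(uint32_t snd_t0, uint32_t rcv_t0,
                  uint32_t snd_t1, uint32_t rcv_t1)
{
    const uint32_t rcv_diff = rcv_t0 - rcv_t1;   // receiver base offset cancels out
    const uint32_t snd_diff = snd_t0 - snd_t1;   // sender base offset cancels out
    return static_cast<int32_t>(rcv_diff - snd_diff);
}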
Now, according to RFC 3550, jitter is "An estimate of the statistical variance of the RTP data packet interarrival time".
To finally get to the point, your question was:
"What is the difference between the delay and the jitter in the context of real time applications?"
A tiny note: "real-time applications" usually refers to systems processing data on the order of nanoseconds, so I think you are referring to end-to-end systems.
Also, despite the several variations of the definition of jitter, they all use the difference between the delays of arriving packets and thus give you information about the relative changes of the network delay, whereas delay itself is an absolute value of the delivery time.
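For completeness, RFC 3550 (section 6.4.1) turns exactly this per-packet delay difference into its interarrival jitter estimate with a running filter; a sketch, reusing the DiffDelay idea from above (the 1/16 gain is the one given in the RFC):

#include <cstdint>
#include <cstdlib>

// J(i) = J(i-1) + (|D(i-1,i)| - J(i-1)) / 16
double UpdateJitter(double previousJitter, int32_t diffDelay)
{
    return previousJitter + (std::abs(diffDelay) - previousJitter) / 16.0;
}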

Streaming Video On Lossy Network

Currently I have a GStreamer stream being sent over a wireless network. I have a hardware encoder that converts raw, uncompressed video into an MPEG-2 Transport Stream with H.264 encoding. From there, I pass the data to a GStreamer pipeline that sends the stream out over RTP. Everything works and I'm seeing video; however, I was wondering if there is a way to limit the effects of packet loss by tuning certain parameters on the encoder.
The two main parameters I'm looking at are the GOP Size and the I frame rate. Both are summarized in the documentation for the encoder (a Sensoray 2253) as follows:
V4L2_CID_MPEG_VIDEO_GOP_SIZE:
Integer range 0 to 30. The default setting of 0 means to use the codec default
GOP size. Capture only.
V4L2_CID_MPEG_VIDEO_H264_I_PERIOD:
Integer range 0 to 100. Only for H.264 encoding. Default setting of 0 will
encode first frame as IDR only, otherwise encode IDR at first frame of
every Nth GOP.
Basically, I'm trying to give the decoder as good of a chance as possible to create a smooth video playback, even given the fact that the network may drop packets. Will increasing the I frame rate do this? Namely, since the I frame doesn't have data relative to previous or future packets, will sending the "full" image help? What would be the "ideal" setting for the two above parameters given the fact that the data is being sent across a lossy network? Note that I can accept a slight (~10%) increase in bandwidth if it means the video is smoother than it is now.
I also understand that this is highly decoder dependent, so for the sake of argument let's say that my main decoder on the client side is VLC.
Thanks in advance for all the help.
Increasing the number of I-frames will help the decoder recover more quickly. You may also want to look at limiting the bandwidth of the stream, since that makes it more likely the data gets through. You'll need to watch the data size, though, because your video quality can suffer greatly: I-frames are considerably larger than P- or B-frames, and the encoder will continue to target the specified bitrate.
If you had some control over playback (even locally capturing the stream and retransmitting to VLC), you could add FEC (forward error correction), which would correct lost packets.
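In case it's useful, here is a hedged sketch of how the two controls from the question could be set through the standard V4L2 control interface; the device node and the chosen values (a GOP of 15 and an IDR at the start of every GOP) are placeholders, and the Sensoray driver may expose additional vendor-specific settings.

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>
#include <cstdint>
#include <cstdio>

static int set_ctrl(int fd, uint32_t id, int32_t value)
{
    struct v4l2_control ctrl = {};
    ctrl.id = id;
    ctrl.value = value;
    return ioctl(fd, VIDIOC_S_CTRL, &ctrl);   // returns 0 on success, -1 on error
}

int main()
{
    int fd = open("/dev/video0", O_RDWR);      // placeholder device node
    if (fd < 0) { std::perror("open"); return 1; }

    // A shorter GOP means more frequent I-frames, so the decoder can
    // resynchronize sooner after packet loss, at the cost of bitrate
    // (or quality, if the encoder holds the bitrate constant).
    set_ctrl(fd, V4L2_CID_MPEG_VIDEO_GOP_SIZE, 15);
    set_ctrl(fd, V4L2_CID_MPEG_VIDEO_H264_I_PERIOD, 1);

    close(fd);
    return 0;
}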
