Streaming Video On Lossy Network

Currently I have a GStreamer stream being sent over a wireless network. I have a hardware encoder that converts raw, uncompressed video into an MPEG-2 Transport Stream with H.264 encoding. From there, I pass the data to a GStreamer pipeline that sends the stream out over RTP. Everything works and I'm seeing video; however, I was wondering if there is a way to limit the effects of packet loss by tuning certain parameters on the encoder.
The two main parameters I'm looking at are the GOP size and the I-frame rate. Both are summarized in the documentation for the encoder (a Sensoray 2253) as follows:
V4L2_CID_MPEG_VIDEO_GOP_SIZE:
Integer range 0 to 30. The default setting of 0 means to use the codec default
GOP size. Capture only.
V4L2_CID_MPEG_VIDEO_H264_I_PERIOD:
Integer range 0 to 100. Only for H.264 encoding. Default setting of 0 will
encode first frame as IDR only, otherwise encode IDR at first frame of
every Nth GOP.
Basically, I'm trying to give the decoder as good a chance as possible to produce smooth video playback, even given the fact that the network may drop packets. Will increasing the I-frame rate do this? Namely, since an I-frame doesn't depend on data from previous or future frames, will sending the "full" image more often help? What would be the "ideal" settings for the two parameters above, given that the data is being sent across a lossy network? Note that I can accept a slight (~10%) increase in bandwidth if it means the video is smoother than it is now.
I also understand that this is highly decoder dependent, so for the sake of argument let's say that my main decoder on the client side is VLC.
Thanks in advance for all the help.

Increasing the number of I-frames will help the decoder recover more quickly. You may also want to look at limiting the bandwidth of the stream, since that makes it more likely the data gets through. Watch the data size, though: because I-frames are considerably larger than P- or B-frames and the encoder will keep targeting the specified bitrate, your video quality can suffer greatly.
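If you want to experiment with those two controls from code, here is a minimal sketch of setting them through the V4L2 control interface. The device node, the chosen values, and the use of VIDIOC_S_CTRL are assumptions; some drivers expect MPEG-class controls to go through VIDIOC_S_EXT_CTRLS instead, and the Sensoray documentation is the authority on accepted ranges.

```cpp
// Minimal sketch: set GOP size and H.264 IDR period on a V4L2 encoder.
// Device node and values are examples only; check what your driver accepts.
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/videodev2.h>
#include <cstdint>
#include <cstdio>

static int set_ctrl(int fd, uint32_t id, int32_t value) {
    v4l2_control ctrl{};
    ctrl.id = id;
    ctrl.value = value;
    if (ioctl(fd, VIDIOC_S_CTRL, &ctrl) < 0) {
        perror("VIDIOC_S_CTRL");
        return -1;
    }
    return 0;
}

int main() {
    int fd = open("/dev/video0", O_RDWR);   // hypothetical device node
    if (fd < 0) { perror("open"); return 1; }

    // Shorter GOP -> more frequent I-frames -> faster recovery after loss,
    // at the cost of quality for a fixed target bitrate.
    set_ctrl(fd, V4L2_CID_MPEG_VIDEO_GOP_SIZE, 15);      // example value
    set_ctrl(fd, V4L2_CID_MPEG_VIDEO_H264_I_PERIOD, 1);  // IDR every GOP (example)

    close(fd);
    return 0;
}
```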
If you have some control over playback (even locally capturing the stream and retransmitting it to VLC), you could add FEC (forward error correction), which would let the receiver recover lost packets.

Related

Muxing non-synchronised streams to Haali

I have 2 input streams of data that are being passed to a Haali Muxer (mp4 format).
Currently I stream these to Haali directly in a DirectShow graph without a clock. I wondered if I should be trying to write these to the muxer synchronised, or whether it happily accepts a stream of audio data that stops before the video data stream stops. (I have issues with the output file not playing audio after seeking, and I'm not sure why this could occur)
I can't find much in the way of documentation for muxing with the Haali muxer, does anyone know the best place to look for info on this filter?
To have the streams multiplexed into a single MP4 file you need a single instance of the multiplexer (Haali, GDCL, a commercial one, a wrapper over the mp4v2 library, over a Media Foundation sink, etc.) with two (or more) input pins connected to the respective sources, which in turn will be written as tracks.
The filter graph clock does not matter. The clock is for presentation, and file writers accept incoming data and write it out as soon as possible anyway. It is fine to remove the clock, as you seem to already be doing, but keeping a standard clock would not make a difference.
Data is synchronized using the time stamps on individual media samples, the units of the media streams. The multiplexer builds an internal queue for every stream and then consumes data from the queues to build a single file in which the original stream data is interleaved. If one stream supplies data too early while another stream supplies data slowly, the multiplexer blocks further reception on the fast stream by not returning from the respective processing call (IMemInputPin::Receive), expecting that during this wait the slow stream delivers additional input. Ultimately, what the multiplexer looks at when matching data from different streams is the time stamps.
To obtain synchronized data in the resulting MP4 file, you therefore need to make sure the payload data is properly time stamped; the multiplexer will take care of the rest.
This also means the time stamps should be monotonically increasing within a stream, and key frames/splice points should be indicated accordingly. Otherwise some multiplexers will fail immediately, while others will produce an output file that has playback issues (especially with seeking).
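As a rough illustration of "properly time stamped" (not specific to Haali), stamping a sample before handing it to the muxer's input pin could look like the sketch below; the helper name, the key-frame flag, and the way you derive rtStart/rtStop from your source are placeholders.

```cpp
// Sketch: stamping a media sample before delivering it to the muxer input pin.
// rtStart/rtStop are in 100 ns units of stream time; deriving them correctly
// from your capture source is the part that actually matters.
#include <dshow.h>

HRESULT DeliverStampedSample(IMemInputPin *pMuxInput, IMediaSample *pSample,
                             REFERENCE_TIME rtStart, REFERENCE_TIME rtStop,
                             BOOL bKeyFrame)
{
    // Time stamps must be monotonically increasing within the stream.
    HRESULT hr = pSample->SetTime(&rtStart, &rtStop);
    if (FAILED(hr))
        return hr;

    // Mark key frames / splice points so the muxer can index and seek correctly.
    pSample->SetSyncPoint(bKeyFrame);
    pSample->SetDiscontinuity(FALSE);

    return pMuxInput->Receive(pSample);
}
```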

TCP buffering on Linux

I have a peripheral over USB that is sending data samples at a rate of 183 Mbit/s. I would like to send this data over Ethernet, which is limited to < 100 Mbit/s. Is it possible to send this data without overflow (i.e. missing data) by increasing the TCP socket buffer?
It also depends on the receive window size. Even if 100 Mbit/s of capacity is available, the sender will only push data according to the window the receiver advertises. Without window scaling, the TCP window can only go up to 64 KB. In your case that is not sufficient: the gap between 183 Mbit/s in and 100 Mbit/s out means the backlog grows by roughly 10 MB every second, so you need at least that order of buffering. Windows 7 and newer, and modern Linux kernels, enable TCP window scaling by default, which can extend the window up to 1 GB. With window scaling enabled, you can increase the socket buffer to a larger size, say 50 MB, which should provide the required buffering.
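A rough sketch of the receiving side of that, assuming Linux and a plain BSD socket; the size is only an example, the kernel caps the grant at net.core.rmem_max, and SO_RCVBUF has to be set before connect()/listen() for the scaled window to be advertised.

```cpp
// Sketch: enlarging the TCP receive buffer so a large (scaled) window can
// be advertised. Linux reports back roughly double the requested value.
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) { perror("socket"); return 1; }

    int rcvbuf = 50 * 1024 * 1024;  // example value from the answer above
    // Must be set before listen()/connect() for window scaling to take effect.
    if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0)
        perror("setsockopt(SO_RCVBUF)");

    socklen_t len = sizeof(rcvbuf);
    getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &len);
    printf("effective SO_RCVBUF: %d bytes\n", rcvbuf);

    close(sock);
    return 0;
}
```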
The short answer is, it depends.
Increasing buffers (at the transmitter) can help if the data is bursty. If the average rate is below 100 Mbit/s (actually less, since you need to allow for network contention and protocol overhead), then buffering can help. You can do this by increasing the size of the TCP stack's buffers, or by buffering inside your application.
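For the TCP-stack route, a minimal per-socket sketch on Linux; the 50 MB value only mirrors the example in the other answer, and the kernel caps the grant at net.core.wmem_max while reporting back roughly double what you asked for.

```cpp
// Sketch: enlarging the TCP send buffer on the transmitting side (Linux).
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) { perror("socket"); return 1; }

    int sndbuf = 50 * 1024 * 1024;  // example: ask for ~50 MB of send buffering
    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf)) < 0)
        perror("setsockopt(SO_SNDBUF)");

    // Verify what the kernel actually granted (capped by net.core.wmem_max).
    socklen_t len = sizeof(sndbuf);
    getsockopt(sock, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len);
    printf("effective SO_SNDBUF: %d bytes\n", sndbuf);

    close(sock);
    return 0;
}
```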
If the data isn't bursty, or the average is still too high, you might need to compress the data before transmission. Depending on the nature of the data, you may be able to achieve significant compression.

Why does GetDeliveryBuffer block with an INTERLEAVE_CAPTURE mode AVI Mux?

I'm trying to use a customized filter to receive video and audio data from an RTSP stream and deliver the samples downstream in the graph.
It seems that this filter was modified from the SDK's source.cpp sample (CSource) and implements two output pins for audio and video.
When the filter is connected directly to an AVI Mux filter in INTERLEAVE_NONE mode, it works fine.
However, when the interleave mode of the AVI Mux is set to INTERLEAVE_CAPTURE, the video output pin hangs in the GetDeliveryBuffer call (in DoBufferProcessingLoop) of this filter after several samples have been sent, while the audio output pin still works fine.
Moreover, when I inserted an Infinite Pin Tee filter into one of the paths between the AVI Mux and this source filter, the graph arbitrarily went into the stopped state after some samples had been sent (one to three samples or so).
And when I put a filter that is just an empty transform-in-place filter, which does nothing, after the infinite tee, the graph went back to the first case: it never stops, but hangs in GetDeliveryBuffer.
(Here is an image that shows the connections I've mentioned.)
So here are my questions:
1: What could be the reason that the video output pin hangs in GetDeliveryBuffer?
My guess is that the AVI Mux grabs these sample buffers and does not release them until it has enough of them for interleaving, but even when I set the number of video buffers to 30 in DecideBufferSize it still hangs. If that is indeed the reason, how do I decide the buffer count of the pin for a downstream AVI muxer?
Creating more than 50 buffers on a video pin is probably not guaranteed to work, because that much memory cannot be promised. :(
2: Why does the graph go to the stopped state when the Infinite Pin Tee is inserted? And why does a no-operation filter overcome that?
Any answer or suggestion is appreciated. Or I hope someone can at least give me some directions. Thanks.
A blocked GetDeliveryBuffer means the allocator you are requesting a buffer from does not [yet] have anything for you: all of its media samples are outstanding and have not yet been returned to the allocator.
An obvious workaround is to request more buffers at the pin connection / memory allocator negotiation stage. This, however, just postpones the issue, which can reappear later for the same reason.
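For reference, that request is made in the output pin's DecideBufferSize override. A minimal sketch, where the class name, m_maxFrameSize, and the counts are placeholders, and the pin is assumed to derive from the DirectShow base classes (CSourceStream/CBaseOutputPin):

```cpp
// Sketch: asking the allocator for more/larger buffers during negotiation.
// Counts and sizes are arbitrary examples; the allocator may grant less.
#include <streams.h>  // DirectShow base classes

HRESULT CMyVideoOutputPin::DecideBufferSize(IMemAllocator *pAlloc,
                                            ALLOCATOR_PROPERTIES *pRequest)
{
    if (pRequest->cBuffers < 30)
        pRequest->cBuffers = 30;             // more samples in flight
    if (pRequest->cbBuffer < m_maxFrameSize)
        pRequest->cbBuffer = m_maxFrameSize; // placeholder: one full frame per buffer

    ALLOCATOR_PROPERTIES actual;
    HRESULT hr = pAlloc->SetProperties(pRequest, &actual);
    if (FAILED(hr))
        return hr;

    // The allocator is free to give us less than we asked for.
    return (actual.cBuffers < pRequest->cBuffers ||
            actual.cbBuffer < pRequest->cbBuffer) ? E_FAIL : S_OK;
}
```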
A typical issue with the topology in question is threading. A multiplexer filter with two inputs has to match the input streams to produce a joint file. Quite often at runtime it will be holding media samples on one leg while expecting more media samples to arrive on the other leg, on another thread. It assumes that the upstream branches providing media samples run independently, so that a lock on one leg does not lock the other. This is why a multiplexer can freely block IMemInputPin::Receive calls and/or hold media samples inside. In the topology above it is not clear how exactly the source filter handles its threading. The fact that it has two pins makes me suspect it has threading issues and is not taking into account that there might be a lock downstream at the multiplexer.
Presumably the source filter is yours and you have its source code. You want to make sure the audio pin sends its media samples on a separate thread, for example through an asynchronous queue.
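A minimal, framework-agnostic sketch of that idea: the RTSP receive thread only enqueues samples and returns immediately, while a dedicated thread per output pin dequeues and delivers them, so a block on the video leg cannot stall the audio leg. The Sample type and the deliver callback are placeholders for your real media sample and the pin's actual Deliver() call.

```cpp
// Sketch: decoupling reception from delivery with an asynchronous queue.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

struct Sample { /* placeholder: compressed frame + time stamps */ };

class DeliveryQueue {
public:
    explicit DeliveryQueue(std::function<void(Sample)> deliver)
        : deliver_(std::move(deliver)), worker_([this] { Run(); }) {}

    ~DeliveryQueue() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_all();
        worker_.join();
    }

    // Called from the RTSP receive thread; never blocks on the graph.
    void Push(Sample s) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(s)); }
        cv_.notify_one();
    }

private:
    void Run() {
        for (;;) {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [this] { return done_ || !q_.empty(); });
            if (done_ && q_.empty()) return;
            Sample s = std::move(q_.front());
            q_.pop();
            lk.unlock();
            deliver_(std::move(s));  // may block on the muxer; only this leg stalls
        }
    }

    std::function<void(Sample)> deliver_;
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Sample> q_;
    bool done_ = false;
    std::thread worker_;  // declared last so the other members exist when Run starts
};
```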

Is there a good way to frame a protocol so data corruption can be detected in every case?

Background: I've spent a while working with a variety of device interfaces and have seen a lot of protocols, many serial and UDP, in which data integrity is handled at the application protocol level. I've been seeking to improve my receive routines' handling of protocols in general, and considering the "ideal" design of a protocol.
My question is: is there any protocol framing scheme out there that can definitively identify corrupt data in all cases? For example, consider the standard framing scheme of many protocols:
Field: Length in bytes
<SOH>: 1
<other framing information>: arbitrary, but fixed for a given protocol
<length>: 1 or 2
<data payload etc.>: based on length field (above)
<checksum/CRC>: 1 or 2
<ETX>: 1
For the vast majority of cases, this works fine. When you receive some data, you search for the SOH (or whatever your start byte sequence is), move forward a fixed number of bytes to your length field, and then move that number of bytes (plus or minus some fixed offset) to the end of the packet to your CRC, and if that checks out you know you have a valid packet. If you don't have enough bytes in your input buffer to find an SOH or to have a CRC based on the length field, then you wait until you receive enough to check the CRC. Disregarding CRC collisions (not much we can do about that), this guarantees that your packet is well formed and uncorrupted.
However, if the length field itself is corrupt and has a high value (which I'm running into), then you can't check the (corrupt) packet's CRC until you fill up your input buffer with enough bytes to meet the corrupt length field's requirement.
So is there a deterministic way to get around this, either in the receive handler or in the protocol design itself? I can set a maximum packet length or a timeout to flush my receive buffer in the receive handler, which should solve the problem on a practical level, but I'm still wondering if there's a "pure" theoretical solution that works for the general case and doesn't require setting implementation-specific maximum lengths or timeouts.
Thanks!
The reason why all protocols I know of, including those handling "streaming" data, chop up the data stream into smaller transmission units, each with its own checks on board, is exactly to avoid the problems you describe. The fundamental flaw in your protocol design is probably that the blocks are too big.
The accepted answer to this SO question contains a good explanation and a link to a very interesting (but rather math-heavy) paper on the subject.
So in short, you should stick to smaller transmission units, not only for practical programming-related reasons but also because of the message length's role in determining the protection offered by your CRC.
One way would be to encode the length parameter so that corruption of it can be detected easily, saving you from reading in a large buffer just to check the CRC.
For example, the XModem protocol embeds an 8-bit packet number followed by its one's complement.
It would mean doubling the size of your length field, but it's an option.
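A tiny sketch of that idea applied to a length field (the framing here is illustrative, not XModem itself): transmit the length together with its one's complement and reject the header immediately if the two disagree, instead of waiting for a possibly bogus number of payload bytes.

```cpp
// Sketch: validating a 1-byte length field by pairing it with its
// one's complement, XModem-style. Header layout is illustrative only.
#include <cstdint>
#include <cstddef>

// Returns the payload length if the header looks sane, or -1 if the
// length byte and its complement disagree (i.e. the header is corrupt).
int validated_length(const uint8_t *hdr, size_t hdrLen)
{
    if (hdrLen < 2) return -1;            // need length + complement
    uint8_t len  = hdr[0];
    uint8_t comp = hdr[1];
    if ((uint8_t)(len ^ comp) != 0xFF)    // complement check failed
        return -1;                        // resync instead of buffering 'len' bytes
    return len;
}
```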

64b/66b encoding

There are a few things I don't understand about 64b/66b encoding, and I have failed to find the answers on the web. Any help/links would be greatly appreciated:
i) how is the start of a frame recognised? I don't think it can be by the initial 10/01 bits (called the preamble on Wikipedia), because you cannot tell them apart (if an idle link is 0, then 0000 10 and 000 01 0 look rather similar). I expect the end of a frame is indicated by a control word, with the rest of the bits perhaps used for the CRC?
ii) how do the scramblers synchronise, and how do they avoid scrambling the same packet the same way? Or to put this another way, why is it not possible for a malicious user to induce substantial packet loss by carefully choosing a bad message?
iii) this might have been answered in ii), but if a packet is sent to a switch, and then onto another host, is it scrambled the same way both times?
Once again, many thanks in advance
Layers
First of all, the OSI model needs to be clear.
The Ethernet frame is a data link layer construct, while 64b/66b encoding is part of the physical layer (more precisely, the PCS sublayer of the physical layer).
The physical layer doesn't know anything about the start of a frame; it sees only data. (The start of an Ethernet frame is just data bytes, which contain the preamble.)
64b/66b encoding
Now let's assume that the link is up and running.
In this case the idle link is not full of zeros. (If it were, the link wouldn't be self-synchronous.) Idle messages (idle characters and/or synchronization blocks, i.e. control information) are sent over the idle link. (Control information is encoded with the 0b10 preamble.) (This is also why the emitted spectrum and power dissipation don't depend on whether the link is idle or not.)
So the start of a new frame works like this:
The link is sending idle information (with the 0b10 preamble).
The upper layer (data link layer) hands the frame to the physical layer in 64-bit chunks of data.
The physical layer sends the data (with the 0b01 preamble) over the link.
(Note that the physical layer frequently inserts control (sync) symbols into the raw frame, even during a data burst.)
Synchronization
Before data transmission, a 64b/66b-encoded lane must be initialized. This initialization includes lane initialization and block synchronization. Xilinx's Aurora specification (p. 34) gives an example of link initialization.
Briefly, the receiver tries to match the sync header at different bit positions, and when it matches multiple times in a row it reports link-up.
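A very rough sketch of that block-lock idea; the 64-consecutive-valid-headers threshold follows the IEEE 802.3 block lock state machine, but treat the exact counts and the slip mechanism here as simplifications.

```cpp
// Sketch: sliding the 66-bit alignment until the 2-bit sync header is valid
// many times in a row. 0b01 and 0b10 are legal headers; 0b00/0b11 are not.
#include <cstdint>

class BlockLock {
public:
    // Feed the 2-bit sync header of each candidate 66-bit block.
    // Returns true once lock has been declared.
    bool push_header(unsigned sh) {
        bool valid = (sh == 0b01) || (sh == 0b10);
        if (!valid) {
            good_ = 0;
            locked_ = false;
            slip_one_bit();                  // placeholder: shift alignment by one bit
            return locked_;
        }
        if (!locked_ && ++good_ >= kGoodToLock)
            locked_ = true;
        return locked_;
    }

private:
    void slip_one_bit() { /* tell the gearbox to shift the 66-bit window by 1 bit */ }
    static constexpr int kGoodToLock = 64;   // 802.3 uses 64 consecutive valid headers
    int good_ = 0;
    bool locked_ = false;
};
```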
Note that 64b/66b encoding uses a self-synchronous scrambler. This is why the scrambler itself doesn't need to know anything about where we are in the data stream: if you run a self-synchronous descrambler on the received stream for long enough, its state converges and it produces the decoded bit stream.
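To make the self-synchronizing property concrete, here is a bit-level sketch of the 10GBASE-R style scrambler and descrambler with polynomial x^58 + x^39 + 1. Bit ordering and framing are simplified, and only the 64 payload bits of each block are scrambled, never the 2-bit preamble/sync header.

```cpp
// Sketch: 64b/66b-style self-synchronous scrambler, polynomial x^58 + x^39 + 1.
#include <cstdint>

constexpr uint64_t MASK58 = (1ULL << 58) - 1;

struct Scrambler {
    uint64_t s = 0;                      // last 58 *transmitted* (scrambled) bits
    int bit(int in) {                    // scramble one payload bit
        int out = in ^ (int)((s >> 38) & 1) ^ (int)((s >> 57) & 1);
        s = ((s << 1) | (unsigned)out) & MASK58;
        return out;
    }
};

struct Descrambler {
    uint64_t s = 0;                      // last 58 *received* (still scrambled) bits
    int bit(int in) {                    // descramble one payload bit
        int out = in ^ (int)((s >> 38) & 1) ^ (int)((s >> 57) & 1);
        s = ((s << 1) | (unsigned)in) & MASK58;  // state is built from the wire bits
        return out;
    }
};

// Because the descrambler state is just the last 58 wire bits, no seed has to
// be exchanged: after 58 correctly received bits it is in sync, and a single
// bit error on the wire corrupts at most three descrambled bits.
```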
Maliciousness
Note that 64b/66b encoding is not encryption. This scrambling won't protect you from eavesdropping or tampering. (Encryption should be placed at a higher level of the OSI model.)
Same packet multiple times
Because the scrambler is in a different state/seed when you send the same packet a second time, the two encoded packets will differ. (Theoretically one could craft packets that drive the scrambler's shift register back to a previous state, but the interleaved control symbols have to be taken into account, so in practice this is impossible.)
