Difference between Frames and Packets in FFmpeg - codec

I am trying to decode an MPEG video file using LibAV. There are two terms which I am not able to grok properly, Frames and Packets.
As per my present understanding, frames are uncompressed video frames and packets are compressed frames.
Questions:
A packet has multiple frames, right?
Is a frame always contained within a single packet? I am referring to the case where half of the frame's information is in packet 1 and the other half in packet 2. Is that possible?
How will we know how many frames are in a packet in LibAV?

To answer your first and third questions:
According to the documentation for the AVPacket struct: "For video, it should typically contain one compressed frame. For audio it may contain several compressed frames."
The decode_video example gives this code, which reads all frames within a packet; you can also use it to count the frames:
static void decode(AVCodecContext *dec_ctx, AVFrame *frame, AVPacket *pkt,
                   const char *filename)
{
    char buf[1024];
    int ret;

    ret = avcodec_send_packet(dec_ctx, pkt);
    if (ret < 0) {
        fprintf(stderr, "Error sending a packet for decoding\n");
        exit(1);
    }

    while (ret >= 0) {
        ret = avcodec_receive_frame(dec_ctx, frame);
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
            return;
        else if (ret < 0) {
            fprintf(stderr, "Error during decoding\n");
            exit(1);
        }

        printf("saving frame %3d\n", dec_ctx->frame_number);
        fflush(stdout);

        /* the picture is allocated by the decoder. no need to free it */
        snprintf(buf, sizeof(buf), filename, dec_ctx->frame_number);
        pgm_save(frame->data[0], frame->linesize[0],
                 frame->width, frame->height, buf);
    }
}
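The example saves each frame it receives; to literally count the frames in a packet, the same send/receive loop reduces to a counter. Here is a minimal sketch of my own (not from the FFmpeg docs); note that decoders may buffer frames internally, so the frames received after sending one packet are not guaranteed to originate from exactly that packet:

#include <libavcodec/avcodec.h>

/* Count how many frames the decoder emits for one packet.
   Returns the count, or a negative AVERROR code on failure. */
static int count_frames_in_packet(AVCodecContext *dec_ctx, AVFrame *frame,
                                  const AVPacket *pkt)
{
    int count = 0;
    int ret = avcodec_send_packet(dec_ctx, pkt);
    if (ret < 0)
        return ret;
    for (;;) {
        ret = avcodec_receive_frame(dec_ctx, frame);
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
            return count;   /* decoder wants more input, or is fully drained */
        if (ret < 0)
            return ret;     /* genuine decoding error */
        count++;
    }
}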

Basically, frames are natural, while packets are artificial. 😉
Frames are substantial, packets are auxiliary – they help process a stream successively in smaller parts of acceptable size (instead of processing the stream as a whole). “Divide and conquer.”
A packet has multiple frames, right?
A packet may contain multiple (encoded) frames, or it may contain only one, possibly even an incomplete one.
Is a frame always contained within a single packet?
No. A frame may be spread over several packets: the first part of its encoded data at the end of one packet and the rest at the beginning of the next.
I am referring to the case where half of the frame's information is in packet 1 and the other half in packet 2. Is that possible?
Yes. That is exactly the case just described.
How will we know how many frames are in a packet in LibAV?
The number of frames per packet differs between multimedia files; it depends on how the particular stream was encoded.
Even within the same stream, different packets may contain different numbers of (encoded) frames.
There is no information in a packet about how many (encoded) frames it contains.
Frames in the same packet generally have different sizes, so a packet is not an array of equally-sized elements (frames); the only reliable way to count them is to decode the packet and count the frames the decoder emits.

Simply put, a packet is a block of data.
Packet size is generally determined by bandwidth. If the device has limited internet speed, or is a phone with a choppy signal, the packet size will be smaller. If it's a desktop with dedicated service, the packet size could be quite a bit larger.
A frame could be thought of as one cell of animation, but typically these days, due to compression, it's not an actual keyframe image, but simply the changes since the last complete keyframe. The encoder sends a keyframe, an actual image, once every few seconds or so, and every frame in between is just data specifying which pixels have changed since that image: the delta.
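To make the keyframe-plus-delta idea concrete, here is a toy sketch; it is not any real codec's format, and the struct and names are made up purely for illustration:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical delta format: a list of (offset, new value) pixel changes. */
struct pixel_change {
    size_t  offset;  /* index into the frame buffer */
    uint8_t value;   /* new value for that pixel */
};

/* Rebuild the next frame in place by applying a delta to the
   previously reconstructed frame (which starts out as a keyframe). */
static void apply_delta(uint8_t *frame, const struct pixel_change *changes,
                        size_t nchanges)
{
    for (size_t i = 0; i < nchanges; i++)
        frame[changes[i].offset] = changes[i].value;
}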
So yes, say your packet size is 1024 bytes: your resolution is limited to however many pixels' worth of changes that stream can carry. They might send one frame per packet to keep it simple, but I don't think anything absolutely guarantees that, as the data stream is reconstructed from packets, often arriving out of order, and the frame deltas are applied once all those packets are pieced together.
Audio takes up much less space than video, so they might only need to send one audio packet for every 50 video packets.
These guys did a few clips on video streams being recombined from packets, on their channel: https://www.youtube.com/watch?v=DkIhI59ysXI

Related

What to do if I see an Ethernet FCS in the middle of the data and at the end of the packet in Ethernet II, jumbo frames, and DIX

As we know, the FCS is the final part of the Ethernet frame structure, and by checking the FCS we can tell whether an error happened during data transfer.
I have a question about FCS checking in Ethernet II (DIX) frames: these frames don't carry a length field in their structure.
What should I do if I face a packet where a correct FCS appears in the middle of the data? For example, the packet is 512 bytes long, and at byte 128 I see a correct FCS for the bytes up to that point, yet more bytes remain; in the worst case, at the end of the correct length, I see an FCS error. Since the frame structure does not include a packet length, what should I do?
I see this problem with jumbo frames (802.3), Ethernet II, and DIX.
The FCS is always at the end of the frame. Since, as you've pointed out, there's no frame size indication for the most popular frame types, putting the FCS anywhere else in the frame destroys its structure.
Even if you're using jumbo frames the FCS is always at the very end.
Depending on where the frames originated and how they were generated or captured (as a bitstream?), it is possible there is some garbage at the end. You'll need to detect and remove it yourself, since the generating/capturing process was faulty.
With an IP packet inside the frame, its Total Length field gives you some indication of where the user data (SDU) stops and where the FCS might start. Still, there's guessing involved.
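If you want to probe for yourself where a valid FCS would land, the check is a CRC-32 over everything before the trailing four bytes, compared against those bytes read least-significant byte first. A minimal sketch, assuming you have the raw frame bytes in a buffer:

#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 as used by Ethernet (reflected, polynomial 0xEDB88320). */
static uint32_t crc32_ieee(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)-(int32_t)(crc & 1));
    }
    return ~crc;
}

/* Returns 1 if the last 4 bytes of the frame are a valid FCS
   for everything that precedes them. */
static int fcs_ok(const uint8_t *frame, size_t len)
{
    if (len < 5)
        return 0;
    uint32_t crc = crc32_ieee(frame, len - 4);
    uint32_t fcs = (uint32_t)frame[len - 4]         /* FCS is stored */
                 | ((uint32_t)frame[len - 3] << 8)  /* least-significant */
                 | ((uint32_t)frame[len - 2] << 16) /* byte first */
                 | ((uint32_t)frame[len - 1] << 24);
    return crc == fcs;
}

Sliding fcs_ok over candidate lengths is exactly the guessing described above; there is no in-band length field to make it deterministic.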

Write data to SSD1306 via I2C

I'm using an SSD1306 OLED and have a question about it.
When writing data to its buffer via I2C, some libraries write 16 bytes every time.
For example:
void SSD1306::sendFramebuffer(const uint8_t *buffer) {
    // Set Column Address (0x00 - 0x7F)
    sendCommand(SSD1306_COLUMNADDR);
    sendCommand(0x00);
    sendCommand(0x7F);

    // Set Page Address (0x00 - 0x07)
    sendCommand(SSD1306_PAGEADDR);
    sendCommand(0x00);
    sendCommand(0x07);

    // Send the 1024-byte framebuffer in 16-byte chunks,
    // one I2C transaction (START ... STOP) per chunk
    for (uint16_t i = 0; i < SSD1306_BUFFERSIZE;) {
        i2c.start();
        i2c.write(0x40);  // control byte: data follows
        for (uint8_t j = 0; j < 16; ++j, ++i) {
            i2c.write(buffer[i]);
        }
        i2c.stop();
    }
}
Why don't they write 1024 bytes directly?
Most of the I2C libraries I've seen source code for, including the Arduino's, chunk the data in this fashion. While the I2C standard doesn't require this, as another poster mentioned, there may be buffer considerations. The .stop() command here might signal the device to process the 16 bytes just sent and prepare for more.
Invariably, you need to read the datasheet for your device and understand what it expects in order to display properly. They say "RTFM" in software, but hardware is at least as unforgiving: you must read and follow the datasheet when interfacing with external hardware devices.
Segmenting the data into more frames helps when the receiving device doesn't have enough buffer space or simply isn't fast enough to digest the data at full rate. The START/STOP approach might give the receiving device a bit of time to process the received data. In your specific case, the 16-byte chunks seem to be exactly one line of the display.
Other reasons for segmenting transfers are multi-master operations, but that doesn't seem to be the case here.
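As a plain-C sketch of the chunking idea, with i2c_start/i2c_write/i2c_stop as hypothetical stand-ins for whatever primitives your I2C library actually provides:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical I2C primitives; substitute your library's equivalents. */
extern void i2c_start(void);
extern void i2c_write(uint8_t byte);
extern void i2c_stop(void);

#define SSD1306_DATA 0x40  /* control byte: "the following bytes are data" */

/* Send a framebuffer in fixed-size chunks, one I2C transaction per chunk,
   so a slow device gets a breather between chunks. */
void send_chunked(const uint8_t *buf, size_t len, size_t chunk)
{
    for (size_t i = 0; i < len; ) {
        i2c_start();
        i2c_write(SSD1306_DATA);
        for (size_t j = 0; j < chunk && i < len; ++j, ++i)
            i2c_write(buf[i]);
        i2c_stop();  /* end of transaction: the device can digest the chunk */
    }
}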

Streaming Video On Lossy Network

Currently I have a GStreamer stream being sent over a wireless network. I have a hardware encoder that converts raw, uncompressed video into an MPEG-2 transport stream with H.264 encoding. From there, I pass the data to a GStreamer pipeline that sends the stream out over RTP. Everything works and I'm seeing video; however, I was wondering whether there is a way to limit the effects of packet loss by tuning certain parameters on the encoder.
The two main parameters I'm looking at are the GOP Size and the I frame rate. Both are summarized in the documentation for the encoder (a Sensoray 2253) as follows:
V4L2_CID_MPEG_VIDEO_GOP_SIZE:
Integer range 0 to 30. The default setting of 0 means to use the codec default
GOP size. Capture only.
V4L2_CID_MPEG_VIDEO_H264_I_PERIOD:
Integer range 0 to 100. Only for H.264 encoding. Default setting of 0 will
encode first frame as IDR only, otherwise encode IDR at first frame of
every Nth GOP.
Basically, I'm trying to give the decoder as good a chance as possible to produce smooth video playback, even given that the network may drop packets. Will increasing the I-frame rate do this? Namely, since an I-frame doesn't depend on data from previous or future frames, will sending the "full" image more often help? What would be the "ideal" settings for the two parameters above, given that the data is being sent across a lossy network? Note that I can accept a slight (~10%) increase in bandwidth if it means the video is smoother than it is now.
I also understand that this is highly decoder dependent, so for the sake of argument let's say that my main decoder on the client side is VLC.
Thanks in advance for all the help.
Increasing the number of I-frames will help the decoder recover more quickly. You may also want to look at limiting the bandwidth of the stream, since that makes it more likely the data gets through. Watch the data size, though: I-frames are considerably larger than P- or B-frames, and because the encoder will continue to target the specified bitrate, your video quality can suffer greatly.
If you had some control over playback (even locally capturing the stream and retransmitting to VLC) you could add FEC which would correct lost packets.
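For what it's worth, both controls can be set from user space with the standard VIDIOC_S_CTRL ioctl. A minimal sketch; the device node and the particular values are assumptions, so check the Sensoray documentation for valid ranges:

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static int set_ctrl(int fd, unsigned int id, int value)
{
    struct v4l2_control ctrl = { .id = id, .value = value };
    return ioctl(fd, VIDIOC_S_CTRL, &ctrl);
}

int main(void)
{
    int fd = open("/dev/video0", O_RDWR);  /* device node: an assumption */
    if (fd < 0)
        return 1;
    set_ctrl(fd, V4L2_CID_MPEG_VIDEO_GOP_SIZE, 15);      /* shorter GOP: example value */
    set_ctrl(fd, V4L2_CID_MPEG_VIDEO_H264_I_PERIOD, 1);  /* IDR every GOP: example value */
    close(fd);
    return 0;
}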

Pcap Dropping Packets

// Open the ethernet adapter
handle = pcap_open_live("eth0", 65356, 1, 0, errbuf);

// Make sure it opens correctly
if(handle == NULL)
{
    printf("Couldn't open device : %s\n", errbuf);
    exit(1);
}

// Compile filter
if(pcap_compile(handle, &bpf, "udp", 0, PCAP_NETMASK_UNKNOWN))
{
    printf("pcap_compile(): %s\n", pcap_geterr(handle));
    exit(1);
}

// Set Filter
if(pcap_setfilter(handle, &bpf) < 0)
{
    printf("pcap_setfilter(): %s\n", pcap_geterr(handle));
    exit(1);
}

// Set signals
signal(SIGINT, bailout);
signal(SIGTERM, bailout);
signal(SIGQUIT, bailout);

// Setup callback to process the packet
pcap_loop(handle, -1, process_packet, NULL);
The process_packet function strips the header and does a bit of processing on the data. However, when that takes too long, I think it is dropping packets.
How can I use pcap to listen for UDP packets and do some processing on the data without losing packets?
Well, you don't have infinite storage, so if you continuously run slower than the packets arrive, you will lose data at some point.
Of course, if you have a decent amount of storage and, on average, you don't run behind (for example, you may run slow during bursts but there are quiet times when you can catch up), that will alleviate the problem.
Some network sniffers do this, simply writing the raw data to a file for later analysis.
It's a trick you too can use, though not necessarily with a file. It's possible to use a massive in-memory structure like a circular buffer, where one thread (the capture thread) writes raw data and another thread (the analysis thread) reads and interprets it. And, because each thread only handles one end of the buffer, you can even architect it without locks (or with very short ones).
That also makes it easy to detect if you've run out of buffer and raise an error of some sort rather than just losing data at your application level.
Of course, this all hinges on your "simple and quick as possible" capture thread being able to keep up with the traffic.
Clarifying what I mean, modify your process_packet function so that it does nothing but write the raw packet to a massive circular buffer (detecting overflow and acting accordingly). That should make it as fast as possible, avoiding pcap itself dropping packets.
Then, have an analysis thread that takes stuff off the queue and does the work formerly done in process_packet (the "gets rid of header and does a bit of processing on the data" bit).
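Here is a minimal single-producer/single-consumer ring buffer sketch in C11 (the slot count and snapshot length are assumptions to tune); process_packet would call ring_push, and the analysis thread drains the queue with ring_pop:

#include <stdatomic.h>
#include <string.h>

#define RING_SLOTS 4096  /* must be a power of two for the wraparound math */
#define SNAP_LEN   2048

struct slot { unsigned len; unsigned char data[SNAP_LEN]; };

static struct slot ring[RING_SLOTS];
static atomic_uint head;  /* next slot the capture thread writes */
static atomic_uint tail;  /* next slot the analysis thread reads */

/* Capture thread: returns 0 on overflow so you can count drops. */
int ring_push(const unsigned char *pkt, unsigned len)
{
    unsigned h = atomic_load_explicit(&head, memory_order_relaxed);
    unsigned t = atomic_load_explicit(&tail, memory_order_acquire);
    if (h - t == RING_SLOTS)
        return 0;  /* full */
    struct slot *s = &ring[h % RING_SLOTS];
    s->len = len > SNAP_LEN ? SNAP_LEN : len;
    memcpy(s->data, pkt, s->len);
    atomic_store_explicit(&head, h + 1, memory_order_release);
    return 1;
}

/* Analysis thread: returns 0 if the ring is empty. */
int ring_pop(unsigned char *out, unsigned *len)
{
    unsigned t = atomic_load_explicit(&tail, memory_order_relaxed);
    unsigned h = atomic_load_explicit(&head, memory_order_acquire);
    if (t == h)
        return 0;  /* empty */
    struct slot *s = &ring[t % RING_SLOTS];
    *len = s->len;
    memcpy(out, s->data, s->len);
    atomic_store_explicit(&tail, t + 1, memory_order_release);
    return 1;
}

Because only the capture thread writes head and only the analysis thread writes tail, no lock is needed; that is the "each thread only handles one end of the buffer" property described above.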
Another possible solution is to bump up the pcap internal buffer size. As per the man page:
Packets that arrive for a capture are stored in a buffer, so that they do not have to be read by the application as soon as they arrive.
On some platforms, the buffer's size can be set; a size that's too small could mean that, if too many packets are being captured and the snapshot length doesn't limit the amount of data that's buffered, packets could be dropped if the buffer fills up before the application can read packets from it, while a size that's too large could use more non-pageable operating system memory than is necessary to prevent packets from being dropped.
The buffer size is set with pcap_set_buffer_size().
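One caveat: pcap_set_buffer_size() only applies to a handle obtained from pcap_create() before it is activated, so it cannot be used with a handle from pcap_open_live(). Roughly (the 16 MiB figure is an assumption to tune):

char errbuf[PCAP_ERRBUF_SIZE];
pcap_t *h = pcap_create("eth0", errbuf);
if (h == NULL) {
    printf("pcap_create(): %s\n", errbuf);
    exit(1);
}
pcap_set_snaplen(h, 65536);
pcap_set_promisc(h, 1);
pcap_set_timeout(h, 1000);                  /* read timeout in milliseconds */
pcap_set_buffer_size(h, 16 * 1024 * 1024);  /* kernel buffer, in bytes */
if (pcap_activate(h) < 0) {
    printf("pcap_activate(): %s\n", pcap_geterr(h));
    exit(1);
}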
The only other possibility that springs to mind is to ensure that the processing you do on each packet is as optimised as it can be.
The splitting of processing into collection and analysis should alleviate a problem of not keeping up but it still relies on quiet time to catch up. If your network traffic is consistently more than your analysis can handle, all you're doing is delaying the problem. Optimising the analysis may be the only way to guarantee you'll never lose data.

TCP connection: received array's size is unexpectedly huge

I'm sending multiple byte arrays with size <500 over TCP. However, on the client side, I sometimes receive arrays with size >2000. Why is this happening?
Below is my code for TCPClient:
byte[] receiveData = new byte[64000];
DataInputStream input = new DataInputStream(socket.getInputStream());
int size = input.read(receiveData);
byte[] receiveDataNew = new byte[size];
System.arraycopy(receiveData,0,receiveDataNew,0,size);
System.out.println("length of receiveData is " + size);
GenericRecord result = AvroByteReader.readAvroBytes(receiveDataNew);
return result;
Any help is appreciated! Thank you!
TCP is not a record-oriented protocol; it is stream-oriented. That means data from multiple sends can be recombined before it is received on the other side. Likely, you have received three or four of your data arrays and are processing them all at once.
If you wish to use it as a record-oriented or framed protocol, you'll have to add that framing yourself. You could prepend a size to the data you send, and only read that much data at a time.
If your records are always the same length, you can do a fixed-length read. I don't know Java's libraries, but you should be able to provide a length to read().
TCP does not have any concept of message boundaries, since it is a stream protocol. It can (and will) merge multiple sends into a single receive, and even the opposite - split a send into multiple smaller receives.
Your application must be prepared for this.
Basically what you are seeing is multiple arrays combined back to back being received at the same time. Send the array length before the array and read that many bytes.
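In Java, DataInputStream.readInt() followed by readFully() implements exactly this length-prefix scheme. To show the same idea at the socket level, here is a minimal C sketch; the 4-byte big-endian header is a convention you choose, not anything TCP itself provides:

#include <stdint.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Read exactly n bytes, looping over TCP's short reads. */
static int read_exact(int fd, unsigned char *buf, size_t n)
{
    while (n > 0) {
        ssize_t r = recv(fd, buf, n, 0);
        if (r <= 0)
            return -1;  /* error, or peer closed the connection */
        buf += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Read one record: a 4-byte big-endian length, then that many payload bytes.
   Returns the payload length, or -1 on error or oversized record. */
static int read_record(int fd, unsigned char *out, size_t max)
{
    unsigned char hdr[4];
    if (read_exact(fd, hdr, 4) < 0)
        return -1;
    uint32_t len = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16)
                 | ((uint32_t)hdr[2] << 8)  |  (uint32_t)hdr[3];
    if (len > max)
        return -1;
    if (read_exact(fd, out, len) < 0)
        return -1;
    return (int)len;
}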
