Winsock TCP application buffer

I have been working with the Winsock API recently and I found that when I increase the sender's buffer size and pad the unused bytes with zero, the receiver also appends those zeroes to the end of the file. If the application buffer I use is larger, say 3072, the file I receive at the other end seems to be corrupted, and that seems obvious because of the appended zeroes. But it worked fine with buffers of 2048 and 1024, and in those cases too zeroes were appended before sending. How come one is corrupted and not the other?

when I increase the sender's buffer size and pad the unused bytes with zero, the receiver also appends those zeroes to the end of the file.
So the bytes weren't 'unused' at all. You sent them. You used the wrong count in the send function call.
If the application buffer I use is larger, say 3072, the file I receive at the other end seems to be corrupted, and that seems obvious because of the appended zeroes.
Correct.
But it worked fine with buffers of 2048 and 1024, and in those cases too zeroes were appended before sending. How come one is corrupted and not the other?
Because you have a bug in your code that was only exposed by the larger buffer size. Probably the file you sent was a multiple of 2048 bytes, so your end-of-file bug wasn't exposed.
Without seeing your code it is impossible to be sure, but most probably you are ignoring the count returned when you read from the file and sending the entire buffer, instead of sending just 'count' bytes. This fails at end of file, but it can also fail any other time the read doesn't fill the buffer, which it isn't obliged to do.
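For illustration, here is a minimal sketch of a correct send loop. Your code is Winsock/C, but the principle is identical in any language; Go is used here only for brevity, and the function name and the 3072-byte buffer are just placeholders: send only the n bytes the read actually returned, never the whole buffer.

package example

import (
    "io"
    "net"
    "os"
)

// sendFile copies a file to conn, sending only the bytes each Read returned.
// The buffer size does not matter as long as buf[:n] is what gets sent.
func sendFile(conn net.Conn, path string) error {
    f, err := os.Open(path)
    if err != nil {
        return err
    }
    defer f.Close()

    buf := make([]byte, 3072)
    for {
        n, err := f.Read(buf)
        if n > 0 {
            // Send exactly n bytes, not the padded remainder of the buffer.
            if _, werr := conn.Write(buf[:n]); werr != nil {
                return werr
            }
        }
        if err == io.EOF {
            return nil
        }
        if err != nil {
            return err
        }
    }
}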

Related

Extra byte in TCP- vs RTMP-level packet

I am trying to debug an RTMP client that fails to connect to some servers. I'm using Wireshark to capture the packets and compare them with a client that connects successfully (in this case, ffmpeg).
Looking at the captured packets for a successful connection, I noticed that, when viewed at the TCP level, there is an extra byte in the payload (see pics below). The extra byte has the value 0xC3 and is placed at byte 0xC3 in the payload.
I Googled the best I could to find information about extra bytes in the TCP payload, but I didn't find anything like this. I tried looking in the TCP spec, but no luck either. Where can I find information about this?
TCP-level view
RTMP-level view
This happens because the message length is larger than the maximum chunk size (as per the RTMP spec, the default maximum chunk size is 128). So if no Set Chunk Size control message was sent before connect (in your case), and the connect message is greater than 128 bytes, the client will split the message into multiple chunks.
0xC3 is the header of the next chunk, looking at the bits of 0xC3 we would have 11 000011. The highest 2 bits specify the format (fmt = 3 in this case, meaning that this next chunk is a type 3 chunk as per the spec). The remaining 6 bits specify the chunk stream ID (in this case 3). So that extra byte you're seeing is the header of a new chunk. The client/server would then have to assemble these chunks to form the complete message.
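To make that concrete, here is a rough sketch of how a sender splits a body that is larger than the chunk size (the function name and signature are mine, not from any RTMP library; the full message header of the first chunk is assumed to have been written already):

package example

// chunkBody splits an RTMP message body into chunks of at most chunkSize
// bytes (128 by default per the spec) and inserts a type 3 (fmt = 3) basic
// header before every continuation chunk. For chunk stream ID 3 that header
// byte is 0xC0|0x03 = 0xC3, which is the "extra" byte in the capture.
func chunkBody(body []byte, csid byte, chunkSize int) []byte {
    out := make([]byte, 0, len(body)+len(body)/chunkSize)
    for i := 0; i < len(body); i += chunkSize {
        if i > 0 {
            out = append(out, 0xC0|csid) // fmt=3 continuation chunk header
        }
        end := i + chunkSize
        if end > len(body) {
            end = len(body)
        }
        out = append(out, body[i:end]...)
    }
    return out
}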

Implications of Encoding/Decoding and Encrypting/Decrypting with variable buffers

I have a program that encrypts and decrypts data using a symmetric key.
During the encryption process I:
Encrypt the data
Base64 Encode it
During decryption:
Base64 decode it
Decrypt the data
It works fine. Now I'm trying to do the process on a streaming buffer. Let's assume the encryption is done with the above-mentioned program on the bulk of the data, and only the decryption happens while streaming.
In this scenario does the buffer size/chunk-size with which I encoded the data matter when I decode it?
As in, if I encoded the data in buffers of 3000 bytes, should I also read up to 3000 bytes and decode? Or does this not matter?
Also when decrypting, should I decrypt using the same buffer-size I used to pass the data into the Cipher?
I tried with varying values in the standalone program and it works fine. However, when I try to do it as a stream:
Get some bytes
Decode it
Decrypt it
Save to file
For each subsequent set of bytes decrypted, I keep appending to the same file.
This way it seems to work for some sizes of data and not for others, and the final size of the data is less by 2-4 bytes.
Am I missing some important principle here? Or might I have made a mistake in the logic or a loop somewhere that causes some bytes to be left out?
If it's the latter I will dig deeper to check it.
Thank You
Shabir
Thanks for the hints above. I was able to solve the issue I had.
As mentioned in the comments above, the buffer size itself did not matter much when decoding and decrypting the data as a stream.
However, the reason for the problem I had was because I was initializing the CipherOutputStream for every new chunk of incoming data.
Instead, when I initialized it once at the beginning only and maintained it for all the chunks of a single encrypted-data package, the flow worked as usual and without issue.
CipherOutputStream cipherOutputStream = new CipherOutputStream(byteArrOutputStream, cipher);
This was done once for all the chunks in the stream and it worked.
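The fix above is Java-specific (CipherOutputStream), but the same "create the decrypting stream once, then feed it every chunk" idea can be sketched in Go; AES-CTR and the names here are assumptions purely for illustration, not the actual code:

package example

import (
    "crypto/aes"
    "crypto/cipher"
    "io"
)

// decryptChunks writes the decrypted stream to out. The decryptor is created
// once and reused for every chunk; recreating it per chunk would restart the
// keystream and corrupt everything after the first chunk, which mirrors the
// per-chunk CipherOutputStream mistake described above.
func decryptChunks(key, iv []byte, chunks <-chan []byte, out io.Writer) error {
    block, err := aes.NewCipher(key)
    if err != nil {
        return err
    }
    w := &cipher.StreamWriter{S: cipher.NewCTR(block, iv), W: out}
    for chunk := range chunks {
        if _, err := w.Write(chunk); err != nil {
            return err
        }
    }
    return nil
}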
Thanks
Shabir

Handling messages over TCP

I'm trying to send and receive messages over TCP by prepending the size of each message before it starts.
Say, the first three bytes will be the length and the rest will be the message:
As a small example:
005Hello003Hey002Hi
I'll be using this method for large messages too, but because the buffer size will be a constant integer, say 200 bytes, there is a chance that a complete message may not be received, e.g. instead of 005Hello I get 005He, or that a complete length may not be received, e.g. I get only 2 bytes of the length.
So, to get over this problem, I'll need to wait for the next data and append it to the incomplete message, etc.
My question is: am I the only one having these difficulties with appending messages to each other, handling lengths, etc. to make them complete? Or is this really how we usually need to handle individual messages on TCP? Or is there a better way?
What you're seeing is 100% normal TCP behavior. It is completely expected that you'll loop receiving bytes until you get a "message" (whatever that means in your context). It's part of the work of going from a low-level TCP byte stream to a higher-level concept like "message".
And "usr" is right above. There are higher level abstractions that you may have available. If they're appropriate, use them to avoid reinventing the wheel.
So, there is a chance that a complete message may not be received, e.g. instead of 005Hello I get 005He, or that a complete length may not be received, e.g. I get only 2 bytes of the length.
Yes. TCP gives you at least one byte per read, that's all.
Or is this really how we usually need to handle individual messages on TCP? Or is there a better way?
Try using higher-level primitives. For example, BinaryReader allows you to read exactly N bytes (it will internally loop). StreamReader lets you forget this peculiarity of TCP as well.
Even better is to use higher-level abstractions such as HTTP (the request/response pattern is very common), protobuf as a serialization format, or web services, which automate pretty much all transport-layer concerns.
Don't do TCP if you can avoid it.
So, to get over this problem, I'll need to wait for the next data and append it to the incomplete message, etc.
Yep, this is how things are done in socket-level code. For each socket you would like to allocate a buffer of at least the same size as the kernel socket receive buffer, so that you can read the entire kernel buffer in one read/recv/recvmsg call. Reading from the socket in a loop may starve other sockets in your application (this is why they changed epoll to be level-triggered by default, because the default edge-triggered mode forced application writers to read in a loop).
The first incomplete message is always kept in the beginning of the buffer, reading the socket continues at the next free byte in the buffer, so that it automatically appends to the incomplete message.
Once reading is done, normally a higher-level callback is called with pointers to all the read data in the buffer. That callback should consume all complete messages in the buffer and return how many bytes it has consumed (which may be 0 if there is only an incomplete message). The buffer management code should memmove the remaining unconsumed bytes (if any) to the beginning of the buffer. Alternatively, a ring buffer can be used to avoid moving those unconsumed bytes, but in that case the higher-level code has to be able to cope with ring-buffer iterators, which it may not be ready to do. Hence keeping the buffer linear may be the most convenient option.
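A small sketch of that linear-buffer approach (the names and the 64 KiB size are illustrative assumptions, not from any particular library), in Go for brevity:

package example

import "net"

// readLoop reads into the free tail of a single buffer, lets a callback
// consume whole messages from the front, and moves any unconsumed remainder
// back to the start of the buffer (the memmove step described above).
// consume returns how many bytes it used; 0 means only a partial message.
// A real implementation would also grow the buffer when a single message is
// larger than the buffer.
func readLoop(conn net.Conn, consume func([]byte) int) error {
    buf := make([]byte, 64*1024) // ideally at least the kernel receive buffer size
    filled := 0
    for {
        n, err := conn.Read(buf[filled:])
        if err != nil {
            return err
        }
        filled += n
        used := consume(buf[:filled])
        copy(buf, buf[used:filled]) // keep the incomplete tail at the front
        filled -= used
    }
}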

How do you read without specifying the length of a byte slice beforehand, with the net.TCPConn in golang?

I was trying to read some messages from a TCP connection with a Redis client (a terminal just running redis-cli). However, the Read method in the net package requires me to pass a slice as an argument. Whenever I give it a slice with no length, the connection crashes and the Go program halts. I am not sure what length my byte messages are going to be beforehand. So unless I specify a slice that is ridiculously large, the connection will always close, which seems wasteful. I was wondering, is it possible to keep the connection open without having to know the length of the message beforehand? I would love a solution to my specific problem, but I feel that this question is more general. Why do I need to know the length beforehand? Can't the library just give me a slice of the correct size?
Or what other solution do people suggest?
Not knowing the message size is precisely the reason you must specify the Read size (this goes for any networking library, not just Go). TCP is a stream protocol. As far as the TCP protocol is concerned, the message continues until the connection is closed.
If you know you're going to read until EOF, use ioutil.ReadAll
Calling Read isn't guaranteed to get you everything you're expecting. It may return less, it may return more, depending on how much data you've received. Libraries that do IO typically read and write through a "buffer"; you would have your "read buffer", which is a pre-allocated slice of bytes (up to 32k is common), and you re-use that slice each time you want to read from the network. This is why IO functions return the number of bytes, so you know how much of the buffer was filled by the last operation. If the buffer was filled, or you're still expecting more data, you just call Read again.
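Roughly, that pattern looks like this (the 32k size and the handle callback are placeholders, not part of any API):

package example

import (
    "io"
    "net"
)

// readStream reuses one pre-allocated slice for every Read; the returned
// count n says how much of the buffer is valid this time around.
func readStream(conn net.Conn, handle func([]byte)) error {
    buf := make([]byte, 32*1024) // reused read buffer
    for {
        n, err := conn.Read(buf)
        if n > 0 {
            handle(buf[:n]) // only the first n bytes are meaningful
        }
        if err == io.EOF {
            return nil
        }
        if err != nil {
            return err
        }
    }
}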
A bit late but...
One of the questions was how to determine the message size. The answer given by JimB was that TCP is a streaming protocol, so there is no real end.
I believe this answer is incorrect. TCP divides a bitstream up into sequential packets. Each packet has an IP header and a TCP header (see Wikipedia and here). The IP header of each packet contains a field for the length of that packet. You would have to do some math to subtract out the TCP header length to arrive at the actual data length.
In addition, the maximum length of a message can be specified in the TCP header.
Thus you can provide a buffer of sufficient length for your read operation. However, you have to read the packet header information first. You probably should not accept a TCP connection if the max message size is longer than you are willing to accept.
Normally the sender would terminate the connection with a FIN packet (see 1), not an EOF character.
EOF in the read operation will most likely indicate that a packet was not fully transmitted within the allotted time.

What is the simplest compression technique for a single network packet?

Need a simple compression method for a single network packet.
Simple in the sense of a technique which uses the least computation.
Thanks!
lz4 compresses and decompresses very fast. zlib can compress better, but not quite as fast. The "least computation" would be to not compress at all.
The "PPP Predictor Compression Protocol" is one of the lowest-computation algorithms available for single-packet compression.
Source code is available in RFC1978.
The decompressor guesses what the next byte is in the current context.
If it guesses correctly, the next bit from the compressed text is "1";
If it guesses incorrectly, the next bit from the compressed text is "0" and the next byte from the compressed text is passed through literally (and the guess table for this context is updated so next time it guesses this literal byte).
The compressor attempts to compress the data field of the packet.
Even if half of the guesses are wrong, the compressed data field will end up smaller than the plaintext data, and the compressed flag for that packet is set to 1, and the compressed data is sent in the packet.
If too many guesses are wrong, however, the "compressed" data ends up the same size or even longer than the plaintext -- so the compressor instead sets the compressed flag for that packet to 0, and simply sends the raw plaintext in the packet.
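To show how little computation is involved, here is a rough Go transliteration of the RFC 1978 reference compressor (a sketch, not the RFC's exact code; the decompressor mirrors it): a 16-bit rolling hash indexes a guess table, and each group of up to 8 input bytes is preceded by a flag byte marking which bytes were guessed correctly and therefore omitted.

package example

// predictor keeps the guess table; both ends update it identically,
// so compressor and decompressor stay in sync.
type predictor struct {
    hash  uint16
    table [65536]byte
}

func (p *predictor) update(c byte) {
    p.hash = (p.hash << 4) ^ uint16(c) // rolling context hash from RFC 1978
}

// compress emits, for every group of up to 8 input bytes, one flag byte
// whose bits mark which bytes were guessed correctly (and hence omitted),
// followed by the literal bytes that were guessed wrong.
func (p *predictor) compress(src []byte) []byte {
    var dst []byte
    for len(src) > 0 {
        flagPos := len(dst)
        dst = append(dst, 0) // placeholder for the flag byte
        var flags byte
        for i := 0; i < 8 && len(src) > 0; i++ {
            c := src[0]
            src = src[1:]
            if p.table[p.hash] == c {
                flags |= 1 << i // correct guess: the byte is not sent
            } else {
                p.table[p.hash] = c  // learn this byte for next time
                dst = append(dst, c) // wrong guess: send the literal
            }
            p.update(c)
        }
        dst[flagPos] = flags
    }
    return dst
}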
There are two basic types of compression: loss-less and lossy. Loss-less means that if you have two algorithms, c(msg) (the compression algorithm) and d(msg) (the decompression algorithm), then
msg == d(c(msg))
Of course then, this implies that a lossy compression would be:
msg != d(c(msg))
With some information, lossy is ok. This is typically how sound is handled. You can lose some bits without any noticeable loss. MP3 compression works this way, for example. Lossy algorithms are usually specific to the type of information that you are compressing.
So, it really depends upon the data that you are transmitting. I assume that you are speaking strictly about the payload and not any of the addressing fields, and that you are interested in loss-less compression. The simplest would be run-length encoding (RLE). In RLE, you basically find duplicate successive values and replace them with a flag, followed by a count, followed by the value to repeat. You should only do this if the length of the run is greater than (or equal to) the length of the tuple (sizeof(flag) + sizeof(count) + sizeof(value)).
This can work really well if the data values are less than 8 bits, in which case you can use the 8th bit as the flag. So for instance, if you had three 'A's, "AAA", in hex that would be 41 41 41. You can encode that as C1 03. In this case, the max run would be 255 (FF) before you would have to start a new compression sequence if there were more than 255 characters of the same value. So in this case 3 bytes become 2 bytes. In the best case, 255 7-bit values of the same value would then be 2 bytes instead of 255.
A simple state machine could be used to handle the RLE.
See http://en.wikipedia.org/wiki/Run-length_encoding
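A sketch of that 7-bit scheme in Go (the run-length threshold of 3 and the function name are my own choices; the output matches the C1 03 example above):

package example

// rleEncode implements the 7-bit RLE scheme from the example above: a run is
// replaced by (0x80 | value) followed by a count byte, so "AAA"
// (0x41 0x41 0x41) becomes 0xC1 0x03. Input bytes must be below 0x80 so the
// high bit is free to act as the flag, and runs are capped at 255.
func rleEncode(src []byte) []byte {
    var dst []byte
    for i := 0; i < len(src); {
        j := i
        for j < len(src) && src[j] == src[i] && j-i < 255 {
            j++
        }
        run := j - i
        if run >= 3 { // the 2-byte tuple only pays off for runs of 3 or more
            dst = append(dst, 0x80|src[i], byte(run))
        } else {
            for k := 0; k < run; k++ {
                dst = append(dst, src[i])
            }
        }
        i = j
    }
    return dst
}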
