Endianness of network data transmissions over TCP/IP - networking

Here is a question I've been trying to solve for quite some time. It doesn't pertain to a particular language, although it is less relevant for languages whose VM specifies an endianness. I know, like the 99.9999% of people that use sockets to send data over TCP/IP, that the protocol specifies an endianness for the transmission elements, like the destination address, port and such. The thing I don't know is if it requires the payload to be in a specific format to prevent incompatibilities.
For example, let's say that, given the immense dominance of little-endian devices nowadays, I develop a protocol that is not a presentation layer and decide to make it little endian (so the positions of the players and such are transmitted in little-endian order). Think of a network module for a game engine, where latency matters and byte conversion would cost a noticeable amount of time. Of course the address, port and all the protocol-related data would be specified in big endian, as is mandatory; I'm talking about the payload, and only that.
Would that protocol work out of the box (translating the contents as necessary, of course, once the transmission is received) on a big-endian machine? Or would the checksums of the IP protocol, or something of the kind, get computed wrong since the data is in a different order, given that the programmer has no control over them unless raw sockets are used?
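To make the idea concrete, here is a rough sketch (purely illustrative, with made-up helper names) of how a payload field could be written in an explicitly little-endian layout using shifts, so it decodes the same way on little- and big-endian hosts:

    #include <stdint.h>

    /* Encode a 32-bit value (e.g. a player position) as 4 little-endian bytes. */
    void put_le32(uint8_t *buf, uint32_t v)
    {
        buf[0] = (uint8_t)(v);
        buf[1] = (uint8_t)(v >> 8);
        buf[2] = (uint8_t)(v >> 16);
        buf[3] = (uint8_t)(v >> 24);
    }

    /* Decode 4 little-endian bytes back into a 32-bit value on any host. */
    uint32_t get_le32(const uint8_t *buf)
    {
        return (uint32_t)buf[0]
             | ((uint32_t)buf[1] << 8)
             | ((uint32_t)buf[2] << 16)
             | ((uint32_t)buf[3] << 24);
    }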
Since the whole explanation can be misleading, feel free to ask for clarifications.
Thank you very much.

The thing I don't know is if it requires the payload to be in a specific format to prevent incompatibilities.
It doesn't, and it doesn't have a way of telling. To TCP it's just a byte-stream. It is up to the application protocol to decide endian-ness, and it is up to the implementors at each end to implement it correctly. There is a convention to use big-endian, but there's no compulsion.

Application-layer protocols dictate their own endianness. However, by convention, multi-byte integer values should be sent in network byte order (big endian) for consistency across platforms, for example by using the platform-provided hton...() (host-to-network) and ntoh...() (network-to-host) functions in your code. On little-endian systems, they do the necessary byte swapping; on big-endian systems, they are no-ops. The functions provide an abstraction layer so code does not have to worry about that.
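A minimal sketch of that convention in C, assuming sock is an already-connected TCP socket (the 32-bit value is just an example field):

    #include <stdint.h>
    #include <arpa/inet.h>    /* htonl / ntohl */
    #include <sys/socket.h>

    /* Send a 32-bit value in network byte order (big endian). */
    int send_u32(int sock, uint32_t value)
    {
        uint32_t wire = htonl(value);          /* no-op on big-endian hosts */
        return send(sock, &wire, sizeof wire, 0) == sizeof wire ? 0 : -1;
    }

    /* Receive a 32-bit value and convert it back to host byte order. */
    int recv_u32(int sock, uint32_t *value)
    {
        uint32_t wire;
        if (recv(sock, &wire, sizeof wire, MSG_WAITALL) != sizeof wire)
            return -1;
        *value = ntohl(wire);
        return 0;
    }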

Related

Sending raw bytes over network. Bad?

This post to the question "What is base 64 encoding used for?" says:
When you have some binary data that you want to ship across a network, you generally don't do it by just streaming the bits and bytes over the wire in a raw format. Why? because some media are made for streaming text. You never know -- some protocols may interpret your binary data as control characters (like a modem), or your binary data could be screwed up because the underlying protocol might think that you've entered a special character combination (like how FTP translates line endings).
I've used sockets in Java a hundred times to send binary data over networks. And as far as I know it is very common to send binary data over networks, especially if you have big data. I don't see why some devices could interpret binary data wrongly, since it is carried along with the TCP header etc.
SOAP MTOM also sends binary data over networks.
Am I misunderstanding something? I'm irritated, because this post has many upvotes and is accepted.
The answer you link to isn't incorrect, it just fails to explicitly mention some examples. The answer is in the quote as well:
because some media are made for streaming text
Sockets deal in bytes, they don't care what they transport. It is the higher-level protocols, or the message formats they transport, that do.
It's when this binary data is wrapped in envelopes of such protocols or formats that they can wreak havoc. A less than (<) character in image bytes is perfectly valid, but when used in an XML message, it will break the XML. Other characters, like control characters, can have an influence on how further data is to be interpreted by a protocol handler.
So base64 is used to wrap binary data in a safe-for-transport way where that would otherwise not be safe.
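As an illustration, here is a small, self-contained base64 encoder sketch in C (not any particular library's implementation): every 3 input bytes become 4 characters from a 64-character alphabet, so arbitrary bytes, including '<' and control characters, turn into text that is safe to embed in an envelope such as XML.

    #include <stdio.h>
    #include <stdlib.h>

    static const char B64[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /* Encode 'len' bytes from 'in'; caller frees the returned NUL-terminated string. */
    char *base64_encode(const unsigned char *in, size_t len)
    {
        size_t out_len = 4 * ((len + 2) / 3);
        char *out = malloc(out_len + 1);
        if (!out) return NULL;

        size_t i = 0, o = 0;
        while (i + 3 <= len) {                 /* full 3-byte groups */
            unsigned v = (in[i] << 16) | (in[i + 1] << 8) | in[i + 2];
            out[o++] = B64[(v >> 18) & 63];
            out[o++] = B64[(v >> 12) & 63];
            out[o++] = B64[(v >> 6) & 63];
            out[o++] = B64[v & 63];
            i += 3;
        }
        if (len - i == 1) {                    /* 1 trailing byte -> "xx==" */
            unsigned v = in[i] << 16;
            out[o++] = B64[(v >> 18) & 63];
            out[o++] = B64[(v >> 12) & 63];
            out[o++] = '=';
            out[o++] = '=';
        } else if (len - i == 2) {             /* 2 trailing bytes -> "xxx=" */
            unsigned v = (in[i] << 16) | (in[i + 1] << 8);
            out[o++] = B64[(v >> 18) & 63];
            out[o++] = B64[(v >> 12) & 63];
            out[o++] = B64[(v >> 6) & 63];
            out[o++] = '=';
        }
        out[o] = '\0';
        return out;
    }

    int main(void)
    {
        /* Binary data containing '<' and control bytes that would break XML. */
        const unsigned char data[] = { 0x3C, 0x00, 0x1B, 0xFF, 0x3E };
        char *enc = base64_encode(data, sizeof data);
        printf("%s\n", enc);                   /* prints "PAAb/z4=" */
        free(enc);
        return 0;
    }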

Minimal size of a network buffer

I'm a student and I'm currently taking an Operating Systems course. While studying for the exam I stumbled upon a strange answer to a question, and I couldn't find an explanation for it.
Question: Suppose we have an Operating System which runs on low physical memory. Thus the designers decided to make the buffer (that handles all the work that is connected to the network) as small as possible. What can be the smallest size of the buffer?
Answer: It can't be implemented with only one byte, but it can be implemented with a 2-byte size.
My thoughts: There are 4 answer choices, one of them being "3 bytes or more", which is what I picked, because in order to establish a connection you need at least to be able to send a TCP/UDP (or similar) packet header that contains all the connection info. So I have no idea why the 2-byte answer is the right one (according to the reference). Maybe some degenerate case?
Thanks for help.
The buffer has to be at least as large as the packet size on the network. That will depend upon the type of hardware interface. I know of no network system, even going back to the days of dialup, that used anything close to 2 bytes.
Maybe, in theory, you could have a network system that used 2-byte packets. The same logic would allow you to use 1-byte packets (transmitting fractions of a byte per packet).
Sometimes I wonder about the questions CS professors come up with. I guess that's why:
Those who can do, do;
Those who can't do, teach;
Those who can't do and can't teach, teach PE.

Endianness and OpenCL Transfers

In OpenCL, transfer from CPU client side to GPU server side is accomplished through clEnqueueReadBuffer(...)/clEnqueueWriteBuffer(...). However, the documentation does not specify whether any endian-related conversions take place in the underlying driver.
I'm developing on x86-64 with an NVIDIA card--both little endian, so the potential problem doesn't arise for me.
Does conversion happen, or do I need to do it myself?
The transfers do not do any conversions. The runtime does not know the type of your data.
You can probably expect conversions only on kernel arguments.
You can query the device endianness (using clGetDeviceInfo and checking CL_DEVICE_ENDIAN_LITTLE), but I am not aware of a way that allows transparent conversions.
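For reference, a short sketch of that query, assuming device is a cl_device_id you have already obtained via clGetDeviceIDs:

    #include <stdio.h>
    #include <CL/cl.h>

    /* Print whether the given OpenCL device is little- or big-endian. */
    void print_device_endianness(cl_device_id device)
    {
        cl_bool little = CL_FALSE;
        cl_int err = clGetDeviceInfo(device, CL_DEVICE_ENDIAN_LITTLE,
                                     sizeof(little), &little, NULL);
        if (err != CL_SUCCESS) {
            fprintf(stderr, "clGetDeviceInfo failed: %d\n", err);
            return;
        }
        printf("Device is %s-endian\n", little ? "little" : "big");
    }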
This is a point where, IMHO, the specification is not satisfactory.
At first it is clear about pointers: data that a pointer references can be in host or device byte order, one can declare this with a pointer attribute, and the default byte order is that of the device.
So according to this, developers have to take care of the endianness of the data they feed as input to a kernel.
But then "Appendix B - Portability" says that implementations may or may not automatically convert the endianness of kernel arguments, and that developers should consult the vendor's documentation in case host and device byte order differ.
Sorry for being so direct, but what kind of nonsense is that? The intention of the OpenXX specifications is to make it possible to write cross-platform code, but when such significant aspects can vary from implementation to implementation, that is simply not possible.
The next question is what all of this means for OpenCL/OpenGL interoperation.
In OpenGL, data for buffer objects such as VBOs has to be in host byte order. So what happens when such a buffer is shared between OpenCL and OpenGL? Must its data be transformed before and after it is processed by an OpenCL kernel, or not?

How do clients on wireless networks decide who can transmit at any given time?

I've been thinking about wireless networking a little bit recently, and last night I came upon a realization I can't find an answer to: how do clients know when they can transmit without stomping on another client's transmission?
I assume there is documentation for this sort of thing available, but I've been unable to find anything useful in half an hour of casual Google queries, probably because I don't know the right terms. Apologies in advance if this is a silly question . . .
Here's why I'm confused: based on my understanding of how RF hardware works, we can model the transmission medium as a safe shared register between the different RF clients (because what one client broadcasts can be overwritten by other clients, resulting in a muddle of the two). But safe registers only have consensus number 1, so how can we establish who can transmit at any given point? I'm assuming that only one client can transmit at once -- perhaps this is my fundamental misunderstanding?
Even the use of a randomized consensus protocol seems unwieldy, because the only ones I know of use atomic registers, not safe registers, and also have no upper bound, so two identical devices with the same random seed would proceed for a very long time.
Thanks!
Please check: Carrier sense multiple access with collision avoidance (CSMA/CA)
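To give a feel for the mechanism (not the full 802.11 rules), here is a simplified, illustrative sketch of the listen-then-random-backoff idea behind CSMA/CA; the channel_is_idle, wait_one_slot and transmit_and_check_ack hooks are hypothetical stand-ins for the radio hardware:

    #include <stddef.h>
    #include <stdlib.h>

    #define CW_MIN 15
    #define CW_MAX 1023

    /* Hypothetical hardware hooks, assumed to be provided elsewhere. */
    extern int  channel_is_idle(void);
    extern void wait_one_slot(void);
    extern int  transmit_and_check_ack(const void *frame, size_t len);

    int csma_ca_send(const void *frame, size_t len)
    {
        unsigned cw = CW_MIN;

        for (int attempt = 0; attempt < 7; attempt++) {
            /* Pick a random backoff counter from [0, cw]. */
            unsigned backoff = (unsigned)rand() % (cw + 1);

            /* Count the backoff down only while the medium is idle; the
             * random wait is what breaks ties between stations probabilistically. */
            while (backoff > 0) {
                wait_one_slot();
                if (channel_is_idle())
                    backoff--;
            }

            if (transmit_and_check_ack(frame, len))
                return 0;                 /* ACK received: success */

            /* No ACK: assume a collision and double the contention window. */
            cw = (cw * 2 + 1 > CW_MAX) ? CW_MAX : cw * 2 + 1;
        }
        return -1;                        /* gave up after too many retries */
    }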

Is there a good way to frame a protocol so data corruption can be detected in every case?

Background: I've spent a while working with a variety of device interfaces and have seen a lot of protocols, many serial and UDP in which data integrity is handled at the application protocol level. I've been seeking to improve my receive routine handling of protocols in general, and considering the "ideal" design of a protocol.
My question is: is there any protocol framing scheme out there that can definitively identify corrupt data in all cases? For example, consider the standard framing scheme of many protocols:
Field: Length in bytes
<SOH>: 1
<other framing information>: arbitrary, but fixed for a given protocol
<length>: 1 or 2
<data payload etc.>: based on length field (above)
<checksum/CRC>: 1 or 2
<ETX>: 1
For the vast majority of cases, this works fine. When you receive some data, you search for the SOH (or whatever your start byte sequence is), move forward a fixed number of bytes to your length field, and then move that number of bytes (plus or minus some fixed offset) to the end of the packet to your CRC, and if that checks out you know you have a valid packet. If you don't have enough bytes in your input buffer to find an SOH or to have a CRC based on the length field, then you wait until you receive enough to check the CRC. Disregarding CRC collisions (not much we can do about that), this guarantees that your packet is well formed and uncorrupted.
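A rough sketch of that kind of receive check in C, assuming a 1-byte SOH, a 1-byte length, a 2-byte CRC and a 1-byte ETX (crc16() is a placeholder for whatever checksum the protocol actually uses):

    #include <stddef.h>
    #include <stdint.h>

    #define SOH 0x01
    #define ETX 0x03

    extern uint16_t crc16(const uint8_t *data, size_t len);  /* assumed elsewhere */

    /* Returns the total frame size if a complete, valid frame starts at buf[0],
     * 0 if more bytes are needed, and -1 if the data at buf[0] is not a valid frame. */
    int try_parse_frame(const uint8_t *buf, size_t avail,
                        const uint8_t **payload, size_t *payload_len)
    {
        if (avail < 1 || buf[0] != SOH)
            return -1;                       /* caller resyncs by skipping a byte */
        if (avail < 2)
            return 0;                        /* length byte not received yet */

        size_t len = buf[1];
        size_t total = 1 + 1 + len + 2 + 1;  /* SOH + length + payload + CRC + ETX */
        if (avail < total)
            return 0;                        /* wait for the rest of the frame */

        uint16_t rx_crc = (uint16_t)((buf[2 + len] << 8) | buf[3 + len]);
        if (crc16(&buf[2], len) != rx_crc || buf[total - 1] != ETX)
            return -1;

        *payload = &buf[2];
        *payload_len = len;
        return (int)total;
    }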
However, if the length field itself is corrupt and has a high value (which I'm running into), then you can't check the (corrupt) packet's CRC until you fill up your input buffer with enough bytes to meet the corrupt length field's requirement.
So is there a deterministic way to get around this, either in the receive handler or in the protocol design itself? I can set a maximum packet length or a timeout to flush my receive buffer in the receive handler, which should solve the problem on a practical level, but I'm still wondering if there's a "pure" theoretical solution that works for the general case and doesn't require setting implementation-specific maximum lengths or timeouts.
Thanks!
The reason why all the protocols I know of, including those handling "streaming" data, chop the data stream up into smaller transmission units, each with its own checks on board, is exactly to avoid the problems you describe. Probably the fundamental flaw in your protocol design is that the blocks are too big.
The accepted answer of this SO question contains a good explanation and a link to a very interesting (but rather heavy on math) paper about this subject.
So in short, you should stick to smaller transmission units, not only for practical programming-related reasons but also because of the role the message length plays in determining the protection offered by your CRC.
One way would be to encode the length parameter so that corruption of it is easily detected, saving you from reading in a large amount of data just to check the CRC.
For example, the XModem protocol embeds an 8-bit packet number followed by its one's complement.
It could mean doubling the size of your length field, but it's an option.
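A minimal sketch of that idea, with a hypothetical header layout modeled on XModem's block-number-plus-complement trick: the length is sent twice, once inverted, so a corrupted length is rejected immediately instead of making the receiver wait for a huge bogus payload.

    #include <stdint.h>

    /* Hypothetical header layout: SOH, length, ~length, then payload + CRC + ETX. */
    struct frame_header {
        uint8_t soh;        /* start byte, e.g. 0x01 */
        uint8_t len;        /* payload length */
        uint8_t len_inv;    /* one's complement of len */
    };

    /* Returns 1 if the length field is self-consistent, 0 if it is corrupted. */
    int length_field_valid(const struct frame_header *h)
    {
        return h->len_inv == (uint8_t)~h->len;
    }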
