What is the difference between `ether_header` and `ethhdr`? - networking

I'm currently trying to parse packets from a raw buffer. There seem to be two structures for the Ethernet header: ether_header and ethhdr. I'm confused about the difference and the relationship between them. Can I use them interchangeably?
I did a quick search:
This post suggested that two identical implementations exist for the IP header. Is this the case for Ethernet (and perhaps TCP, UDP, etc.)?
This patch shows an effort to change from one implementation to the other. I'm not sure of the incentive behind this: perhaps one implementation is somehow "better"?
Thanks!

In Linux, the ethhdr struct is defined in uapi/linux/if_ether.h. Its placement in a kernel uapi (user-space API) header file solidifies it as a stable definition to default to.
The ether_header struct has been defined in numerous locations over the years. The number of definitions, and of references to those definitions, has been declining over time as code migrates to ethhdr (as you observed in the patch you linked to).
An easy way to see this progression is to look at kernel source cross-referencer output over time. The following list gives the number of ether_header definitions/references for a few kernel versions:
2.6.39 (2011-08-03): 5/8
3.19.8 (2015-05-11): 4/5
4.17-rc17 (2018-05-30): 1/2
For comparison, here are the results when looking up ethhdr:
4.17-rc17: 1/193
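
As for parsing, both structs describe the same 14-byte layout (two 6-byte MAC addresses followed by a 16-bit EtherType), so the choice mostly affects which header you include. A minimal sketch using ethhdr on a raw capture buffer (it assumes the buffer holds at least ETH_HLEN bytes):

#include <stdio.h>
#include <arpa/inet.h>      /* ntohs */
#include <linux/if_ether.h> /* struct ethhdr, ETH_ALEN, ETH_P_IP */

/* Interpret the start of a raw capture buffer as an Ethernet header. */
static void parse_frame(const unsigned char *buf)
{
    const struct ethhdr *eth = (const struct ethhdr *)buf;

    printf("dst %02x:%02x:%02x:%02x:%02x:%02x\n",
           eth->h_dest[0], eth->h_dest[1], eth->h_dest[2],
           eth->h_dest[3], eth->h_dest[4], eth->h_dest[5]);

    /* h_proto is transmitted in network byte order */
    if (ntohs(eth->h_proto) == ETH_P_IP)
        printf("payload is IPv4\n");
}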


Why would VkImageView format differ from the underlying VkImage format?

VkImageCreateInfo has the following member:
VkFormat format;
And VkImageViewCreateInfo has the same member.
What I don't understand is why you would ever have a different format in the VkImageView from that of the VkImage used to create it.
I understand some formats are compatible with one another, but I don't know why you would use one of the alternate formats.
The canonical use case and primary original motivation (in D3D10, where this idea originated) is using a single image as either R8G8B8A8_UNORM or R8G8B8A8_SRGB -- either because it holds different content at different times, or because sometimes you want to operate in sRGB-space without linearization.
More generally, it's useful sometimes to have different "types" of content in an image object at different times -- this gives engines a limited form of memory aliasing, and was introduced to graphics APIs several years before full-featured memory aliasing was a thing.
Like a lot of Vulkan, the API is designed to expose what the hardware can do. Memory layout (image) and the interpretation of that memory as data (image view) are different concepts in the hardware, and so the API exposes that. The API exposes it simply because that's how the hardware works and Vulkan is designed to be a thin abstraction; just because the API can do it doesn't mean you need to use it ;)
As you say, in most cases it's not really that useful ...
I think there are some cases where it could be more efficient. For example, getting a compute shader to generate integer data for some types of image processing can be more energy efficient than either float computation or manually normalizing integer data to create unorm data. Using aliasing, the compute shader can directly write e.g. uint8 integers and a fragment shader can read the same data as unorm8 data.
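
To make the canonical UNORM/sRGB case concrete, here is a minimal sketch in Vulkan (assuming a valid VkDevice; memory allocation/binding and error handling are omitted). The image must be created with VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT for a view with a different format to be legal:

#include <vulkan/vulkan.h>

/* One image, two interpretations: created as UNORM, viewed as sRGB. */
void create_srgb_view_of_unorm_image(VkDevice device)
{
    VkImageCreateInfo imageInfo = {
        .sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO,
        .flags = VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT, /* allow differing view formats */
        .imageType = VK_IMAGE_TYPE_2D,
        .format = VK_FORMAT_R8G8B8A8_UNORM,
        .extent = { 512, 512, 1 },
        .mipLevels = 1,
        .arrayLayers = 1,
        .samples = VK_SAMPLE_COUNT_1_BIT,
        .tiling = VK_IMAGE_TILING_OPTIMAL,
        .usage = VK_IMAGE_USAGE_SAMPLED_BIT | VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT,
        .sharingMode = VK_SHARING_MODE_EXCLUSIVE,
        .initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
    };
    VkImage image;
    vkCreateImage(device, &imageInfo, NULL, &image);

    /* (allocate and bind device memory for the image here) */

    /* Reads through this view apply sRGB-to-linear conversion, even
     * though the underlying bytes were written as plain UNORM. */
    VkImageViewCreateInfo viewInfo = {
        .sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,
        .image = image,
        .viewType = VK_IMAGE_VIEW_TYPE_2D,
        .format = VK_FORMAT_R8G8B8A8_SRGB, /* differs from the image's format */
        .subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 },
    };
    VkImageView view;
    vkCreateImageView(device, &viewInfo, NULL, &view);
}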

MPI and hierarchical collectives - missing 'hierarch' coll module from OpenMPI 2.x?

I'm working on an application that is critically dependent on the performance of MPI_Alltoall calls with very small messages (less than 4KB) flying among a large number of processes (currently about 200, while the target is in the thousands and more).
I was under the impression, and reading this paper seems to corroborate it, that it would be sensible to exploit the hierarchy of a modern PC cluster (like the one I have at my disposal). The idea is to segregate the processes belonging to a single cluster node into separate communicators (OpenMPI even has a function from the MPI 3 standard, MPI_Comm_split_type with MPI_COMM_TYPE_SHARED, that does just that) and then, instead of an MPI_Alltoall over MPI_COMM_WORLD, to use a sequence MPI_Gather - MPI_Alltoall - MPI_Scatter, where the Gather and Scatter are restricted to those per-node communicators while the Alltoall runs over another communicator containing one and only one 'gatherer' process per node. This exploits the supposedly faster transfers through internal memory for the Gather and Scatter, while hopefully increasing the efficiency of the Alltoall among nodes by having fewer, larger messages pass through the network interfaces (ConnectX InfiniBand NICs by Mellanox, in my case).
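The communicator layout I have in mind looks roughly like this (a sketch assuming the MPI-3 API; the data reordering needed around the Gather/Alltoall/Scatter sequence is omitted):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int world_rank, node_rank;
    MPI_Comm node_comm, leaders_comm = MPI_COMM_NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Group the processes that share a node (i.e. can share memory). */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, world_rank,
                        MPI_INFO_NULL, &node_comm);
    MPI_Comm_rank(node_comm, &node_rank);

    /* One and only one 'gatherer' per node (node_rank 0) joins the
     * inter-node communicator; everyone else passes MPI_UNDEFINED. */
    MPI_Comm_split(MPI_COMM_WORLD, node_rank == 0 ? 0 : MPI_UNDEFINED,
                   world_rank, &leaders_comm);

    if (leaders_comm != MPI_COMM_NULL) {
        int nleaders;
        MPI_Comm_size(leaders_comm, &nleaders);
        printf("rank %d is the gatherer for its node (%d nodes total)\n",
               world_rank, nleaders);
        /* Gather on node_comm, Alltoall on leaders_comm, Scatter back. */
        MPI_Comm_free(&leaders_comm);
    }

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}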
I'm trying to verify the assertions of the paper, and if someone is interested I can share my findings. But what I wanted to know is this: perusing the 1.10.x OpenMPI sources, I clearly see a 'hierarch' component among the 'coll' modules, which is also mentioned in the README in a way that makes it seem such a hierarchical implementation had already been pulled in.
Nevertheless, I was never able to make it work, and it seems to have vanished altogether from the 2.x branch (there is no trace of it in the 'ompi_info' output).
Has anyone succeeded in using it? Can you report any improvements, if any, compared to a regular MPI_Alltoall?

Can ZeroMQ provide grounds for a bidirectional non-blocking asynchronous transmission?

I have a system which consists of two applications. Currently, the two applications communicate using multiple ZeroMQ PUB/SUB patterns, one for each specific type of transmission. The sockets are programmed in C.
For example, AppX uses a SUB formal-socket archetype for receiving an information struct from AppY, and another PUB formal-socket archetype for transmitting raw bit blocks to AppY. The same applies to AppY: it also uses PUB/SUB patterns for transmission and reception.
To be clear, AppX and AppY perform the following communications:
AppX -> AppY: raw bit blocks of 1 kbit (continuous); an integer command (not continuous, depends on the user)
AppY -> AppX: an information struct of 10 kbit (continuous)
The design target:
a) My goal is to use only one socket at each side for bidirectional communication in nonblocking mode.
b) I want the two applications to process queued received packets without excess delay.
c) I don't want AppX to crash after a crashed AppY.
Q1: Would it be possible with ZeroMQ?
Q2: Can I use ROUTER/DEALER or any other pattern for this job?
I have read the guide but I could not figure out some aspects.
Actually, I'm not very experienced with ZeroMQ, so I would be pleased to hear any additional tips on this problem.
A1: Yes, this is possible with ZeroMQ or nanomsg sorts of tools.
Both ZeroMQ and its younger sister nanomsg share the vision of:
Scalable (which you did not emphasise yet)
Formal (hard-wired formal behaviour)
Communication (yes, it's about this)
Patterns (wisely carved and ready to re-use and combine as needed)
This said, if you prefer to have just one socket-pattern on each "side", then you have to choose a Formal Pattern that leaves you free of any hard-wired behaviour, so as to meet your goal.
So, a) "...only one" is doable -- by a solo zmq.PAIR (which some parts of the documentation still flag as an experimental device), by NN.BUS, or by a pair of PUSH/PULL if you step back from insisting on a single one. Note, though, that insisting on a single socket gives up the cool power of sharing the zmq.Context() instantiated IO-thread(s) and re-using the low-level IO-engine. If you spend a few minutes with the examples referred to below, you will soon realise that the very opposite policy is quite common and beneficial to the design targets: using more, even many, patterns in a system architecture.
The a) "...non-blocking" part is doable by passing the proper zmq.NOBLOCK directive to the respective .send() / .recv() functions and by using fast, non-blocking .poll() loops in your application design architecture.
On b) "...without ... delay": this relates to the remark on application design architecture above, as you may lose this simply through a poor selection, or an impossible tuning, of the event-handler's internal timings and latency penalties. If you shape your design carefully, you remain in full control of the delay/latency your system will experience, rather than becoming a victim of a framework's black-box event-loop, where you can do nothing but wait for its surprises under heavy system or traffic loads.
On c) "... X crash after a Y crashed": this is doable on { ZeroMQ | nanomsg } grounds by a careful combination of the non-blocking mode of all functions, plus a design able to handle exceptions in situations where it does not receive any POS_ACK from the intended { local | remote } functionality. In this respect it is fair to state that some of the Formal Communication Patterns do not have this flexibility, due to a sort of mandatory internal behaviour that is "hard-wired" inside them, so due care must be taken to select a proper FCP archetype for each such scaleable but fault-resilient role.
Q2: No.
The best next step:
You might be interested in other ZeroMQ posts here; also, do not miss the link to the book referred to there.
Q1: Yes.
Q2: No; ZMQ_DEALER should be used by both AppX and AppY.
See http://zguide.zeromq.org/c:asyncsrv. Notice that the ZMQ_ROUTER in this example merely distributes requests from multiple clients to different threads, where ZMQ_DEALER does the real work.
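For illustration, one side of such a DEALER<->DEALER link might look like this in C (a sketch; the endpoint tcp://localhost:5555 is hypothetical, and error handling is trimmed):

#include <zmq.h>
#include <errno.h>
#include <stdio.h>

int main(void)
{
    void *ctx  = zmq_ctx_new();
    void *sock = zmq_socket(ctx, ZMQ_DEALER);
    zmq_connect(sock, "tcp://localhost:5555"); /* AppY would bind instead */

    zmq_pollitem_t items[] = { { sock, 0, ZMQ_POLLIN, 0 } };
    char buf[1024];
    int running = 1;

    while (running) {
        /* Send without blocking: a full queue yields -1/EAGAIN
         * instead of stalling the application. */
        if (zmq_send(sock, "raw-block", 9, ZMQ_DONTWAIT) == -1
            && zmq_errno() == EAGAIN) {
            /* queue full: drop, buffer, or retry later */
        }

        /* Wait up to 10 ms for inbound traffic, then loop again. */
        zmq_poll(items, 1, 10);
        if (items[0].revents & ZMQ_POLLIN) {
            int n = zmq_recv(sock, buf, sizeof buf, 0);
            if (n > 0)
                printf("got %d bytes\n", n);
        }
    }

    zmq_close(sock);
    zmq_ctx_destroy(ctx);
    return 0;
}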

Endianness and OpenCL Transfers

In OpenCL, transfer between the host (CPU) side and the device (GPU) side is accomplished through clEnqueueReadBuffer(...)/clEnqueueWriteBuffer(...). However, the documentation does not specify whether any endian-related conversions take place in the underlying driver.
I'm developing on x86-64 with an NVIDIA card -- both little-endian, so the potential problem doesn't arise for me.
Does conversion happen, or do I need to do it myself?
The transfers do not perform any conversions. The runtime does not know the type of your data.
You can probably expect conversions only on kernel arguments.
You can query the device endianness (using clGetDeviceInfo and checking CL_DEVICE_ENDIAN_LITTLE), but I am not aware of a way to get transparent conversions.
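The query itself is short (a sketch, assuming device is a valid cl_device_id obtained from clGetDeviceIDs):

#include <CL/cl.h>
#include <stdio.h>

/* Report whether the device stores data little-endian. */
void print_device_endianness(cl_device_id device)
{
    cl_bool little = CL_FALSE;
    clGetDeviceInfo(device, CL_DEVICE_ENDIAN_LITTLE,
                    sizeof(little), &little, NULL);
    printf("device is %s-endian\n", little == CL_TRUE ? "little" : "big");
}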
This is the point where, IMHO, the specification is not satisfactory.
First, it is clear about pointers: data that a pointer references can be in host or device byte order, one can declare which with a pointer attribute, and the default byte order is that of the device.
So, according to this, developers have to take care of the endianness of the data they feed as input to a kernel.
But then, in "Appendix B - Portability", it says that implementations may or may not automatically convert the endianness of kernel arguments, and that developers should consult the vendor's documentation in case host and device byte order differ.
Sorry for being so direct, but what a mess that is. The intention of the OpenXX specifications is to make it possible to write cross-platform code, but when such significant aspects can vary from implementation to implementation, that is simply not possible.
The next point is: what does all this mean for OpenCL/OpenGL interoperation?
In OpenGL, data for buffer objects such as VBOs has to be in host byte order. So what about the case where such a buffer is shared between OpenCL and OpenGL? Must its data be transformed before and after being processed by an OpenCL kernel, or not?

Packet data structure?

I'm designing a game server and I have never done anything like this before. I was just wondering what a good structure for a packet would be, data-wise? I am using TCP, if it matters. Here's an example of what I'm considering using as of now:
(each value in brackets is a byte)
[Packet length][Action ID][Number of Parameters]
[Parameter 1 data length as int][Parameter 1 data type][Parameter 1 data (multi byte)]
[Parameter 2 data length as int][Parameter 2 data type][Parameter 2 data (multi byte)]
[Parameter n data length as int][Parameter n data type][Parameter n data (multi byte)]
Like I said, I really have never done anything like this before so what I have above could be complete bull, which is why I'm asking ;). Also, is passing the total packet length even necessary?
Passing the total packet length is a good idea. It might cost two more bytes, but you can peek and wait for the socket to have a full packet ready before receiving. That makes the code easier.
Overall, I agree with brazzy: a language-supplied serialization mechanism is preferable to anything self-made.
Other than that (I think you are using a C-ish language without serialization), I would put the packet ID as the first data member of the packet structure. IMHO that's something of a convention, because the first data member of a struct is always at offset 0, so any struct can be downcast to it, identifying otherwise anonymous data.
Your compiler may or may not produce packed structures, but if it does, you can allocate a buffer, read the packet in, and then cast to the appropriate structure depending on the first data member. If you are out of luck and it does not produce packed structures, be sure to have a serialization method for each struct that constructs it from the raw (non-destination) memory.
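A sketch of that "ID first" layout in C (the pack pragma works on GCC, Clang, and MSVC; the struct names are illustrative):

#include <stdint.h>

#pragma pack(push, 1)                 /* request packed layout */
struct pkt_header { uint8_t id; uint16_t length; };
struct pkt_move   { struct pkt_header hdr; int16_t x, y; };
#pragma pack(pop)

/* A received buffer can then be identified via the common header:
 *   const struct pkt_header *h = (const struct pkt_header *)buf;
 *   switch (h->id) { ... }
 */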
Endianness is a factor, particularly in C-like languages. Be sure to make clear that packets are always of the same endianness, or that you can identify a different endianness based on a signature or something. An odd thing that's very cool: C# and .NET seem to always hold data in little-endian convention when you access them as discussed in this post. I found that out when porting such an application to Mono on a SUN. Cool, but with that setup you should use the serialization facilities of C# anyway.
Other than that, your setup looks fine!
Start by considering a much simpler basic wrapper: Tag, Length, Value (TLV). Your basic packet will then look like this:
[Tag] [Length] [Value]
Tag is a packet identifier (like your action ID).
Length is the packet length. You may need this to tell whether you have the full packet; it also lets you figure out how long the value portion is.
Value contains the actual data. The format of this can be anything.
In your case above, the value data contains a further series of TLV structures (parameter type, length, value). You don't actually need to send the number of parameters, as you can work it out from the data length by walking the data.
As others have said, I would put the packet ID (Tag) first. Unless you have cross-platform concerns, I would consider wrapping your application's serialised object in a TLV and sending it across the wire like that. If you make a mistake or want to change things later, you can always create a new tag with a different structure.
See Wikipedia for more details on TLV.
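As an illustration, walking a buffer of TLV records could look like this in C (a sketch; the 1-byte tag and 2-byte big-endian length are assumptions, since the field widths are a design choice):

#include <stdint.h>
#include <stdio.h>

/* Walk a buffer of [tag][length][value] records. */
void walk_tlv(const uint8_t *buf, size_t len)
{
    size_t off = 0;
    while (off + 3 <= len) {                    /* need tag + length */
        uint8_t  tag  = buf[off];
        uint16_t vlen = (uint16_t)((buf[off + 1] << 8) | buf[off + 2]);
        if (off + 3 + vlen > len)
            break;                              /* truncated record */
        printf("tag=%u length=%u\n", tag, vlen);
        /* value bytes live at buf[off + 3 .. off + 3 + vlen) */
        off += 3 + vlen;
    }
}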
To avoid reinventing the wheel, any serialization protocol will work for on-the-wire data (e.g. XML, JSON), and you might consider looking at BEEP for the basic protocol framework.
BEEP is summed up well in its FAQ document as 'kind of a "best hits" album of the tricks used by experienced application protocol designers since the early 80's.'
There's no reason to make something as complicated as that. I see that you have an action ID, so I suppose there is a fixed number of actions.
For each action, you would define a data structure and put each of those values in it. To send it over the wire, you just allocate enough bytes for the sum of the sizes of the structure's elements. So your packet would look like this:
[action ID][item 1 (sizeof(item 1) bytes)][item 2 (sizeof(item 2) bytes)]...[item n (sizeof(item n) bytes)]
The idea is that you already know the size and type of each variable on each side of the connection, so you don't need to send that information.
For strings, you can just send them in null-terminated form; when you 'know' to look for a string based on your packet type, start reading and look for a null.
--
Another option would be to use '\r\n' to delimit your variables. That requires some overhead, and you would have to use text rather than binary values for numbers, but that way you could just use readline to read each variable. Your packets would look like this:
[action ID]
[item 1 (as text)]
...
[item n (as text)]
--
Finally, simply serializing objects and passing them down the wire is a good way to do this too, with the least amount of code to write. Remember that you don't want to optimize prematurely, and that includes network traffic. If it turns out you need to squeeze out a bit more performance later, you can go back and figure out a more efficient mechanism.
Also check out Google's protocol buffers, which are supposedly an extremely fast way to serialize data in a platform-neutral way, kind of like a binary XML. There's also JSON, which is another platform-neutral encoding. Using protocol buffers or JSON would mean you wouldn't have to worry about how to encode the messages yourself.
Do you want the server to support multiple clients written in different languages? If not, it's probably not necessary to specify the structure exactly; instead use whatever facility for serializing data your language offers, simply to reduce the potential for errors.
If you do need the structure to be portable, the above looks OK, though you should specify stuff like endianness and text encoding as well in that case.
