Endianness of HMAC-SHA code - encryption

I am transmitting AES messages. My understanding is that:
1. The AES algorithm treats messages byte-wise and is endian-neutral.
2. The initialization vector is endian-neutral as far as transmission and reception are concerned.
I am also calculating an HMAC-SHA384 code for the message. From my reading it sounds as though HMAC-SHA384 does need byte-swapping if the transmission endianness (big-endian in my case) does not match the machine endianness. The swapping would occur between bytes 0 and 47, 1 and 46, and so on? Can anyone more knowledgeable in the subject than I am confirm or contradict this, please?
I am presently using the .NET HMACSHA384 class, but on the other end I will be writing C++ code and don't yet know what library will provide the HMAC code.

leppie is right: both are sent as byte arrays. And you can be pretty sure that the byte array received will conform to the NIST specifications and test vectors, so you should not worry much about endianness in this case.
If anyone needs to worry about it, it is the implementors of the hash function. For example, NIST unfortunately specified a little-endian machine (an Intel processor) as the reference platform for SHA-3; the first version of Skein (1.0) had incorrect test vectors because of an endianness bug.
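For the C++ side, here is a minimal sketch assuming OpenSSL is the library eventually chosen (the question leaves that open; the key and message below are placeholders). HMAC() hands back the MAC as a plain byte array, and those 48 bytes go on the wire as-is, so the output compares byte-for-byte with .NET's HMACSHA384 without any byte-swapping.

```cpp
// Hedged sketch: HMAC-SHA384 via OpenSSL (assumed library; key and message
// are placeholders). The result is a 48-byte array that can be transmitted
// and compared byte-for-byte on any machine, regardless of endianness.
#include <openssl/evp.h>
#include <openssl/hmac.h>
#include <cstdio>

int main() {
    const unsigned char key[] = "secret-key";          // placeholder key
    const unsigned char msg[] = "message to protect";  // placeholder message
    unsigned char mac[EVP_MAX_MD_SIZE];
    unsigned int mac_len = 0;

    HMAC(EVP_sha384(), key, sizeof key - 1,
         msg, sizeof msg - 1, mac, &mac_len);          // mac_len ends up as 48

    for (unsigned int i = 0; i < mac_len; ++i)
        std::printf("%02x", mac[i]);                   // same hex dump on any endianness
    std::printf("\n");
    return 0;
}
```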

Related

What to expect and how to use OpenSSL 1-bit stream cipher modes

I am trying to use the 1-bit stream cipher mode in OpenSSL, specifically EVP_aes_128_cfb1.
I initialize the context with that cipher using EVP_EncryptInit_ex().
The rest is unclear to me. I expect this to work as a bit-by-bit stream cipher, meaning that if I "feed" it a stream of bits it will gradually "build" the ciphertext bit by bit, and that this ciphertext should eventually be identical to a one-shot encryption of the entire input done with a bigger width, such as 128 bits at a time.
Whatever I try, this is not what I am getting, and to be honest I am not sure if the problem is in the expectation or in my usage of OpenSSL.
I am using EVP_EncryptUpdate(&ctx, p_target, &outlen, &current_byte, 1), but that means calling it 8 times with the same input byte, shifted appropriately. I then expect the target byte, which also stays the same for 8 iterations at a time, to be built gradually one bit at a time from MSB to LSB, with the context object orchestrating the state.
This does not yield the expected results, and I could not find working examples.
Help will be appreciated.
Thanks.
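Not an answer, but a hedged sketch of the experiment described above (key, IV and plaintext are placeholders): encrypt the same buffer with EVP_aes_128_cfb1 once in a single call and once one byte per EVP_EncryptUpdate call, then compare. The context carries the CFB shift-register state between calls, so the two ciphertexts should come out identical; that at least helps separate a wrong expectation from wrong usage.

```cpp
// Hedged sketch: compare one-shot vs. byte-at-a-time AES-128-CFB1 encryption.
// Key, IV and plaintext are placeholders; error checking is omitted.
#include <openssl/evp.h>
#include <cstddef>
#include <cstdio>
#include <cstring>

int main() {
    unsigned char key[16] = {0}, iv[16] = {0};
    const unsigned char plain[] = "stream me bit by bit";
    const std::size_t len = sizeof plain - 1;
    unsigned char once[64], stepwise[64];
    int outlen = 0, total = 0;

    // One-shot encryption of the whole buffer.
    EVP_CIPHER_CTX* ctx = EVP_CIPHER_CTX_new();
    EVP_EncryptInit_ex(ctx, EVP_aes_128_cfb1(), nullptr, key, iv);
    EVP_EncryptUpdate(ctx, once, &outlen, plain, static_cast<int>(len));
    EVP_CIPHER_CTX_free(ctx);

    // Byte-at-a-time encryption with the same key/IV; the context carries the
    // CFB shift-register state from one call to the next.
    ctx = EVP_CIPHER_CTX_new();
    EVP_EncryptInit_ex(ctx, EVP_aes_128_cfb1(), nullptr, key, iv);
    for (std::size_t i = 0; i < len; ++i) {
        EVP_EncryptUpdate(ctx, stepwise + total, &outlen, plain + i, 1);
        total += outlen;
    }
    EVP_CIPHER_CTX_free(ctx);

    std::printf("ciphertexts %s\n",
                std::memcmp(once, stepwise, len) == 0 ? "match" : "differ");
    return 0;
}
```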

Is this an advantage of MPI_PACK over derived datatype?

Suppose a process is going to send a number of arrays of different sizes but of the same type to another process in a single communication, so that the receiver builds the same arrays in its memory. Prior to the communication the receiver doesn't know the number of arrays and their sizes. So it seems to me that though the task can be done quite easily with MPI_Pack and MPI_Unpack, it cannot be done by creating a new datatype because the receiver doesn't know enough. Can this be regarded as an advantage of MPI_PACK over derived datatypes?
There is a passage in the official MPI documentation that may be referring to this:
The pack/unpack routines are provided for compatibility with previous libraries. Also, they provide some functionality that is not otherwise available in MPI. For instance, a message can be received in several parts, where the receive operation done on a later part may depend on the content of a former part.
You are absolutely right. The way I phrase it is that with MPI_Pack I can make "self-documenting messages": first you pack an integer that says how many elements are coming up, then you pack those elements. The receiver inspects that first int, then unpacks the elements. The only catch is that the receiver needs to know an upper bound on the number of bytes in the pack buffer, but you can arrange that with a separate message or an MPI_Probe.
There is of course the matter that unpacking a packed message is way slower than straight copying out of a buffer.
Another advantage to packing is that it makes heterogeneous data much easier to handle. The MPI_Type_struct is rather a bother.
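A minimal sketch of that "self-documenting message" pattern, assuming two ranks, a double payload and an agreed upper bound on the packed size; the buffer size, tag and counts are illustrative choices, not anything the answer prescribes.

```cpp
// Hedged sketch: pack a count followed by that many doubles, send as
// MPI_PACKED, and unpack in the same order on the receiver. BUFSIZE is an
// assumed upper bound known to both sides.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int BUFSIZE = 1 << 16;            // assumed upper bound on packed size
    std::vector<char> buffer(BUFSIZE);

    if (rank == 0) {
        std::vector<double> data = {1.0, 2.0, 3.0};
        int n = static_cast<int>(data.size());
        int pos = 0;
        MPI_Pack(&n, 1, MPI_INT, buffer.data(), BUFSIZE, &pos, MPI_COMM_WORLD);
        MPI_Pack(data.data(), n, MPI_DOUBLE, buffer.data(), BUFSIZE, &pos, MPI_COMM_WORLD);
        MPI_Send(buffer.data(), pos, MPI_PACKED, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buffer.data(), BUFSIZE, MPI_PACKED, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        int n = 0, pos = 0;
        MPI_Unpack(buffer.data(), BUFSIZE, &pos, &n, 1, MPI_INT, MPI_COMM_WORLD);
        std::vector<double> data(n);
        MPI_Unpack(buffer.data(), BUFSIZE, &pos, data.data(), n, MPI_DOUBLE, MPI_COMM_WORLD);
        std::printf("received %d doubles\n", n);
    }

    MPI_Finalize();
    return 0;
}
```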

Are CRC generator bits the same for all?

In CRC, when a sender has to send data, it divides the data bits by g(x), which gives some remainder bits. Those remainder bits are appended to the data bits. When this codeword is sent to the receiver, the receiver divides the codeword by the same g(x), giving some remainder. If this remainder is zero, the data is taken to be correct.
Now, if all systems can communicate with each other, does that mean every system in the world has the same g(x)? After all, sender and receiver must share a common g(x).
(Please answer only if you have correct knowledge, with some valid proof.)
It depends on the protocol. CRC by itself can work with different polynomials; the protocols that use it define which g(x) polynomial to use.
There is a list of examples on https://en.wikipedia.org/wiki/Cyclic_redundancy_check#Standards_and_common_use
This is not an issue, since systems obviously cannot communicate using different protocols on the sending and receiving ends. Potentially, a protocol could also use a variable polynomial, somehow negotiated at the start of the communication, but I can't see why that would be useful.
That's a big no. Furthermore, there are several other variations besides just g(x).
If someone tells you to compute a CRC, you have many questions to ask. You need to ask: what is g(x)? What is the initial value of the CRC? Is it exclusive-or'ed with a constant at the end? In what order are bits fed into the CRC, least-significant or most-significant first? In what order are the CRC bits put into the message? In what order are the CRC bytes put into the message?
Here is a catalog of CRCs, with (today) 107 entries. It is not all of the CRCs in use. There are many lengths of CRCs (the degree of g(x)), many polynomials (g(x)), and among those, many choices for the bit orderings, initial value, and final exclusive-or.
The person telling you to compute the CRC might not even know! "Isn't there just one?" they might naively ask (as you have). You then have to either find a definition of the CRC for the protocol you are using, and be able to interpret that, or find examples of correct CRC calculations for your message, and attempt to deduce the parameters.
By the way, not all CRCs will give zero when computing the CRC of a message with the CRC appended. Depending on the initial value and final exclusive-or, you will get a constant for all correct messages, but that constant is not necessarily zero. Even then, you will get a constant only if you compute the bits from the CRC in the proper order. It's actually easier and faster to ignore that property and compute the CRC on just the received message and then simply compare that to the CRC received with the message.
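To make those parameters concrete, here is a small sketch of one common parameter set, often catalogued as CRC-32/ISO-HDLC: reflected polynomial 0xEDB88320, initial value 0xFFFFFFFF, final exclusive-or 0xFFFFFFFF, bits fed least-significant first. Change any one of those choices and you get a different, incompatible CRC, which is exactly the point above.

```cpp
// Hedged sketch: bitwise CRC-32 with the CRC-32/ISO-HDLC parameter set.
// Its published check value for the ASCII string "123456789" is 0xCBF43926.
#include <cstddef>
#include <cstdint>
#include <cstdio>

std::uint32_t crc32(const unsigned char* data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;                  // initial value
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];                               // feed bits LSB first
        for (int bit = 0; bit < 8; ++bit)             // reflected polynomial 0xEDB88320
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;                         // final exclusive-or
}

int main() {
    const unsigned char msg[] = "123456789";
    std::printf("%08X\n", crc32(msg, 9));             // expected: CBF43926
    return 0;
}
```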

What Are The Reasons For Bit Shifting A Float Before Sending It Via A Network

I work with Unity and C#. When making multiplayer games I've been told that for values like positions, which are floats, I should use a bit-shift operator on them before sending them and reverse the operation on receive. I have been told this not only allows for larger values but is also capable of maintaining floating-point precision that might otherwise be lost. However, I do not wish to run this operation every time I receive a packet unless I have to. That said, the bottlenecks seem to be the actual parsing of the bytes received, especially without message framing and when attempting to move from string to byte array. (But that's another story!)
My questions are:
1. Are these valid reasons to undergo the operation? Are they accurate statements?
2. If not, should I be running bit-shift ops on my floats at all?
3. If I should, what are the real reasons to do it?
Any additional information would be most appreciated.
One of the resources I'm referring to:
The main reason for going back and forth to/from network byte order is to combat endianness-related problems, mainly to ensure that each byte of multi-byte values (long and int, but also float) is read and written in a way that gives the same results regardless of architecture. In theory this issue can be ignored if you are sure you are only exchanging data between systems with the same endianness, but that is a rather bad idea from the very beginning, as you are simply creating unneeded technical debt and keeping unjustified exceptions in the code ("It all works, BUT only on the same endianness. What can go wrong?").
Depending on your app architecture, you can rewrite the packet payload/data once you receive it and then use that version further in the code. Also note that you need to encode the data again prior to sending it out.
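As a hedged illustration of what the byte-order conversion usually looks like for a float (a conventional sketch, not anything this answer prescribes): reinterpret the 32-bit pattern as an integer, convert that integer with htonl, and reverse the steps on receive. Note that this only normalizes byte order; it does not add range or precision.

```cpp
// Hedged sketch: send a float portably by converting its raw 32-bit pattern
// to network (big-endian) byte order and back. This fixes byte order only;
// it does not change the value's range or precision.
#include <arpa/inet.h>   // htonl/ntohl (POSIX; on Windows use <winsock2.h>)
#include <cstdint>
#include <cstring>

std::uint32_t float_to_net(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);   // take the raw IEEE-754 bits
    return htonl(bits);                        // host order -> network order
}

float net_to_float(std::uint32_t wire) {
    std::uint32_t bits = ntohl(wire);          // network order -> host order
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```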

Endianness and OpenCL Transfers

In OpenCL, transfer from CPU client side to GPU server side is accomplished through clEnqueueReadBuffer(...)/clEnqueueWriteBuffer(...). However, the documentation does not specify whether any endian-related conversions take place in the underlying driver.
I'm developing on x86-64 with an NVIDIA card, both little-endian, so the potential problem doesn't arise for me.
Does conversion happen, or do I need to do it myself?
The transfers do not do any conversions; the runtime does not know the type of your data.
You can probably expect conversions only on kernel arguments.
You can query the device endianness (using clGetDeviceInfo and checking CL_DEVICE_ENDIAN_LITTLE), but I am not aware of a way to get transparent conversions.
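A small sketch of that query, assuming `device` is a valid cl_device_id obtained elsewhere (for example via clGetDeviceIDs):

```cpp
// Hedged sketch: report whether an OpenCL device is little-endian by querying
// CL_DEVICE_ENDIAN_LITTLE with clGetDeviceInfo.
#include <CL/cl.h>
#include <cstdio>

void report_device_endianness(cl_device_id device) {
    cl_bool little_endian = CL_FALSE;
    cl_int err = clGetDeviceInfo(device, CL_DEVICE_ENDIAN_LITTLE,
                                 sizeof little_endian, &little_endian, nullptr);
    if (err == CL_SUCCESS)
        std::printf("device is %s-endian\n", little_endian ? "little" : "big");
}
```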
This is a point where, IMHO, the specification is not satisfactory.
At first it is clear about pointers: the data that a pointer references can be in host or device byte order, one can declare this with a pointer attribute, and the default byte order is that of the device.
So according to this, developers have to take care of the endianness of the data they feed as input to a kernel.
But then, in "Appendix B - Portability", it says that implementations may or may not automatically convert the endianness of kernel arguments, and that developers should consult the vendor documentation in case host and device byte order differ.
Sorry for being so blunt, but that is a mess. The whole intention of the OpenXX specifications is to make it possible to write cross-platform code, but when such significant aspects can vary from implementation to implementation, that is hardly possible.
The next question is: what does all this mean for OpenCL/OpenGL interoperation?
In OpenGL, data for buffer objects such as VBOs has to be in host byte order. So what happens when such a buffer is shared between OpenCL and OpenGL? Must its data be transformed before and after it is processed by an OpenCL kernel, or not?
