I recently came across an AES implementation in C for embedded systems and have been looking at using it in a wireless point-to-multipoint link (or point-to-point for starters).
The code compiles correctly on my platform and seems to give correct results with the test code provided for CTR mode, run back to back (CTR mode is what I require, as my packets are 120 bytes each and cannot be changed).
Essentially, the transmitter Tx should use the tiny-AES encryption function to generate a continuous stream of encrypted 120-byte packets, transmit them with a counter, header etc. attached, and have them received and decrypted at the other end by the receiver Rx.
It is envisaged that the transmitter Tx and receiver Rx use exactly the same tiny-AES code with the same key, which is known only to me. This key can be set and would differ between Tx-Rx pairs.
My question relates to CTR mode in particular and the use of the following functions on both the Tx and the Rx:
/* Initialize context calling (iv can be NULL): */
void AES_ctx_init(AES_ctx *ctx, uint32_t keylen, const uint8_t *key, const uint8_t *iv);
/* ... or reset IV or key at some point: */
void AES_ctx_set_key(AES_ctx *ctx, uint32_t keylen, const uint8_t *key);
void AES_ctx_set_iv(AES_ctx *ctx, const uint8_t *iv);
So
What is the purpose of AES_ctx_init() and is this done only once at the start of an encrypt/decrypt session of a secure link?
What is the purpose of the AES_ctx_set_iv() function and, again, is this done only once at the start of the secured link session? The comment "... or reset IV or key at some point" is not clear to me, hence my question: when does the IV need to be reset when using AES in CTR mode?
Do the Tx and Rx need to use the very same IV, and can it be left as-is or does it need to change with every new secured-link session? I understand that the secret key would normally be chosen for a Tx/Rx pair and only modified by the user on both ends when required, but would otherwise stay the same from session to session. Is this generally correct? Is the same true (or not) for the IV, and at what point does one reset it?
In wireless links, due to noise, and if there is no FEC implemented, the receiver will be forced to drop packets with bad CRCs. In my link the CRC is verified for each received 120-byte packet as it arrives; if the CRC is correct, decryption is initiated to recover the original data, otherwise the packet is dropped. What are the implications for the stream of encrypted Tx packets, which keep getting transmitted regardless, since there is no handshaking protocol to tell the Tx that the Rx dropped a packet due to a CRC error?
If answers to these questions exist in further documentation on tiny-AES, I would appreciate some links, as the notes provided assume people are already familiar with AES.
So, here's some more detail further to the comments/replies already made. Basically the Tx/Rx pair has a pre-defined packet structure with a certain payload, header, CRC etc., but let's call this the "packet". The payload is what I need to encrypt, and it is fixed at 120 bytes, not negotiable.
So what you are saying is that the Tx and Rx will need to change the nonce with every packet transmitted, and both Tx & Rx need to be using the same nonce when processing a given packet?
Say I begin transmission and have packet(1), packet(2) … packet(n); then with each packet transmitted I need to update the "counter", and both transmitter and receiver need to be synchronized so that they use the same nonce throughout a session?
This can be problematic if, due to noise or interference, the Tx/Rx system loses synchronization, as can happen, and the two independent nonce counters are no longer in sync and on the same page, so to speak…
Generally a session would not require more than 2^16 packets on average, so can you see a way around this? As I said, sending a different nonce with every individual packet is totally out of the question, as the payload is already complete and full.
Another thought, on which perhaps you can shed some light, is to use GPS. If each Tx and Rx has a GPS module (which is the case), then timing information could be derived from the GPS clock on both Tx and Rx, as both would receive it independently, and some kind of synchronized counter could be rolling to update the nonce on both, say counting from 0 to 2^16 in a session. Would you agree? That way, even if packets are lost by the receiver due to noise/interference, the counter would continue updating the nonce from some reliable "tick" in the background…
With regard to a source of entropy, apparently a lampert circuit would provide this for a good PRNG running locally, say for a session of 2^16 packets, but this is not available on my system yet; it could be if we decide to take this further…
What are your thoughts on this as a professional in the field ?
Regards,
I'm assuming the Tiny AES you're referring to is tiny-AES-c (though your code is a little different from that code; I can't find anywhere that AES_ctx_set_key is defined). I also assume that by "CRT mode" you mean CTR mode.
What is the purpose of AES_ctx_init() and is this done only once at the start of an encrypt/decrypt session of a secure link?
This initializes the internal data structures. Specifically, it performs key expansion, which creates the necessary round keys from the master key. In this library, it looks like it would be called once at the beginning of the session, to set the key and IV.
In CTR mode, it is absolutely critical that the Key+Nonce (IV) combination is never reused on two different messages. (CTR mode doesn't technically have an "IV." It has a "nonce," but it looks exactly like an IV and is passed identically in every crypto library I've ever seen.) If you need to reuse a nonce, you must change the key. This generally means that you will have to keep a persistent (i.e. across system resets) running counter on your encrypting device to keep track of the last-used nonce. There are ways to use a semi-random nonce safely, but you need a high-quality source of entropy, which is often not available on embedded devices.
If you pass NULL as the IV/nonce and reuse the key, CTR mode is a nearly worthless encryption scheme.
Since you need to be able to drop packets cleanly, what you'll need to do is send the nonce along with each packet. The nonce is not a secret; it just has to be unique for a given key. The nonce is 16 bytes long. If this is too much data, a common approach is to exchange the first 12 bytes of the nonce at the beginning of the session, and then use the bottom 4 bytes as a counter starting at 0. If sessions can be longer than 2^32 blocks, then you would need to reset the session after you run out of values.
Keep in mind that "blocks" here are 16 bytes long, and each block consumes its own counter value. To fit in 120 bytes, you would probably do something like send the 4-byte starting counter n, and then append 7 ciphertext blocks using counters n, n+1, n+2, ... n+6.
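As a rough sketch of that layout (assuming a 12-byte session nonce and a 4-byte big-endian counter; aes_encrypt_block here is a stub standing in for the library's block encryption, e.g. tiny-AES's AES_ECB_encrypt, so this compiles but does not actually encrypt):

```c
#include <stdint.h>
#include <string.h>

/* Build the 16-byte CTR input block: 12-byte session nonce
 * followed by a 4-byte big-endian block counter. */
static void build_ctr_block(uint8_t out[16],
                            const uint8_t nonce[12],
                            uint32_t counter)
{
    memcpy(out, nonce, 12);
    out[12] = (uint8_t)(counter >> 24);
    out[13] = (uint8_t)(counter >> 16);
    out[14] = (uint8_t)(counter >> 8);
    out[15] = (uint8_t)(counter);
}

/* Stand-in for the AES block encryption; NOT real encryption. */
static void aes_encrypt_block(const uint8_t in[16], uint8_t out[16])
{
    memcpy(out, in, 16);
}

/* Encrypt (or decrypt -- CTR is symmetric) a 112-byte payload in
 * place as 7 blocks, consuming counters n .. n+6. The transmitted
 * packet would then carry the 4-byte starting counter n followed by
 * the 112 bytes of ciphertext. */
static void ctr_crypt_payload(uint8_t payload[112],
                              const uint8_t nonce[12],
                              uint32_t n)
{
    uint8_t block[16], keystream[16];
    for (int b = 0; b < 7; b++) {
        build_ctr_block(block, nonce, n + (uint32_t)b);
        aes_encrypt_block(block, keystream);
        for (int i = 0; i < 16; i++)
            payload[16 * b + i] ^= keystream[i];
    }
}
```

Because CTR mode is just an XOR against a keystream, applying the same function twice with the same nonce and counter recovers the plaintext, which is why the Rx only needs the starting counter from the packet to decrypt it independently of any other packet.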
Designing this well is a little complicated, and it's easy to do it wrong and undermine your cryptography. If this is important, you should probably engage someone with experience to design this for you. I do this kind of work, but I'm sure there are many people you could find.
So what you are saying is that the Tx and Rx will need to change the nonce with every packet transmitted, and both Tx & Rx need to be using the same nonce when processing a given packet?
Correct. The Tx side will determine the nonce, and the Rx side will consume it. Though they could coordinate somehow other than sending it explicitly, as long as it never repeats, and they always agree. This is one of the benefits of CTR mode; it relies on a "counter" that doesn't necessarily have to be sent.
The clock is nice because it's non-repeating. The tricky part is staying synchronized, while being absolutely certain you don't reuse a nonce. It's easier to stay synchronized if the time window for each nonce is large. It's easier to make sure you never reuse a nonce if the time window is small. Combining a clock-derived nonce with a sequential counter might be a good approach. There will still be ongoing synchronization issues on the receiving side where the receiver needs to try a few different counters (or nonce+counters) to find the right one. But it should be possible.
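One way the clock-plus-counter combination might look (a sketch only; the window size, the field layout, and the assumption that both ends share a GPS-derived epoch-seconds value are all illustrative choices, not a vetted design):

```c
#include <stdint.h>
#include <string.h>

/* The window must be large enough that Tx and Rx clocks always agree
 * on it, and the 32-bit packet counter must never wrap within one
 * window, or a nonce would repeat. */
#define WINDOW_SECONDS 3600u

/* Derive the full 16-byte CTR nonce: bytes 4..11 hold the big-endian
 * time-window number, bytes 12..15 the big-endian packet counter. */
static void derive_nonce(uint8_t nonce[16],
                         uint64_t gps_epoch_seconds,
                         uint32_t packet_counter)
{
    uint64_t window = gps_epoch_seconds / WINDOW_SECONDS;
    memset(nonce, 0, 16);
    for (int i = 0; i < 8; i++)
        nonce[4 + i] = (uint8_t)(window >> (8 * (7 - i)));
    for (int i = 0; i < 4; i++)
        nonce[12 + i] = (uint8_t)(packet_counter >> (8 * (3 - i)));
}
```

On the receiving side, when decryption yields gibberish near a window boundary, the Rx would typically retry with the neighboring windows (window - 1 and window + 1) before giving up.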
Note that synchronization assumes that there is a way to distinguish valid plaintext from invalid plaintext. If there is a structure to the plaintext, then you can try one nonce and if the output is gibberish, try another. But if there is no structure to the plaintext, then you have a big problem, because you won't be able to synchronize. By "structure" I mean "can you tell the difference between valid data and random noise."
Using the clock, as long as your window is small enough that you would never reuse a starting nonce, even across resets, you could avoid needing a PRNG. The nonce doesn't have to be random; it just can never repeat.
Note that if you have a 4-byte counter portion, then you have 2^32 blocks, not 2^16.
Related
I'm trying to implement simple AES encryption for small data packets (8 bytes, fixed length) being sent from nodes with very little computational resources.
Ideally the encrypted data would be a single 16-byte packet. The data being sent is very repetitive (for example, a wind velocity packet "S011D023", where S and D would always be in the same place and the length would always be 8). The nodes may not have the ability to hold runtime variables (e.g. a frame counter, nonce etc.), but this could be implemented if essential.
How do I make the solution "reasonably" secure against replay, brute force, and other well-known attacks? Does adding new random padding bytes to each new packet help in any way?
I'm developing a desktop application with Qt which communicates with an STM32 to send and receive data.
The thing is, the data to transfer follow a well-defined shape, with previously defined fields. My problem is that I can't find out how read() or readAll() work, or how QSerialPort even treats the data. So my question is: how can I read data (in real time, whenever there is data in the buffer) and analyze it field by field (or byte by byte) in order to display it in the GUI?
There's nothing to it. read() and readAll() give you bytes, optionally wrapped in a QByteArray. How you deal with those bytes is up to you. The serial port doesn't "treat" or interpret the data in any way.
The major point of confusion is that somehow people think of a serial port as if it was a packet oriented interface. It's not. When the readyRead() signal fires, all that you're guaranteed is that there's at least one new byte available to read. You must cope with such fragmentation.
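The usual remedy is to accumulate incoming bytes in your own buffer and only extract a frame once it is complete. A minimal sketch of that shape (in plain C for illustration; in Qt you would do the same thing with a QByteArray appended to in your readyRead() slot, and FRAME_LEN and the buffer size here are arbitrary choices):

```c
#include <stddef.h>
#include <string.h>

#define FRAME_LEN 8  /* hypothetical fixed frame size */

typedef struct {
    unsigned char buf[256];
    size_t used;
} Reassembler;

/* Feed however many bytes just arrived -- possibly a fragment of a
 * frame, possibly several frames at once. */
static void feed(Reassembler *r, const unsigned char *data, size_t n)
{
    if (r->used + n > sizeof(r->buf))
        n = sizeof(r->buf) - r->used;  /* drop overflow for brevity */
    memcpy(r->buf + r->used, data, n);
    r->used += n;
}

/* Pop one complete frame if available; returns 1 on success, 0 if
 * not enough bytes have accumulated yet. */
static int next_frame(Reassembler *r, unsigned char out[FRAME_LEN])
{
    if (r->used < FRAME_LEN)
        return 0;
    memcpy(out, r->buf, FRAME_LEN);
    memmove(r->buf, r->buf + FRAME_LEN, r->used - FRAME_LEN);
    r->used -= FRAME_LEN;
    return 1;
}
```

The key point is that next_frame() is called in a loop after every chunk of input, so it makes no assumption about how the bytes were fragmented on the wire.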
My goal is to encrypt iBeacon data (UUID+MajorID+MinorID: 20 bytes), preferably with a timestamp included (encrypted and plain), thus summing up to around 31/32 bytes.
As you have already found out, this exceeds the maximum allowed iBeacon Data (31 bytes, including iBeacon prefix and tx power).
Thus, encryption only works by creating a custom beacon format instead of the iBeacon format?
Talking about encryption: I would think about using a symmetric cipher in CBC operation mode (to avoid decryption through repetition, and most importantly to avoid ciphertext adjustments resulting in a changed UUID or Major-/MinorID). It's not a problem to make the IV (Initialization Vector) public (unencrypted), right?
Remember, the iBeacons work in advertising mode (transfer only), without being connected to beforehand, thus I am not able to exchange an initialization vector (IV) before any data is sent.
Which considerations should be made in choosing the most appropriate cipher algorithm? Is AES-128 OK? With or without a padding scheme? I also thought about an AES-GCM constellation, but I don't think it's really necessary/applicable due to the advertising mode used. How should I exchange session tokens? Moreover, there's no real session in my case: the iBeacons send data 24/7, open-ended, without a real initialization of a connection.
Suppose a system consisting of 100 iBeacons, 20 devices and 1 application. Each iBeacon sends data periodically (e.g. every 500 ms), which is received by nearby devices via BLE, which forward the data to an application via UDP.
So the system overview relation is:
n iBeacons -(ble)- k devices -(udp)- 1 Application
Is it acceptable to use the same encryption key on each iBeacon? If I would work with a tuple (iBeacon ID / encryption key), I would additionally have to add the iBeacon ID to each packet, thus being able to look up the key in a dictionary.
Should the data be decrypted on the device or only later in the application?
You can look at the Eddystone-EID spec to see how Google tried to solve a similar problem.
I'm not an expert on all the issues you bring up, but I think Google offers a good solution for your first question: how can you get your encrypted payload to fit in the small number of bytes available to a beacon packet?
Their answer is: you don't. They solve this problem by truncating the encrypted payload to make a hash, and then using an online service to "resolve" this hash and see if it matches any of your beacons. The online service simply performs the same hashing algorithm against every beacon you have registered, and sees if it comes up with the same value for the time period. If so, it can convert the encrypted hash identifier into a useful one.
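The resolution idea can be sketched as follows. Note that toy_prf here is a stand-in keyed function purely to show the structure; the real Eddystone-EID construction derives a rotating identifier with AES from an exchanged identity key, which is not reproduced here:

```c
#include <stdint.h>

/* Toy keyed PRF: mixes a per-beacon key with the current time
 * window. NOT the real Eddystone-EID crypto -- structure only. */
static uint32_t toy_prf(uint32_t beacon_key, uint32_t time_window)
{
    uint32_t x = beacon_key ^ (time_window * 2654435761u);
    x ^= x >> 16;
    x *= 2246822519u;
    x ^= x >> 13;
    return x;
}

/* Resolver side: recompute the identifier for every registered
 * beacon in the current time window and compare against the value
 * actually heard over the air. Returns the matching beacon's index,
 * or -1 if the identifier resolves to no known beacon. */
static int resolve_beacon(const uint32_t *keys, int n_keys,
                          uint32_t time_window, uint32_t heard_id)
{
    for (int i = 0; i < n_keys; i++)
        if (toy_prf(keys[i], time_window) == heard_id)
            return i;
    return -1;
}
```

This is why the scheme scales with the number of registered beacons: the resolver does one PRF evaluation per beacon per time window, and clock skew is handled by also trying adjacent windows.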
Before you go too far with this approach, be sure to consider @Paulw11's point that on iOS devices you must know the 16-byte iBeacon UUID up front or the operating system blocks you from reading the rest of the packet. For this reason, you may have to stick with the Android platform if using iBeacon, or switch to the similar AltBeacon format that does not have this restriction on iOS.
Thanks for your excellent post; I hadn't heard about the Eddystone spec before. It's really something to look into, and it has a maximum payload length of 31 bytes, just like iBeacon. The paper says that, in general, it is an encryption of a timestamp value (since a pre-defined epoch). Did I get the processing steps right?
Define secret keys (EIKs) for each beacon and share those keys with another resolver or server (if there's a constellation without a online resource);
Create lookup tables on the server with a PRF (like AES CTR);
Encrypt timestamp on beacon with PRF(i.e. AES CTR?) and send data in Eddystone format to resolver/server;
Server compares the received ciphertext with entries in the lookup table, and thus knows the beacon ID, because each beacon has a unique secret key, which leads to a different ciphertext.
Would it work the same way with AES-EAX? All I'd have to put into the algorithm is an IV (nonce value) as well as the key and message? The other end would build the same tuple to compare ciphertext and tag?
Question: What about the IRK (identity resolution key), what is it? Which nonce would be used for AES CTR? Would each beacon use the same nonce? It has to be known to the resolver/server, too.
What about time synchronization? There's no real connection between beacon and resolver/server, but is it assumed that both ends are synchronized with the same time server? Or how should beacon and resolver/server get the same ciphertext, if one end works in the year 2011, while the resolver has arrived in 2017?
What is the purpose of having created three types of parity bit that all define a state where the parity bit is precisely not used?
"If the parity bit is present but not used, it may be referred to as mark parity (when the parity bit is always 1) or space parity (the bit is always 0)" - Wikipedia
There is a very simple and very useful reason to have mark or space parity that appears to be left out here: node address flagging.
Very low-power and/or small embedded systems sometimes utilize an industrial serial bus like RS485 or RS422. Perhaps many very tiny processors may be attached to the same bus.
These tiny devices don't want to waste power or processing time looking at every single character that comes in over the serial port. Most of the time, it's not something they're interested in.
So, you design a bus protocol that uses, for example, 9 bits: 8 data bits and a mark/space parity bit. Each data packet contains exactly one byte or word (the node address) with the mark parity bit set. Everything else is space parity. Then, these tiny devices can simply wait around for a parity error interrupt. Once a device gets the interrupt, it checks that byte. Is that my address? No? Go back to sleep.
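As a rough illustration (a simulation of the scheme, not real UART register code), the mark/space parity bit can be modeled as a ninth "address flag" bit on each word, with the node logic reduced to the two checks described above:

```c
#include <stdint.h>

/* Model a 9-bit bus word: low 8 bits are data, bit 8 is the mark
 * parity bit (1 = address byte, 0 = payload byte). */
#define ADDR_FLAG 0x100u

static uint16_t make_addr_word(uint8_t addr) { return ADDR_FLAG | addr; }
static uint16_t make_data_word(uint8_t data) { return data; }

/* A sleeping node's "parity error interrupt" logic: wake only on
 * address bytes, and accept the frame only when the address matches
 * this node's own address. */
static int node_accepts(uint16_t word, uint8_t my_addr)
{
    if (!(word & ADDR_FLAG))
        return 0;                 /* payload byte: stay asleep */
    return (uint8_t)word == my_addr;  /* woke up: is it for me? */
}
```

In real hardware the ADDR_FLAG test is free: the UART only interrupts the CPU when the parity bit mismatches the configured space parity, so payload bytes never wake the processor at all.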
It's a very power-efficient system... and only 10% wasteful on bandwidth. In many environments, that's a very good trade-off.
So... if you've then got a PC-class system trying to TALK to these tiny devices, you need to be able to set/clear that parity bit. So, you set MARK parity when you transmit the node addresses, and SPACE parity everywhere else.
So there are five possibilities, not three: no parity, mark, space, odd and even. With no parity the extra bit is just omitted in the frame, often selected when the protocol is already checking for errors with a checksum or CRC or data corruption is not deemed likely or critical.
Nobody ever selects mark or space; that's just wasting bandwidth. Modulo some odd standard, like 9-bit data protocols that hardware vendors use to force you to buy their hardware, since you have no real shot at reprogramming the UART on the fly without writing a driver.
Setting mark or space parity is useful if you're generating data to send to hardware that requires a parity bit (perhaps because it has a hard coded word length built into the electronics) but doesn't care what its value is.
RS485 requires 9-bit transmission, as described above. RS485 is widely used in industrial applications, whatever the controlled device's 'size' (for instance, many air conditioners or refrigerators offer an RS485 interface, not really 'tiny' things). RS485 allows up to 10 Mbit/s throughput, or distances up to 4000 feet. Using the parity bit to distinguish address/data bytes eases hardware implementation: each node of the network can have its own hardware to generate interrupts only if an address byte on the wire matches the node's address.
Very clear and helpful answers and remarks.
For those who find the concept perverse, relax; the term is a problem of semantics rather than information theory or engineering, the difficulty arising from the use of the word "parity".
"Mark" and "space" bits are not parity bits in those applications, and the term arises from the fact that they occupy the bit position in which a parity bit might be expected in other contexts. In reality they have nothing to do with parity, but are used for any relevant purpose where a constant bit value is needed, such as to mark the start of a byte or other signal, or as a delay,or to indicate the status of a signal as being data or address or the like.
Accordingly they sometimes are more logically called "stick (parity) bits", being stuck in "on" or "off" state. Sometimes they really are "don't cares".
Background: I've spent a while working with a variety of device interfaces and have seen a lot of protocols, many serial and UDP in which data integrity is handled at the application protocol level. I've been seeking to improve my receive routine handling of protocols in general, and considering the "ideal" design of a protocol.
My question is: is there any protocol framing scheme out there that can definitively identify corrupt data in all cases? For example, consider the standard framing scheme of many protocols:
Field: Length in bytes
<SOH>: 1
<other framing information>: arbitrary, but fixed for a given protocol
<length>: 1 or 2
<data payload etc.>: based on length field (above)
<checksum/CRC>: 1 or 2
<ETX>: 1
For the vast majority of cases, this works fine. When you receive some data, you search for the SOH (or whatever your start byte sequence is), move forward a fixed number of bytes to your length field, and then move that number of bytes (plus or minus some fixed offset) to the end of the packet to your CRC, and if that checks out you know you have a valid packet. If you don't have enough bytes in your input buffer to find an SOH or to have a CRC based on the length field, then you wait until you receive enough to check the CRC. Disregarding CRC collisions (not much we can do about that), this guarantees that your packet is well formed and uncorrupted.
However, if the length field itself is corrupt and has a high value (which I'm running into), then you can't check the (corrupt) packet's CRC until you fill up your input buffer with enough bytes to meet the corrupt length field's requirement.
So is there a deterministic way to get around this, either in the receive handler or in the protocol design itself? I can set a maximum packet length or a timeout to flush my receive buffer in the receive handler, which should solve the problem on a practical level, but I'm still wondering if there's a "pure" theoretical solution that works for the general case and doesn't require setting implementation-specific maximum lengths or timeouts.
Thanks!
The reason why all protocols I know of, including those handling "streaming" data, chop up the data stream into smaller transmission units, each with their own checks on board, is exactly to avoid the problems you describe. Probably the fundamental flaw in your protocol design is that the blocks are too big.
The accepted answer of this SO question contains a good explanation and a link to a very interesting (but rather heavy on math) paper about this subject.
So in short, you should stick to smaller transmission units, not only because of practical programming-related arguments, but also because of the message length's role in determining the security offered by your CRC.
One way would be to encode the length parameter so that it would be easily detected to be corrupted, and save you from reading in the large buffer to check the CRC.
For example, the XModem protocol embeds an 8-bit packet number followed by its one's complement.
It could mean doubling the size of your length field, but it's an option.
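A minimal sketch of that XModem-style redundant field, applied to the length byte (field sizes here are illustrative; the same trick works for a 2-byte length by complementing both bytes):

```c
#include <stdint.h>

/* Emit the length field as (len, ~len), so corruption of either
 * byte is detected immediately, before any payload is buffered. */
static void encode_len(uint8_t out[2], uint8_t len)
{
    out[0] = len;
    out[1] = (uint8_t)~len;
}

/* Returns the length if the pair is self-consistent, or -1 if the
 * field is corrupt and the frame should be resynchronized. */
static int decode_len(const uint8_t in[2])
{
    if ((uint8_t)~in[0] != in[1])
        return -1;
    return in[0];
}
```

This addresses exactly the failure mode described above: a corrupted length byte is rejected on the spot, so the receiver never sits waiting to fill a huge buffer dictated by a bogus length before it can check the CRC.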