How to encrypt extremely small pieces of data (2 - 4 bytes)?

I'm developing a medical device that operates under extreme low-power constraints and transmits data over BLE, where every unnecessary millisecond of air time matters. I am designing a beacon advertising protocol in which sensor data is encoded directly into the advertising packet. The data I need to transmit is very simple and small, occupying only two to four bytes in total. I would like to encrypt these 2-4 bytes before adding them to the advertising packet, to be decrypted by the receiving smartphone after being pulled out of the advertising packet. I'd like to keep the encrypted data as small as possible, but I know many algorithms produce fixed-length output (20+ bytes).
I am currently transmitting these 2-4 bytes as hexadecimal values, to be interpreted as integer values by the receiving smartphone application. I am not overly experienced with cryptography or encryption methods, but do know that most algorithms operate on larger pieces of data. Are there any algorithms or protocols that would be appropriate for such small pieces of data? Should I "roll my own" protocol for encryption since this may be a special case?
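For what it's worth, here is a minimal sketch of the stream-cipher idea, where the ciphertext stays exactly as long as the plaintext, using AES-128 in CTR mode via OpenSSL. The function name, the 16-byte key and the 32-bit packet counter are illustrative assumptions; the counter must never repeat for a given key and must be recoverable by the phone (e.g. carried in the clear or derived from a shared clock), and CTR alone gives no integrity or replay protection.

    #include <stdint.h>
    #include <openssl/evp.h>

    /* Hypothetical sketch: AES-128-CTR acts as a stream cipher, so the
     * ciphertext is exactly as long as the plaintext (2-4 bytes here).
     * packet_counter must never repeat for a given key. */
    int encrypt_sensor_value(const uint8_t key[16], uint32_t packet_counter,
                             const uint8_t *plain, uint8_t *cipher, int len)
    {
        uint8_t ctr_block[16] = {0};               /* counter goes in the last 4 bytes */
        ctr_block[12] = (uint8_t)(packet_counter >> 24);
        ctr_block[13] = (uint8_t)(packet_counter >> 16);
        ctr_block[14] = (uint8_t)(packet_counter >> 8);
        ctr_block[15] = (uint8_t)(packet_counter);

        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        if (ctx == NULL)
            return -1;
        int out_len = 0;
        int ok = EVP_EncryptInit_ex(ctx, EVP_aes_128_ctr(), NULL, key, ctr_block) == 1
              && EVP_EncryptUpdate(ctx, cipher, &out_len, plain, len) == 1;
        EVP_CIPHER_CTX_free(ctx);
        return ok ? out_len : -1;
    }

The receiver runs the same function with the same key and counter to strip the keystream back off. If tampering or replay matters (it usually does for a medical device), a MAC or an EID-style rotating identifier, as discussed in the related answers below, is worth considering.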

Related

AES encryption for repetitive small data packets

I'm trying to implement simple AES encryption for small data packets (8 bytes, fixed length) being sent from nodes with very limited computational resources.
Ideally the encrypted data would be a single 16-byte packet. The data being sent is very repetitive (for example, a wind velocity packet "S011D023", where S and D would always be in the same place and the length would always be 8). The nodes may not have the ability to hold runtime variables (e.g. a frame counter, nonce, etc.), but this could be implemented if essential.
How do I make the solution "reasonably" secure against replay, brute force, and any other well-known attack? Does adding new random padding bytes to each packet help in any way?
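One commonly suggested sketch (not a vetted design): pack the 8 data bytes together with an 8-byte monotonically increasing counter into a single 16-byte block and encrypt it with AES-128. The ciphertext is exactly 16 bytes, identical payloads encrypt differently because the counter differs, and the receiver can reject counters it has already accepted to blunt replay. The helper below uses OpenSSL and hypothetical names; note that a persistent counter is exactly the kind of runtime state the question hopes to avoid, so treat it as one possible trade-off.

    #include <string.h>
    #include <stdint.h>
    #include <openssl/evp.h>

    /* Hypothetical sketch: encrypt one 16-byte block containing
     * [8 bytes payload | 8 bytes big-endian counter] with AES-128.
     * The receiver decrypts, checks that the counter is strictly greater
     * than the last one it accepted, and only then trusts the payload. */
    int make_packet(const uint8_t key[16], const uint8_t payload[8],
                    uint64_t counter, uint8_t out[16])
    {
        uint8_t block[16];
        memcpy(block, payload, 8);
        for (int i = 0; i < 8; i++)
            block[8 + i] = (uint8_t)(counter >> (56 - 8 * i));

        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        if (ctx == NULL)
            return -1;
        int n = 0, fin = 0;
        int ok = EVP_EncryptInit_ex(ctx, EVP_aes_128_ecb(), NULL, key, NULL) == 1
              && EVP_CIPHER_CTX_set_padding(ctx, 0) == 1       /* exactly one block, no padding */
              && EVP_EncryptUpdate(ctx, out, &n, block, 16) == 1
              && EVP_EncryptFinal_ex(ctx, out + n, &fin) == 1;
        EVP_CIPHER_CTX_free(ctx);
        return ok ? 0 : -1;
    }

Raw single-block encryption gives confidentiality but no real integrity check beyond "the decrypted counter looks plausible"; if a few extra bytes could be spared, an AEAD mode with an authentication tag would be stronger.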

iBeacon Data Encryption (Advertising Mode)

My goal is to encrypt iBeacon data (UUID + MajorID + MinorID: 20 bytes), preferably with a timestamp included (encrypted and plain), thus summing up to around 31/32 bytes.
As you have already found out, this exceeds the maximum allowed iBeacon Data (31 bytes, including iBeacon prefix and tx power).
So encryption would only work by creating a custom beacon format instead of the iBeacon format?
Talking about encryption: I would consider a symmetric cipher in CBC mode (to avoid decryption through repetition, and most importantly to avoid ciphertext manipulation resulting in a changed UUID or Major-/MinorID). It's no problem to make the IV (initialization vector) public (unencrypted), right?
Remember, the iBeacons work in advertising mode (transfer only), without being connected to beforehand, thus I am not able to exchange an initialization vector (IV) before any data is sent.
What considerations go into choosing the most appropriate cipher algorithm? Is AES-128 OK? With or without a padding scheme? I also thought about an AES-GCM construction, but I don't think it's really necessary/applicable given the advertising mode used. How should I exchange session tokens? Moreover, there's no real session in my case: the iBeacons send data 24/7, open-ended, without any real initialization of a connection.
Suppose a system consisting of 100 iBeacons, 20 devices and 1 application. Each iBeacon sends data periodically (e.g. every 500 ms), which is received by nearby devices via BLE; the devices forward the data to an application via UDP.
So the system overview relation is:
n iBeacons -(ble)- k devices -(udp)- 1 Application
Is it acceptable to use the same encryption key on each iBeacon? If I worked with a tuple (iBeacon ID / encryption key), I would additionally have to add the iBeacon ID to each packet in order to look up the key in a dictionary.
Should the data be decrypted on the device or only later in the application?
You can look at the Eddystone-EID spec to see how Google tried to solve a similar problem.
I'm not an expert on all the issues you bring up, but I think Google offers a good solution for your first question: how can you get your encrypted payload to fit in the small number of bytes available to a beacon packet?
Their answer is: you don't. They solve this problem by truncating the encrypted payload to make a hash, and then using an online service to "resolve" this hash and see if it matches any of your beacons. The online service simply performs the same hashing algorithm against every beacon you have registered, and sees if it comes up with the same value for the time period. If so, it can convert the encrypted hash identifier into a useful one.
Before you go too far with this approach, be sure to consider @Paulw11's point that on iOS devices you must know the 16-byte iBeacon UUID up front or the operating system blocks you from reading the rest of the packet. For this reason, you may have to stick with the Android platform if using iBeacon, or switch to the similar AltBeacon format that does not have this restriction on iOS.
Thanks for your excellent post. I hadn't heard about the Eddystone spec before; it's really something to look into, and it has a maximum payload length of 31 bytes, just like iBeacon. In the paper it is written that, in general, it is an encryption of a timestamp value (counted since a pre-defined epoch). Did I get the processing steps right?
1. Define a secret key (EIK) for each beacon and share those keys with the resolver or server (in case the constellation has no online resource);
2. Create lookup tables on the server with a PRF (like AES-CTR);
3. Encrypt the timestamp on the beacon with the PRF (i.e. AES-CTR?) and send the data in Eddystone format to the resolver/server;
4. The server compares the received ciphertext with the entries in its lookup table and thus learns the beacon ID, because each beacon has a unique secret key, which leads to a different ciphertext.
Would it work the same way with AES-EAX? All I'd have to put into the algorithm is an IV (nonce value) as well as the key and message? The other end would build the same tuple to compare ciphertext and tag?
Question: What about the IRK (identity resolution key), what is it? Which nonce would be used for AES CTR? Would each beacon use the same nonce? It has to be known to the resolver/server, too.
What about time synchronization? There's no real connection between beacon and resolver/server, but is it assumed that both ends are synchronized with the same time server? Or how should beacon and resolver/server get the same ciphertext, if one end works in the year 2011, while the resolver has arrived in 2017?
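Regarding the steps above and the synchronization question: here is a minimal sketch of the resolver side, loosely modelled on that idea rather than on the actual Eddystone-EID derivation (which uses a two-stage AES construction and rotation exponents); all names are illustrative assumptions. Each registered beacon key is tried against the received identifier for the current time quantum and its immediate neighbours, which is also one simple way to tolerate moderate clock drift between beacon and resolver.

    #include <string.h>
    #include <stdint.h>
    #include <openssl/evp.h>

    /* Hypothetical sketch of a resolver: for every registered beacon key,
     * encrypt the quantized timestamp, truncate to 8 bytes, and compare
     * with the identifier received over the air. */
    struct beacon { uint8_t eik[16]; uint32_t id; };

    static int expected_eid(const uint8_t eik[16], uint32_t quantum, uint8_t eid[8])
    {
        uint8_t block[16] = {0};                   /* quantized timestamp in the last 4 bytes */
        block[12] = (uint8_t)(quantum >> 24);
        block[13] = (uint8_t)(quantum >> 16);
        block[14] = (uint8_t)(quantum >> 8);
        block[15] = (uint8_t)(quantum);

        uint8_t enc[16];
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        if (ctx == NULL)
            return -1;
        int n = 0, fin = 0;
        int ok = EVP_EncryptInit_ex(ctx, EVP_aes_128_ecb(), NULL, eik, NULL) == 1
              && EVP_CIPHER_CTX_set_padding(ctx, 0) == 1
              && EVP_EncryptUpdate(ctx, enc, &n, block, 16) == 1
              && EVP_EncryptFinal_ex(ctx, enc + n, &fin) == 1;
        EVP_CIPHER_CTX_free(ctx);
        if (!ok)
            return -1;
        memcpy(eid, enc, 8);                       /* truncate to the advertised size */
        return 0;
    }

    /* Try the current quantum and its neighbours to tolerate clock skew. */
    static int resolve(const struct beacon *b, size_t count,
                       const uint8_t seen[8], uint32_t quantum, uint32_t *id_out)
    {
        for (size_t i = 0; i < count; i++)
            for (int32_t d = -1; d <= 1; d++) {
                uint8_t eid[8];
                if (expected_eid(b[i].eik, quantum + (uint32_t)d, eid) == 0
                    && memcmp(eid, seen, 8) == 0) {
                    *id_out = b[i].id;
                    return 0;
                }
            }
        return -1;                                 /* no registered beacon matched */
    }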

Many small UDP datagrams vs fewer, larger ones

I have a system that sends "many" (hundreds) of UDP datagrams in bursts, every once in a while (say, 10 times a minute). According to nload, this averages about 222 kbit/s. The content of these datagrams is JSON. I've considered altering the system so that it waits some time (500 ms?) and combines many of the JSON objects into one datagram before sending. But I'm not sure it's worth the effort (considering bandwidth, protocol, and frequency of sending). Would the new approach provide any real benefits over the current one?
The short answer is that it's up to you to decide that.
The long version is that it depends on your use case. Since we don't know what you're building, it's hard to say what's more important - latency? Throughput? Reliability? Something else? Let's analyze some pros and cons. Here's what I came up with:
Pros to sending larger packets:
Fewer messages means fewer system calls and less I/O against the network. That means fewer blocked/waiting threads and less time spent on interrupts.
Fewer, larger packets means less overhead for each individual packet (stuff like the IP/UDP headers that are sent with each packet). Therefore a higher data rate is (theoretically) achievable, although keep in mind that all of these headers (L2 + IP + UDP) typically add up to no more than 60-70 bytes per packet, since the UDP header is only 8 bytes long.
Since UDP doesn't guarantee ordering, larger packets with more time between them will reduce any existing reordering.
Cons to sending larger packets:
Re-writing existing code, and making it (slightly) more complicated.
UDP is unreliable, so the loss of a single (large) packet is more significant than the loss of a small one.
Latency - some data will have to wait 500ms to be sent. That means that a delay is added between the sender and the receiver.
Fragmentation - if one of the packets you create crosses the MTU boundary (typically 1450-1500 bytes including the IP+UDP header, which is normally 28 bytes long), the IP layer would need to fragment the packet into several smaller ones. IP fragmentation is considered bad for a multitude of reasons.
Processing of larger packets might take longer.
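If you do decide to batch, here is a minimal sketch of the coalescing side, assuming newline-delimited JSON, a conservative payload cap below a typical 1500-byte MTU, and hypothetical helper names; a real implementation would also flush on a timer (your 500 ms idea) rather than only when the buffer fills.

    #include <string.h>
    #include <sys/socket.h>

    #define MAX_PAYLOAD 1400        /* keep well under MTU minus IP/UDP headers */

    struct batch {
        char   buf[MAX_PAYLOAD];
        size_t used;
    };

    static void batch_flush(int sock, const struct sockaddr *dst, socklen_t dlen,
                            struct batch *b)
    {
        if (b->used > 0) {
            sendto(sock, b->buf, b->used, 0, dst, dlen);
            b->used = 0;
        }
    }

    static void batch_add(int sock, const struct sockaddr *dst, socklen_t dlen,
                          struct batch *b, const char *json, size_t len)
    {
        if (len + 1 > MAX_PAYLOAD) {               /* oversized object: send it on its own */
            batch_flush(sock, dst, dlen, b);
            sendto(sock, json, len, 0, dst, dlen);
            return;
        }
        if (b->used + len + 1 > MAX_PAYLOAD)       /* would overflow: flush first */
            batch_flush(sock, dst, dlen, b);
        memcpy(b->buf + b->used, json, len);
        b->buf[b->used + len] = '\n';              /* newline-delimited JSON */
        b->used += len + 1;
    }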

Is there a good way to frame a protocol so data corruption can be detected in every case?

Background: I've spent a while working with a variety of device interfaces and have seen a lot of protocols, many serial and UDP in which data integrity is handled at the application protocol level. I've been seeking to improve my receive routine handling of protocols in general, and considering the "ideal" design of a protocol.
My question is: is there any protocol framing scheme out there that can definitively identify corrupt data in all cases? For example, consider the standard framing scheme of many protocols:
Field: Length in bytes
<SOH>: 1
<other framing information>: arbitrary, but fixed for a given protocol
<length>: 1 or 2
<data payload etc.>: based on length field (above)
<checksum/CRC>: 1 or 2
<ETX>: 1
For the vast majority of cases, this works fine. When you receive some data, you search for the SOH (or whatever your start byte sequence is), move forward a fixed number of bytes to your length field, and then move that number of bytes (plus or minus some fixed offset) to the end of the packet to your CRC, and if that checks out you know you have a valid packet. If you don't have enough bytes in your input buffer to find an SOH or to have a CRC based on the length field, then you wait until you receive enough to check the CRC. Disregarding CRC collisions (not much we can do about that), this guarantees that your packet is well formed and uncorrupted.
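For concreteness, here is a minimal sketch of that receive routine under an assumed frame layout of SOH, 1-byte length, payload, 1-byte checksum, ETX, with a simple additive checksum standing in for a real CRC; all names are illustrative.

    #include <stddef.h>
    #include <stdint.h>

    #define SOH 0x01
    #define ETX 0x03

    /* Returns the number of bytes consumed, 0 if more data is needed,
     * or -1 if the candidate frame is invalid (caller skips a byte and resyncs). */
    static int try_parse(const uint8_t *buf, size_t avail,
                         const uint8_t **payload, uint8_t *len)
    {
        if (avail < 4) return 0;                   /* SOH + len + csum + ETX at minimum */
        if (buf[0] != SOH) return -1;
        uint8_t n = buf[1];
        size_t frame_len = 2u + n + 2u;            /* SOH, len, payload, csum, ETX */
        if (avail < frame_len) return 0;           /* wait for more bytes */
        uint8_t sum = 0;
        for (uint8_t i = 0; i < n; i++) sum += buf[2 + i];
        if (buf[2 + n] != sum || buf[3 + n] != ETX) return -1;
        *payload = &buf[2];
        *len = n;
        return (int)frame_len;
    }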
However, if the length field itself is corrupt and has a high value (which I'm running into), then you can't check the (corrupt) packet's CRC until you fill up your input buffer with enough bytes to meet the corrupt length field's requirement.
So is there a deterministic way to get around this, either in the receive handler or in the protocol design itself? I can set a maximum packet length or a timeout to flush my receive buffer in the receive handler, which should solve the problem on a practical level, but I'm still wondering if there's a "pure" theoretical solution that works for the general case and doesn't require setting implementation-specific maximum lengths or timeouts.
Thanks!
The reason why all protocols I know of, including those handling "streaming" data, chop the data stream up into smaller transmission units, each with its own checks on board, is exactly to avoid the problems you describe. Probably the fundamental flaw in your protocol design is that the blocks are too big.
The accepted answer of this SO question contains a good explanation and a link to a very interesting (but rather heavy on math) paper about this subject.
So in short, you should stick to smaller transmission units, not only for practical programming-related reasons but also because of the message length's role in determining the protection offered by your CRC.
One way would be to encode the length parameter so that it would be easily detected to be corrupted, and save you from reading in the large buffer to check the CRC.
For example, the XModem protocol embeds an 8-bit packet number followed by its one's complement.
It could mean doubling your length block size, but it's an option.
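A minimal sketch of that idea, assuming the sender transmits the length byte followed by its one's complement; the receiver can then reject a corrupted length before committing to buffer a huge bogus payload.

    #include <stdint.h>

    /* Sender writes: hdr[0] = len; hdr[1] = (uint8_t)~len; */
    static int read_length(const uint8_t hdr[2], uint8_t *len_out)
    {
        if ((uint8_t)(hdr[0] ^ hdr[1]) != 0xFF)    /* complement mismatch: header corrupt */
            return -1;
        *len_out = hdr[0];
        return 0;
    }

A mismatch only tells you the header is corrupt, so the receiver still has to resynchronize (e.g. by skipping a byte and searching for the next SOH), but it no longer waits on a length it cannot trust.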

Tibco Rendezvous - size constraints

I am attempting to put a potentially large string into a Rendezvous message and was curious about size constraints. I understand there is a physical limit (64 MB?) to the message as a whole, but I'm curious about how some other variables could affect it. Specifically:
How big are the keys?
How the string is stored (in one field vs. multiple fields)
Any advice on any of the above topics or anything else that could be relevant would be greatly appreciated.
Note: I would like to keep the message as a raw string (as opposed to bytecode, etc).
From the Tibco docs on Very Large Messages:
Rendezvous software can transport very large messages; it divides them into small packets, and places them on the network as quickly as the network can accept them. In some situations, this behavior can overwhelm network capacity; applications can achieve higher throughput by dividing large messages into smaller chunks and regulating the rate at which it sends those chunks. You can use the performance tool to evaluate chunk sizes and send rates for optimal throughput.
This example sends one message consisting of ten million bytes. Rendezvous software automatically divides the message into packets and sends them. However, this burst of packets might exceed network capacity, resulting in poor throughput:
sender> rvperfm -size 10000000 -messages 1
In this second example, the application divides the ten million bytes into one thousand smaller messages of ten thousand bytes each, and automatically determines the batch size and interval to regulate the flow for optimal throughput:
sender> rvperfm -size 10000 -messages 1000 -auto
By varying the -messages and -size parameters, you can determine the optimal message size for your applications in a specific network. Application developers can use this information to regulate sending rates for improved performance.
As to actual limits, the add-string function takes a C-style ANSI string and so is theoretically unbounded; but given the signature of AddOpaque,
tibrv_status tibrvMsg_AddOpaque(
tibrvMsg message,
const char* fieldName,
const void* value,
tibrv_u32 size);
which takes a u32, it would seem sensible to state that the limit is likely to be 4 GB rather than 64 MB.
That said, using Tib to transfer such large packets is likely to be a serious performance bottleneck, as it may have to buffer significant amounts of traffic while it tries to get these sorts of messages to all consumers. By default the rvd buffer is only 60 seconds, so you may find yourself suffering message loss if this is a significant amount of your traffic.
Message overhead within tibco is largely as simple as:
the fixed cost associated with each message (the header)
All the fields (type info and the field id)
Plus the cost of all variable length aspects including:
the send and receive subjects (effectively limited to 256 bytes each)
the field names. I can find no limit on the length of field names in the docs, but the smaller they are the better; better still, don't use them at all and use the numerical identifiers instead
the array/string/opaque/user defined variable length fields in the message
Note: If you use nested messages simply recurse the above.
In your case the payload overhead will be so vast in comparison to the names (so long as they are reasonable and simple) that there is little point attempting to optimize these at all.
You may find you can gain considerable efficiency on the wire/in buffers if you transmit the strings in compressed form, either through the use of an rvrd with compression enabled or by changing your producer/consumer to use something fast but effective like deflate (or, if you're feeling esoteric, things like QuickLZ, FastLZ, LZO, etc., especially ones with fixed-memory-footprint compress/decompress engines).
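As a rough illustration of the deflate option (rvrd-level compression is configured on the daemon rather than in code), here is a minimal sketch using zlib; the function names are hypothetical, and the compressed bytes would then be added to the message as an opaque field, e.g. via the AddOpaque call shown above.

    #include <string.h>
    #include <zlib.h>

    /* Hypothetical sketch: deflate the string before adding it to the message
     * as an opaque field; the consumer inflates it again after extraction.
     * On entry *dst_len must hold the capacity of dst (>= compressBound(strlen(s)));
     * on success it receives the compressed size. */
    static int deflate_payload(const char *s,
                               unsigned char *dst, unsigned long *dst_len)
    {
        unsigned long src_len = (unsigned long)strlen(s);
        return compress2(dst, dst_len, (const unsigned char *)s, src_len,
                         Z_BEST_SPEED) == Z_OK ? 0 : -1;
    }

    static int inflate_payload(const unsigned char *src, unsigned long src_len,
                               unsigned char *dst, unsigned long *dst_len)
    {
        return uncompress(dst, dst_len, src, src_len) == Z_OK ? 0 : -1;
    }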
You don't say which platform API you are targeting (.NET/Java/C++/C, for example), and this will colour things a little. On the wire all string data will be 1 byte per character regardless of Java/.NET using UTF-16 by default; however, you will incur a significant translation cost placing these into/reading them out of the message, because the underlying buffer cannot be reused in those cases and a copy (and compaction/expansion respectively) must be performed.
If you stick to opaque byte sequences you will still have the copy overhead in the naive implementations possible through the managed wrapper APIs, but this will at least be less overhead if you have no need to work with the data as a native string.
The overall maximum size of a message is 64 MB, as was speculated in the OP. From the "Tibco Rendezvous Concepts" document:
Although the ability to exchange large data buffers is a feature of Rendezvous software, it is best not to make messages too large. For example, to exchange data up to 10,000 bytes, a single message is efficient. But to send files that could be many megabytes in length, we recommend using multiple send calls, perhaps one for each record, block or track. Empirically determine the most efficient size for the prevailing network conditions. (The actual size limit is 64 MB, which is rarely an appropriate size.)
