Best practices for passing encrypted data between different programming languages - encryption

I have read that if you want to encrypt a string using one programming language and decrypt that string using another programming language, then to ensure compatibility it is best to do some conversions prior to doing the encryption. I have read that it's a best practice to encrypt the byte array of a string rather than the string itself. Also, I have read that certain encryption algorithms expect each encrypted packet to be a fixed length in size. If the last packet to be encrypted isn't the required size, then encryption would fail. Therefore it seems like a good idea to encrypt data that has first been converted into a fixed length, such as hex.
I am trying to identify best practices that are generally useful regardless of the encryption algorithm being used. To maximize compatibility when encrypting and decrypting data across different languages and platforms, I would like a critique on the following steps as a process:
Encryption:
start with a plain text string
convert plain text string to byte array
convert byte array to hex
encrypt hex to encrypted string
end with an encrypted string
Decryption:
start with an encrypted string
decrypt encrypted string to hex
convert hex to byte array
convert byte array to plain text string
end with a plain text string

Really, the best practice for encryption is to use a high-level encryption framework; there are a lot of things you can get wrong working with the primitives. mfanto does a good job of covering the important things you need to know if you don't use a high-level framework. And I'm guessing that if you are trying to maximize compatibility across programming languages, it's because you need other developers to interoperate with the encryption, and then they need to learn the low-level details of working with encryption too.
So my suggestion for a high-level framework is the Google Keyczar framework, as it handles the details of algorithm choice, key management, padding, IV, authentication tag, and wire format for you. And it exists for many different programming languages: Java, Python, C++, C#, and Go. Check it out.
I wrote the C# version, so I can tell you that the primitives it uses behind the scenes are widely available in most other programming languages too, and it uses standards like JSON for key management and storage.
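If Keyczar itself isn't an option, the same "don't touch the primitives" idea exists elsewhere. As a rough illustration (my own example, not part of Keyczar), Fernet from the Python cryptography package wraps the algorithm, IV, and authentication into one call:

    from cryptography.fernet import Fernet

    key = Fernet.generate_key()           # distribute/store this key securely
    f = Fernet(key)

    token = f.encrypt(b"secret message")  # URL-safe Base64 token with IV and auth tag built in
    assert f.decrypt(token) == b"secret message"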

Your premise is correct, but in some ways it's a little easier than that. Modern crypto algorithms are meant to be language agnostic, and provided you have identical inputs with identical keys, you should get identical results.
It's true that for most ciphers and some modes, data needs to be a fixed length. Converting to hex won't do it, because the data needs to end on fixed boundaries. With AES, for example, if you want to encrypt 4 bytes, you'll need to pad them out to 16 bytes, which a hex representation wouldn't do. Fortunately that will most likely happen within the crypto API you end up using, with one of the standard padding schemes. Since you didn't tag a language: as one example, the AesManaged class in .NET supports the standard padding modes.
On the flip side, encrypting data properly requires a lot more than just byte encoding. You need to choose the correct mode of operation (CBC or CTR is preferred), and then provide some type of message integrity. Encryption alone doesn't protect against tampering with the data. If you want to simplify things a bit, look at a mode like GCM, which handles both confidentiality and integrity.
Your scheme should then look something like the following (a Python sketch follows these steps):
Convert the plain text string to a byte array. See rossum's comment for an important note about character encoding.
Generate a random symmetric key or use PBKDF2 to convert a passphrase to a key
Generate a random IV/nonce for use with GCM
Encrypt the byte array and store it, along with the Authentication Tag
You might optionally want to store the byte array as a Base64 string.
For decryption:
If you stored the byte array as a Base64 string, convert back to the byte array.
Decrypt encrypted byte array to plaintext
Verify the resulting Authentication Tag matches the stored Authentication Tag
Convert byte array to plain text string.
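For illustration, here is a minimal sketch of that scheme in Python using the cryptography package. The passphrase handling, the salt/nonce sizes, and the salt + nonce + ciphertext layout are assumptions made for the example, not requirements of the steps above:

    import os, base64
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def derive_key(passphrase: bytes, salt: bytes) -> bytes:
        # PBKDF2 turns a passphrase into a 256-bit AES key
        kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=200_000)
        return kdf.derive(passphrase)

    def encrypt(plaintext: str, passphrase: bytes) -> str:
        salt = os.urandom(16)
        nonce = os.urandom(12)                     # random IV/nonce for GCM
        key = derive_key(passphrase, salt)
        # AESGCM appends the authentication tag to the ciphertext
        ct = AESGCM(key).encrypt(nonce, plaintext.encode("utf-8"), None)
        # store salt + nonce + ciphertext together, Base64 for text-safe transport
        return base64.b64encode(salt + nonce + ct).decode("ascii")

    def decrypt(token: str, passphrase: bytes) -> str:
        raw = base64.b64decode(token)
        salt, nonce, ct = raw[:16], raw[16:28], raw[28:]
        key = derive_key(passphrase, salt)
        # decryption raises an exception if the authentication tag doesn't match
        return AESGCM(key).decrypt(nonce, ct, None).decode("utf-8")

    print(decrypt(encrypt("hello", b"correct horse"), b"correct horse"))

Any language with AES-GCM and PBKDF2 implementations can decrypt this, as long as both sides agree on the byte layout and the UTF-8 encoding.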

I use HashIds for this purpose. It's simple and supports a wide range of programming languages. We use it to pass encrypted data between our PHP, Node.js, and Golang microservices whenever we need to decrypt the data at the destination.

I have read that it's a best practice to encrypt the byte array of a string rather than the string itself.
Cryptographic algorithms generally work on byte arrays or byte streams, so yes. You don't encrypt objects (strings) directly; you encrypt their byte representations.
Also, I have read that certain encryption algorithms expect each encrypted packet to be a fixed length in size. If the last packet to be encrypted isn't the required size, then encryption would fail.
This is an implementation detail of the particular encryption algorithm you choose. It really depends on what the API interface is to the algorithm.
Generally speaking, yes, cryptographic block ciphers will break the input into fixed-size blocks. If the last block isn't full, it gets padded out to a full block. To distinguish padding from data that merely looks like padding, the padding scheme records how much was added; PKCS#7, for example, fills every padding byte with the number of padding bytes, while other schemes store the plain text length alongside the data.
This is the kind of detail that should not be left up to the user, and a good encryption library will take care of these details for you. Ideally you just want to feed in your plain text bytes and get encrypted bytes out on the other side.
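As a small example of what the library does for you, here is PKCS#7 padding (the scheme most APIs default to) applied by hand with Python's cryptography package; the 4-byte input is my own stand-in:

    from cryptography.hazmat.primitives import padding

    padder = padding.PKCS7(128).padder()          # 128-bit (16-byte) AES block size
    padded = padder.update(b"data") + padder.finalize()
    # 4 bytes become 16; each of the 12 pad bytes holds the value 0x0c (the pad length)

    unpadder = padding.PKCS7(128).unpadder()
    assert unpadder.update(padded) + unpadder.finalize() == b"data"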
Therefore it seems like a good idea to encrypt data that has first been converted into a fixed length, such as hex.
Converting bytes to hex doesn't make them fixed length. It doubles the size, but that's not fixed. It makes the data ASCII-safe so it can be embedded into text files and e-mails easily, but that's not relevant here. (And Base64 is a better binary-to-ASCII encoding than hex anyway.)
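A quick illustration of the size difference (the example values are mine):

    import base64

    data = bytes(range(12))                   # 12 raw bytes
    print(len(data.hex()))                    # 24 characters: hex doubles the size
    print(len(base64.b64encode(data)))        # 16 characters: Base64 grows it by ~33%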
In the interest of identifying best practices for ensuring compatibility with encrypting and decrypting data across different languages and platforms, I would like a critique on the following steps as a process:
With the hex steps removed, the process becomes:
Encryption:
start with a plain text string
convert plain text string to byte array
encrypt plain text byte array to encrypted byte array
end with an encrypted byte array
Decryption:
start with an encrypted byte array
decrypt encrypted byte array to plain text byte array
convert byte array to plain text string
end with a plain text string
To encrypt, convert the plain text string into its byte representation and then encrypt these bytes. The result will be an encrypted byte array.
Transfer the byte array to the other program in the manner of your choosing.
To decrypt, decrypt the encrypted byte array into a plain text byte array. Construct your string from this byte array. Done.
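The string-to-bytes step is the one place where two languages can silently disagree, so pin the character encoding explicitly on both sides. A tiny Python sketch, with UTF-8 chosen as an assumption:

    plaintext = "Grüße"
    data = plaintext.encode("utf-8")      # bytes to feed into the cipher
    # ... encrypt data, transfer it, decrypt it back into data ...
    assert data.decode("utf-8") == plaintext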

Related

File Delimiters on AES 256 Encrypted fields

I have a requirement for one of my projects in which I am expecting a few of the incoming fields encrypted as AES-256 when sent to us by upstream. The incoming file is comma delimited. Is there a possibility that the AES encrypted fields may contain "," throwing off the values to different fields? What about if it is pipe delimited or some other delimiter?
Also, what should be the datatype of these encrypted fields in order to read them using an ETL tool?
Thanks in Advance
AES as a block cipher is a family of permutations selected by the key. The output is expected to look random (more precisely, we believe that AES is a pseudo-random permutation).
AES (like any block cipher) outputs binary data, usually as a byte array, and each byte can take any value between 0 and 255 with equal probability.
You are not alone; transmitting binary data can create problems, especially in protocols that are designed to deal with textual data. To avoid it altogether, we don't transmit binary data. Many of the programming errors related to encryption on Stack Overflow are due to sending binary data over text-based protocols. Most of the time this works, but occasionally it fails and the coders wonder about the problem: the binary data corrupts the network protocol.
Therefore hex, Base64, or similar encodings are necessary to mitigate this. Base64 is not totally URL-safe, but it can be made URL-safe with a little work.
And note that this has nothing to do with security; it is about visibility and interoperability.
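Sketch of the idea (the field handling is my assumption): Base64-encode the raw AES output before writing it to the delimited file, and store it as a plain text/varchar column; the URL-safe alphabet also avoids '+' and '/':

    import base64, os

    ciphertext = os.urandom(32)   # stand-in for real AES-256 output, which can contain any byte
    field = base64.urlsafe_b64encode(ciphertext).decode("ascii")
    # field contains only A-Z a-z 0-9 - _ =, so it can never collide with a comma or pipe delimiter
    assert base64.urlsafe_b64decode(field) == ciphertext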

Encrypting Text as well as Integers

Hi, I have created an RSA encryption program using Java that so far only encrypts and decrypts BigIntegers, and I would like to make it so that it can also do the same for other characters. I have a feeling that I would have to convert everything into ASCII (if this is even possible) and then encrypt, but I have no idea how.
I would have to convert everything into ASCII
Consider encryption as converting a byte array to a different byte array (even a BigInteger is represented as a byte array).
Still I see several issues:
You are implementing your own textbook RSA (no padding, no mitigation for side-channel attacks), and this approach is really not secure. It's OK to do it for learning purposes (though even then I'd object), but not for any real-life secure encryption.
RSA is secure (when used properly) to encrypt a fixed block of data. If you want to use RSA to encrypt data of any length, you may use hybrid encryption (use symmetric encryption with a random key to encrypt data and RSA to encrypt the key)
You may have a look at my blog for that:
https://gusto77.wordpress.com/2017/10/30/encryption-reference-project/
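A minimal hybrid-encryption sketch (in Python rather than the asker's Java, and assumed rather than taken from the blog post): a random AES-GCM key encrypts the message, and RSA-OAEP encrypts only that key:

    import os
    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()

    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    # encrypt: a random symmetric key for the message, RSA only wraps the key
    message = "any length of text".encode("utf-8")
    aes_key, nonce = os.urandom(32), os.urandom(12)
    ciphertext = AESGCM(aes_key).encrypt(nonce, message, None)
    wrapped_key = public_key.encrypt(aes_key, oaep)

    # decrypt: unwrap the key with RSA, then decrypt the message with AES-GCM
    recovered_key = private_key.decrypt(wrapped_key, oaep)
    assert AESGCM(recovered_key).decrypt(nonce, ciphertext, None) == message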

How to convert a message to an integer to encrypt it with RSA?

In encryption methods like RSA, we operate on an integer which represents our message. I've toyed around with converting the string to an array of bytes and working one character at a time, but that seems overly slow and the RSA algorithm is designed to work with the entire message.
How do we convert a string to a representation (integer, big integer etc) in which we can apply our cryptographic algorithm too?
In typical usage, you don't actually encrypt the entire message using RSA. Instead, you encrypt the encryption key for a symmetric block cipher (like AES) using RSA, then encrypt your stream of data using that block cipher.
Do not attempt to do this on your own! You have to be very careful with how you do the conversion, including setting up a secure padding scheme and using the block cipher correctly and in a secure mode. You might want to look into using language-provided crypto libraries or a standard library like OpenSSL.
Hope this helps!
Think about how integers and strings are represented in memory. A 32-bit integer takes up four 8-bit bytes, and a 64-bit integer takes up eight bytes. A string is stored as bytes too, and in the case of ASCII, each character is represented by one byte. (UTF-8 and UTF-16 are variable-length encodings, but it's still bytes.)
There is nothing special to convert, because all datatypes are represented by bytes internally; you can simply read the message's bytes as the base-256 digits of one large integer.
There's no reason this can't be extended to, say, 2048-bit integers for use with RSA.
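For the mechanics only (real code should use a library and hybrid encryption, as the other answer says), here is that byte-to-integer reading in Python; the sample message is my own:

    message = "attack at dawn".encode("utf-8")
    m = int.from_bytes(message, byteorder="big")            # the whole message as one integer
    recovered = m.to_bytes((m.bit_length() + 7) // 8, byteorder="big")
    assert recovered.decode("utf-8") == "attack at dawn"    # note: leading zero bytes would need extra care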

AES Rijndael and little/big endian?

I am using the public domain reference implementation of AES Rijndael, commonly distributed under the name "rijndael-fst-3.0.zip". I plan to use this to encrypt network data, and I am wondering whether the results of encryption will be different on big/little endian architectures? In other words, can I encrypt a block of 16 bytes on a little endian machine and then decrypt that same block on big endian? And of course, the other way around as well.
If not, how should I go about swapping bytes?
Thanks in advance for your help.
Kind regards.
Byte order issues are relevant only in the context of mapping multi-byte constructs to a sequence of bytes; e.g., mapping a 4-byte sequence to a signed integer value is sensitive to byte order.
The AES algorithm is byte centric and insensitive to endian issues.
Rijndael is oblivious to byte order; it just sees the string of bytes you feed it. You should do the byte swapping outside of it as you always would (with ntohs or whatever interface your platform has for that purpose).
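A small sketch of that advice (my example): serialize multi-byte values in an agreed byte order before they ever reach the cipher, and parse them the same way after decryption:

    import struct

    values = (1, 70000, 3)
    payload = struct.pack(">III", *values)   # ">" = big-endian / network order, like htonl
    # ... encrypt payload, send it, decrypt it on any architecture ...
    assert struct.unpack(">III", payload) == values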

What encryption algorithm is best for small strings?

I have a string of 10-15 characters and I want to encrypt that string. The problem is I want the encrypted string to be as short as possible. I will also want to decrypt that string back to its original string.
Which encryption algorithm fits best to this situation?
AES uses a 16-byte block size; it is admirably suited to your needs if your limit of 10-15 characters is firm. The PKCS#7 padding scheme would add 1-6 bytes to the data and generate an output of exactly 16 bytes. You don't really need to use an encryption mode (such as CBC) since you're only encrypting one block. There is an issue of how you'd be handling the keys - there is always an issue of how you handle encryption keys.
If you must go with shorter data lengths for shorter strings, then you probably need to consider AES in CTR mode. This uses the key and a counter to generate a byte stream which is XOR'd with the bytes of the string. It would leave your encrypted string at the same length as the input plaintext string.
You'll be hard pressed to find a general purpose compression algorithm that reliably reduces the length of such short strings, so compressing before encrypting is barely an option.
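A sketch of the CTR point (my example; note that the nonce/counter block also has to be stored or sent, and must never be reused with the same key):

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key, nonce = os.urandom(32), os.urandom(16)
    plaintext = b"10 to 15 chars"                # 14 bytes, within the asker's range
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    ciphertext = enc.update(plaintext) + enc.finalize()
    assert len(ciphertext) == len(plaintext)     # no padding, no expansion

    dec = Cipher(algorithms.AES(key), modes.CTR(nonce)).decryptor()
    assert dec.update(ciphertext) + dec.finalize() == plaintext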
If it's just one short string, you could use a one-time pad, which provides mathematically perfect secrecy.
http://en.wikipedia.org/wiki/One-time_pad
Just be sure you don't use the key more than one time.
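A toy illustration only (my example, not something to deploy): the pad is random bytes of the same length, and encryption and decryption are the same XOR:

    import os

    message = "meet at noon".encode("utf-8")
    pad = os.urandom(len(message))                             # same length, used exactly once
    ciphertext = bytes(m ^ p for m, p in zip(message, pad))    # encrypt
    assert bytes(c ^ p for c, p in zip(ciphertext, pad)) == message   # decrypt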
If the main goal is shortening, I would look for a compression library that allows a fixed dictionary built on a corpus of common strings.
Personally I do not have experience with that, but I bet LZMA can do that.
