I want to encrypt small serialized data structures (~256 bytes) so I can pass them around (especially in URLs) safely. My current approach is to use a symmetric block cipher, and then to base 64 encode, then URL encode the cipher text. This yields an encoded cipher text that is (unsurprisingly) quite a bit longer than the original data structure. The length of these encoded ciphers is a bit of a usability problem; ideally I'd like the cipher text to be around the same length as the input text.
Is there a block cipher that can be configured to constrain the values of the output bytes to be in the URL-safe range? I assume there would be a security trade-off involved if there is.
For a given key K, a cipher has to produce a different ciphertext for each plaintext. If your message space is 256 bytes, the cipher has to be able to produce at least 256^256 different messages. This will require at least 256 bytes, and any reduction in the size of the output alphabet requires longer messages.
As you've seen, you can do some encoding afterward to avoid certain output symbols, at the cost of increased length. Furthermore, you would pay the same cost if the encoding were part of the encryption algorithm proper. That's why this isn't a feature of any encryption algorithm.
As others have mentioned, the only real answer is to reduce the size of the data you are encrypting so that you need to encode less data. (Either that or don't put the data in url's in the first place e.g. store the data in a database and put a unique id in the url). So compress > encrypt > encode.
If your data structure is 256 bytes long encrypting it with a block cipher of 8 bytes increases it up to 8 bytes (depending of the concrete input length).
Therefore before applying base64 you have up to 264 bytes which are increased by the base64 encoding up to 352 bytes.
Therefore as you can see the most overhead is created by the base64 encoding. There are some slightly more effective encodings available like base91 - but they are very uncommon.
If size matters I would recommend to compress the data before encrypting it.
URL encoding will not significantly expand a base64 encoded string, since 62 of the 64 characters do not need to be modified. However, you can use modified base64 encoding to do a little better. This encoding uses the '-' and '_' characters in place of the '+' and '/' characters to yield a slight efficiency improvement.
The cipher itself is not causing any significant data expansion. It will pad the data to be a multiple of the block length, but that is insignificant in your case. You might try compressing the input prior to encryption. 256 bytes is not much but you might see some improvement.
Related
I intercepted an HTTP request originating from Android to Instagram when I creating account. Here the post data that is sent to the instagram:
e95cef1c47aa4c85ee7555403af92acb80aca9266e8edf77a7fb75b37795c735 {"allow_contacts_sync":"true","sn_result":"API_ERROR:+null","phone_id":"7520e5f4-b4a6-4bd9-a445-972641476fde","_csrftoken":"JlMrKwuiXF6pPB5q98Srx2TZR1MrKCfe","username":"michaelabramobics2","first_name":"Michael","adid":"dac68c0e-4307-4753-8c07-3ea2c26187dd","guid":"fa13e631-1663-49cf-a507-e62dbb03012b","device_id":"android-4d0577bf20b57285","email":"michaelabramobics2#gmail.com","sn_nonce":"bWljaGFlbGFicmFtb2JpY3MyQGdtYWlsLmNvbXwxNTI2NzM1Nzk4fBgiGpUFAo8qZWzGlVPG02r9zOXztwLQnQ==","force_sign_up_code":"","waterfall_id":"52d43d05-7cac-468a-8b10-2f2499eb7cf2","qs_stamp":"","password":"123456789"}
How I can decode this parameter?
"sn_nonce":"bWljaGFlbGFicmFtb2JpY3MyQGdtYWlsLmNvbXwxNTI2NzM1Nzk4fBgiGpUFAo8qZWzGlVPG02r9zOXztwLQnQ=="
Base64 decoding returns:
michaelabramobics2#gmail.com|1526735798|"*elƕSjН
Email|Unixtime| and ?
What is the last value? How can I find out what encoding it is?
I will be very grateful for help.
A nonce is a number used once. Generally the nonce however consists of bytes, and they are often random bytes. It depends on the protocol if it is used a number or if the nonce is just binary data. It is is used as a number it is likely a statically sized, unsigned, big- or sometimes little endian number. But most often the nonce consists of random bytes.
Random bytes, or course, will not display as well as the mail address or the Unix time. Because the bytes are not encoded text, decoding it will generally result in garbage. If the decoded text is Unicode, or if there are unprintable characters then the result is generally shorter than you would expect as bytes are combined or left out entirely.
In hexadecimals the last part reads (converted using the tomeko.net online decoder:
18221A9505028F2A656CC69553C6D36AFDCCE5F3B702D09D
which looks fairly random to me, it's certainly not text in any encoding. The 24 bytes are also a common length for cryptographically secure nonces, keys and such, so that would strengthen the assumption that this is a random nonce.
I am sending some video files (size could be even in GB) as application/x-www-form-urlencodedover HTTP POST.
The following link link suggests that it would be better to transmit it over Multipart form data when we have non-alphanumeric content.
Which encoding would be better to transmit data of this kind?
Also how can I find the length of encoded data (data encoded with application/x-www-form-urlencoded)?
Will encoding the binary data consume much time?
In general, encoding skips the non-alphanumeric characters with some others. So, can we skip encoding for binary data (like video)? How can we skip it?
x-www-form-urlencoded treats the value of an entry in the form data set as a sequence of bytes (octets).
Of the possible 256 values, only 66 are left as it or still encoded as a single byte value, the others are replaced by the hexadecimal representation of the value of their code-point.
This usually takes three to five bytes depending on the encoding.
So in average (256-66)/256 or 74% of the file will be encoded to take three-to-five as much space as originally.
This encoding however has no header nor significant overhead.
multipart/form-data instead works by dividing the data into parts and then finding a string of any length that doesn't occur in said part.
Such string is called the boundary and it is used to delimit the end of the part, that is transmitted as a stream of octects.
So the file is mostly send as it, with negligible size overhead for big enough data.
The draw back is that the user-agent need to find a suitable boundary, however given a string of length k there is only a probability of 2-8k of finding that string in a uniformly generated binary file.
So the user-agent can simply generate a random string and do a quick search and exploit the network transmission time to hide the latency of the search.
You should use multipart/form-data.
This depends on the platform you are using, in general if you cannot access the request body you have to re-perform the encoding your self.
For multipart/form-data encoding there is a little, usually negligible (compared to the transmission time) overhead.
If I encrypt emails so that I can store them in a database, the resulting string is longer than the email itself. Is there a maximum length to this resulting coded string? if so, does it depend on both key length and the email length? I need to know this so I can set my database fields to the correct length.
Thanks.
As Alex K. notes, for block ciphers (like DES), common modes will pad them out to a multiple of the block size. The block size for 3DES is 64-bits (8 bytes). The most common padding scheme is PKCS7, which pads the block with "n x n bytes." This is to say, if you need one bytes of padding, it pads with 0x01. If you need four bytes of padding, it pads with 0x04040404 (4x 4s). If your data is already the right length, it pads with a full block (8 bytes of 0x08 for 3DES).
The short version is that the padded cipher text for 3DES can be up to 8 bytes longer than the plaintext. If your encryption scheme is a typical, insecure implementation, this is the length. The fact that you're using 3DES (an obsolete cipher) makes it a bit more likely that it's also insecurely implemented, and so this is the answer.
But if your scheme is implemented well, then there could be quite a few other things attached to the message. There could be 8 bytes of initialization vector. There could be a salt of arbitrary length if you're using a password. There could be an HMAC. There could be lots of things that could add an arbitrary amount of space. (The RNCryptor format, for example, adds up to 82 bytes to the message.) So you need to know how your format is implemented.
I have a string of 10-15 characters and I want to encrypt that string. The problem is I want to get a shortest encrypted string as possible. I will also want to decrypt that string back to its original string.
Which encryption algorithm fits best to this situation?
AES uses a 16-byte block size; it is admirably suited to your needs if your limit of 10-15 characters is firm. The PKCS#11 (IIRC) padding scheme would add 6-1 bytes to the data and generate an output of exactly 16 bytes. You don't really need to use an encryption mode (such as CBC) since you're only encrypting one block. There is an issue of how you'd be handling the keys - there is always an issue of how you handle encryption keys.
If you must go with shorter data lengths for shorter strings, then you probably need to consider AES in CTR mode. This uses the key and a counter to generate a byte stream which is XOR'd with the bytes of the string. It would leave your encrypted string at the same length as the input plaintext string.
You'll be hard pressed to find a general purpose compression algorithm that reliably reduces the length of such short strings, so compressing before encrypting is barely an option.
If it's just one short string, you could use a one-time pad which is mathematically perfect secrecy.
http://en.wikipedia.org/wiki/One-time_pad
Just be sure you don't use the key more than one time.
If the main goal is shortening, I would look for a compression library that allows a fixed dictionary built on a corpus of common strings.
Personally I do not have experience with that, but I bet LZMA can do that.
What is the minimum number of bits needed to represent a single character of encrypted text.
eg, if I wanted to encrypt the letter 'a', how many bits would I require. (assume there are many singly encrypted characters using the same key.)
Am I right in thinking that it would be the size of the key. eg 256 bits?
Though the question is somewhat fuzzy, first of all it would depend on whether you use a stream cipher or a block cipher.
For the stream cipher, you would get the same number of bits out that you put in - so the binary logarithm of your input alphabet size would make sense. The block cipher requires input blocks of a fixed size, so you might pad your 'a' with zeroes and encrypt that, effectively having the block size as a minimum, like you already proposed.
I'm afraid all the answers you've had so far are quite wrong! It seems I can't reply to them, but do ask if you need more information on why they are wrong. Here is the correct answer:
About 80 bits.
You need a few bits for the "nonce" (sometimes called the IV). When you encrypt, you combine key, plaintext and nonce to produce the ciphertext, and you must never use the same nonce twice. So how big the nonce needs to be depends on how often you plan on using the same key; if you won't be using the key more than 256 times, you can use an 8 bit nonce. Note that it's only the encrypting side that needs to ensure it doesn't use a nonce twice; the decrypting side only needs to care if it cares about preventing replay attacks.
You need 8 bits for the payload, since that's how many bits of plaintext you have.
Finally, you need about 64 bits for the authentication tag. At this length, an attacker has to try on average 2^63 bogus messages minimum before they get one accepted by the remote end. Do not think that you can do without the authentication tag; this is essential for the security of the whole mode.
Put these together using AES in a chaining mode such as EAX or GCM, and you get 80 bits of ciphertext.
The key size isn't a consideration.
You can have the same number of bits as the plaintext if you use a one-time pad.
This is hard to answer. You should definitely first read up on some fundamentals. You can 'encrypt' an 'a' with a single bit (Huffman encoding-style), and of course you could use more bits too. A number like 256 bits without any context is meaningless.
Here's something to get you started:
Information Theory -- esp. check out Shannon's seminal paper
One Time Pad -- infamous secure, but impractical, encryption scheme
Huffman encoding -- not encryption, but demonstrates the above point