Encryption and Compression

When I compressed a plain text file, its size was reduced roughly by half, but when I encrypted the same text and then compressed it, the result was almost the same size as the original. Why doesn't the encrypted file compress like the original text file?

Encryption is designed to make the encrypted data appear as a random stream of bytes. Random streams do not compress because they do not have any internal patterns for the compression algorithm to work on.
If you want both compression and encryption, always compress the plain data first and then encrypt the compressed result.
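A minimal sketch of the difference, assuming Python with zlib and the third-party cryptography package (the key, nonce, and sample text below are illustrative):

    import os
    import zlib
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key, nonce = os.urandom(32), os.urandom(16)   # AES-256 key, CTR nonce

    def encrypt(data: bytes) -> bytes:
        # AES in CTR mode: a stream cipher construction, so no padding
        enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
        return enc.update(data) + enc.finalize()

    plain = b"the quick brown fox jumps over the lazy dog " * 500

    print(len(plain))                             # 22500
    print(len(encrypt(zlib.compress(plain))))     # small: the text compressed well
    print(len(zlib.compress(encrypt(plain))))     # ~22500+: ciphertext doesn't compress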

Related

What to choose: application/x-www-form-urlencoded or multipart/form-data for file sizes in GB?

I am sending some video files (the size could even be in GB) as application/x-www-form-urlencoded over HTTP POST.
The following link suggests that it would be better to transmit them as multipart/form-data when we have non-alphanumeric content.
Which encoding would be better to transmit data of this kind?
Also how can I find the length of encoded data (data encoded with application/x-www-form-urlencoded)?
Will encoding the binary data consume much time?
In general, encoding replaces the non-alphanumeric characters with something else. So, can we skip encoding for binary data (like video)? How can we skip it?
x-www-form-urlencoded treats the value of an entry in the form data set as a sequence of bytes (octets).
Of the 256 possible byte values, only 66 are left as-is, i.e. still encoded as a single byte; each of the others is replaced by a percent-escaped hexadecimal representation of its value, which usually takes three to five bytes depending on the encoding.
So on average (256-66)/256, or about 74%, of the file will be encoded to take three to five times as much space as it did originally.
This encoding, however, has neither a header nor any other significant overhead.
multipart/form-data instead works by dividing the data into parts and then finding a string, of any length, that does not occur in a given part.
Such a string is called the boundary; it is used to delimit the end of the part, which is transmitted as a raw stream of octets.
So the file is mostly sent as-is, with negligible size overhead for sufficiently large data.
The drawback is that the user agent needs to find a suitable boundary; however, given a string of length k, the probability that it occurs at any given position in a uniformly random binary file is only 2^(-8k).
So the user agent can simply generate a random string, do a quick search, and exploit the network transmission time to hide the latency of the search.
You should use multipart/form-data.
This depends on the platform you are using; in general, if you cannot access the request body, you will have to re-perform the encoding yourself.
For multipart/form-data encoding there is a small overhead, usually negligible compared to the transmission time.
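The ~74% figure above is easy to check empirically; here is a quick sketch using only the Python standard library (the 1 MB random blob stands in for arbitrary binary data such as video):

    import os
    from urllib.parse import quote_from_bytes

    blob = os.urandom(1_000_000)               # uniformly random binary data
    encoded = quote_from_bytes(blob, safe="")  # the 66 unreserved bytes stay as-is
    print(len(encoded) / len(blob))            # ~2.48: (66 + 190*3) / 256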

Cipher to generate URL safe ciphertext without encoding

I want to encrypt small serialized data structures (~256 bytes) so I can pass them around (especially in URLs) safely. My current approach is to use a symmetric block cipher and then to Base64-encode and URL-encode the ciphertext. This yields an encoded ciphertext that is (unsurprisingly) quite a bit longer than the original data structure. The length of these encoded ciphertexts is a bit of a usability problem; ideally I'd like the ciphertext to be around the same length as the input text.
Is there a block cipher that can be configured to constrain the values of the output bytes to be in the URL-safe range? I assume there would be a security trade-off involved if there is.
For a given key K, a cipher has to produce a different ciphertext for each plaintext. If your message space is all 256-byte values, there are 256^256 possible plaintexts, so the cipher has to be able to produce at least 256^256 different ciphertexts. Representing that many distinct values requires at least 256 bytes, and any reduction in the size of the output alphabet requires correspondingly longer messages.
As you've seen, you can do some encoding afterward to avoid certain output symbols, at the cost of increased length. Furthermore, you would pay the same cost if the encoding were part of the encryption algorithm proper. That's why this isn't a feature of any encryption algorithm.
As others have mentioned, the only real answer is to reduce the size of the data you are encrypting, so that you have less data to encode. (Either that, or don't put the data in URLs in the first place, e.g. store the data in a database and put a unique ID in the URL.) So: compress > encrypt > encode.
If your data structure is 256 bytes long, encrypting it with a block cipher with an 8-byte block size increases it by up to 8 bytes (depending on the exact input length).
Therefore, before applying Base64 you have up to 264 bytes, which the Base64 encoding increases to up to 352 bytes.
As you can see, most of the overhead is created by the Base64 encoding. There are slightly more efficient encodings available, like base91, but they are very uncommon.
If size matters, I would recommend compressing the data before encrypting it.
URL encoding will not significantly expand a base64 encoded string, since 62 of the 64 characters do not need to be modified. However, you can use modified base64 encoding to do a little better. This encoding uses the '-' and '_' characters in place of the '+' and '/' characters to yield a slight efficiency improvement.
The cipher itself is not causing any significant data expansion. It will pad the data to be a multiple of the block length, but that is insignificant in your case. You might try compressing the input prior to encryption. 256 bytes is not much but you might see some improvement.
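Putting the advice together, here is a sketch of the compress > encrypt > encode pipeline, assuming Python with the third-party cryptography package; the choice of AES-GCM and a 12-byte nonce is mine for the example, not something from the answers above:

    import os
    import zlib
    import base64
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=128)

    def to_url_token(data: bytes, key: bytes) -> str:
        nonce = os.urandom(12)                        # fresh 96-bit nonce per message
        ct = AESGCM(key).encrypt(nonce, zlib.compress(data), None)
        token = base64.urlsafe_b64encode(nonce + ct)  # URL-safe alphabet: '-' and '_'
        return token.rstrip(b"=").decode("ascii")     # drop '=' padding for cleaner URLs

    def from_url_token(token: str, key: bytes) -> bytes:
        raw = base64.urlsafe_b64decode(token + "=" * (-len(token) % 4))
        nonce, ct = raw[:12], raw[12:]
        return zlib.decompress(AESGCM(key).decrypt(nonce, ct, None))

    token = to_url_token(b"some small serialized data structure", key)
    assert from_url_token(token, key) == b"some small serialized data structure"

Note that GCM adds a fixed overhead of its own (the 12-byte nonce plus a 16-byte authentication tag) before Base64 expands everything by 4/3, so for very small structures the constant overhead dominates.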

What encryption algorithm preserves file differences?

I would like to encrypt a file with the most secure algorithm that also meets the following requirement.
Let's say we have a text file that has 100 Bytes and we encrypt it.
Now we change 1 byte in original file and encrypt again.
If we make a diff of the encrypted files, an ideal encryption algorithm should produce the shortest diff possible, e.g. 1 byte.
(Essentially I want to do an incremental backup of encrypted files and minimize bandwidth requirements.)
If you use CTR (counter) mode, I believe you will get the result you require.
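A sketch of why CTR mode behaves this way, assuming Python with the cryptography package. Note the caveat in the code: the demonstration reuses the same key and nonce across both versions, which is exactly what keeps the diff small, but it also leaks the XOR of the two plaintexts, so a real incremental-backup scheme needs a more careful design.

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key, nonce = os.urandom(32), os.urandom(16)

    def ctr_encrypt(data: bytes) -> bytes:
        # CTR XORs the plaintext with a keystream, byte for byte
        enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
        return enc.update(data) + enc.finalize()

    a = bytearray(os.urandom(100))
    b = bytearray(a)
    b[42] ^= 0xFF                                # change one byte of the original

    ca, cb = ctr_encrypt(bytes(a)), ctr_encrypt(bytes(b))
    print([i for i in range(100) if ca[i] != cb[i]])   # [42]: a one-byte diff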

Editing binary files format and headers

I am working on an e-mail security project ... It encrypts the message text and attachments.
I use AES with a 128-bit key ... The problem is that it takes a significantly long time to encrypt large files (> 3 MB) ... For txt files I can compress and then encrypt, but for binary files (pdf, jpg, exe) compression doesn't help (the result is still >= 75% of the original size).
So I am thinking of encrypting just the header of the binary file. How do I find the header size of a binary file on Windows?
.NET has built-in AES support. Maybe you were using it in the wrong way.
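For scale: AES on modern hardware processes well over 100 MB/s, so 3 MB should take milliseconds. A rough sanity check (in Python rather than .NET, so treat it only as a ballpark figure):

    import os
    import time
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key, nonce = os.urandom(16), os.urandom(16)   # AES-128, as in the question
    data = os.urandom(3 * 1024 * 1024)            # a 3 MB stand-in attachment

    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    t0 = time.perf_counter()
    ct = enc.update(data) + enc.finalize()
    print(f"{len(ct)} bytes in {time.perf_counter() - t0:.4f}s")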

Does AES (128 or 256) encryption expand the data? If so, by how much?

I would like to add AES encryption to a software product, but am concerned about increasing the size of the data. I am guessing that the data does increase in size, and then I'll have to add a compression algorithm to compensate.
AES does not expand data. Moreover, the output will not generally be compressible; if you intend to compress your data, do so before encrypting it.
However, note that AES encryption is usually combined with padding, which will increase the size of the data (though only by a few bytes).
AES does not expand the data, except for a few bytes of padding at the end of the last block.
The resulting data are not compressible, at any rate, because they are basically random - no dictionary-based algorithm is able to effectively compress them. A best practice is to compress the data first, then encrypt them.
It is common to compress data before encrypting. Compressing it afterwards doesn't work, because AES encrypted data appears random (as for any good cipher, apart from any headers and whatnot).
However, compression can introduce side-channel attacks in some contexts, so you must analyse your own use. Such attacks have recently been reported against encrypted VoIP: the gist is that different syllables create characteristic variations in bitrate when compressed with VBR, because some sounds compress better than others. Some (or all) syllables may therefore be recoverable with sufficient analysis, since the data is transmitted at the rate it is generated. The fix is either to use (less efficient) CBR compression, or to use a buffer to transmit at a constant rate regardless of the data rate coming out of the encoder (increasing latency).
AES turns 16 byte input blocks into 16 byte output blocks. The only expansion is to round the data up to a whole number of blocks.
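The rounding is easy to compute; a small sketch of the PKCS#7-style padding arithmetic (my example, not from the answers above):

    def padded_size(n: int, block: int = 16) -> int:
        # PKCS#7 always adds 1..block bytes, so an exact multiple
        # of the block size gains a whole extra block
        return n + block - (n % block)

    for n in (1, 15, 16, 100, 256):
        print(n, "->", padded_size(n))   # 1->16, 15->16, 16->32, 100->112, 256->272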
I am fairly sure AES encryption adds nothing to the data being encrypted, since that would give away information about the state variables, and that is a Bad Thing when it comes to cryptography.
If you want to mix compression and encryption, do them in that order. The reason is that encrypted data (ideally) looks like totally random data, and compression algorithms will end up making the data bigger, due to their inability to actually compress any of it and the bookkeeping overhead that comes with any compressed file format.
If compression is necessary do it before you encrypt.
No. The only change will be a small amount of padding to align the data to the size of a block.
However, if you are compressing the content note that you should do this before encrypting. Encrypted data should generally be indistinguishable from random data, which means that it will not compress.
@freespace and others: One of the things I remember from my cryptography classes is that you should not compress your data before encryption, because some repeatable chunks of the compressed stream (like section headers, for example) may make it easier to crack your encryption.
