What encryption algorithm preserves file differences?

I would like to encrypt a file with the most secure algorithm that also meets the following requirement.
Let's say we have a text file that has 100 Bytes and we encrypt it.
Now we change 1 byte in original file and encrypt again.
If we make a diff of the encrypted files, then an ideal encryption algorithm should produce the shortest diff possible - e.g. 1 byte.
(Essentially I want to do an incremental backup of encrypted files and minimize bandwidth requirements.)

If you use CTR (counter) mode, I believe you will get the result you require.
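To see why, here is a minimal sketch in Python using the third-party cryptography package (assumed to be installed). CTR turns AES into a stream cipher, so as long as the key and nonce are reused for both versions of the file, only the changed byte differs in the ciphertext:

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key = os.urandom(32)    # AES-256 key
    nonce = os.urandom(16)  # initial counter block, deliberately reused for both versions

    def encrypt_ctr(data: bytes) -> bytes:
        # CTR mode: ct[i] = pt[i] ^ keystream[i]
        enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
        return enc.update(data) + enc.finalize()

    original = bytearray(b"A" * 100)   # the 100-byte file from the question
    modified = bytearray(original)
    modified[42] ^= 0xFF               # change exactly one byte

    ct1, ct2 = encrypt_ctr(bytes(original)), encrypt_ctr(bytes(modified))
    print([i for i in range(len(ct1)) if ct1[i] != ct2[i]])   # -> [42]

The catch is that reusing the key/nonce pair is exactly what keeps the diff small, and it also reveals the XOR of the old and new plaintext at the changed positions, so this is a bandwidth-versus-leakage trade-off rather than a free lunch.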


Fuse-type software decrypting NTFS

I have a large (70 GB) file which, when decrypted, is an NTFS partition. To save time I'd like to write software which can decrypt the file on the fly and serve it as a file system for a user to modify. As far as I have understood, you provide FUSE with directories yourself, and therefore I would have to manually parse and handle NTFS if I wanted to use FUSE.
What would be a nice and easy way to accomplish on-the-fly decryption of an NTFS filesystem, with a custom crypto algorithm (a long, looping XOR key), and serving it as a modifiable file system to a user? Is there something like this that I should be aware of?
Ignoring the fact that the encryption method you have chosen is quite poor in terms of actual security, it is very easy to read from the file at any arbitrary position using modular arithmetic:
If your key was, say, chocolate, then you know that the position in your file mod the length of your key (9) would give the index in the key to begin XORing with.
For example, let's say you want to start decrypting from position 25 in your file. We take 25 mod 9 = 7, so we start XORing the ciphertext against the key at position 7 in the key (techocolatecho...).
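A small Python sketch of that arithmetic, using the chocolate key from the example (the helper name is just for illustration):

    def xor_at_offset(data: bytes, key: bytes, offset: int) -> bytes:
        # XOR a chunk that starts at an arbitrary file offset against a repeating key.
        return bytes(b ^ key[(offset + i) % len(key)] for i, b in enumerate(data))

    key = b"chocolate"                              # length 9
    plaintext = b"the full contents of the file, any length you like"
    ciphertext = xor_at_offset(plaintext, key, 0)   # "encrypt" the whole file

    # Decrypt only the bytes from position 25 onward: 25 mod 9 = 7, so the
    # keystream starts at key[7] ('t'), i.e. "techocolatecho...".
    chunk = xor_at_offset(ciphertext[25:], key, 25)
    assert chunk == plaintext[25:]

Because XOR is its own inverse, the same function both encrypts and decrypts, which is why seeking to any offset is trivial.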
I don't see any reason why you'd want to limit that to NTFS; it just doesn't make sense to me. The most straightforward (and I'd even say simple) way would be to write a virtual disk driver (see mass storage device), which would handle just a few IRPs (IRP_MJ_CREATE/CLOSE, IRP_MJ_READ/WRITE and IRP_MJ_DEVICE_CONTROL should be enough for such a device).
Then you'd have just another volume, where you'd handle reads and writes, and the actual storage might be whatever you want. And please try to use better encryption than XOR.

Break XOR-type encryption with whole known text from virus

I was hit by a ransomware infection that encrypts the first 512 bytes at the top of the file and puts them at the bottom. Looking at the encrypted text, it seems to be some type of XOR cipher. I know the whole plaintext of one of the files that was encrypted, so I figured that in theory I should be able to XOR it to get the key to decrypt the rest of my files. Well, I am having a very hard time with this because I don't understand how the creator actually XORed it. I'm thinking he would use a BinaryReader to read the first 512 bytes into an array, XOR it, and replace it. But does that mean he XORed it in hex, or decimal? I'm quite confused at this point, but I believe I am simply missing something.
I have tried Xor Tool with Python, and everything it attempts to crack looks like nonsense. I also tried a Python script called Unxor that you give the known plaintext to, but the dump file it outputs is always blank.
Good Header file dump:
Good-Header.bin
Encrypted Header file dump:
Enc-Header.bin
This may not be the best example file for seeing the XOR pattern, but it's the only file I have for which I also have the original header exactly as it was before encryption. In other headers where there are more changes, the encrypted header changes with them.
Any advice on a method i should try, or application i should use to try and take this further? Thanks so much for your help!
P.S. Stack Overflow yelled at me when I tried to post 4 links because I'm so new, so if you would rather see the hex dumps on Pastebin than download the header files, please let me know. The files are in no way malicious; they are only the extracted 512 bytes, not whole files.
To recover the keystream, XOR the plaintext bytes with the ciphertext bytes. Do this with two different files so you can see whether the ransomware is using the same keystream or a different keystream for each file.
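A quick Python sketch of that recovery, using the Good-Header.bin / Enc-Header.bin dumps mentioned in the question (adjust the paths as needed):

    # XOR the known plaintext header with the encrypted header to get the keystream.
    with open("Good-Header.bin", "rb") as f:
        plaintext = f.read(512)
    with open("Enc-Header.bin", "rb") as f:
        ciphertext = f.read(512)

    keystream = bytes(p ^ c for p, c in zip(plaintext, ciphertext))
    print(keystream.hex())

    # If the same keystream were reused for every file, this would now decrypt
    # any other encrypted 512-byte header:
    #   recovered = bytes(k ^ c for k, c in zip(keystream, other_ciphertext))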
If it is using the same keystream (unlikely) then your problem is solved. If the keystreams are different, then your easiest solution is to restore the affected files from backups. You did keep backups, didn't you? Alternatively research the particular infection you have got and see if anyone else has broken that particular variant, so you can derive the key(s) they used and hence regenerate the required keystreams.
If you have a lot of money then a data recovery firm might be able to help you, but they will certainly charge.
A rule of thumb to tell a decent cipher from a toy cipher is to encrypt a highly compressible file and then try to compress it in its encrypted form: a dumb cipher will produce a file with a level of entropy similar to that of the original, so the encrypted file will compress about as well as the original; on the other hand, a good cipher (even without an initialization vector) will produce a file that looks like random garbage and thus will not compress at all.
When I compressed your Enc-Header.bin of 512 bytes with PKZIP, the output was also 512 bytes, so the cipher is not as dumb as you expected — bad luck. (But it does not mean that the malware has no weak spots at all.)
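That rule of thumb is easy to script; here is a rough sketch using Python's zlib, where the 0.95 cutoff is only an illustrative guess:

    import zlib

    def looks_like_weak_cipher(path: str, threshold: float = 0.95) -> bool:
        # Ciphertext from a decent cipher should be essentially incompressible.
        data = open(path, "rb").read()
        ratio = len(zlib.compress(data, 9)) / len(data)
        return ratio < threshold   # well below 1.0 means structure survived "encryption"

    print(looks_like_weak_cipher("Enc-Header.bin"))   # the 512-byte dump from the question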

Random access encrypted file

I'm implementing a web based file storage service (C#). The files will be encrypted when stored on the server, but the challenge is how to implement the decryption functionality.
The files can be of any size, from a few KB to several GB. The data transfer is done in chunks, so that users download the data from, say, offset 50000, 75000, etc. This works fine for unencrypted files, but if encryption is used the entire file has to be decrypted before each chunk can be read from its offset.
So I am looking at how to solve this. My research so far shows that ECB and CBC can be used. ECB is the most basic (and most insecure), with each block encrypted separately. The way ECB works is pretty much what I am looking for, but security is a concern. CBC is similar, but you need the previous block decrypted before decrypting the current block. This is OK as long as the file is read from beginning to end and you keep the data as you decrypt it on the server, but at the end of the day it's not much better than decrypting the entire file server-side before transferring.
Does anyone know of any other alternatives I should consider? I have no code at this point, because I am still only doing theoretical research.
Do not use ECB (Electronic Code Book). Any patterns in your plaintext will appear as patterns in the ciphertext. CBC (Cipher Block Chaining) allows random read access (the calling code knows the key, and the IV for a given block is simply the previous ciphertext block), but writing a block requires rewriting all following blocks.
A better mode is counter (CTR) mode. In effect, each block uses the same key, and the counter value for each block is an initial value plus that block's offset from a defined start; for example, the counter for block n is IV + n. CTR mode is described in detail on page 15 of NIST SP800-38A. Guidance on key and IV derivation can be found in NIST SP800-108.
There are a few other similar modes, such as GCM (Galois/Counter Mode).
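A sketch of random-access decryption with AES-CTR, using Python's cryptography package; the chunk offset is assumed to be block-aligned (a multiple of 16) to keep the example short:

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    BLOCK = 16  # AES block size in bytes

    def counter_for_offset(initial_counter: bytes, byte_offset: int) -> bytes:
        # The counter block for block n is simply the initial counter plus n.
        n = int.from_bytes(initial_counter, "big") + byte_offset // BLOCK
        return (n % (1 << 128)).to_bytes(16, "big")

    def decrypt_range(ct: bytes, key: bytes, counter0: bytes, offset: int, length: int) -> bytes:
        dec = Cipher(algorithms.AES(key), modes.CTR(counter_for_offset(counter0, offset))).decryptor()
        return dec.update(ct[offset:offset + length]) + dec.finalize()

    # Demo: encrypt 75000 bytes, then decrypt only bytes 50000..50016.
    key, counter0 = os.urandom(32), os.urandom(16)
    data = os.urandom(75000)
    enc = Cipher(algorithms.AES(key), modes.CTR(counter0)).encryptor()
    ct = enc.update(data) + enc.finalize()
    assert decrypt_range(ct, key, counter0, 50000, 16) == data[50000:50016]

For a non-aligned offset you would round down to the containing block, decrypt from there and discard the leading bytes. Note also that CTR by itself provides no integrity; pair it with a MAC (or use GCM) if tampering matters.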

Simple way to detect encryption

Is there a simple and quick way to detect encrypted files? I heard about entropy calculation, but if I calculate it for every file on a drive, it will take days to detect encryption.
Is it possible to, say, calculate some value for the first 100 bytes or 1024 bytes and then decide? Does anyone have sources for that?
I would use a cross-entropy calculation. Calculate the cross-entropy value over X bytes of known encrypted data (it should be near 1, regardless of the type of encryption, etc.); you may want to avoid file headers and footers, as these may contain unencrypted file metadata.
Then calculate the entropy for a file; if it's close to 1, it's either encrypted or /dev/random. If it's quite far away from 1, it's likely not encrypted. I'm sure you could apply significance tests to this to get a baseline.
It's about 10 lines of Perl; I can't remember what library is used (although, this may be useful: http://dingo.sbs.arizona.edu/~hammond/ling696f-sp03/addonecross.txt)
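For reference, here is roughly the same idea in Python rather than Perl, sampling only the first 1024 bytes as the question suggests; the 0.95 cutoff is an illustrative guess, not a calibrated threshold:

    import math
    from collections import Counter

    def normalized_entropy(path: str, sample: int = 1024) -> float:
        # Shannon entropy of the first `sample` bytes, scaled to 0..1 (1 ~ random).
        data = open(path, "rb").read(sample)
        if not data:
            return 0.0
        counts = Counter(data)
        h = -sum((n / len(data)) * math.log2(n / len(data)) for n in counts.values())
        return h / 8.0   # 8 bits per byte is the maximum

    # Compressed archives will also score high, so expect false positives.
    # print(normalized_entropy("/path/to/candidate/file") > 0.95)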
You could just make a system that recognizes particular common forms of encrypted files (e.g. recognize encrypted zip, rar, vim, gpg, ssl, ecryptfs, and truecrypt containers). Any attempt to determine encryption based on the raw data will quickly run into a steganography discussion.
One of the advantages of good encryption is that you can design it so that it can't be detected - see the Wikipedia article on deniable encryption for example.
Every statistical approach to detecting encryption will give you various "false alarms", like compressed data or random-looking data in general.
Imagine I wrote a program that outputs two files: file1 contains 1024 bits of π and file2 is an encrypted version of file1. If you don't know anything about the contents of file1 or file2, there's no way to distinguish them. In fact, it's quite likely that π contains the contents of file2 somewhere!
EDIT:
By the way, it doesn't even work the other way round (detecting unencrypted files). You could write a program that transforms encrypted data into readable English text by assigning words or whole sentences to its bits/bytes.

AES Encryption - Key versus IV

The application I am working on lets the user encrypt files. The files could be of any format (spreadsheet, document, presentation, etc.).
For the specified input file, I create two output files - an encrypted data file and a key file. You need both these files to obtain your original data. The key file must work only on the corresponding data file. It should not work on any other file, either from the same user or from any other user.
The AES algorithm requires two different parameters for encryption: a key and an initialization vector (IV).
I see three choices for creating the key file:
Embed hard-coded IV within the application and save the key in the key file.
Embed hard-coded key within the application and save the IV in the key file.
Save both the key and the IV in the key file.
Note that it is the same application that is used by different customers.
It appears all three choices would achieve the same end goal. However, I would like to get your feedback on what the right approach should be.
As you can see from the other answers, having a unique IV per encrypted file is crucial, but why is that?
First, let's review why a unique IV per encrypted file is important (see the Wikipedia article on initialization vectors). The IV adds randomness to the start of your encryption process. When using a chained block encryption mode (where one block of encrypted data incorporates the prior block of encrypted data), we're left with a problem regarding the first block, which is where the IV comes in.
If you had no IV and used chained block encryption with just your key, two files that begin with identical text will produce identical first blocks. If the input files changed midway through, then the two encrypted files would start to look different from that point through to the end of the encrypted file. If someone noticed the similarity at the beginning and knew what one of the files began with, he could deduce what the other file began with. A known plaintext prefix paired with its corresponding ciphertext is exactly the kind of foothold you never want to hand an attacker.
Now add the IV: if each file uses a random IV, their first blocks will be different. The above scenario has been thwarted.
Now what if the IV were the same for each file? Well, we have the problem scenario again: the first block of each file will encrypt to the same result. Practically, this is no different from not using an IV at all.
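To make that concrete, here is a small sketch with Python's cryptography package: two files that share their first 16-byte block, encrypted in CBC mode with the same key and the same IV, produce identical first ciphertext blocks.

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key, iv = os.urandom(32), os.urandom(16)

    def cbc_encrypt(plaintext: bytes) -> bytes:
        # Low-level CBC with no padding, so inputs must be a multiple of 16 bytes.
        enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
        return enc.update(plaintext) + enc.finalize()

    prefix = b"Dear Alice, ...."        # exactly one 16-byte block
    file1 = prefix + os.urandom(16)     # two "files" with an identical start
    file2 = prefix + os.urandom(16)

    ct1, ct2 = cbc_encrypt(file1), cbc_encrypt(file2)
    print(ct1[:16] == ct2[:16])   # True: the identical prefix leaks through
    print(ct1[16:] == ct2[16:])   # False: they diverge where the plaintext differs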
So now let's get to your proposed options:
Option 1. Embed hard-coded IV within the application and save the key in the key file.
Option 2. Embed hard-coded key within the application and save the IV in the key file.
These options are pretty much identical. If two files that begin with the same text produce encrypted files that begin with identical ciphertext, you're hosed. That would happen with both of these options (assuming there is one master key used to encrypt all files).
Option 3. Save both the key and the IV in the key file.
If you use a random IV for each key file, you're good. No two key files will be identical, and each encrypted file must have its own key file. A different key file will not work.
PS: Once you go with option 3 and random IVs, start looking into how you'll determine whether decryption was successful. Take the key file from one file and try using it to decrypt a different encrypted file. You may discover that decryption proceeds and produces garbage results. If this happens, begin researching authenticated encryption.
The important thing about an IV is you must never use the same IV for two messages. Everything else is secondary - if you can ensure uniqueness, randomness is less important (but still a very good thing to have!). The IV does not need to be (and indeed, in CBC mode cannot be) secret.
As such, you should not save the IV alongside the key - that would imply you use the same IV for every message, which defeats the point of having an IV. Typically you would simply prepend the IV to the encrypted file, in the clear.
If you are going to be rolling your own cipher modes like this, please read the relevant standards. The NIST has a good document on cipher modes here: http://dx.doi.org/10.6028/NIST.SP.800-38A IV generation is documented in Appendix C. Cryptography is a subtle art. Do not be tempted to create variations on the normal cipher modes; 99% of the time you will create something that looks more secure, but is actually less secure.
When you use an IV, the most important thing is that the IV should be as unique as possible, so in practice you should use a random IV. This means embedding it in your application is not an option. I would save the IV in the data file, as it does not harm security as long as the IV is random/unique.
Key/IV pairs are probably the most confusing part of the world of encryption. Simply put, password = key + IV, meaning you need the matching key and IV to decrypt an encrypted message. The internet seems to imply that you only need the IV to encrypt and can then toss it away, but it is also required to decrypt. The reason for splitting the key/IV values is to make it possible to encrypt the same message with the same key but a different IV and get unequal encrypted messages; so Encrypt("message", key, iv) != Encrypt("message", key, differentIv). The idea is to use a new random IV every time a message is encrypted. But how do you manage an ever-changing IV value? There are a million possibilities, but the most logical way is to embed the 16-byte IV within the encrypted message itself, so encrypted = IV + encryptedMessage. This way the constantly changing IV value can be pulled off the front of the encrypted message before decryption: decryptedMessage = Decrypt(messageWithoutIv, key, ivFromEncryptedMessage). Alternatively, if you are storing encrypted messages in a database, the IV could be stored in a field there. Although the IV is handled alongside the secret key, it is tiny in comparison to the 32-byte key and is never reused, so it is practically safe to expose publicly. Keep in mind that the IV does not add cryptographic strength on its own; its job is to mask the encryption of messages that have the same content.
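A sketch of that prepend-the-IV convention, using AES-CTR from Python's cryptography package so that no padding logic gets in the way (the function names are just for illustration):

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def encrypt(message: bytes, key: bytes) -> bytes:
        iv = os.urandom(16)   # fresh random IV for every message
        enc = Cipher(algorithms.AES(key), modes.CTR(iv)).encryptor()
        return iv + enc.update(message) + enc.finalize()   # output = IV || ciphertext

    def decrypt(blob: bytes, key: bytes) -> bytes:
        iv, body = blob[:16], blob[16:]   # pull the IV back off the front
        dec = Cipher(algorithms.AES(key), modes.CTR(iv)).decryptor()
        return dec.update(body) + dec.finalize()

    key = os.urandom(32)
    assert decrypt(encrypt(b"message", key), key) == b"message"
    # Same message, same key, different IVs -> different ciphertexts:
    assert encrypt(b"message", key) != encrypt(b"message", key)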
The IV is used to increase security via randomness, but that does not mean it is used by every algorithm or mode.
The tricky part is how long the IV should be. Usually it is the same size as the block size of the cipher; for example, AES uses a 16-byte IV. Besides, the IV generation type can also be selected, e.g. eseqiv, seqiv, chainiv...
