Fuse-type software decrypting NTFS - encryption

I have a large file(70G) which, when decrypted, is an NTFS partition. For time saving i'd like to write software which can decrypt the file on the fly, and serve it as a file system for an user to modify. As far as i have understood, you provide FUSE with directories yourself, and therefore, i would have to manually parse and handle NTFS if i wanted to use FUSE.
What would be a nice and easy way to accomplish on-the-go decryption of an NTFS filesystem, with a custom crypto algorithm(looping long xor key), and serving it as a modifyable file system to a user? Is there something like this which i should be aware of?

Ignoring the fact that the encryption method you have chosen is quite poor in terms of actual security, it is very easy to read from the file at any arbitrary position using modular arithmetic:
If your key was, say, chocolate, then you know that the position in your file mod the length of your key (9) would give the index in the key to begin XORing with.
For example, lets say you want to start decrypting from position 25 in your file. We take 25 mod 9 = 7 and so we start XORing our key with our ciphertext at position 7 in the key (techocolatecho...).

I don't see any reason why you'd want to limit that to NTFS, it just doesn't make sense to me. The most straightforward (and I'd even say simple) way would be to write virtual disk driver (see mass storage device), which would handle just several IRPs (IRP_MJ_CREATE/CLOSE, IRP_MJ_READ/WRITE, IRP_MJ_DEVICE_CONTROL should be enough for such device).
Then you'd have just another volume, where you'd handle reads and writes and the actual storage might be whatever you want. And please try use better encryption than XOR.

Related

Encrypting asymmetrically only a part of a large file

I want to store a large file on a publicly accessible service, amazon, bittorrent, ipfs etc.
I want this file to be encrypted.
I know the common practice is to encrypt the file symmetrically with a complex password and then encrypt the password with the recipient public key, but I have a use case I need to deliver the key to each recipient so when the password leaks to public I know who did it.
So what I thought of was to encrypt the whole file with AES CBC then split it to chunks and encrypt only the first chunk asymmetrically.
Are there any logical mistakes in this idea? What should be the minimum size of the first chunk (in bytes or percentage of the whole file) so it's safe to say without decrypting the first one there is no way to decrypt the remaining ones.
Edit
Thanks for the answers
I'll elaborate a little more on the use case.
I'm planning to let users put (sell) files on decentralised storage using my platform (and I have no control over the nodes - lets assume it's global ipfs). To be compliant with the regulations files has to be encrypted and I have to have a way to block the access to it.
Because as stated before I wont be able to delete the files from all the nodes I thought of encrypting the files asymmetrically but this requires preparing a separate copy for each recipient and would take a lot of time.
That's how I came up with the idea of encrypting only a part of the file, moreover this would be done by a re-encryption proxy so the seller would only need to prepare the re-encryption key and the amount of excessive data on the network would be minimal (only one shard per buyer).
Still when the authorities approach me that I'm sharing illegal content I could tell them the file is encrypted and the only guys that downloaded it are these public keys owners.
Apparently some things are misunderstood
have a use case I need to deliver the key to each recipient so when the password leaks to public I know who did it.
Lets assume the file is encrypted with a single symetric encryption key (password in ypur case) . You may encrypt the password using recipients' personal public key, but once the password is released, you have no means to find out who leaked/released it.
split it to chunks and encrypt only the first chunk asymmetrically
that makes no sense / reason (at least I did not find any reason why this would help you to achive the stated use case)
note: the reason why hybrid encryption is used is that asymmetric encryption (RSA) is feasible to encrypt only limited amount of data (e. g. symmetric encr. key)
your problem is not solvable by the means of classic cryptography
when we take a look at your problem one might think your usecase is like so often in cryptography: confidentiality, but it is not
confidentiality in a cryptographic context means: helping n parties to keep a secret
that means, all of the original n parties share the common interest of keeping that secret ...
in your case, you suspect at least one of the parties not to share this interest ... this is where classical crypto attempts will fail to solve your problem ...
pay tv companies learned this the hard way ... their solution seemingly is to replace the content keys faster than a group of rouge actors can share the needed keys for live decryption and to manage access to the content keys by encrypting them with group keys, which are partitioned and distributet along all legitimate clients ... that only "works" (read "not really if you put in enough effort") for large dynamic content streams, not for a static file ...
your use case sounds more like digital watermarking and fingerprinting

Cryptography: Mixing CBC and CTR?

I have some offline files that have to be password-protected. My strategy is as follows:
Cipher Algorithm: AES, 128-bit block, 256-bit key (PBKDF2-SHA-256
10000 iterations with a random salt stored plainly elsewhere)
Whole file is divided into pages with page size 1024 bytes
For a complete page, CBC is used
For an incomplete page,
Use CBC with cipher text stealing if it has at least one block
Use CTR if it has less one block
With this setup, we can keep the same file size
IV or nonce will be based on the salt and deterministic. Since this is not for network communication, I reckon we don't need to concern about replay attacks?
Question: Will this kind of mixing lower the security? Would we better off just use CTR throughout the whole file?
You're better off just using CTR for the entire file. Otherwise, you're adding a lot of extra work, in supporting multiple modes (CBC, CTR, and CTS) and determining which mode to use. It's not clear there's any value in doing so, since CTR is perfectly fine for encrypting a large amount of data.
Are you planning on reusing the same IV for each page? You should expand a bit on what you mean by a page, but I'd recommend unique IV's for each page. Are these pages addressable somehow? You might want to look into some of the new disk encryption modes for an idea on generating unique IV's
You also really need to MAC your data. In CTR for example, if someone flips a bit of the ciphertext, it'll flip the bit when you decrypt, and you'll never know it was tampered with. You can use HMAC or if you want to simplify your entire scheme, use AES GCM mode, which combines CTR for encryption and GMAC for integrity
There are a few things you need to know about CTR mode. After you know them all you could happily apply a stream cipher in your situation:
never ever reuse a data key with the same nonce;
above, not even in time;
be aware that CTR mode really shows the size of the encrypted data; always encrypting full blocks can hide this somewhat (in general a 1024 byte block takes as much as a single bit block if the file system boundaries are honored);
CTR mode in itself does not provide authentication (for completion, as this was already discussed);
If you don't keep to the first two rules, an attacker will immediately see the place of the edit and the attacker will be able to retrieve data directly related to the plain text.
On a possitive node:
you can happily use the offset (in, e.g., blocks) in the file to be part of the nonce;
it is very easy to seek in files, buffer ciphertext and create multi-threaded code around CTR.
And in general:
it pays off to use a data specific key specific sets of files, in such a way that if a key is compromised or changed that you don't have to re-encrypt everything;
think very well about how your keys are used, stored, backed up etc. Key management is the hardest part;

Disassemble to identify encryption algorithm

Goal (General)
My ultimate (long term) goal is to write an importer for a binary file into another application
Question Background
I am interested in two fields within a binary file format. One is
encrypted, and the other is compressed and possibly also encrypted
(See how I arrived at this conclusion here).
I have a viewer program (I'll call it viewer.exe) which can open these files for viewing. I'm hoping this can offer up some clues.
I will (soon) have a correlated deciphered output to compare and have values to search for.
This is the most relevant stackoverflow Q/A I have found
Question Specific
What is the best strategy given the resources I have to identify the algorithm being used?
Current Ideas
I realize that without the key, identifying the algo from just data is practically impossible
Having a file and a viewer.exe, I must have the key somewhere. Whether it's public, private, symmetric etc...that would be nice to figure out.
I would like to disassemble the viewer.exe using OllyDbg with the findcrypt plugin as a first step. I'm just not proficient enough in this kind of thing to accomplish it yet.
Resources
full example file
extracted binary from the field I am interested in
decrypted data In this zip archive there is a binary list of floats representing x,y,z (model2.vertices) and a binary list of integers (model2.faces). I have also included an "stl" file which you can view with many free programs but because of the weird way the data is stored in STL's, this is not what we expect to come out of the original file.
Progress
1. I disassembled the program with Olly, then did the only thing I know how to do at this poing and "searched for all referenced text" after pausing the porgram right before it imports of of the files. Then I searched for words stings like "crypt, hash, AES, encrypt, SHA, etc etc." I came up with a bunch of things, most notably "Blowfish64" which seems to go nicely with the fact that mydata occasionally is 4 bytes too long (and since it is guranteed to be mod 12 = 0) this to me looks like padding for 64 bit block size (odd amounts of vertices result in non mod 8 amounts of bytes). I also found error messages like...
“Invalid data size, (Size-4) mod 8 must be 0"
After reading Igor's response below, here is the output from signsrch. I've updated this image with green dot's which cause no problems when replaced by int3, red if the program can't start, and orange if it fails when loading a file of interest. No dot means I haven't tested it yet.
Accessory Info
Im using windows 7 64 bit
viewer.exe is win32 x86 application
The data is base64 encoded as well as encrypted
The deciphered data is groups of 12 bytes representing 3 floats (x,y,z coordinates)
I have OllyDb v1.1 with the findcrypt plugin but my useage is limited to following along with this guys youtube videos
Many encryption algorithms use very specific constants to initialize the encryption state. You can check if the binary has them with a program like signsrch. If you get any plausible hits, open the file in IDA and search for the constants (Alt-B (binary search) would help here), then follow cross-references to try and identify the key(s) used.
You can't differentiate good encryption (AES with XTS mode for example) from random data. It's not possible. Try using ent to compare /dev/urandom data and TrueCrypt volumes. There's no way to distinguish them from each other.
Edit: Re-reading your question. The best way to determine which symmetric algorithm, hash and mode is being used (when you have a decryption key) is to try them all. Brute-force the possible combinations and have some test to determine if you do successfully decrypt. This is how TrueCrypt mounts a volume. It does not know the algo beforehand so it tries all the possibilities and tests that the first few bytes decrypt to TRUE.

Simple way to detect encryption

Is there a simple and quick way to detect encrypted files? I heard about enthropy calculation, but if I calculate it for every file on a drive, it will take days to detect encryption.
Is it possible to, say it, calculate some value for first 100 bytes or 1024 bytes and then decide? Anyone has a sources for that?
I would use a cross-entropy calculation. Calculate the cross-entropy value for X bytes for known encrypted data (it should be near 1, regardless of type of encryption, etc) - you may want to avoid file headers and footers as this may contain non-encrypted file meta data.
Calculate the entropy for a file; if it's close to 1, then it's either encrypted or /dev/random. If it's quite far away from 1, then it's likely not encrypted. I'm sure you could apply signifance tests to this to get a baseline.
It's about 10 lines of Perl; I can't remember what library is used (although, this may be useful: http://dingo.sbs.arizona.edu/~hammond/ling696f-sp03/addonecross.txt)
You could just make a system that recognizes particular common forms of encrypted files (ex: recognize encrypted zip, rar, vim, gpg, ssl, ecryptfs, and truecrypt). Any attempt to determine encryption based on the raw data will quickly run into a steganography discussion.
One of the advantages of good encryption is that you can design it so that it can't be detected - see the Wikipedia article on deniable encryption for example.
Every statistical approach to detect encryption will give you various "false alarms", like
compressed data or random looking data in general.
Imagine I'd write a program that outputs two files: file1 contains 1024 bit of π and file2 is an encrypted version of file1. If you don't know anything about the contents of file1 or file2, there's no way to distinguish them. In fact, it's quite likely that π contains the contents of file2 somewhere!
EDIT:
By the way, it's not even working the other way round (detecting unencrypted files). You could write a program that transforms encrypted data to readable english text by assigning words or whole sentences to bits/bytes of it.

How to distribute init. vector with data encrypted via a symmetric cipher?

I want to provide for the user a service of encrypting some data via symmetric cipher to a file. The user simply provide a key and he/she may provide an initialize vector for the cipher.
Is there a standard how the file should look like? It makes sense to fill the file with the encrypted data and show the corresponding initialize vector in a dialog window. It may seem reasonable to someone else that the initialize vector should be stored in the file with the encrypted data.
The important thing for me is that the result is useful for a user and he/she won't need to bother with adjustment of the result.
Thank for a comment!
It is common practice to provide the IV as the first block of the cyphertext file. That way the receiver just treats the first 8 bytes (DES) or 16 bytes (AES) as the IV and the rest of the file as the actual cyphertext.
Use the same format for the IV as you are using for the cyphertext: Base64, hex, byte data or whatever.
In principle, you can use any format you want, as long as the decrypting part of the program knows how to read it. For efficiency, having the initialization vector before the data seems a good idea.
If you want to encrypt files, a good idea would be to not create your own format (which leads to you having to do decisions like the one here), but use an existing file format (which then also is a cryptographic protocol).
I recommend the OpenPGP message format, as defined in RFC 4880 (or some subset thereof, if you don't need all features). This also has the advantage that your clients then can decrypt your files using any OpenPGP implementation (like pgp or gpg), if your program somehow ceases to work (of course, only if they have the key/password).
you should be fine if you store the IV together with the encrypted data in the file ...

Resources