I'm trying to reverse engineer a file from an application to learn more about the data it is storing on me. Based on the name, it appears to be XML data, but it is obviously either saved in binary or encrypted. I thought it may have been some form of .Net (or other) serialization, and have tried decoding it that way. But, no love. Inspection in hex has not given any clues either.
Maybe someone with more 'skilz' can give me some insight into it. Here is the file
Voted down and answering: the file is exactly N * 16 bytes in size, does not contain any repetition as far as I can see, and it seems to be filled with random bytes. The first bytes seem completely random as well, hinting that this is not a plain (unencrypted) format.
This would probably hint that the file is AES CBC encrypted. DESede (or any cipher with an 8- or 16-byte block size) could of course also have been used. Without the key (if any), none of this is going to help you much (if it was, I would not be answering you).
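The two observations above (size is a multiple of 16, no repeated blocks) can be checked mechanically. This is a minimal sketch; the function name `block_stats` and the default block size of 16 are assumptions for illustration, not something taken from the application.

```python
from collections import Counter

def block_stats(data: bytes, block_size: int = 16) -> dict:
    """Report block alignment and how many distinct full blocks occur more than once."""
    # Only full blocks are considered; a trailing partial block is ignored.
    blocks = [data[i:i + block_size]
              for i in range(0, len(data) - block_size + 1, block_size)]
    counts = Counter(blocks)
    return {
        "aligned": len(data) % block_size == 0,
        "blocks": len(blocks),
        "repeated_blocks": sum(1 for c in counts.values() if c > 1),
    }
```

An aligned file with zero repeated blocks is consistent with a block cipher in a chaining mode such as CBC, but it does not prove it; compressed data padded to a block boundary would look the same.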
The entropy of the first file is high, above 7.7 bits per byte, which might indicate encryption. The first 28h bytes (320 bits) of the files match. Is it possible that this is the key and the encoded data starts at 28h?
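Both measurements mentioned here are easy to reproduce. The sketch below computes Shannon entropy in bits per byte (maximum 8.0) and the length of the shared prefix of two files; the function names are hypothetical.

```python
import math
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte distribution, between 0.0 and 8.0."""
    if not data:
        return 0.0
    n = len(data)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(data).values())

def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that are identical in a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n
```

A shared prefix of 0x28 (40) bytes would show up as `common_prefix_len(file1, file2) == 40`. Whether that prefix is a key, a header, or something else cannot be decided from the length alone.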
Related
My question is: Is there a reliable way to detect if a hex / base64 string is actually encrypted, or just encoded?
(I did a quick search, but I only seem to find explanations of the difference between encryption and encoding; none seems to say how to detect encryption in general...)
I don't need to know what kind of encryption it is, just detect whether it is encrypted or not and send error if not encrypted, thus enforce encryption.
String size may vary from a couple of bytes to kilobytes...
Is there a C/C++ library available for that?
If you think you're working with encoded/encrypted plaintext, the most obvious thing to do would be to try and decode with various standard encodings, and see if what you get back looks like plain English, or at least what you're looking for.
Beyond that, there's a few things you could try:
If you had a perfectly encrypted string, it would be indistinguishable from random noise, so if you can see significant correlations in your string, you probably have imperfectly encrypted data, or straight up encoded plaintext.
To find this, you can compute the "Index of Coincidence" for the string, or look for repeated blocks of data. If you find repeats, it's either unencrypted or, if the repeats are a multiple of 16 bytes (or another suitable block length) long, it might be ECB encrypted (i.e. every block is encrypted independently with the same key, so identical plaintext blocks produce identical ciphertext blocks).
I would say your best bet would be to see how random your string is: if it's really hard to find correlations, then it's probably well encrypted. If the same bits of encrypted/encoded text keep popping up, it's probably just encoded.
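As a rough illustration of the Index of Coincidence idea, here is a sketch in Python (the question asks for C/C++, but the arithmetic carries over directly). It decodes the hex or base64 string first and then computes the probability that two randomly chosen bytes are equal. The helper names are hypothetical, and trying hex before base64 is an assumption: a string such as "deadbeef" is valid in both encodings.

```python
import base64
from collections import Counter

def decode_text(s: str) -> bytes:
    """Decode as hex if possible, otherwise as strict base64."""
    try:
        return bytes.fromhex(s)
    except ValueError:
        # binascii.Error is a ValueError subclass and propagates if this fails too.
        return base64.b64decode(s, validate=True)

def index_of_coincidence(data: bytes) -> float:
    """Probability that two distinct positions in data hold the same byte value."""
    n = len(data)
    if n < 2:
        return 0.0
    counts = Counter(data)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
```

For uniformly random bytes the value is close to 1/256 (about 0.0039); plain text gives a noticeably higher value. On strings of only a few bytes the estimate is too noisy to mean anything, so this cannot reliably enforce encryption on short inputs.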
I was hit by a ransomware infection that encrypts the first 512 bytes at the top of the file and puts them at the bottom. Looking at the encrypted text, it seems to be some type of XOR cipher. I know the whole plaintext of one of the files that was encrypted, so I figured that in theory I should be able to XOR it to get the key to decrypt the rest of my files. Well, I am having a very hard time with this because I don't really understand how the creator XOR'ed it. I'm thinking he would use a BinaryReader to read the first 512 bytes into an array, XOR it, and replace it. But does that mean he XOR'ed it in hex? Or decimal? I'm quite confused at this point, but I believe I am simply missing something.
I have tried Xor Tool with Python, and everything it attempts to crack looks like nonsense. I also tried a Python script called Unxor that you give the known plaintext to, but the dump file it outputs is always blank.
Good Header file dump:
Good-Header.bin
Encrypted Header file dump:
Enc-Header.bin
This may not be the best file example to see the XOR pattern, but it's the only file I have that also has the original header 100% intact from before encryption. In other headers where there are more changes, the encrypted header changes with it.
Any advice on a method I should try, or an application I should use to take this further? Thanks so much for your help!
P.S. Stack Overflow yelled at me when I tried to post 4 links because I'm so new, so if you would rather see the hex dumps on pastebin than download the header files, please let me know. The files are in no way malicious, and are only the extracted 512 bytes and not a whole file.
To recover the keystream XOR the plaintext bytes with the cyphertext bytes. Do this with two different files so you can see if the ransomware is using the same keystream or a different keystream for each file.
If it is using the same keystream (unlikely) then your problem is solved. If the keystreams are different, then your easiest solution is to restore the affected files from backups. You did keep backups, didn't you? Alternatively research the particular infection you have got and see if anyone else has broken that particular variant, so you can derive the key(s) they used and hence regenerate the required keystreams.
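A minimal sketch of that keystream comparison follows. XOR works on the raw byte values; hex and decimal are only ways of displaying those bytes, so there is no "hex XOR" versus "decimal XOR". The function names are hypothetical, and the sketch assumes you have a known-good and an encrypted 512-byte header for two different files.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings position by position (truncates to the shorter one)."""
    return bytes(x ^ y for x, y in zip(a, b))

def same_keystream(plain1: bytes, enc1: bytes, plain2: bytes, enc2: bytes) -> bool:
    """True if both plaintext/ciphertext pairs yield the same keystream prefix."""
    ks1 = xor_bytes(plain1, enc1)
    ks2 = xor_bytes(plain2, enc2)
    n = min(len(ks1), len(ks2))
    return n > 0 and ks1[:n] == ks2[:n]
```

If `same_keystream` returns True, `xor_bytes(encrypted_header, keystream)` recovers the header of any other affected file. If it returns False, the cipher is not a fixed-key XOR and this approach stops here.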
If you have a lot of money then a data recovery firm might be able to help you, but they will certainly charge.
A rule of thumb to tell a decent cipher from a toy cipher is to encrypt a highly compressible file and try to compress it in its encrypted form: a dumb cipher will produce a file with a level of entropy similar to that of the original one, so the encrypted file will compress as well as the original one; on the other hand, a good cipher (even without an initialization vector) will produce a file that looks like random garbage and thus will not compress at all.
When I compressed your Enc-Header.bin of 512 bytes with PKZIP, the output was also 512 bytes, so the cipher is not as dumb as you expected — bad luck. (But it does not mean that the malware has no weak spots at all.)
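The same compression test can be scripted. This sketch uses zlib (DEFLATE) from the Python standard library rather than PKZIP, which is an assumption about tooling but exercises the same idea; `compression_ratio` is a hypothetical helper name.

```python
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; values near or above 1.0 mean incompressible."""
    if not data:
        return 1.0
    return len(zlib.compress(data, 9)) / len(data)
```

A ratio well below 1.0 on the encrypted header would suggest a weak, structure-preserving cipher. A ratio at or slightly above 1.0 (the format adds a few bytes of overhead) matches the result reported above.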
Goal (General)
My ultimate (long term) goal is to write an importer for a binary file into another application
Question Background
I am interested in two fields within a binary file format. One is
encrypted, and the other is compressed and possibly also encrypted
(See how I arrived at this conclusion here).
I have a viewer program (I'll call it viewer.exe) which can open these files for viewing. I'm hoping this can offer up some clues.
I will (soon) have a correlated deciphered output to compare and have values to search for.
This is the most relevant stackoverflow Q/A I have found
Question Specific
What is the best strategy given the resources I have to identify the algorithm being used?
Current Ideas
I realize that without the key, identifying the algo from just data is practically impossible
Having a file and a viewer.exe, I must have the key somewhere. Whether it's public, private, symmetric etc...that would be nice to figure out.
I would like to disassemble the viewer.exe using OllyDbg with the findcrypt plugin as a first step. I'm just not proficient enough in this kind of thing to accomplish it yet.
Resources
full example file
extracted binary from the field I am interested in
decrypted data: In this zip archive there is a binary list of floats representing x,y,z (model2.vertices) and a binary list of integers (model2.faces). I have also included an "stl" file which you can view with many free programs, but because of the weird way the data is stored in STL files, this is not what we expect to come out of the original file.
Progress
1. I disassembled the program with Olly, then did the only thing I know how to do at this point and "searched for all referenced text" after pausing the program right before it imports one of the files. Then I searched for word strings like "crypt, hash, AES, encrypt, SHA, etc." I came up with a bunch of things, most notably "Blowfish64", which seems to go nicely with the fact that my data is occasionally 4 bytes too long. Since the data is guaranteed to be mod 12 = 0, this looks to me like padding for a 64-bit block size (odd numbers of vertices result in byte counts that are not mod 8 = 0). I also found error messages like...
"Invalid data size, (Size-4) mod 8 must be 0"
After reading Igor's response below, here is the output from signsrch. I've updated this image with green dots which cause no problems when replaced by int3, red if the program can't start, and orange if it fails when loading a file of interest. No dot means I haven't tested it yet.
Accessory Info
I'm using Windows 7 64-bit
viewer.exe is win32 x86 application
The data is base64 encoded as well as encrypted
The deciphered data is groups of 12 bytes representing 3 floats (x,y,z coordinates)
I have OllyDbg v1.1 with the findcrypt plugin, but my usage is limited to following along with this guy's YouTube videos
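The size facts listed so far (base64 wrapping, the "(Size-4) mod 8" error message, 12-byte vertex records) can be turned into a quick sanity check. This is a sketch under assumptions: that the size rule applies to the base64-decoded blob, that the floats are 32-bit little-endian (plausible for an x86 program, but not confirmed), and that any trailing bytes beyond a multiple of 12 are padding. Both function names are hypothetical.

```python
import base64
import struct

def check_encrypted_field(b64_text: str) -> bytes:
    """Base64-decode the field and apply the size rule from the viewer's error message."""
    raw = base64.b64decode(b64_text)
    if (len(raw) - 4) % 8 != 0:
        raise ValueError("Invalid data size, (Size-4) mod 8 must be 0")
    return raw

def parse_vertices(decrypted: bytes) -> list:
    """Split deciphered data into (x, y, z) tuples, ignoring trailing padding bytes."""
    usable = len(decrypted) - len(decrypted) % 12
    # "<3f" = three little-endian 32-bit floats (assumed layout).
    return [struct.unpack_from("<3f", decrypted, off) for off in range(0, usable, 12)]
```

Comparing `parse_vertices` output against model2.vertices would confirm or refute the assumed float layout once a candidate decryption is available.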
Many encryption algorithms use very specific constants to initialize the encryption state. You can check if the binary has them with a program like signsrch. If you get any plausible hits, open the file in IDA and search for the constants (Alt-B (binary search) would help here), then follow cross-references to try and identify the key(s) used.
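To show what such a constant search does, here is a much smaller stand-in for signsrch, not a replacement for it. It looks for the first entries of a few well-known tables: the Blowfish P-array (hex digits of pi), the AES S-box, and the SHA-256 round constants. The 32-bit values are assumed to be stored little-endian, as is usual in x86 binaries; the dictionary and function names are hypothetical.

```python
import struct

# First entries of well-known tables; 32-bit words packed little-endian (assumption).
SIGNATURES = {
    "Blowfish P-array": struct.pack("<2I", 0x243F6A88, 0x85A308D3),
    "AES S-box": bytes([0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5]),
    "SHA-256 round constants": struct.pack("<2I", 0x428A2F98, 0x71374491),
}

def find_signatures(blob: bytes) -> dict:
    """Return {signature name: [file offsets]} for every signature present in blob."""
    hits = {}
    for name, pattern in SIGNATURES.items():
        offsets = []
        pos = blob.find(pattern)
        while pos != -1:
            offsets.append(pos)
            pos = blob.find(pattern, pos + 1)
        if offsets:
            hits[name] = offsets
    return hits
```

Usage would look like `find_signatures(open("viewer.exe", "rb").read())`. The offsets are file offsets, not virtual addresses, so they still need to be mapped into the loaded image before following cross-references in a disassembler.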
You can't differentiate good encryption (AES with XTS mode for example) from random data. It's not possible. Try using ent to compare /dev/urandom data and TrueCrypt volumes. There's no way to distinguish them from each other.
Edit: Re-reading your question. The best way to determine which symmetric algorithm, hash and mode is being used (when you have a decryption key) is to try them all. Brute-force the possible combinations and have some test to determine if you do successfully decrypt. This is how TrueCrypt mounts a volume. It does not know the algo beforehand so it tries all the possibilities and tests that the first few bytes decrypt to TRUE.
Is there a simple and quick way to detect encrypted files? I heard about entropy calculation, but if I calculate it for every file on a drive, it will take days to detect encryption.
Is it possible to, say, calculate some value for the first 100 bytes or 1024 bytes and then decide? Does anyone have sources for that?
I would use a cross-entropy calculation. Calculate the cross-entropy value for X bytes for known encrypted data (it should be near 1, regardless of type of encryption, etc.) - you may want to avoid file headers and footers, as these may contain non-encrypted file metadata.
Calculate the entropy for a file; if it's close to 1, then it's either encrypted or /dev/random. If it's quite far away from 1, then it's likely not encrypted. I'm sure you could apply significance tests to this to get a baseline.
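One possible significance test is a chi-square test of the byte histogram against a uniform distribution, applied to a fixed-size sample instead of the whole file. This is a sketch in Python rather than Perl, and it swaps in chi-square for the cross-entropy measure described above. For uniformly random bytes the statistic follows a chi-square distribution with 255 degrees of freedom (mean 255). The sample size of 4096 and the cutoff of 400 are assumptions chosen for illustration; a 100-byte sample gives far too few counts per byte value for this test to be meaningful.

```python
from collections import Counter

def chi_square_uniform(sample: bytes) -> float:
    """Chi-square statistic of the byte histogram against a uniform distribution."""
    n = len(sample)
    if n == 0:
        raise ValueError("empty sample")
    expected = n / 256
    counts = Counter(sample)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))

def looks_random(path: str, offset: int = 0, size: int = 4096,
                 threshold: float = 400.0) -> bool:
    """Read one sample from the file and test it; the threshold is an assumed cutoff."""
    with open(path, "rb") as f:
        f.seek(offset)  # a non-zero offset skips unencrypted headers
        sample = f.read(size)
    return len(sample) == size and chi_square_uniform(sample) < threshold
```

Reading one 4 KB sample per file keeps a full-drive scan I/O-light. Compressed formats (zip, jpeg, video) will also pass this test, so a positive result means "random-looking", not "encrypted".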
It's about 10 lines of Perl; I can't remember what library is used (although, this may be useful: http://dingo.sbs.arizona.edu/~hammond/ling696f-sp03/addonecross.txt)
You could just make a system that recognizes particular common forms of encrypted files (ex: recognize encrypted zip, rar, vim, gpg, ssl, ecryptfs, and truecrypt). Any attempt to determine encryption based on the raw data will quickly run into a steganography discussion.
One of the advantages of good encryption is that you can design it so that it can't be detected - see the Wikipedia article on deniable encryption for example.
Every statistical approach to detect encryption will give you various "false alarms", like compressed data or random looking data in general.
Imagine I'd write a program that outputs two files: file1 contains 1024 bits of π and file2 is an encrypted version of file1. If you don't know anything about the contents of file1 or file2, there's no way to distinguish them. In fact, it's quite likely that π contains the contents of file2 somewhere!
EDIT:
By the way, it's not even working the other way round (detecting unencrypted files). You could write a program that transforms encrypted data to readable English text by assigning words or whole sentences to bits/bytes of it.
I want to provide a service that lets the user encrypt some data to a file with a symmetric cipher. The user simply provides a key and may also provide an initialization vector for the cipher.
Is there a standard for how the file should look? It makes sense to fill the file with the encrypted data and show the corresponding initialization vector in a dialog window. It may seem reasonable to someone else that the initialization vector should be stored in the file with the encrypted data.
The important thing for me is that the result is useful for a user and he/she won't need to bother with adjustment of the result.
Thanks for any comments!
It is common practice to provide the IV as the first block of the cyphertext file. That way the receiver just treats the first 8 bytes (DES) or 16 bytes (AES) as the IV and the rest of the file as the actual cyphertext.
Use the same format for the IV as you are using for the cyphertext: Base64, hex, byte data or whatever.
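The IV-first layout described here amounts to simple concatenation and slicing. The sketch below shows only the file layout, not the encryption itself; the function names and the default block size of 16 (AES) are assumptions, and 8 would be passed for DES.

```python
def pack_with_iv(iv: bytes, ciphertext: bytes, block_size: int = 16) -> bytes:
    """Build the file contents: the IV as the first block, then the ciphertext."""
    if len(iv) != block_size:
        raise ValueError(f"IV must be exactly {block_size} bytes")
    return iv + ciphertext

def split_iv(blob: bytes, block_size: int = 16) -> tuple:
    """Split file contents back into (iv, ciphertext)."""
    if len(blob) < block_size:
        raise ValueError("file is too short to contain an IV")
    return blob[:block_size], blob[block_size:]
```

The IV does not need to be secret, so storing it in the clear at the start of the file is fine; the receiver only needs to know the block size to find where the ciphertext begins.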
In principle, you can use any format you want, as long as the decrypting part of the program knows how to read it. For efficiency, having the initialization vector before the data seems a good idea.
If you want to encrypt files, a good idea would be to not create your own format (which leads to you having to make decisions like the one here), but use an existing file format (which then also is a cryptographic protocol).
I recommend the OpenPGP message format, as defined in RFC 4880 (or some subset thereof, if you don't need all features). This also has the advantage that your clients then can decrypt your files using any OpenPGP implementation (like pgp or gpg), if your program somehow ceases to work (of course, only if they have the key/password).
You should be fine if you store the IV together with the encrypted data in the file ...