I have a bank device that is used to do money transfers among accounts. I'm curious about trying to figure out what algorithm is used to cipher the input. It follows the next conditions
-The text that needs to be cipher consists of only 8 digits (e.g. 00000000, 00000001, 99999999, etc)
-It uses some key/password to cipher this input
-The key/pass is a string of n characters
-The output should be of the same length (8 digits)
Is there any standard crypto algos for this?
Yes, this is so-called Format-preserving encryption. And yes, it's standartized.
https://doi.org/10.6028%2FNIST.SP.800-38G
https://en.wikipedia.org/wiki/Format-preserving_encryption
I have an API to call where I have to encrypt my data using RSA/ECB/PKCS1 Padding & AES/CBC/PKCS5PADDING.
Sample Data: {"KEY":"VALUE"}
Step.1:
I have to generate a random number of 16 digit. eg: '1234567890123456'
Step.2:
Do RSA/ECB/PKCS1Padding to random number and base64Encode the result. we get "encrypted_key"
Step.3:
Concatenate random number & data:
DATA = 1234567890123456{"KEY":"VALUE"}
Step.4:
Do AES/CBC/PKCS5Padding on DATA (from Step 3) using random number(1234567890123456) as KEY & Base64Encoded random number as IV. we get "ENCRYPTED_DATA"
So, for Step 1 I am using JSEncrypt javascript library.
for Step 4 I am using CrytoJS.AES.encrypt() function. I am pretty sure that my JSEncrypt function is running fine as the client is able to decrypt it but client is not able to decrypt my data. I feel that I am making a mistake while using CryptoJS.
Can someone guide me properly on how to use the library.
What I am doing is:
KEY = '1234567890123456'
IV = MTIzNDU2Nzg5MDEyMzQ1Ng== (result of btoa('1234567890123456') )
DATA = "1234567890123456{"KEY":"VAL"}"
cryptedData = Crypto.AES.encrypt(DATA, KEY, {iv: IV, mode: CryptoJS.mode.CBC,padding:CryptoJS.pad.Pkcs7})
I am told to use PKCS5Padding in AES/CBC Encryption ( Step 4 ) but it seems that AES does not support PKCS5Padding but PKCS7Padding.
I think I am making a mistake in the way I am passing KEY & IV to CryptoJS.
Any help will be greatly appreciated.
For the start lets see why are you doing the exercise. RSA is intended to encode only limited amout of data. So we use "hybrid encryption", where the data are encrypted using a symmetric cipher with a random key and the key itself is encrypted using RSA
Encryption works on binary data, to safely transmit binary data, the data are encoded to printable form (hex or base64)
Step.1: I have to generate a random number of 16 digit
What we see is 16 digits 0-9. That's not really safe. Generating 16 digits you will get a key of 10^16, which is equals of approx 2^53 (if I did the math wrong, please comment).
You need to generate 16 random bytes (digits 0-256 resulting in 2^128 key). That is your DEK (data encryption key).
You may encode the DEK to be in printable form, in hexadecimal encoding it will have 32 characters.
Step.2:
ok, you now get encrypted encoded_encryption_key
Step 3, Step 4
And here you should understand what are you doing.
encrypt DATA using DEK ( not encoded random number in binary form), you will get encrypted_data. You can encode the result to encoded_encrypted_data
concatenate the encrypted key and encrypted data. It. is up to you to choose if you encode it before or after encoding. I suggest you make concatenation of encoded_encryption_key and encoded_encrypted_data with some separator, because if RSA key length changes, the length of encoded_encryption_key changes too
Make sure to discuss with the client what format is expected exactly.
Notes:
IV needs to be 16 bytes long for AES and for CryptoJS I believe it needs to be Hex encoded, so using btoa may not be the best idea. I believe the CryptoJS just trims the value to 16 bytes, but formally it is not correct.
CBC cipher needs some sort of integrity check, I suggest to add some HMAC or signature to the result (otherwise someone could change the ciphertext without you being able to detect the tamper)
but it seems that AES does not support PKCS5Padding but PKCS7Padding.
Indeed AES supports Pkcs7. Pkcs5 is functionally the same, but defined on 64 blocks. The designation is still used in Java as heritage from DES encryption.
How can I decrypt a random block of encrypted data using aes-cbc?
First information:
AES has a block size of 128-bits. So when you say "AES-128" the assumption is a key size of 128-bits and "AES-256" the assumption is a key size of 256-bits.
CBC mode requires an iv. The iv is used for the first block, each other block use the value of the previous block in a similar way. See Cipher Block Chaining (CBC).
Decryption must be done on a block boundary. The first block will use the iv, subsequent blocks will use the value of the previous encrypted block essentially for it's iv. Thus decryption can start from other than the beginning of the encrypted data.
An assumption on my part is that gpg and openssl place the iv preceding the encrypted data, that is usual procedure but this is a guess by me but may be more complicated (I am to lazy to look that up). This would explain why decryption from the first block would work and not from other starting locations.
For more information study the available documentation.
There is a good online AES calculator provided by Cryptomathic.
With info in zaph's answer I was able to do it in python like this:
from os import urandom
from Crypto.Cipher import AES
import hashlib
IV = urandom(16)
aes = AES.new(hashlib.sha256(b'123').digest(), AES.MODE_CBC, IV)
T='1234567890'*160
C=aes.encrypt(T)
# Now if we make a new aes instance with IV we will be able to decrypt first block:
aes = AES.new(hashlib.sha256(b'123').digest(), AES.MODE_CBC, IV)
aes.decrypt(q[:16]) # It returns b'1234567890123456'
# But if we need to decrypt block 4 we need to instanciate aes with contents of block 3 as iv parameter:
aes = AES.new(hashlib.sha256(b'123').digest(), AES.MODE_CBC, q[48:64])
aes.decrypt(q[64:80]) # It returns b'5678901234567890'
So to sum it up if you want to decrypt some encrypted text using aes-cbc from block n to block m for example (which is bytes n×16 to m×16), you need data from block n-1 (bytes (n-1)×16 to (n×16)-1) as IV to start decryption on block n. This way you can decrypt any chunk of data even though you don't have access to whole data except for its first block (first 16 bytes).
I was wondering:
1) if I compute the digest of some datas with SHA-512 => resulting in a hash of 64 bytes
2) and then I sign this hash with RSA-1024 => so a block of 128 bytes, which is bigger than the 64 bytes of the digest
=> does it mean in the end my signed hash will be exactly 128 bytes?
Thanks a lot for any info.
With RSA, as specified by PKCS#1, the data to be signed is first hashed with a hash function, then the result is padded (a more or less complex operation which transforms the hash result into a modular integer), and then the mathematical operation of RSA is applied on that number. The result is a n-bit integer, where n is the length in bits of the "modulus", usually called "the RSA key size". Basically, for RSA-1024, n is 1024. A 1024-bit integer is encoded as 128 bytes, exactly, as per the encoding method described in PKCS#1 (PKCS#1 is very readable and not too long).
Whether a n-bit RSA key can be used to sign data with a hash function which produces outputs of length m depends on the details of the padding. As the name suggests, padding involves adding some extra data around the hash output, hence n must be greater than m, leaving some room for the extra data. A 1024-bit key can be used with SHA-512 (which produces 512-bit strings). You could not use a 640-bit key with SHA-512 (and you would not, anyway, since 640-bit RSA keys can be broken -- albeit not trivially).
If an encrypted file exists and someone wants to decrypt it, there are several methods do try.
For example, if you would chose a brute force attack, that's easy: just try all possible keys and you will find the correct one. For this question, it doesn't matter that this might take too long.
But trying keys means the following steps:
Chose key
Decrypt data with key
Check if decryption was successful
Besides the problem that you would need to know the algorithm that was used for the encryption, I cannot imagine how one would do #3.
Here is why: After decrypting the data, I get some "other" data. In case of an encrypted plain text file in a language that I can understand, I can now check if the result is a text in that langauge.
If it would be a known file type, I could check for specific file headers.
But since one tries to decrypt something secret, it is most likely unknown what kind of information there will be if correctly decrypted.
How would one check if a decryption result is correct if it is unknown what to look for?
Like you suggest, one would expect the plaintext to be of some know format, e.g., a JPEG image, a PDF file, etc. The idea would be that it is very unlikely that a given ciphertext can be decrypted into both a valid JPEG image and a valid PDF file (but see below).
But it is actually not that important. When one talks about a cryptosystem being secure, one (roughly) talks about the odds of you being able to guess the plaintext corresponding to a given ciphertext. So I pick a random message m and encrypts it c = E(m). I give you c and if you cannot guess m, then we say the cryptosystem is secure, otherwise it's broken.
This is just a simple security definition. There are other definitions that require the system to be able to hide known plaintexts (semantic security): you give me two messages, I encrypt one of them, and you will not be able to tell which message I chose.
The point is, that in these definitions, we are not concerned with the format of the plaintexts, all we require is that you cannot guess the plaintext that was encrypted. So there is no step 3 :-)
By not considering your step 3, we make the question of security as clear as possible: instead of arguing about how hard it is to guess which format you used (zip, gzip, bzip2, ...) we only talk about the odds of breaking the system compared to the odds of guessing the key. It is an old principle that you should concentrate all your security in the key -- it simplifies things dramatically when your only assumption is the secrecy of the key.
Finally, note that some encryption schemes makes it impossible for you to verify if you have the correct key since all keys are legal. The one-time pad is an extreme example such a scheme: you take your plaintext m, choose a perfectly random key k and compute the ciphertext as c = m XOR k. This gives you a completely random ciphertext, it is perfectly secure (the only perfectly secure cryptosystem, btw).
When searching for an encryption key, you cannot know when you've found the right one. This is because c could be an encryption of any file with the same length as m: if you encrypt the message m' with the key *k' = c XOR m' you'll see that you get the same ciphertext again, thus you cannot know if m or m' was the original message.
Instead of thinking of exclusive-or, you can think of the one-time pad like this: I give you the number 42 and tell you that is is the sum of two integers (negative, positive, you don't know). One integer is the message, the other is the key and 42 is the ciphertext. Like above, it makes no sense for you to guess the key -- if you want the message to be 100, you claim the key is -58, if you want the message to be 0, you claim the key is 42, etc. One time pad works exactly like this, but on bit values instead.
About reusing the key in one-time pad: let's say my key is 7 and you see the ciphertexts 10 and 20, corresponding to plaintexts 3 and 13. From the ciphertexts alone, you now know that the difference in plaintexts is 10. If you somehow gain knowledge of one of the plaintext, you can now derive the other! If the numbers correspond to individual letters, you can begin looking at several such differences and try to solve the resulting crossword puzzle (or let a program do it based on frequency analysis of the language in question).
You could use heuristics like the unix
file
command does to check for a known file type. If you have decrypted data that has no recognizable type, decrypting it won't help you anyway, since you can't interpret it, so it's still as good as encrypted.
I wrote a tool a little while ago that checked if a file was possibly encrypted by simply checking the distribution of byte values, since encrypted files should be indistinguishable from random noise. The assumption here then is that an improperly decrypted file retains the random nature, while a properly decrypted file will exhibit structure.
#!/usr/bin/env python
import math
import sys
import os
MAGIC_COEFF=3
def get_random_bytes(filename):
BLOCK_SIZE=1024*1024
BLOCKS=10
f=open(filename)
bytes=list(f.read(BLOCK_SIZE))
if len(bytes) < BLOCK_SIZE:
return bytes
f.seek(0, 2)
file_len = f.tell()
index = BLOCK_SIZE
cnt=0
while index < file_len and cnt < BLOCKS:
f.seek(index)
more_bytes = f.read(BLOCK_SIZE)
bytes.extend(more_bytes)
index+=ord(os.urandom(1))*BLOCK_SIZE
cnt+=1
return bytes
def failed_n_gram(n,bytes):
print "\t%d-gram analysis"%(n)
N = len(bytes)/n
states = 2**(8*n)
print "\tN: %d states: %d"%(N, states)
if N < states:
print "\tinsufficient data"
return False
histo = [0]*states
P = 1.0/states
expected = N/states * 1.0
# I forgot how this was derived, or what it is suppose to be
magic = math.sqrt(N*P*(1-P))*MAGIC_COEFF
print "\texpected: %f magic: %f" %(expected, magic)
idx=0
while idx<len(bytes)-n:
val=0
for x in xrange(n):
val = val << 8
val = val | ord(bytes[idx+x])
histo[val]+=1
idx+=1
count=histo[val]
if count - expected > magic:
print "\tfailed: %s occured %d times" %( hex(val), count)
return True
# need this check because the absence of certain bytes is also
# a sign something is up
for i in xrange(len(histo)):
count = histo[i]
if expected-count > magic:
print "\tfailed: %s occured %d times" %( hex(i), count)
return True
print ""
return False
def main():
for f in sys.argv[1:]:
print f
rand_bytes = get_random_bytes(f)
if failed_n_gram(3,rand_bytes):
continue
if failed_n_gram(2,rand_bytes):
continue
if failed_n_gram(1,rand_bytes):
continue
if __name__ == "__main__":
main()
I find this works reasonable well:
$ entropy.py ~/bin/entropy.py entropy.py.enc entropy.py.zip
/Users/steve/bin/entropy.py
1-gram analysis
N: 1680 states: 256
expected: 6.000000 magic: 10.226918
failed: 0xa occured 17 times
entropy.py.enc
1-gram analysis
N: 1744 states: 256
expected: 6.000000 magic: 10.419895
entropy.py.zip
1-gram analysis
N: 821 states: 256
expected: 3.000000 magic: 7.149270
failed: 0x0 occured 11 times
Here .enc is the source ran through:
openssl enc -aes-256-cbc -in entropy.py -out entropy.py.enc
And .zip is self-explanatory.
A few caveats:
It doesn't check the entire file, just the first KB, then random blocks from the file. So if a file was random data appended with say a jpeg, it will fool the program. The only way to be sure if to check the entire file.
In my experience, the code reliably detects when a file is unencrypted (since nearly all useful data has structure), but due to its statistical nature may sometimes misdiagnose an encrypted/random file.
As it has been pointed out, this kind of analysis will fail for OTP, since you can make it say anything you want.
Use at your own risk, and most certainly not as the only means of checking your results.
One of the ways is compressing the source data with some standard algorithm like zip. If after decryption you can unzip the result - it's decrypted right. Compression is almost usually done by encryption programs prior to encryption - because it's another step the bruteforcer will need to repeat for each trial and lose time on it and because encrypted data is almost surely uncompressible (size doesn't decrease after compression with a chained algorithm).
Without a more clearly defined scenario, I can only point to cryptanalysis methods. I would say it's generally accepted that validating the result is an easy part of cryptanalysis. In comparison to decrypting even a known cypher, a thorough validation check costs little cpu.
are you seriously asking questions like this?
well if it was known whats inside then you would not need to decrypt it anywayz right?
somehow this has nothing to do with programming question its more mathematical. I took some encryption math classes at my university.
And you can not confirm without a lot of data points.
Sure if your result makes sense and its clear it is meaningful in plain english (or whatever language is used) but to answer your question.
If you were able to decrypt you should be able to encrypt as well.
So encrypt the result using reverse process of decryption and if you get same results you might be golden...if not something is possibly wrong.