Related
This is a Vigenere cipher-text
EORLL TQFDI HOEZF CHBQN IFGGQ MBVXM SIMGK NCCSV
WSXYD VTLQS BVBMJ YRTXO JCNXH THWOD FTDCC RMHEH
SNXVY FLSXT ICNXM GUMET HMTUR PENSU TZHMV LODGN
MINKA DTLOG HEVNI DXQUG AZGRM YDEXR TUYRM LYXNZ
ZGJ
The index of coincidence gave a shift of six (6): I know this is right (I used an online Java applet to decrypt the whole thing using the key 'QUARTZ').
However, in this question we are only told the first and last two letters of the Key - 'Q' and 'TZ.'
So far I have split the ciphertext into slices using this awesome applet. So the first slice is 0, k, 2k, 3k, 4k; the second is 1, k + 1, 2k + 1, 3k + 1; et cetera.
KeyPos=0: EQEQQSCXQJJHDEYIUTSVMTVUMTYJ
KeyPos=1: OFZNMICYSYCWCHFCMUULILNGYUX
KeyPos=2: RDFIBMSDBRNOCSLNERTONOIADYN
KeyPos=3: LICFVGVVVTXDRNSXTPZDKGDZERZ
KeyPos=4: LHHGXKWTBXHFMXXMHEHGAHXGXMZ
KeyPos=5: TOBGMNSLMOTTHVTGMNMNDEQRRLG
My idea was to calculate the highest-frequency letter in each block, hoping that the most frequent letter would give me some clue as to how to find 'U,' 'A' and 'R.' However, the most frequent letters in these blocks are:
KeyPos=0: Q,4 T,3 E,3, J,3
KeyPos=1: C,4 U,3 Y,3
KeyPos=2: N,4 O,3 R,3 D,3 B,2
KeyPos=3: V,4 D,3 Z,3
KeyPos=4: H,6 X,6 M,3 G,3
KeyPos=5: M,4 T,4 N,3 G,3
Which yields QCNVHM, or QUNVHM (being generous), neither of which are that close to QUARTZ. There are online applets that can crack this no problem, so it mustn't be too short a text to yield decent frequency counts from the blocks.
I guess I must be approaching this the wrong way. I just hoped one of you might be able to offer some clue as to where I am going wrong.
p.s. This is for a digital crypto class.
Interesting question...
I don't have a programmatic solution for cracking the original ciphertext, but I was able to solve it with a little mind power and some helpful JavaScript.
I started by using this page and the information you supplied. Provide the ciphertext, a key length of 6 and hit initialize. What's nice about the approach here is that unknowns in either the plaintext or key are left as hyphens.
Update the key, adding only what you know Q---TZ and click 'update plaintext'. At this point we know:
o---sua---opo---oca---nha---enc---rom---dth---ama---int---ept---our---mun---tio---ewi---eus---the---ond---loc---onf---now---hed---off---ere---nsw---esd---tmi---ght
Here's where I applied a bit of brain power. You start recognizing bits of the plaintext. the, now and off make an appearance. At the end, there's ght - this made me think the prior letter is likely a vowel. For example light or thought. I replaced the corresponding hyphen with u and clicked update keyword to find what letter would have produced that combination. The matching letter turns out to be F. I think updated the plaintext to see the results. They didn't look promising. So I tried i instead which resulted in:
o--usua--ropo--loca--onha--eenc--prom--edth--eama--eint--cept--gour--mmun--atio--wewi--beus--gthe--cond--yloc--ionf--mnow--thed--poff--mere--insw--nesd--atmi--ight
Now we're getting somewhere. At the start I see something that might be usual, and further in I see int--cept and near the end w--nesd-- at mi--ight. Voila. Filling in the letters for wednesday and updating the keyword yielded QUARTZ.
... So, how to port this approach to code? Not sure about the best way to do that just yet. The idea of using the known characters in the key, partially decrypting the ciphertext and brute forcing the rest is appealing. But without a dictionary handy, I'm not sure what the best brute-forcing method would be...
To be continued (maybe)...
An algorithm wouldn't just consider the most frequent letters but the frequency pattern of the whole alphabet. Technically you compute the index of coincidence for each possible shift and consider the maximal ones.
Is possible to calculate average of three encrypted integer? No constrain on the method of encrypting. The point of this is just to hide the three numbers and find average.
What you seem to be looking for is called Homomorphic Encryption: an encryption scheme which allows you to perform operations on encrypted data, with the encrypted result as the outcome.
Such a scheme would allow you to give encrypted data to a 3rd party, which could then do computations on it for you without knowing what they were computing.
In your case, you need two operations: addition and division. Until recently, homomorphic encryption schemes typically supported only 1 operation. But in september 2009 IMB announced the first fully homomorphic cryptosystem. Other researches published another system soon after that.
These cryptosystems might be be able to do what you want, but it is all cutting edge computer science research.
Decrypt the numbers, then calculate their average.
I don't see any simple ways to do what you ask, apart from decrypting the numbers first.
Taking the average (or the "arithmetic mean") requires adding the numbers. Now if you wanted to multiply the numbers, then you could do that neatly with RSA encryption. If p is the plaintext, c is the ciphertext, and e is the encryption key, then in RSA, c = p^e. If you have 3 separate integers, p1, p2, p3, and the product is pp then
pp^e = (p1 * p2 * p3)^e = p1^e * p2^e * p3^3 = c1 * c2 * c3 = cp
That is, you can either multiply the three plaintext integers together and then encrypt, or you can just multiply the three ciphertexts together, and get the same answer. This would get you some way towards the "geometric mean", where you multiply all the numbers together, and then take the cube-root (or nth root for n numbers). Unfortunately, calculating a cube root in modular arithmetic is non-trivial.
With ideal encryption methods: No.
With most real-world encryption methods: No.
With some stupidly simple to undo obfuscation method especially designed to allow averaging: Yes.
Calling the latter method "encryption" really would be using the wrong term.
If you could calculate the average of encrypted numbers without decrypting them, that would make decrypting the original numbers quite a lot easier, so I would be very surprised if this works with any serious encryption algorithm.
In general three encrypted numbers shouldn't maintain the same order if encrypted, so I'm pretty sure you have to decrypt them and calculate the avarage.
If, and only if, the method of encryption is a one-to-one mathematical function, then it is possible to do so while the numbers are encrypted.
For example, if my very unsecure method of encryption is to multiply every number of 2, then I would do the following:
function encrypt($number){
return $number*2;
}
$a=encrypt(3); // a= 9
$b=encrypt(5); // b= 15
$c=encrypt(6); // c= 18
$average = ($a+$b+$c)/6; // We divide by 6 because first we divide by 3 to get the average, then by 2 to do the decryption. The method will vary based on the mathematical function.
The only other possibility is to decrypt the numbers first.
Take a commonly used binary hash function - for example, SHA-256. As the name implies, it outputs a 256 bit value.
Let A be the set of all possible 256 bit binary values. A is extremely large, but finite.
Let B be the set of all possible binary values. B is infinite.
Let C be the set of values obtained by running SHA-256 on every member of B. Obviously this can't be done in practice, but I'm guessing we can still do mathematical analysis of it.
My Question: By necessity, C ⊆ A. But does C = A?
EDIT: As was pointed out by some answers, this is wholly dependent on the has function in question. So, if you know the answer for any particular hash function, please say so!
First, let's point out that SHA-256 does not accept all possible binary strings as input. As defined by FIPS 180-3, SHA-256 accepts as input any sequence of bits of length lower than 2^64 bits (i.e. no more than 18446744073709551615 bits). This is very common; all hash functions are somehow limited in formal input length. One reason is that the notion of security is defined with regards to computational cost; there is a threshold about computational power that any attacker may muster. Inputs beyond a given length would require more than that maximum computational power to simply evaluate the function. In brief, cryptographers are very wary of infinites, because infinites tend to prevent security from being even defined, let alone quantified. So your input set C should be restricted to sequences up to 2^64-1 bits.
That being said, let's see what is known about hash function surjectivity.
Hash functions try to emulate a random oracle, a conceptual object which selects outputs at random under the only constraint that it "remembers" previous inputs and outputs, and, if given an already seen input, it returns the same output than previously. By definition, a random oracle can be proven surjective only by trying inputs and exhausting the output space. If the output has size n bits, then it is expected that about 2^(2n) distinct inputs will be needed to exhaust the output space of size 2^n. For n = 256, this means that hashing about 2^512 messages (e.g. all messages of 512 bits) ought to be enough (on average). SHA-256 accepts inputs very much longer than 512 bits (indeed, it accepts inputs up to 18446744073709551615 bits), so it seems highly plausible that SHA-256 is surjective.
However, it has not been proven that SHA-256 is surjective, and that is expected. As shown above, a surjectivity proof for a random oracle requires an awful lot of computing power, substantially more than mere attacks such as preimages (2^n) and collisions (2^(n/2)). Consequently, a good hash function "should not" allow a property such as surjectivity to be actually proven. It would be very suspicious: security of hash function stems from the intractability of their internal structure, and such an intractability should firmly oppose to any attempt at mathematical analysis.
As a consequence, surjectivity is not formally proven for any decent hash function, and not even for "broken" hash functions such as MD4. It is only "highly suspected" (a random oracle with inputs much longer than the output should be surjective).
Not necessarily. The pigeonhole principle states that once one more hash beyond the size of A has been generated that there is a probability of collision of 1, but it does not state that every single element of A has been generated.
It really depends on the hash function. If you use this valid hash function:
Int256 Hash (string input) {
return 0;
}
then it is obvious that C != A. So the "for example, SHA256" is a pretty important note to consider.
To answer your actual question: I believe so, but I'm just guessing. Wikipedia does not provide any meaningful info on this.
Not necessarily. That would depend on the hash function.
It would probably be ideal if the hash function was surjective, but there are things that're usually more important, such as a low likelihood of collisions.
It is not always the case. However, quality required for an hash algorithm are:
Cardinality of B
Repartition of hashes in B (every value in B must have the same probability to be a hash)
If an encrypted file exists and someone wants to decrypt it, there are several methods do try.
For example, if you would chose a brute force attack, that's easy: just try all possible keys and you will find the correct one. For this question, it doesn't matter that this might take too long.
But trying keys means the following steps:
Chose key
Decrypt data with key
Check if decryption was successful
Besides the problem that you would need to know the algorithm that was used for the encryption, I cannot imagine how one would do #3.
Here is why: After decrypting the data, I get some "other" data. In case of an encrypted plain text file in a language that I can understand, I can now check if the result is a text in that langauge.
If it would be a known file type, I could check for specific file headers.
But since one tries to decrypt something secret, it is most likely unknown what kind of information there will be if correctly decrypted.
How would one check if a decryption result is correct if it is unknown what to look for?
Like you suggest, one would expect the plaintext to be of some know format, e.g., a JPEG image, a PDF file, etc. The idea would be that it is very unlikely that a given ciphertext can be decrypted into both a valid JPEG image and a valid PDF file (but see below).
But it is actually not that important. When one talks about a cryptosystem being secure, one (roughly) talks about the odds of you being able to guess the plaintext corresponding to a given ciphertext. So I pick a random message m and encrypts it c = E(m). I give you c and if you cannot guess m, then we say the cryptosystem is secure, otherwise it's broken.
This is just a simple security definition. There are other definitions that require the system to be able to hide known plaintexts (semantic security): you give me two messages, I encrypt one of them, and you will not be able to tell which message I chose.
The point is, that in these definitions, we are not concerned with the format of the plaintexts, all we require is that you cannot guess the plaintext that was encrypted. So there is no step 3 :-)
By not considering your step 3, we make the question of security as clear as possible: instead of arguing about how hard it is to guess which format you used (zip, gzip, bzip2, ...) we only talk about the odds of breaking the system compared to the odds of guessing the key. It is an old principle that you should concentrate all your security in the key -- it simplifies things dramatically when your only assumption is the secrecy of the key.
Finally, note that some encryption schemes makes it impossible for you to verify if you have the correct key since all keys are legal. The one-time pad is an extreme example such a scheme: you take your plaintext m, choose a perfectly random key k and compute the ciphertext as c = m XOR k. This gives you a completely random ciphertext, it is perfectly secure (the only perfectly secure cryptosystem, btw).
When searching for an encryption key, you cannot know when you've found the right one. This is because c could be an encryption of any file with the same length as m: if you encrypt the message m' with the key *k' = c XOR m' you'll see that you get the same ciphertext again, thus you cannot know if m or m' was the original message.
Instead of thinking of exclusive-or, you can think of the one-time pad like this: I give you the number 42 and tell you that is is the sum of two integers (negative, positive, you don't know). One integer is the message, the other is the key and 42 is the ciphertext. Like above, it makes no sense for you to guess the key -- if you want the message to be 100, you claim the key is -58, if you want the message to be 0, you claim the key is 42, etc. One time pad works exactly like this, but on bit values instead.
About reusing the key in one-time pad: let's say my key is 7 and you see the ciphertexts 10 and 20, corresponding to plaintexts 3 and 13. From the ciphertexts alone, you now know that the difference in plaintexts is 10. If you somehow gain knowledge of one of the plaintext, you can now derive the other! If the numbers correspond to individual letters, you can begin looking at several such differences and try to solve the resulting crossword puzzle (or let a program do it based on frequency analysis of the language in question).
You could use heuristics like the unix
file
command does to check for a known file type. If you have decrypted data that has no recognizable type, decrypting it won't help you anyway, since you can't interpret it, so it's still as good as encrypted.
I wrote a tool a little while ago that checked if a file was possibly encrypted by simply checking the distribution of byte values, since encrypted files should be indistinguishable from random noise. The assumption here then is that an improperly decrypted file retains the random nature, while a properly decrypted file will exhibit structure.
#!/usr/bin/env python
import math
import sys
import os
MAGIC_COEFF=3
def get_random_bytes(filename):
BLOCK_SIZE=1024*1024
BLOCKS=10
f=open(filename)
bytes=list(f.read(BLOCK_SIZE))
if len(bytes) < BLOCK_SIZE:
return bytes
f.seek(0, 2)
file_len = f.tell()
index = BLOCK_SIZE
cnt=0
while index < file_len and cnt < BLOCKS:
f.seek(index)
more_bytes = f.read(BLOCK_SIZE)
bytes.extend(more_bytes)
index+=ord(os.urandom(1))*BLOCK_SIZE
cnt+=1
return bytes
def failed_n_gram(n,bytes):
print "\t%d-gram analysis"%(n)
N = len(bytes)/n
states = 2**(8*n)
print "\tN: %d states: %d"%(N, states)
if N < states:
print "\tinsufficient data"
return False
histo = [0]*states
P = 1.0/states
expected = N/states * 1.0
# I forgot how this was derived, or what it is suppose to be
magic = math.sqrt(N*P*(1-P))*MAGIC_COEFF
print "\texpected: %f magic: %f" %(expected, magic)
idx=0
while idx<len(bytes)-n:
val=0
for x in xrange(n):
val = val << 8
val = val | ord(bytes[idx+x])
histo[val]+=1
idx+=1
count=histo[val]
if count - expected > magic:
print "\tfailed: %s occured %d times" %( hex(val), count)
return True
# need this check because the absence of certain bytes is also
# a sign something is up
for i in xrange(len(histo)):
count = histo[i]
if expected-count > magic:
print "\tfailed: %s occured %d times" %( hex(i), count)
return True
print ""
return False
def main():
for f in sys.argv[1:]:
print f
rand_bytes = get_random_bytes(f)
if failed_n_gram(3,rand_bytes):
continue
if failed_n_gram(2,rand_bytes):
continue
if failed_n_gram(1,rand_bytes):
continue
if __name__ == "__main__":
main()
I find this works reasonable well:
$ entropy.py ~/bin/entropy.py entropy.py.enc entropy.py.zip
/Users/steve/bin/entropy.py
1-gram analysis
N: 1680 states: 256
expected: 6.000000 magic: 10.226918
failed: 0xa occured 17 times
entropy.py.enc
1-gram analysis
N: 1744 states: 256
expected: 6.000000 magic: 10.419895
entropy.py.zip
1-gram analysis
N: 821 states: 256
expected: 3.000000 magic: 7.149270
failed: 0x0 occured 11 times
Here .enc is the source ran through:
openssl enc -aes-256-cbc -in entropy.py -out entropy.py.enc
And .zip is self-explanatory.
A few caveats:
It doesn't check the entire file, just the first KB, then random blocks from the file. So if a file was random data appended with say a jpeg, it will fool the program. The only way to be sure if to check the entire file.
In my experience, the code reliably detects when a file is unencrypted (since nearly all useful data has structure), but due to its statistical nature may sometimes misdiagnose an encrypted/random file.
As it has been pointed out, this kind of analysis will fail for OTP, since you can make it say anything you want.
Use at your own risk, and most certainly not as the only means of checking your results.
One of the ways is compressing the source data with some standard algorithm like zip. If after decryption you can unzip the result - it's decrypted right. Compression is almost usually done by encryption programs prior to encryption - because it's another step the bruteforcer will need to repeat for each trial and lose time on it and because encrypted data is almost surely uncompressible (size doesn't decrease after compression with a chained algorithm).
Without a more clearly defined scenario, I can only point to cryptanalysis methods. I would say it's generally accepted that validating the result is an easy part of cryptanalysis. In comparison to decrypting even a known cypher, a thorough validation check costs little cpu.
are you seriously asking questions like this?
well if it was known whats inside then you would not need to decrypt it anywayz right?
somehow this has nothing to do with programming question its more mathematical. I took some encryption math classes at my university.
And you can not confirm without a lot of data points.
Sure if your result makes sense and its clear it is meaningful in plain english (or whatever language is used) but to answer your question.
If you were able to decrypt you should be able to encrypt as well.
So encrypt the result using reverse process of decryption and if you get same results you might be golden...if not something is possibly wrong.
Do you know of any other ciphers that performs like the ROT47 family?
My major requirement is that it'd be keyless.
Sounds like you might be looking for some "classical cryptography" solutions.
SUBSTITUTION CIPHERS are encodings where one character is substituted with another. E.g. A->Y, B->Q, C->P, and so on. The "Caesar Cipher" is a special case where the order is preserved, and the "key" is the offset. In the rot13/47 case, the "key" is 13 or 47, respectively, though it could be something like 3 (A->D, B->E, C->F, ...).
TRANSPOSITION CIPHERS are ones that don't substitute letters, but ones that rearrange letters in a pre-defined way. For example:
CRYPTOGRAPHY
may be written as
C Y T G A H
R P O R P Y
So the ciphered output is created by reading the two lines left to right
CYTGAHRPORPY
Another property of rot13/47 is that it's REVERSABLE:
encode(encode(plaintext)) == plaintext
If this is the property you want, you could simply XOR the message with a known (previously decided) XOR value. Then, XOR-ing the ciphertext with the same value will return the original plaintext. An example of this would be the memfrob function, which just XORs a buffer with the binary representation of the number 42.
You also might check out other forms of ENCODINGS, such as Base64 if that's closer to what you're looking for.
!! Disclaimer - if you have data that you're actually trying to protect from anyone, don't use any of these methods. While entertaining, all of these methods are trivial to break.