keyless ciphers of ROT13/47 ilk - encryption

Do you know of any other ciphers that perform like the ROT47 family?
My major requirement is that it be keyless.

Sounds like you might be looking for some "classical cryptography" solutions.
SUBSTITUTION CIPHERS are encodings where one character is substituted with another. E.g. A->Y, B->Q, C->P, and so on. The "Caesar Cipher" is a special case where the order is preserved, and the "key" is the offset. In the rot13/47 case, the "key" is 13 or 47, respectively, though it could be something like 3 (A->D, B->E, C->F, ...).
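Below is a minimal Go sketch of the Caesar family described above. The names caesar, rot13 and rot47 are illustrative and do not come from any library.

- The caesar function shifts ASCII letters by k.
- ROT13 is that shift with k = 13.
- ROT47 rotates the 94 printable ASCII characters '!' (33) through '~' (126) by 47.

Both rotations are half of their alphabet size, so each one undoes itself.

```go
package main

import "fmt"

// caesar shifts ASCII letters by k positions, wrapping within A-Z and a-z.
// Any other byte passes through unchanged. Negative k shifts backwards.
func caesar(s string, k int) string {
	k = ((k % 26) + 26) % 26 // normalize k into 0..25
	out := []byte(s)
	for i, c := range out {
		switch {
		case c >= 'A' && c <= 'Z':
			out[i] = 'A' + byte((int(c-'A')+k)%26)
		case c >= 'a' && c <= 'z':
			out[i] = 'a' + byte((int(c-'a')+k)%26)
		}
	}
	return string(out)
}

// rot13 is the Caesar cipher with k = 13; because 13 + 13 = 26 it is its own inverse.
func rot13(s string) string { return caesar(s, 13) }

// rot47 rotates the printable ASCII range '!'..'~' (94 characters) by 47;
// because 47 + 47 = 94 it is also its own inverse.
func rot47(s string) string {
	out := []byte(s)
	for i, c := range out {
		if c >= '!' && c <= '~' {
			out[i] = '!' + (c-'!'+47)%94
		}
	}
	return string(out)
}

func main() {
	fmt.Println(caesar("ABC", 3)) // DEF
	fmt.Println(rot13("Hello"))   // Uryyb
	fmt.Println(rot47("Hello"))   // w6==@
}
```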
TRANSPOSITION CIPHERS don't substitute letters; instead, they rearrange the letters in a predefined way. For example:
CRYPTOGRAPHY
may be written as
C Y T G A H
R P O R P Y
So the ciphered output is created by reading the two rows left to right, top row first:
CYTGAHRPORPY
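The example above is a two-row rail fence. Here is a small Go sketch of it. The function names are made up for illustration, and the code works on bytes, so it assumes ASCII text:

```go
package main

import "fmt"

// railFence2 writes the text alternately onto two rows (even indices on the
// top row, odd indices on the bottom row) and reads the rows left to right.
func railFence2(s string) string {
	var top, bottom []byte
	for i := 0; i < len(s); i++ {
		if i%2 == 0 {
			top = append(top, s[i])
		} else {
			bottom = append(bottom, s[i])
		}
	}
	return string(top) + string(bottom)
}

// unRailFence2 reverses railFence2: the first ceil(n/2) bytes are the top row,
// and the plaintext is rebuilt by interleaving the two rows again.
func unRailFence2(c string) string {
	half := (len(c) + 1) / 2
	top, bottom := c[:half], c[half:]
	out := make([]byte, 0, len(c))
	for i := 0; i < half; i++ {
		out = append(out, top[i])
		if i < len(bottom) {
			out = append(out, bottom[i])
		}
	}
	return string(out)
}

func main() {
	c := railFence2("CRYPTOGRAPHY")
	fmt.Println(c, unRailFence2(c)) // CYTGAHRPORPY CRYPTOGRAPHY
}
```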
Another property of rot13/47 is that it's REVERSIBLE:
encode(encode(plaintext)) == plaintext
If this is the property you want, you could simply XOR the message with a known (previously decided) value. XOR-ing the ciphertext with the same value then returns the original plaintext. An example of this is the memfrob function, which just XORs each byte of a buffer with the number 42.
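Here is a memfrob-style XOR as a short Go sketch. The function xorBytes is a hypothetical helper, not glibc's memfrob. It shows the self-inverse property directly:

```go
package main

import "fmt"

// xorBytes XORs every byte of buf with key, in place. With key = 42 this mirrors
// what memfrob does; applying it a second time restores the original bytes.
func xorBytes(buf []byte, key byte) {
	for i := range buf {
		buf[i] ^= key
	}
}

func main() {
	msg := []byte("Attack at dawn")
	xorBytes(msg, 42)
	fmt.Printf("%q\n", msg) // scrambled bytes
	xorBytes(msg, 42)
	fmt.Println(string(msg)) // Attack at dawn
}
```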
You also might check out other forms of ENCODINGS, such as Base64 if that's closer to what you're looking for.
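For comparison, here is Base64 in Go using the standard library's encoding/base64 package. The encode and decode wrappers are illustrative. Note that Base64 is an encoding with no secret at all:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encode turns arbitrary bytes into printable Base64 text.
func encode(s string) string {
	return base64.StdEncoding.EncodeToString([]byte(s))
}

// decode reverses encode; it returns an error for input that is not valid Base64.
func decode(s string) (string, error) {
	b, err := base64.StdEncoding.DecodeString(s)
	return string(b), err
}

func main() {
	e := encode("hello")
	d, err := decode(e)
	fmt.Println(e, d, err) // aGVsbG8= hello <nil>
}
```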
!! Disclaimer - if you have data that you're actually trying to protect from anyone, don't use any of these methods. While entertaining, all of these methods are trivial to break.

Related

Create a deterministic hash of fixed length from unordered arbitrary strings

I am trying to come up with a checksum algorithm that produces a hash of fixed length based on arbitrary strings that are unordered.
By that I mean to say, the hash of the strings ["a", "b"] should result in the same hash as ["b", "a"]. Also, ["this is a really long string", "a"] should result in the same as ["a", "this is a really long string"].
Ideally, I would like this hash to look something similar to a sha256 string, but that is less important.
The hash doesn't need to be unique, but it does need to act as a checksum, so for lots of inputs, it should be at least a bit unlikely that you'd get collisions.
As I say, this is a checksum rather than a cryptographic hash, so some degree of duplication is ok.
The pseudocode, written in Go for no other reason than it's syntactically light whilst being typed, would be:
func hash(inputs []string) string
One option would be to simply sort the inputs and create a SHA-256 from them, but that is memory intensive. Another option is to convert each input to a v5 UUID and XOR them, but ideally the algorithm would require no other hashing function.
Any thoughts would be greatly appreciated.
Choose a hash algorithm that suits your needs, apply it to each element, and then XOR all the hashes together. This gives you the same result regardless of the order of the elements.
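Here is a sketch of that approach filling in the question's `func hash(inputs []string) string` signature. It assumes SHA-256 as the per-element hash, which is just one possible choice, and uses Go's crypto/sha256 and encoding/hex packages:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hash hashes each input with SHA-256 and XORs the digests together.
// XOR is commutative and associative, so the result does not depend on input
// order, and only one 32-byte accumulator is kept in memory (no sorting).
func hash(inputs []string) string {
	var acc [sha256.Size]byte
	for _, s := range inputs {
		d := sha256.Sum256([]byte(s))
		for i := range acc {
			acc[i] ^= d[i]
		}
	}
	return hex.EncodeToString(acc[:]) // 64 hex characters, like a sha256 string
}

func main() {
	x := hash([]string{"a", "b"})
	y := hash([]string{"b", "a"})
	fmt.Println(x == y, len(x)) // true 64
}
```

One caveat of plain XOR is that identical inputs cancel. For example, ["x", "x"] hashes to the same all-zero value as an empty list. If that matters, a common alternative is to add the digests as 256-bit integers modulo 2^256 instead of XOR-ing them. Addition is still order-independent.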

S-box in AES CCM 128 bit

I am working on encryption & decryption of data using AES-CCM.
While studying AES, I came across the term S-box.
What is an S-box, and what is its relationship with AES? How can it be calculated? Does it depend on the symmetric key or not?
How is the ciphertext generated in 128-bit AES-CCM?
S-boxes are a component used in symmetric cryptographic algorithms to substitute values and obscure the relationship between the key and the ciphertext.
You can read more in this article. Here is an excerpt:
There are different types of cyphers according to their design [68]. One of these is the Substitution–Permutation Network (SPN), which generates the ciphered text by applying substitution and permutation rounds to the original text and the symmetric key to create confusion. To do this, Substitution boxes (S-boxes) and Permutation boxes (P-boxes) must be used. The S-boxes substitute one-to-one the bits of a block of the input text in the round with bits of the output text. This output is taken as the input to the P-boxes, which then permute all the bits that will be used as S-box input in the next round.
As @CGG said, S-boxes are a component of a Substitution-Permutation Network. The Wikipedia entry has good diagrams that will help explain how they work.
Think of an S-box as a simple substitution cipher: A=1, B=2, etc. In an SPN, you run input through an S-box to substitute new values, then run that result through a P-box (permutation) to distribute the modified bits to as many S-boxes as possible. This loop repeats to spread the changes throughout the entire ciphertext.
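To make the substitute-then-permute loop concrete, here is a deliberately simplified toy SPN in Go on 16-bit blocks. It is not secure, and real designs such as AES differ in many details, for example a final round key and a different last round.

- The 4-bit S-box values are an arbitrary permutation of 0..15 chosen for illustration.
- The P-box is a bit transposition, also chosen for illustration.
- The round keys in main are hypothetical example values.

```go
package main

import "fmt"

// sbox is an illustrative 4-bit S-box. Any permutation of 0..15 works;
// being a permutation is what makes the substitution invertible.
var sbox = [16]uint16{0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2}

// invSbox undoes sbox: invSbox[sbox[x]] == x.
var invSbox [16]uint16

func init() {
	for x, y := range sbox {
		invSbox[y] = uint16(x)
	}
}

// substitute passes each of the four 4-bit nibbles of a 16-bit block through box.
func substitute(block uint16, box *[16]uint16) uint16 {
	var out uint16
	for i := 0; i < 4; i++ {
		nibble := (block >> (4 * i)) & 0xF
		out |= box[nibble] << (4 * i)
	}
	return out
}

// permute is a toy P-box: bit j of nibble i moves to bit i of nibble j, so the
// four output bits of each S-box feed four different S-boxes in the next round.
// This transposition is its own inverse.
func permute(block uint16) uint16 {
	var out uint16
	for i := 0; i < 4; i++ {
		for j := 0; j < 4; j++ {
			bit := (block >> (4*i + j)) & 1
			out |= bit << (4*j + i)
		}
	}
	return out
}

// encrypt runs one round (key mixing, substitution, permutation) per round key.
func encrypt(block uint16, roundKeys []uint16) uint16 {
	for _, k := range roundKeys {
		block = permute(substitute(block^k, &sbox))
	}
	return block
}

// decrypt undoes the rounds in reverse order using the inverse S-box.
func decrypt(block uint16, roundKeys []uint16) uint16 {
	for r := len(roundKeys) - 1; r >= 0; r-- {
		block = substitute(permute(block), &invSbox) ^ roundKeys[r]
	}
	return block
}

func main() {
	keys := []uint16{0x1A2B, 0x3C4D, 0x5E6F} // hypothetical round keys
	c := encrypt(0x1234, keys)
	fmt.Printf("%04x -> %04x -> %04x\n", 0x1234, c, decrypt(c, keys))
}
```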
In an SPN, an S-box typically replaces the input bits with an identical number of output bits. This exchange should be 1:1 to provide invertibility (i.e. you must be able to reverse the operation in order to decrypt), should exhibit the avalanche effect (so changing 1 bit of input changes about half the output bits), and should depend on every bit of input.
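On the question's "How can it be calculated? Does it depend on the key?": in AES the S-box is a fixed table defined by the standard (FIPS 197), and it does not depend on the key. The key enters through the separate AddRoundKey step.

Below is a minimal Go sketch that rebuilds the table from its definition. Each byte is replaced by its multiplicative inverse in GF(2^8) (0 maps to 0), and an affine transformation is then applied. Inverses are found by brute force for clarity. Real implementations typically use a precomputed table or constant-time code instead.

```go
package main

import (
	"fmt"
	"math/bits"
)

// gmul multiplies two elements of GF(2^8) modulo the AES polynomial
// x^8 + x^4 + x^3 + x + 1 (0x11B).
func gmul(a, b byte) byte {
	var p byte
	for b != 0 {
		if b&1 != 0 {
			p ^= a
		}
		carry := a & 0x80
		a <<= 1
		if carry != 0 {
			a ^= 0x1B // reduce: 0x11B with the x^8 bit already shifted out
		}
		b >>= 1
	}
	return p
}

// inverse finds the multiplicative inverse by brute force; 0 maps to 0 by convention.
func inverse(a byte) byte {
	if a == 0 {
		return 0
	}
	for b := 1; b < 256; b++ {
		if gmul(a, byte(b)) == 1 {
			return byte(b)
		}
	}
	panic("unreachable: every non-zero element has an inverse")
}

// aesSbox builds the 256-entry AES S-box: inverse in GF(2^8), then the affine map
// s = b ^ rotl(b,1) ^ rotl(b,2) ^ rotl(b,3) ^ rotl(b,4) ^ 0x63.
func aesSbox() [256]byte {
	var box [256]byte
	for x := 0; x < 256; x++ {
		b := inverse(byte(x))
		box[x] = b ^ bits.RotateLeft8(b, 1) ^ bits.RotateLeft8(b, 2) ^
			bits.RotateLeft8(b, 3) ^ bits.RotateLeft8(b, 4) ^ 0x63
	}
	return box
}

func main() {
	box := aesSbox()
	fmt.Printf("S(00)=%02x S(01)=%02x S(53)=%02x\n", box[0x00], box[0x01], box[0x53]) // S(00)=63 S(01)=7c S(53)=ed
}
```

The resulting table is a permutation of 0..255, which is the 1:1 property described above.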

Vigenere Cipher - decryption (by hand)

This is a Vigenere ciphertext:
EORLL TQFDI HOEZF CHBQN IFGGQ MBVXM SIMGK NCCSV
WSXYD VTLQS BVBMJ YRTXO JCNXH THWOD FTDCC RMHEH
SNXVY FLSXT ICNXM GUMET HMTUR PENSU TZHMV LODGN
MINKA DTLOG HEVNI DXQUG AZGRM YDEXR TUYRM LYXNZ
ZGJ
The index of coincidence gave a key length of six (6). I know this is right because I used an online Java applet to decrypt the whole thing using the key 'QUARTZ'.
However, in this question we are only told the first letter and the last two letters of the key: 'Q' and 'TZ'.
So far I have split the ciphertext into slices using this awesome applet. So the first slice is 0, k, 2k, 3k, 4k; the second is 1, k + 1, 2k + 1, 3k + 1; et cetera.
KeyPos=0: EQEQQSCXQJJHDEYIUTSVMTVUMTYJ
KeyPos=1: OFZNMICYSYCWCHFCMUULILNGYUX
KeyPos=2: RDFIBMSDBRNOCSLNERTONOIADYN
KeyPos=3: LICFVGVVVTXDRNSXTPZDKGDZERZ
KeyPos=4: LHHGXKWTBXHFMXXMHEHGAHXGXMZ
KeyPos=5: TOBGMNSLMOTTHVTGMNMNDEQRRLG
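Here is a Go sketch of the slicing above, plus the index of coincidence used to pick the key length. The names slices and ioc are illustrative. For each candidate key length k, main prints the mean IoC of the k slices. Slices of English text shifted by a single key letter are expected to score noticeably higher than random letters (about 0.038), but no particular output is claimed here.

```go
package main

import "fmt"

// Ciphertext from the question, with spaces and line breaks removed.
const ciphertext = "EORLLTQFDIHOEZFCHBQNIFGGQMBVXMSIMGKNCCSV" +
	"WSXYDVTLQSBVBMJYRTXOJCNXHTHWODFTDCCRMHEH" +
	"SNXVYFLSXTICNXMGUMETHMTURPENSUTZHMVLODGN" +
	"MINKADTLOGHEVNIDXQUGAZGRMYDEXRTUYRMLYXNZ" +
	"ZGJ"

// slices splits text into k columns: slice i holds positions i, k+i, 2k+i, ...
func slices(text string, k int) []string {
	cols := make([][]byte, k)
	for i := 0; i < len(text); i++ {
		cols[i%k] = append(cols[i%k], text[i])
	}
	out := make([]string, k)
	for i, c := range cols {
		out[i] = string(c)
	}
	return out
}

// ioc is the index of coincidence: the probability that two letters drawn
// without replacement from s are equal. Assumes s contains only 'A'..'Z'.
func ioc(s string) float64 {
	n := len(s)
	if n < 2 {
		return 0
	}
	var counts [26]int
	for i := 0; i < n; i++ {
		counts[s[i]-'A']++
	}
	sum := 0
	for _, c := range counts {
		sum += c * (c - 1)
	}
	return float64(sum) / float64(n*(n-1))
}

func main() {
	for k := 1; k <= 10; k++ {
		total := 0.0
		for _, s := range slices(ciphertext, k) {
			total += ioc(s)
		}
		fmt.Printf("k=%2d  mean IoC=%.4f\n", k, total/float64(k))
	}
}
```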
My idea was to calculate the highest-frequency letter in each block, hoping that the most frequent letter would give me some clue as to how to find 'U,' 'A' and 'R.' However, the most frequent letters in these blocks are:
KeyPos=0: Q,4 T,3 E,3 J,3
KeyPos=1: C,4 U,3 Y,3
KeyPos=2: N,4 O,3 R,3 D,3 B,2
KeyPos=3: V,4 D,3 Z,3
KeyPos=4: H,6 X,6 M,3 G,3
KeyPos=5: M,4 T,4 N,3 G,3
Which yields QCNVHM, or QUNVHM (being generous), neither of which is that close to QUARTZ. There are online applets that can crack this with no problem, so it mustn't be too short a text to yield decent frequency counts from the blocks.
I guess I must be approaching this the wrong way. I just hoped one of you might be able to offer some clue as to where I am going wrong.
p.s. This is for a digital crypto class.
Interesting question...
I don't have a programmatic solution for cracking the original ciphertext, but I was able to solve it with a little mind power and some helpful JavaScript.
I started by using this page and the information you supplied. Provide the ciphertext, a key length of 6 and hit initialize. What's nice about the approach here is that unknowns in either the plaintext or key are left as hyphens.
Update the key with only the letters you know, Q---TZ, and click 'update plaintext'. At this point we know:
o---sua---opo---oca---nha---enc---rom---dth---ama---int---ept---our---mun---tio---ewi---eus---the---ond---loc---onf---now---hed---off---ere---nsw---esd---tmi---ght
Here's where I applied a bit of brain power. You start recognizing bits of the plaintext: the, now and off make an appearance. At the end there's ght, which made me think the prior letter is likely a vowel, for example light or thought. I replaced the corresponding hyphen with u and clicked 'update keyword' to find what letter would have produced that combination. The matching letter turns out to be F. I then updated the plaintext to see the results. They didn't look promising, so I tried i instead, which resulted in:
o--usua--ropo--loca--onha--eenc--prom--edth--eama--eint--cept--gour--mmun--atio--wewi--beus--gthe--cond--yloc--ionf--mnow--thed--poff--mere--insw--nesd--atmi--ight
Now we're getting somewhere. At the start I see something that might be usual, further in I see int--cept, and near the end w--nesd-- at mi--ight. Voilà. Filling in the letters for Wednesday and updating the keyword yielded QUARTZ.
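The two operations used in this walkthrough can be sketched in Go:

- Decrypting with a partial key, where '-' marks an unknown letter.
- Deriving the key letter that maps a guessed plaintext letter to a given ciphertext letter, k = c - p (mod 26).

This is a stand-in for what the web page does, not its actual code. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// partialDecrypt decrypts Vigenère ciphertext (letters 'A'..'Z') with a key in
// which unknown letters are written as '-'. Positions under an unknown key
// letter are printed as '-'; the rest are decrypted to lower case.
func partialDecrypt(ct, key string) string {
	var b strings.Builder
	for i := 0; i < len(ct); i++ {
		k := key[i%len(key)]
		if k == '-' {
			b.WriteByte('-')
			continue
		}
		p := (int(ct[i]-'A') - int(k-'A') + 26) % 26
		b.WriteByte(byte('a' + p))
	}
	return b.String()
}

// keyLetter returns the key letter that turns plaintext p ('a'..'z') into
// ciphertext c ('A'..'Z'), i.e. k = c - p (mod 26): the "update keyword" step.
func keyLetter(c, p byte) byte {
	return 'A' + byte((int(c-'A')-int(p-'a')+26)%26)
}

func main() {
	ct := "EORLLTQFDIHOEZFCHB" // first 18 letters of the question's ciphertext
	fmt.Println(partialDecrypt(ct, "Q---TZ")) // o---sua---opo---oc
	fmt.Println(partialDecrypt(ct, "QUARTZ")) // ourusualdropoffloc
}
```

In the question's ciphertext, the letter just before the final ZGJ (which decrypts to ght) is Z. So keyLetter('Z', 'u') gives F and keyLetter('Z', 'i') gives R, matching the F and then R found in the walkthrough.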
... So, how to port this approach to code? Not sure about the best way to do that just yet. The idea of using the known characters in the key, partially decrypting the ciphertext and brute forcing the rest is appealing. But without a dictionary handy, I'm not sure what the best brute-forcing method would be...
To be continued (maybe)...
An algorithm wouldn't just consider the most frequent letters but the frequency pattern of the whole alphabet. Technically, for each slice you compute the index of coincidence against English letter frequencies for each possible shift and keep the maximal ones.
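Here is a Go sketch of that whole-alphabet approach. It uses a correlation (dot-product) score between each slice's letter counts and English letter frequencies; chi-squared is a common alternative.

- The frequency table holds approximate, commonly cited values; any reasonable table should do.
- The names bestShift and recoverKey are hypothetical.
- It is not claimed that this recovers QUARTZ here. With only 27 or 28 letters per slice, short-text noise can still mislead it.

```go
package main

import "fmt"

// Ciphertext from the question, with spaces and line breaks removed.
const ciphertext = "EORLLTQFDIHOEZFCHBQNIFGGQMBVXMSIMGKNCCSV" +
	"WSXYDVTLQSBVBMJYRTXOJCNXHTHWODFTDCCRMHEH" +
	"SNXVYFLSXTICNXMGUMETHMTURPENSUTZHMVLODGN" +
	"MINKADTLOGHEVNIDXQUGAZGRMYDEXRTUYRMLYXNZ" +
	"ZGJ"

// english holds approximate relative frequencies (percent) of A..Z in English text.
var english = [26]float64{
	8.17, 1.49, 2.78, 4.25, 12.70, 2.23, 2.02, 6.09, 6.97, 0.15, 0.77, 4.03, 2.41,
	6.75, 7.51, 1.93, 0.10, 5.99, 6.33, 9.06, 2.76, 0.98, 2.36, 0.15, 1.97, 0.07,
}

// bestShift scores every shift 0..25 of one slice (letters 'A'..'Z') by
// correlating the letters it would decrypt to with the English table, and
// returns the shift with the highest score. It uses the whole alphabet,
// not just the single most frequent letter.
func bestShift(slice string) int {
	var counts [26]float64
	for i := 0; i < len(slice); i++ {
		counts[slice[i]-'A']++
	}
	best, bestScore := 0, -1.0
	for shift := 0; shift < 26; shift++ {
		score := 0.0
		for c := 0; c < 26; c++ {
			// ciphertext letter c decrypts to (c - shift) mod 26 under this shift
			score += counts[c] * english[(c-shift+26)%26]
		}
		if score > bestScore {
			best, bestScore = shift, score
		}
	}
	return best
}

// recoverKey splits ct into keyLen slices and picks the best key letter for each.
func recoverKey(ct string, keyLen int) string {
	key := make([]byte, keyLen)
	for pos := 0; pos < keyLen; pos++ {
		var slice []byte
		for i := pos; i < len(ct); i += keyLen {
			slice = append(slice, ct[i])
		}
		key[pos] = 'A' + byte(bestShift(string(slice)))
	}
	return string(key)
}

func main() {
	fmt.Println(recoverKey(ciphertext, 6))
}
```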

how to find the number of possibilities of a hash

If I have a hash like this: 0d47aeda9d97686ab3da96bae2c93d078a5ab253
How do I work out the number of possibilities to try if I start at 0000000000000000000000000000000000000000 and go up to 9999999999999999999999999999999999999999, which is the usual length of a SHA-1?
The number of possibilities would be 2^(X) where X is the number of bits in the hash.
In the normal hexadecimal string representation of the hash value like the one you gave, each character is 4 bits, so it would be 2^(4*len) where len is the string length of the hash value. In your example, you have a 40 character SHA1 digest, which corresponds to 160 bits, or 2^160 == 1.4615016373309029182036848327163e+48 values.
An SHA-1 hash is 160 bits, so there are 2^160 possible hashes.
Your hexadecimal digit range is 0 through f.
Then it's simply 16^40, or 16 to the power of however many characters the hash contains.
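The arithmetic can be checked exactly with Go's math/big. This shows that 16^40 and 2^160 are the same number (about 1.46e48), and that the question's decimal range 10^40 is far smaller. The possibilities helper is a hypothetical name:

```go
package main

import (
	"fmt"
	"math/big"
)

// possibilities returns base^digits as an exact big integer.
func possibilities(base, digits int64) *big.Int {
	return new(big.Int).Exp(big.NewInt(base), big.NewInt(digits), nil)
}

func main() {
	digest := "0d47aeda9d97686ab3da96bae2c93d078a5ab253"
	n := int64(len(digest))                // 40 hex characters
	hexCount := possibilities(16, n)       // 16 symbols per hex character
	bitCount := possibilities(2, 4*n)      // 4 bits per hex character
	decimalCount := possibilities(10, n)   // the question's 0-9 range
	fmt.Println(hexCount.Cmp(bitCount) == 0)     // true: 16^40 == 2^160
	fmt.Println(hexCount)                        // 2^160 written out in decimal
	fmt.Println(decimalCount.Cmp(hexCount) < 0)  // true: 10^40 < 16^40
}
```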
Recall that a hash function accepts inputs of arbitrary length. A good cryptographic hash function will seem to assign a "random" hash result to any input. So if the digest is N bits long (for SHA-1, N=160), then every input will be hashed to one of 2^N possible results, in a manner we'll treat as random.
That means that, on average, you expect to run through about 2^N inputs before finding a preimage for your hash result. They don't have to be in the specific range that you suggested; any 2^N distinct inputs are fine.
This also means that 2^N inputs don't guarantee that you'll find a preimage - each try is random, so you might miss your 1-in-2^N chance in every single one of those 2^N inputs (just like flipping a coin twice doesn't guarantee you'll get heads at least once). But you can figure out how many inputs are required to find a preimage for the hash with probability p or greater - with p being as close to one as you desire (just not actually 1).
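The last point can be quantified under the same assumption that each try is an independent, uniformly random N-bit result. The number of tries k needed to succeed with probability at least p is:

```latex
P(\text{preimage within } k \text{ tries}) = 1 - \left(1 - 2^{-N}\right)^{k} \ge p
\quad\Longleftrightarrow\quad
k \ge \frac{\ln(1-p)}{\ln\left(1 - 2^{-N}\right)} \approx 2^{N} \ln\frac{1}{1-p}
```

For p = 1/2 this gives k ≈ 0.69 · 2^N, and for p = 0.99 it gives k ≈ 4.6 · 2^N. The coin example is the case N = 1, k = 2: the success probability is 1 - (1/2)^2 = 3/4, not 1.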
The maximum number of variations, with repetition and with attention to order, is n^k. In your case this would mean 10^40, which can't be correct for SHA-1, since its digest is written in hexadecimal (n = 16), not decimal. According to Wikipedia, SHA-1 has a maximum complexity of 2^80 for a collision attack, and using different techniques researchers were reportedly already successful with a complexity of about 2^51, so 10^40 seems a bit much.

Do cryptographic hash functions reach each possible values, i.e., are they surjective?

Take a commonly used binary hash function - for example, SHA-256. As the name implies, it outputs a 256 bit value.
Let A be the set of all possible 256 bit binary values. A is extremely large, but finite.
Let B be the set of all possible binary values. B is infinite.
Let C be the set of values obtained by running SHA-256 on every member of B. Obviously this can't be done in practice, but I'm guessing we can still do mathematical analysis of it.
My Question: By necessity, C ⊆ A. But does C = A?
EDIT: As was pointed out by some answers, this is wholly dependent on the hash function in question. So, if you know the answer for any particular hash function, please say so!
First, let's point out that SHA-256 does not accept all possible binary strings as input. As defined by FIPS 180-3, SHA-256 accepts as input any sequence of bits of length lower than 2^64 bits (i.e. no more than 18446744073709551615 bits). This is very common; all hash functions are somehow limited in formal input length. One reason is that the notion of security is defined with regard to computational cost: there is a threshold on the computational power that any attacker may muster, and inputs beyond a given length would require more than that maximum computational power simply to evaluate the function. In brief, cryptographers are very wary of infinities, because infinities tend to prevent security from even being defined, let alone quantified. So your input set B should be restricted to sequences of up to 2^64-1 bits.
That being said, let's see what is known about hash function surjectivity.
Hash functions try to emulate a random oracle. This is a conceptual object that selects outputs at random, under the sole constraint that it "remembers" previous inputs and outputs: if given an already seen input, it returns the same output as before. By definition, a random oracle can be proven surjective only by trying inputs until the output space is exhausted. If the output has size n bits, the output space has 2^n elements. By the coupon collector's problem, about 2^n · n · ln 2 distinct inputs are needed on average to hit all of them, and with 2^(2n) inputs the chance that some output is still missing is negligible. For n = 256, this means that hashing about 2^512 messages (e.g. all messages of 512 bits) ought to be more than enough. SHA-256 accepts inputs very much longer than 512 bits (indeed, it accepts inputs up to 18446744073709551615 bits), so it seems highly plausible that SHA-256 is surjective.
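To get a feel for how many inputs it takes to hit every output, here is a small Go experiment. It assumes that SHA-256 truncated to a few bits behaves like a random function, which is an assumption, not a proof.

- It hashes the strings "0", "1", "2", ... and keeps only `bits` bits of each digest.
- It counts inputs until every one of the 2^bits outputs has appeared.
- It compares that count with the coupon-collector estimate 2^n · (n · ln 2 + γ), where γ ≈ 0.5772 is the Euler–Mascheroni constant.

The helper name inputsToCover is hypothetical, and no specific counts are claimed.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math"
	"strconv"
)

// inputsToCover hashes "0", "1", "2", ... with SHA-256, keeps `bits` bits
// (bits <= 16) taken from the first two digest bytes, and returns how many
// inputs were needed until all 2^bits values had appeared. It gives up and
// returns -1 after maxTries inputs.
func inputsToCover(bits uint, maxTries int) int {
	size := 1 << bits
	seen := make([]bool, size)
	remaining := size
	for i := 0; i < maxTries; i++ {
		d := sha256.Sum256([]byte(strconv.Itoa(i)))
		v := (int(d[0])<<8 | int(d[1])) & (size - 1)
		if !seen[v] {
			seen[v] = true
			remaining--
			if remaining == 0 {
				return i + 1
			}
		}
	}
	return -1
}

func main() {
	for _, b := range []uint{8, 12, 16} {
		n := float64(int(1) << b)
		estimate := n * (float64(b)*math.Ln2 + 0.5772) // coupon-collector expectation
		fmt.Printf("bits=%2d  needed=%d  estimate=%.0f\n", b, inputsToCover(b, 1<<24), estimate)
	}
}
```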
However, it has not been proven that SHA-256 is surjective, and that is to be expected. As shown above, a surjectivity proof for a random oracle requires an awful lot of computing power, substantially more than mere attacks such as preimages (2^n) and collisions (2^(n/2)). Consequently, a good hash function "should not" allow a property such as surjectivity to be actually proven; that would be very suspicious. The security of a hash function stems from the intractability of its internal structure, and that intractability should firmly resist any attempt at mathematical analysis.
As a consequence, surjectivity is not formally proven for any decent hash function, and not even for "broken" hash functions such as MD4. It is only "highly suspected" (a random oracle with inputs much longer than the output should be surjective).
Not necessarily. The pigeonhole principle says that once you have hashed more distinct inputs than A has elements, a collision is certain (probability 1), but it does not say that every single element of A has been produced.
It really depends on the hash function. If you use this valid hash function:
func Hash(input string) [32]byte {
	return [32]byte{} // every input maps to the same 256-bit value: all zeros
}
then it is obvious that C != A. So the "for example, SHA256" is a pretty important note to consider.
To answer your actual question: I believe so, but I'm just guessing. Wikipedia does not provide any meaningful info on this.
Not necessarily. That would depend on the hash function.
It would probably be ideal if the hash function were surjective, but there are things that are usually more important, such as a low likelihood of collisions.
It is not always the case. However, the qualities required of a hash algorithm are:
Cardinality of the output set (A in the question)
Distribution of hashes over that set (every possible output must have the same probability of being a hash)
