I have some base-64 encoded encrypted data and noticed a fair amount of repetition. In an approximately 200-character string, a certain base-64 character appears in several separate runs of up to 7 repetitions each.
Is this a red flag that there is a problem in the encryption? According to my understanding, encrypted data should never show significant repetition, even if the plaintext is entirely uniform (i.e. even if I encrypt 2 GB of nothing but the letter A, there should be no significant repetition in the encrypted version).
According to the binomial distribution, there is about a 2.5% chance that a given character from a set of 64 appears exactly seven times in a series of 200 uniformly random characters. That's a small chance, but not negligible. With more information, you might raise your confidence from 97.5% to something very close to 100% … or find that the cipher text really is uniformly distributed.
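For concreteness, here is a quick way to check that figure (a minimal Python sketch; the numbers 200, 7, and 64 come straight from the question):

    from math import comb

    # P(a given base-64 character appears exactly 7 times among 200
    # uniformly random characters): binomial with p = 1/64.
    p = comb(200, 7) * (1 / 64) ** 7 * (63 / 64) ** 193
    print(round(p, 4))  # about 0.025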
You say that the "character is repeated up to 7 times" in several separate repeated runs. That's not enough information to say whether the cipher text has a bias. Instead, tell us the total number of times the character appeared, and the total number of cipher text characters. For example, "it appeared a total of 3125 times in 1000 runs of 200 characters each."
Also, you need to be sure that you are talking about the raw output of a cipher. Cipher text is often encapsulated in an "envelope" like that defined by the Cryptographic Message Syntax. Of course, this enclosing structure will have predictable patterns.
Well, I guess it depends. Repetition in general is a bad thing if it represents the same data.
Since you are encoding the data, have you looked at the raw bytes to see whether something actually repeats at those counts?
To understand this better, you need to know what kind of encryption is being used.
It could be just coincidence that the characters are repeating.
But if the repetition comes from the same data, then it can be a red flag, because frequency counts can then be used to decode it.
What kind of encryption are you using? Home-made or an industry standard?
It depends on how you are encrypting your data.
Base64 encoding a string may count as light obfuscation, but it is NOT encryption. The purpose of Base64 encoding is to allow any sort of binary data to be encoded as a safe ASCII string.
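To make that concrete, here is a minimal Python sketch showing that Base64 round-trips with no key involved at all:

    import base64

    # Base64 is trivially reversible -- no key, no secret.
    encoded = base64.b64encode(b"attack at dawn")  # b'YXR0YWNrIGF0IGRhd24='
    print(base64.b64decode(encoded))               # b'attack at dawn'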
I would like to ask if there is a way to encrypt text (no matter how long it is) and ALWAYS get a fixed-length ciphertext? I am not referring to hashing but to encryption/decryption.
Example:
Suppose that we want to encrypt (not hash) a text which is 60 characters long. The result will be a string which is 32 characters long. We can then decrypt the string to get the original text!
We now want to encrypt (not hash) a text which is 200 characters long. The result will be a string which is again 32 characters long. We can then decrypt the string to get the original text!
Is that somehow possible?
Thank you
As the comments indicate, this is impossible. For the underlying reason, see the Pigeonhole Principle. In your example, there are 256^200 inputs and only 256^32 outputs, so there must be at least one output that has more than one input and is therefore impossible to reverse. Since the number of inputs is massively larger than the number of outputs (and in the general case is unbounded), almost all cipher texts would correspond to many plaintexts and are therefore impossible to decrypt unambiguously.
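You can watch the pigeonhole principle in action at a toy scale. This Python sketch (a hypothetical stand-in for any length-reducing "encryption") squeezes every 2-byte input down to 1 byte and confirms that collisions are unavoidable:

    import hashlib
    from collections import Counter

    # 65536 possible inputs, at most 256 distinct outputs:
    # some output MUST have more than one input.
    outputs = Counter(
        hashlib.sha256(bytes([a, b])).digest()[0]
        for a in range(256) for b in range(256)
    )
    assert max(outputs.values()) > 1  # collisions guaranteed

Any output with a collision cannot be decrypted unambiguously, and the 200-byte-to-32-byte case is astronomically worse.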
I have a question about AES keys.
I have a binary file which contains an AES-256 key (32 bytes) at an unknown offset.
Would it be somehow possible to find this key in the file? Is it somehow possible to tell whether the next 32 bytes would be a valid AES key?
Thanks in advance
EDIT:
Thanks for all of your answers,
The key is stored in the file as normal bytes.
I finally managed to create a way to get it.
I basically filter out all strings, which actually made it work.
Thanks again
Well, yes and no. AES-256 keys should consist of just 32 bytes that are indistinguishable from random. Most files do not consist of just random bytes, so it could be possible to find a sequence that is most likely random, and this could be the key you are looking for. However, it might very well be that there are other random sequences in the file, or sequences that look random but aren't random at all (such as the binary representation of the number Pi).
It may also be that you are unlucky and the AES key doesn't look all that random. Or the key may be stored in hexadecimal (text) rather than as binary byte values. Then there is the issue of finding the exact offset: is that initial byte with value 0x20 a length byte indicating the size of the AES key, a space character, or part of the key value?
Most files have a specific format, so you should first have a look at that. Just looking for random sequences may give you both false positives (rather likely) and false negatives (less likely). If you expect 64 bytes of randomness (two keys), then I suggest you search for that first, as it brings down the chance of false positives by a rather large amount.
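A sliding-window entropy scan is one way to put that idea into practice. This is a rough Python sketch, not a polished tool; the 4.5 bits-per-byte threshold is an ad-hoc assumption you would tune for your file (for a 32-byte window the theoretical maximum is log2(32) = 5):

    import math
    from collections import Counter

    def entropy(window: bytes) -> float:
        """Shannon entropy of a byte window, in bits per byte."""
        n = len(window)
        return -sum(c / n * math.log2(c / n)
                    for c in Counter(window).values())

    def candidate_offsets(blob: bytes, size: int = 32, threshold: float = 4.5):
        """Yield offsets whose window looks random enough to be a key."""
        for off in range(len(blob) - size + 1):
            if entropy(blob[off:off + size]) >= threshold:
                yield off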
No - unless you have a way to verify the key against a known plaintext/ciphertext pair - an AES key is not distinguishable from random noise. Any set of 16, 24 or 32 bytes is a valid AES key.
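If you do have a known plaintext/ciphertext pair, the search becomes mechanical: try every 32-byte window as the key. Here is a hedged Python sketch using the pyca/cryptography package; the single 16-byte known block and raw ECB encryption are assumptions for illustration:

    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def find_key_offset(blob: bytes, known_pt: bytes, known_ct: bytes):
        """Try each 32-byte window of blob as an AES-256 key.
        known_pt/known_ct: one matching 16-byte block pair."""
        for off in range(len(blob) - 31):
            key = blob[off:off + 32]
            enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
            if enc.update(known_pt) == known_ct:
                return off
        return None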
I am working on encryption & decryption of data using AES-CCM.
While studying AES, I came across a term called "S-box".
What is an S-box, and what is its relationship with AES? How is it calculated? Does it depend on the symmetric key or not?
How is the cipher text generated in 128-bit AES-CCM?
S-boxes are components used in symmetric cryptographic algorithms to perform substitution and to obscure the relationship between the key and the text that you want to encipher.
You can read more in this article. Here is an excerpt:
There are different types of ciphers according to their design [68]. One of these is the Substitution–Permutation Network (SPN), which generates the ciphered text by applying substitution and permutation rounds to the original text and the symmetric key to create confusion. To do this, Substitution boxes (S-boxes) and Permutation boxes (P-boxes) must be used. The S-boxes substitute the bits of a block of input text one-to-one with output bits in each round. This output is taken as input by the P-boxes, which permute all the bits that will be used as S-box input in the next round.
As @CGG said, S-boxes are a component of a Substitution-Permutation Network. The Wikipedia entry has good diagrams which will help explain how they work.
Think of an S-box as a simple substitution cipher -- A=1, B=2, etc. In an SPN, you run input through an S-box to substitute new values, then you run that result through a P-box (permutation) to distribute the modified bits out to as many S-boxes as possible. This loop repeats to spread the changes throughout the entire cipher text.
In general, an S-box replaces the input bits with an identical number of output bits. This exchange should be 1:1 to provide invertibility (i.e. you must be able to reverse the operation in order to decrypt), should employ the avalanche effect (so changing 1 bit of input changes about half the output bits), and should depend on every bit of input.
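To make the idea tangible, here is a toy 4-bit S-box in Python. The table is illustrative only, not AES's; the real AES S-box is a fixed 8-bit table built from inversion in GF(2^8) followed by an affine map, and it does not depend on the key:

    # A toy 4-bit S-box: a 1-to-1 substitution table.
    SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
            0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
    INV_SBOX = [SBOX.index(i) for i in range(16)]

    x = 0b1011
    y = SBOX[x]                # substitute on the way in
    assert INV_SBOX[y] == x    # 1:1, so decryption can undo it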
I want code to represent n bits with n + x bits, non-sequentially. I'd Google it but my Google-fu isn't working because I don't know the term for it.
For example, the input value in the first column (2 bits) might be encoded as any of the output values in the comma-delimited second column (4 bits) below:
0 1,2,7,9
1 3,8,12,13
2 0,4,6,11
3 5,10,14,15
My goal is to take a list of integer IDs and transform them so that they can still be used for persistent URLs, but cannot be iterated/enumerated sequentially, and so that a client cannot determine programmatically whether a URL in a search result set has been visited previously without visiting it again.
I would term this process "encoding". You'll see something similar done to permit the use of communications channels that have special symbols that are not permitted in data. Examples: uuencoding and base64 encoding.
That said, you still need to ensure that there is only one correct decoding (which, at first blush, you appear to have done), and accept the increase in the size of the output (in the case above, the output will be double the size of the input, bit for bit).
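Here is a minimal Python sketch of exactly the table from the question, with a one-to-many encode and a unique decode (random choice among the four codes stands in for "non-sequential"):

    import random

    # The 2-bit -> 4-bit table from the question.
    ENCODE = {0: [1, 2, 7, 9], 1: [3, 8, 12, 13],
              2: [0, 4, 6, 11], 3: [5, 10, 14, 15]}
    DECODE = {out: inp for inp, outs in ENCODE.items() for out in outs}

    def encode(n: int) -> int:
        return random.choice(ENCODE[n])  # any of the four codes is valid

    # Every output decodes to exactly one input.
    assert all(DECODE[c] == n for n, cs in ENCODE.items() for c in cs)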
I think you'd be better off encrypting the number with a cheap cypher + a constant secret key stored on your server(s), adding a random character or four at the end, and a cheap checksum, and simply reject any responses that don't have a valid checksum.
<encrypt(secret)>
<integer>+<random nonsense>
</encrypt>
+
<checksum()>
<integer>+<random nonsense>
</checksum>
Then decrypt the first part (remember, cheap == fast), validate the ciphertext using the checksum, throw away the random nonsense, and use the integer you stored.
There are probably some cryptographic no-no's here, but let's face it, the cost of this algorithm being broken is a touch on the low side.
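For what it's worth, here is a rough Python sketch of that scheme. Everything in it is an assumption for illustration: a toy 4-round Feistel plays the "cheap cypher", a truncated HMAC-SHA256 plays the "cheap checksum", and IDs are assumed to fit in 48 bits:

    import hashlib, hmac, secrets

    SECRET = b"constant-server-side-secret"  # hypothetical key

    def _round(half: int, salt: bytes) -> int:
        mac = hmac.new(SECRET + salt, half.to_bytes(4, "big"), hashlib.sha256)
        return int.from_bytes(mac.digest()[:4], "big")

    def cheap_encrypt(n: int) -> int:
        """Toy 4-round Feistel over 64-bit values (NOT production crypto)."""
        left, right = n >> 32, n & 0xFFFFFFFF
        for i in range(4):
            left, right = right, left ^ _round(right, bytes([i]))
        return (left << 32) | right

    def cheap_decrypt(n: int) -> int:
        left, right = n >> 32, n & 0xFFFFFFFF
        for i in reversed(range(4)):
            left, right = right ^ _round(left, bytes([i])), left
        return (left << 32) | right

    def make_token(integer_id: int) -> str:  # id must fit in 48 bits
        enc = cheap_encrypt((integer_id << 16) | secrets.randbits(16))
        check = hmac.new(SECRET, enc.to_bytes(8, "big"),
                         hashlib.sha256).hexdigest()[:8]
        return f"{enc:016x}{check}"

    def read_token(token: str) -> int:
        enc, check = int(token[:16], 16), token[16:]
        good = hmac.new(SECRET, enc.to_bytes(8, "big"),
                        hashlib.sha256).hexdigest()[:8]
        if not hmac.compare_digest(check, good):
            raise ValueError("bad checksum")
        return cheap_decrypt(enc) >> 16  # strip the random nonsense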
How would you generate a very very large random number? I am thinking on the order of 2^10^9 (one billion bits). Any programming language -- I assume the solution would translate to other languages.
I would like a uniform distribution on [1,N].
My initial thoughts:
You could randomly generate each digit and concatenate. Problem: even very good pseudorandom generators are likely to develop patterns with millions of digits, right?
You could perhaps help create large random numbers by raising random numbers to random exponents. Problem: you must make the math work so that the resulting number is still random, and you should be able to compute it in a reasonable amount of time (say, an hour).
If it helps, you could try to generate a possibly non-uniform distribution on a possibly smaller range (using the real numbers, for instance) and transform. Problem: this might be equally difficult.
Any ideas?
Generate ceil(log2(N)) random bits to get a number M, where M may be up to twice as large as N. Repeat until M is in the range [1, N].
Now, to generate the random bits, you could either use a source of true randomness, which is expensive, or a cryptographically secure random number generator, for example AES with a random key encrypting a counter for subsequent blocks of bits. "Cryptographically secure" implies that there can be no noticeable patterns.
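In Python, that rejection loop is only a few lines (secrets is the standard library's CSPRNG; this is a sketch of the idea above, not a vetted implementation):

    import secrets

    def uniform_in_range(n: int) -> int:
        """Uniform integer in [1, n] by rejection sampling."""
        bits = n.bit_length()                # enough bits to cover [1, n]
        while True:
            m = secrets.randbits(bits) + 1   # m is in [1, 2**bits]
            if m <= n:
                return m                     # accepted: uniform on [1, n]

Each attempt succeeds with probability at least 1/2, so the expected number of iterations stays below 2 even for a billion-bit N.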
It depends on what you need the data for. For most purposes, a PRNG is fast and simple. But they are not perfect. For instance, I remember hearing that Monte Carlo simulations of chaotic systems are really good at revealing the underlying pattern in a PRNG.
If that is the sort of thing that you are doing, though, there is a simple trick I learned in grad school for generating lots of random data. Take a large (preferably rapidly changing) file. (Some big data structures from the running kernel are good.) Compress it to increase the entropy. Throw away the headers. Then for good measure, encrypt the result. If you're planning to use this for cryptographic purposes (and you didn't have a perfect entropy data set to work with), then reverse it and encrypt again.
The underlying theory is simple. Information theory tells us that there is no difference between a signal with no redundancy and pure random data. So if we pick a big file (i.e. lots of signal), remove redundancy with compression, and strip the headers, we have a pretty good random signal. Encryption does a really good job of removing artifacts. However, encryption algorithms tend to work forward in blocks. So if someone could, despite everything, guess what was happening at the start of the file, that data is more easily guessable. But reversing the file and encrypting again means that they would need to know the whole file, and our encryption, to find any pattern in the data.
The reason to pick a rapidly changing piece of data is that if you run out of data and want to generate more, you can go back to the same source again. Even small changes will, after that process, turn into an essentially uncorrelated random data set.
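If you want to experiment with that recipe, here is a rough Python sketch. The file path is just an example, AES-CTR (via the pyca/cryptography package) stands in for "encrypt", and none of this should be mistaken for a vetted entropy source:

    import os, zlib
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def whiten(data: bytes, key: bytes) -> bytes:
        """Encrypt with AES-CTR under a random nonce."""
        enc = Cipher(algorithms.AES(key), modes.CTR(os.urandom(16))).encryptor()
        return enc.update(data)

    raw = open("/proc/slabinfo", "rb").read()  # e.g. a changing kernel file on Linux
    squeezed = zlib.compress(raw, 9)[2:-4]     # compress; drop zlib header/trailer
    key = os.urandom(32)
    once = whiten(squeezed, key)
    twice = whiten(once[::-1], key)            # reverse, then encrypt again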
NTL: A Library for doing Number Theory
This was recommended by my Coding Theory and Cryptography teacher... so I guess it does the work right, and it's pretty easy to use.
RandomBnd, RandomBits, RandomLen -- routines for generating pseudo-random numbers
ZZ RandomLen_ZZ(long l);
// ZZ = pseudo-random number with precisely l bits,
// or 0 if l <= 0.
If you have a random number generator that generates random numbers of X bits, and the concatenated bits [X1, X2, ... Xn] create the number you want of N bits, then as long as each X is random, I don't see why the resulting large number wouldn't be random as well, for all intents and purposes. And if the standard C rand() method is not secure enough, I'm sure there are plenty of other libraries (like the ones mentioned in this thread) whose pseudo-random numbers are "more random".
even very good pseudorandom generators are likely to develop patterns with millions of digits, right?
From the wikipedia on pseudo-random number generation:
A PRNG can be started from an arbitrary starting state using a seed state. It will always produce the same sequence thereafter when initialized with that state. The maximum length of the sequence before it begins to repeat is determined by the size of the state, measured in bits. However, since the length of the maximum period potentially doubles with each bit of 'state' added, it is easy to build PRNGs with periods long enough for many practical applications.
You could perhaps help create large random numbers by raising random numbers to random exponents
I assume you're suggesting something like populating the values of a scientific notation with random values?
E.g.: 1.58901231 x 10^5819203489
The problem with this is that your distribution is going to be logarithmic (or is that exponential? :) -- either way, it isn't uniform). You will never get a value that has the millionth digit set yet also contains a nonzero digit in the ones column.
you could try to generate a possibly non-uniform distribution on a possibly smaller range (using the real numbers, for instance) and transform
Not sure I understand this. It sounds like the same thing as the exponential solution, with the same problems. If you're talking about multiplying by a constant, then you'll get a lumpy distribution instead of a logarithmic (exponential?) one.
Suggested Solution
If you just need really big pseudo-random values with a good distribution, use a PRNG algorithm with a larger state. The period of a PRNG typically grows exponentially with the number of state bits, so it doesn't take that much state to cover even a really large number.
From there, you can use your first solution:
You could randomly generate each digit and concatenate
Although I'd suggest that you use the full range of values returned by your PRNG (possibly 2^31 or 2^32), and populate a byte array with those values, splitting it up as necessary. Otherwise you might be throwing away a lot of bits of randomness. Also, scaling your values to a range (or using modulo) can easily screw up your distribution, so there's another reason to try to keep the max number of bits your PRNG can return. Be careful to pack your byte array full of the bits returned, though, or you'll again introduce lumpiness to your distribution.
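As a sketch of that "fill a byte array and keep every bit" advice in Python (os.urandom here is a stand-in for whatever generator you choose):

    import os

    def big_random(bits: int) -> int:
        """A bits-bit number built from whole random bytes."""
        nbytes = (bits + 7) // 8
        value = int.from_bytes(os.urandom(nbytes), "big")
        return value >> (nbytes * 8 - bits)  # drop surplus bits; no modulo bias

    n = big_random(10**9)  # a billion-bit number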
The problem with those solutions, though, is how to fill that (larger than normal) seed state with random-enough values. You might be able to use standard-size seeds (populated via time or GUID-style population) and fill your big-PRNG state with values from the smaller PRNG. This might work if it isn't mission-critical how well distributed your numbers are.
If you need truly cryptographically secure random values, the only real way to do it is use a natural form of randomness, such as that at http://www.random.org/. The disadvantages of natural randomness are availability, and the fact that many natural-random devices take a while to generate new entropy, so generating large amounts of data might be really slow.
You can also use a hybrid and be safe - natural-random seeds only (to avoid the slowness of generation), and PRNG for the rest of it. Re-seed periodically.