How to create fixed length decryption - encryption

I would like to ask if there is a way to encrypt text (no matter how long it is) and ALWAYS get a fixed length decryption? I am not referring to hashing but to encryption/decryption.
Example:
Suppose that we want to encrypt (not hash) a text which is 60 characters long. The result will be a string which is 32 characters long. We can then decrypt the string to get the original text!
We now want to encrypt (not hash) a text which is 200 characters long. The result will be a string which is again 32 characters long. We can then decrypt the string to get the original text!
Is that somehow possible?
Thank you

As the comments indicate, this is impossible. For the underlying reason that this is impossible, see the Pigeonhole Principle. In your example, there are 256^200 inputs and 256^32 outputs. Therefore there must be at least 1 output that has more than 1 input, and therefore is impossible to reverse. Since the number of inputs is massively larger than the number of outputs (and in the general case, is unbounded), almost all cipher texts are necessarily impossible to decrypt.

Related

Is it possible to write an Enigma encryption algorithm that can use all alphanumeric as input but does not output ambiguous characters?

This is about Enigma encryption, I'm guessing the number of rotors doesn't matter but I'm using 3.
I am working with what's basically a coded version of the old mechanical enigma style encryption machines. The concept is rather old but before I get too far into learning it, I was wondering if it would be possible to be able to encrypt using all characters 0-9 a-z and A-Z but the encrypted text itself will only be a subset of these characters? I'm trying to replace a subset of characters (around 10 total) from the encrypted output, while still being able to get back to those characters if they were part of the input?
You can disambiguate by adding 1 to 2-character mapping for ambiguous symbols: O -> A1; 0 -> A2; other ambiguous symbols; A->AA. This is basically just like escaping in strings: we usually can’t put new line inside the string, so we represent it as \n. \ is represented as \\
If you’re working with encrypted data (so the probabilities of all characters are uniformly distributed and characters cannot be predicted) then you can’t compress the ciphertext. If you can compress it, then you’ve noticed some kind of pattern in the text and partially broken the encryption.
If you want to reduce the ciphertext’s alphabet, then you must increase the length of the ciphertext, otherwise you’ve successfully compressed it.

Is it possible to ensure that strings encrypted with AES128/CBC/PKCS5Padding never have trailing equals characters

Is there a deterministic way to ensure that any encrypted/encoded String created with AES128/CBC/PKCS5Padding never has '=' characters padding the end?
Given a crypto util which is a black-box:
String originalValue = "this is a test";
String encryptedValue = TheCryptoUtil.encrypt(original);
The encryptedValue will often look like:
R2gDfGwGvkqZWHH4UF81rg==
Is there a way of varying "originalValue" e.g. by padding the input with whitespaces, such that, regardless of the keys used by TheCryptoUtil, the output will not have any "=" at the end?
Yes, it is possible. I have no idea why you want to do this, but it is possible. Things to keep in mind:
The output of AES-128 with PKCS5Padding will always be some multiple of 16. That is, len(ciphertext) % 16 == 0.
The equals signs that you see on the end of the ciphertext have nothing to do with AES. They are actually base64 padding.
Base64 takes, as input, blocks of 3 bytes and converts them into blocks of 4 characters, where these output 4 characters are any of the 64 defined characters.
This means that the number of bytes of output determines whether or not the base64 of the output will have padding. For example, if you encrypt the message The quick brown fox jumps over the lazy dog., I'd say it is fairly likely (depending on your "black box encryption") that the result will not have any base64 padding.
So, the fact that base64 always produces an encoded string of length that is divisible by 4 means that we can easily determine that length of the original with or without padding. In fact, base64 padding is only part of the spec to help with concatenation issues.
I expect you'll be able to figure out the rest from here!
Luke's answer is correct to say that AES has nothing to do with the padding, it is however missing some of the information below.
Base 64 usually produces padding if the input is not dividable by 3. The padding is just there to make sure that the output only consists of blocks of 4 characters from the base 64 alphabet. Base 64 encodes 64 values or 6 bits per character. 3 bytes are 24 bits, which is dividable by 6 giving you 4 characters required.
The naive thing to do would be to adjust the AES output so that it is dividable by 3. This can however require up to 34 additional padding bytes on the input of AES/CBC/PKCS5Padding, which is stupidity at best.
Not all base 64 schemes actually use padding, so the easiest way of accomplishing this is simply to select a base 64 scheme that doesn't perform the padding at all. Java for instance has a withoutPadding() configuration method.
If the base 64 encoder cannot handle that then it is possible to simply remove the padding after it has been generated. Before decoding you can add back one or two '=' characters until you count a number of base 64 characters (excluding whitespace, see notes) that is dividable by 4.
Note that base 64 usually also uses / and + in the alphabet (there are only 2 * 26 + 10 = 62 characters if you use upper-/lowercase chars & digits). You could look up a URL-safe base 64 encoding - called base64url - if that is what you are after.
Base64 for MIME may also use spaces and end-of-line characters, by the way.

RSA on ASCII message problems with '\0'

I want to encrypt and decrypt ASCII messages using an RSA algorithm written in assembly.
I read that for security and efficiency reasons the encryption is normally not called character-wise but a number of characters is grouped and encrypted together (e.g. wikipedia says that 3 chars are grouped).
Let us assume that we want to encrypt the message "aaa" grouping 2 characters.
"aaa" is stored as 61616100.
If we group two characters and encrypt the resulting halfwords the result for the 6161 block can in fact be something like 0053. This will result in an artificial second '\0' character which corrupts the resulting message.
Is there any way to work around this problem?
Using padding or anything similar is unfortunately not an option since I am required to use the same function for encrypting and decrypting.
The output of RSA is a number. Usually this number is encoded as an octet string (or byte array). You should not treat the result as a character string. You need to treat it as a byte array with the same length as the modulus (or at least the length of the modulus in bytes).
Besides the result containing a zero (null-terminator) the characters may have any value, including non-printable characters such as control characters and 7F. If you want to treat the result as a printable string, convert to hex or base64.

What is the name for encoding/encrypting with noise padding?

I want code to render n bits with n + x bits, non-sequentially. I'd Google it but my Google-fu isn't working because I don't know the term for it.
For example, the input value in the first column (2 bits) might be encoded as any of the output values in the comma-delimited second column (4 bits) below:
0 1,2,7,9
1 3,8,12,13
2 0,4,6,11
3 5,10,14,15
My goal is to take a list of integer IDs, and transform them in a way they can still be used for persistent URLs, but that can't be iterated/enumerated sequentially, and where a client cannot determine programmatically if a URL in a search result set has been visited previously without visiting it again.
I would term this process "encoding". You'll see something similar done to permit the use of communications channels that have special symbols that are not permitted in data. Examples: uuencoding and base64 encoding.
That said, you still need to (and appear at first blush to have) ensure that there is only one correct de-code; and accept the increase in size of the output (in the case above, the output will be double the size, bit-for-bit as the input).
I think you'd be better off encrypting the number with a cheap cypher + a constant secret key stored on your server(s), adding a random character or four at the end, and a cheap checksum, and simply reject any responses that don't have a valid checksum.
<encrypt(secret)>
<integer>+<random nonsense>
</encrypt>
+
<checksum()>
<integer>+<random nonsense>
</checksum>
Then decrypt the first part (remember, cheap == fast), validate the ciphertext using the checksum, throw off the random nonsense, and use the integer you stored.
There are probably some cryptographic no-no's here, but let's face it, the cost of this algorithm being broken is a touch on the low side.

repetition in encrypted data -- red flag?

I have some base-64 encoded encrypted data and noticed a fair amount of repetition. In a (approx) 200-character-long string, a certain base-64 character is repeated up to 7 times in several separate repeated runs.
Is this a red flag that there is a problem in the encryption? According to my understanding, encrypted data should never show significant repetition, even if the plaintext is entirely uniform (i.e. even if I encrypt 2 GB of nothing but the letter A, there should be no significant repetition in the encrypted version).
According to the binomial distribution, there is about a 2.5% chance that you'd see one character from a set of 64 appear seven times in a series of 200 random characters. That's a small chance, but not negligible. With more information, you might raise your confidence from 97.5% to something very close to 100% … or find that the cipher text really is uniformly distributed.
You say that the "character is repeated up to 7 times" in several separate repeated runs. That's not enough information to say whether the cipher text has a bias. Instead, tell us the total number of times the character appeared, and the total number of cipher text characters. For example, "it appeared a total of 3125 times in 1000 runs of 200 characters each."
Also, you need to be sure that you are talking about the raw output of a cipher. Cipher text is often encapsulated in an "envelope" like that defined by the Cryptographic Message Syntax. Of course, this enclosing structure will have predictable patterns.
Well I guess it depends. Repetition in general is bad thing if it represents the same data.
Considering you are encoding it have you looked at data to see if you have something that repeats in those counts?
In order to understand better you gotta know what kind of encryption does it use.
It could be just coincidence that they are repeating.
But if repetition comes from same data, then it can be a red flag because then frequency counts can be used to decode it.
What kind of encryption are you using? Home made or some industry standard?
It depends on how are you encrypting your data.
Base64 encoding a string may count as light obfuscation, but it is NOT encryption. The purpose of Base64 encoding is to allow any sort of binary data to be encoded as a safe ASCII string.

Resources