ASP.net query encryption method that doesn't produce slash character - asp.net

I searched a lot for an encryption algorithm whose encrypted output does not include the slash character. Everything I've tested so far (like this, this and this) generates strings that include a slash, which makes ASP.NET (Web Forms) routing misinterpret the route.
Can you please suggest a symmetric encryption algorithm that generates encrypted strings which can safely be used in query strings without confusing ASP.NET routing?

Encryption algorithms generally produce random-looking bytes. These bytes can have any value. You can encode them, for instance using hexadecimals or base 64. With hexadecimals you already have a code that only contains 0..9 and a..f (in upper or lower case). However, hexadecimal encoding is not very efficient, as it doubles the size of the ciphertext.
Base 64 uses 64 characters: A..Z, a..z, 0..9, + and /, and sometimes a padding character =. It is however very easy to replace the URL unsafe + and / characters with other ones, e.g. - and _ according to RFC 4648. You can also remove any = characters at the end, although you may have to put them back (until you get a multiple of 4 base 64 characters) depending on the base 64 decoding routine. Base 64 uses 4 characters for 3 bytes, so it expands the ciphertext by 33%.
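A minimal sketch of that approach in Python, assuming the ciphertext is already available as bytes. The function names encode_token and decode_token are made up for illustration; urlsafe_b64encode and urlsafe_b64decode are the standard-library routines that use - and _ instead of + and /.

```python
import base64

def encode_token(ciphertext: bytes) -> str:
    """Encode bytes as base64url and drop the trailing '=' padding."""
    return base64.urlsafe_b64encode(ciphertext).rstrip(b"=").decode("ascii")

def decode_token(token: str) -> bytes:
    """Restore the padding to a multiple of 4 characters, then decode."""
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded)

# These bytes would produce '+' and '/' in standard base 64.
token = encode_token(bytes([0xFB, 0xFF, 0xFE, 0x3F]))
print(token, decode_token(token))
```

The resulting token contains only letters, digits, - and _, so it should not be mistaken for a route separator.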

Related

Is it possible to write an Enigma encryption algorithm that can use all alphanumeric as input but does not output ambiguous characters?

This is about Enigma encryption, I'm guessing the number of rotors doesn't matter but I'm using 3.
I am working with what's basically a coded version of the old mechanical enigma style encryption machines. The concept is rather old but before I get too far into learning it, I was wondering if it would be possible to be able to encrypt using all characters 0-9 a-z and A-Z but the encrypted text itself will only be a subset of these characters? I'm trying to replace a subset of characters (around 10 total) from the encrypted output, while still being able to get back to those characters if they were part of the input?
You can disambiguate by adding a 1-to-2-character mapping for the ambiguous symbols: O -> A1; 0 -> A2; and so on for the other ambiguous symbols; and finally A -> AA, because A itself is now the escape character. This is basically just like escaping in strings: we usually can’t put a new line inside a string, so we represent it as \n, and \ itself is represented as \\.
If you’re working with encrypted data (so the probabilities of all characters are uniformly distributed and characters cannot be predicted) then you can’t compress the ciphertext. If you can compress it, then you’ve noticed some kind of pattern in the text and partially broken the encryption.
If you want to reduce the ciphertext’s alphabet, then you must increase the length of the ciphertext, otherwise you’ve successfully compressed it.
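The escaping idea can be sketched as follows in Python. The table is hypothetical and covers only O and 0 as ambiguous symbols, with A as the escape character; a real table would list every symbol you want to avoid. It is applied to the text after encryption and reversed before decryption.

```python
# Hypothetical escape table: "A" is the escape character, so it is escaped too.
ESCAPES = {"A": "AA", "O": "A1", "0": "A2"}
UNESCAPES = {pair[1]: symbol for symbol, pair in ESCAPES.items()}

def escape(text: str) -> str:
    """Replace each ambiguous symbol by its two-character escape sequence."""
    return "".join(ESCAPES.get(ch, ch) for ch in text)

def unescape(text: str) -> str:
    """Reverse escape(): an 'A' always announces a two-character sequence."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "A":
            out.append(UNESCAPES[next(chars)])
        else:
            out.append(ch)
    return "".join(out)

escaped = escape("FOO0A")
print(escaped, unescape(escaped))
```

As the answer says, the output gets longer: every escaped symbol costs one extra character.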

Is it possible to ensure that strings encrypted with AES128/CBC/PKCS5Padding never have trailing equals characters

Is there a deterministic way to ensure that any encrypted/encoded String created with AES128/CBC/PKCS5Padding never has '=' characters padding the end?
Given a crypto util which is a black-box:
String originalValue = "this is a test";
String encryptedValue = TheCryptoUtil.encrypt(originalValue);
The encryptedValue will often look like:
R2gDfGwGvkqZWHH4UF81rg==
Is there a way of varying "originalValue" e.g. by padding the input with whitespaces, such that, regardless of the keys used by TheCryptoUtil, the output will not have any "=" at the end?
Yes, it is possible. I have no idea why you want to do this, but it is possible. Things to keep in mind:
The length of the AES-128 output with PKCS5Padding will always be a multiple of 16 bytes. That is, len(ciphertext) % 16 == 0.
The equals signs that you see on the end of the ciphertext have nothing to do with AES. They are actually base64 padding.
Base64 takes, as input, blocks of 3 bytes and converts them into blocks of 4 characters, where these output 4 characters are any of the 64 defined characters.
This means that the number of bytes of output determines whether or not the base64 of the output will have padding. For example, if you encrypt the message The quick brown fox jumps over the lazy dog., I'd say it is fairly likely (depending on your "black box encryption") that the result will not have any base64 padding.
So, the fact that base64 always produces an encoded string whose length is divisible by 4 means that we can easily determine the length of the original, with or without padding. In fact, base64 padding is only part of the spec to help with concatenation issues.
I expect you'll be able to figure out the rest from here!
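To make the relation between ciphertext length and base64 padding concrete, here is a small Python sketch. It uses zero bytes as a stand-in for ciphertext (only the length matters) and assumes the black box returns just the ciphertext, without a prepended IV. The helper name padding_chars is made up for illustration.

```python
import base64

def padding_chars(n_bytes: int) -> int:
    """Count the '=' characters base64 appends for an input of n_bytes bytes."""
    encoded = base64.b64encode(bytes(n_bytes))
    return len(encoded) - len(encoded.rstrip(b"="))

# AES/CBC/PKCS5Padding output lengths are multiples of 16 bytes.
for n in (16, 32, 48):
    print(n, "bytes ->", padding_chars(n), "padding characters")
# 16 bytes -> 2, 32 bytes -> 1, 48 bytes -> 0
```

Under that assumption, the 44-character fox sentence is padded to 48 bytes before encryption, and 48 is a multiple of 3, which is why its base64 form carries no padding.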
Luke's answer is correct to say that AES has nothing to do with the padding, it is however missing some of the information below.
Base 64 usually produces padding if the input length is not divisible by 3. The padding is just there to make sure that the output only consists of blocks of 4 characters from the base 64 alphabet. Base 64 encodes 64 values or 6 bits per character. 3 bytes are 24 bits, which is divisible by 6, giving you the 4 characters required.
The naive thing to do would be to adjust the AES output so that its length is divisible by 3. This can however require up to 34 additional padding bytes on the input of AES/CBC/PKCS5Padding, which is stupidity at best.
Not all base 64 schemes actually use padding, so the easiest way of accomplishing this is simply to select a base 64 scheme that doesn't perform the padding at all. Java for instance has a withoutPadding() configuration method.
If the base 64 encoder cannot handle that, then it is possible to simply remove the padding after it has been generated. Before decoding you can add back one or two '=' characters until the number of base 64 characters (excluding whitespace, see notes) is divisible by 4.
Note that base 64 usually also uses / and + in the alphabet (there are only 2 * 26 + 10 = 62 characters if you use upper-/lowercase chars & digits). You could look up a URL-safe base 64 encoding - called base64url - if that is what you are after.
Base64 for MIME may also use spaces and end-of-line characters, by the way.

RSA on ASCII message problems with '\0'

I want to encrypt and decrypt ASCII messages using an RSA algorithm written in assembly.
I read that for security and efficiency reasons the encryption is normally not called character-wise but a number of characters is grouped and encrypted together (e.g. wikipedia says that 3 chars are grouped).
Let us assume that we want to encrypt the message "aaa" grouping 2 characters.
"aaa" is stored as 61616100.
If we group two characters and encrypt the resulting halfwords the result for the 6161 block can in fact be something like 0053. This will result in an artificial second '\0' character which corrupts the resulting message.
Is there any way to work around this problem?
Using padding or anything similar is unfortunately not an option since I am required to use the same function for encrypting and decrypting.
The output of RSA is a number. Usually this number is encoded as an octet string (or byte array). You should not treat the result as a character string. You need to treat it as a byte array with the same length as the modulus (or at least the length of the modulus in bytes).
Besides the result containing a zero (null-terminator) the characters may have any value, including non-printable characters such as control characters and 7F. If you want to treat the result as a printable string, convert to hex or base64.
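A rough Python sketch of treating the RSA result as a fixed-length byte array and printing it as hex. The key is a toy one (p = 61, q = 53), chosen only so the numbers stay readable, and the function names are made up for illustration. With such a small modulus only one character fits per block.

```python
# Toy key, far too small for real use: n = 61 * 53, e = 17, d = 2753.
N, E, D = 3233, 17, 2753
BLOCK_BYTES = (N.bit_length() + 7) // 8  # length of the modulus in bytes

def encrypt_block(m: int) -> bytes:
    """Textbook RSA on one number, returned as a fixed-length byte array."""
    return pow(m, E, N).to_bytes(BLOCK_BYTES, "big")

def decrypt_block(block: bytes) -> int:
    """Read the byte array back as a number and apply the private exponent."""
    return pow(int.from_bytes(block, "big"), D, N)

block = encrypt_block(ord("a"))
print(block.hex())                # printable, even if a byte happens to be 00
print(chr(decrypt_block(block)))  # a
```

Because the length of every block is known in advance, a zero byte inside it is just data and never acts as a terminator.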

should I use utf-8 or utf-16 or utf-32 for my multilingual cms?

Besides the difference in how characters are stored, are there any special characters in any language utf-32 can display and utf-8 cannot?
All UTF encodings can represent the same range of code points (0 to 0x10FFFF). So, the same characters can be encoded by any of them.
Whether they can be "displayed" is an entirely different question. That's nothing to do with the encoding, and a function of the font family used. I am not sure that any font has glyphs for every single Unicode code point. But I assume you meant "represented".
They do vary in how many bytes they'll need to represent a given string. UTF-8 is almost always the shortest for non-Asian languages. For those, UTF-16 might win (I haven't really "benchmarked".) I can't imagine a realistic case where UTF-32 would be optimal.
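If you do want a quick measurement, a sketch like this in Python counts the bytes per encoding. The sample strings are arbitrary, and the -le codec variants are used so that no byte-order mark is included in the count.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes the text needs in each UTF encoding."""
    return {enc: len(text.encode(enc))
            for enc in ("utf-8", "utf-16-le", "utf-32-le")}

print(encoded_sizes("abc"))     # {'utf-8': 3, 'utf-16-le': 6, 'utf-32-le': 12}
print(encoded_sizes("日本語"))  # {'utf-8': 9, 'utf-16-le': 6, 'utf-32-le': 12}
```

Running it over a representative sample of your own content gives a better answer than any rule of thumb, since markup and ASCII punctuation also count.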
Is there any character one of them can't represent?
In theory: No.
All of those formats can represent all Unicode code points.
In practice: Depends.
The Windows API uses UCS-2 (which is pretty much the first UTF-16 chunk) and doesn't always handle surrogates correctly. So you might want to use UTF-16 to have your program act as "normal" as possible compared to other programs, instead of truncating high-ranging UTF-32 code points manually.
Anything else?
Yes: Use UTF-8!
It's endian-less, so it avoids byte-order issues, which are a pain in the rear.
Of course, if you're on Windows then you need to convert to UTF-16 before using them.
UTF-8, UTF-16 and UTF-32 all can be used to represent all Unicode datapoints. So no, there are no special characters that can be represented in UTF-32 and not in UTF-8.
1) UTF-8 is backward compatible with ASCII for regular English characters, which is an advantage when your clients only use English characters.
2) UTF-8 saves network bandwidth if you have more ASCII characters than non-English characters.
3) UTF-16 can save storage space if most of your characters are non-English.
I suggest using UTF-8, based on #1 above.

How to encode a large number (in an URL)?

Quite often one has to encode a big (e.g. 128 or 160 bits) number in a URL. For example, many web applications use md5(random()) for UUIDs.
If you need to put that value in a URL, the common approach is to just encode it as a hexadecimal string.
But obviously hex encoding is not a very tight encoding. What other approaches are there which fit nicely in an URL?
I would use The "URL and Filename safe" Base 64 Alphabet.
Base 64 uses two character sets.
Data: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
URLs: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_
To use base 64 you need to pad your value to a multiple of 3 bytes (24 bits), then split each group of 24 bits into four 6-bit values. Each 6-bit value is looked up by position in the string I gave above.
If it all goes well, your final base64 value will always be a multiple of 4 characters long and decode back to a multiple of 3 (8bit) bytes long.
Depending on the language you are using, a lot of them have built in encode and decode functions.
You can do even better with base64-url encoding (a-z, A-Z, 0-9, - and _ [see RFC4648 Section 5]). RFC4648 covers a number of different encoding methods (base16, base32, and base64) and a couple of variants. Also, depending on the sparsity of the bits that are set in the number, you could conceivably run it through gzip and then use one of the described encoding methods. Of course, whether gzip is worthwhile really depends on how large the number you are encoding is.
If you want it tight you can use a base-36 encoding (from 0 to Z).
Using the hint of base36 I currently use something like this (in Python, using base32):
>>> import base64, uuid
>>> base64.b32encode(uuid.uuid1().bytes).rstrip(b'=').decode('ascii')
'MTB2ONDSL3YWJN3CA6XIG7O4HM'
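To compare how tight the encodings mentioned in these answers are, here is a short Python sketch for a 128-bit value. The to_base36 helper is written for illustration (Python has no built-in for it, although int(text, 36) decodes it); the other calls are from the standard library.

```python
import base64
import secrets
import string

BASE36_DIGITS = string.digits + string.ascii_uppercase  # 0-9 then A-Z

def to_base36(n: int) -> str:
    """Encode a non-negative integer using the digits 0-9 and A-Z."""
    if n == 0:
        return "0"
    digits = []
    while n:
        n, remainder = divmod(n, 36)
        digits.append(BASE36_DIGITS[remainder])
    return "".join(reversed(digits))

value = secrets.randbits(128)   # stand-in for a random 128-bit identifier
raw = value.to_bytes(16, "big")
print("hex:      ", len(raw.hex()))
print("base32:   ", len(base64.b32encode(raw).rstrip(b"=")))
print("base36:   ", len(to_base36(value)))
print("base64url:", len(base64.urlsafe_b64encode(raw).rstrip(b"=")))
```

For 16 bytes this gives 32 characters in hex, 26 in base32, at most 25 in base36 and 22 in base64url, so the gain over hex is real but modest.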
Just use hex. Even if you were to get 8 bits per character you're still using a 16-20 character random sequence, which nobody will want to type or say. If you can't put up a short identifier, work on your search capabilities.
