Quite often one has to encode a big (e.g. 128- or 160-bit) number in a URL. For example, many web applications use md5(random()) for UUIDs.
If you need to put that value in a URL, the common approach is to just encode it as a hexadecimal string.
But obviously hex encoding is not a very tight encoding. What other approaches are there which fit nicely in an URL?
I would use the "URL and Filename safe" Base 64 Alphabet.
Base 64 uses two character sets.
Data: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
URLs: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_
To use base 64 you need to pad your value to a multiple of 3 bytes (24 bits), then split each 24-bit group into four 6-bit values. Each 6-bit value is looked up by position in one of the strings I gave above.
If it all goes well, your final base64 value will always be a multiple of 4 characters long and decode back to a multiple of 3 (8bit) bytes long.
Many languages have built-in encode and decode functions for this, so check what yours offers first.
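As a rough sketch in Python (the input to md5 is made up, and any 16-byte value behaves the same way), the standard library's base64.urlsafe_b64encode turns a 128-bit value into 24 characters, or 22 once the trailing '=' padding is dropped, compared with 32 characters for hex:

```python
import base64
import hashlib

# Hypothetical 128-bit value; stands in for md5(random()).
digest = hashlib.md5(b"example input").digest()

hex_form = digest.hex()                                       # 32 characters
b64_form = base64.urlsafe_b64encode(digest).decode("ascii")   # 24 characters, ends in "=="
short_form = b64_form.rstrip("=")                             # 22 characters

print(len(hex_form), len(b64_form), len(short_form))
```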
You can do even better with base64-url encoding (a-z, A-Z, 0-9, - and _ [see RFC4648 Section 5]). RFC4648 covers a number of different encoding methods (base16, base32, and base64) and a couple of variants. Also, depending on the sparsity of the bits that are set in the number, you could conceivably run it through gzip and then use one of the described encoding methods. Of course, whether gzip helps really depends on how large the number you are encoding is.
If you want it tight you can use a base-36 encoding (from 0 to Z).
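A minimal sketch of such a base-36 encoding in Python, assuming the value is held as a non-negative integer (the function names are made up for illustration). A 128-bit number needs up to 25 base-36 characters, so this is tighter than hex (32 characters) but not than unpadded base 64 (22 characters):

```python
import string

ALPHABET = string.digits + string.ascii_uppercase  # "0".."9" then "A".."Z"

def to_base36(number):
    """Encode a non-negative integer as a base-36 string."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))

def from_base36(text):
    # int() understands bases up to 36 and ignores letter case
    return int(text, 36)

value = 2**128 - 1  # largest 128-bit number
encoded = to_base36(value)
print(encoded, len(encoded))
```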
Using the hint of base36 I currently use something like this (in Python):
>>> import base64, uuid
>>> base64.b32encode(uuid.uuid1().bytes).rstrip(b'=').decode('ascii')
'MTB2ONDSL3YWJN3CA6XIG7O4HM'
Just use hex. Even if you were to get 8 bits per character you're still using a 16-20 character random sequence, which nobody will want to type or say. If you can't put up a short identifier, work on your search capabilities.
This is about Enigma encryption, I'm guessing the number of rotors doesn't matter but I'm using 3.
I am working with what's basically a coded version of the old mechanical enigma style encryption machines. The concept is rather old but before I get too far into learning it, I was wondering if it would be possible to be able to encrypt using all characters 0-9 a-z and A-Z but the encrypted text itself will only be a subset of these characters? I'm trying to replace a subset of characters (around 10 total) from the encrypted output, while still being able to get back to those characters if they were part of the input?
You can disambiguate by adding a 1-to-2-character mapping for the ambiguous symbols: O -> A1; 0 -> A2; likewise for the other ambiguous symbols; and A -> AA, so that the escape symbol itself can still be represented. This is basically just like escaping in strings: we usually can’t put a newline inside a string literal, so we represent it as \n, and \ itself is represented as \\.
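The escaping idea can be sketched in Python as follows. The table is only an illustration using the three mappings named above; a real one would list every symbol to be avoided:

```python
# Hypothetical escape table: "A" is the escape symbol, and "O" and "0"
# are the symbols that must never appear in the output.
ESCAPES = {"O": "A1", "0": "A2", "A": "AA"}
UNESCAPES = {pair: symbol for symbol, pair in ESCAPES.items()}

def escape(text):
    # Replace each listed symbol by its two-character code
    return "".join(ESCAPES.get(ch, ch) for ch in text)

def unescape(text):
    out = []
    i = 0
    while i < len(text):
        if text[i] == "A":
            # "A" always starts a two-character code
            out.append(UNESCAPES[text[i:i + 2]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

sample = "FOO0BAR"
escaped = escape(sample)
print(escaped, unescape(escaped))
```

The output is longer than the input whenever an escaped symbol occurs, which is the price of the smaller alphabet.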
If you’re working with encrypted data (so the probabilities of all characters are uniformly distributed and characters cannot be predicted) then you can’t compress the ciphertext. If you can compress it, then you’ve noticed some kind of pattern in the text and partially broken the encryption.
If you want to reduce the ciphertext’s alphabet, then you must increase the length of the ciphertext, otherwise you’ve successfully compressed it.
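To put rough numbers on that, here is a small Python sketch that computes the minimum number of symbols needed to carry a given number of random bits. The figures are assumptions for illustration: 128 bits of ciphertext, and an alphabet shrunk from 62 symbols (0-9, a-z, A-Z) to 52 by dropping about 10 of them:

```python
def min_length(bits, alphabet_size):
    """Smallest n such that alphabet_size**n >= 2**bits."""
    n = 0
    capacity = 1
    while capacity < 2**bits:
        capacity *= alphabet_size
        n += 1
    return n

for size in (62, 52, 16):
    print(size, min_length(128, size))
```

With these numbers the full 62-symbol alphabet needs 22 symbols, the reduced 52-symbol alphabet needs 23, and hex needs 32.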
Is there a deterministic way to ensure that any encrypted/encoded String created with AES128/CBC/PKCS5Padding never has '=' characters padding the end?
Given a crypto util which is a black-box:
String originalValue = "this is a test";
String encryptedValue = TheCryptoUtil.encrypt(originalValue);
The encryptedValue will often look like:
R2gDfGwGvkqZWHH4UF81rg==
Is there a way of varying "originalValue" e.g. by padding the input with whitespaces, such that, regardless of the keys used by TheCryptoUtil, the output will not have any "=" at the end?
Yes, it is possible. I have no idea why you want to do this, but it is possible. Things to keep in mind:
The output length of AES-128 with PKCS5Padding will always be some multiple of 16 bytes. That is, len(ciphertext) % 16 == 0.
The equals signs that you see on the end of the ciphertext have nothing to do with AES. They are actually base64 padding.
Base64 takes, as input, blocks of 3 bytes and converts them into blocks of 4 characters, where these output 4 characters are any of the 64 defined characters.
This means that the number of bytes of output determines whether or not the base64 of the output will have padding. For example, if you encrypt the message The quick brown fox jumps over the lazy dog., I'd say it is fairly likely (depending on your "black box encryption") that the result will not have any base64 padding.
So, the fact that base64 always produces an encoded string of length that is divisible by 4 means that we can easily determine that length of the original with or without padding. In fact, base64 padding is only part of the spec to help with concatenation issues.
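The points above can be checked with a few lines of Python. This is only a sketch: it assumes the black box returns nothing but the raw ciphertext (no IV or header prepended), and it uses zero bytes in place of real ciphertext, since only the length matters for the padding:

```python
import base64

def padding_chars(ciphertext_length):
    """Number of '=' characters base64 appends for this many bytes."""
    encoded = base64.b64encode(bytes(ciphertext_length)).decode("ascii")
    return len(encoded) - len(encoded.rstrip("="))

# 1 to 6 AES blocks of 16 bytes each
for blocks in range(1, 7):
    print(16 * blocks, padding_chars(16 * blocks))
```

Only lengths that are multiples of 48 bytes (3 blocks, 6 blocks, ...) come out without any '='.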
I expect you'll be able to figure out the rest from here!
Luke's answer is correct to say that AES has nothing to do with the padding; it is, however, missing some of the information below.
Base 64 usually produces padding if the input length is not divisible by 3. The padding is just there to make sure that the output only consists of blocks of 4 characters from the base 64 alphabet. Base 64 encodes 64 values or 6 bits per character. 3 bytes are 24 bits, which is divisible by 6, giving you the 4 characters required.
The naive thing to do would be to adjust the AES output so that its length is divisible by 3. This can however require up to 34 additional padding bytes on the input of AES/CBC/PKCS5Padding, which is stupidity at best.
Not all base 64 schemes actually use padding, so the easiest way of accomplishing this is simply to select a base 64 scheme that doesn't perform the padding at all. Java for instance has a withoutPadding() configuration method.
If the base 64 encoder cannot handle that, then it is possible to simply remove the padding after it has been generated. Before decoding you can add back one or two '=' characters until the number of base 64 characters (excluding whitespace, see notes) is divisible by 4.
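A minimal Python sketch of removing the padding and restoring it before decoding, using the sample value from the question (the helper names are made up for illustration):

```python
import base64

def strip_padding(encoded):
    # Drop the trailing '=' characters produced by the encoder
    return encoded.rstrip("=")

def restore_padding(stripped):
    # Add '=' back until the length is a multiple of 4
    return stripped + "=" * (-len(stripped) % 4)

original = "R2gDfGwGvkqZWHH4UF81rg=="
stripped = strip_padding(original)
restored = restore_padding(stripped)
raw = base64.b64decode(restored)

print(stripped, restored, len(raw))
```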
Note that base 64 usually also uses / and + in the alphabet (there are only 2 * 26 + 10 = 62 characters if you use upper-/lowercase chars & digits). You could look up a URL-safe base 64 encoding - called base64url - if that is what you are after.
Base64 for MIME may also use spaces and end-of-line characters, by the way.
I searched a lot to find an encryption algorithm whose encrypted results do not include the slash character. Everything I've tested so far (like this, this and this) generates strings which include the slash character, and therefore they make asp.net (web forms) routing misinterpret the route.
Can you please help by suggesting a symmetric encryption algorithm which generates encrypted strings that can safely be used for encrypting query strings without confusing asp.net routing?
Encryption algorithms generally produce random (looking) bytes. These bytes can have any value. You can encode this value, for instance using hexadecimals or base 64. With hexadecimals you already have an encoding that only contains 0..9 and a..f (in upper or lower case). However, hexadecimal encoding is not very efficient, doubling the size of the ciphertext.
Base 64 uses 64 characters: A..Z, a..z, 0..9, + and /, and sometimes a padding character =. It is however very easy to replace the URL unsafe + and / characters with other ones, e.g. - and _ according to RFC 4648. You can also remove any = characters at the end, although you may have to put them back (until you get a multiple of 4 base 64 characters) depending on the base 64 decoding routine. Base 64 uses 4 characters for 3 bytes, so it expands the ciphertext by 33%.
This is something I have been thinking while reading programming books and in computer science class at school where we learned how to convert decimal values into hexadecimal.
Can someone please tell me what the advantages of using hexadecimal values are and why we use them in programming?
Thank you.
In many cases (like e.g. bit masks) you need to use binary, but binary is hard to read because of its length. Since hexadecimal values can be much easier translated to/from binary than decimals, you could look at hex values as kind of shorthand notation for binary values.
It certainly depends on what you're doing.
It comes as an extension of base 2, which you probably are familiar with as essential to computing.
Check this out for a good discussion of several applications...
https://softwareengineering.stackexchange.com/questions/170440/why-use-other-number-bases-when-programming/
The hexadecimal digit corresponds 1:1 to a given pattern of 4 bits. With experience, you can map them from memory. E.g. 0x8 = 1000, 0xF = 1111, correspondingly, 0x8F = 10001111.
This is a convenient shorthand where the bit patterns matter, e.g. in bit maps or when working with I/O ports. By comparison, visualizing the bit pattern for decimal 169 is more difficult.
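For instance, Python's format specifiers show the digit-to-bits correspondence directly (the helper names are made up for illustration):

```python
def to_hex(value):
    # Two uppercase hex digits per byte
    return format(value, "02X")

def to_bits(value):
    # Eight binary digits per byte
    return format(value, "08b")

for value in (0x8, 0xF, 0x8F, 169):
    print(value, to_hex(value), to_bits(value))
```

Decimal 169 turns out to be 0xA9, and once it is written in hex the bit pattern can be read off digit by digit: A is 1010 and 9 is 1001.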
A byte consists of 8 binary digits and is the smallest piece of data that computers normally work with. All other variables a computer works with are constructed from bytes. For example, a single character can be stored in a single byte, and a 32-bit integer consists of 4 bytes.
As bytes are so fundamental we want a way to write down their value as neatly and efficiently as possible. One option would be to use binary, but then we would need a lot of digits. This takes up a lot of space and can be confusing when many numbers are written in sequence:
200 201 202 == 11001000 11001001 11001010
Using hexadecimal notation, we can write every byte using just two digits:
200 == C8
Also, as 16 is a power of 2, it is easy to convert between hexadecimal and binary representations in your head. This is useful as sometimes we are only interested in a single bit within the byte. As a simple example, if the first digit of a hexadecimal representation is 0 we know that the first four binary digits are 0.
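As a small illustration in Python, using the byte 200 (C8) from above, each hex digit can be pulled out with a shift or a mask, and a single bit can be tested the same way:

```python
value = 0xC8  # 200 decimal, 1100 1000 in binary

high_nibble = value >> 4         # first hex digit: 0xC (binary 1100)
low_nibble = value & 0x0F        # second hex digit: 0x8 (binary 1000)
bit3_set = bool(value & 0x08)    # is the bit worth 8 set?
bit0_set = bool(value & 0x01)    # is the lowest bit set?

print(hex(high_nibble), hex(low_nibble), bit3_set, bit0_set)
```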
The password string looks something like this:
MTY5LTYtNjEtMjAxLTkwLTE3MS05My0yMDAtMTMxLTE5Mi01My0xNjItMC0yMjAtMTgxLTIyNg==
I tried a base 64 decoder and it gives me:
169-6-61-201-90-171-93-200-131-192-53-162-0-220-181-226
It looks like it is encoded as ASCII codes.
Looking the numbers up in an ASCII code list gives me:
©=ÉZ«]ȃÀ5¢Üµâ
But this is not the password that I am looking for.
Does anyone know the solution?
I am not an expert, so sorry for the bad explanation.
The string contains 16 numbergroups and each number is between 0 and 255. So it looks like 16 bytes. And 16 bytes / 128 bits is the size of an md5 hash. So that would be my guess.
While a crypto hash function can't be easily reversed, there are online rainbowtable services which can revert them for short or common inputs. But if the programmer who wrote it did it right (used a salt and many iterations) they won't help.
I'd split it into 16 numbers, then convert these to a byte array of size 16, and then hex-encode them, since that's the form most programs will accept. Edit: See Kenny's comment
And then search for some website which allows search in rainbow tables. And pray...
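The decode, split and hex-encode steps can be sketched in Python like this. It only reformats the string from the question; whether the result really is an md5 hash remains a guess:

```python
import base64

encoded = "MTY5LTYtNjEtMjAxLTkwLTE3MS05My0yMDAtMTMxLTE5Mi01My0xNjItMC0yMjAtMTgxLTIyNg=="

# Base64-decode to the dash-separated list of decimal numbers
decoded = base64.b64decode(encoded).decode("ascii")

# Each number is one byte value between 0 and 255
numbers = [int(part) for part in decoded.split("-")]
raw = bytes(numbers)

# Hex form, as most hash lookup tools expect
hex_form = raw.hex()
print(hex_form)
```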