How should I encode two integers into lowercase alphanumeric characters? - math

I have two integers from 0 to infinity (in practice probably less than 1 million, but don't want to have any limitation). I want to encode the two integers into a lowercase alphanumeric string (which may contain a dash, but shouldn't be just numbers). Also I want the strings to be somewhat random (i.e. don't want to always prefix every int with "a" for example). The most important requirement is that I need to be able to easily decode this alphanumeric string.
I would normally just use md5 hashing but it doesn't work for this case as I can't go back from md5 to the original integers. I also considered Base64 but it doesn't work because strings may include uppercase.
Is there a known hashing algorithm that satisfies these requirements?

If you're just looking to change the integer's base
Instead of base64 you can use base 16 (aka hexadecimal):
>>> hex(1234)[2:]
'4d2'
>>> int('4d2', 16)
1234
or base32:
>>> b32_encode(1234)
b'atja===='
>>> b32_decode(b'atja====')
1234
If you're looking to obscure the integer
The simplest method is to multiple the integer by some number and then xor with some greater, randomized key:
>>> key = 0xFa907fA06 # The result of punching my keyboard.
>>> number = 15485863
>>> obscured = (1234 * number) ^ key
50902290680
>>> hex(obscured)
'bda0350f8'
>>> (50902290680 ^ key) / number
1234
Wanting more robust obfuscation than that requires a tad bit more research, in which case this similar question may be a good place to start.

Related

Is it possible to write an Enigma encryption algorithm that can use all alphanumeric as input but does not output ambiguous characters?

This is about Enigma encryption, I'm guessing the number of rotors doesn't matter but I'm using 3.
I am working with what's basically a coded version of the old mechanical enigma style encryption machines. The concept is rather old but before I get too far into learning it, I was wondering if it would be possible to be able to encrypt using all characters 0-9 a-z and A-Z but the encrypted text itself will only be a subset of these characters? I'm trying to replace a subset of characters (around 10 total) from the encrypted output, while still being able to get back to those characters if they were part of the input?
You can disambiguate by adding 1 to 2-character mapping for ambiguous symbols: O -> A1; 0 -> A2; other ambiguous symbols; A->AA. This is basically just like escaping in strings: we usually can’t put new line inside the string, so we represent it as \n. \ is represented as \\
If you’re working with encrypted data (so the probabilities of all characters are uniformly distributed and characters cannot be predicted) then you can’t compress the ciphertext. If you can compress it, then you’ve noticed some kind of pattern in the text and partially broken the encryption.
If you want to reduce the ciphertext’s alphabet, then you must increase the length of the ciphertext, otherwise you’ve successfully compressed it.

RSA on ASCII message problems with '\0'

I want to encrypt and decrypt ASCII messages using an RSA algorithm written in assembly.
I read that for security and efficiency reasons the encryption is normally not called character-wise but a number of characters is grouped and encrypted together (e.g. wikipedia says that 3 chars are grouped).
Let us assume that we want to encrypt the message "aaa" grouping 2 characters.
"aaa" is stored as 61616100.
If we group two characters and encrypt the resulting halfwords the result for the 6161 block can in fact be something like 0053. This will result in an artificial second '\0' character which corrupts the resulting message.
Is there any way to work around this problem?
Using padding or anything similar is unfortunately not an option since I am required to use the same function for encrypting and decrypting.
The output of RSA is a number. Usually this number is encoded as an octet string (or byte array). You should not treat the result as a character string. You need to treat it as a byte array with the same length as the modulus (or at least the length of the modulus in bytes).
Besides the result containing a zero (null-terminator) the characters may have any value, including non-printable characters such as control characters and 7F. If you want to treat the result as a printable string, convert to hex or base64.

how to find the number of possibilities of a hash

if i have a hash say like this: 0d47aeda9d97686ab3da96bae2c93d078a5ab253
how do i do the math to find out the number of possibilities to try if i start with 0000000000000000000000000000000000000000 to 9999999999999999999999999999999999999999 which is the general length of a sha1.
The number of possibilities would be 2^(X) where X is the number of bits in the hash.
In the normal hexadecimal string representation of the hash value like the one you gave, each character is 4 bits, so it would be 2^(4*len) where len is the string length of the hash value. In your example, you have a 40 character SHA1 digest, which corresponds to 160 bits, or 2^160 == 1.4615016373309029182036848327163e+48 values.
An SHA-1 hash is 160 bits, so there are 2^160 possible hashes.
Your hexadecimal digit range is 0 through f.
Then it's simply 16^40 or however many characters it contains
Recall that a hash function accepts inputs of arbitrary length. A good cryptographic hash function will seem to assign a "random" hash result to any input. So if the digest is N bits long (for SHA-1, N=160), then every input will be hashed to one of 2^N possible results, in a manner we'll treat as random.
That means that the expectation for finding a preimage for your hash result is running though 2^N inputs. They don't have to be specifically the range that you suggested - any 2^N distinct inputs are fine.
This also means that 2^N inputs don't guarantee that you'll find a preimage - each try is random, so you might miss your 1-in-2^N chance in every single one of those 2^N inputs (just like flipping a coin twice doesn't guarantee you'll get heads at least once). But you can figure out how many inputs are required to find a preimage for the hash with probability p or greater - with p being as close to one as you desire (just not actually 1).
maximum variations, with repeating and with attention to the order are defined as n^k. in your case this would mean 10^40, which can't be correct for SHA1. Reading Wikipedia it sais SHA1 has a max. complexity for a collision based attack of 2^80, using different technices researches were allready successfull with 2^51 collisions, so 10^40 seems a bit much.

How to encode a large number (in an URL)?

Quite often one has to encode an big (e.g. 128 or 160 bits) number in an url. For example many web applications use md5(random()) for UUIDs.
If you need to put that value in an URL the common approach is to just encode it as an hexadecimal string.
But obviously hex encoding is not a very tight encoding. What other approaches are there which fit nicely in an URL?
I would use The "URL and Filename safe" Base 64 Alphabet.
Base 64 uses two character sets.
Data: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
URLs: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_
To use base 64 you need to pad your value to be a multiple of 3 bytes long (24 bits) then split those 24 bits into 4 6bit bytes. Each 6bit value is looked up by position in the string I gave above.
If it all goes well, your final base64 value will always be a multiple of 4 characters long and decode back to a multiple of 3 (8bit) bytes long.
Depending on the language you are using, a lot of them have built in encode and decode functions.
You can do even better with base64-url encoding (a-z, A-Z, 0-9, - and _ [see RFC4648 Section 5]). RFC4648 covers a number of different encoding methods (base16, base32, and base64) an a couple of variants. Also depending on the sparsity of the bits that are set in the number you could conceivably run it through gzip and then use one of the described encoding methods. Of course use of gzip really depends on how large the number you are going to be encoding is.
If you want it tight you can use a base-36 encoding (from 0 to Z).
Using the hint of base36 I currently use something like this (in Python):
>>> str(base64.b32encode(uuid.uuid1().bytes).rstrip('='))
'MTB2ONDSL3YWJN3CA6XIG7O4HM'
Just use hex. Even if you were to get 8 bits per character you're still using a 16-20 character random sequence, which nobody will want to type or say. If you can't put up a short identifier, work on your search capabilities.

keyless ciphers of ROT13/47 ilk

Do you know of any other ciphers that performs like the ROT47 family?
My major requirement is that it'd be keyless.
Sounds like you might be looking for some "classical cryptography" solutions.
SUBSTITUTION CIPHERS are encodings where one character is substituted with another. E.g. A->Y, B->Q, C->P, and so on. The "Caesar Cipher" is a special case where the order is preserved, and the "key" is the offset. In the rot13/47 case, the "key" is 13 or 47, respectively, though it could be something like 3 (A->D, B->E, C->F, ...).
TRANSPOSITION CIPHERS are ones that don't substitute letters, but ones that rearrange letters in a pre-defined way. For example:
CRYPTOGRAPHY
may be written as
C Y T G A H
R P O R P Y
So the ciphered output is created by reading the two lines left to right
CYTGAHRPORPY
Another property of rot13/47 is that it's REVERSABLE:
encode(encode(plaintext)) == plaintext
If this is the property you want, you could simply XOR the message with a known (previously decided) XOR value. Then, XOR-ing the ciphertext with the same value will return the original plaintext. An example of this would be the memfrob function, which just XORs a buffer with the binary representation of the number 42.
You also might check out other forms of ENCODINGS, such as Base64 if that's closer to what you're looking for.
!! Disclaimer - if you have data that you're actually trying to protect from anyone, don't use any of these methods. While entertaining, all of these methods are trivial to break.

Resources