triple DES result length - encryption

If I encrypt emails so that I can store them in a database, the resulting string is longer than the email itself. Is there a maximum length to this resulting coded string? if so, does it depend on both key length and the email length? I need to know this so I can set my database fields to the correct length.
Thanks.

As Alex K. notes, for block ciphers (like DES), common modes will pad them out to a multiple of the block size. The block size for 3DES is 64-bits (8 bytes). The most common padding scheme is PKCS7, which pads the block with "n x n bytes." This is to say, if you need one bytes of padding, it pads with 0x01. If you need four bytes of padding, it pads with 0x04040404 (4x 4s). If your data is already the right length, it pads with a full block (8 bytes of 0x08 for 3DES).
The short version is that the padded cipher text for 3DES can be up to 8 bytes longer than the plaintext. If your encryption scheme is a typical, insecure implementation, this is the length. The fact that you're using 3DES (an obsolete cipher) makes it a bit more likely that it's also insecurely implemented, and so this is the answer.
But if your scheme is implemented well, then there could be quite a few other things attached to the message. There could be 8 bytes of initialization vector. There could be a salt of arbitrary length if you're using a password. There could be an HMAC. There could be lots of things that could add an arbitrary amount of space. (The RNCryptor format, for example, adds up to 82 bytes to the message.) So you need to know how your format is implemented.

Related

Cipher Feedback Mode: s-bit size confusion

I'm new to Crypto and am trying to make a block cipher encryption program. And I've stumbled across a doubt while writing the CFB part.
Supposing we have a 64-bit block cipher with. And we use 7-bit CFB along with it. Then, for each block, the block will be run nine times wherein nine 7-bit left shifts to the block (starting with IV) and subsequent additions take place.
This means after 9 rounds of CFB, 63 bits of the 64-bit block are processed. At the end we have one bit remaining. How do I process this bit? Do I take seven bits again or just encrypt the one remaining bit?
I have this same question for any case where the s is not a factor of the block size.
Looking back, this question seems silly.
I've now understood that the plaintext, if it's length isn't a multiple of the block size, must be padded to be so.

brute force encrypted file (XOR encryption)

I've got a file (GIF type if that matters) that is encrypted using the XOR algorithm.
The only thing i have is the encrypted text so no key or plain text. Now i was wondering how i can brute force this file to get the (symmetrical) key to eventually decrypt it.
IF i'm not mistaken it should be a 10 byte key. I've looked into using john the ripper but i almost only see that being used to brute force accounts.
Also if it is relevant, i do not have a file which could contain the key so it would have to self generate it's possible keys.
update:
now i found a way to generate all possible hexadecimal keys, now i'll have to encrypt the file again with the xor algorithm to decrypt it if this makes sense. Now performing this operation is not gonna be a problem but how do i check if the encryption to decrypt worked when it had the correct key so basicly it stops trying any further?
You (and #gusto2) are exactly correct on using the magic number: you immediately get the first 6 bytes of the key by knowing that the first 6 bytes are GIF89a.
Following the gif specification, we can learn more of the key. Here are a few tips, where I am numbering the bytes of your file from index 0 (so bytes 0-5 correspond to the magic number):
The last byte of a plaintext gif file is 0x3B. This possibly gives you one more byte of the key (depending on the file size, e.g. if the file size is equiv to 7, 8, or 9 modulo 10 then you get the key byte)
After the magic number is a 7 byte Logical Screen Descriptor. The first 4 bytes tell the width and height: if you knew the width and height of your gif, then you would be able to derive the remaining 4 unknown bytes of the key. Let's assume you don't know it.
Byte 10 of the file you will know because it corresponds to key byte 0 in your XOR encryption. When you decrypt that byte, the most significant bit is the Global Color Table Flag. If this bit is 0, then there is no Global Color Table -- which means that the next byte (byte 11) is either an image (byte is 0x2C) block or an extension (0x21) block. Again, you can decrypt this byte (because it corresponds to key byte 1) so you know exactly what it is.
Images come in image blocks starting with 0x2C and ending with 00.
There are two approaches you can do to decrypt this:
(1) Work by hand, as I am describing it above. You should be able to interpret the blocks, and look for the expected key byte values of 0x2c, 0x21, 0x00, and 0x3b. From there you can figure out what makes sense to be next, and derive key bytes by hand; or
(2) You brute force the last 4 bytes (2^32 possible values). For each guess, you decrypt the candidate gif image and then feed the result into a gif parser (example parser ) to see if it barfs or not. If it barfs, then you know that candidate is wrong. If it does not, then you have a possible real decryption and you save it. At the end, you look through your real candidates one-by-one (you don't expect many candidates) to see which one is the right decryption.
EDIT: You said that the width and height are 640 and 960. That means that bytes 6 and 7 will be the little endian representation of 640 and then 960 in little endian for bytes 8 and 9. You should have the entire key from this. Try it and let us know if it works. Make sure you get the endianess right!

Is it possible to tell which hash algorithm generated these strings?

I have pairs of email addresses and hashes, can you tell what's being used to create them?
aaaaaaa#aaaaa.com
BeRs114JrR0sBpueyEmnOWZfnLuigYTA
and
aaaaaaaaaaaaa.bbbbbbbbbbbb#cccccccccccc.com
4KoujQHr3N2wHWBLQBy%2b26t8GgVRTqSEmKduST9BqPYV6wBZF4IfebJS%2fxYVvIvR
and
r.r#a.com
819kwGAcTsMw3DndEVzu%2fA%3d%3d
First, the obvious even if you know nothing about cryptography: the percent signs are URL encoding; decoding that gives
BeRs114JrR0sBpueyEmnOWZfnLuigYTA
4KoujQHr3N2wHWBLQBy+26t8GgVRTqSEmKduST9BqPYV6wBZF4IfebJS/xYVvIvR
819kwGAcTsMw3DndEVzu/A==
And that in turn is base64. The lengths of the encodings wrt the length of the original strings are
plaintext encoding
17 24
43 48
10 16
More samples would give more confidence, but it's fairly clear that the encoding pads the plaintext to a multiple of 8 bytes. That suggest a block cipher (it can't be a hash since a hash would be fixed-size). The de facto standard block algorithm is AES which uses 16-byte blocks; 24 is not a multiple of 16 so that's out. The most common block algorithm with a block size of 8 (which fits the data) is DES; 3DES or blowfish or something even rarer is also a possibility but DES is what I'd put my money on.
Since it's a cipher, there must be a key somewhere. It might be in a configuration file, or hard-coded in the source code. If all you have is the binary, you should be able to locate it with the help of a debugger. With DES, you could find the key by brute force (because a key is only 56 bits and that's doable by renting a bit of CPU time on Amazon) but finding it in the program would be easier.
If you want to reproduce the algorithm then you'll also need to figure out the mode of operation. Here one clue is that the encoding is never more than 7 bytes longer than the plaintext, so there's no room for an initialization vector. If the developers who made that software did a horrible job they might have used ECB. If they made a slightly less horrible job they might have used CBC or (much less likely) some other mode with a constant IV. If they did an again slightly less horrible job then the IV may be derived from some other characteristic of the account. You can refine the analysis by testing some patterns:
If the encoding of abcdefghabcdefgh#example.com (starting with two identical 8-byte blocks) starts with two identical 8-byte blocks, it's ECB.
If the encoding of abcdefgh1#example.com and abcdefgh2#example.com (differing at the 9th character) have identical first blocks, it's CBC (probably) with a constant IV.
Another thing you'll need to figure out is the padding mode. There are a few common ones. That's a bit harder to figure out as a black box except with ECB.
There are some tools online, and also some open source projects. For example:
https://code.google.com/archive/p/hash-identifier/
http://www.insidepro.com/

Cipher to generate URL safe ciphertext without encoding

I want to encrypt small serialized data structures (~256 bytes) so I can pass them around (especially in URLs) safely. My current approach is to use a symmetric block cipher, and then to base 64 encode, then URL encode the cipher text. This yields an encoded cipher text that is (unsurprisingly) quite a bit longer than the original data structure. The length of these encoded ciphers is a bit of a usability problem; ideally I'd like the cipher text to be around the same length as the input text.
Is there a block cipher that can be configured to constrain the values of the output bytes to be in the URL-safe range? I assume there would be a security trade-off involved if there is.
For a given key K, a cipher has to produce a different ciphertext for each plaintext. If your message space is 256 bytes, the cipher has to be able to produce at least 256^256 different messages. This will require at least 256 bytes, and any reduction in the size of the output alphabet requires longer messages.
As you've seen, you can do some encoding afterward to avoid certain output symbols, at the cost of increased length. Furthermore, you would pay the same cost if the encoding were part of the encryption algorithm proper. That's why this isn't a feature of any encryption algorithm.
As others have mentioned, the only real answer is to reduce the size of the data you are encrypting so that you need to encode less data. (Either that or don't put the data in url's in the first place e.g. store the data in a database and put a unique id in the url). So compress > encrypt > encode.
If your data structure is 256 bytes long encrypting it with a block cipher of 8 bytes increases it up to 8 bytes (depending of the concrete input length).
Therefore before applying base64 you have up to 264 bytes which are increased by the base64 encoding up to 352 bytes.
Therefore as you can see the most overhead is created by the base64 encoding. There are some slightly more effective encodings available like base91 - but they are very uncommon.
If size matters I would recommend to compress the data before encrypting it.
URL encoding will not significantly expand a base64 encoded string, since 62 of the 64 characters do not need to be modified. However, you can use modified base64 encoding to do a little better. This encoding uses the '-' and '_' characters in place of the '+' and '/' characters to yield a slight efficiency improvement.
The cipher itself is not causing any significant data expansion. It will pad the data to be a multiple of the block length, but that is insignificant in your case. You might try compressing the input prior to encryption. 256 bytes is not much but you might see some improvement.

encryption of a single character

What is the minimum number of bits needed to represent a single character of encrypted text.
eg, if I wanted to encrypt the letter 'a', how many bits would I require. (assume there are many singly encrypted characters using the same key.)
Am I right in thinking that it would be the size of the key. eg 256 bits?
Though the question is somewhat fuzzy, first of all it would depend on whether you use a stream cipher or a block cipher.
For the stream cipher, you would get the same number of bits out that you put in - so the binary logarithm of your input alphabet size would make sense. The block cipher requires input blocks of a fixed size, so you might pad your 'a' with zeroes and encrypt that, effectively having the block size as a minimum, like you already proposed.
I'm afraid all the answers you've had so far are quite wrong! It seems I can't reply to them, but do ask if you need more information on why they are wrong. Here is the correct answer:
About 80 bits.
You need a few bits for the "nonce" (sometimes called the IV). When you encrypt, you combine key, plaintext and nonce to produce the ciphertext, and you must never use the same nonce twice. So how big the nonce needs to be depends on how often you plan on using the same key; if you won't be using the key more than 256 times, you can use an 8 bit nonce. Note that it's only the encrypting side that needs to ensure it doesn't use a nonce twice; the decrypting side only needs to care if it cares about preventing replay attacks.
You need 8 bits for the payload, since that's how many bits of plaintext you have.
Finally, you need about 64 bits for the authentication tag. At this length, an attacker has to try on average 2^63 bogus messages minimum before they get one accepted by the remote end. Do not think that you can do without the authentication tag; this is essential for the security of the whole mode.
Put these together using AES in a chaining mode such as EAX or GCM, and you get 80 bits of ciphertext.
The key size isn't a consideration.
You can have the same number of bits as the plaintext if you use a one-time pad.
This is hard to answer. You should definitely first read up on some fundamentals. You can 'encrypt' an 'a' with a single bit (Huffman encoding-style), and of course you could use more bits too. A number like 256 bits without any context is meaningless.
Here's something to get you started:
Information Theory -- esp. check out Shannon's seminal paper
One Time Pad -- infamous secure, but impractical, encryption scheme
Huffman encoding -- not encryption, but demonstrates the above point

Resources