This should be a simple question, but I can't find any examples or figure out the answer from the openssl docs.
I want to encrypt exactly 128 bits, which should fit in one encryption block.
So I call EVP_EncryptInit_ex, and then what?
Do I call EVP_EncryptUpdate (to encrypt the 128-bit block) and EVP_EncryptFinal_ex (even though there is nothing left to encrypt)?
Or only EVP_EncryptUpdate?
Or only EVP_EncryptFinal_ex?
Here, you have already figured out the steps.
So, it will be
EVP_EncryptInit_ex
EVP_EncryptUpdate
EVP_EncryptFinal_ex
EVP_EncryptFinal_ex also takes care of data whose length is not a multiple of the block length.
In my opinion, if you only need AES with no padding (the EVP_ interfaces take care of padding), then go for AES_encrypt.
It is fairly easy to use.
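#include <openssl/aes.h>
// Step 1: Set the encryption key. bits is the key length in bits: 128, 192 or 256.
AES_KEY aeskey;
AES_set_encrypt_key(key, bits, &aeskey);
// Step 2: Encrypt exactly 128 bits (one 16-byte block) from data into dataout.
AES_encrypt(data, dataout, &aeskey);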
AES encryption of 16 bytes without padding
Use the EVP_* interfaces and disable padding on the block.
Use the EVP_* interface because it supports engines and hardware acceleration, like AES-NI. The AES_encrypt functions are software based and do not support alternate implementations. Also, it's not readily apparent, but AES_encrypt is not portable: some platforms suffer endianness issues.
You need to call EVP_CIPHER_CTX_set_padding to ensure no padding is added. From the EVP_CIPHER_CTX_set_padding(3) man page:
EVP_CIPHER_CTX_set_padding() enables or disables padding. By default
encryption operations are padded using standard block padding and the
padding is checked and removed when decrypting. If the pad parameter
is zero then no padding is performed, the total amount of data
encrypted or decrypted must then be a multiple of the block size or an
error will occur.
This function should be called after the context is set up for
encryption or decryption with EVP_EncryptInit_ex().
So your steps are:
Call EVP_CIPHER_CTX_new to create a context
Call EVP_EncryptInit_ex with the context
Call EVP_CIPHER_CTX_set_padding on the context
Call EVP_EncryptUpdate to encrypt the data
Call EVP_EncryptFinal_ex to finish the operation and retrieve any remaining cipher text
Also see EVP Symmetric Encryption and Decryption on the OpenSSL wiki.
Related
I feel I have a pretty good understanding of hash functions and the contracts they entail.
SHA1 on Input X will ALWAYS produce the same output. You could use a Python library, a Java library, or pen and paper. It's a function, it is deterministic. My SHA1 does the same as yours and Alice's and Bob's.
As I understand it, AES is also a function. You put in some values, it spits out the ciphertext.
Why, then, could there ever be fears that Truecrypt (for instance) is "broken"? They're not saying AES is broken, they're saying the program that implements it may be. AES is, in theory, solid. So why can't you just run a file through Truecrypt, run it through a "reference AES" function, and verify that the results are the same? I know it absolutely does not work like that, but I don't know why.
What makes AES different from SHA1 in this way? Why might Truecrypt AES spit out a different file than Schneier-Ifier* AES, when they were both given all the same inputs?
In the end, my question boils down to:
My_SHA1(X) == Bobs_SHA1(X) == ...etc
But TrueCrypt_AES(X) != HyperCrypt_AES(X) != VeraCrypt_AES(X) etc. Why is that? Do all those programs wrap AES, but have different ways of determining stuff like an initialization vector or something?
*this would be the name of my file encryption program if I ever wrote one
In the SHA-1 example you give, there is only a single input to the function, and any correct SHA-1 implementation should produce the same output as any other when provided the same input data.
For AES, however, things are a bit trickier, and since you don't specify what exactly you mean by "AES", this itself seems likely to be the source of the perceived differences between implementations.
Firstly, "AES" isn't a single algorithm, but a family of algorithms that take different key sizes (128, 192 or 256 bits). AES is also a block cipher: it takes a single block of 128 bits/16 bytes of plaintext input and encrypts this using the key to produce a single 16 byte block of output.
Of course in practice we often want to encrypt more than 16 bytes of data at once, so we must find a way to repeatedly apply the AES algorithm in order to encrypt all the data. Naively we could split it into 16 byte chunks and encrypt each one in turn, but this mode (described as Electronic Codebook or ECB) turns out to be horribly insecure. Instead, various other more secure modes are usually used, and most of these require an Initialization Vector (IV) which helps to ensure that encrypting the same data with the same key doesn't result in the same ciphertext (which would otherwise leak information).
Most of these modes still operate on fixed-sized blocks of data, but again we often want to encrypt data that isn't a multiple of the block size, so we have to use some form of padding, and again there are various different possibilities for how we pad a message to a length that is a multiple of the block size.
So to put all of this together, two different implementations of "AES" should produce the same output if all of the following are identical:
Plaintext input data
Key (and hence key size)
IV
Mode (including any mode-specific inputs)
Padding
Iridium covered many of the causes for a different output between TrueCrypt and other programs using nominally the same (AES) algorithm. If you are just checking actual test vectors, these tend to be done using ECB. It is the only good time to use ECB -- to make sure the algorithm itself is implemented correctly. This is because ECB, while insecure, does work without an IV and therefore makes it easier to check "apples to apples", though other stumbling blocks remain, as Iridium pointed out.
With a test vector, the key is specified along with the plain text. And test vectors are specified as exact multiples of the block size. Or more specifically, they tend to be exactly 1 block in size for the plain text. This is done to remove padding and mode from the list of possible differences. So if you use standard test vectors between two AES encryption programs, you eliminate the issue with the plain text data differences, key differences, IV, mode, and padding.
But note you can still have differences. AES is just as deterministic as hashing, so you can get the same result every time with AES just as you can with hashing. It's just that there are more variables to control to get the same output result. One item Iridium did not mention but which can be an issue is endianness of the input (key and plain text). I ran into exactly this when checking a reference implementation of Serpent against TrueCrypt. They gave the same output to the test vectors only if I reversed the key and plain text between them.
To elaborate on that, if your plain text is 16 bytes of 0s and your key is 31 bytes of 0s plus one byte of '33' (in the 256 bit version), and the '33' byte was on the left end of the byte string for the reference implementation, you had to feed TrueCrypt 31 '00' bytes and then the '33' byte on the right-hand side to get the same output. So as I mentioned, an endianness issue.
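To make the byte-order point concrete, here is a small sketch of the reversal described above. The helper name reverse_bytes is made up for illustration, and whether any particular pair of implementations actually needs this step is something you would have to find out by testing; the code only shows the transformation itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reverses the byte order of buf in place. */
void reverse_bytes(unsigned char *buf, size_t len)
{
    size_t i, j;

    assert(buf != NULL || len == 0);
    if (len < 2)
        return;

    /* Swap the outermost pair, then walk inwards. */
    for (i = 0, j = len - 1; i < j; i++, j--) {
        unsigned char tmp = buf[i];
        buf[i] = buf[j];
        buf[j] = tmp;
    }
}
```

With a 32-byte key that has '33' in the first position and zeros elsewhere, calling reverse_bytes(key, 32) moves the '33' byte to the last position, which is the layout the other program expected in the case described.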
As for TrueCrypt maybe not being secure even if AES still is, that is absolutely true. I don't know the specifics on TrueCrypt's alleged weaknesses, but let me present a couple ways a program can have AES down right and still be insecure.
One way would be if, after the user keys in their password, the program stores it for the session in an insecure manner. If it is not encrypted in memory or if it encrypts your key using its own internal key but fails to protect that key well enough, you can have Windows write it out on the hard drive plain for all to read if it swaps memory to the hard drive. Or as such swaps are less common than they used to be, unless the TrueCrypt authors protect your key during a session, it is also possible for a malicious program to come and "debug" the key right out of the TrueCrypt software. All without AES being broken at all.
Another way it could be broken (theoretically) would be in a way that makes timing attacks possible. As a simple example, imagine a very basic crypto that takes your 32-byte key and splits it into 2 chunks of 16 bytes each. It then looks at the first chunk byte by byte. It bit-rotates the plain text right a number of bits corresponding to the value of byte 0 of your key. Then it XORs the plain text with the right-hand 16 bytes of your key. Then it bit-rotates again per byte 1 of your key. And so on, 16 shifts and 16 XORs. Well, if a "bad guy" were able to observe your CPU, they could use side channel attacks to time it and / or measure its power consumption on a per-bit-of-the-key basis. The fact is that it would usually take a different amount of time (depending on the code that handles the bit-rotate) to bit-rotate 120 bits than it takes to bit-rotate 121 bits. That difference is tiny, but it is there, and differences like it have been proven to leak key information. The XOR steps would probably not leak key info, but half of your key would be known to an attacker with ease based on the above attack, even on an implementation of an unbroken algorithm, if the implementation itself is not done right -- a very difficult thing to do.
So I do not know if TrueCrypt is broken in one of these ways or in some other way altogether. But crypto is a lot harder than it looks. If the people on the inside say it is broken, it is very easy for me to believe them.
If I use different encryption methods but provide no indication in the ciphertext output of which method I use (for example, attaching an unencrypted header to the ciphertext) does that make the ciphertext harder to decrypt than just the difficulty implied by, for example, the keylength? The lack of information as to what encryption protocol and parameters to use should add difficulty by requiring a potential decrypter to try some or all the various encryption methods and parameters.
Well, in general you should not rely on information in the algorithm / protocol itself. Such information is generic for any key you use, so you should consider it public knowledge. OK, so that's that out of the way.
Now say you use 16 methods and you somehow have created a protocol that keeps the used encryption method confidential (let's say by encrypting a single block half filled with random data and a magic value, decrypting blocks at the receiver until you find the correct one). Now if you wanted to brute force the key used, you would need 16 times as many tries. In other words, you have just increased the key length by 4 bits, as 2 ^ 4 = 16. So say you would have AES-256 equivalent ciphers. You would now have encryption equivalent to 256 + 4 = 260 bits. That hardly registers, especially since AES-256 is already considered safe against attacks using a quantum computer.
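The arithmetic can be spelled out in a few lines. This is just a sketch; the function name extra_bits_from_choices is made up, and it assumes the best case for the defender, where every method is equally likely and the attacker learns nothing about which one was used.

```c
#include <assert.h>

/* Hypothetical helper: how many bits of brute-force work are gained by
 * hiding which of `choices` equally likely methods was used.
 * Computes floor(log2(choices)). */
unsigned extra_bits_from_choices(unsigned long choices)
{
    unsigned bits = 0;

    assert(choices > 0);
    while (choices > 1) {
        choices >>= 1;   /* each halving accounts for one bit */
        bits++;
    }
    return bits;
}
```

For 16 hidden methods, extra_bits_from_choices(16) gives 4, so a 256-bit key behaves at best like 256 + 4 = 260 bits.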
Now those 4 bits come at a very high price. A highly complex protocol using multiple ciphers. Each of these ciphers has its weaknesses. None of them will have received as much scrutiny as AES, and if one breaks you are in trouble (at least for 1 out of 16 encrypted messages). Speeds will differ, parameters and block sizes will differ, platforms may not support them all...
All in all, just use AES-256 if you are not willing to accept AES-128. If you must, encrypt things twice using AES and SERPENT. Adding an authentication tag over IV & ciphertext probably makes much more of a difference though. See this answer by Thomas over at the security site.
Try GCM or EAX mode of operation. Much more useful.
I'm currently working with mcrypt.java to encrypt and decrypt data on the server side and cryptojs on the client side, but I have a problem: when I encrypt any string, Java and JavaScript produce different results.
I was reading about modes and padding schemes for AES encryption, and some blogs say that it is incorrect to use CBC mode with NoPadding and that it is better/correct to use CBC with Pkcs7 or another padding.
Can anyone explain this to me?
Padding your plaintext is required if you perform AES encryption in ECB/CBC block cipher mode, unless your plaintext is a multiple of the blocksize. You could of course make sure that your plaintext is always precisely N blocks, but in effect you would be creating your own padding mode.
Many libraries (e.g. mcrypt in PHP) don't specify any padding while they secretly do pad. They just fill up the last block with 00 valued bytes. The effect of this is that you can encrypt ASCII compatible text, which will then be null terminated. In most languages (that do not use null termination) it is also possible to use a trim method to remove this padding. This is however not an official padding mode. Of course this scheme only works if your plain text does not end with control characters. So it is not suitable for any binary plaintext.
It is definitely better to use PKCS#7 padding. Removing PKCS#7 padding is deterministic for any plaintext. This means you can encrypt any value, including UTF-16 encoded text and any binary value. If PKCS#7 padding is not available it is relatively easy to implement it yourself - this is certainly worth the effort. The only disadvantage of PKCS#7 padding for CBC mode is that it may require an additional block of padding when the plaintext is already N times the block size. The reason for this is that the plaintext may otherwise be misinterpreted as being padding.
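If you do have to implement it yourself, a minimal sketch could look like the one below. The function names pkcs7_pad and pkcs7_unpad and the error convention (returning (size_t)-1) are my own choices for illustration, not taken from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies in[0..len) to out and appends PKCS#7 padding.
 * out must have room for len + block bytes. Returns the padded length. */
size_t pkcs7_pad(const unsigned char *in, size_t len, size_t block,
                 unsigned char *out)
{
    size_t pad;

    assert(block > 0 && block < 256);
    /* 1..block bytes; a whole extra block if len is already aligned. */
    pad = block - (len % block);
    if (len > 0)
        memcpy(out, in, len);
    /* Every padding byte holds the padding length itself. */
    memset(out + len, (int)pad, pad);
    return len + pad;
}

/* Hypothetical helper: returns the length without padding,
 * or (size_t)-1 if the padding is malformed. */
size_t pkcs7_unpad(const unsigned char *buf, size_t len, size_t block)
{
    size_t pad, i;

    if (block == 0 || len == 0 || len % block != 0)
        return (size_t)-1;
    pad = buf[len - 1];
    if (pad == 0 || pad > block)
        return (size_t)-1;
    for (i = len - pad; i < len; i++) {
        if (buf[i] != pad)
            return (size_t)-1;
    }
    return len - pad;
}
```

Padding the 3 bytes "abc" to a 16 byte block appends 13 bytes of value 13, and a 16 byte input grows to 32 bytes. Note that this unpad check is not constant time, so it should only be run on ciphertext whose integrity has already been verified.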
Note that padding and padding errors are not suitable to detect if the ciphertext was changed in transit. Padding oracle attacks are very easy to implement and may reveal your plaintext in roughly 128 decryption attempts per byte of plaintext on average (!!!). So use an authenticated mode of operation or a MAC (HMAC or CMAC) if you want to provide integrity and authenticity to your plaintext.
If you really cannot miss the bytes used for padding, please look at CTR or a similar stream mode of operation for your block cipher.
EDIT
There is also ciphertext stealing or CTS that can be used for CBC mode. It is not used much and as there are three different versions of it, you should make sure which one is used.
Nowadays it is more common to use counter mode (CTR mode) or an authenticated mode which is based on it (if a block cipher is used at all). CTR mode doesn't require any padding as it is a streaming mode of operation.
What is the minimum number of bits needed to represent a single character of encrypted text?
E.g., if I wanted to encrypt the letter 'a', how many bits would I require? (Assume there are many singly encrypted characters using the same key.)
Am I right in thinking that it would be the size of the key, e.g. 256 bits?
Though the question is somewhat fuzzy, first of all it would depend on whether you use a stream cipher or a block cipher.
For the stream cipher, you would get the same number of bits out that you put in - so the binary logarithm of your input alphabet size would make sense. The block cipher requires input blocks of a fixed size, so you might pad your 'a' with zeroes and encrypt that, effectively having the block size as a minimum, like you already proposed.
I'm afraid all the answers you've had so far are quite wrong! It seems I can't reply to them, but do ask if you need more information on why they are wrong. Here is the correct answer:
About 80 bits.
You need a few bits for the "nonce" (sometimes called the IV). When you encrypt, you combine key, plaintext and nonce to produce the ciphertext, and you must never use the same nonce twice. So how big the nonce needs to be depends on how often you plan on using the same key; if you won't be using the key more than 256 times, you can use an 8 bit nonce. Note that it's only the encrypting side that needs to ensure it doesn't use a nonce twice; the decrypting side only needs to care if it cares about preventing replay attacks.
You need 8 bits for the payload, since that's how many bits of plaintext you have.
Finally, you need about 64 bits for the authentication tag. At this length, an attacker has to try about 2^63 bogus messages on average before they get one accepted by the remote end. Do not think that you can do without the authentication tag; this is essential for the security of the whole mode.
Put these together using AES in an authenticated mode such as EAX or GCM, and you get 80 bits of ciphertext.
The key size isn't a consideration.
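Here is the same bit budget as a small sketch. The function names are made up for illustration, and the sizes are just the ones used in this answer (at most 256 messages per key, an 8 bit payload, a 64 bit tag).

```c
#include <assert.h>

/* Hypothetical helper: smallest nonce width in bits that still gives each
 * of max_messages messages under one key its own nonce value. */
unsigned nonce_bits_for(unsigned long max_messages)
{
    unsigned bits = 0;

    assert(max_messages > 0);
    while (bits < 8 * sizeof(unsigned long) &&
           (1UL << bits) < max_messages) {
        bits++;
    }
    return bits;
}

/* Hypothetical helper: total size in bits of one encrypted message. */
unsigned message_bits(unsigned nonce_bits, unsigned payload_bits,
                      unsigned tag_bits)
{
    return nonce_bits + payload_bits + tag_bits;
}
```

With at most 256 messages per key, nonce_bits_for(256) is 8, and message_bits(8, 8, 64) gives the 80 bits mentioned above.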
You can have the same number of bits as the plaintext if you use a one-time pad.
This is hard to answer. You should definitely first read up on some fundamentals. You can 'encrypt' an 'a' with a single bit (Huffman encoding-style), and of course you could use more bits too. A number like 256 bits without any context is meaningless.
Here's something to get you started:
Information Theory -- esp. check out Shannon's seminal paper
One Time Pad -- infamous secure, but impractical, encryption scheme
Huffman encoding -- not encryption, but demonstrates the above point
I checked out TripleDES. Its block size is 64 bits.
Is there any algorithm with an 8-bit block size?
Thanks
EDIT : I intend not to use this for perfect protection, but for a just-in-case situation where one who sees the code should not find the plaintext. So 8 bit is kinda okay for me.
A block cipher with 8-bit blocks means that each input block can be encrypted into 256 possible values -- which means that an attacker has a 1/256 chance of guessing the input value. It turns out to be very difficult to use such an algorithm securely. Nevertheless it is possible to define a block cipher over 8-bit blocks, and to do it "perfectly"; just do not expect it to be generally useful.
There also are "block-less" ciphers, known as "stream ciphers" which encrypt data "byte by byte" (or even "bit by bit"); most are just pseudo-random generators which produce an arbitrary amount of bytes from a key. That generated stream is just to be combined with the data to encrypt with a XOR. The traditional stream cipher is RC4; but newer and better stream ciphers have been designed.
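As an illustration of "generate a stream from the key and XOR it with the data", here is a compact sketch of RC4. The function name rc4_xor is my own. RC4 is shown only because it is the traditional example named above; it has known weaknesses and should not be picked for new designs.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of RC4: XORs data in place with the keystream derived from key.
 * Running it again with the same key undoes the encryption. */
void rc4_xor(const unsigned char *key, size_t keylen,
             unsigned char *data, size_t len)
{
    unsigned char S[256], tmp;
    unsigned i, j;
    size_t n;

    assert(key != NULL && keylen >= 1 && keylen <= 256);

    /* Key scheduling: start from the identity and mix in the key. */
    for (i = 0; i < 256; i++)
        S[i] = (unsigned char)i;
    for (i = 0, j = 0; i < 256; i++) {
        j = (j + S[i] + key[i % keylen]) & 0xFF;
        tmp = S[i]; S[i] = S[j]; S[j] = tmp;
    }

    /* Keystream generation: one pseudo-random byte per data byte. */
    i = j = 0;
    for (n = 0; n < len; n++) {
        i = (i + 1) & 0xFF;
        j = (j + S[i]) & 0xFF;
        tmp = S[i]; S[i] = S[j]; S[j] = tmp;
        data[n] ^= S[(S[i] + S[j]) & 0xFF];
    }
}
```

The output is exactly as long as the input, so no padding is involved. With the key "Key" and the 9 byte message "Plaintext", the result is BB F3 16 E8 D9 40 AF 0A D3.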
A block cipher, by itself, is a mathematical tool. In order to actually encrypt data, the block cipher must be used properly. The keywords are chaining and padding. Chaining is about defining what actually goes into the block cipher and what to do with the output. Padding is about adding some bytes to the data, in a reversible way, so that the padded message length is appropriate for the chosen chaining mode. The traditional chaining mode is called CBC. A newer (and arguably better) chaining mode is CTR, which has the added bonus of avoiding the need for padding (CTR just turns a block cipher into a stream cipher).
As for block ciphers, you should use AES instead of TripleDES. It is faster, more secure, and the current American standard.
RSA with 8-bit key :)
Seriously though, the block-based cyphers are stateless - the ciphertext of a block depends only on the cleartext of the block, not on the previous blocks (otherwise it would be a stream cypher). A block cypher that acts on 8-bit blocks can be brute-forced easily, so there's no point.