long keys and data are same size - encryption

I'm trying to make a symmetric encryption algorithm. My key is 256 bits, and the block size and resulting ciphertext are also 256 bit. Is there a disadvantage because the key, plaintext and ciphertext have the same size?

Do not use home-brewed cryptographic algorithms for anything worth protecting. This is a complex area, where techniques that you'd never dream of are routine for crackers. There are plenty of time-tested and expert-vetted algorithms around, use one of those (after looking it up for kwnown weaknesses, and possible recommendations).

Most (if not all) block ciphers assume that the message size is a multiple of the block size, much as yours. There's no intrinsic disadvantage to that, AFAIK, and it makes it much easier to process the data. If you don't want to process the data in blocks, you need a stream cipher.
As #vonbrand has mentioned, you should never use such a custom cipher to encrypt any kind of sensitive data, as it will be trivially broken. If all you want is to have a working block cipher, you're looking forAES, which is uncrackable, as far the top universities know.

Related

Why do different implementations of AES produce different output?

I feel I have a pretty good understanding of hash functions and the contracts they entail.
SHA1 on Input X will ALWAYS produce the same output. You could use a Python library, a Java library, or pen and paper. It's a function, it is deterministic. My SHA1 does the same as yours and Alice's and Bob's.
As I understand it, AES is also a function. You put in some values, it spits out the ciphertext.
Why, then, could there ever be fears that Truecrypt (for instance) is "broken"? They're not saying AES is broken, they're saying the program that implements it may be. AES is, in theory, solid. So why can't you just run a file through Truecrypt, run it through a "reference AES" function, and verify that the results are the same? I know it absolutely does not work like that, but I don't know why.
What makes AES different from SHA1 in this way? Why might Truecrypt AES spit out a different file than Schneier-Ifier* AES, when they were both given all the same inputs?
In the end, my question boils down to:
My_SHA1(X) == Bobs_SHA1(X) == ...etc
But TrueCrypt_AES(X) != HyperCrypt_AES(X) != VeraCrypt_AES(X) etc. Why is that? Do all those programs wrap AES, but have different ways of determining stuff like an initialization vector or something?
*this would be the name of my file encryption program if I ever wrote one
In the SHA-1 example you give, there is only a single input to the function, and any correct SHA-1 implementation should produce the same output as any other when provided the same input data.
For AES however things are a bit tricker, and since you don't specify what you mean exactly by "AES", this itself seems likely to be the source of the perceived differences between implementations.
Firstly, "AES" isn't a single algorithm, but a family of algorithms that take different key sizes (128, 192 or 256 bits). AES is also a block cipher, it takes a single block of 128 bits/16 bytes of plaintext input, and encrypts this using the key to produce a single 16 byte block of output.
Of course in practice we often want to encrypt more than 16 bytes of data at once, so we must find a way to repeatedly apply the AES algorithm in order to encrypt all the data. Naively we could split it into 16 byte chunks and encrypt each one in turn, but this mode (described as Electronic Codebook or ECB) turns out to be horribly insecure. Instead, various other more secure modes are usually used, and most of these require an Initialization Vector (IV) which helps to ensure that encrypting the same data with the same key doesn't result in the same ciphertext (which would otherwise leak information).
Most of these modes still operate on fixed-sized blocks of data, but again we often want to encrypt data that isn't a multiple of the block size, so we have to use some form of padding, and again there are various different possibilities for how we pad a message to a length that is a multiple of the block size.
So to put all of this together, two different implementations of "AES" should produce the same output if all of the following are identical:
Plaintext input data
Key (and hence key size)
IV
Mode (including any mode-specific inputs)
Padding
Iridium covered many of the causes for a different output between TrueCrypt and other programs using nominally the same (AES) algorithm. If you are just checking actual initialization vectors, these tend to be done using ECB. It is the only good time to use ECB -- to make sure the algorithm itself is implemented correctly. This is because ECB, while insecure, does work without an IV and therefore makes it easier to check "apples to apples" though other stumbling blocks remain as Iridium pointed out.
With a test vector, the key is specified along with the plain text. And test vectors are specified as exact multiples of the block size. Or more specifically, they tend to be exactly 1 block in size for the plain text. This is done to remove padding and mode from the list of possible differences. So if you use standard test vectors between two AES encryption programs, you eliminate the issue with the plain text data differences, key differences, IV, mode, and padding.
But note you can still have differences. AES is just as deterministic as hashing, so you can get the same result every time with AES just as you can with hashing. It's just that there are more variables to control to get the same output result. One item Iridium did not mention but which can be an issue is endianness of the input (key and plain text). I ran into exactly this when checking a reference implementation of Serpent against TrueCrypt. They gave the same output to the text vectors only if I reversed the key and plain text between them.
To elaborate on that, if you have plain text that is all 16 bytes as 0s, and your key is 31 bytes of 0s and one byte of '33' (in the 256 bit version), if the '33' byte was on the left end of the byte string for the reference implementation, you had to feed TrueCrypt 31 '00' bytes and then the '33' byte on the right-hand side to get the same output. So as I mentioned, an endianness issue.
As for TrueCrypt maybe not being secure even if AES still is, that is absolutely true. I don't know the specifics on TrueCrypt's alleged weaknesses, but let me present a couple ways a program can have AES down right and still be insecure.
One way would be if, after the user keys in their password, the program stores it for the session in an insecure manner. If it is not encrypted in memory or if it encrypts your key using its own internal key but fails to protect that key well enough, you can have Windows write it out on the hard drive plain for all to read if it swaps memory to the hard drive. Or as such swaps are less common than they used to be, unless the TrueCrypt authors protect your key during a session, it is also possible for a malicious program to come and "debug" the key right out of the TrueCrypt software. All without AES being broken at all.
Another way it could be broken (theoretically) would be in a way that makes timing attacks possible. As a simple example, imagine a very basic crypto that takes your 32 bit key and splits it into 2 each chunks of 16 bytes. It then looks at the first chunk by byte. It bit-rotates the plain text right a number of bits corresponding to the value of byte 0 of your key. Then it XORs the plain text with the right-hand 16 bytes of your key. Then it bit-rotates again per byte 1 of your key. And so on, 16 shifts and 16 XORs. Well, if a "bad guy" were able to monitor your CPU's power consumption, they could use side channel attacks to time the CPU and / or measure its power consumption on a per-bit-of-the-key basis. The fact is it would take longer (usually, depending on the code that handles the bit-rotate) to bit-rotate 120 bits than it takes to bit-rotate 121 bits. That difference is tiny, but it is there and it has been proven to leak key information. The XOR steps would probably not leak key info, but half of your key would be known to an attacker with ease based on the above attack, even on an implementation of an unbroken algorithm, if the implementation itself is not done right -- a very difficult thing to do.
So I do not know if TrueCrypt is broken in one of these ways or in some other way altogether. But crypto is a lot harder than it looks. If the people on the inside say it is broken, it is very easy for me to believe them.

Cryptography: Mixing CBC and CTR?

I have some offline files that have to be password-protected. My strategy is as follows:
Cipher Algorithm: AES, 128-bit block, 256-bit key (PBKDF2-SHA-256
10000 iterations with a random salt stored plainly elsewhere)
Whole file is divided into pages with page size 1024 bytes
For a complete page, CBC is used
For an incomplete page,
Use CBC with cipher text stealing if it has at least one block
Use CTR if it has less one block
With this setup, we can keep the same file size
IV or nonce will be based on the salt and deterministic. Since this is not for network communication, I reckon we don't need to concern about replay attacks?
Question: Will this kind of mixing lower the security? Would we better off just use CTR throughout the whole file?
You're better off just using CTR for the entire file. Otherwise, you're adding a lot of extra work, in supporting multiple modes (CBC, CTR, and CTS) and determining which mode to use. It's not clear there's any value in doing so, since CTR is perfectly fine for encrypting a large amount of data.
Are you planning on reusing the same IV for each page? You should expand a bit on what you mean by a page, but I'd recommend unique IV's for each page. Are these pages addressable somehow? You might want to look into some of the new disk encryption modes for an idea on generating unique IV's
You also really need to MAC your data. In CTR for example, if someone flips a bit of the ciphertext, it'll flip the bit when you decrypt, and you'll never know it was tampered with. You can use HMAC or if you want to simplify your entire scheme, use AES GCM mode, which combines CTR for encryption and GMAC for integrity
There are a few things you need to know about CTR mode. After you know them all you could happily apply a stream cipher in your situation:
never ever reuse a data key with the same nonce;
above, not even in time;
be aware that CTR mode really shows the size of the encrypted data; always encrypting full blocks can hide this somewhat (in general a 1024 byte block takes as much as a single bit block if the file system boundaries are honored);
CTR mode in itself does not provide authentication (for completion, as this was already discussed);
If you don't keep to the first two rules, an attacker will immediately see the place of the edit and the attacker will be able to retrieve data directly related to the plain text.
On a possitive node:
you can happily use the offset (in, e.g., blocks) in the file to be part of the nonce;
it is very easy to seek in files, buffer ciphertext and create multi-threaded code around CTR.
And in general:
it pays off to use a data specific key specific sets of files, in such a way that if a key is compromised or changed that you don't have to re-encrypt everything;
think very well about how your keys are used, stored, backed up etc. Key management is the hardest part;

Is there a 8 bit block sized Public-Private key encryption algorithm?

I checked out TripleDES. It's block size is of 64 bits.
Is there any algorithm for 8 bits block size?
Thanks
EDIT : I intend not to use this for perfect protection, but for a just-in-case situation where one who sees the code should not find the plaintext. So 8 bit is kinda okay for me.
A block cipher with 8-bit blocks means that each input block can be encrypted into 256 possible values -- which means that an attacker has a 1/256 chance of guessing the input value. It turns out to be very difficult to use such an algorithm securely. Nevertheless it is possible to define a block cipher over 8-bit blocks, and to do it "perfectly"; just do not expect it to be generally useful.
There also are "block-less" ciphers, known as "stream ciphers" which encrypt data "byte by byte" (or even "bit by bit"); most are just pseudo-random generators which produce an arbitrary amount of bytes from a key. That generated stream is just to be combined with the data to encrypt with a XOR. The traditional stream cipher is RC4; but newer and better stream ciphers have been designed.
A block cipher, by itself, is a mathematical tool. In order to actually encrypt data, the block cipher must be used properly. The keywords are chaining and padding. Chaining is about defining what actually goes into the block cipher and what to do with the output. Padding is about adding some bytes to the data, in a reversible way, so that the padded message length is appropriate for the chosen chaining mode. The traditional chaining mode is called CBC. A newer (and arguably better) chaining mode is CTR (same link), which has the added bonus of avoiding the need for padding (CTR just turns a block cipher into a stream cipher).
As for block ciphers, you should use AES instead of TripleDES. It is faster, more secure, and the current American standard.
RSA with 8-bit key :)
Seriously though, the block-based cyphers are stateless - the ciphertext of a block depends only on the cleartext of the block, not on the previous blocks (otherwise it would be a stream cypher). A block cypher that acts on 8-bit blocks can be brute-forced easily, so there's no point.

Would CSPRNG + XOR be a secure encryption method?

Similarily to RC4 (RC4_PRNG+XOR ), would it be secure to use another CSPRNG(Cryptographically secure pseudorandom number generator)[Isaac, BlumBlumShub, etc) instead of RC4's and XOR the data with the resulting keystream?
Essentially this is just using Blum Blum Shub (or whatever PRNG) as a stream cipher. This isn't how they're designed to be used, and they might be weak to attacks that make sense in a stream cipher context but not in a CSPRNG context (eg. related-key attacks).
If this is what you want, you're better off just using a modern stream cipher. For example, DJB's Salsa20 is well-regarded.
Well, it depends.
Most encryption algorithms do significantly more than XOR. But that's because the key is shorter than the plaintext. If the key is as large as the plaintext, and truly random, then it is impossible to crack it (it's called a One Time Pad).
So, you need to explain more.
But I'm going to guess that you're key length is not the same as your input length, and that even if it was, almost certainly the random number service you are using is not truly secure, so I'd advise against your approach (furthermore, it goes without saying (maybe) that the problem with OTP is key-exchange).
Swapping out the CSPRNG in this scheme would probably be just as secure, and have the exact same set of assumptions, weaknesses and practical issues.

AES, Cipher Block Chaining Mode, Static Initialization Vector, and Changing Data

When using AES (or probably most any cipher), it is bad practice to reuse an initialization vector (IV) for a given key. For example, suppose I encrypt a chunk of data with a given IV using cipher block chaining (CBC) mode. For the next chunk of data, the IV should be changed (e.g., the nonce might be incremented or something). I'm wondering, though, (and mostly out of curiosity) how much of a security risk it is if the same IV is used if it can be guaranteed that the first four bytes of the chunks are incrementing. In other words, suppose two chunks of data to be encrypted are:
0x00000000someotherdatafollowsforsomenumberofblocks
0x00000001someotherdatathatmaydifferormaynotfollows
If the same IV is used for both chunks of data, how much information would be leaked?
In this particular case, it's probably OK (but don't do it, anyway). The "effective IV" is your first encrypted block, which is guaranteed to be different for each message (as long as the nonce truly never repeats under the same key), because the block cipher operation is a bijection. It's also not predictable, as long as you change the key at the same time as you change the "IV", since even with fully predictable input the attacker should not be able to predict the output of the block cipher (block cipher behaves as a pseudo-random function).
It is, however, very fragile. Someone who is maintaining this protocol long after you've moved on to greener pastures might well not understand that the security depends heavily on that non-repeating nonce, and could "optimise" it out. Is sending that single extra block each message for a real IV really an overhead you can't afford?
Mark,
what you describe is pretty much what is proposed in Appendix C of NIST SP800-38a.
In particular, there are two ways to generate an IV:
Generate a new IV randomly for
each message.
For each message use a new unique nonce (this may be a counter), encrypt the nonce, and use the result as IV.
The second option looks very similar to what you are proposing.
Well, that depends on the block size of the encryption algorithm. For the usual block size of 64 bytes i dont think that would make any difference. The first bits would be the same for many blocks, before entering the block cipher, but the result would not have any recognisable pattern. For block sizes < 4 bytes (i dont think that happens) it would make a difference, because the first block(s) would always be the same, leaking information about patterns. Just my opinion.
edit:
Found this
"For CBC and CFB, reusing an IV leaks some information about the first block
of plaintext, and about any common prefix shared by the two messages"
Source: lectures of my university :)

Resources