I'm using Googles Tink DeterministicAead. What I'd like to achieve, is to use the same key over a long time, but change the ciphertext for a certain plaintext every 24 hours (for GDPR reasons).
Can I pick a new initialization vector somehow once every 24 hours? The code I'm using is:
KeysetHandle keysetHandle = KeysetHandle.generateNew(
DeterministicAeadKeyTemplates.AES256_SIV);
DeterministicAead daead = keysetHandle.getPrimitive(DeterministicAead.class);
byte[] ciphertext = daead.encryptDeterministically(clearText.getBytes(), "".getBytes());
ciphertext is indeed deterministic, so at least that part is good :-)
I want to be able to decrypt the cyphertext in BigQuery, so custom primitives worry me a bit.
Note: this is not about financial data or such, it is about being able to update recommendation models on the behavior of the ciphertext while at the same time complying with privacy requirements for the user.
By definition (from what I'm reading in those docs) there doesn't seem to be a way to specify an IV (as then it wouldn't be deterministic).
The DeterministicAead primitive seems to be AES with a MAC, without IV.
So, I think the only option here would be to rotate the key, or, add your own prepended/appended salt to the plaintext prior to encryption and store it for decryption.
Remember deterministic schemes aren't semantically secure.
Related
I have a tcl/tk based tool, which uses network password for authentication. Issue is that, it is saving password in the logs/history. So objective is to encrypt the password.
I tried to use aes package. But at the very beginning aes::init asks for keydata and initialization vector (16 byte). So how to generate IV and keydata. Is is some Random number? I am a novice in encryption algorithms.
If you have the password in the logs/history, why not fix the bug of logging/storing it in the first place?
Otherwise there are distinct things you might want:
A password hashing scheme like PBKDF2, bcrypt, argon2 etc. to store a password in a safe way and compare some user input to it. This is typically the case when you need to implement some kind of authentication with passwords on the server side.
A password encryption and protection scheme like AES. You need a password to authenticate to some service automatically, and it requires some form of cleartext password.
You have some secret data and need to securly store it to in non cleartext form.
If you have case 1, don't use the aespackage, it is the wrong tool for the job. If you have case 2, the aes package might help you, but you just exchanged the problem of keeping the password secret with the other problem of keeping the key secret (not a huge win). So the only viable case where aes is an option might be 3.
Lets assume you need to store some secret data in a reversible way, e.g. case 3 from above.
AES has a few possible modes of operation, common ones you might see are ECB, CBC, OFB, GCM, CTR. The Tcllib package just supports ECB and CBC, and only CBC (which is the default) is really an option to use.
Visit Wikipedia for an example why you should never use ECB mode.
Now back to your actual question:
Initialization Vector (IV)
This is a random value you pick for each encryption, it is not secret, you can just publish it together with the encrypted data. Picking a random IV helps to make two encrypted blocks differ, even if you use the same key and cleartext.
Secret Key
This is also a random value, but you must keep it secret, as it can be used for encryption and decryption. You often have the same key for multiple encryptions.
Where to get good randomness?
If you are on Linux, BSD or other unixoid systems just read bytes from /dev/urandom or use a wrapper for getrandom(). Do NOT use Tcls expr {rand()} or similar pseudorandom number generators (PRNG). On Windows TWAPI and the CryptGenRandom function would be the best idea, but sadly there is no Tcl high level wrapper included.
Is that enough?
Depends. If you just want to hide a bit of plaintext from cursory looks, maybe. If you have attackers manipulating your data or actively trying to hack your system, less so. Plain AES-CBC has a lot of things you can do wrong, and even experts did wrong (read about SSL/TLS 1.0 problems with AES-CBC).
Final words: If you are a novice in encryption algorithms, be sure you understand what you want and need to protect, there are a lot of pitfalls.
If I read the Tcler's Wiki page on aes, I see that I encrypt by doing this:
package require aes
set plaintext "Some super-secret bytes!"
set key "abcd1234dcba4321"; # 16 bytes
set encrypted [aes::aes -dir encrypt -key $key $plaintext]
and I decrypt by doing:
# Assuming the code above was run...
set decrypted [aes::aes -dir decrypt -key $key $encrypted]
Note that the decrypted text has NUL (zero) bytes added on the end (8 of them in this example) because the encryption algorithm always works on blocks of 16 bytes, and if you're working with non-ASCII text then encoding convertto and encoding convertfrom might be necessary.
You don't need to use aes::init directly unless you are doing large-scale streaming encryption. Your use case doesn't sound like it needs that sort of thing. (The key data is your “secret”, and the initialisation vector is something standardised that usually you don't need to set.)
I need to encrypt two-way (symmetric) distinct tokens. These tokens are expected to be repeated (e.g. They are people first names), but I do not want an attacker to conclude which encrypted tokens came from the same original tokens. Salt is the way to go for one-way cryptography (hashing).
Is there a method that can work in symmetric cryptography, a workaround or an alternative?
Yes. Properly used, symmetric encryption does not reveal anything about the plaintext, not even the fact that multiple plaintexts are the same.
Proper usage means choosing a mode of operation that uses an initialization vector (IV) or nonce (that is, not ECB), and choosing the IV appropriately (usually random bytes). Encrypting multiple plaintexts with the same key and IV allows this attack pretty much just like with ECB mode, and using a static IV is a common mistake.
As mentioned above, properly utilizing a symmetric encryption scheme would NOT reveal information about the plaintext. You mention the need to protect the users against a dictionary attack on the hidden tokens, and a properly utilized encryption scheme such as GCM would provide you with this property.
I recommend utilizing GCM mode as it is an efficient authenticated encryption scheme. Performing cryptographic functions on unauthenticated data may lead to security flaws so utilizing an authenticated encryption scheme such as GCM is your best bet. Note that this encryption scheme along with other CPA-SECURE schemes will provide you security against an adversary that wishes to learn the value of an encrypted token.
For example, in correctly implemented GCM mode, the encryption of the same last name will result in a different ciphertext i.e GCM Mode is Non-Deterministic.
Make sure to utilize a secure padding scheme and fix a length for the ciphertexts to make sure an attacker can't use the lenght of the ciphertext to learn some information about the contents of what generated this token.
Be careful however, you can't interchangeably use hash functions and symmetric encryption schemes as they are created for very different purposes. Be careful with how you share the key, and remember that once an adversary has knowledge of the key, there is nothing random about the ciphertext.
-NOTE-
Using encryption incorrectly : If every user is utilizing the same key to encrypt their token then they can simply decrypt everyone else's token and see the name that generated it.
To be safe, every user must encrypt with a different key so now you have to somehow store and manage the key for each user. This may be very painful and you have to be very careful with this.
However if you are utilizing salts and hash functions, then even if every user is utilizing the same salt to compute hash(name||salt), a malicious user would have to brute force all possible names with the salt to figure out what generated these tokens.
So keep this into consideration and be careful as hash functions and symmetric encryptions schemes can't be used interchangeably.
Assuming that the only items to be ciphered are the tokens (that is, they are not embedded in a larger data structure), then Inicialization Vectors (IV's) are the way to go.
They are quite simple to understand: let M be your token, padded to fit the block size used in the symmetric ciphering algorithm (I'm assuming it's AES) and IV be a random array of bits also the size of the ciphering block.
Then compute C = AES_ENCRYPT(M xor IV, K) where C is the ciphered data and K the symmetric key. That way, the same message M will not be ciphered the same way multiple times since IV is randomly obtained every time.
To decrypt M, just compute M = (AES_DECRYPT(C, K) xor IV).
Of course, both IV and K must be known at decryption time. The most usual way to transmit the IV is to just send it along the ciphered text. This does not compromise security, it's pretty much like storing a salt value, since the encryption key will remain unknown for everybody else.
Right now, this is what I am doing:
1. SHA-1 a password like "pass123", use the first 32 characters of the hexadecimal decoding for the key
2. Encrypt with AES-256 with just whatever the default parameters are
^Is that secure enough?
I need my application to encrypt data with a password, and securely. There are too many different things that come up when I google this and some things that I don't understand about it too. I am asking this as a general question, not any specific coding language (though I'm planning on using this with Java and with iOS).
So now that I am trying to do this more properly, please follow what I have in mind:
Input is a password such as "pass123" and the data is
what I want to encrypt such as "The bank account is 038414838 and the pin is 5931"
Use PBKDF2 to derive a key from the password. Parameters:
1000 iterations
length of 256bits
Salt - this one confuses me because I am not sure where to get the salt from, do I just make one up? As in, all my encryptions would always use the salt "F" for example (since apparently salts are 8bits which is just one character)
Now I take this key, and do I hash it?? Should I use something like SHA-256? Is that secure? And what is HMAC? Should I use that?
Note: Do I need to perform both steps 2 and 3 or is just one or the other okay?
Okay now I have the 256-bit key to do the encryption with. So I perform the encryption using AES, but here's yet another confusing part (the parameters).
I'm not really sure what are the different "modes" to use, apparently there's like CBC and EBC and a bunch of others
I also am not sure about the "Initialization Vector," do I just make one up and always use that one?
And then what about other options, what is PKCS7Padding?
For your initial points:
Using hexadecimals clearly splits the key size in half. Basically, you are using AES-128 security wise. Not that that is bad, but you might also go for AES-128 and use 16 bytes.
SHA-1 is relatively safe for key derivation, but it shouldn't be used directly because of the existence/creation of rainbow tables. For this you need a function like PBKDF2 which uses an iteration count and salt.
As for the solution:
You should not encrypt PIN's if that can be avoided. Please make sure your passwords are safe enough, allow pass phrases.
Create a random number per password and save the salt (16 bytes) with the output of PBKDF2. The salt does not have to be secret, although you might want to include a system secret to add some extra security. The salt and password are hashed, so they may have any length to be compatible with PBKDF2.
No, you just save the secret generated by the PBKDF2, let the PBKDF2 generate more data when required.
Never use ECB (not EBC). Use CBC as minimum. Note that CBC encryption does not provide integrity checking (somebody might change the cipher text and you might never know it) or authenticity. For that, you might want to add an additional MAC, HMAC or use an encryption mode such as GCM. PKCS7Padding (identical to PKCS5Padding in most occurences) is a simple method of adding bogus data to get N * [blocksize] bytes, required by block wise encryption.
Don't forget to prepend a (random) IV to your cipher text in case you reuse your encryption keys. An IV is similar to a salt, but should be exactly [blocksize] bytes (16 for AES).
What is the difference between Obfuscation, Hashing, and Encryption?
Here is my understanding:
Hashing is a one-way algorithm; cannot be reversed
Obfuscation is similar to encryption but doesn't require any "secret" to understand (ROT13 is one example)
Encryption is reversible but a "secret" is required to do so
Hashing is a technique of creating semi-unique keys based on larger pieces of data. In a given hash you will eventually have "collisions" (e.g. two different pieces of data calculating to the same hash value) and when you do, you typically create a larger hash key size.
obfuscation generally involves trying to remove helpful clues (i.e. meaningful variable/function names), removing whitespace to make things hard to read, and generally doing things in convoluted ways to make following what's going on difficult. It provides no serious level of security like "true" encryption would.
Encryption can follow several models, one of which is the "secret" method, called private key encryption where both parties have a secret key. Public key encryption uses a shared one-way key to encrypt and a private recipient key to decrypt. With public key, only the recipient needs to have the secret.
That's a high level explanation. I'll try to refine them:
Hashing - in a perfect world, it's a random oracle. For the same input X, you always recieve the same output Y, that is in NO WAY related to X. This is mathematically impossible (or at least unproven to be possible). The closest we get is trapdoor functions. H(X) = Y for with H-1(Y) = X is so difficult to do you're better off trying to brute force a Z such that H(Z) = Y
Obfuscation (my opinion) - Any function f, such that f(a) = b where you rely on f being secret. F may be a hash function, but the "obfuscation" part implies security through obscurity. If you never saw ROT13 before, it'd be obfuscation
Encryption - Ek(X) = Y, Dl(Y) = X where E is known to everyone. k and l are keys, they may be the same (in symmetric, they are the same). Y is the ciphertext, X is the plaintext.
A hash is a one way algorithm used to compare an input with a reference without compromising the reference.
It is commonly used in logins to compare passwords and you can also find it on your reciepe if you shop using credit-card. There you will find your credit-card-number with some numbers hidden, this way you can prove with high propability that your card was used to buy the stuff while someone searching through your garbage won't be able to find the number of your card.
A very naive and simple hash is "The first 3 letters of a string".
That means the hash of "abcdefg" will be "abc". This function can obviously not be reversed which is the entire purpose of a hash. However, note that "abcxyz" will have exactly the same hash, this is called a collision. So again: a hash only proves with a certain propability that the two compared values are the same.
Another very naive and simple hash is the 5-modulus of a number, here you will see that 6,11,16 etc.. will all have the same hash: 1.
Modern hash-algorithms are designed to keep the number of collisions as low as possible but they can never be completly avoided. A rule of thumb is: the longer your hash is, the less collisions it has.
Obfuscation in cryptography is encoding the input data before it is hashed or encrypted.
This makes brute force attacks less feasible, as it gets harder to determine the correct cleartext.
That's not a bad high-level description. Here are some additional considerations:
Hashing typically reduces a large amount of data to a much smaller size. This is useful for verifying the contents of a file without having to have two copies to compare, for example.
Encryption involves storing some secret data, and the security of the secret data depends on keeping a separate "key" safe from the bad guys.
Obfuscation is hiding some information without a separate key (or with a fixed key). In this case, keeping the method a secret is how you keep the data safe.
From this, you can see how a hash algorithm might be useful for digital signatures and content validation, how encryption is used to secure your files and network connections, and why obfuscation is used for Digital Rights Management.
This is how I've always looked at it.
Hashing is deriving a value from
another, using a set algorithm. Depending on the algo used, this may be one way, may not be.
Obfuscating is making something
harder to read by symbol
replacement.
Encryption is like hashing, except the value is dependent on another value you provide the algorithm.
A brief answer:
Hashing - creating a check field on some data (to detect when data is modified). This is a one way function and the original data cannot be derived from the hash. Typical standards for this are SHA-1, SHA256 etc.
Obfuscation - modify your data/code to confuse anyone else (no real protection). This may or may not loose some of the original data. There are no real standards for this.
Encryption - using a key to transform data so that only those with the correct key can understand it. The encrypted data can be decrypted to obtain the original data. Typical standards are DES, TDES, AES, RSA etc.
All fine, except obfuscation is not really similar to encryption - sometimes it doesn't even involve ciphers as simple as ROT13.
Hashing is one-way task of creating one value from another. The algorithm should try to create a value that is as short and as unique as possible.
obfuscation is making something unreadable without changing semantics. It involves value transformation, removing whitespace, etc. Some forms of obfuscation can also be one-way,so it's impossible to get the starting value
encryption is two-way, and there's always some decryption working the other way around.
So, yes, you are mostly correct.
Obfuscation is hiding or making something harder to understand.
Hashing takes an input, runs it through a function, and generates an output that can be a reference to the input. It is not necessarily unique, a function can generate the same output for different inputs.
Encryption transforms the input into an output in a unique manner. There is a one-to-one correlation so there is no potential loss of data or confusion - the output can always be transformed back to the input with no ambiguity.
Obfuscation is merely making something harder to understand by intruducing techniques to confuse someone. Code obfuscators usually do this by renaming things to remove anything meaningful from variable or method names. It's not similar to encryption in that nothing has to be decrypted to be used.
Typically, the difference between hashing and encryption is that hashing generally just employs a formula to translate the data into another form where encryption uses a formula requiring key(s) to encrypt/decrypt. Examples would be base 64 encoding being a hash algorithm where md5 being an encryption algorithm. Anyone can unhash base64 encoded data, but you can't unencrypt md5 encrypted data without a key.
Is it recommended that I use an initialization vector to encrypt/decrypt my data? Will it make things more secure? Is it one of those things that need to be evaluated on a case by case basis?
To put this into actual context, the Win32 Cryptography function, CryptSetKeyParam allows for the setting of an initialization vector on a key prior to encrypting/decrypting. Other API's also allow for this.
What is generally recommended and why?
An IV is essential when the same key might ever be used to encrypt more than one message.
The reason is because, under most encryption modes, two messages encrypted with the same key can be analyzed together. In a simple stream cipher, for instance, XORing two ciphertexts encrypted with the same key results in the XOR of the two messages, from which the plaintext can be easily extracted using traditional cryptanalysis techniques.
A weak IV is part of what made WEP breakable.
An IV basically mixes some unique, non-secret data into the key to prevent the same key ever being used twice.
In most cases you should use IV. Since IV is generated randomly each time, if you encrypt same data twice, encrypted messages are going to be different and it will be impossible for the observer to say if this two messages are the same.
Take a good look at a picture (see below) of CBC mode. You'll quickly realize that an attacker knowing the IV is like the attacker knowing a previous block of ciphertext (and yes they already know plenty of that).
Here's what I say: most of the "problems" with IV=0 are general problems with block encryption modes when you don't ensure data integrity. You really must ensure integrity.
Here's what I do: use a strong checksum (cryptographic hash or HMAC) and prepend it to your plaintext before encrypting. There's your known first block of ciphertext: it's the IV of the same thing without the checksum, and you need the checksum for a million other reasons.
Finally: any analogy between CBC and stream ciphers is not terribly insightful IMHO.
Just look at the picture of CBC mode, I think you'll be pleasantly surprised.
Here's a picture:
http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation
link text
If the same key is used multiple times for multiple different secrets patterns could emerge in the encrypted results. The IV, that should be pseudo random and used only once with each key, is there to obfuscate the result. You should never use the same IV with the same key twice, that would defeat the purpose of it.
To not have to bother keeping track of the IV the simplest thing is to prepend, or append it, to the resulting encrypted secret. That way you don't have to think much about it. You will then always know that the first or last N bits is the IV.
When decrypting the secret you just split out the IV, and then use it together with the key to decrypt the secret.
I found the writeup of HTTP Digest Auth (RFC 2617) very helpful in understanding the use and need for IVs / nonces.
Is it one of those things that need to be evaluated on a case by case
basis?
Yes, it is. Always read up on the cipher you are using and how it expects its inputs to look. Some ciphers don't use IVs but do require salts to be secure. IVs can be of different lengths. The mode of the cipher can change what the IV is used for (if it is used at all) and, as a result, what properties it needs to be secure (random, unique, incremental?).
It is generally recommended because most people are used to using AES-256 or similar block ciphers in a mode called 'Cipher Block Chaining'. That's a good, sensible default go-to for a lot of engineering uses and it needs you to have an appropriate (non-repeating) IV. In that instance, it's not optional.
The IV allows for plaintext to be encrypted such that the encrypted text is harder to decrypt for an attacker. Each bit of IV you use will double the possibilities of encrypted text from a given plain text.
For example, let's encrypt 'hello world' using an IV one character long. The IV is randomly selected to be 'x'. The text that is then encrypted is then 'xhello world', which yeilds, say, 'asdfghjkl'. If we encrypt it again, first generate a new IV--say we get 'b' this time--and encrypt like normal (thus encrypting 'bhello world'). This time we get 'qwertyuio'.
The point is that the attacker doesn't know what the IV is and therefore must compute every possible IV for a given plain text to find the matching cipher text. In this way, the IV acts like a password salt. Most commonly, an IV is used with a chaining cipher (either a stream or block cipher). In a chaining block cipher, the result of each block of plain text is fed to the cipher algorithm to find the cipher text for the next block. In this way, each block is chained together.
So, if you have a random IV used to encrypt the plain text, how do you decrypt it? Simple. Pass the IV (in plain text) along with your encrypted text. Using our fist example above, the final cipher text would be 'xasdfghjkl' (IV + cipher text).
Yes you should use an IV, but be sure to choose it properly. Use a good random number source to make it. Don't ever use the same IV twice. And never use a constant IV.
The Wikipedia article on initialization vectors provides a general overview.