Clarifying Vault key decryption process - encryption

I'm trying to understand Vault workflow w.r.t. keys, e.g.: https://www.vaultproject.io/docs/concepts/seal
As I understand,
unseal (shared) keys are provided on init
they're used to acquire the combined key
combined key is then used to decrypt a root (master) key (which is apparently stored in the sealed vault)
root key is then used to decrypt the data encryption key (or a keyring which contains it?..)
the data encryption key is then used to en/decrypt the data in Vault
I get the unseal keys on init, how can I inspect the other keys? Is it impossible / are those keys just stored somewhere internally in Vault?
Unsealing is the process of obtaining the plaintext root key necessary to read the decryption key to decrypt the data, allowing access to the Vault.
Is the data encryption key / keyring also decrypted during the unseal, or is it... maybe decrypted on each Vault operation (so only the root key is stored somewhere in plaintext after the unseal)?
Is it ok that the root key is stored in plaintext after the unseal? Or is it still protected by passwords/tokens? Or if it's just transiently used to decrypt the data encryption key / keyring, then how are those protected? I guess it has something to do with the lock icon on the image :)
I'm somewhat confused about how it all works.

Vault, like any other software, has the concept of a super user. That concept is called the "root token" (not to be confused with a root key, we'll come back to that).
A full lecture on Vault internal architecture is way beyond the scope of a StackOverflow answer. Here is my attempt to clear up the confusion, leaving the details for you to explore.
Two takeaways to begin with:
Vault stores a strong hash of the root token (if it is still valid)
Vault does not store shards at all, anywhere, ever.
Super user access (aka root token)
When Vault is initialized, it will give you two sets of data:
The almighty powerful root token
A number of Shamir shards (5 is the default, 1 in DEV mode)
The initial root token is always given out in plain text, never encrypted. It is meant to be used right away to perform the initial configuration.
Best practice is to use the root token only to:
Create a very powerful policy (let's call it almost-root)
Set up at least one authentication method
Bind one account of the authentication method to the almost-root policy
Revoke the root token (with vault token revoke -self)
At this point you can be almost-root, which should be behind some two factor auth, strong audit, and have a limited validity period (30 minutes), etc. Your CI/CD is often able to be almost-root.
Now let's say that, for security reasons, the almost-root policy does not allow changing the audit devices (adding or removing them).
To change the audit device configuration, you need to get a new root token. To get one, you must generate it from the shards you got at initialization time. It is a security measure. No one can become root on their own. You must collude with others, which decreases the payout of a wrongdoing and increases the risk of getting caught.
But let's move on and talk about generating a root token.
Shards and getting the root token
By default, Vault will give you the shards in plain text. Shards are points on a Shamir curve, that is, points on a secret polynomial as used in Shamir's Secret Sharing.
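To make "points on a curve" concrete, here is a minimal sketch of Shamir's Secret Sharing over a prime field. It is only an illustration: the names `split_secret` and `recover_secret`, the prime, and the integer secret are all assumptions made for the example, and Vault's own implementation differs in its details.

```python
import secrets

# Toy field modulus (2**127 - 1 is prime); an assumption for this sketch.
PRIME = 2**127 - 1

def split_secret(secret, shares, threshold):
    """Return `shares` points on a random polynomial whose constant term is the secret."""
    # Degree threshold-1 polynomial: any `threshold` points determine it.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    points = []
    for x in range(1, shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        points.append((x, y))
    return points

def recover_secret(points):
    """Lagrange interpolation at x = 0, which yields the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

# 5 shards with a threshold of 3, mirroring the defaults mentioned in this answer.
shards = split_secret(123456789, shares=5, threshold=3)
from_first_three = recover_secret(shards[:3])
from_last_three = recover_secret(shards[2:])
```

Any three of the five points recover the same value, while fewer than three leave the polynomial underdetermined, which is why no single shard holder can act alone.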
When you initialize Vault, you should send public keys so that the shards come out encrypted. You can provide public keys on the command line (as file names) or with keybase.io aliases.
You can then safely distribute one shard per "shard holder". Have them decrypt and test their shard right away by generating a root token and revoking it right away. You don't want to find out that a user misplaced his private key or whatever.
You must test the shards on a regular basis. People come and go, computers crash. When you get close to your threshold, you must rekey to get new shards to new "shard holders". If your enterprise has a physical safe, consider generating enough extra shards to generate a root token, decrypt them, store them on a CD and print them out, and put all of that in a sealed envelope in the safe.
Generating a root token should be rare, but we actually use it at the end of every sprint to generate an almost-root token that we give our CI/CD tool. That means that in day to day use, nobody has super powers in Vault. Define "super powers" to fit your operational reality.
Internal encryption
So the root token does not participate in the encryption at all, or else the system would stop when the root token is revoked. And you can't have 3 or more shard holders around for every decryption Vault does (and it can do a lot).
So Vault has an internal encryption key that is encrypted with key material outside of Vault. Suffice to say that it is either:
Encrypted with a key made from the shards
Encrypted with a KMS or HSM
When Vault starts, it cannot read its own storage, whatever storage backend you use. It is behind a cryptographic "barrier". You can give Vault credentials to your KMS so that it can decrypt the Key Encryption Key (KEK) that will give it its internal encryption key. That process is called auto-unseal.
Where do you store the Vault credentials to auto-unseal with a KMS or HSM you might ask? Good question, you have to start somewhere. Maybe you set your cloud policy to allow a given pod running Vault to access your KMS without a password.
If you can't set up auto-unseal, Vault will start in a sealed state. You must enter a given number of shards (3 is the default) to allow Vault to generate that KEK and unseal itself. It will run like that until restarted or manually sealed.

Related

Asymmetric Encryption (Public-Key encryption) I need clarification

I have searched for HOURS on how this works and I just can't get how this can be. The only given definitions are that public keyed encrypted message can only be decrypted by private key. To me, that's just nonsense and I will explain.
A website needs to be downloaded by your browser, which also means that JavaScript scripts and all the other stuff are accessible to anyone that fetches your website if he wishes to. This also means that this person now knows how you calculate your stuff with your public key, making it possible to decrypt it WITHOUT the private key.
I'm just trying to figure out how this works and to me it does not make sense that you CANNOT decrypt an encrypted text from a public key when you have access to all the calculations made from the side it encrypted.
I mean, when you send a password for example, first, on YOUR end, the browser's end, it encrypts the data to be received by the server. By encrypting the data from the browser's end, anyone that took a look on your source code can know how you encrypted it which now can be used to decrypt it. I am creating a new encryption system for our website where the server randomly creates a session key that can only be used by the user with the corresponding session. So only the 2 computers can talk to each other with the same key, and if you use the same key on another computer, it just won't work, as each key is stored per session and dies after a set amount of time. From what I read, this seems to be called a symmetric key system. I want to try and program my own asymmetric key system, but whatever I read, I can only conclude that no matter what encryption happens on the client's side, if a malicious person intercepts the information just before it is sent, he has access to how the encryption worked and therefore does not need the private key on the server side, as he just needs to reverse the process knowing how it was done on the client's side.
I'm starting to think myself as stupid thinking that way.
I'll add a little more information as I think my point didn't quite come across. When sending a password, say my name "David", let's name our user WebUser. We will name our malicious user BadGuy. So BadGuy happens to insert himself in between WebUser and his browser. BadGuy also receives ALL the JavaScript of the web page, permitting him to see how the calculations work before anything is sent. WebUser enters his password "David", which is submitted to the JavaScript encryption system. Right off the bat, BadGuy does not need to decrypt anything as he already caught the password. BUT when the website responds, BadGuy has all the calculations and can use the received encrypted data and decrypt it using the decryption calculations he can see in the received web page's code.
So the only thing I can understand is that asymmetric keys are used for encryption which technically is decryptable using publicly known numbers. But in the case of RSA, these 2 numbers are so large that it would take years to figure out the decryptor. As I also understand it, it is much easier to create the 2 numbers from the private number. But in any case, the encryption process usually ends up with a shared temporary secret key between the two parties for faster communication, and no one can ever prevent a BadGuy between User and Browser; but with today's technologies, the real threat is more MiTM attacks where one will sniff the network. In all cases, there is no definite way to communicate 100% of the data in an undecryptable way, as at least 50% of it is decryptable, i.e. data coming from one side or data going to the other side.
Asymmetric encryption has two keys, a public and a private key, as you correctly described, so don't feel stupid. Both keys can be used for encryption and decryption; however, data encrypted by the public key can only be decrypted by the private key, and data encrypted by the private key can only be decrypted by the public key.
As a result, in order to take part in a communication using asymmetric encryption, you will need to have both a public and a private key.
You share your public key with others, so whatever data you receive will have been encrypted with your public key. You will subsequently be able to decrypt it using your private key, which is your secret. When you send data to the other side of the communication, you encrypt it using your private key, and the other side, which has your public key, will be able to decrypt it. (This second direction is what is normally called signing: it proves the data came from you, but it does not keep it secret, since anyone holding your public key can read it.)
Consider the example of versioning. You are involved in a project with some team members. When you pull the commits of others, it is encrypted with your public key, so once it is downloaded at your end, you will be able to decrypt it via your private key. As you work and do your commits, you will push the changes into the repository, encrypted using your private key. The other side of the communication already has your public key and will be able to decrypt it. It is important that you do not share your private key with anyone, so your team-mates will not be able to impersonate you, committing malicious code in your name. You can share your public key with anyone, but it is recommended to share it only with trusted people, like your team-mates, so no one else will be able to decrypt anything encrypted by your private key.
In RSA, your public key is essentially a ridiculously large number, which is the result of multiplying two primes (the private key is derived from them). The two primes could be found by prime factorization, but since the public key is a very, very large number, the factorization would take such a long time (centuries) that no one will sit and wait for the results.
A session id is a value which identifies a session. If there is a single such value, then it is not asymmetric encryption, as there is no public and private key involved, and once someone steals the session ID, as you correctly pointed out, the malicious third person/system can impersonate the actual user and do nasty things. So the problem you have identified actually exists, but this is not a new problem and solutions have been implemented. The solution you are looking for is HTTPS. Once your site gets a proper certificate, you will be able to use asymmetric encryption safe and sound. Under the hood the server will have the public key of the user's session, while the user will use the private key to encrypt/decrypt, and if a middle man intercepts the public key of the session (which is not a session id), the malicious third person will not be able to impersonate the actual user. Read more here:
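The factorization argument can be tried out with toy numbers. The sketch below uses the classic textbook values (primes 61 and 53, exponent 17), which are assumptions chosen for readability; real keys use primes hundreds of digits long plus padding, so this is not usable encryption.

```python
# Toy RSA key generation with tiny primes (illustration only).
p, q = 61, 53                 # the secret primes
n = p * q                     # public modulus
phi = (p - 1) * (q - 1)
e = 17                        # public exponent
d = pow(e, -1, phi)           # private exponent, needs p and q to compute

message = 65
ciphertext = pow(message, e, n)    # anyone can do this with (n, e)
recovered = pow(ciphertext, d, n)  # only the holder of d can undo it

# An attacker who can factor n rebuilds the private key. This is instant for a
# 4-digit n and infeasible for the sizes used in practice.
factor = next(f for f in range(2, n) if n % f == 0)
attacker_d = pow(e, -1, (factor - 1) * (n // factor - 1))
```

Everything the browser runs (the algorithm, `n` and `e`) is public here; the only thing missing is `d`, and getting it means factoring `n`.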
https://en.wikipedia.org/wiki/Transport_Layer_Security
Extending the previous answer:
I'm just wondering how an attacker positioned between the user and his browser cannot intercept the connection details when they are clear text to begin with and to end with.
The magic here is called DH key exchange.
The symmetric encryption key is derived using Diffie–Hellman key exchange, where both sides arrive at a common encryption key without that key ever being sent over the wire.
Any "listening" party (your BadGuy) wouldn't be able to derive the session key even by sniffing the whole communication. The server will use its certificate and private key to assure the client that it is communicating with the legitimate target. This prevents an active "man in the middle" from posing as a false server.
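A minimal sketch of the Diffie–Hellman idea, using plain modular arithmetic. The modulus, the generator and the variable names are assumptions for illustration; real TLS uses standardized groups or elliptic curves and authenticates the exchange with the server's certificate.

```python
import hashlib
import secrets

# Public parameters, visible to everyone including BadGuy (toy values).
P = 2**127 - 1   # a prime modulus
G = 3            # public base

# Each side picks a secret exponent that never leaves its machine.
client_secret = secrets.randbelow(P - 2) + 1
server_secret = secrets.randbelow(P - 2) + 1

# Only these two values travel over the network.
client_public = pow(G, client_secret, P)
server_public = pow(G, server_secret, P)

# Each side combines its own secret with the other side's public value.
client_shared = pow(server_public, client_secret, P)
server_shared = pow(client_public, server_secret, P)

# Both sides hash the shared value into a symmetric session key.
client_key = hashlib.sha256(client_shared.to_bytes(16, "big")).digest()
server_key = hashlib.sha256(server_shared.to_bytes(16, "big")).digest()
```

A sniffer sees `P`, `G`, `client_public` and `server_public`, but computing the shared value from those requires one of the secret exponents.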
it does not make sense that you CANNOT decrypt an encrypted text from a public key when you have access to all the calculations made from the side it encrypted.
Asymmetric cryptography is based on so-called "trapdoor" functions. It means it is easy to calculate the function one way (e.g. encrypt data), but very difficult (not feasible) to do it the opposite way without some secret value (the private key). Indeed, it is sometimes difficult to understand, and there are a lot of constraints under which asymmetric encryption is really secure. That's why you should always use a trusted library rather than do it yourself.
By encrypting the data from the browser's end, anyone that took a look on your source code can know how you encrypted it which now can be used to decrypt it.
Not without the random secret key, which is derived between the client and server during the key exchange (see the first paragraph).
I am creating a new encryption system for our website where the server randomly creates a session key that can only be used by the user with the corresponding session.
It's one of the rules in the field of cryptography - do not design your own crypto!
That's usually a bad idea. Please note the currently used secure channels (SSL, TLS, .. based on RSA, ECC) are designed, reviewed and used by a lot of smart people who know what they are doing, how to mitigate different attack vectors. And IMHO it is still not perfect, but it's the best we have.

Is HMAC still needed if encrypted data is always saved and retrieved locally

My understanding of HMAC is that it can help to verify the integrity of encrypted data before the data is processed i.e. it can be used to determine whether or not the data being sent to a decryption routine has been modified in any way.
That being the case, is there any advantage in incorporating it into an encryption scheme if the data is never transmitted outside of the application generating it? My use case is quite simple - a user submits data (in plaintext) to the scripts I've written to store customer details. My scripts then encrypt this data and save it to the database, and my scripts then provide a way for the user to retrieve the data and decrypt it based on the record ID they supply. There is no way for my users to send encrypted data directly to the decryption routine and I don't need to provide an external API.
Therefore, is it reasonable to assume that there is a chain of trust in the application by default because the same application is responsible for writing and retrieving the data? If I add HMAC to this scheme, is it redundant in this context or is it best practice to always implement HMAC regardless of the context? I'm intending to use the Defuse library but I'd like to understand what the benefit of HMAC is to my project.
Thanks in advance for any advice or input :)
First, you should understand that there are attacks that allow an attacker to modify encrypted data without decrypting it. See Is there an attack that can modify ciphertext while still allowing it to be decrypted? on Security.SE and Malleability attacks against encryption without authentication on Crypto.SE. If an attacker gets write access to the encrypted data -- even without any decryption keys -- they could cause significant havoc.
You say that the encrypted data is "never transmitted outside of the application generating it" but in the next two sentences you say that you "save it to the database" which appears (to me) to be something of a contradiction. Trusting the processing of encrypted data in memory is one thing, but trusting its serialization to disk is quite another, especially if done by another program (such as a database system) and/or on a separate physical machine (now or in the future, as the system evolves).
The significant question here is: would it ever be possible for an attacker to modify or replace the encrypted data with alternate encrypted data, without access to the application and keys? If the attacker is an insider and runs the program as a normal user, then it's not generally possible to defend your data: anything the program allows the attacker to do is on the table. However, HMAC is relevant when write access to the data is possible for a non-user (or for a user in excess of their normal permissions). If the database is compromised, an attacker could possibly modify data with impunity, even without access to the application itself. Using HMAC verification severely limits the attacker's ability to modify the data usefully, even if they get write access.
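As a rough sketch of what that verification looks like, the snippet below tags a ciphertext before it is written to the database and checks the tag before it is handed to a decryption routine. The names `seal`, `open_sealed` and `MAC_KEY` are hypothetical, and the ciphertext is just a stand-in byte string; a maintained library would normally do this for you.

```python
import hashlib
import hmac
import secrets

# Hypothetical MAC key, kept with the application and separate from the encryption key.
MAC_KEY = secrets.token_bytes(32)

def seal(ciphertext: bytes) -> bytes:
    """Prepend an HMAC-SHA256 tag to the ciphertext before storing it."""
    tag = hmac.new(MAC_KEY, ciphertext, hashlib.sha256).digest()
    return tag + ciphertext

def open_sealed(blob: bytes) -> bytes:
    """Verify the tag and return the ciphertext, or raise if it was altered."""
    tag, ciphertext = blob[:32], blob[32:]
    expected = hmac.new(MAC_KEY, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("stored ciphertext was modified")
    return ciphertext

stored = seal(b"stand-in for encrypted customer details")
# Simulate someone with write access to the database flipping one bit.
tampered = stored[:-1] + bytes([stored[-1] ^ 1])
```

Calling `open_sealed(tampered)` raises before any decryption is attempted, which is the property that matters when the database can be written to from outside the application.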
My OCD usually dictates that implementing HMAC is always good practice, if for no other reason, to remove the warning from logs.
In your case I do not believe there is a defined upside to implementing HMAC other than ensuring the integrity of the plain text submission. Your script may encrypt the data but it would not be useful in the unlikely event that bad data is passed to it.

Change encryption key without exposing plaintext

We're designing a database system to store encrypted strings of information, with encryption and decryption performed client side using public-key cryptography.
If the key was ever changed though, this would necessitate reencrypting all the records client side, which is very impractical.
Is there any way this could be performed server side without exposing either the original (old) decryption key, or the message text?
I guess what I'm after is an associative cipher, something like this:
T( Eo(m) ) = En( Do(Eo(m) ))
where Eo(m) is the cipher text, Eo/Do the old pub/priv key pair, En the new pub key, m the message text and T the magical reencryption function.
Edit: T is calculated clientside and then sent to the server to be used.
You can't retroactively disable the old key anyway. Anyone who has access to the old data and the old key can decrypt the data no matter what you do.
I would suggest simply keeping a ring of keys. Add the new key to the ring and mark it active. Mark the old key expired. Code the client so that if it finds any data that's encrypted with an expired key, it re-encrypts it with the active key. (Or don't. What's needed depends on details of your implementation requirements.)
If desired, after a period of time, you can sweep for any data still encrypted with the old key and re-encrypt it.
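A minimal sketch of such a key ring, tracking which key is active and re-encrypting records found under an expired key. The class and method names are hypothetical, and `toy_cipher` is a deliberately insecure placeholder standing in for whatever real cipher the client uses.

```python
from dataclasses import dataclass, field
from itertools import cycle

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Placeholder only: repeating-key XOR, NOT real encryption."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

@dataclass
class KeyRing:
    keys: dict = field(default_factory=dict)     # key id -> key material
    expired: set = field(default_factory=set)    # ids of retired keys
    active_id: str = ""

    def rotate(self, key_id: str, key: bytes) -> None:
        """Add a new key, mark it active, and mark the previous one expired."""
        if self.active_id:
            self.expired.add(self.active_id)
        self.keys[key_id] = key
        self.active_id = key_id

    def encrypt(self, plaintext: bytes) -> dict:
        # Each record remembers which key protected it.
        return {"key_id": self.active_id,
                "data": toy_cipher(self.keys[self.active_id], plaintext)}

    def decrypt(self, record: dict) -> bytes:
        return toy_cipher(self.keys[record["key_id"]], record["data"])

    def refresh(self, record: dict) -> dict:
        """Re-encrypt with the active key if the record uses an expired one."""
        if record["key_id"] in self.expired:
            return self.encrypt(self.decrypt(record))
        return record

ring = KeyRing()
ring.rotate("key-1", b"old key material")
record = ring.encrypt(b"customer notes")
ring.rotate("key-2", b"new key material")
refreshed = ring.refresh(record)
```

The periodic sweep is then just a loop calling `refresh` on every stored record; old keys stay in the ring until nothing references them.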
You can't eliminate the exposure of the old key anyway, ever -- anyone who can find a backup or copy of data encrypted with the old key can decrypt it if they have the old key. Encryption keys must be protected forever or you get the fiasco that released the Wikileaks diplomatic cables to the public with the names of informants intact.
Think about your security perimeters. If you're worried about the server being compromised, consider building a harder-to-break subsystem which carries out the transcryption. You could do this with a non-network-attached server which is contacted only over a very tightly verified link protocol (over, say, a serial line), or a dedicated hardware security module. However, if you do something like this, you must think about how your keys are protected; if an attacker could steal the transient plaintext from your server, could they also steal the keys protecting it?

AES Encryption and key storage?

A few years ago, when first being introduced to ASP.net and the .NET Framework, I built a very simple online file storage system.
This system used Rijndael encryption for storing the files encrypted on the server's hard drive, and an HttpHandler to decrypt and send those files to the client.
Being one of my first projects with ASP.net and databases, not understanding much about how the whole thing works (as well as falling into the same trap described by Jeff Atwood on this subject), I decided to store freshly generated keys and IVs together with each file entry in the database.
To make things a bit clearer, encryption was only to protect files from direct access to the server, and keys were not generated by user-entered passwords.
My question is, assuming I don't want to keep one key for all files, how should I store encryption keys for best security? What is considered best practice? (i.e: On a different server, on a plain-text file, encrypted).
Also, what is the initialization vector used for in this type of encryption algorithm? Should it be constant in a system?
Keys should be protected and kept secret, simple as that. The implementation is not. Key Management Systems get sold for large amounts of money by trusted vendors because solving the problem is hard.
You certainly don't want to use the same key for each user; the more a key is used, the "easier" it becomes to break it, or at least to leak some information. AES is a block cipher: it splits the data into blocks and, in a chaining mode such as CBC, feeds the result of the previous block's encryption into the next block. An initialization vector is the initial feed into the algorithm, because at the starting point there is nothing to start with. Using random IVs with the same key lowers the risk of information leaks - the IV should be different for every single piece of data encrypted.
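In practice "a different IV for every piece of data" usually means drawing it from a secure random source at encryption time and storing it next to the ciphertext, since the IV is not a secret. A small sketch, with hypothetical helper names and a stand-in ciphertext because the cipher call itself depends on the library you pick:

```python
import secrets

AES_BLOCK_SIZE = 16  # bytes; the IV for CBC mode is one block long

def new_file_crypto_params():
    """Fresh random key and IV for one file (never reuse the pair)."""
    key = secrets.token_bytes(32)             # 256-bit key, to be kept secret
    iv = secrets.token_bytes(AES_BLOCK_SIZE)  # not secret, but must not repeat
    return key, iv

def pack(iv: bytes, ciphertext: bytes) -> bytes:
    """Store the IV in front of the ciphertext so decryption can find it."""
    return iv + ciphertext

def unpack(blob: bytes):
    """Split a stored blob back into its IV and ciphertext."""
    return blob[:AES_BLOCK_SIZE], blob[AES_BLOCK_SIZE:]

key, iv = new_file_crypto_params()
blob = pack(iv, b"stand-in ciphertext")
iv_again, ciphertext_again = unpack(blob)
```

Only the key needs the protection discussed in the rest of this answer; the IV can live in the same row or file as the encrypted data.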
How you store the keys depends on how your system is architected. I've just finished a KMS where the keys are kept away from the main system and functions to encrypt and decrypt are exposed via WCF. You send in plain text and get a reference to a key and the ciphered text back - that way the KMS is responsible for all cryptography in the system. This may be overkill in your case. If the user enters a password into your system then you could use that to generate a key pair. This keypair could then be used to encrypt a key store for that user - XML, SQL, whatever, and used to decrypt each key which is used to protect data.
Without knowing more about how your system is configured, or its purpose, it's hard to recommend anything other than "Keys must be protected, keys and IVs must not be reused."
There's a very good article on this one at http://web.archive.org/web/20121017062956/http://www.di-mgt.com.au/cryptoCreditcard.html which covers both the IV and salting issues and the problems with ECB referred to above.
It still doesn't quite cover "where do I store the key", admittedly, but after reading and digesting it, it won't be a huge leap to a solution hopefully....
As a pretty good solution, you could store your Key/IV pair in a table:
ID                    Key            IV
skjsh-38798-1298-hjj  FHDJK398720==  HFkjdf87923==
When you save an encrypted value, save the ID and a random Salt value along with it.
Then, when you need to decrypt the value, look up the key/IV pair using the ID and the salt stored with the data.
You'd want to make sure you have a good security model around the key storage. If you went with SQL server, don't grant SELECT rights to the user that accesses the database from the application. You wouldn't want to give someone access to the whole table.
What if you simply generated a key for each user, then encrypted it with a "master key"? Then make sure to have random IVs, and as long as you keep the master key secret, no one should be able to make much use of any number of stolen keys. Of course, the encryption and decryption functions would have to be server-side, and the master key must not be exposed at all, not even to the rest of the server. This would be a decent way to go about it, but obviously there are some issues: namely, if you have stored your master key unsafely, well, there goes your security. Of course, you could encrypt the master key, but then you're just kicking the can down the road. Maybe you could have an AES key encrypted with an RSA key, with the RSA key in turn secured by a secret passphrase. This would mitigate the problem, as with a decent-sized RSA key you should be good, and then you could expose the encryption functions to the client (though you still probably shouldn't), since key encryption uses the public key and it does not matter if that one is taken. For added security, you could cycle the RSA key every few months or even weeks if need be. These are just a few ideas, and I know that it isn't bulletproof, but it is more secure than just stuffing the keys in a SQL database.

What system do you use to encrypt files for a group of people (OS agnostic preferred)?

Say you have a bunch of files.
Say you can store meta data to these files.
Say, one of these meta attributes were called "encryption"
Say everyone was allowed to look at these files, but since they are encrypted, only people who know how to decrypt them can actually read the contents.
Say, for every given value of "encryption", a group of people share the knowledge on how to decrypt files marked with that value.
Say you want to be able to do this programmatically, in an OS agnostic way (if possible)
What are the values you would use for "encryption"?
How would you store the keys?
How would you organize access to the keys?
I am currently leaning towards following implementation:
the value of the field "encryption" contains the name of a key, possibly also denoting the algorithm used
each user has access to a bunch of keys. This could be defined by roles the user has in an LDAP/ActiveDirectory like structure, or they could just be files in a secure directory in the users profile/home directory.
on viewing a file, the viewer (I'm trying to build a document management system) checks the users keys and decrypts the file if a matching key was found.
What encryption would you use? Symmetric (AES)? Or Asymmetric (what are the good ones)?
Using asymmetric keys would have the additional benefit of making a difference between reading a file and writing a file: Access to the private key is necessary for writing the file, access to the public key (only semi public, as only certain roles have access to it) would allow reading the file. Am I totally mistaken here?
What are common systems to solve these problems used in small to medium sized businesses?
EDIT: It seems there are no universal solutions. So, I will state the problem I am trying to solve a little more clearly:
Imagine a Document Management System that operates in a distributed fashion: Each document is copied to various nodes in a (company controlled, private) P2P network. An algorithm for assuring redundancy of documents is used to ensure backups of all documents (including revisions). This system works as a service / daemon in the background and shovels documents to and fro.
This means, that users will end up with documents probably not meant for them to see on their local workstation (a company controlled PC or a laptop or something - the setting is such that a SME IT guy sets this all up and controls who is part of the P2P network).
This rules out directory access based schemes, as the user will probably be able to get to the data. Am I mistaken here? Could a local folder be encrypted such that it can only be accessed by a Domain user? How secure is that?
I am aware of users sharing decrypted versions of files - and that that is hard to suppress technically. This is not a problem I am trying to solve.
The encryption isn't the hard part, here. Understanding the business needs, and especially, what threats you're trying to protect against, is the hard part. Key management isn't a trivial thing.
I highly recommend the book "Applied Cryptography" to help you understand the protocol-level issues better.
This is a hard problem. If this is something really serious, you should not use the advice of amateur cryptographers on the internet.
That said, here's my musings:
I'd encrypt each file with a random symmetric key using AES. This encryption would be on a job that runs overnight, so the key changes overnight.
I'd encrypt the key of each file with the public key of everyone who has access to the file.
If someone loses access to files, they'd be unable to read the new copies the next day (they could still have copies locally of old versions).
I'd use gpg (runs on nearly all OS-es happily).
You misunderstand asymmetric crypto. Public key is given to everyone, Private key you keep yourself. If Alice encrypts something with Bob's Public key, only Bob can decrypt it. If Bob encrypts something with his Private key - everyone can decrypt it, and everyone knows it came from Bob cause only he has his Private Key.
EDIT: However, if you ignored everything I said and went a different route, and gave every FILE its own pub/priv keypair... then you would rely on the public key being available ONLY to those you want to read the file, and the private key being available to those you want to r/w. But that's a bit trickier, and relies heavily on people not being able to distribute keys. Overnight jobs to change keys could mitigate that problem, but then you have the problem of distributing new keys to users.
If I understand you correctly, you could use GNU Privacy Guard. It's cross-platform and open source. Basically, every user has a copy of GPG and a local "keychain" with their "private keys" and "public keys". When you want to encrypt something, you use the person's public key, and the results can only be decrypted with their associated private key. A user can have more than one keypair, so you could give all administrators access to the "administrator role" private key, and each holder of that private key could decrypt documents encrypted with the "administrator role" public key.
The cool part is that you can encrypt a file with multiple public keys, and any one of the corresponding private keys could then be used to decrypt it.
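A small sketch of how a document management service might drive that from code: it only builds the `gpg` argument list for encrypting one file to several recipients. The function name, the file names and the recipient addresses are made up for the example, and it assumes the recipients' public keys are already in the local keyring.

```python
def gpg_encrypt_command(path, recipients, output):
    """Build a gpg command line that encrypts `path` for every recipient."""
    cmd = ["gpg", "--output", output]
    for recipient in recipients:
        # One --recipient option per person or role key allowed to decrypt.
        cmd += ["--recipient", recipient]
    cmd += ["--encrypt", path]
    return cmd

cmd = gpg_encrypt_command(
    "report.pdf",
    ["alice@example.com", "admins@example.com"],  # hypothetical key identities
    "report.pdf.gpg",
)
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would run it; any one of the listed private keys can then decrypt `report.pdf.gpg`.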
The difficulty of this problem is why many businesses default to using OS-specific solutions, such as Active Directory.
For OS-agnostic, you have to re-create a lot of user-management stuff that the specific OS and/or Network vendors have already built.
But it can be done. For the encryption itself - go with AviewAnew's answer.
I have to agree with Mark here:
Understanding the business needs, and especially, what threats you're trying to protect against, is the hard part
For example; are you worried that unauthorized users may gain access to sensitive files? You can use file-level access control on virtually any operating system to restrict users or groups from accessing files/directories.
Are you worried that authorized users may copy the files locally and then lose their laptop? There are a number of OS-level encryption facilities that provide varying degrees of protection. I personally recommend TrueCrypt for thumb drives and other portable media, and Windows Vista now includes BitLocker, which provides a different level of protection.
Another variation of the lost-laptop theme is the lost-backup theme, and many backup vendors now include encryption schemes for your tape backups for just this reason.
Finally, if you're worried that authorized users may share the files with unauthorized users then you may be trying to solve the wrong problem. Authorized users who can decrypt these files can just as easily share a new unencrypted version of the same document.
What you need is public-key encryption using either OpenPGP or X.509 certificates. In both cases you can encrypt a single block of data for multiple "recipients" using their OpenPGP keys or X.509 certificates respectively. In X.509 the standards for encrypting the data this way are PKCS#7 and CMS (RFC 5652). You would need to employ some key revocation checking in order to prevent access by people who were given access before but no longer have it.

Resources