Providing a decryption key with gcloud ml-engine jobs submit training

I have successfully trained my first network with the Google Cloud ML Engine, and now I am trying to make the setup a bit more secure by providing my own encryption key for encrypting the data. As explained in the manual, I have now copied my data to Cloud Storage encrypted with my own custom encryption key, instead of storing it there unencrypted.
However, now my setup (obviously!) broke, as the Python code I submit to the ML Engine cannot decrypt the files. I was expecting an option like --decrypt-key on gcloud ml-engine jobs submit training, but I cannot find one. How do I provide this key so that my code can decrypt the data?

Short answer: You should not pass the decryption key into the training job. Instead, see https://cloud.google.com/kms/docs/store-secrets
Long answer: While you could technically make the decryption key a flag that gets passed through the Training Job definition, this would expose it to anyone with access to List Training Jobs. You should instead place the key in the Google Cloud Key Management Service and give the service account running the ML training job permission to fetch the key from there.
You can determine the service account that runs the training job by following the procedure listed at https://cloud.google.com/ml-engine/docs/how-tos/working-with-data#using_a_cloud_storage_bucket_from_a_different_project
Edit: Also note what Alexey says in the comment below; TensorFlow currently won't be able to read and decrypt the files directly from GCS, so you'll need to copy them to local disk on every worker, with the keys supplied to gsutil cp.
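To make that per-worker copy step concrete, here is a minimal Python sketch of the gsutil invocation each worker could run, assuming a base64-encoded customer-supplied encryption key (CSEK). The bucket path, destination, and key value are placeholders; gsutil normally reads CSEKs from its .boto config, and the top-level -o flag lets you supply the decryption_key1 option on the command line instead of writing the key to disk:

```python
# Hypothetical per-worker copy step: build the gsutil argv that fetches
# CSEK-encrypted training data to local disk before TensorFlow reads it.
# The bucket path, destination, and base64 key below are placeholders.

def build_copy_command(src_uri: str, dest_dir: str, csek_b64: str) -> list:
    """Supply the CSEK via a boto-config override (-o) so no .boto file
    holding the key has to be written out."""
    return [
        "gsutil",
        "-o", f"GSUtil:decryption_key1={csek_b64}",
        "cp", "-r", src_uri, dest_dir,
    ]

cmd = build_copy_command("gs://my-bucket/training-data", "/tmp/data", "ZHVtbXlrZXk=")
print(cmd)
# In the worker startup code you would then run it, e.g.:
# subprocess.check_call(cmd)
```

The key itself would be fetched at runtime by the job's service account (e.g. from Cloud KMS, as described above) rather than baked into the job definition, where List Training Jobs would expose it.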

Related

How to use GCP KMS with Firebase and Firebase Cloud Functions

I need some advice on how to properly set up a solid security structure for my app.
What my app does
The goal of this app is to provide a data aggregation service.
To do this, the user needs to provide login data for a variety of their accounts.
The user can then trigger a Firebase cloud function which performs web scraping with the provided credentials, encrypts the resulting data, and stores it in Firestore.
Current encryption
Currently, the encryption key is stored in a separate document in the Firestore database.
The cloud function gets the key, performs de- and encryption, and stores the data as a cipher.
Now I know this is kind of pointless, because if someone hacked my Google account, the data would still be readable to them.
Problems
Besides this security flaw, I am facing some other problems.
As long as the described encryption only happens in a cloud function, this may be relatively secure because the cloud functions are isolated. My problem is that there is no way to perform a database query from the client, because:
there is no "onRead" cloud function, in which I would decrypt the data before sending it to the client
decrypting the data on the client would expose the encryption key to potential hackers (at least that's what I am thinking at the moment)
Conclusion
I have decided to try GCP's KMS which seems to be the solution to all of these problems. However, I am overwhelmed by all these new terms and most of the resources I found seemed outdated.
The closest I came was this post: http://www.geero.net/2017/05/how-to-encrypt-a-google-firebase-realtime-database/ but since it's from 2017 it seems to be outdated (as far as I understand from this answer).
So I am a bit lost on where to start, what to use, and how to manage responsibilities.
Questions
1. Is it possible to create secure client-side decryption with KMS? If not, how should this be handled?
2. How do I implement KMS with Firebase and Firebase Cloud Functions? (Any pointer in the right direction would help.)
3. Do you recommend using this package for the cloud function implementation: https://www.npmjs.com/package/@google-cloud/kms
4. Did you spot any other security flaws besides those I mentioned?
5. Do you have any additional advice?
Thanks in advance!

DPAPI Key Storage and Restoration

In light of the upcoming GDPR regulations, the company I work for is looking at upgrading its encryption algorithms and encrypting significantly more data than before. As the one appointed to take care of this, I have replaced our old CAST-128 encryption (I say encryption, but with no salt and the same ciphertext produced every time, it was more like hashing) with AES-256 and written the tools to migrate the data. However, the encryption key is still hardcoded in the application, and extractable within a couple of minutes with a disassembler.
Our product is a desktop application, which most of our clients have installed in-house. Most of them are also hosting their own DBs. Since they have the entirety of the product locally, securing the key seems like a pretty difficult task.
After some research, I've decided to go with the following approach. During installation, a random 256-bit key will be generated for every customer and used to encrypt their data with AES. The key itself will then be encrypted with DPAPI in user mode, where the only user who can access the data will be a newly created, locked-down domain service account with limited permissions, which is unable to actually log in to the machine. The encrypted key will then be stored in an ACL-ed part of the registry. The encryption module will then impersonate that user to perform its functions.
The problem is that since the key will be randomly generated at install time, and encrypted immediately, not even we will have it. If customers happen to delete this account, reinstall the server OS, or manage to lose the key in some other manner, the data will be unrecoverable. So after all that exposition, here comes the actual question:
I am thinking of having customers back up the registry where the key is stored and assuming that even after a reinstall or user deletion, as long as the same user account is created with the same password, on the same machine, it will create the same DPAPI secrets and be able to decrypt the key. However, I do not know whether or not that is the case since I'm not sure how these secrets are generated in the first place. Can anyone confirm whether or not this is actually the case? I'm also open to suggestions for a completely different key storage approach if you can think of a better one.
I don't see the link with GDPR, but let's say this is just context.
It takes more than the user account, its password, and the machine: there is more entropy added to the ciphering of data with DPAPI.
See: https://msdn.microsoft.com/en-us/library/ms995355.aspx#windataprotection-dpapi_topic02
A small drawback to using the logon password is that all applications running under the same user can access any protected data that they know about. Of course, because applications must store their own protected data, gaining access to the data could be somewhat difficult for other applications, but certainly not impossible. To counteract this, DPAPI allows an application to use an additional secret when protecting data. This additional secret is then required to unprotect the data.
Technically, this "secret" should be called secondary entropy. It is secondary because, while it doesn't strengthen the key used to encrypt the data, it does increase the difficulty of one application, running under the same user, to compromise another application's encryption key. Applications should be careful about how they use and store this entropy. If it is simply saved to a file unprotected, then adversaries could access the entropy and use it to unprotect an application's data. Additionally, the application can pass in a data structure that will be used by DPAPI to prompt the user. This "prompt structure" allows the user to specify an additional password for this particular data. We discuss this structure further in the Using DPAPI section.

Encrypting the database at rest without paying?

Right now the only way to encrypt a Cassandra database at rest seems to be with their enterprise edition which costs thousands of dollars: How to use Cassandra with TDE (Transparent Data Encryption)
Another solution is to encrypt every value before it enters the database, but then the key will be stored somewhere on every server in plaintext and would be easy to find.
I understand they offer "free" use for certain companies, but this is not an option and I am not authorized to pay $2000/server. How do traditional companies encrypt their distributed databases?
Thanks for the advice
I took the approach of encrypting the data disk on AWS. I added a new volume to the instance and checked the option to encrypt the volume. Then I edited cassandra.yaml to point to the encrypted volume.
We implemented a similar requirement in one of our projects. Basically, I made use of the trigger feature in Cassandra with a custom implementation to perform the encryption. It seems to be working fine for us.
You can refer to the docs below on how to create a trigger, and to a sample implementation of the ITrigger interface:
https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlCreateTrigger.html
https://github.com/apache/cassandra/blob/2e5847d29bbdd45fd4fc73f071779d91326ceeba/examples/triggers/src/org/apache/cassandra/triggers/AuditTrigger.java
Encrypting before inserting is a good way. The keys will either be on each application server or on each Cassandra node. There isn't much difference really; either way you should use filesystem permissions to restrict access to the key to just the app's user. There are steps to get more secure from there, like requiring a passphrase to be entered on startup versus storing the key on disk, but they make operational tasks horrific.
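As a sketch of that encrypt-before-insert flow, here is a minimal Python example of wrapping a column value with a random nonce before it reaches Cassandra. The keystream below is a toy SHA-256 counter construction for illustration only; in production you would use AES-GCM from a real crypto library, and load the key from a permission-restricted file as described above:

```python
import hashlib
import secrets

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy SHA-256 counter-mode keystream -- illustration only, NOT real crypto."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_value(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt one column value client-side before the INSERT; the nonce is
    prepended so the stored value stays self-contained in the row."""
    nonce = secrets.token_bytes(16)
    ks = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, ks))

def decrypt_value(key: bytes, blob: bytes) -> bytes:
    """Reverse of encrypt_value: split off the nonce and XOR again."""
    nonce, ct = blob[:16], blob[16:]
    ks = _keystream(key, nonce, len(ct))
    return bytes(a ^ b for a, b in zip(ct, ks))

key = secrets.token_bytes(32)  # in practice: loaded from a restricted key file
stored = encrypt_value(key, b"alice@example.com")
assert decrypt_value(key, stored) == b"alice@example.com"
```

Because the nonce is random per value, the same plaintext never produces the same stored ciphertext twice, unlike the deterministic schemes criticized elsewhere on this page.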

How do I hide my secret key for data bag encryption on a standalone node using Chef Client local mode

I have a Windows server running Chef Client in local mode. I would like to use encrypted data bags for users and passwords, but this becomes an issue since the secret key will need to be stored locally. What are my best options for enabling encrypted data bags and also having a secure secret key?
This isn't what encrypted data bags do. The purpose of that feature is to prevent disclosing the contents to the Chef Server. From the PoV of the client, it is in the clear because it has to have the decryption key. If you have only a single node, there isn't much value in the encryption for Chef. It might still be useful if you are storing that data in a git repo or similar, but in those cases you are probably better off with another solution. Check out https://coderanger.net/chef-secrets/ for a summary of the options.

EncryptByKey versus EncryptByPassPhrase?

What are your thoughts about SQL Server's symmetric key functions? Specifically, I have two questions:
Which set of functions is better... EncryptByKey or EncryptByPassPhrase?
Both functions require a passphrase of some kind. In a typical web-application architecture, where should this passphrase be stored? (i.e., hard coded within a stored procedure in the database, or stored as a configuration setting in the web application)
I'm eager to see what the best practice is for these functions.
Encrypting using a passphrase is easier, but the advantage of using a key is that the key can be secured using built-in SQL Server roles. You can lock down use of the key to only those users that require access to that data.
If you use a certificate then you only need plain text during the initial setup and can store it outside your system. Again, the certificate is a securable object and can be locked down.
Hope this helps.
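One common pattern for the second question is to keep the passphrase in the web application's configuration and pass it to the query as a bound parameter, so it never appears in stored-procedure source or in the SQL text itself. A minimal Python sketch; the table, column, and environment-variable names are hypothetical:

```python
import os

# Hypothetical names: SECRETS_PASSPHRASE, dbo.Customers, CardNumber.
# The passphrase lives in the app's configuration/environment, not in the
# database, so DBAs and stored-procedure source never see it.
passphrase = os.environ.get("SECRETS_PASSPHRASE", "change-me")

# Parameterized T-SQL: the driver (e.g. pyodbc) substitutes the ? markers,
# so the passphrase never appears in the literal SQL text sent to the server.
insert_sql = (
    "INSERT INTO dbo.Customers (Name, CardNumber) "
    "VALUES (?, EncryptByPassPhrase(?, ?))"
)
select_sql = (
    "SELECT Name, CONVERT(varchar(64), "
    "DecryptByPassPhrase(?, CardNumber)) AS CardNumber "
    "FROM dbo.Customers"
)
# With a live connection you would run, e.g.:
# cursor.execute(insert_sql, ("alice", passphrase, "4111-1111-1111-1111"))
```

The trade-off versus EncryptByKey is that here access control rests entirely on who can read the app's configuration, rather than on SQL Server's own permission model.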
