Check if azure databricks mount point exists from .NET - .net-core

I work on an app which does some kind of data engineering and we use Azure ADLS for data storage and Databricks for data manipulation. There are two approaches in order to retrieve the data, the first one uses the Storage Account and Storage account secret key and the other approach uses mount point. When I go with the first approach, I can successfully check, from .NET, whether the Storage account and it's corresponsive Secret key correspond to each other and return a message whether the credentials are right or not. However, I need to do the same thing with the mount point i.e. determine whether the mount point exists in dbutils.fs.mounts() or anywhere in the storage (I don't know how mount point exactly works and if it stores data in blob).
The flow for Storage account and Secret key is the following:
Try to connect using the BlobServiceClient API from Microsoft;
If it fails, return a message to the user that the credentials are invalid;
If it doesn't fail, proceed further.
I'm not that familiar with /mnt/ and stuff because I mostly do .NET but is there a way to check from .NET whether a mount point exists or not?

Mount point is just a kind of reference to the underlying cloud storage. dbutils.fs.mounts() command needs to be executed on some cluster - it's doable, but it's not fast & cumbersome.
The simplest way to check that is to use List command of DBFS REST API, passing the mount point name /mnt/<something> as path parameter. If it doesn't exist, you'll get error message RESOURCE_DOES_NOT_EXIST:
{
"error_code": "RESOURCE_DOES_NOT_EXIST",
"message": "No file or directory exists on path /mnt/test22/."
}

Related

Using Airflows S3Hook is there a way to copy objects between buckets with different connection ids?

I'm copying files from an external companies bucket, they've sent me an access key/secret that I've set up as an env variable. I want to be able to copy objects from their bucket, I've used the below but that's for moving objects with the same connection, how do I use S3Hook to copy objects w. a different conn id?
s3 = S3Hook(self.aws_conn_id)
s3_conn = s3.get_conn()
ext_s3 = S3Hook(self.ext_aws_conn_id)
ext_s3 conn = ext_s3.get_conn()
#this moves objects w. the same connection...
s3_conn.copy_object(Bucket="bucket",
Key=f'dest_key',
CopySource={
'Bucket': self.partition.bucket,
'Key': key
}, ContentEncoding='csv')
From my point of view this is impossible. First of all, you can only declare one URL endpoint.
Secondly, Airflow S3Hook work with Boto3 in its background, and probably, both of your connections will have different acces_key and secret_key to create the boto3 resource/client. As explained in this post, if you wish to copy between different buckets, then you will need to use a single set of credentials that have:
GetObject permission on the source bucket
PutObject permission on the destination bucket
Again in the S3Hook, you can only declare a single set of credentials. You could maybe use the credentials given by your client and declare a bucket in your account with PutObject permission, but this will imply that you are allowed to do this in your enterprise (not very wise in terms of security), and even though your S3Hook will still only reference to one single endpoint.
To sum up, everything I have been dealing with the same problem and ended up creating two S3 connections using the first one for downloading from the original bucket and the second to upload to my enterprise bucket.

VAULT - How to unwrap Secret ID with HTTP request

I want to integrate vault with a Java application. I follow this blog to do it.
The question is when I have wrapping token, I want to Unwrap it in step number 9 in the picture above with HTTP request and get the secret_id. I see the API document in here, but it require X-Vault-Token which can not store in my JAVA application. Without it the API response permission denied.
But when I use vault command: VAULT_TOKEN=xxxxxxxxxx vault unwrap -field=secret_id, it response a secret that what I want (I do not login to Vault).
Any have experience about this please help. Thank you.
In your linked diagram, step 10 (the next step) is literally "Login with Role ID and Secret ID". If you want to wrap a different secret, then you can change the pattern completely, but the blog post you're referencing wants you to use the Secret ID from the wrapped token response to then log into Vault with that role and get your final secrets.
So, take the output of SECRET_ID=$(VAULT_TOKEN=xxxxxxxxxx vault unwrap -field=secret_id), export it, and then run a resp=$(vault write auth/approle/login role_id="${ROLE_ID}" secret_id="${SECRET_ID}"); VAULT_TOKEN=$(echo "${resp}" | jq -r .auth.client_token), export the VAULT_TOKEN, then call Vault to get the secret you really want (vault kv get secret/path/to/foobar) and do something with it.
#!/usr/bin/env bash
wrap_token=$(cat ./wrapped_token.txt)
role_id=$(cat ./approle_role_id.txt)
secret_id=$(VAULT_TOKEN="${wrap_token}" vault unwrap -field=secret_id)
resp=$(vault write -format=json auth/approle/login role_id="${role_id}" secret_id="${secret_id}")
VAULT_TOKEN=$(echo "${resp}" | jq -r '.auth.client_token')
export VAULT_TOKEN
# Put a secret in a file
# Best to ensure that the fs permissions are suitably restricted
UMASK=0077 vault kv get -format=json path/to/secret > ./secret_sink.json
# Put a secret in an environment variable
SECRET=$(vault kv get -format=json path/to/secret)
export SECRET
In case you want to reduce the security of your pattern, you can read below...
If you want to avoid logging into Vault and simply give the app a secret, you can avoid many of these steps by having your trusted CI solution request the secret directly, i.e. vault kv get -wrap_ttl=24h secret/path/to/secret, and then the unwrapping step you're doing will actually contain a secret you want to use, instead of the intermediary secret that would allow you to log into Vault and establish an application identity. However, this is not recommended as it would make your CI solution want to access more secrets, which is far from least privilege, and it makes it incredibly difficult to audit where the secrets are actually being leveraged from a Vault perspective, which is one of the primary benefits of implementing a central secrets management solution like Vault.

Manually Creating a Root Token in Vault (the hard way)

Ok so I have an application that I inherited that we do not know the root token and we do not have any recovery keys or unseal keys. The problem is, we cannot authenticate into Vault at all and we also cannot have the instance go down.
I do have access to the datastore it uses (DynamoDB) and the encrypting keys. My assumption is that it would be possible in theory to manually add an entry and set a password directly on the underlying datastore instance so that we can have a root account again.
I know this one is weird but we cannot re-initialize the database.
Any thoughts on how this could be done?
You can try one of the below -
The initial root token generated at vault operator init time -- this token has no expiration
By using another root token; a root token with an expiration cannot create a root token that never expires
By using vault operator generate-root (example) with the permission of a quorum of unseal key holders
Root tokens are useful in development but should be extremely carefully guarded in production. In fact, the Vault team recommends that root tokens are only used for just enough initial setup (usually, setting up auth methods and policies necessary to allow administrators to acquire more limited tokens) or in emergencies, and are revoked immediately after they are no longer needed. If a new root token is needed, the operator generate-root command and associated API endpoint can be used to generate one on-the-fly.
You can read more here - https://www.vaultproject.io/docs/concepts/tokens
No matter how bad the breakup was with the previous administrator, call him and ask for the shards. Now. It's an emergency.
To create a root token, you need a quorum of shards. A shard is a large number that could be in base64. For example, this is what the same shard looks in both formats:
9PTUFNoCFapAvxQ2L72Iox/hmpjyHGC5PpkDj9itaMo=
f4f4d414da0215aa40bf14362fbd88a31fe19a98f21c60b93e99038fd8ad68ca
You can mix and match formats, but each shard can be only entered once.
Run the command vault status to know how many different shards you need to find. The default Threshold is 3:
$ vault status
Key Value
--- -----
Recovery Seal Type shamir
Initialized true
Sealed false
Total Recovery Shares 5
Threshold 3
If you do get your hands on some shards, enter the command vault operator generate-root and enter them at the prompt. Don't cancel the ongoing root token generation, if someone entered a shard some time in the past, Vault has it (even if you don't). vault operator generate-root -status will tell you if Vault already has some shards. Here is an example where the first shard of three was entered:
$ vault operator generate-root -status
Nonce 9f435314-ce20-4716-cea7-a083de224e4e
Started true
Progress 1/3
Complete false
OTP Length 26
If you can't find the shards, you are in trouble. You will have to find a password and read all the secrets one by one (can be scripted), ideally every version of them. You say you can't log in, so you might have to ask your user to do it.
Keep in mind that some backends (like the PKI) can't be exported manually, not even by root.

AWS Auto Scaling Launch Configuration Encrypted EBS Cloud Formation Example

I am creating cloud formation script, which will have ELB. In Auto Scaling launch configuration, I want to add encrypted EBS volume. Couldn't find an encrypted property withing blockdevicemapping. I need to encrypt volume. How can I attach an encrypted EBS volume to an EC2 instance through auto scaling launch configuration?
There is no such property for some strange reason when using launch configurations, however it is there when using blockdevicemappings with simple EC2 instances. See
launchconfig-blockdev vs ec2-blockdev
So you'll either have to use simple instances instead of autoscaling groups, or you can try this workaround:
SnapshotIds are accepted for launchconf blockdev too, and as stated here "Snapshots that are taken from encrypted volumes are automatically encrypted. Volumes that are created from encrypted snapshots are also automatically encrypted."
Create a snapshot from an encrypted empty EBS volume and use it in the CloudFormation template. If your template should work in multiple regions then of course you'll have to create the snapshot in every region and use a Mapping in the template.
As Marton says, there is no such property (unfortunately it often takes a while for CloudFormation to catch up with the main APIs).
Normally each encrypted volume you create will have a different key. However, when using the workaround mentioned (of using an encrypted snapshot) the resulting encrypted volumes will inherit the encryption key from the snapshot and all be the same.
From a cryptography point of view this is a bad idea as you potentially have multiple, different volumes and snapshots with the same key. If an attacker has access to all of these then he can potentially use differences to infer information about the key more easily.
An alternative is to write a script that creates and attaches a new encrypted volume at the boot time of a instance. This is fairly easy to do. You'll need to give the instance permissions to create and attach volumes and either have installed the AWS CLI tool or a library for your preferred scripting language. One you have that you can, from the instance that is booting, create a volume and attach it.
You can find a starting point for such a script here: https://github.com/guardian/machine-images/blob/master/packer/resources/features/ebs/add-encrypted.sh
There is an AutoScaling EBS Block Device type which provides the "Encrypted" option:
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-as-launchconfig-blockdev-template.html
Hope this helps!
AWS recently announced Default Encryption for New EBS Volumes. You can enable this per region via
EC2 Console > Settings > Always encrypt new EBS volumes
https://aws.amazon.com/blogs/aws/new-opt-in-to-default-encryption-for-new-ebs-volumes/

What's the best method for passing AWS credentials as user data to an EC2 instance?

I have a job processing architecture based on AWS that requires EC2 instances query S3 and SQS. In order for running instances to have access to the API the credentials are sent as user data (-f) in the form of a base64 encoded shell script. For example:
$ cat ec2.sh
...
export AWS_ACCOUNT_NUMBER='1111-1111-1111'
export AWS_ACCESS_KEY_ID='0x0x0x0x0x0x0x0x0x0'
...
$ zip -P 'secret-password' ec2.sh
$ openssl enc -base64 -in ec2.zip
Many instances are launched...
$ ec2run ami-a83fabc0 -n 20 -f ec2.zip
Each instance decodes and decrypts ec2.zip using the 'secret-password' which is hard-coded into an init script. Although it does work, I have two issues with my approach.
'zip -P' is not very secure
The password is hard-coded in the instance (it's always 'secret-password')
The method is very similar to the one described here
Is there a more elegant or accepted approach? Using gpg to encrypt the credentials and storing the private key on the instance to decrypt it is an approach I'm considering now but I'm unaware of any caveats. Can I use the AWS keypairs directly? Am I missing some super obvious part of the API?
You can store the credentials on the machine (or transfer, use, then remove them.)
You can transfer the credentials over a secure channel (e.g. using scp with non-interactive authentication e.g. key pair) so that you would not need to perform any custom encryption (only make sure that permissions are properly set to 0400 on the key file at all times, e.g. set the permissions on the master files and use scp -p)
If the above does not answer your question, please provide more specific details re. what your setup is and what you are trying to achieve. Are EC2 actions to be initiated on multiple nodes from a central location? Is SSH available between the multiple nodes and the central location? Etc.
EDIT
Have you considered parameterizing your AMI, requiring those who instantiate your AMI to first populate the user data (ec2-run-instances -f user-data-file) with their AWS keys? Your AMI can then dynamically retrieve these per-instance parameters from http://169.254.169.254/1.0/user-data.
UPDATE
OK, here goes a security-minded comparison of the various approaches discussed so far:
Security of data when stored in the AMI user-data unencrypted
low
clear-text data is accessible to any user who manages to log onto the AMI and has access to telnet, curl, wget, etc. (can access clear-text http://169.254.169.254/1.0/user-data)
you are vulnerable to proxy request attacks (e.g. attacker asks the Apache that may or may not be running on the AMI to get and forward the clear-text http://169.254.169.254/1.0/user-data)
Security of data when stored in the AMI user-data and encrypted (or decryptable) with easily obtainable key
low
easily-obtainable key (password) may include:
key hard-coded in a script inside an ABI (where the ABI can be obtained by an attacker)
key hard-coded in a script on the AMI itself, where the script is readable by any user who manages to log onto the AMI
any other easily obtainable information such as public keys, etc.
any private key (its public key may be readily obtainable)
given an easily-obtainable key (password), the same problems identified in point 1 apply, namely:
the decrypted data is accessible to any user who manages to log onto the AMI and has access to telnet, curl, wget, etc. (can access clear-text http://169.254.169.254/1.0/user-data)
you are vulnerable to proxy request attacks (e.g. attacker asks the Apache that may or may not be running on the AMI to get and forward the encrypted http://169.254.169.254/1.0/user-data, ulteriorly descrypted with the easily-obtainable key)
Security of data when stored in the AMI user-data and encrypted with not easily obtainable key
average
the encrypted data is accessible to any user who manages to log onto the AMI and has access to telnet, curl, wget, etc. (can access encrypted http://169.254.169.254/1.0/user-data)
an attempt to decrypt the encrypted data can then be made using brute-force attacks
Security of data when stored on the AMI, in a secured location (no added value for it to be encrypted)
higher
the data is only accessible to one user, the user who requires the data in order to operate
e.g. file owned by user:user with mask 0600 or 0400
attacker must be able to impersonate the particular user in order to gain access to the data
additional security layers, such as denying the user direct log-on (having to pass through root for interactive impersonation) improves security
So any method involving the AMI user-data is not the most secure, because gaining access to any user on the machine (weakest point) compromises the data.
This could be mitigated if the S3 credentials were only required for a limited period of time (i.e. during the deployment process only), if AWS allowed you to overwrite or remove the contents of user-data when done with it (but this does not appear to be the case.) An alternative would be the creation of temporary S3 credentials for the duration of the deployment process, if possible (compromising these credentials, from user-data, after the deployment process is completed and the credentials have been invalidated with AWS, no longer poses a security threat.)
If the above is not applicable (e.g. S3 credentials needed by deployed nodes indefinitely) or not possible (e.g. cannot issue temporary S3 credentials for deployment only) then the best method remains to bite the bullet and scp the credentials to the various nodes, possibly in parallel, with the correct ownership and permissions.
I wrote an article examining various methods of passing secrets to an EC2 instance securely and the pros & cons of each.
http://www.shlomoswidler.com/2009/08/how-to-keep-your-aws-credentials-on-ec2/
The best way is to use instance profiles. The basic idea is:
Create an instance profile
Create a new IAM role
Assign a policy to the previously created role, for example:
{
"Statement": [
{
"Sid": "Stmt1369049349504",
"Action": "sqs:",
"Effect": "Allow",
"Resource": ""
}
]
}
Associate the role and instance profile together.
When you start a new EC2 instance, make sure you provide the instance profile name.
If all works well, and the library you use to connect to AWS services from within your EC2 instance supports retrieving the credentials from the instance meta-data, your code will be able to use the AWS services.
A complete example taken from the boto-user mailing list:
First, you have to create a JSON policy document that represents what services and resources the IAM role should have access to. for example, this policy grants all S3 actions for the bucket "my_bucket". You can use whatever policy is appropriate for your application.
BUCKET_POLICY = """{
"Statement":[{
"Effect":"Allow",
"Action":["s3:*"],
"Resource":["arn:aws:s3:::my_bucket"]}]}"""
Next, you need to create an Instance Profile in IAM.
import boto
c = boto.connect_iam()
instance_profile = c.create_instance_profile('myinstanceprofile')
Once you have the instance profile, you need to create the role, add the role to the instance profile and associate the policy with the role.
role = c.create_role('myrole')
c.add_role_to_instance_profile('myinstanceprofile', 'myrole')
c.put_role_policy('myrole', 'mypolicy', BUCKET_POLICY)
Now, you can use that instance profile when you launch an instance:
ec2 = boto.connect_ec2()
ec2.run_instances('ami-xxxxxxx', ..., instance_profile_name='myinstanceprofile')
I'd like to point out that it is not needed to supply any credentials to your EC2 instance anymore. Using IAM, you can create a role for your EC2 instances. In these roles, you can set fine-grained policies that allow your EC2 instance to, for example, get a specific object from a specific S3 bucket and no more. You can read more about IAM Roles in the AWS docs:
http://docs.aws.amazon.com/IAM/latest/UserGuide/WorkingWithRoles.html
Like others have already pointed out here, you don't really need to store AWS credentials for an EC2 instance, by using IAM Roles -
https://aws.amazon.com/blogs/security/a-safer-way-to-distribute-aws-credentials-to-ec2/.
I will add that you can employ the same method also for securely storing NON-AWS credentials for you EC2 instance, like say if you have some db credentials you want to keep secure. You save the non-aws credentials on a S3 Bukcet, and use IAM role to access that bucket.
you can find more detailed information on that here - https://aws.amazon.com/blogs/security/using-iam-roles-to-distribute-non-aws-credentials-to-your-ec2-instances/

Resources