I work on an app which does some kind of data engineering and we use Azure ADLS for data storage and Databricks for data manipulation. There are two approaches in order to retrieve the data, the first one uses the Storage Account and Storage account secret key and the other approach uses mount point. When I go with the first approach, I can successfully check, from .NET, whether the Storage account and it's corresponsive Secret key correspond to each other and return a message whether the credentials are right or not. However, I need to do the same thing with the mount point i.e. determine whether the mount point exists in dbutils.fs.mounts() or anywhere in the storage (I don't know how mount point exactly works and if it stores data in blob).
The flow for Storage account and Secret key is the following:
Try to connect using the BlobServiceClient API from Microsoft;
If it fails, return a message to the user that the credentials are invalid;
If it doesn't fail, proceed further.
I'm not that familiar with /mnt/ and stuff because I mostly do .NET but is there a way to check from .NET whether a mount point exists or not?
Mount point is just a kind of reference to the underlying cloud storage. dbutils.fs.mounts() command needs to be executed on some cluster - it's doable, but it's not fast & cumbersome.
The simplest way to check that is to use List command of DBFS REST API, passing the mount point name /mnt/<something> as path parameter. If it doesn't exist, you'll get error message RESOURCE_DOES_NOT_EXIST:
{
"error_code": "RESOURCE_DOES_NOT_EXIST",
"message": "No file or directory exists on path /mnt/test22/."
}
I already followed the steps exactly specified at this link
However, I am still having the issue. My build will get stuck when accessing the private repo.
$ julia --check-bounds=yes -e 'Pkg.clone("https://github.com/xxxx/xxxx.git")'
INFO: Cloning xxxx from https://github.com/xxxx/xxxx.git
Username for 'https://github.com':
Done: Job Cancelled
Note: I manually cancel it after a few minutes of waiting. How can I get it to use the SSH key I have setup and bypass this username and password field?
Note: xxxx is used in place of the name of my project to make this post general. I have already checked out the links on Travis CI and they don't make it clear what needs to occur. Thank you!
Update: I tried to add a GitHub Token Pkg.clone("https://fake_git_hub_token#github.com/xxxx/xxxx.git") and it still prompts me to sign in with the username. I gave that token full Repo access. Also, note that I am using Travis CL Virtual Machine.
In the Travis CI docs they reference the following:
Assumptions:
The repository you are running the builds for is called “myorg/main” and depends on “myorg/lib1” and “myorg/lib2”.
You know the credentials for a user account that has at least read access to all three repositories.
To pull in dependencies with a password, you will have to use the user name and password in the Git HTTPS URL: https://ci-user:mypassword123#github.com/myorg/lib1.git.
SOLUTION:
just add TravisCIUsername:mypassword#github.com/organizer_of_the_repo/Dependancy.git
In my case, I am going to make a fake admin account to run the tests since someone will have to expose their password to use this setup. Note that you can set up 2-factor authentication on the admin account such that only one person can access it even if they know the password.
You need to add the SSH key to the Travis UI under an environmental variable for your desired repo. You also need to add the key to the .travis.yml file on that repo.
https://docs.travis-ci.com is the docs for Travis
SOLUTION: just add Travis_CI_Username:my_password#github.com/organizer_of_the_repo/Dependancy.git to the travis.yml. file.
If this is unclear, please comment and I will update, but this is how I got it to work for me(even tho I went through all the SSH key business).
In my case, I am going to make a fake admin account to run the tests since someone will have to expose their password to use this setup.
Note that you can set up 2-factor authentication on the admin account such that only one person can access it even if they know the password.
I am creating cloud formation script, which will have ELB. In Auto Scaling launch configuration, I want to add encrypted EBS volume. Couldn't find an encrypted property withing blockdevicemapping. I need to encrypt volume. How can I attach an encrypted EBS volume to an EC2 instance through auto scaling launch configuration?
There is no such property for some strange reason when using launch configurations, however it is there when using blockdevicemappings with simple EC2 instances. See
launchconfig-blockdev vs ec2-blockdev
So you'll either have to use simple instances instead of autoscaling groups, or you can try this workaround:
SnapshotIds are accepted for launchconf blockdev too, and as stated here "Snapshots that are taken from encrypted volumes are automatically encrypted. Volumes that are created from encrypted snapshots are also automatically encrypted."
Create a snapshot from an encrypted empty EBS volume and use it in the CloudFormation template. If your template should work in multiple regions then of course you'll have to create the snapshot in every region and use a Mapping in the template.
As Marton says, there is no such property (unfortunately it often takes a while for CloudFormation to catch up with the main APIs).
Normally each encrypted volume you create will have a different key. However, when using the workaround mentioned (of using an encrypted snapshot) the resulting encrypted volumes will inherit the encryption key from the snapshot and all be the same.
From a cryptography point of view this is a bad idea as you potentially have multiple, different volumes and snapshots with the same key. If an attacker has access to all of these then he can potentially use differences to infer information about the key more easily.
An alternative is to write a script that creates and attaches a new encrypted volume at the boot time of a instance. This is fairly easy to do. You'll need to give the instance permissions to create and attach volumes and either have installed the AWS CLI tool or a library for your preferred scripting language. One you have that you can, from the instance that is booting, create a volume and attach it.
You can find a starting point for such a script here: https://github.com/guardian/machine-images/blob/master/packer/resources/features/ebs/add-encrypted.sh
There is an AutoScaling EBS Block Device type which provides the "Encrypted" option:
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-as-launchconfig-blockdev-template.html
Hope this helps!
AWS recently announced Default Encryption for New EBS Volumes. You can enable this per region via
EC2 Console > Settings > Always encrypt new EBS volumes
https://aws.amazon.com/blogs/aws/new-opt-in-to-default-encryption-for-new-ebs-volumes/
I have been searching for an answer on MS, SE and Google and cannot find it. I want to use the GRS option for Azure Storage (Cloud Block Blobs) but I cannot figure out how to properly do that.
I created my storage object in Azure and chose the GRS option.
I get that I have a primary and secondary connection string and know how to get that from the Azure portal.
What I do not know, in ASP.NET 4.0, is how to set both connection strings in the CloudBlockClient and gracefully handle the primary storage being unavailable.
--What exception is thrown and where, when primary is unavailable? Is this thrown when I create the client, or when I try to get a blob reference?
-- How do I then use the secondary?
Do I have to just test for any old exception and then try using the secondary connection string in a new CloudBlockClient if the primary does not work? Or is there anything in the API for this. I would think there would be but I cannot find it.
None of the "How to use Azure Storage" tutorials I have seen go into this. Most of the documentation seems to date from before mid-2014 when this feature became generally available.
This blog post should help you. In short if you want to read from both primary and secondary you want to enable RA-GRS - essentially read access from the secondary. If you are using out storage client libraries you can also enable a retry policy that will first try to read from a primary and then from the secondary if the first read fails.
I have a job processing architecture based on AWS that requires EC2 instances query S3 and SQS. In order for running instances to have access to the API the credentials are sent as user data (-f) in the form of a base64 encoded shell script. For example:
$ cat ec2.sh
...
export AWS_ACCOUNT_NUMBER='1111-1111-1111'
export AWS_ACCESS_KEY_ID='0x0x0x0x0x0x0x0x0x0'
...
$ zip -P 'secret-password' ec2.sh
$ openssl enc -base64 -in ec2.zip
Many instances are launched...
$ ec2run ami-a83fabc0 -n 20 -f ec2.zip
Each instance decodes and decrypts ec2.zip using the 'secret-password' which is hard-coded into an init script. Although it does work, I have two issues with my approach.
'zip -P' is not very secure
The password is hard-coded in the instance (it's always 'secret-password')
The method is very similar to the one described here
Is there a more elegant or accepted approach? Using gpg to encrypt the credentials and storing the private key on the instance to decrypt it is an approach I'm considering now but I'm unaware of any caveats. Can I use the AWS keypairs directly? Am I missing some super obvious part of the API?
You can store the credentials on the machine (or transfer, use, then remove them.)
You can transfer the credentials over a secure channel (e.g. using scp with non-interactive authentication e.g. key pair) so that you would not need to perform any custom encryption (only make sure that permissions are properly set to 0400 on the key file at all times, e.g. set the permissions on the master files and use scp -p)
If the above does not answer your question, please provide more specific details re. what your setup is and what you are trying to achieve. Are EC2 actions to be initiated on multiple nodes from a central location? Is SSH available between the multiple nodes and the central location? Etc.
EDIT
Have you considered parameterizing your AMI, requiring those who instantiate your AMI to first populate the user data (ec2-run-instances -f user-data-file) with their AWS keys? Your AMI can then dynamically retrieve these per-instance parameters from http://169.254.169.254/1.0/user-data.
UPDATE
OK, here goes a security-minded comparison of the various approaches discussed so far:
Security of data when stored in the AMI user-data unencrypted
low
clear-text data is accessible to any user who manages to log onto the AMI and has access to telnet, curl, wget, etc. (can access clear-text http://169.254.169.254/1.0/user-data)
you are vulnerable to proxy request attacks (e.g. attacker asks the Apache that may or may not be running on the AMI to get and forward the clear-text http://169.254.169.254/1.0/user-data)
Security of data when stored in the AMI user-data and encrypted (or decryptable) with easily obtainable key
low
easily-obtainable key (password) may include:
key hard-coded in a script inside an ABI (where the ABI can be obtained by an attacker)
key hard-coded in a script on the AMI itself, where the script is readable by any user who manages to log onto the AMI
any other easily obtainable information such as public keys, etc.
any private key (its public key may be readily obtainable)
given an easily-obtainable key (password), the same problems identified in point 1 apply, namely:
the decrypted data is accessible to any user who manages to log onto the AMI and has access to telnet, curl, wget, etc. (can access clear-text http://169.254.169.254/1.0/user-data)
you are vulnerable to proxy request attacks (e.g. attacker asks the Apache that may or may not be running on the AMI to get and forward the encrypted http://169.254.169.254/1.0/user-data, ulteriorly descrypted with the easily-obtainable key)
Security of data when stored in the AMI user-data and encrypted with not easily obtainable key
average
the encrypted data is accessible to any user who manages to log onto the AMI and has access to telnet, curl, wget, etc. (can access encrypted http://169.254.169.254/1.0/user-data)
an attempt to decrypt the encrypted data can then be made using brute-force attacks
Security of data when stored on the AMI, in a secured location (no added value for it to be encrypted)
higher
the data is only accessible to one user, the user who requires the data in order to operate
e.g. file owned by user:user with mask 0600 or 0400
attacker must be able to impersonate the particular user in order to gain access to the data
additional security layers, such as denying the user direct log-on (having to pass through root for interactive impersonation) improves security
So any method involving the AMI user-data is not the most secure, because gaining access to any user on the machine (weakest point) compromises the data.
This could be mitigated if the S3 credentials were only required for a limited period of time (i.e. during the deployment process only), if AWS allowed you to overwrite or remove the contents of user-data when done with it (but this does not appear to be the case.) An alternative would be the creation of temporary S3 credentials for the duration of the deployment process, if possible (compromising these credentials, from user-data, after the deployment process is completed and the credentials have been invalidated with AWS, no longer poses a security threat.)
If the above is not applicable (e.g. S3 credentials needed by deployed nodes indefinitely) or not possible (e.g. cannot issue temporary S3 credentials for deployment only) then the best method remains to bite the bullet and scp the credentials to the various nodes, possibly in parallel, with the correct ownership and permissions.
I wrote an article examining various methods of passing secrets to an EC2 instance securely and the pros & cons of each.
http://www.shlomoswidler.com/2009/08/how-to-keep-your-aws-credentials-on-ec2/
The best way is to use instance profiles. The basic idea is:
Create an instance profile
Create a new IAM role
Assign a policy to the previously created role, for example:
{
"Statement": [
{
"Sid": "Stmt1369049349504",
"Action": "sqs:",
"Effect": "Allow",
"Resource": ""
}
]
}
Associate the role and instance profile together.
When you start a new EC2 instance, make sure you provide the instance profile name.
If all works well, and the library you use to connect to AWS services from within your EC2 instance supports retrieving the credentials from the instance meta-data, your code will be able to use the AWS services.
A complete example taken from the boto-user mailing list:
First, you have to create a JSON policy document that represents what services and resources the IAM role should have access to. for example, this policy grants all S3 actions for the bucket "my_bucket". You can use whatever policy is appropriate for your application.
BUCKET_POLICY = """{
"Statement":[{
"Effect":"Allow",
"Action":["s3:*"],
"Resource":["arn:aws:s3:::my_bucket"]}]}"""
Next, you need to create an Instance Profile in IAM.
import boto
c = boto.connect_iam()
instance_profile = c.create_instance_profile('myinstanceprofile')
Once you have the instance profile, you need to create the role, add the role to the instance profile and associate the policy with the role.
role = c.create_role('myrole')
c.add_role_to_instance_profile('myinstanceprofile', 'myrole')
c.put_role_policy('myrole', 'mypolicy', BUCKET_POLICY)
Now, you can use that instance profile when you launch an instance:
ec2 = boto.connect_ec2()
ec2.run_instances('ami-xxxxxxx', ..., instance_profile_name='myinstanceprofile')
I'd like to point out that it is not needed to supply any credentials to your EC2 instance anymore. Using IAM, you can create a role for your EC2 instances. In these roles, you can set fine-grained policies that allow your EC2 instance to, for example, get a specific object from a specific S3 bucket and no more. You can read more about IAM Roles in the AWS docs:
http://docs.aws.amazon.com/IAM/latest/UserGuide/WorkingWithRoles.html
Like others have already pointed out here, you don't really need to store AWS credentials for an EC2 instance, by using IAM Roles -
https://aws.amazon.com/blogs/security/a-safer-way-to-distribute-aws-credentials-to-ec2/.
I will add that you can employ the same method also for securely storing NON-AWS credentials for you EC2 instance, like say if you have some db credentials you want to keep secure. You save the non-aws credentials on a S3 Bukcet, and use IAM role to access that bucket.
you can find more detailed information on that here - https://aws.amazon.com/blogs/security/using-iam-roles-to-distribute-non-aws-credentials-to-your-ec2-instances/