SSH public key authentication with Linux Azure DSVM through R

I'm trying to use the R AzureDSVM package to create a Linux DSVM through R, following the guide at https://raw.githubusercontent.com/Azure/AzureDSVM/master/vignettes/10Deploy.Rmd (Azure DSVM guide).
First, the guide asks you to create an Azure Active Directory application, which provides a "tenant ID", "client ID" and "user key", following the guidelines described in http://htmlpreview.github.io/?https://github.com/Microsoft/AzureSMR/blob/master/inst/doc/Authentication.html (Azure SMR Auth guide).
As I understand it, this registers an app in Azure Active Directory, creates an "authentication key" for the app (which is the user key), and associates the app with a Resource Group. I've done this successfully.
The Azure DSVM guide then creates a VM with public key authentication, similar to the following:
library(AzureSMR)
library(AzureDSVM)
TID <- "123abc" # Tenant ID
CID <- "456def" # Client ID
KEY <- "789ghi" # User key
context <- createAzureContext(tenantID = TID, clientID = CID, authKey = KEY)
resourceGroup <- "myResourceGroup"
location <- "myAzureLocation"
vmUsername <- "myVmUsername"
size <- "Standard_D1_v2"
mrsVmPassword <- "myVmPassword"
hostname <- "myVmHostname"
ldsvm <- deployDSVM(context,
                    resource.group = resourceGroup,
                    location = location,
                    hostname = hostname,
                    username = vmUsername,
                    size = size,
                    os = "Ubuntu",
                    pubkey = PUBKEY)
The guide only vaguely describes creating a public key (PUBKEY) from the user's private key, which is sent to the VM so that it can perform SSH authentication:
To get started we need to load our Azure credentials as well as the
user's ssh public key. Public keys on Linux are typically created on
the user's desktop/laptop machine and will be found within
~/.ssh/id_rsa.pub. It will be convenient to create a credentials file
to contain this information. The contents of the credentials file will
be something like the following and we assume the user creates such a
file in the current working directory, naming the file _credentials.R.
Replace with the user's username.
TID <- "72f9....db47" # Tenant ID
CID <- "9c52....074a" # Client ID
KEY <- "9Efb....4nwV....ASa8=" # User key
PUBKEY <- readLines("~/.ssh/id_rsa.pub") # For Linux DSVM
My question:
Is this public key PUBKEY generated from the authentication/user key created by setting up the Azure Active Directory application in the Azure SMR Auth guide (the KEY variable in the script above)? If so, how? I've tried using pubkey(charToRaw(KEY)) from the R sodium library to do this, but I get "Invalid key, must be exactly 32 bytes".
If PUBKEY isn't generated from KEY, from what is it generated? And how does the package know how to authenticate with the private key to this public key?

The AAD authentication key (KEY) is only used for authenticating to Azure Active Directory; it is not related to SSH. The SSH public/private keypair is separate and is used for authenticating to the VM itself. If you do not have a public key (in the file ~/.ssh/id_rsa.pub), you can create one using ssh-keygen on Linux.
SSH connections use the matching private key (in ~/.ssh/id_rsa) by default.
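A minimal sketch, reusing the variables from the question's script and assuming ssh-keygen's default key location (adjust the path if your key lives elsewhere):
# Generate a keypair once with ssh-keygen (accept the default location),
# then read only the *public* half and pass it as the pubkey argument:
PUBKEY <- readLines("~/.ssh/id_rsa.pub")
ldsvm <- deployDSVM(context,
                    resource.group = resourceGroup,
                    location = location,
                    hostname = hostname,
                    username = vmUsername,
                    size = size,
                    os = "Ubuntu",
                    pubkey = PUBKEY)
# The private key (~/.ssh/id_rsa) never leaves your machine; ssh uses it
# automatically when you later connect to the deployed VM.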

A couple of things in addition to Paul Shealy's (correct) answer:
ssh-keygen is also installed on recent versions of Windows 10 Pro, along with ssh, scp and curl. Otherwise, you probably have the PuTTY SSH client installed, in which case you can use PuTTYgen to save a public/private key pair.
AzureDSVM is rather old and depends on AzureSMR, which is no longer actively maintained. If you want to deploy a DSVM, I'd recommend the AzureVM package, which is on CRAN and GitHub. It in turn builds on the AzureRMR package, which provides a general framework for managing Azure resources.
library(AzureVM)
az <- AzureRMR::az_rm$new(tenant="youraadtenant", app="yourapp_id", password="password")
sub <- az$get_subscription("subscription_id")
rg <- sub$get_resource_group("rgname")
vm <- rg$create_vm(os = "Ubuntu",
                   username = "yourname",
                   passkey = readLines("~/.ssh/id_rsa.pub"),
                   userauth_type = "key")
Have a look at the AzureRMR and AzureVM vignettes for more information.
Disclaimer: I'm the author of AzureRMR and AzureVM.

Related

How to hide sensitive data from node.conf?

Can someone please give me an example of the corporatePasswordStore that is mentioned here:
https://docs.corda.net/node-administration.html#id2
I've been doing a lot of research over the last few days on how to hide the plain-text passwords in node.conf; it's a new topic for me, and this is what I've come up with so far:
Create a priv/pub key with gpg2
Create a password store with pass (using the key that I generated earlier).
Store all the plain passwords from node.conf inside that password store.
Replace the plain passwords in node.conf with environment variables (e.g. keyStorePassword = ${KEY_PASS})
Create a script file (e.g. start_node.sh) that will do the following:
a. Set an environment variable to one of the passwords from the password store: export key_store_password=$(pass node.conf/keyStorePassword)
b. Start the node: java -jar corda.jar
c. Restart the gpg agent to clear the cached passwords, otherwise you can get any password from the store without passing the passphrase: gpgconf --reload gpg-agent
Pros:
Using the bash file start_node.sh allows you to set many passwords as environment variables at once (e.g. keyStore, trustStore, db passwords, RPC user password).
Since we run the file with bash start_node.sh and not source start_node.sh, the environment variables are not exposed to the parent process (i.e. you cannot read their values in the terminal where you ran bash start_node.sh).
History commands are not enabled by default inside bash scripts.
Cons:
You can no longer have a service that starts automatically on VM startup, because the start_node.sh script asks for the passphrase of the gpg key that was used to encrypt the passwords inside the password store (i.e. it's an interactive script).
Am I over-complicating this? Do you have an easier approach? Is it even necessary to hide the plain passwords?
I'm using Corda open source so I can't use the Configuration Obfuscator (which is for Enterprise only): https://docs.corda.r3.com/tools-config-obfuscator.html#configuration-obfuscator
I wrote a detailed article here: https://blog.b9lab.com/enabling-corda-security-with-nodes-configuration-file-412ce6a4371c, which covers the following topics:
Enable SSL for database connection.
Enable SSL for RPC connection.
Enable SSL for Corda webserver.
Enable SSL for Corda standalone shell.
Hide plain text passwords.
Set permissions for RPC users.

Scopus api wrong

I am testing whether I can retrieve the references of a paper from its DOI using the rscopus package. I use this:
library(rscopus)
library(dplyr)
auth_token_header("please_add")
akey="please_add"
object_retrieval("10.1109/ISCSLP.2014.6936630", ref = "doi")
but I receive this error:
Error in get_api_key(api_key, error = api_key_error) :
API key not found, please set option('elsevier_api_key_filename') or option('elsevier_api_key') for general use or set environment variable Elsevier_API, to be accessed by Sys.getenv('Elsevier_API')
Why do I receive it?
Please follow the steps outlined in the "Steps to get API key" section of https://github.com/muschellij2/rscopus#steps-to-get-api-key,
which is reproduced below:
In order to use this package, you need an API key from https://dev.elsevier.com/sc_apis.html. You should log in from your institution and go to Create API Key. You need to provide a label and a website URL (your personal website is fine), and agree to the terms of service.
Go to https://dev.elsevier.com/user/login. Login or create a free account.
Click "Create API Key". Put in a label, such as rscopus key. Add a website. http://example.com is fine if you do not have a site.
Read and agree to the TOS if you do indeed agree.
Add Elsevier_API = "API KEY GOES HERE" to your ~/.Renviron file, or add export Elsevier_API="API KEY GOES HERE" to your ~/.bash_profile.
Alternatively, you can set the API key using rscopus::set_api_key or with options("elsevier_api_key" = api_key). You can access the API key using rscopus::get_api_key.
You should be able to test out the API key using the interactive Scopus APIs.
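For example, once you have a key, any of the following registers it for the current R session (the key value below is a placeholder), after which the call from the question should work:
library(rscopus)
# Register the key for this session (placeholder value):
set_api_key("YOUR_ELSEVIER_API_KEY")
# ...or equivalently:
# options(elsevier_api_key = "YOUR_ELSEVIER_API_KEY")
# Sys.setenv(Elsevier_API = "YOUR_ELSEVIER_API_KEY")

# Re-run the original request:
res <- object_retrieval("10.1109/ISCSLP.2014.6936630", ref = "doi")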
A note about API keys and IP addresses
The API Key is bound to a set of IP addresses, usually your institution's. Therefore, if you are using this for a Shiny application, you must host the Shiny application on your institution's servers in some way. Also, you cannot access the Scopus API with this key if you are offsite; you must VPN into the institution's network or use a computing cluster with an institution IP.
See https://dev.elsevier.com/tecdoc_api_authentication.html

Credentials for AWS Athena ODBC connection

I want to access AWS Athena in Power BI with ODBC. I used the ODBC driver (1.0.3) that Amazon provides:
https://docs.aws.amazon.com/de_de/athena/latest/ug/connect-with-odbc.html
To access the AWS service I use the user YYY and the password XXX. To access the relevant data, our administrator created a role “ExternalAthenaAccessRole#99999”.
99999 is the ID of the account where Athena runs.
To use the ODBC driver in Power BI I created the following connection string:
Driver=Simba Athena ODBC Driver;AwsRegion=eu-central-1;S3OutputLocation=s3://query-results-bucket/testfolder;AuthenticationType=IAM Credentials;
But when I enter the user XXX with the password YYY, I get the message “We couldn’t authenticate with the credentials provided. Please try again.”.
Normally I would think that I must include the role “ExternalAthenaAccessRole#99999” in the connection string, but I couldn’t find a parameter for it in the documentation.
https://s3.amazonaws.com/athena-downloads/drivers/ODBC/SimbaAthenaODBC_1.0.3/Simba+Athena+ODBC+Install+and+Configuration+Guide.pdf
Can anybody help me how I can change the connection string so that I can access the data with the ODBC driver in Power BI?
TL;DR:
When using secret keys, do not specify "User / password"; instead always click on "default credentials" in Power BI, to force it to use the local AWS configuration (e.g. C:/...$USER_HOME/.aws/credentials).
Summarized Guide for newbies:
Prerequisites:
The AWS CLI installed locally on your laptop. If you don't have it, download the MSI installer from here:
https://docs.aws.amazon.com/cli/latest/userguide/install-windows.html
Note: this quick guide is just to configure the connection using AWS Access Keys, and not federating the credentials through any other Security layer.
Configure locally your AWS credentials.
From the Windows command prompt (cmd), execute: aws configure
Enter your AWS Access Key ID, Secret Access Key and default region; for example "eu-west-1" for Ireland.
You can get these Keys from the AWS console, IAM service, Users, select your user, Security, Create/Download Access Keys.
You should never share these keys, and it’s highly recommended to rotate these, for example, every month.
Download Athena ODBC Driver:
https://docs.aws.amazon.com/athena/latest/ug/connect-with-odbc.html
Important: download the ODBC driver that matches your Power BI installation (32- or 64-bit).
Install it on your laptop, where you have Power Bi.
Open the Windows ODBC Data Sources, add a User DSN and select the Simba Athena driver.
Always use "Default credentials" and not user/password, since it will use our local keys from Step 1.
Configure an S3 bucket, for the temporary results. You can use something like: s3://aws-athena-query-results-eu-west-1-power-bi
In the Power BI app, click on Get Data and choose ODBC.
Choose "default" credentials, to use the local AWS keys (from Step 1) and, optionally, enter a "select" query.
Click Load to load the data.
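If you want to sanity-check the same DSN and keys outside Power BI, here is a minimal sketch from R using the DBI and odbc packages. It assumes the Simba driver accepts AuthenticationType=Default Credentials (the counterpart of the "default credentials" option above); the region and S3 bucket are the placeholders from the steps above:
library(DBI)
library(odbc)

# Connect through the Simba Athena ODBC driver using the local AWS keys
# written by `aws configure` (no user/password in the connection string).
con <- dbConnect(odbc::odbc(),
                 Driver = "Simba Athena ODBC Driver",
                 AwsRegion = "eu-west-1",
                 S3OutputLocation = "s3://aws-athena-query-results-eu-west-1-power-bi",
                 AuthenticationType = "Default Credentials")

dbGetQuery(con, "SELECT 1")
dbDisconnect(con)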
Important concern: I'm afraid Power BI will load all the results of the query into local memory. So if, for example, we're bringing in 3 months of data equivalent to 3 GB, we will consume that on our local laptop.
Another important concern:
- For security reasons, you should implement KMS encryption keys. Otherwise, the data is transmitted in clear text instead of being encrypted.
Relevant reference (as listed above), where you can find the steps for this entire configuration process, but more in detail:
- https://s3.amazonaws.com/athena-downloads/drivers/ODBC/Simba+Athena+ODBC+Install+and+Configuration+Guide.pdf
Carlos.

How to locate the private key of a certificate in Windows

I want to locate the private key of a certificate in the current user certificate store in Windows. Does anyone know where the private key is saved?
This article describes where private keys are stored on a filesystem: Key Storage and Retrieval
To determine the exact file name, run the following command in the Command Prompt:
certutil -user -store my "<SerialNumber>"
where <SerialNumber> is the serial number of the target certificate. If the certificate contains a private key, the output will include a Unique Container Name field, which contains the file name.
You can view the certificates in the Microsoft Management Console (MMC). Here is a guide with the steps:
https://msdn.microsoft.com/en-us/library/ms788967(v=vs.110).aspx

What's the best method for passing AWS credentials as user data to an EC2 instance?

I have a job-processing architecture based on AWS that requires EC2 instances to query S3 and SQS. In order for running instances to have access to the API, the credentials are sent as user data (-f) in the form of a base64-encoded shell script. For example:
$ cat ec2.sh
...
export AWS_ACCOUNT_NUMBER='1111-1111-1111'
export AWS_ACCESS_KEY_ID='0x0x0x0x0x0x0x0x0x0'
...
$ zip -P 'secret-password' ec2.sh
$ openssl enc -base64 -in ec2.zip
Many instances are launched...
$ ec2run ami-a83fabc0 -n 20 -f ec2.zip
Each instance decodes and decrypts ec2.zip using the 'secret-password' which is hard-coded into an init script. Although it does work, I have two issues with my approach.
'zip -P' is not very secure
The password is hard-coded in the instance (it's always 'secret-password')
The method is very similar to the one described here
Is there a more elegant or accepted approach? Using gpg to encrypt the credentials and storing the private key on the instance to decrypt it is an approach I'm considering now but I'm unaware of any caveats. Can I use the AWS keypairs directly? Am I missing some super obvious part of the API?
You can store the credentials on the machine (or transfer, use, then remove them.)
You can transfer the credentials over a secure channel (e.g. using scp with non-interactive authentication, such as a key pair), so that you would not need to perform any custom encryption; just make sure that permissions are properly set to 0400 on the key file at all times, e.g. set the permissions on the master files and use scp -p.
If the above does not answer your question, please provide more specific details re. what your setup is and what you are trying to achieve. Are EC2 actions to be initiated on multiple nodes from a central location? Is SSH available between the multiple nodes and the central location? Etc.
EDIT
Have you considered parameterizing your AMI, requiring those who instantiate your AMI to first populate the user data (ec2-run-instances -f user-data-file) with their AWS keys? Your AMI can then dynamically retrieve these per-instance parameters from http://169.254.169.254/1.0/user-data.
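As a small illustration of that retrieval step, here is an R sketch using the httr package (an assumption; any HTTP client works). The endpoint is the link-local one quoted above and only responds from inside the instance:
library(httr)

# Fetch the per-instance parameters that were passed as user data.
res <- GET("http://169.254.169.254/1.0/user-data")
user_data <- content(res, as = "text", encoding = "UTF-8")
cat(user_data)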
UPDATE
OK, here goes a security-minded comparison of the various approaches discussed so far:
Security of data when stored in the AMI user-data unencrypted
low
clear-text data is accessible to any user who manages to log onto the AMI and has access to telnet, curl, wget, etc. (can access clear-text http://169.254.169.254/1.0/user-data)
you are vulnerable to proxy request attacks (e.g. attacker asks the Apache that may or may not be running on the AMI to get and forward the clear-text http://169.254.169.254/1.0/user-data)
Security of data when stored in the AMI user-data and encrypted (or decryptable) with easily obtainable key
low
easily-obtainable key (password) may include:
key hard-coded in a script inside an AMI (where the AMI can be obtained by an attacker)
key hard-coded in a script on the AMI itself, where the script is readable by any user who manages to log onto the AMI
any other easily obtainable information such as public keys, etc.
any private key (its public key may be readily obtainable)
given an easily-obtainable key (password), the same problems identified in point 1 apply, namely:
the decrypted data is accessible to any user who manages to log onto the AMI and has access to telnet, curl, wget, etc. (can access clear-text http://169.254.169.254/1.0/user-data)
you are vulnerable to proxy request attacks (e.g. an attacker asks the Apache that may or may not be running on the AMI to get and forward the encrypted http://169.254.169.254/1.0/user-data, which is subsequently decrypted with the easily-obtainable key)
Security of data when stored in the AMI user-data and encrypted with not easily obtainable key
average
the encrypted data is accessible to any user who manages to log onto the AMI and has access to telnet, curl, wget, etc. (can access encrypted http://169.254.169.254/1.0/user-data)
an attempt to decrypt the encrypted data can then be made using brute-force attacks
Security of data when stored on the AMI, in a secured location (no added value for it to be encrypted)
higher
the data is only accessible to one user, the user who requires the data in order to operate
e.g. file owned by user:user with mask 0600 or 0400
attacker must be able to impersonate the particular user in order to gain access to the data
additional security layers, such as denying the user direct log-on (having to pass through root for interactive impersonation) improves security
So any method involving the AMI user-data is not the most secure, because gaining access to any user on the machine (weakest point) compromises the data.
This could be mitigated if the S3 credentials were only required for a limited period of time (i.e. during the deployment process only) and if AWS allowed you to overwrite or remove the contents of user-data when done with it (but this does not appear to be the case). An alternative would be to create temporary S3 credentials for the duration of the deployment process, if possible (compromising these credentials from user-data after the deployment process is completed, once they have been invalidated with AWS, no longer poses a security threat).
If the above is not applicable (e.g. S3 credentials needed by deployed nodes indefinitely) or not possible (e.g. cannot issue temporary S3 credentials for deployment only) then the best method remains to bite the bullet and scp the credentials to the various nodes, possibly in parallel, with the correct ownership and permissions.
I wrote an article examining various methods of passing secrets to an EC2 instance securely and the pros & cons of each.
http://www.shlomoswidler.com/2009/08/how-to-keep-your-aws-credentials-on-ec2/
The best way is to use instance profiles. The basic idea is:
Create an instance profile
Create a new IAM role
Assign a policy to the previously created role, for example:
{
  "Statement": [
    {
      "Sid": "Stmt1369049349504",
      "Action": "sqs:*",
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
Associate the role and instance profile together.
When you start a new EC2 instance, make sure you provide the instance profile name.
If all works well, and the library you use to connect to AWS services from within your EC2 instance supports retrieving the credentials from the instance meta-data, your code will be able to use the AWS services.
A complete example taken from the boto-user mailing list:
First, you have to create a JSON policy document that represents what services and resources the IAM role should have access to. For example, this policy grants all S3 actions for the bucket "my_bucket". You can use whatever policy is appropriate for your application.
BUCKET_POLICY = """{
"Statement":[{
"Effect":"Allow",
"Action":["s3:*"],
"Resource":["arn:aws:s3:::my_bucket"]}]}"""
Next, you need to create an Instance Profile in IAM.
import boto
c = boto.connect_iam()
instance_profile = c.create_instance_profile('myinstanceprofile')
Once you have the instance profile, you need to create the role, add the role to the instance profile and associate the policy with the role.
role = c.create_role('myrole')
c.add_role_to_instance_profile('myinstanceprofile', 'myrole')
c.put_role_policy('myrole', 'mypolicy', BUCKET_POLICY)
Now, you can use that instance profile when you launch an instance:
ec2 = boto.connect_ec2()
ec2.run_instances('ami-xxxxxxx', ..., instance_profile_name='myinstanceprofile')
I'd like to point out that you no longer need to supply any credentials to your EC2 instance. Using IAM, you can create a role for your EC2 instances. In these roles, you can set fine-grained policies that allow your EC2 instance to, for example, get a specific object from a specific S3 bucket and nothing more. You can read more about IAM Roles in the AWS docs:
http://docs.aws.amazon.com/IAM/latest/UserGuide/WorkingWithRoles.html
As others have already pointed out here, by using IAM Roles you don't really need to store AWS credentials for an EC2 instance at all -
https://aws.amazon.com/blogs/security/a-safer-way-to-distribute-aws-credentials-to-ec2/.
I will add that you can employ the same method for securely storing NON-AWS credentials for your EC2 instance as well, say if you have some db credentials you want to keep secure. You save the non-AWS credentials in an S3 bucket, and use an IAM role to access that bucket.
You can find more detailed information on that here: https://aws.amazon.com/blogs/security/using-iam-roles-to-distribute-non-aws-credentials-to-your-ec2-instances/
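A rough sketch of that pattern from R, assuming the paws package and a hypothetical bucket/object name; the instance's IAM role supplies temporary AWS credentials via the metadata service, so nothing is hard-coded:
library(paws)

# The instance profile / IAM role provides the AWS credentials automatically,
# so no keys appear anywhere in this code.
s3 <- paws::s3()

# Hypothetical bucket and object holding the non-AWS (e.g. db) credentials.
obj <- s3$get_object(Bucket = "my-secure-config-bucket", Key = "db-credentials.json")
db_creds <- rawToChar(obj$Body)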
