Is it possible to use Kubernetes secrets together with Google Composer in order to access secrets from Airflow workers?
We are using k8s secrets with our existing standalone k8s Airflow cluster and were hoping we can achieve the same with Google Composer.
By default, Kubernetes secrets are not exposed to the Airflow workers deployed by Cloud Composer. You can patch the deployments to add them (airflow-worker and airflow-scheduler), but there will be no guarantee that they won't be reverted if you perform an update on the environment (such as configuration update or in-place upgrade).
It's probably easiest to use Airflow connections (which are encrypted in the metadata database using Fernet), or to launch new pods using KubernetesPodOperator/GKEPodOperator and mount the relevant secrets into the pod at launch.
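For illustration, a minimal sketch of the KubernetesPodOperator approach on Airflow 1.10.x, assuming a Kubernetes Secret named airflow-secrets already exists in the cluster (all names below are placeholders):

from datetime import datetime

from airflow import DAG
from airflow.contrib.kubernetes.secret import Secret
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

# Hypothetical Kubernetes Secret "airflow-secrets" with a key "sql_alchemy_conn",
# exposed to the launched pod as the environment variable SQL_CONN.
secret_env = Secret(
    deploy_type='env',
    deploy_target='SQL_CONN',
    secret='airflow-secrets',
    key='sql_alchemy_conn',
)

dag = DAG('use_k8s_secret', start_date=datetime(2020, 1, 1), schedule_interval=None)

consume_secret = KubernetesPodOperator(
    task_id='consume-secret',
    name='consume-secret',
    namespace='default',
    image='ubuntu:18.04',
    cmds=['bash', '-c', 'echo "the secret is available as $SQL_CONN"'],
    secrets=[secret_env],
    dag=dag,
)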
Kubernetes secrets are available to the Airflow workers. You can contribute components for whatever API you wish to call so that they work natively in Airflow, allowing the credentials to be stored as a Connection in Airflow's metadata database, which is encrypted at rest. Using an Airflow connection involves storing the secret key in GCS with an appropriate ACL and configuring Composer to secure the connection.
You can write your own custom operator to access the secret in Kubernetes and use it. Take a look at SimpleHttpOperator - the same pattern can be applied to any arbitrary secret management scheme. This is for scenarios that access external services which aren't explicitly supported by Airflow Connections, Hooks, and Operators; see the sketch below.
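As a rough, hypothetical sketch of that pattern (the operator name, connection ID, and endpoint are made up), a custom operator can pull its credentials from an Airflow Connection and call the external service directly:

import requests

from airflow.hooks.base_hook import BaseHook
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class MyExternalServiceOperator(BaseOperator):
    """Hypothetical operator that calls an external API with credentials stored
    as an Airflow Connection (encrypted with Fernet in the metadata database)."""

    @apply_defaults
    def __init__(self, conn_id='my_external_service', endpoint='/status', *args, **kwargs):
        super(MyExternalServiceOperator, self).__init__(*args, **kwargs)
        self.conn_id = conn_id
        self.endpoint = endpoint

    def execute(self, context):
        # Reads host/login/password from the encrypted metadata database.
        conn = BaseHook.get_connection(self.conn_id)
        response = requests.get(
            'https://{}{}'.format(conn.host, self.endpoint),
            auth=(conn.login, conn.password),
        )
        response.raise_for_status()
        return response.text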
I hope it helps.
Related
My code is in R, and I need to access an external database. I am storing the database credentials in AWS Secrets Manager.
I first tried using the paws library to get AWS secrets in R, but that would require storing an access key, secret ID, and session token, and I want to avoid that.
Is there a better way to do this? I have created an IAM role for SageMaker. Is it possible to pass secrets as environment variables?
Edit: I wanted to trigger SageMaker Processing.
I found a simple solution to this: environment variables can be passed via the SageMaker SDK, which minimizes the dependencies.
https://sagemaker.readthedocs.io/en/stable/api/training/processing.html
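A minimal sketch of what that looks like with the Processing API (the role ARN, image URI, script name, and variable names are all placeholders):

from sagemaker.processing import ScriptProcessor

# All identifiers below are hypothetical; replace with your own values.
processor = ScriptProcessor(
    role='arn:aws:iam::123456789012:role/MySageMakerRole',
    image_uri='123456789012.dkr.ecr.us-east-1.amazonaws.com/my-r-image:latest',
    command=['Rscript'],
    instance_count=1,
    instance_type='ml.m5.xlarge',
    env={'DB_SECRET_NAME': 'prod/my-db/credentials'},  # visible inside the processing job
)

processor.run(code='process.R')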
As another answer suggested, paws can also be used to get secrets from AWS. That would be a better approach.
You should be able to use Paws for this. According to the documentation, it will use the IAM role configured for your SageMaker instance:
If you are running the package on an instance with an appropriate IAM role, Paws will use it automatically and you don’t need to do anything extra.
You only have to add the relevant access permissions (e.g. allowing ssm:GetParameters, or secretsmanager:GetSecretValue if you are using Secrets Manager) to the SageMaker IAM role.
I'm using two firebase projects: one for development and staging, and another for production. The Firebase CLI allows me to switch projects with firebase use _____.
For the client I'm using create-react-app and implicitly configuring Firebase via the reserved Hosting URLs.
The trouble comes with configuring each project's connection to third-party services. For most services I have separate accounts, so I need different keys (and secrets on the server) for development and production.
For firebase functions, I can use functions config vars for each project. Pretty easy.
But what's the best way to do this on the client?
create-react-app has great support for various .env files, but can I link a .env file to a firebase project rather than using their prioritization?
Or is there a way to expose the firebase functions config vars to create-react-app's start, build, and test processes as environment variables? (preferably without building all variables into the public js :-P)
What's the best way to do this?
The best way to do this seems to be to use GCP Secret Manager:
Secret Manager stores API keys, passwords, certificates, and other sensitive data. It provides convenience while improving security.
https://cloud.google.com/secret-manager/docs/quickstart
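For reference, reading a secret with a recent version of the Python client library looks roughly like this (the project and secret IDs are placeholders); note that this would run server-side or at build time, not in browser code:

from google.cloud import secretmanager

client = secretmanager.SecretManagerServiceClient()

# "my-project" and "third-party-api-key" are hypothetical names.
name = 'projects/my-project/secrets/third-party-api-key/versions/latest'
response = client.access_secret_version(request={'name': name})
api_key = response.payload.data.decode('UTF-8')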
Beware: it's a standalone GCP service, so Google charges you to store your API keys. The pricing calculation example they detail (so I'm guessing it's a typical use case) gives a monthly cost of $15.15.
That's not cheap to store dumb API keys.
The other way is to use Cloud Functions, as you did.
The benefits of using GCP Secret Manager are that it integrates with audit logging, offers version management, and lets you set fine-grained permission levels.
I'm attempting to use the KMS library in one of my DAGs, which runs a PythonOperator, but I'm encountering an error in the Airflow webserver:
details = "Cloud Key Management Service (KMS) API has not been used in project 'TENANT_PROJECT_ID' before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/cloudkms.googleapis.com/overview?project='TENANT_PROJECT_ID' then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry."
The Airflow webserver is unable to import my DAG from my host project into the tenant project (which is where the webserver runs). The DAG runs with no problem because my host project is correctly set up, but not being able to monitor it in the UI is a huge drawback.
System specifications:
softwareConfig:
  imageVersion: composer-1.8.2-airflow-1.10.3
  pypiPackages:
    google-cloud-kms: ==1.2.1
  pythonVersion: '3'
It would be nice to be able to leverage KMS and the Airflow UI; if not, I might have to add my secrets to Cloud Composer environment variables (which is not preferred).
Any known solutions on this?
The Airflow webserver is a managed component in Cloud Composer, so as others have stated, it runs in a tenant project that you (as the environment owner) do not have access to. There is currently no way to access this project.
If you have a valid use case for enabling extra APIs in the tenant project, I'd recommend submitting product feedback. You can find out how to do that from the product's public documentation (including if you want to submit a feature request to the issue tracker).
Alternatively, if you're willing to experiment, AIP-24 was an Airflow proposal called DAG database persistence that caches DAGs in the Airflow database, as opposed to parsing/importing them in the webserver (which is the reason the webserver needs KMS in this situation). If you're using Composer 1.8.1+, you can experimentally enable the feature by setting core.store_serialized_dags=True. Note that it's not guaranteed to work for all DAGs, but it may be useful to you here.
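A complementary mitigation, not specific to Composer, is to create the KMS client and perform decryption only inside the python_callable, so the webserver never touches the KMS API just to parse the DAG file. A rough sketch, with the project, key ring, key, and ciphertext all being placeholders:

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def decrypt_and_use(**context):
    # Import and call KMS only at task run time, on the workers in the host
    # project, so the webserver never needs the KMS API at DAG parse time.
    from google.cloud import kms_v1
    client = kms_v1.KeyManagementServiceClient()
    key_name = client.crypto_key_path(
        'my-host-project', 'global', 'my-keyring', 'my-key')  # placeholders
    ciphertext = b'...'  # placeholder; e.g. read from GCS or an Airflow Variable
    plaintext = client.decrypt(key_name, ciphertext).plaintext
    # ... use the decrypted secret here ...


dag = DAG('kms_example', start_date=datetime(2020, 1, 1), schedule_interval=None)

decrypt_task = PythonOperator(
    task_id='decrypt_and_use',
    python_callable=decrypt_and_use,
    provide_context=True,
    dag=dag,
)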
I'm trying to run a bash command of the pattern ssh user@host "my bash command" using the BashOperator in Airflow. This works locally because my public key is on the target machine.
But I would like to run this command in Google Cloud Composer, which is Airflow + Google Kubernetes Engine. I understand that Airflow's core components run in 3 pods named according to the pattern airflow-worker-xxxxxxxxx-yyyyy.
A naive solution was to create an SSH key for each pod and add its public key to the target machine in Compute Engine. That solution worked until today: somehow my 3 pods have changed, so my SSH keys are gone. It was definitely not the best solution.
I have 2 questions:
Why has Google Cloud Composer changed my pods?
How can I resolve my issue?
Pod restarts are not specific to Composer. I would say this is more related to Kubernetes itself:
Pods aren’t intended to be treated as durable entities.
So in general pods can be restarted for different reasons, so you shouldn't rely on any changes that you make on them.
How can I resolve my issue?
You can solve this by taking into account that Cloud Composer creates a Cloud Storage bucket and links it to your environment. You can access the different folders of this bucket from any of your workers, so you could store your key (a single key pair is enough) in "gs://bucket-name/data", which you can access through the mapped directory "/home/airflow/gcs/data". Docs here
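For example, a rough sketch of a BashOperator that uses a key stored in the data/ folder (the key filename, host, and command are placeholders):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG('ssh_example', start_date=datetime(2020, 1, 1), schedule_interval=None)

# /home/airflow/gcs/data is the worker-side mount of gs://bucket-name/data.
# ssh refuses keys with loose permissions, so copy the key locally and chmod it first.
run_remote_command = BashOperator(
    task_id='run_remote_command',
    bash_command=(
        'cp /home/airflow/gcs/data/id_rsa /tmp/id_rsa && chmod 600 /tmp/id_rsa && '
        'ssh -i /tmp/id_rsa -o StrictHostKeyChecking=no user@host "my bash command"'
    ),
    dag=dag,
)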
I am trying to test-run a basic .NET web application on Pivotal Cloud Foundry. This web application uses a MongoDB server hosted on my local machine as its database. At the moment I am limited to using the cloud infrastructure through the Apps Manager alone.
I have read the Pivotal Cloud Foundry docs about user-provided services, but cannot figure out how the connection is actually to be made. I have already come across various other options, like using MongoDB as a service (beta version), but at the moment I am not allowed access to the Operations Manager. I'm looking for an explanation of user-provided services, or of how to implement the service broker API, specifically.
I am new to Mongo as well, so any suggestion about making the connection by tweaking Mongo would help too. Thanks
The use case you describe (a web app in PCF connecting to a resource on your local machine) is not recommended.
You can create a MongoDB instance for development purposes in PCF.
$ cf marketplace
...
mlab sandbox Fully managed MongoDB-as-a-Service
...
You can create an mlab service and bind it to your application. You will then have a MongoDB instance in PCF that you can use for development purposes.
Edit:
In that case a user-provided service might help you: you pass in your remote MongoDB instance configuration, which you can then read in your application, e.g.:
cf.exe cups my-mongodb -p '{"key1":"value1","key2":"value2"}'
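Once the service is bound to your app (and the app is restaged), the credentials show up in the VCAP_SERVICES environment variable. An illustrative sketch of reading them, shown in Python for brevity (in .NET you would parse the same JSON, for example via Steeltoe or a JSON library):

import json
import os

# Credentials passed to `cups ... -p '{...}'` appear under the "user-provided" label.
vcap = json.loads(os.environ['VCAP_SERVICES'])
mongodb = next(s for s in vcap['user-provided'] if s['name'] == 'my-mongodb')
credentials = mongodb['credentials']
print(credentials['key1'], credentials['key2'])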
You can add your local MongoDB as a CUPS service to your PCF Dev.
Check out the following post.
How to create a CUPS service for mongoDB?