I'm trying to use the KMS library in one of my DAGs, which uses the PythonOperator, but I'm encountering an error in the Airflow webserver:
details = "Cloud Key Management Service (KMS) API has not been used in project 'TENANT_PROJECT_ID' before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/cloudkms.googleapis.com/overview?project='TENANT_PROJECT_ID' then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry."
The Airflow webserver is unable to import my specific DAG from my host project into the tenant project (which is where the webserver runs). The DAG runs with no problem because my host project is correctly set up, but not being able to monitor it in the UI is a huge drawback.
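For reference, here is a stripped-down sketch of the kind of DAG involved (the project, key ring, key, and ciphertext below are placeholders, not the actual code):

    import base64
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator
    from google.cloud import kms_v1

    # Placeholder resource name -- the real project/key ring/key are not shown.
    KMS_KEY_NAME = (
        "projects/HOST_PROJECT_ID/locations/global/"
        "keyRings/my-key-ring/cryptoKeys/my-key"
    )

    # Placeholder ciphertext, base64-encoded.
    ENCRYPTED_SECRET_B64 = "Q2lQaC4uLg=="

    def _decrypt(ciphertext):
        # This call requires the Cloud KMS API to be enabled for the project
        # whose credentials are used to make the request.
        client = kms_v1.KeyManagementServiceClient()
        return client.decrypt(KMS_KEY_NAME, ciphertext).plaintext

    # Decrypting at module level means the call runs wherever the DAG file is
    # parsed -- including the webserver, not just the workers.
    API_SECRET = _decrypt(base64.b64decode(ENCRYPTED_SECRET_B64))

    def use_secret():
        print("decrypted secret is %d bytes long" % len(API_SECRET))

    with DAG("kms_example",
             start_date=datetime(2019, 1, 1),
             schedule_interval=None) as dag:
        PythonOperator(task_id="use_secret", python_callable=use_secret)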
System specifications:
softwareConfig:
  imageVersion: composer-1.8.2-airflow-1.10.3
  pypiPackages:
    google-cloud-kms: ==1.2.1
  pythonVersion: '3'
It would be nice to be able to leverage both KMS and the Airflow UI; if not, I might have to add my secrets to Cloud Composer environment variables (which is not preferred).
Any known solutions for this?
The Airflow webserver is a managed component in Cloud Composer, so, as others have stated, it runs in a tenant project that you (as the environment owner) do not have access to. There is currently no way to access this project.
If you have a valid use case for enabling extra APIs in the tenant project, I'd recommend submitting product feedback. You can find out how to do that from the product's public documentation (including if you want to submit a feature request to the issue tracker).
Alternatively, if you're willing to experiment, AIP-24 was an Airflow proposal called DAG database persistence that caches DAGs in the Airflow database, as opposed to parsing/importing them in the webserver (which is the reason why you need KMS in this situation). If you're using Composer 1.8.1+, then you can experimentally enable the feature by setting core.store_serialized_dags=True. Note that it's not guaranteed to work for all DAGs, but it may be useful to you here.
I have a backend system built on AWS, and I'm using CloudWatch across all of the services for logging and monitoring. I really like the ability to send structured JSON logs into CloudWatch that are consistent and provide a lot of context around the log message. Querying the logs to get to the root of an issue, or just exploring the health of the environment, is simple, which makes CloudWatch a must-have for my backend.
Now I'm working on the frontend side of things, mobile applications using Xamarin.Forms. I know AWS has Amplify but I really wanted to stick with Xamarin.Forms as that's a skill set I've already got and I'm comfortable with. Since Amplify didn't support Xamarin.Forms I've been stuck looking at other options for logging - one of them being Microsoft's AppCenter.
If I go the AppCenter route I'll end up having to build out a mapping of the AppCenter installation identifier and my users between the AWS environment and the AppCenter environment. Before I start down that path I wanted to ask a couple questions around best practice and security of an alternative approach.
I'm considering using the AWS SDK for .NET: creating an IAM policy that allows X-Ray and CloudWatch PUT operations on a specific log group, attaching it to an IAM user, issuing access keys for that user, and embedding them in my app's config files. This would let me send log data straight into CloudWatch from the mobile apps using something like NLog.
I noticed with AppCenter I have to provide a client secret to the app, which wouldn't be any different than providing an IAM User access key to my app for pushing into CloudWatch. I'm typically a little shy about issuing access keys from AWS but as long as the Policy is tight I can't think of any negative side-effects... other than someone flooding me with log data should they pull the key out of the app data.
An alternative route I'm exploring: instead of embedding the access keys in my config files, I could request them from my API services and hold them in memory. The only downside is that when the user doesn't have internet connectivity, logging might be a pain (I'll need to look at how NLog handles sinks that aren't currently available - queueing and flushing).
Is there anything else I'm not considering or is this approach a feasible solution with minimal risk?
I'm using two Firebase projects: one for development and staging, and another for production. The Firebase CLI allows me to switch projects with firebase use _____.
For the client I'm using create-react-app and implicitly configuring Firebase via the reserved Hosting URLs ("From Hosting URLs").
The trouble comes with configuring each project's connection to third-party services. For most services I have separate accounts, so I need different keys (and secrets on the server) for development and production.
For Firebase Functions, I can use functions config vars for each project. Pretty easy.
But what's the best way to do this on the client?
create-react-app has great support for various .env files, but can I link a .env file to a firebase project rather than using their prioritization?
Or is there a way to expose the firebase functions config vars to create-react-app's start, build, and test processes as environment variables? (preferably without building all variables into the public js :-P)
What's the best way to do this?
The best way to do this seems to be to use GCP Secret Manager:
Secret Manager stores API keys, passwords, certificates, and other sensitive data. It provides convenience while improving security.
https://cloud.google.com/secret-manager/docs/quickstart
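As a rough sketch (shown in Python here; the Node.js client library is analogous), reading a secret at runtime with the google-cloud-secret-manager library looks roughly like this, with placeholder project and secret IDs:

    from google.cloud import secretmanager

    def get_secret(project_id, secret_id, version="latest"):
        # Build the resource name of the secret version and fetch its payload
        # (google-cloud-secret-manager v2.x style call).
        client = secretmanager.SecretManagerServiceClient()
        name = f"projects/{project_id}/secrets/{secret_id}/versions/{version}"
        response = client.access_secret_version(request={"name": name})
        return response.payload.data.decode("UTF-8")

    # Example usage with placeholder IDs:
    # api_key = get_secret("my-project", "third-party-api-key")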
Beware: it's a standalone GCP service, so Google charges you to store your API keys. The pricing calculation example they detail (so I'm guessing it's a typical use case) comes to a monthly cost of $15.15.
That's not cheap to store dumb API keys.
The other way is to use Cloud Functions (functions config vars), as you did.
The benefits of using GCP Secret Manager are that it can be combined with audit logging, that it has version management, and that you can set permission levels.
Is it possible to use kubernetes-secrets together with Google Composer in order to access secrets from Airflow workers?
We are using k8s secrets with our existing standalone k8s Airflow cluster and were hoping we can achieve the same with Google Composer.
By default, Kubernetes secrets are not exposed to the Airflow workers deployed by Cloud Composer. You can patch the deployments to add them (airflow-worker and airflow-scheduler), but there will be no guarantee that they won't be reverted if you perform an update on the environment (such as configuration update or in-place upgrade).
It's probably easiest to use an Airflow connection (connections are encrypted in the metadata database using Fernet), or to launch new pods using the KubernetesPodOperator/GKEPodOperator and mount the relevant secrets into the pod at launch.
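For the KubernetesPodOperator route, a minimal sketch could look like this (assuming Airflow 1.10-style import paths and an existing Kubernetes secret named my-secret with a key api-key; all names are placeholders):

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.kubernetes.secret import Secret
    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

    # Expose the key "api-key" of the Kubernetes secret "my-secret" as the
    # environment variable API_KEY inside the launched pod.
    secret_env = Secret(
        deploy_type="env",
        deploy_target="API_KEY",
        secret="my-secret",
        key="api-key",
    )

    with DAG("pod_secret_example",
             start_date=datetime(2019, 1, 1),
             schedule_interval=None) as dag:
        use_secret = KubernetesPodOperator(
            task_id="use-secret",
            name="use-secret",
            namespace="default",
            image="google/cloud-sdk:slim",
            cmds=["bash", "-c", 'echo "API_KEY is ${#API_KEY} characters long"'],
            secrets=[secret_env],
        )

The secret only has to exist in the namespace and cluster where the pod is launched, so it never needs to be visible to the Airflow workers themselves.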
Kubernetes secrets are available to the Airflow workers. You can also contribute the components for whatever API you wish to call so that they work natively in Airflow, which lets the credentials be stored as a Connection in Airflow's metadata database (encrypted at rest). Using an Airflow connection involves storing the secret key in GCS with an appropriate ACL and setting up Composer to secure the connection.
You can write your own custom operator to access the secret in Kubernetes and use it. Take a look at SimpleHttpOperator - this pattern can be applied to any arbitrary secret-management scheme. This is for scenarios that access external services which aren't explicitly supported by Airflow Connections, Hooks, and Operators.
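As a rough sketch of the Connection route (the connection ID below is a placeholder you would create yourself via the Airflow UI or CLI), a custom operator or hook can pull the stored credentials like this:

    from airflow.hooks.base_hook import BaseHook

    def get_service_credentials(conn_id="my_external_service"):
        # Reads a Connection stored in Airflow's metadata database; the
        # password field is Fernet-encrypted when a Fernet key is configured,
        # as it is in Composer.
        conn = BaseHook.get_connection(conn_id)
        return {
            "host": conn.host,
            "login": conn.login,
            "password": conn.password,    # the secret itself
            "extra": conn.extra_dejson,   # any additional JSON fields
        }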
I hope it helps.
This page explains both:
Obtaining and providing service account credentials manually for developing locally, deploying on-premises, or deploying to another public cloud.
Obtaining credentials on Compute Engine, Kubernetes Engine, App Engine flexible environment, and Cloud Functions
But there is no mention of obtaining credentials on Cloud Run. I'd appreciate it if you could give instructions for obtaining credentials and setting up firebase-admin initializeApp and firebase initializeApp for authentication on Cloud Run.
The documentation suggests that you can use the default service account just like other Google Cloud products as described here. The Firebase Admin SDK should use that account when initialized with no parameters.
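For example, a minimal Python sketch relying on Application Default Credentials (the Node.js Admin SDK behaves similarly when admin.initializeApp() is called with no arguments):

    import firebase_admin
    from firebase_admin import auth

    # With no explicit credential, the Admin SDK falls back to Application
    # Default Credentials, which on Cloud Run resolve to the service account
    # the revision runs as (the default one unless you configured another).
    firebase_admin.initialize_app()

    def check_token(id_token):
        # Example use: verify a Firebase Auth ID token sent by the client.
        decoded = auth.verify_id_token(id_token)
        return decoded["uid"]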
There are also steps described if you want to use a non-default service account, which you can simply configure in the console or provide with gcloud.
If you must provide a file that's readable at runtime, you will have to deploy an image with that file added to it. There is no short set of steps for this: you will have to make your Docker build include the file in a readable location, and your code will need to know where to look for it in order to load it.
I'm trying to run a bash command of the form ssh user@host "my bash command" using the BashOperator in Airflow. This works locally because my public key is on the target machine.
But I would like to run this command in Google Cloud Composer, which is Airflow + Google Kubernetes Engine. I understand that the Airflow workers run in 3 pods named according to this pattern: airflow-worker-xxxxxxxxx-yyyyy.
A naive solution was to create an SSH key for each pod and add its public key to the target machine in Compute Engine. That worked until today: somehow my 3 pods have changed, so my SSH keys are gone. It was definitely not the best solution.
I have 2 questions:
Why has Google Cloud Composer changed my pods?
How can I resolve my issue?
Pod restarts are not specific to Composer; I would say this is more related to Kubernetes itself:
Pods aren’t intended to be treated as durable entities.
So in general, pods can be restarted for various reasons, and you shouldn't rely on any changes that you make to them.
How can I resolve my issue?
You can solve this by taking into account that Cloud Composer creates a Cloud Storage bucket and links it to your environment. You can access the different folders of this bucket from any of your workers, so you could store your key (you only need one key pair) in "gs://bucket-name/data", which you can access through the mapped directory "/home/airflow/gcs/data". Docs here
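For example, a task could then reference the key from that mapped directory (the key file name, user, and host below are placeholders); copying the key to /tmp first avoids ssh rejecting a key file whose permissions are too open on the mounted volume:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    with DAG("ssh_example",
             start_date=datetime(2019, 1, 1),
             schedule_interval=None) as dag:
        # The data/ folder of the environment bucket is mounted on every worker
        # at /home/airflow/gcs/data, so a key uploaded once survives pod
        # restarts. It is copied to /tmp and chmod'ed because ssh refuses
        # private keys that are group/world readable.
        remote_command = BashOperator(
            task_id="remote_command",
            bash_command=(
                "cp /home/airflow/gcs/data/ssh-key /tmp/ssh-key && "
                "chmod 600 /tmp/ssh-key && "
                "ssh -i /tmp/ssh-key -o StrictHostKeyChecking=no "
                "user@host 'my bash command'"
            ),
        )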