I want to create a role that can clear past Airflow DAG runs for a specific DAG.
I have already found that the User role has:
can delete on DAG Runs
can delete on DAGs
https://airflow.apache.org/docs/apache-airflow/stable/security/access-control.html
https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/delete_dag_run
https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/post_clear_task_instances
But I assume these permissions allow deleting all DAGs. Which permission should I grant so that past DAG runs can be cleared for one specific DAG only?
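Not a definitive answer, but for scoping permissions to a single DAG, Airflow's DAG-level access_control argument may be worth looking at. A minimal sketch, assuming a hypothetical role name and DAG id (which additional actions on DAG Runs or Task Instances are needed for clearing should be verified against your Airflow version):

    # Hypothetical sketch: scope a custom role to one DAG via access_control.
    # "clear_only_role" and "my_specific_dag" are placeholder names.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy import DummyOperator

    with DAG(
        dag_id="my_specific_dag",
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,
        # DAG-level actions granted to the role only on this DAG; the role
        # still needs the relevant DAG Run/Task Instance permissions to clear
        # runs, which depend on your Airflow version.
        access_control={"clear_only_role": {"can_read", "can_edit"}},
    ) as dag:
        DummyOperator(task_id="noop")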
Related
We have an ad hoc Airflow DAG which anyone from a team of 50+ can trigger to run manually.
We can check the Airflow audit logs to see who triggered the DAG via its dag id, and we also get an email upon failure.
There are ways to trigger the email using EmailOperator or a custom PythonOperator, but the challenge is that an Airflow user may mark that task as 'Success' and continue to the core operations of the ad hoc DAG.
What we are more curious to know is whether we can get an email upon DAG start OR at the start of each task run; this would help us understand and track the activity and the usage/commands executed from the ad hoc DAG.
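Not an authoritative answer, but one hedged sketch of getting an email at the start of every task run is a task-level on_execute_callback combined with Airflow's send_email helper. This assumes [smtp] is already configured; the recipient and owner below are placeholders:

    # Hypothetical sketch: email on every task start via on_execute_callback.
    from airflow.utils.email import send_email

    def notify_task_start(context):
        """Send a short email whenever a task instance begins executing."""
        ti = context["task_instance"]
        send_email(
            to=["team-alias@example.com"],  # placeholder recipient
            subject="Task started: %s.%s" % (ti.dag_id, ti.task_id),
            html_content=(
                "DAG <b>%s</b>, task <b>%s</b> started at %s (run %s)."
                % (ti.dag_id, ti.task_id, ti.start_date, context["run_id"])
            ),
        )

    # Applied through default_args, the callback fires when each task actually
    # starts executing, rather than relying on a separate email task that a
    # user could mark as 'Success'.
    default_args = {
        "owner": "adhoc-team",  # placeholder
        "on_execute_callback": notify_task_start,
    }

For an email specifically on DAG start, the same callback could be attached only to the DAG's first task.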
I'm attempting to use the KMS library in one of my DAGs, which runs a PythonOperator, but I'm encountering an error in the Airflow webserver:
details = "Cloud Key Management Service (KMS) API has not been used in project 'TENANT_PROJECT_ID' before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/cloudkms.googleapis.com/overview?project='TENANT_PROJECT_ID' then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry."
The Airflow webserver is unable to import my specific DAG from my host project into the tenant project (which is where the webserver runs). The DAG runs with no problem since my host project is set up correctly, but not being able to monitor it in the UI is a huge drawback.
System specifications:
softwareConfig:
  imageVersion: composer-1.8.2-airflow-1.10.3
  pypiPackages:
    google-cloud-kms: ==1.2.1
  pythonVersion: '3'
It would be nice to be able to leverage both KMS and the Airflow UI; if not, I might have to add my secrets to Cloud Composer environment variables (which is not preferred).
Are there any known solutions for this?
The Airflow webserver is a managed component in Cloud Composer, so as others have stated, it runs in a tenant project that you (as the environment owner) do not have access to. There is currently no way to access this project.
If you have a valid use case for enabling extra APIs in the tenant project, I'd recommend submitting product feedback. You can find out how to do that from the product's public documentation (including how to submit a feature request to the issue tracker).
Alternatively, if you're willing to experiment, AIP-24 was an Airflow proposal called DAG database persistence that caches DAGs in the Airflow database, as opposed to parsing/importing them in the webserver (which is why the webserver needs KMS in this situation). If you're using Composer 1.8.1+, you can experimentally enable the feature by setting core.store_serialized_dags=True. Note that it's not guaranteed to work for all DAGs, but it may be useful to you here.
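To illustrate the parse-time point with a hedged, placeholder-only example: anything at module level in the DAG file runs whenever the file is parsed, including by the webserver in the tenant project, which is where the error above surfaces; calls made inside the PythonOperator callable run only on the workers in the host project:

    # Illustration only; project path, key name, and ciphertext are placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator
    from google.cloud import kms_v1

    # A module-level kms_v1.KeyManagementServiceClient() call would execute
    # during DAG parsing (i.e. in the tenant-project webserver), whereas the
    # call below runs only at task execution time.

    def decrypt_secret(**kwargs):
        client = kms_v1.KeyManagementServiceClient()
        response = client.decrypt(
            "projects/my-host-project/locations/global/keyRings/my-ring/cryptoKeys/my-key",
            b"ciphertext-placeholder",
        )
        return response.plaintext

    with DAG("kms_example", start_date=datetime(2020, 1, 1), schedule_interval=None) as dag:
        PythonOperator(
            task_id="decrypt",
            python_callable=decrypt_secret,
            provide_context=True,
        )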
I am trying to create a Cloud Composer DAG to be triggered via a Pub/Sub message.
There is the following example from Google which triggers a DAG every time a change occurs in a Cloud Storage bucket:
https://cloud.google.com/composer/docs/how-to/using/triggering-with-gcf
However, at the beginning they say you can trigger DAGs in response to events, such as a change in a Cloud Storage bucket or a message pushed to Cloud Pub/Sub. I have spent a lot of time trying to figure out how that can be done, but with no result.
Can you please help or give me some directions? Thanks!
There are two ways to trigger a DAG from Pub/Sub events.
You can place a PubSubPullSensor at the beginning of your DAG. Your DAG will be triggered every time a Pub/Sub message can be pulled by the PubSubPullSensor, and it will then execute the rest of the tasks in your DAG (see the sketch after this answer).
You can also create a Cloud Function with a Pub/Sub trigger and put the Composer DAG triggering logic inside the function. When a message is published to the Pub/Sub topic, the Cloud Function should be able to trigger the Composer DAG.
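A minimal sketch of the first option, assuming the Airflow 1.10 contrib import path (project, subscription, and schedule below are placeholders; newer google provider releases use a different import path and parameter names):

    # Hedged sketch of option 1: a DAG that waits on a Pub/Sub subscription.
    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.sensors.pubsub_sensor import PubSubPullSensor
    from airflow.operators.bash_operator import BashOperator

    with DAG(
        dag_id="pubsub_triggered_dag",
        start_date=datetime(2020, 1, 1),
        schedule_interval="*/5 * * * *",  # the sensor blocks until a message arrives
        catchup=False,
    ) as dag:
        wait_for_message = PubSubPullSensor(
            task_id="wait_for_message",
            project="my-gcp-project",          # placeholder
            subscription="my-subscription",    # placeholder
            ack_messages=True,  # acknowledge so the same message is not re-processed
        )

        process = BashOperator(
            task_id="process",
            bash_command="echo 'message received, running the rest of the DAG'",
        )

        wait_for_message >> process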
To extend the public documentation page you already posted, you can configure a Cloud Function to run each time a message is published to a Cloud Pub/Sub topic. There is more information about that in another public documentation page.
To attach a function to a topic, set the --trigger-topic flag when deploying the function:
gcloud functions deploy $FUNCTION_NAME --runtime $RUNTIME --trigger-topic $TOPIC_NAME
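For completeness, a hedged sketch of what the deployed function might look like for the Pub/Sub-to-Composer case, following the pattern in the documentation linked above: make_iap_request is the IAP helper from that guide, and the client ID, webserver ID, and DAG name below are placeholders for your environment.

    # Hedged sketch of a Pub/Sub-triggered Cloud Function that starts a DAG run.
    from make_iap_request import make_iap_request  # helper from the linked guide

    CLIENT_ID = "your-iap-client-id.apps.googleusercontent.com"  # placeholder
    WEBSERVER_ID = "your-composer-webserver-id"                  # placeholder
    DAG_NAME = "pubsub_triggered_dag"                            # placeholder

    def trigger_dag(event, context):
        """Entry point for the --trigger-topic deployment above."""
        # Call the Airflow 1.x experimental REST API on the Composer webserver.
        url = (
            "https://%s.appspot.com/api/experimental/dags/%s/dag_runs"
            % (WEBSERVER_ID, DAG_NAME)
        )
        make_iap_request(url, CLIENT_ID, method="POST", json={"conf": event})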
Is there a place where I can find a list of permissions that I need to grant to create the metadata DB in MySQL? I don't want to do GRANT ALL for the airflow user.
I'm attempting to filter DAGs by owner in my Airflow instance.
List of the steps I'm taking:
1- Configure my airflow.cfg as follows (portion of airflow.cfg config file).
2- My DAGs receive an owner through the default_args variable.
3- Have an Airflow user named the same as the owner passed to my DAGs.
Still, when I log in with that user, I can see all DAGs (steps to create the user).
Any idea why it is not filtering DAGs by owner? Thanks
Until Airflow Version 1.9.0:
The reason the user is still able to access all the DAGs is that it is a superuser by default. Unless you use LDAP for authentication, all users created are superusers, and Airflow has no other roles.
However, if you use LDAP, you can have superuser and dataprofiler roles.
This should change in upcoming versions of Airflow.