I have been trying to understand how to use the User role. The documentation says it is for users with DAG ownership. So I created two users with usernames ABC and XYZ and assigned them the User role.
Here's my DAG:
DEFAULT_ARGS = {
    'owner': 'ABC',
    ...,
    ...
}
dag = DAG(
    'test_dag',
    default_args=DEFAULT_ARGS,
    ...,
    ...
)
When I logged in as XYZ, I expected the DAG test_dag to be hidden, or at least inactive, since test_dag belongs to ABC. But as XYZ, I'm able to operate test_dag.
Am I missing anything here?
Make sure you are using the new RBAC UI. Verify that you have the following in your airflow.cfg file:
[webserver]
rbac = True
authenticate = True
filter_by_owner = True
Are you using password authentication? If so, this is probably a bug that is still not fixed: JIRA. It was also discussed here: How to allow airflow dags for concrete user(s) only
You can try using LDAP or OAuth as your authentication method. This might resolve your problem.
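If you go the LDAP route with the RBAC UI, the configuration lives in webserver_config.py, which uses Flask-AppBuilder settings. A minimal sketch, assuming a generic LDAP setup; the server URL and search base below are placeholders you must replace with your own:

```python
# webserver_config.py -- read by the RBAC webserver (built on Flask-AppBuilder)
from flask_appbuilder.security.manager import AUTH_LDAP

AUTH_TYPE = AUTH_LDAP
AUTH_LDAP_SERVER = "ldap://ldap.example.com"  # placeholder: your LDAP server
AUTH_LDAP_SEARCH = "dc=example,dc=com"        # placeholder: your search base
AUTH_LDAP_UID_FIELD = "uid"
```

See the Flask-AppBuilder security documentation for the full set of LDAP options for your directory layout.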
I've created a role "finance" that has only the following permissions (screenshot: role permissions).
Note that this role only has "can read on DAG: read manifest".
When logging in with that specific role I'm able to see two DAGs: read_manifest and fin_daily_product_sales.
(screenshot: Airflow overview)
This role is able to execute fin_daily_product_sales; within that DAG we use the TriggerDagRunOperator to trigger the read_manifest DAG.
(screenshot: code of the TriggerDagRunOperator)
I would expect this to fail because the role only has read permission on the read_manifest DAG.
Why is this role able to trigger the DAG via the TriggerDagRunOperator?
I have a DAG in Airflow whose runs are not scheduled but triggered by an event. I would like to send an alert when the DAG has not run in the last 24 hours. My problem is that I am not sure which tool is best for this task.
I tried to solve it with the Logs Explorer; I was able to write a fairly good query filtering by textPayload, but it seems that tool is designed to send an alert when a specific log is present, not when it is missing. (Maybe I missed something?)
I also checked Monitoring, where I could set up an alert when logs are missing; however, in that case I was not able to write a query filtering logs by textPayload.
Thank you in advance if you can help me!
You could set up a separate alert DAG that notifies you if other DAGs haven't run within a specified amount of time. To get the last run time of a DAG, use something like this:
from airflow.models import DagRun
dag_runs = DagRun.find(dag_id=dag_id)
dag_runs.sort(key=lambda x: x.execution_date, reverse=True)
Then you can compare dag_runs[0].execution_date with the current server time. If the difference is greater than 24 hours, raise an alert.
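To make that concrete, here is a minimal sketch of such a check. The 24-hour threshold is an assumption, and you would wire check_dag_freshness into a PythonOperator running on a schedule in a separate monitoring DAG:

```python
from datetime import datetime, timedelta, timezone

# Assumption: 24 hours is the freshness threshold you care about.
STALE_AFTER = timedelta(hours=24)

def last_run_is_stale(last_run, now=None, threshold=STALE_AFTER):
    """Pure helper: True if the most recent run is older than the threshold."""
    now = now or datetime.now(timezone.utc)
    return now - last_run > threshold

def check_dag_freshness(dag_id):
    """Task callable: raise (and so fail the task) if `dag_id` is stale.

    A failed task then fires whatever alerting (email, Slack, ...) you
    already have configured for the monitoring DAG.
    """
    # Imported here so the pure helper above works without Airflow installed.
    from airflow.models import DagRun

    dag_runs = DagRun.find(dag_id=dag_id)
    if not dag_runs:
        raise ValueError(f"No runs found at all for {dag_id}")
    dag_runs.sort(key=lambda x: x.execution_date, reverse=True)
    if last_run_is_stale(dag_runs[0].execution_date):
        raise ValueError(f"{dag_id} has not run in the last 24 hours")
```

Raising inside the callable is deliberate: it marks the task failed, so the normal failure notifications do the alerting for you.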
I was able to do it in Monitoring. I did not need the filtering query I used in the Logs Explorer. I created an Alerting Policy filtered by workflow_name, task_name and location. In the "configure trigger" section I was able to choose "Metric absence" with a one-day absence time, which replaced my old query.
Of course, it could also be solved by setting up a new DAG, but setting up an Alerting Policy seems easier.
We recently encountered a scenario where someone mistakenly turned off a production DAG, and we want to get an alert via Datadog whenever a DAG is paused.
I have checked https://docs.datadoghq.com/integrations/airflow/?tab=host
But I have not found any metric that indicates whether a DAG is paused or not.
I can run a custom script in Datadog as well.
One method is to exec into the Postgres pod and list the paused DAGs:
select * from dag where is_paused=true;
Or is there any other way to get the list of unpaused DAGs? Also, when a new DAG is added, what is the best way to handle it?
I want an alert whenever a previously unpaused DAG is paused.
If you are on Airflow 2, you can use the REST API to query the state of a DAG.
https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/get_dag
There is an "is_paused" field.
And if you are not on Airflow 2, you should be. Airflow 1.10 is end-of-life and will not receive any fixes (including critical security fixes), so you should upgrade as soon as you can.
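A small sketch of how a Datadog custom script could poll that endpoint. The base URL and basic-auth credentials are placeholders for your deployment; the /api/v1/dags listing carries the same is_paused field per DAG:

```python
import requests

AIRFLOW_BASE_URL = "http://localhost:8080"  # placeholder: your webserver URL
AUTH = ("admin", "admin")                   # placeholder: basic-auth credentials

def extract_paused(payload):
    """Pure helper: pull the ids of paused DAGs out of a /dags response body."""
    return [d["dag_id"] for d in payload["dags"] if d["is_paused"]]

def paused_dags():
    """Ask the stable REST API for all DAGs and return the paused ones."""
    resp = requests.get(f"{AIRFLOW_BASE_URL}/api/v1/dags", auth=AUTH)
    resp.raise_for_status()
    return extract_paused(resp.json())
```

A custom check could then emit len(paused_dags()) as a gauge and alert whenever it rises above zero, or diff the returned ids against a known-good list.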
When you trigger an Airflow DAG either through the UI (see screenshot) or the API (https://airflow.apache.org/docs/stable/rest-api-ref.html), you have the option of submitting a JSON configuration. However the usefulness of this isn't clearly documented as far as I can tell. I have two basic questions:
Is this intended for free-form configuration settings at the application level, or is this only for Airflow configuration variables?
If this is for free-form configuration settings, how (in my code) can I access whatever configuration was passed when the DAG was triggered?
Here is the screenshot where you can provide configuration when triggering a DAG:
Yes, it is intended for application-level configuration.
Example -
{"appConfig":"Test"}
To read it in your DAG
def read_app_configuration(**kwargs):
    print("Read App Config - Task : Start")
    dag_run = kwargs['dag_run']
    # conf holds the JSON submitted when the DAG was triggered
    app_config = dag_run.conf.get('appConfig')
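One caveat worth adding: when the DAG is triggered without any JSON, dag_run.conf can be empty, so it is safer to guard the lookup. A small sketch of that pattern (the key name appConfig and the fallback value are just examples):

```python
def get_conf_value(conf, key, default=None):
    """Safely read one key from dag_run.conf, which may be None or {} when
    the DAG was triggered without a JSON payload."""
    return (conf or {}).get(key, default)

# Inside a task, kwargs["dag_run"].conf holds whatever JSON was submitted
# at trigger time, e.g. {"appConfig": "Test"}:
def read_app_configuration(**kwargs):
    value = get_conf_value(kwargs["dag_run"].conf, "appConfig", default="missing")
    print(f"appConfig = {value}")
```

Using a default this way means a manual trigger with no payload logs "missing" instead of crashing the task with an attribute or key error.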
I use Airflow 1.10.4, created a role test_role and a user test_user with that role. I also created a DAG with access_control:
with DAG(DAG_NAME,
         schedule_interval='@daily',
         default_args=default_args,
         access_control={
             'test_role': {'can_dag_read'},
         },
         ) as dag:
    DummyOperator(task_id='run_this_1') >> DummyOperator(
        task_id='run_this_2') >> DummyOperator(task_id='run_this_3')
But when I log in as that user, I don't see this DAG. Is anything wrong?
I guess the access_control parameter is not released yet in your version. Kindly refer to the Airflow JIRA and changelog.
As a workaround, we can use the web UI access-control options.
If you find any promising solution, please let me know as well. Thanks in advance!
I think you can try executing "airflow sync_perm" on your Airflow webserver, which re-syncs DAG-level permissions to the RBAC UI.