I use Airflow 1.10.4. I created a role test_role and a user test_user with that role, and I also created a DAG with access_control:
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

with DAG(DAG_NAME,
         schedule_interval='@daily',
         default_args=default_args,
         access_control={
             'test_role': {'can_dag_read'},
         },
         ) as dag:
    DummyOperator(task_id='run_this_1') >> DummyOperator(
        task_id='run_this_2') >> DummyOperator(task_id='run_this_3')
But when I log in as that user, I don't see this DAG. Is anything wrong?
I guess the access_control parameter has not been released yet. Please refer to the Airflow JIRA and changelog.
As a workaround, you can use the access control options in the web UI.
If you find any promising solution, please let me know as well. Thanks in advance!
I think you can try to execute "airflow sync_perm" on your Airflow WebServer.
I have two DAGs in my airflow scheduler, which were working in the past. After needing to rebuild the docker containers running airflow, they are now stuck in queued. DAGs in my case are triggered via the REST API, so no actual scheduling is involved.
Since there are quite a few similar posts, I ran through the checklist of this answer from a similar question:
Do you have the airflow scheduler running?
Yes!
Do you have the airflow webserver running?
Yes!
Have you checked that all DAGs you want to run are set to On in the web ui?
Yes, both DAGs are shown in the WebUI and no errors are displayed.
Do all the DAGs you want to run have a start date which is in the past?
Yes, the constructor of both DAGs looks as follows:
dag = DAG(
    dag_id='image_object_detection_dag',
    default_args=args,
    schedule_interval=None,
    start_date=days_ago(2),
    tags=['helloworld'],
)
Do all the DAGs you want to run have a proper schedule which is shown in the web ui?
No, I trigger my DAGs manually via the REST API.
If nothing else works, you can use the web ui to click on the dag, then on Graph View. Now select the first task and click on Task Instance. In the paragraph Task Instance Details you will see why a DAG is waiting or not running.
Here is the output of what this paragraph is showing me:
What is the best way to find out why the tasks won't leave the queued state and run?
EDIT:
Out of curiosity I tried to trigger the DAG from within the WebUI, and now both runs executed. (The one triggered from the WebUI failed, but that was expected, since there was no config set.)
We recently encountered a scenario where someone mistakenly turned off a production DAG, and we want to get an alert in Datadog whenever a DAG is paused.
I have checked https://docs.datadoghq.com/integrations/airflow/?tab=host
But I have not found any metric that shows whether a DAG is paused.
I can run a custom script in Datadog as well.
One method is to exec into the Postgres pod and get the list of paused DAGs:
select * from dag where is_paused=true;
Is there any other way to get the list of unpaused DAGs? And what is the best way to handle a newly added DAG?
I want an alert whenever an unpaused DAG is paused.
If you are on Airflow 2, you can use the REST API to query the state of the DAG.
https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/get_dag
The response has an "is_paused" field.
And if you are not on Airflow 2, you should be. Airflow 1.10 is end-of-life and will not receive any fixes (including critical security fixes), so you should upgrade as soon as you can.
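A minimal sketch of how a custom check script could use that field, assuming the Airflow 2 stable REST API is reachable and the basic-auth API backend is enabled. The function names (fetch_dags, snapshot_pause_state, newly_paused) and the idea of keeping the previous poll's state are illustrative assumptions, not part of Airflow or Datadog. DAGs that were not seen in the previous poll are treated as new and do not raise an alert.

```python
import base64
import json
import urllib.request


def fetch_dags(base_url, username, password):
    """Return the DAG entries from GET /api/v1/dags (first page only).

    Assumes basic auth is enabled for the API; adjust for your auth backend.
    """
    request = urllib.request.Request(f"{base_url}/api/v1/dags?limit=100")
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)["dags"]


def snapshot_pause_state(dags):
    """Map each dag_id to its current paused flag."""
    return {dag["dag_id"]: bool(dag.get("is_paused")) for dag in dags}


def newly_paused(previous_state, dags):
    """Return DAGs that were unpaused in the previous poll and are paused now.

    DAGs missing from previous_state are new and are deliberately skipped.
    """
    return sorted(
        dag["dag_id"]
        for dag in dags
        if dag.get("is_paused") and previous_state.get(dag["dag_id"]) is False
    )
```

A script would call fetch_dags on a schedule, alert on whatever newly_paused returns, and then store snapshot_pause_state(dags) for the next run. How the alert reaches Datadog is left out here.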
I need to restart Airflow. I want to make sure I do it when it's idle, so that I don't interrupt a job by restarting the worker component of Airflow.
How do I see what DAGs are running?
I don't see anything in the UI that would list currently running DAGs.
I don't see any command in the airflow CLI to list currently running DAGs.
I found airflow shell that lets me connect to the DB, but I don't know enough about Airflow internals to know where to look to see what's running.
You can also query the database to get all running tasks at once:
select * from task_instance where state='running'
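A minimal sketch of wrapping that query in an idle check before a restart. It assumes a DB-API connection to the Airflow metadata database; the in-memory SQLite table below is only a stand-in with the columns the query needs, and the function names are hypothetical.

```python
import sqlite3


def count_running_task_instances(connection):
    """Count rows in task_instance whose state is 'running'."""
    cursor = connection.cursor()
    cursor.execute("SELECT COUNT(*) FROM task_instance WHERE state = 'running'")
    return cursor.fetchone()[0]


def is_idle(connection):
    """True when no task instance is currently in the running state."""
    return count_running_task_instances(connection) == 0


def demo_connection(rows):
    """Build a stand-in database; the real schema has many more columns."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE task_instance (dag_id TEXT, task_id TEXT, state TEXT)"
    )
    connection.executemany("INSERT INTO task_instance VALUES (?, ?, ?)", rows)
    return connection
```

Against a real deployment you would pass a connection from your database driver instead of demo_connection and restart only when is_idle returns True.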
You can use the CLI command airflow jobs check, which returns "No alive jobs found." when no jobs are running.
I found it... it's in the UI, on the DAGs page, it's the second circle under "Recent Tasks":
When you trigger an Airflow DAG either through the UI (see screenshot) or the API (https://airflow.apache.org/docs/stable/rest-api-ref.html), you have the option of submitting a JSON configuration. However the usefulness of this isn't clearly documented as far as I can tell. I have two basic questions:
Is this intended for free-form configuration settings at the application level, or is this only for Airflow configuration variables?
If this is for free-form configuration settings, how (in my code) can I access whatever configuration was passed when the DAG was triggered?
Here is the screenshot where you can provide configuration when triggering a DAG:
Yes, it is intended for application-level configuration.
Example:
{"appConfig":"Test"}
To read it in your DAG:
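def read_app_configuration(**kwargs):
    print("Read App Config - Task : Start")
    # dag_run is supplied by Airflow in the task context
    dag_run = kwargs['dag_run']
    # Value of the "appConfig" key from the JSON passed at trigger time
    app_config = dag_run.conf.get('appConfig')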
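A minimal sketch of the same read with a fallback, for runs that were triggered without any configuration (where conf may be empty or None). SimpleNamespace only stands in for Airflow's DagRun object so the function can be tried outside Airflow; the default value and the fake_run name are assumptions for illustration.

```python
from types import SimpleNamespace


def read_app_configuration(**kwargs):
    """Return the 'appConfig' value from the trigger configuration."""
    dag_run = kwargs["dag_run"]
    # conf can be missing when the DAG was triggered without a JSON payload
    conf = dag_run.conf or {}
    return conf.get("appConfig", "default")


# Stand-in for the dag_run object Airflow passes in the task context
fake_run = SimpleNamespace(conf={"appConfig": "Test"})
print(read_app_configuration(dag_run=fake_run))
```

Inside Airflow the function would be used as the callable of a PythonOperator, which supplies dag_run through the task context.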
I have been trying to work out how to use the User role. It says here that it is for users with DAG ownership. So I created a couple of users with usernames ABC and XYZ and assigned them the User role.
Here's my DAG:
DEFAULT_ARGS = {
    'owner': 'ABC',
    ...,
    ...
}
dag = DAG(
    'test_dag',
    default_args=DEFAULT_ARGS,
    ...,
    ...
)
When I logged in as XYZ, I expected the DAG test_dag to be hidden, or at least to be in an inactive state, since test_dag belongs to ABC. But as XYZ, I'm able to operate test_dag.
Am I missing anything here?
Make sure you are using the new RBAC UI. Verify that you have the following in your airflow.cfg file:
[webserver]
rbac = True
authenticate = True
filter_by_owner = True
Are you using password authentication? If so, this is probably a bug that is still not fixed: JIRA. It was also discussed here: How to allow airflow dags for concrete user(s) only
You can try to use LDAP or OAuth as your authentication method. This might resolve your problem.