Debugging incomplete runs of a DAG in MWAA (Airflow)

We deployed MWAA using the aws_mwaa_environment Terraform resource. I added an example DAG, but none of the runs complete. What do I need to check to debug what is happening?
The first image shows the say_hello task stuck in the up_for_retry status. The second image shows what appears when I click Log on the say_hello task.
Our MWAA environment's role already has this in its policy:
statement {
  actions = [
    "logs:*",
  ]
  resources = ["arn:aws:logs:${var.region}:${data.aws_caller_identity.current.account_id}:log-group:airflow-axiom_mwaa-*:*"]
}

Related

Airflow/Cloud Composer execute custom logic when DAG run is killed

I'm currently working on a custom sensor operator that launches a job in a remote server and periodically polls for its status. In some cases I may need to cancel/delete this DAG run programmatically with another program. Since the actual job is executed remotely, I want to be able to shut down the remote job before the DAG run is canceled/deleted. Since only the operator has the details of the remote job, I'm wondering if there is a way to trigger some custom logic to shut down the remote job right before the DAG run is going to be canceled/deleted?
Here is a summarized version of my operator code:
from typing import Any

from airflow.sensors import base_sensor_operator


class JobOperator(base_sensor_operator.BaseSensorOperator):
    def poke(self, context: Any) -> bool:
        if not self._job_started:
            self._job_info = LaunchJob(self._job_config)
        else:
            status = PollJob(self._job_info)
            if status == "SUCCESS":
                ...  # some logic.
            else:
                ...  # some logic.
        return False
I'm not sure if Airflow has such a trigger to execute some logic before DAG run deletion. Any help would be appreciated!
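One hook worth knowing about is BaseOperator.on_kill(), which Airflow calls when a running task instance is externally stopped (for example when its state is set from outside); whether it fires for your exact cancel/delete path would need verifying. A minimal sketch, assuming a CancelJob helper (hypothetical, mirroring LaunchJob/PollJob above):

class JobOperator(base_sensor_operator.BaseSensorOperator):
    # poke() as above ...

    def on_kill(self) -> None:
        # Called by Airflow when this running task instance is killed.
        # CancelJob is hypothetical, a counterpart to LaunchJob/PollJob.
        if self._job_started:
            CancelJob(self._job_info)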

Airflow 2.2.4 manually triggered DAG stuck in 'queued' status

I have two DAGs in my Airflow scheduler which were working in the past. After needing to rebuild the Docker containers running Airflow, they are now stuck in the queued state. DAGs in my case are triggered via the REST API, so no actual scheduling is involved.
Since there are quite a few similar posts, I ran through the checklist of this answer from a similar question:
Do you have the airflow scheduler running?
Yes!
Do you have the airflow webserver running?
Yes!
Have you checked that all DAGs you want to run are set to On in the web ui?
Yes, both DAGs are shown in the WebUI and no errors are displayed.
Do all the DAGs you want to run have a start date which is in the past?
Yes, the constructor of both DAGs looks as follows:
from airflow import DAG
from airflow.utils.dates import days_ago

dag = DAG(
    dag_id='image_object_detection_dag',
    default_args=args,
    schedule_interval=None,
    start_date=days_ago(2),
    tags=['helloworld'],
)
Do all the DAGs you want to run have a proper schedule which is shown in the web ui?
No, I trigger my DAGs manually via the REST API.
If nothing else works, you can use the web ui to click on the dag, then on Graph View. Now select the first task and click on Task Instance. In the paragraph Task Instance Details you will see why a DAG is waiting or not running.
Here is what this paragraph shows me:
What is the best way to find the reason, why the tasks won't exit the queued state and run?
EDIT:
Out of curiosity I tried to trigger the DAG from within the WebUI, and now both runs executed (the one triggered from the WebUI failed, but that was expected, since no config was set).
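For reference, the manual trigger goes through the Airflow 2 stable REST API, roughly like the sketch below (the host, credentials and conf are placeholders, and the auth call assumes the basic-auth API backend is enabled):

import requests

# Placeholders: adjust host, credentials and conf to your deployment.
resp = requests.post(
    "http://localhost:8080/api/v1/dags/image_object_detection_dag/dagRuns",
    auth=("admin", "admin"),
    json={"conf": {}},
)
resp.raise_for_status()
print(resp.json()["state"])  # a freshly created run starts out as "queued"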

Datadog airflow alert when a dag is paused

We recently encountered a scenario where someone mistakenly turned off a production DAG, and we want to get an alert via Datadog whenever a DAG is paused.
I have checked https://docs.datadoghq.com/integrations/airflow/?tab=host
But I have not found any metric that indicates whether a DAG is paused or not.
I can run a custom script in datadog as well.
One method is to exec into the Postgres pod and get the list of paused DAGs:
select * from dag where is_paused = true;
Or is there any other way I can get the list of unpaused DAGs, and what is the best way to handle newly added DAGs?
I want an alert whenever a previously unpaused DAG is paused.
If you are on Airflow 2 you can use the REST API to query the state of the DAG:
https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/get_dag
There is an "is_paused" field.
And if you are not on Airflow 2, you should be. Airflow 1.10 is end-of-life and will not receive any fixes (including critical security fixes), so you should upgrade as soon as you can.

Changes in DAG/Operator while DAG is running

What would happen if a DAG changes while the DAG is running (particularly if it is a dynamic DAG)?
What would happen if the code of a custom-made operator is changed during a DAG run?
After some investigation I came to the conclusion that DAG changes are visible in the current DAG run, and that operators (and all other plugins) are not reloaded by default.
DAG changes during DAG run
If you remove a task while the DAG is running, the scheduler notices, sets the status of that task to "Removed", and logs the line:
State of this instance has been externally set to removed. Taking the poison pill.
If you add a new task, it will also be visible and executed. It will be executed even if it is an upstream task of a task which has already finished.
If a task is changed, the changes are incorporated into the current DAG run only if the task has not already started executing or finished.
Plugins (including operators) are not refreshed automatically by default. You can restart Airflow, or you can set the reload_on_plugin_change property in the [webserver] section of airflow.cfg to True.
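That is, in airflow.cfg:

[webserver]
reload_on_plugin_change = True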

How to stop Airflow running tasks from 'off' dags

I created some DAGs, ran them and stopped them in the middle of their execution (with the OFF button).
The UI still shows 'Running tasks' for those stopped DAGs though.
I tried to 'clear' those tasks, and now they are shown in blue, in the 'shutdown' state.
I am wondering if those tasks are counted in the total of running tasks and are blocking other tasks from starting (with my current configuration, only 32 tasks can run in parallel). Is there a way to completely clean up the DAGs that I don't need anymore, and to make sure their tasks are not blocking anything or slowing Airflow down?
Thanks!
You can delete all of the DAG's data from the dag_run and task_instance tables in the metadata database.
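For example, a minimal sketch ('my_dag_id' is a placeholder; back up the metadata database before deleting anything):

-- Run against the Airflow metadata DB; 'my_dag_id' is a placeholder.
DELETE FROM task_instance WHERE dag_id = 'my_dag_id';
DELETE FROM dag_run WHERE dag_id = 'my_dag_id';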
You can also do this through the Airflow webserver UI by navigating to
Browse -> DAG Runs
and Browse -> Task Instances
and deleting all the records relevant to the DAG id.
One note though is that the task and DAG status fields on the main page may take a while to reflect the changes.
