How to check Airflow scheduler health and status?

I have configured Apache Airflow with a PostgreSQL database, and I have one DAG running. It currently runs successfully, but if the scheduler has any issue, how would I find out, and what is the way to check for that? Kindly give some ideas and a solution.

Airflow exposes a /health endpoint for this purpose.
Also check the REST API reference; it has many useful endpoints for common day-to-day tasks like triggering a DAG or returning the latest runs of DAGs.
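As an illustration, a minimal sketch of polling /health from Python (the base URL and timeout are my own assumptions; adjust them to your deployment):

```python
import requests

# Hypothetical base URL; point this at your actual webserver.
AIRFLOW_BASE_URL = "http://localhost:8080"

def scheduler_is_healthy() -> bool:
    """Return True if the /health endpoint reports a healthy scheduler."""
    resp = requests.get(f"{AIRFLOW_BASE_URL}/health", timeout=10)
    resp.raise_for_status()
    payload = resp.json()
    # The endpoint reports the status of the metadatabase and the scheduler,
    # including the latest scheduler heartbeat.
    return payload.get("scheduler", {}).get("status") == "healthy"

if __name__ == "__main__":
    print("scheduler healthy:", scheduler_is_healthy())
```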
UPDATE-1
Apparently, just because the scheduler is running doesn't necessarily mean that it will actually trigger a DAG; see, for example, this.
You can think of it as internal bugs or corrupt internal states of Airflow that may cause it not to trigger DAGs.
Thus people go a step further and schedule a canary DAG (a dummy DAG which does nothing but runs every few minutes). Then, by monitoring metrics (think Prometheus) of the canary DAG, they can reliably confirm whether the Airflow scheduler is working as expected; a sketch of such a DAG follows.
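A minimal sketch of a canary DAG, assuming Airflow 2.x (the dag_id, the 5-minute interval and the timeout are my own choices; newer releases rename DummyOperator to EmptyOperator):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.dummy import DummyOperator  # EmptyOperator in newer Airflow releases

# Hypothetical canary DAG: it does no real work, it only has to get scheduled.
with DAG(
    dag_id="canary",
    start_date=datetime(2021, 1, 1),
    schedule_interval=timedelta(minutes=5),  # run every few minutes
    catchup=False,
    dagrun_timeout=timedelta(minutes=10),
) as dag:
    DummyOperator(task_id="heartbeat")
```

If runs of this DAG stop appearing, or its success metric goes stale in your monitoring system, the scheduler is most likely not doing its job even though the process itself may still be up.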

Related

How can we add long running monitor feature to Airflow in simplest way?

I tried to explore how we can add a long-running-task monitoring feature to Airflow in the simplest way, e.g. via a plugin or something similar.
One possible approach would be to consider the last 15 days of runtimes (sorting the data and excluding the edge values).
Below are the only related questions I found, and they have not been answered adequately. I need your guidance on a stable and simple solution:
Monitoring long lasting tasks in Airflow
Airflow dags - reporting of the runtime for tracking purposes
Any way of monitoring Airflow DAG's execution time?

Does it make sense to pause a DAG that has its schedule set to None?

I'm new to Airflow and am having a hard time figuring out what pausing a DAG is used for.
If our DAGs are set up only for manual triggering, does it make sense to pause these kinds of DAGs?
Certainly! Airflow DAGs that are configured with a schedule_interval of None can still be executed by manual intervention through the UI, by being triggered by another DAG via the TriggerDagRunOperator, or even through an API call. If any of these actions happen, you can prevent the DAG from running by pausing it.
Another situation for pausing DAGs is when a DAG fails frequently for whatever reason, or has some flawed logic that requires manual intervention to fix data affected by its processing; you can pause the DAG to keep it from executing even if it has a regular schedule_interval.
There are other scenarios, but pausing DAGs is helpful whenever you want to prevent DAG execution caused by expected or even unexpected triggering.
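To make it concrete, a minimal sketch of such a manual-only DAG (the dag_id and task are placeholders of my own; assuming Airflow 2.x import paths):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator  # EmptyOperator in newer releases

# With schedule_interval=None this DAG never runs on a schedule, but it can
# still be started from the UI, by TriggerDagRunOperator, or via an API call,
# unless it is paused.
with DAG(
    dag_id="manual_only_dag",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    DummyOperator(task_id="do_work")
```

Pausing it (the toggle in the UI, or `airflow dags pause manual_only_dag` on the Airflow 2.x CLI) blocks all of those trigger paths until it is unpaused again.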

Can I restrict max_active_runs of a DAG only for runs with the same dag_run.conf?

I have a DAG that runs in a multi-tenant scenario. The tenant ID gets set in dag_run.conf when the DAG is triggered. I want to ensure that there is at most one active run per tenant at a time, but potentially many active runs simultaneously across all tenants.
So far I have found the max_active_runs setting, but that would require me to actually set up one DAG per tenant, which I am trying to avoid.
Is there a way to achieve this in airflow or am I approaching the problem in the wrong way?
You are using dag_run.conf, which means that you are triggering your DAGs manually. Currently there is a bug (Airflow 2.0.1) where max_active_runs isn't respected for manual runs (see the GitHub issue).
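As a possible workaround (my own sketch, not something the bug report prescribes; the tenant_id key, dag_id and task names are made up), a first task in the DAG could inspect the other active runs of the same DAG and fail fast if one with the same tenant is already in flight:

```python
from datetime import datetime

from airflow import DAG
from airflow.exceptions import AirflowException
from airflow.models import DagRun
from airflow.operators.python import PythonOperator
from airflow.utils.state import State


def ensure_single_run_per_tenant(dag_run=None, **_):
    """Fail fast if another running DagRun of this DAG carries the same tenant_id."""
    tenant = (dag_run.conf or {}).get("tenant_id")
    running = DagRun.find(dag_id=dag_run.dag_id, state=State.RUNNING)
    clashes = [
        r for r in running
        if r.run_id != dag_run.run_id and (r.conf or {}).get("tenant_id") == tenant
    ]
    if clashes:
        raise AirflowException(f"Tenant {tenant} already has an active run")


with DAG(
    dag_id="multi_tenant_dag",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    guard = PythonOperator(
        task_id="ensure_single_run_per_tenant",
        python_callable=ensure_single_run_per_tenant,
    )
    # ... tenant-specific work goes downstream of `guard` ...
```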

Airflow dag runs most of the time, but "freezes" every now and then. What is the best way to debug this?

One of my Airflow DAGs runs without any issues most of the time. However, every now and then (roughly every 3+ hours), it "freezes".
In this state, its tasks are not "queued" (see attached image), and the timeouts which exist on specific tasks also do not activate. The only way of getting out of such a scenario is by manually marking that run as failed.
This failure is always followed up by another immediate failure (see blank cells in the image).
What should I look for in the logs and/or what are other ways of debugging this?
Found the issue: it was just some tasks running longer than the schedule interval, and hence two runs executing in parallel.
I was hoping that in such cases Airflow would provide some kind of feedback in the logs or UI, but that isn't the case.
Resolved.
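For anyone hitting the same symptom: a hedged sketch of DAG-level settings that keep a long-running run from overlapping with the next scheduled one (the dag_id, interval and timeout values are my own example, not taken from the original DAG):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hourly_job",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
    max_active_runs=1,                     # never start a second run while one is active
    dagrun_timeout=timedelta(minutes=55),  # fail a run that outlives its interval
) as dag:
    BashOperator(
        task_id="long_running_task",
        bash_command="sleep 10",  # placeholder for the real work
        execution_timeout=timedelta(minutes=50),
    )
```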

Airflow python client

We have some applications running and we want to start using Airflow. From the documentation it seems that the only way to start a DAG is over the command line. Is this true?
For example, we have a Flask server running and we want to start some workflow controlled by Airflow. How can we achieve this? Is there an API to trigger, e.g., "Run DAG now with parameters x, y, h"?
There are a couple of ways to achieve this with Airflow; which one (if any at all) is suitable depends on your situation. Two suggestions that come to mind:
Use triggered DAGs. Python jobs running in the background may trigger a DAG to be executed when an event happens. Have a look at example_trigger_controller_dag.py and example_trigger_target_dag.py in the repository: GitHub Airflow (a sketch follows these suggestions).
Use sensor tasks: there are some predefined sensors available which you can use to listen for specific events in a data source, for example. If the existing ones do not satisfy your needs, Airflow should be adaptable enough to let you implement your own sensor: Airflow Sensor
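As an illustration of the first suggestion, a minimal controller/target pair along the lines of those examples (the dag_ids and conf keys are placeholders of my own; assuming Airflow 2.x import paths):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

# Target DAG: does the actual work and has no schedule of its own.
with DAG(
    dag_id="target_dag",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as target_dag:
    BashOperator(
        task_id="run_workflow",
        # Parameters passed in via conf can be templated into the task.
        bash_command="echo {{ dag_run.conf.get('x') }}",
    )

# Controller DAG: decides when to trigger the target and what conf to pass.
with DAG(
    dag_id="controller_dag",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as controller_dag:
    TriggerDagRunOperator(
        task_id="trigger_target",
        trigger_dag_id="target_dag",
        conf={"x": 1, "y": 2, "h": 3},
    )
```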
After reading your question, I understand your use case as: you wish to run/trigger a DAG from an HTTP server.
You can just use the provided Airflow webserver (localhost:8080/), from which you can trigger/run the DAG manually.
You can also go HERE, which is still in experimental mode, and use the API as provided.
Please elaborate more so the question can be understood better.
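If you are on Airflow 2.x, where the stable REST API replaces the experimental one, triggering a DAG with parameters from your Flask app can be a single POST request. A hedged sketch, assuming the basic-auth API backend is enabled and the webserver is at localhost:8080 (URL, credentials and dag_id are placeholders):

```python
import requests

# Assumed deployment details: adjust URL, credentials and dag_id to your setup.
AIRFLOW_BASE_URL = "http://localhost:8080"
DAG_ID = "my_workflow"

def trigger_dag(x, y, h):
    """Trigger a DAG run via the stable REST API, passing parameters in conf."""
    resp = requests.post(
        f"{AIRFLOW_BASE_URL}/api/v1/dags/{DAG_ID}/dagRuns",
        json={"conf": {"x": x, "y": y, "h": h}},
        auth=("airflow", "airflow"),  # requires the basic_auth API backend
    )
    resp.raise_for_status()
    return resp.json()
```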
