Airflow jobs stuck in running status - airflow

There is a strange phenomenon when using Airflow. When I run the airflow scheduler, unidentified jobs suddenly switch to running, and when I stop the scheduler, they change to success.
I tried airflow db reset, but the jobs keep coming back whenever I run the scheduler. Can you tell me why?

Related

Airflow scheduler does not start after Google Composer update

I have a composer-2.0.25-airflow-2.2.5 environment. I need to update the number of workers and the environment variables in an environment that is already running. After updating the environment, the scheduler monitoring is unhealthy and the pod keeps restarting on its own. Sometimes a CrashLoopBackOff appears, which indicates that a container is repeatedly crashing after restarting.
I looked at the info of the pod, where I saw the scheduler restarts.
I need the environment to keep running after the updates.
Do you have any idea about this issue?

If the Airflow scheduler crashes, can Airflow restart the in-progress job in another scheduler container?

If the Airflow scheduler crashes, can Airflow restart the in-progress job in another scheduler container? Or does it have to rerun the job from the beginning?
I am considering using Airflow to implement on-demand nearline processing and want to understand its reliability characteristics, but I could not confirm this point from the docs.

Airflow DAG getting psycopg2.OperationalError when running tasks with KubernetesPodOperator

Context
We are running Airflow 2.1.4 on an AKS cluster. The Airflow metadata database is an Azure-managed PostgreSQL (8 CPUs). We have a DAG with about 30 tasks; each task uses a KubernetesPodOperator (apache-airflow-providers-cncf-kubernetes==2.2.0) to execute some container logic. Airflow is deployed with the official Airflow Helm chart. The executor is Celery.
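For context, a minimal sketch of what such a DAG might look like, assuming Airflow 2.1.4 with the cncf-kubernetes provider 2.2.0. The DAG id, namespace, and image name are illustrative placeholders, not taken from the original post:

```python
# Hypothetical sketch of the DAG described above: ~30 tasks, each running
# container logic via KubernetesPodOperator. Names and image are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

with DAG(
    dag_id="container_logic_pipeline",   # illustrative DAG id
    start_date=datetime(2021, 12, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    for i in range(30):
        KubernetesPodOperator(
            task_id=f"container_task_{i}",
            name=f"container-task-{i}",
            namespace="airflow",                                  # assumed
            image="myregistry.azurecr.io/container-logic:latest", # assumed
            get_logs=True,
            is_delete_operator_pod=True,  # clean up the pod after it finishes
        )
```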
Issue
Usually the first five or so tasks execute successfully (taking one or two minutes each) and get marked as done (colored green) in the Airflow UI. The tasks after that are also successfully executed on AKS, but are not marked as completed in Airflow. In the end this leads to the following error message, and the already-finished task is marked as failed:
[2021-12-15 11:17:34,138] {pod_launcher.py:333} INFO - Event: task.093329323 had an event of type Succeeded
...
[2021-12-15 11:19:53,866] {base_job.py:230} ERROR - LocalTaskJob heartbeat got an exception
psycopg2.OperationalError: could not connect to server: Connection timed out
Is the server running on host "psql-airflow-dev-01.postgres.database.azure.com" (13.49.105.208) and accepting
TCP/IP connections on port 5432?
Similar posting
This issue is also described in this post: https://www.titanwolf.org/Network/q/98b355ff-d518-4de3-bae9-0d1a0a32671e/y (a link from that post to Stack Overflow no longer works).
The metadata database (Azure-managed PostgreSQL) is not overloaded, and the AKS node pool we are using shows no sign of stress either. It seems the scheduler cannot pick up / detect a finished task after a couple of tasks have run.
We also looked at several configuration options, as stated here.
We have been trying to get this solved for a number of days now, unfortunately without success.
Does anyone have any ideas what the cause could be? Any help is appreciated!

Airflow dag dependencies

I have an Airflow dag-1 that runs for approximately a week and a dag-2 that runs every day for a few hours. When dag-1 is running, I cannot have dag-2 running due to an API rate limit (dag-2 is also supposed to run once dag-1 is finished).
Suppose dag-1 is already running; then dag-2, which is supposed to run every day, fails. Is there a way I can schedule the DAG dependencies correctly?
Is it possible to pause dag-1 temporarily (while it is running) when dag-2 is supposed to start, and then resume dag-1 without manual intervention?
One of the best ways is to use a defined pool.
Say you have a pool named "specific_pool" and allocate only one slot to it.
Specify the pool name on your DAGs' bash tasks (use the newly created pool instead of the default pool). That way you can avoid running both DAGs in parallel.
This ensures that whenever dag-1 is running, dag-2 will not be triggered until the pool is free; likewise, if dag-2 has taken the pool slot, dag-1 will not be triggered until dag-2 is completed.
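A minimal sketch of the pool approach described above. It assumes a pool named "specific_pool" with a single slot has already been created (Admin -> Pools in the UI, or `airflow pools set specific_pool 1 "serialize dags"`); the DAG ids and bash commands are illustrative:

```python
# Sketch: two DAGs sharing a one-slot pool so their tasks never run in parallel.
# Pool "specific_pool" (1 slot) is assumed to exist; commands are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG("dag_1", start_date=datetime(2022, 1, 1), schedule_interval=None) as dag_1:
    BashOperator(
        task_id="long_running_job",
        bash_command="run_weekly_job.sh",
        pool="specific_pool",  # occupies the only slot while running
    )

with DAG("dag_2", start_date=datetime(2022, 1, 1), schedule_interval="@daily") as dag_2:
    BashOperator(
        task_id="daily_job",
        bash_command="run_daily_job.sh",
        pool="specific_pool",  # queued until the slot is free again
    )
```

Note that pools serialize at the task level, not the DAG level: every task that must be mutually exclusive needs the `pool` argument set.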

Does Airflow restart affect current running jobs?

This seems like a mundane question, but just to be on the safe side:
what are the effects of restarting the Airflow service on jobs that are currently running?
If you only restart the Airflow webserver/scheduler processes, then running jobs are not affected. However, restarting the worker process kills the job (it is killed as a zombie - http://airflow.incubator.apache.org/concepts.html#zombies-undeads), and it may or may not be retried, according to the DAG/task rules.
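Whether a zombie-killed task is rerun depends on its retry settings, which can be sketched as below. The DAG id, command, and retry values are illustrative assumptions:

```python
# Sketch: with retries > 0, a task killed during a worker restart is
# rescheduled from the beginning; with retries=0 it is simply marked failed.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "retries": 2,                         # rerun up to twice if killed
    "retry_delay": timedelta(minutes=5),  # wait between attempts
}

with DAG(
    "restart_tolerant_dag",               # illustrative DAG id
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
) as dag:
    BashOperator(task_id="job", bash_command="do_work.sh")
```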
