I'm testing the KubernetesPodOperator on
Airflow 2.2.1 with the cncf.kubernetes provider 2.1.0 on my minikube.
I'm having issues trying to spawn a mock task:
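from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator
from kubernetes.client import models as k8s

# Env vars meant to blank out any pod template picked up inside the launched pod.
init_environments = [
    k8s.V1EnvVar(name='AIRFLOW__KUBERNETES__POD_TEMPLATE_FILE', value='""'),
    k8s.V1EnvVar(name='KUBERNETES__POD_TEMPLATE_FILE', value='""'),
    k8s.V1EnvVar(name='POD_TEMPLATE_FILE', value='""'),
]

other_task = KubernetesPodOperator(
    dag=dag,  # dag is assumed to be defined earlier in the DAG file
    task_id="ingestion_kube",
    env_vars=init_environments,
    cmds=["bash", "-cx"],
    arguments=["echo 10 \n\n\n\n\n\n\n"],
    name="base",
    image="meltano-flieber",
    image_pull_policy="IfNotPresent",  # Kubernetes accepts Always, IfNotPresent or Never
    in_cluster=True,
    namespace="localkubeflow",
    is_delete_operator_pod=False,  # keep the pod around for inspection
    pod_template_file=None,
    get_logs=True,
)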
The KubernetesExecutor pod that runs the Airflow task completes successfully, but there is no sign of the pod that the operator should launch.
Relevant logs of the Airflow task:
[2021-11-23 21:30:28,213] {dagbag.py:500} INFO - Filling up the DagBag from /opt/***/dags/meltano/meltano_ingest_pendo.py
Running <TaskInstance: meltano_tasks.ingestion_kube manual__2021-11-23T21:30:10.788963+00:00 [queued]> on host meltanotasksingestionkube.996b19ee10464c2f8683cdfc8ce7303
In the Airflow UI the task looks as if it failed, but without any relevant logs. Has anybody run into something similar, or does anyone have suggestions?
Related
Hi, I'm currently running Airflow on a Dataproc cluster. My DAGs used to run fine, but I'm now facing an issue where tasks end up in the 'retry' state without any logs when I click on task instance -> logs in the Airflow UI.
I see the following error in the terminal where I started the Airflow webserver:
2022-06-24 07:30:36.544 [ERROR] Executor reports task instance
<TaskInstance: **task name** 2022-06-23 07:00:00+00:00 [queued]> finished (failed)
although the task says its queued. Was the task killed externally?
None
[2022-06-23 06:08:33,202] {models.py:1758} INFO - Marking task as UP_FOR_RETRY
2022-06-23 06:08:33.202 [INFO] Marking task as UP_FOR_RETRY
What I tried so far:
- restarted the webserver
- started the server on 3 different ports
- re-ran the backfill command with 3 different timestamps
- deleted the DAG runs for my DAG, created a new DAG run and then re-ran the backfill command
- cleared the PID as described in "How do I restart airflow webserver?" and restarted the webserver
None of these worked. The issue has persisted for the past two days, and I'd appreciate any help. At this point I'm guessing it has to do with a shared DB, but I'm not sure how to fix it.
<<update>> I also found that these tasks eventually reach a success or failure state. When that happens the logs are available, but there are still no logs for the retry attempts in $airflow_home or our remote directory.
The issue was that another Celery worker was listening on the same queue. Since this second worker was not configured properly, it was failing the task and not writing the logs to the remote location.
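A quick way to spot this kind of stray listener is to group workers by the queue they consume from. The sketch below is only an illustration: it assumes the input has the shape that Celery's inspect `active_queues()` call reports (worker name mapped to a list of queue dicts with a "name" key), and the worker and queue names in the sample are made up.

```python
from collections import defaultdict


def workers_per_queue(active_queues):
    """Group worker names by the queue they consume from.

    `active_queues` is assumed to map a worker name to a list of
    queue descriptions, each a dict with at least a "name" key.
    """
    listeners = defaultdict(list)
    for worker, queues in active_queues.items():
        for queue in queues:
            listeners[queue["name"]].append(worker)
    return {name: sorted(workers) for name, workers in listeners.items()}


def shared_queues(active_queues):
    """Return only the queues that have more than one listener."""
    return {
        name: workers
        for name, workers in workers_per_queue(active_queues).items()
        if len(workers) > 1
    }


# Hypothetical sample data; none of these names come from the original setup.
sample = {
    "celery@worker-a": [{"name": "default"}],
    "celery@stray-worker": [{"name": "default"}],
    "celery@worker-b": [{"name": "reports"}],
}
print(shared_queues(sample))
```

Any queue that shows up in the result has more than one consumer, so each of those workers needs the same logging configuration, or the extra one should be stopped.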
Airflow Version: 2.2.4
Airflow running in EKS
Issue: Logs not showing in the UI while tasks are running
The issue with the logs is that Airflow only writes them to the log file rather than to standard out as well. This is what prevents us from seeing the logs in the web UI while the task is running.
When I get into the pod, I do see the log inside the pod.
Is there a setting or configuration that makes Airflow output to both?
I view the pod logs as below:
kubectl logs detaskdate0.3d55e5ba89ca4ad491bb3e1cadfdaaec -n airflow
Added new context arn:aws:eks:us-west-2:XXXXXXXX:cluster/us-west-2-airflow-cluster to /home/airflow/.kube/config
[2022-05-20 19:56:43,529] {dagbag.py:500} INFO - Filling up the DagBag from /opt/airflow/dags/tss/dq_tss_mod_date_dag.py
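For reference, the general idea of writing every record to both a file and standard out can be sketched with plain Python logging. This is not Airflow's own logging configuration, only the underlying mechanism; the function name `build_task_logger` and its parameters are hypothetical. In Airflow the equivalent handlers would have to be wired in through a custom logging config, which is an assumption about the setup rather than something confirmed above.

```python
import logging
import sys


def build_task_logger(name, log_path, stream=None):
    """Return a logger that writes each record to a file and to a stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    formatter = logging.Formatter(
        "[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s"
    )
    # One handler per destination: the log file and the console stream.
    handlers = (
        logging.FileHandler(log_path),
        logging.StreamHandler(stream if stream is not None else sys.stdout),
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Calling `build_task_logger("demo", "/tmp/demo.log").info("started")` would then write the same line to the file and to standard out, which is what `kubectl logs` reads.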
I'm having trouble getting the LocalExecutor to work.
I created a postgres database called airflow and granted all privileges to the airflow user. Finally I updated my airflow.cfg file:
# The executor class that airflow should use. Choices include
# SequentialExecutor, LocalExecutor, CeleryExecutor, DaskExecutor, KubernetesExecutor
executor = LocalExecutor
# The SqlAlchemy connection string to the metadata database.
# SqlAlchemy supports many different database engine, more information
# their website
sql_alchemy_conn = postgresql+psycopg2://airflow:[MY_PASSWORD]@localhost:5432/airflow
Next I ran:
airflow initdb
airflow scheduler
airflow webserver
I thought it was working, but I noticed my dags were taking a long time to finish. Upon further inspection of my log files, I noticed that they say Airflow is using the SequentialExecutor.
INFO - Job 319: Subtask create_task_send_email [2020-01-07 12:00:16,997] {__init__.py:51} INFO - Using executor SequentialExecutor
Does anyone know what could be causing this?
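One thing worth ruling out: Airflow reads environment variables named AIRFLOW__{SECTION}__{KEY} before it reads airflow.cfg, so a stray AIRFLOW__CORE__EXECUTOR would override the file. The sketch below mimics that lookup order with the standard library only. The helper name `effective_option` is hypothetical, and whether this is the actual cause here is an assumption.

```python
import configparser


def effective_option(cfg_text, environ, section, key, default=None):
    """Resolve an option: environment variable first, then the config file.

    Returns a (value, source) pair so it is clear where the value came from.
    """
    env_name = f"AIRFLOW__{section.upper()}__{key.upper()}"
    if env_name in environ:
        return environ[env_name], env_name
    # interpolation=None because airflow.cfg values may contain "%".
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if parser.has_option(section, key):
        return parser.get(section, key), "airflow.cfg"
    return default, "default"


# Minimal stand-in for the relevant part of airflow.cfg.
sample_cfg = "[core]\nexecutor = LocalExecutor\n"

print(effective_option(sample_cfg, {}, "core", "executor"))
print(effective_option(
    sample_cfg,
    {"AIRFLOW__CORE__EXECUTOR": "SequentialExecutor"},  # hypothetical stray variable
    "core",
    "executor",
))
```

Passing `os.environ` and the text of the airflow.cfg under the AIRFLOW_HOME that the scheduler actually uses shows which source wins. If the file being edited is not the one under that AIRFLOW_HOME, the default applies instead.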
I upgraded Airflow from 1.7.3 to 1.10.1. After upgrading the scheduler, webserver and workers, the DAGs stopped working, and the scheduler shows the error below:
Either the dag did not exist or it failed to parse.
I have not made any changes to the config. While investigating, I found the difference in the scheduler logs. Earlier the scheduler queued the task as:
Adding to queue: airflow run <dag_id> <task_id> <execution_date> --local -sd DAGS_FOLDER/<dag_filename.py>
Now it queues the task with an absolute path:
Adding to queue: airflow run <dag_id> <task_id> <execution_date> --local -sd /<PATH_TO_DAGS_FOLDER>/<dag_filename.py>
PATH_TO_DAGS_FOLDER is like /home/<user>/Airflow/dags...
which is the same path that gets pushed to the workers. Since the worker runs as a different user, it cannot find the DAG at the specified location.
I am not sure how to tell the worker to look in its own Airflow home directory and not the scheduler's.
I am using MySQL as the backend and RabbitMQ for message passing.
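To make the difference between the two log lines concrete, here is a small sketch of the path translation they imply: the scheduler swaps its own DAG folder for a DAGS_FOLDER placeholder, and each worker expands the placeholder against its own folder. The helper names and the user names in the usage note are hypothetical; this illustrates the idea and is not Airflow's internal code.

```python
from pathlib import PurePosixPath

PLACEHOLDER = "DAGS_FOLDER"


def to_placeholder(dag_file, scheduler_dags_folder):
    """Replace the scheduler's DAG folder prefix with the placeholder."""
    # relative_to raises ValueError if dag_file is not under the folder.
    relative = PurePosixPath(dag_file).relative_to(scheduler_dags_folder)
    return str(PurePosixPath(PLACEHOLDER) / relative)


def resolve_on_worker(portable_path, worker_dags_folder):
    """Expand the placeholder against the worker's own DAG folder."""
    parts = PurePosixPath(portable_path).parts
    if not parts or parts[0] != PLACEHOLDER:
        return portable_path  # no placeholder, leave the path untouched
    return str(PurePosixPath(worker_dags_folder).joinpath(*parts[1:]))
```

For example, `to_placeholder("/home/alice/Airflow/dags/etl/my_dag.py", "/home/alice/Airflow/dags")` gives `DAGS_FOLDER/etl/my_dag.py`, and a worker running as another user can resolve that against `/home/bob/Airflow/dags`. An absolute path carries no placeholder, so the worker has nothing to expand.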
Airflow stopped running tasks all of a sudden. All of the following are running:
airflow scheduler
airflow webserver
airflow worker
Web UI message:
All dependencies are met but the task instance is not running. In most
cases this just means that the task will probably be scheduled soon
unless:
- The scheduler is down or under heavy load
If this task instance does not start soon please contact your Airflow
administrator for assistance.
The scheduler seems to be stuck in a loop and keeps repeating the messages below. The web UI shows the tasks in the queued state. Restarting the scheduler didn't help.
[2018-11-17 22:03:45,809] {{jobs.py:1607}} DEBUG - Starting Loop...
[2018-11-17 22:03:45,809] {{jobs.py:1627}} INFO - Heartbeating the process manager
[2018-11-17 22:03:45,810] {{jobs.py:1662}} INFO - Heartbeating the executor
[2018-11-17 22:03:45,810] {{base_executor.py:103}} DEBUG - 124 running task instances
[2018-11-17 22:03:45,810] {{base_executor.py:104}} DEBUG - 0 in queue
[2018-11-17 22:03:45,810] {{base_executor.py:105}} DEBUG - 76 open slots
[2018-11-17 22:03:45,810] {{base_executor.py:132}} DEBUG - Calling the <class 'airflow.executors.celery_executor.CeleryExecutor'> sync method
[2018-11-17 22:03:45,810] {{celery_executor.py:80}} DEBUG - Inquiring about 124 celery task(s)
Airflow setup:
apache-airflow[celery, redis, all]==1.9.0
I also checked these posts, but they didn't help:
Airflow 1.9.0 is queuing but not launching tasks
Airflow tasks get stuck at "queued" status and never gets running
Problem solved. This problem occurs when you create your build on or after 2018-11-15. It turns out that apache-airflow[celery, redis, all]==1.9.0 pulls in the latest redis-py, 3.0.1, which does not work with celery 4.2.1.
The solution is to pin redis-py 2.10.6:
redis==2.10.6
apache-airflow[celery, all]==1.9.0
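A small guard can catch this pairing before a build ships. The sketch below only encodes the combination reported above. Treating every redis-py 3.x with every celery 4.2.x as incompatible is an assumption that generalises from the single 3.0.1 / 4.2.1 case, and the function names are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version such as "2.10.6" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def redis_pin_ok(redis_version, celery_version):
    """Return False for the pairing reported above (redis-py 3.x with celery 4.2.x)."""
    redis_major = parse_version(redis_version)[0]
    celery_series = parse_version(celery_version)[:2]
    return not (redis_major >= 3 and celery_series == (4, 2))


print(redis_pin_ok("2.10.6", "4.2.1"))
print(redis_pin_ok("3.0.1", "4.2.1"))
```

In a real environment the two version strings could come from `importlib.metadata.version("redis")` and `importlib.metadata.version("celery")`.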