Hello all, I am working on an Airflow DAG where I have set retries to 1. The task fails in the first run but gets stuck in the "up for retry" state in the second run. The expected result is that it should fail, but it remains stuck.
When I look at the logs, they show "Task exited with return code 1", but in the UI it is in the "up for retry" state.
Not sure what is going wrong. Can anyone help me?
Thank you.
Related
I need some code to be executed only if a task has been manually cleared and restarted.
Therefore my question: how can I find out if the task is currently on a retry during its run? Does Airflow set some attributes of the task or DAG when I clear a task?
Thanks!
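To illustrate one way this can be checked (a sketch of mine, not from the original post; the names are placeholders and it assumes Airflow 1.x with a PythonOperator): the TaskInstance available in the task context exposes try_number, which grows with every attempt, so a value above 1 means the current run is not the first attempt. Note that a manually cleared task keeps counting attempts, so try_number alone may not distinguish an automatic retry from a re-run after a clear.

from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def my_callable(**context):
    ti = context["ti"]  # the current TaskInstance
    if ti.try_number > 1:
        # Not the first attempt: either an automatic retry or a re-run after a clear
        print("Attempt %s - running retry/re-run specific logic" % ti.try_number)
    else:
        print("First attempt")

with DAG("retry_aware_example", start_date=datetime(2021, 1, 1),
         schedule_interval=None) as dag:
    PythonOperator(task_id="retry_aware_task",
                   python_callable=my_callable,
                   provide_context=True,  # needed on Airflow 1.x to receive **context
                   retries=1)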
See the screenshot:
Any idea why this would occur, and how would I go about troubleshooting it?
--
Update:
It's "none status" not "queued" as I originally interpreted
The DAG run occurred on 3/8 and last relevant commit was on 3/1. But I'm having trouble finding the same DAG run....will keep investigating
It's not Queued status. It's None status.
This can happen in one of the following cases:
The task drop_staging_table_if_exists was added after create_staging_table started to run.
The task drop_staging_table_if_exists used to have a different task_id in the past.
The task drop_staging_table_if_exists was somewhere else in the workflow and you changed the dependencies after the DAG run started.
Note that Airflow currently doesn't support DAG versioning (it will be supported in future versions once AIP-36 DAG Versioning is completed). This means that Airflow constantly reloads the DAG structure, so changes that you make will also be reflected in past runs. This is by design, and it's very useful for cases where you want to backfill past runs.
Either way, if you start a new run or clear this specific run, the issue you are facing will be resolved.
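For example, from the CLI (recent Airflow 1.10.x syntax; the dag id and dates below are placeholders, and Airflow 2.x renames these commands to airflow tasks clear / airflow dags trigger):

# Clear the existing run so it is re-scheduled against the current DAG structure
airflow clear -s <execution_date> -e <execution_date> my_dag_id
# ...or kick off a brand new run instead
airflow trigger_dag my_dag_id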
I am trying to use airflow trigger_dag dag_id to trigger my DAG, but it just shows the running state and doesn't do anything more.
I have searched many questions, but people just say the DAG is paused. The problem is that my DAG is unpaused, yet it still stays in the running state.
Note: I can use one DAG to trigger another one in the web UI, but it doesn't work from the command line.
Please see the snapshot below.
I had the same issue many times. The state of the task is not running, and it is not queued either; it's stuck after we 'clear'. Sometimes I found the task going into the Shutdown state before getting stuck, and after a long time the instance fails, yet the task status is still shown in white. I have worked around it in several ways; I can't say the exact reason or solution, but try one of these:
Try the trigger_dag command again with the same execution date and time, instead of the clear option.
Try backfill; it will run only the unsuccessful instances.
Or try a different time within the same interval; it will create another instance which is fresh and does not have the issue.
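To make those suggestions concrete (recent Airflow 1.10.x CLI syntax; the dag id and dates are placeholders):

# Re-trigger the same logical run with an explicit execution date
airflow trigger_dag -e <execution_date> my_dag_id
# Or backfill the interval - only missing/unsuccessful task instances are run
airflow backfill -s <start_date> -e <end_date> my_dag_id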
I have a DAG defined that contains a number of tasks, the last of which is only run if any of the previous tasks fail. This task simply posts to a Slack channel that the DAG run experienced errors.
What I would really like is if the message sent to the Slack channel contained the actual error that is logged in the task logs, to provide immediate context to the error and perhaps save Ops from having to dig through the logs.
Is this at all possible?
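One common pattern for this (an alternative to the trigger_rule task described above, not the poster's setup) is an on_failure_callback: the callback receives the task context, which includes the exception that failed the task, and can forward it to a Slack incoming webhook. A rough sketch, where the webhook URL, names, and message format are placeholders:

import json
import requests  # assumes the requests library is available on the workers

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def notify_slack_on_failure(context):
    """Airflow calls this when a task fails; the context carries the exception."""
    ti = context["task_instance"]
    error = context.get("exception")  # the exception that caused the failure
    message = (
        ":red_circle: Task failed\n"
        "DAG: {dag}\nTask: {task}\nExecution date: {ds}\n"
        "Error: {err}\nLog: {log_url}"
    ).format(dag=ti.dag_id, task=ti.task_id,
             ds=context.get("execution_date"),
             err=error, log_url=ti.log_url)
    requests.post(SLACK_WEBHOOK_URL, data=json.dumps({"text": message}),
                  headers={"Content-Type": "application/json"})

The callback can then be attached per task or to the whole DAG via default_args, e.g. default_args={"on_failure_callback": notify_slack_on_failure}.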
We have a long DAG (~60 tasks), and quite frequently we see a dagrun for this DAG in a state of failed. When looking at the tasks in the DAG, they are all in a state of either success or null (i.e. not even queued yet). It appears that the DAG has got into a failed state prematurely.
Under what circumstances can this happen, and what should people do to protect against it?
If it's helpful for context we're running Airflow using the Celery executor and currently running on version 1.9.0. If we set the state of the dag in question back to running then all the tasks (and the dag as a whole) complete successfully.
The only way that a DAG can fail without a task failing is through something not connected to any of the tasks. Besides manual intervention (check that nobody on the team is manually failing the dags!), the only thing that fails DAGs without considering task states is the timeout checker.
This runs inside the scheduler while it considers whether it needs to schedule a new dag_run. If it finds an active run that has been running longer than the dagrun_timeout argument of the DAG, that run will get killed. As far as I can see this isn't logged anywhere, so the best way to diagnose it is to compare the time the DAG started with the time the last task finished and see whether the difference is roughly the length of dagrun_timeout.
You can see the code in action here: https://github.com/apache/incubator-airflow/blob/e9f3fdc52cb53f3ac3e9721e5128d17d1c5c418c/airflow/jobs.py#L800
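For reference, dagrun_timeout is set on the DAG object itself; a minimal sketch with placeholder names and values:

from datetime import datetime, timedelta
from airflow import DAG

# If a run of this DAG stays active longer than 2 hours, the scheduler's timeout
# check can mark the dag_run failed even though no individual task has failed,
# which matches the symptom described in the question.
dag = DAG(
    dag_id="long_dag_example",
    start_date=datetime(2018, 1, 1),
    schedule_interval="@daily",
    dagrun_timeout=timedelta(hours=2),
)

If the DAG doesn't set dagrun_timeout at all, this particular cause can probably be ruled out.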