I recently upgraded my Airflow to 2.2.4. Now, when I hit Trigger DAG from my admin console, the run goes to queued. It does not run immediately but runs 7 hours later. The Airflow scheduler and server are in the PDT timezone (7 hours behind UTC), so why is it treating the trigger time 'now' as UTC 'now' and not the configured PDT timezone?
My airflow.cfg has default_timezone and default_ui_timezone set to America/Los_Angeles.
You have to set default_timezone in the Airflow configuration. By default, its value is UTC.
See: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#default-timezone
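For reference, here is a minimal sketch of the relevant airflow.cfg entries (section names per the Airflow 2.x configuration reference linked above; note that the two options live in different sections):

[core]
# How naive datetimes (e.g. start_date, schedule calculations) are interpreted
default_timezone = America/Los_Angeles

[webserver]
# Only affects what the web UI displays
default_ui_timezone = America/Los_Angeles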
I set a schedule like '* 1,5,10,18 * * *' in Airflow.
But yesterday's 18:00 run wasn't executed, so I checked the logs.
There I found that the job scheduled for 10:00 executed at 18:00.
I want to know why, and how I can fix it.
Note that if you run a DAG on a schedule_interval of one day, the run
stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In
other words, the job instance is started once the period it covers has
ended.
~Airflow scheduler docs
So, as you can see, the run is scheduled after its period (10:00 -> 18:00) has closed, i.e. after 18:00. Check the run before it; it should have run just after 10:00.
You don't understand how the Airflow scheduler works.
Airflow always stamps a DAG run (data_interval_start) with the date of the previous scheduled execution, so the task that actually ran at 18:00 has a DAG run dated 10:00. Likewise, the first run of the next day, which fires at 01:00, has a DAG run dated 18:00 of the previous day.
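To make the relationship concrete, here is a minimal sketch (the dag_id is hypothetical and the minute field is pinned to 0 for readability; in Airflow 2.x the scheduler passes context variables such as data_interval_start to a callable that declares them as parameters):

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def show_interval(data_interval_start=None, data_interval_end=None, **_):
    # A run launched just after 18:00 prints a 10:00 data_interval_start,
    # because it covers the 10:00 -> 18:00 period.
    print(f"covers {data_interval_start} -> {data_interval_end}")

with DAG(
    dag_id="interval_demo",                 # hypothetical
    start_date=datetime(2022, 1, 1),
    schedule_interval="0 1,5,10,18 * * *",  # minute pinned to 0 for clarity
    catchup=False,
) as dag:
    PythonOperator(task_id="show_interval", python_callable=show_interval)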
I don’t understand why SLAs are sometimes triggered.
An example is a DAG with an SLA of 'sla': timedelta(hours=1).
I received an email with this:
Here’s a list of tasks that missed their SLAs:
APP_task_group.copy_events on 2022-05-08T03:00:00+00:00
But checking the graph view in the UI, I see it started at 04:28 and finished at 04:46, which is within the one-hour range.
My understanding is that the SLA clock starts at the real UTC time when the DAG starts, so the run stamped 2022-05-08T03:00:00+00:00 has between 04:00 and 04:59 UTC to run without raising an SLA.
Am I wrong?
What explains why this SLA is being raised?
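For context, a minimal sketch of the kind of DAG described (the dag_id, task and hourly schedule are assumptions inferred from the timestamps above; only the 'sla' value comes from the question):

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="app_events",              # hypothetical
    start_date=datetime(2022, 5, 1),
    schedule_interval="@hourly",      # assumed from the timestamps above
    catchup=False,
    default_args={"sla": timedelta(hours=1)},
) as dag:
    # Per the 'sla' parameter docs, the SLA countdown is measured from the
    # close of the run's scheduled period, not the task's actual start time.
    BashOperator(task_id="copy_events", bash_command="sleep 1")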
After pausing a DAG for 2-3 days, resuming it with catchup=False makes it run immediately with the last missed execution.
For example, a DAG that sends data to an external system is scheduled to run every day at 19:00.
Stopping the DAG for 4 days and re-enabling it at 11:00 runs it immediately with yesterday's execution, and then again at 19:00 that day.
In this case the DAG runs twice on the day it is resumed.
Is it possible to resume the DAG so that the first run actually happens at 19:00?
With the default operators, we cannot achieve exactly what you are expecting. The closest thing Airflow has is the LatestOnlyOperator. It is one of the simplest operators and needs only the following configuration:
from airflow.operators.latest_only import LatestOnlyOperator  # Airflow 2.x path

latest_only = LatestOnlyOperator(task_id='latest_only')
This lets the downstream tasks run only if the current time falls between the current execution date and the next scheduled execution date. So, in your case, it would skip the runs for the missed days, but yesterday's run would still trigger the jobs.
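A minimal sketch of that wiring, assuming the daily 19:00 schedule from the question (the dag_id and the downstream task are hypothetical):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.latest_only import LatestOnlyOperator

with DAG(
    dag_id="send_data_daily",        # hypothetical
    start_date=datetime(2022, 1, 1),
    schedule_interval="0 19 * * *",
    catchup=False,
) as dag:
    latest_only = LatestOnlyOperator(task_id="latest_only")
    send_data = BashOperator(task_id="send_data", bash_command="echo send")

    # send_data is skipped for any run whose interval is not the latest one
    latest_only >> send_data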
I have my DAG like this:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG('testing', description='Testing DAG', schedule_interval='0 4,15 * * *')
t1 = BashOperator(task_id='testing_task',
                  bash_command='python /home/ubuntu/airflow/dags/scripts/test.py',
                  start_date=datetime(2018, 2, 8), dag=dag)
I want to schedule it to run every day at 3 PM and 4 AM, so I changed my AWS instance's local timezone to NZ.
In the Airflow web UI, in the top right, I still see Airflow showing UTC time. However, if I look at the last run (my manual run through the UI) for my DAG, it shows NZ time. So I assumed the scheduler works in the local timezone (NZ time) and tried to schedule in that timezone, but the job was not triggered on time. How do I solve this?
Thanks,
Right now (as of Airflow 1.9) Airflow only operates in UTC. The "solution" for now is to put the schedule in UTC -- as horrible as that is.
The good news is that the master branch (which will become the next non-point release, Airflow 1.10) has support for timezones! https://github.com/apache/incubator-airflow/blob/772dbae298680feb9d521e7cd5526f4059d7cb69/docs/timezone.rst
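To illustrate the UTC workaround on the DAG from the question: assuming NZST (UTC+12, ignoring daylight saving), 4 AM and 3 PM NZ time are 16:00 and 03:00 UTC, so the schedule becomes:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.x path

# 04:00 NZST = 16:00 UTC (previous day), 15:00 NZST = 03:00 UTC
dag = DAG('testing', description='Testing DAG', schedule_interval='0 3,16 * * *')
t1 = BashOperator(task_id='testing_task',
                  bash_command='python /home/ubuntu/airflow/dags/scripts/test.py',
                  start_date=datetime(2018, 2, 8), dag=dag)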
I have a Controller DAG (SampleController) that will call a Target DAG (SampleWait), both with a start_date of datetime.now() and a schedule_interval of None.
I trigger the Controller DAG from the command line or the webserver UI, and it runs right away with an execution date of "right now" in my system time zone. In the screenshot, it is 17:25 - which isn't my "real" UTC time; it is my local time.
However, when the triggered DAG Run for the target is created, the execution date is "adjusted" to the UTC time, regardless of how I try to manipulate the start_date - it will ALWAYS be in the future (21:25 here). In my case, it is four hours in the future, so the target DAG just sits there doing nothing. I actually have a sensor in the Controller that waits for the Target DAG to finish, so that guy is going to be polling for no reason too.
Even the examples on GitHub for the Controller-Target pattern exhibit the exact same behavior when I run them, and I can't find any proper documentation on how to actually handle this issue, just that it is a "pitfall".
It is strange that Airflow seems to be aware of my time zone and adjusts within one operator, but not when I do it from the command line or the webserver UI.
What gives?
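For reference, a minimal sketch of the controller described above, using the Airflow 1.x-era TriggerDagRunOperator API (the task id and the pass-through callable are hypothetical):

from datetime import datetime

from airflow import DAG
from airflow.operators.dagrun_operator import TriggerDagRunOperator

def conditionally_trigger(context, dag_run_obj):
    # Returning the object triggers SampleWait; Airflow assigns the new run's
    # execution_date itself, which is where the UTC "adjustment" shows up.
    return dag_run_obj

dag = DAG('SampleController', schedule_interval=None, start_date=datetime.now())

trigger = TriggerDagRunOperator(task_id='trigger_target',
                                trigger_dag_id='SampleWait',
                                python_callable=conditionally_trigger,
                                dag=dag)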