Running a task with a specific schedule interval in an Airflow DAG

We have two DAGs: Dag1 and Dag2. Dag1 should run 4 times a day, and Dag2 should run only once, on the first Dag1 run of the day.
Dag 1 - Schedule interval: 5-20/5 13 * * *
As per the above schedule interval, Dag1 will run 4 times (at 13:05, 13:10, 13:15, and 13:20), but it should trigger Dag2 only once. We are using TriggerDagRunOperator to trigger Dag2 from Dag1.
Is there any way to achieve this? I tried ExternalTaskSensor, a branch operator with an exact time dependency, and a file dependency, but each can break in one case or another.
Is there any simpler way?
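One simple pattern (a sketch, not from the original post): put a ShortCircuitOperator in front of the TriggerDagRunOperator in Dag1, with a callable that returns True only for the first of the four daily slots. The hour/minute check and the task IDs below are assumptions derived from the cron expression in the question; the gating predicate itself is plain Python:

```python
from datetime import datetime

def is_first_slot(logical_date: datetime) -> bool:
    """Gate for a ShortCircuitOperator placed upstream of the
    TriggerDagRunOperator in dag1: return True only for the 13:05 run,
    the first slot of the cron "5-20/5 13 * * *", so dag2 is triggered
    once per day even though dag1 runs four times."""
    return logical_date.hour == 13 and logical_date.minute == 5

# The four daily logical dates produced by "5-20/5 13 * * *":
slots = [datetime(2023, 6, 1, 13, m) for m in (5, 10, 15, 20)]
print([is_first_slot(d) for d in slots])  # [True, False, False, False]
```

In Dag1 this would be wired (assuming Airflow 2.x, where the `logical_date` context variable is injected into the callable) roughly as `ShortCircuitOperator(task_id="only_first_run", python_callable=is_first_slot) >> TriggerDagRunOperator(task_id="trigger_dag2", trigger_dag_id="dag2")`; the short circuit skips the trigger task on the 13:10, 13:15, and 13:20 runs.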

Related

airflow schedule issue - diff between schedule time and run time

I set the schedule to '* 1,5,10,18 * * *' in Airflow, but yesterday's 18:00 run wasn't executed. I checked the logs and found that the job scheduled for 10:00 had executed at 18:00. I want to know why, and how I can fix it.
Note that if you run a DAG on a schedule_interval of one day, the run
stamped 2016-01-01 will be trigger soon after 2016-01-01T23:59. In
other words, the job instance is started once the period it covers has
ended.
~Airflow scheduler docs
So as you can see, the run stamped 10:00 is scheduled after its period (10:00 to 18:00) has closed, i.e. after 18:00. Check the task before it: it should have started just after 10:00.
This is simply how the Airflow scheduler works: a DAG run's logical date (data_interval_start) is always the start of the interval that just finished, so the task that executed at 18:00 belongs to the DAG run stamped 10:00. Likewise, the run that executes at 01:00 will be stamped 18:00 of the previous day.
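The relationship can be reduced to one line of arithmetic (a sketch; `interval` here stands in for the gap to the next scheduled hour, which Airflow derives from the cron expression):

```python
from datetime import datetime, timedelta

def actual_start(data_interval_start: datetime, interval: timedelta) -> datetime:
    # Airflow only starts a run after the data interval it covers has ended,
    # so the wall-clock start time is the stamp plus the interval length.
    return data_interval_start + interval

# For hours 1, 5, 10, 18 the interval opening at 10:00 closes at 18:00,
# so the run stamped 10:00 actually starts executing at 18:00.
print(actual_start(datetime(2023, 6, 1, 10), timedelta(hours=8)))  # 2023-06-01 18:00:00
```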

airflow DAG triggering for time consuming runs

I am completely new to Airflow and am trying to grasp the concepts of scheduling and default args.
I have a scenario where I would like to schedule my DAG hourly to do a data-transfer task between a source and a database. What I am trying to understand is: let's say one of my DAG runs was triggered at 00:00 AM. If it takes more than an hour for this run to complete all of its tasks (say 1 hour 30 min), does it mean that the next DAG run that was supposed to be triggered at 01:00 AM will NOT get triggered, but the DAG run for 02:00 AM will?
Yes.
To avoid this, you need catchup=True on the DAG object.
Reference : https://airflow.apache.org/docs/apache-airflow/stable/dag-run.html
(Search for Catchup)
The Airflow scheduler monitors all DAGs and tasks in Airflow. Default arguments can be used to create tasks with default parameters in a DAG.
The first DAG run is based on start_date, and subsequent runs follow schedule_interval sequentially. The scheduler doesn't trigger a run until its period has ended. For your requirement you can set catchup=True so that the DAG runs for every completed interval, and the scheduler will execute the missed runs sequentially. Catchup creates a run for every data interval since the last data interval that was run.
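The difference between catchup=True and catchup=False can be sketched in plain Python (an illustration of the scheduling rule, not Airflow's actual implementation; dates are made up):

```python
from datetime import datetime, timedelta

def intervals_to_run(start: datetime, now: datetime, interval: timedelta):
    """With catchup=True the scheduler creates one run per completed data
    interval since start_date; with catchup=False only the latest completed
    interval is scheduled, so intermediate ones are skipped."""
    out = []
    t = start
    while t + interval <= now:
        out.append((t, t + interval))
        t += interval
    return out

# Hourly DAG whose 00:00 run overran until 01:30: at 03:00, catchup=True
# still creates the 01:00 interval instead of skipping straight to 02:00.
runs = intervals_to_run(datetime(2023, 6, 1, 0), datetime(2023, 6, 1, 3),
                        timedelta(hours=1))
print(len(runs))  # 3
```

With catchup=False, only the last element of that list would be scheduled, which is exactly the "01:00 run never happens" behavior the question describes.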

Airflow DAG missing runs in hourly schedule

I'm running into a rather bizarre issue of an hourly Airflow DAG missing one scheduled run every day.
This hourly DAG is scheduled as 0 * * * *, yet it does not generate a run for 0500 hours.
I have another daily DAG with a cross-DAG dependency on a task in that hourly DAG. The daily DAG is scheduled as 0 5 * * *.
I don't know what could cause Airflow to skip scheduling an entire run. My DAGs don't have any settings for concurrency or max_active_runs, and don't have task-level concurrency limits either (pool, task_concurrency). Even if they were set, I believe these parameters would only queue the execution of DAGs or tasks, not skip a scheduled run entirely (can someone confirm?).
I'm guessing it is something in the global Airflow settings that limits the number of active DAG runs being scheduled at the same time. Any suggestion what it might be? The environment was set up by our infrastructure team a long time ago, so I would need to provide some instructions for them to take a look.
Airflow version 1.10.12

resuming a dag runs immediately with the last scheduled execution

After pausing a DAG for 2-3 days, resuming it with catchup=False still runs it immediately with the last scheduled execution.
For example, a DAG that sends data to an external system is scheduled to run every day at 19:00.
Stopping the DAG for 4 days and re-enabling it at 11:00 runs it immediately with yesterday's execution, and then again at 19:00 that day.
So the DAG runs twice on the day it's resumed.
Is it possible to resume the DAG so that the first run actually happens at 19:00?
With the default operators, we cannot achieve exactly what you are expecting. The closest thing Airflow has is the LatestOnlyOperator. It is one of the simplest operators and needs only the following configuration:
latest_only = LatestOnlyOperator(task_id='latest_only')
This lets the downstream tasks run only if the current time falls between the current execution date and the next execution date. So in your case it would skip the three older runs, but yesterday's run would still trigger the jobs.
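Roughly, the check LatestOnlyOperator performs can be sketched as a plain-Python predicate (an approximation for illustration, not the operator's source; dates are made up to match the scenario above):

```python
from datetime import datetime, timedelta

def latest_only_allows(execution_date: datetime,
                       next_execution_date: datetime,
                       now: datetime) -> bool:
    """Approximation of LatestOnlyOperator: let downstream tasks run only
    when `now` falls inside [execution_date, next_execution_date)."""
    return execution_date <= now < next_execution_date

# Daily 19:00 DAG resumed at 11:00 after being paused for 4 days:
now = datetime(2023, 6, 5, 11)
old = datetime(2023, 6, 1, 19)     # an old missed run -> downstream skipped
latest = datetime(2023, 6, 4, 19)  # yesterday's run -> downstream executes
print(latest_only_allows(old, old + timedelta(days=1), now),
      latest_only_allows(latest, latest + timedelta(days=1), now))  # False True
```

This is why yesterday's 19:00 run still fires on resume: at 11:00 the current time is inside yesterday's interval, so that run is "the latest" and passes the gate.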

DAG stops running once it reaches 50 successful runs

I am using the latest Airflow. I ran a DAG file that just executes a print and sleeps for 10 seconds.
Once that DAG completes 50 successful runs, it stops automatically. When I restart the webserver, scheduler, and worker, it runs for another 50. I did this twice, with the same result.
There is some issue with the schedule. The most likely reason is that the schedule was created as a catchup or backfill with specific start_date and end_date parameters.
For more details: https://airflow.apache.org/docs/stable/dag-run.html
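If that hypothesis is right, the fixed run count falls out of simple interval arithmetic: a bounded start_date/end_date pair produces exactly as many runs as there are completed intervals between them, then nothing more. A sketch with illustrative dates (the 50-interval gap here is chosen to match the symptom, not taken from the question):

```python
from datetime import datetime, timedelta

def backfill_run_count(start_date: datetime, end_date: datetime,
                       interval: timedelta) -> int:
    """Number of runs a catchup/backfill creates: one per completed
    interval between start_date and end_date, after which the DAG
    simply stops being scheduled."""
    n, t = 0, start_date
    while t + interval <= end_date:
        n += 1
        t += interval
    return n

# A daily schedule whose start_date and end_date are 50 days apart
# yields exactly 50 runs and then stops.
print(backfill_run_count(datetime(2023, 1, 1), datetime(2023, 2, 20),
                         timedelta(days=1)))  # 50
```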
