I have an hourly shell script job that takes a date and an hour as input parameters. The date and hour are used to construct the input path to fetch data for the logic contained in the job's DAG. When a job fails and I need to rerun it (by clicking "Clear" on the failed task node to reset its status and re-trigger a run), how can I make sure the date and hour used for the rerun are the same as those of the failed run, given that the rerun may happen in a different hour than the original run?
You have 3 options:
Hover over the failed task you are going to clear; its tooltip contains a value with the key Run:, which is its execution date and time.
Click on the failed task you are going to clear; the heading of the popup that contains the Clear option reads [taskname] on [execution date with time].
Open the task log; the first line after the attempt count includes a string of the form Executing <Task([TaskName]): task_id> on [execution date with time].
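One way to avoid the problem entirely is to have the script take its date and hour from the templated execution date rather than from the wall clock; then clearing the task reruns it with the original values. A minimal sketch, assuming a BashOperator, Airflow 2-style imports, and a placeholder script path and DAG id:

import pendulum
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hourly_shell_job",          # placeholder name
    schedule_interval="@hourly",
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
) as dag:
    run_script = BashOperator(
        task_id="run_script",
        # {{ ds }} and {{ execution_date }} are rendered from the DagRun's execution
        # date, so a cleared task reruns with the same date and hour as the failed run.
        bash_command="/path/to/job.sh {{ ds }} {{ execution_date.strftime('%H') }}",
    )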
I have a use case where I create two BigQueryOperator tasks that write to the same destination table, but I need one to run daily and the second one to run manually, only when I need it.
Below is an illustration of the Tree View:
| task_3rd_adhoc
| task_3rd
|---- task_2nd
|---- task_1st_a
|---- task_1st_b
In the example above, the DAG runs daily, and I want the tasks to behave as follows:
task_1st_a and task_1st_b run first. The target tables are:
project.dataset.table_1st_a with _PARTITIONTIME = execution date, and
project.dataset.table_1st_b with _PARTITIONTIME = execution date.
then task_2nd will run after task_1st_a and task_1st_b finish. The BigQueryOperator uses TriggerRule.ALL_SUCCESS. The target table is:
project.dataset.table_2nd with _PARTITIONTIME = execution date.
then task_3rd will run after task_2nd succeeds. The BigQueryOperator uses TriggerRule.ALL_SUCCESS. The target table is:
project.dataset.table_3rd with _PARTITIONTIME = D-2 from the execution date.
task_3rd_adhoc will not run in the daily job. I need it when I want to backfill the table project.dataset.table_3rd. Its target table is:
project.dataset.table_3rd with _PARTITIONTIME = execution_date
But I still can't find the correct TriggerRule for step #4 above. I tried TriggerRule.DUMMY because I thought it could be used to set no trigger, but task_3rd_adhoc also ran in the daily job when I created the DAG above.
(based on this doc: "dependencies are just for show, trigger at will")
First of all, you've misunderstood TriggerRule.DUMMY.
Usually, when you wire tasks together with task_a >> task_b, B runs only after A is complete (success / failed, depending on B's trigger_rule).
TriggerRule.DUMMY means that even after wiring tasks A & B together as before, B will run independently of A ("trigger at will"). It doesn't mean run at your will; rather, it runs at Airflow's will (it is triggered whenever the scheduler feels like it). So tasks with the dummy trigger rule will pretty much ALWAYS run, albeit at an unpredictable time.
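For reference, a trigger rule is a property of the downstream task, set when the operator is instantiated. A minimal sketch with DummyOperator (the DAG id, task names, and Airflow 2-style imports are assumptions):

import pendulum
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.utils.trigger_rule import TriggerRule

with DAG(
    dag_id="trigger_rule_demo",         # placeholder name
    schedule_interval="@daily",
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
) as dag:
    task_a = DummyOperator(task_id="task_a")
    # task_b runs only once task_a has succeeded; with trigger_rule=TriggerRule.DUMMY
    # it would run regardless of task_a's state, whenever the scheduler decides.
    task_b = DummyOperator(task_id="task_b", trigger_rule=TriggerRule.ALL_SUCCESS)
    task_a >> task_b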
What you need here (to keep a particular task in the DAG at all times but run it only when manually told to) is a combination of
AirflowSkipException
Variable
Here's roughly how you can do it (a minimal sketch follows the steps below):
A Variable should hold the switch for this task (whether or not it should run). You can, of course, edit this Variable at any time from the UI (thereby controlling whether or not that task runs in the next DagRun).
In the operator's code (the execute() method for a custom operator, or just the python_callable in the case of a PythonOperator), check the value of the Variable (whether or not the task is supposed to run).
Based on the Variable's value, if the task is NOT supposed to run, raise an AirflowSkipException so that the task is marked as skipped. Otherwise, it just runs as usual.
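A minimal sketch of that approach with a PythonOperator; the Variable name run_task_3rd_adhoc and the Airflow 2-style imports are assumptions for illustration:

from airflow.exceptions import AirflowSkipException
from airflow.models import Variable
from airflow.operators.python import PythonOperator

def maybe_run_adhoc(**context):
    # Hypothetical Variable: set it to "true" from the UI whenever you want the
    # next DagRun to actually execute this task.
    should_run = Variable.get("run_task_3rd_adhoc", default_var="false").lower() == "true"
    if not should_run:
        raise AirflowSkipException("run_task_3rd_adhoc is not set, skipping the ad-hoc backfill")
    # ... otherwise, run the actual backfill logic here

task_3rd_adhoc = PythonOperator(
    task_id="task_3rd_adhoc",
    python_callable=maybe_run_adhoc,
    dag=dag,                            # assumes the DAG object from the question
)

With this in place, task_3rd_adhoc stays wired into the DAG with a normal trigger rule, but in the daily run it is marked as skipped unless the Variable says otherwise.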
I have a sensor that waits for a file to appear in an external file system
The sensor uses mode="reschedule"
I would like to trigger a specific behavior after X failed attempts.
Is there any straightforward way to know how many times the sensor has already attempted to run the poke method?
My quick fix so far has been to push an XCom with the attempt number, and increase it every time the poke method returns False. Is there any built-in mechanism for this?
Thank you
I had a similar problem with sensor mode="reschedule": I was trying to poke a different file path depending on the current time, without directly referencing pendulum.now or datetime.now.
I used task_reschedules (as done in the base sensor operator to get the try number in reschedule mode: https://airflow.apache.org/docs/apache-airflow/stable/_modules/airflow/sensors/base.html#BaseSensorOperator.execute)
from airflow.models import TaskReschedule

def execute(self, context):
    # The reschedules recorded so far tell us how many pokes have already happened.
    task_reschedules = TaskReschedule.find_for_task_instance(context['ti'])
    self.poke_number = len(task_reschedules) + 1
    super().execute(context)
then self.poke_number can be used within poke(), and current time is approximately execution_date + (poke_number * poke_interval).
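Building on that, a minimal sketch of a sensor that changes behaviour after a given number of attempts; the FileSensor base class (Airflow 2-style import) and the threshold of 5 are assumptions for illustration:

from airflow.models import TaskReschedule
from airflow.sensors.filesystem import FileSensor

class CountingFileSensor(FileSensor):
    def execute(self, context):
        # Count the reschedules recorded so far for this task instance; the first
        # attempt sees an empty list, so poke_number starts at 1.
        task_reschedules = TaskReschedule.find_for_task_instance(context['ti'])
        self.poke_number = len(task_reschedules) + 1
        return super().execute(context)

    def poke(self, context):
        if self.poke_number > 5:
            # After 5 attempts, trigger whatever the special behaviour should be,
            # e.g. poke a fallback path or raise an exception.
            self.log.info("Poke attempt %s, switching behaviour", self.poke_number)
        return super().poke(context)

Instantiate it with mode="reschedule" as usual; on each reschedule, execute() runs again and recomputes poke_number from the stored reschedule records.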
Apparently, the XCom thing isn't working, because pushed XComs don't seem to be available between pokes; they always return undefined.
try_number on the task_instance doesn't help either, as pokes don't increment the try number.
I ended up computing the attempt number by hand:
import math
import pendulum

attempt_no = math.ceil((pendulum.now(tz='utc') - kwargs['ti'].start_date).seconds / kwargs['task'].poke_interval)
The code will work fine as long as individual executions of the poke method don't last longer than the poke interval (which they shouldn't)
Best
I have a DAG without a schedule (it is run manually, as needed). It has many tasks. Sometimes I want to 'skip' some initial tasks by manually changing their state to SUCCESS. Changing the task state of a manually executed DAG fails, seemingly because of a bug in parsing the execution_date.
Is there another way to individually set task states for a manually executed DAG?
Example run below. The execution date of the task is 01-13T17:27:13.130427, and I believe the fractional seconds are not being parsed correctly.
Traceback
Traceback (most recent call last):
  File "/opt/conda/envs/jumpman_prod/lib/python3.6/site-packages/airflow/www/views.py", line 2372, in set_task_instance_state
    execution_date = datetime.strptime(execution_date, '%Y-%m-%d %H:%M:%S')
  File "/opt/conda/envs/jumpman_prod/lib/python3.6/_strptime.py", line 565, in _strptime_datetime
    tt, fraction = _strptime(data_string, format)
  File "/opt/conda/envs/jumpman_prod/lib/python3.6/_strptime.py", line 365, in _strptime
    data_string[found.end():])
ValueError: unconverted data remains: .130427
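That reading of the traceback checks out: the format string stops at whole seconds, so the fractional part is left over. A quick illustration (the date below is made up to match the question's format):

from datetime import datetime

ts = "2018-01-13 17:27:13.130427"   # made-up timestamp with microseconds
try:
    datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
except ValueError as exc:
    print(exc)  # unconverted data remains: .130427
# Including %f for the fractional part parses cleanly:
print(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f"))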
It doesn't work from the Task Instances page, but you can do it from another page:
- open the DAG graph view
- select the needed Run and click Go
- select the needed task
- in the popup window click Mark Success
- then confirm.
PS: this relates to Airflow version 1.9.
What you may want to do to accomplish this is use branching, which, as the name suggests, allows you to follow different execution paths depending on some condition, just like an if in any programming language.
You can use the BranchPythonOperator (documented here) to attain this goal: the idea is that this operator is configured with a python_callable, a function that returns the task_id of the task to execute next (which should, of course, be a task directly downstream of the BranchPythonOperator itself).
Using branching will set the skipped tasks to the proper state automatically, as mentioned in the documentation:
All other “branches” or directly downstream tasks are marked with a state of skipped so that these paths can’t move forward. The skipped states are propagated downstream to allow for the DAG state to fill up and the DAG run’s state to be inferred.
The resulting DAG would look something like the branching example diagram in the Airflow documentation (source: apache.org).
Branching is documented here, in the official Apache Airflow documentation.
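As a rough sketch of how this could look for the "skip some initial tasks" case (all names, the dag_run.conf flag, and the Airflow 2-style imports are assumptions):

import pendulum
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import BranchPythonOperator

def choose_path(**context):
    # Hypothetical flag passed via "Trigger DAG w/ config": when set, the initial
    # tasks are skipped and execution jumps straight to the later work.
    conf = context["dag_run"].conf or {}
    return "later_work" if conf.get("skip_initial_tasks") else "initial_work"

with DAG(
    dag_id="manual_dag_with_branch",    # placeholder name
    schedule_interval=None,             # manually triggered, as in the question
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
) as dag:
    branch = BranchPythonOperator(task_id="branch", python_callable=choose_path)
    initial_work = DummyOperator(task_id="initial_work")
    # none_failed lets later_work run whether initial_work succeeded or was skipped.
    later_work = DummyOperator(task_id="later_work", trigger_rule="none_failed")

    branch >> [initial_work, later_work]
    initial_work >> later_work

When the flag is set, the branch goes straight to later_work and initial_work is marked as skipped automatically, which is what the documentation quote above describes.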
I have a Control-M file watcher job which waits for a specific file. If the file arrives within the specified time, the job ends OK, but when the file does not arrive within the specified time I want to set the job status to OK instead of it keeping waiting for the file. Is this possible? How can I implement it?
Thank you.
There are two ways of setting up a file-watcher.
File Watcher Job
The file watcher utility in Control-M (ctmfw)
There are two outcomes you might want from an FW job completing:
Passing its out-condition to the next job, so that the successor jobs start executing.
Simply completing the job, so that it gets cleared off in the New Day process.
Now, if you want the first outcome, here is one option:
Assume that your FW job [ABC] runs between 0600 and 1800, and the out-condition it passes to the successor job is ABC-OK. The successor job [DEF] runs on getting the condition ABC-OK. Keep a dummy job [ABC_DUMMY] that runs at 1805 and sets the same condition ABC-OK. So, once ABC_DUMMY completes, DEF will get the condition it is looking for and will execute.
If the file arrives early, the FW job ABC will run, set the condition ABC-OK, and DEF will start running.
In both cases, ensure that ABC-OK is negated once DEF has completed.
If you are looking for the second outcome, then I believe that as long as the job is not failing, FW jobs will stay in 'To Run' status, and they will get cleared off in the New Day process.
Happy to help further. Just post your doubts here.
JN
Edit your FileWatcher job
In the EXECUTION tab:
Submit between "Enter your beginning time" to "enter your ending time"
In the STEPS tab:
ON (Statement=* CODE=COMPSTAT=0)
DO OK
DO CONDITION, NAME=FILE-FOUND
ON (Statement=* CODE=COMPSTAT!0)
DO OK
DO CONDITION, NAME=FILE-NOT-FOUND
Use the 'wait until' parameter in the file watcher. For example, if you want the job to watch for the file until 06:00 AM, specify 06:00 in the 'wait until' parameter.
At exactly 06:00 AM the job will fail if it doesn't find the file. You can then use the STEPS tab to set the job to OK with either of the following options.
Option 1:
ON (Statement=* CODE=COMPSTAT!0)
DO OK
or
Option 2:
ON (Statement=* CODE=NOTOK)
DO OK
I am using the LSF bsub command to submit jobs in a Unix environment. However, the LSF job waits for its child jobs to finish.
Here is an example (details about sample scripts below):
Without LSF: If I submit parent.ksh in Unix without using LSF, i.e. at the command prompt I type ./parent.ksh, parent.ksh gets submitted and completes in a second without waiting for the child scripts script1.ksh and script2.ksh, since these jobs are submitted in background mode. This is typical behaviour in Unix.
With LSF: However, if I submit parent.ksh using LSF, i.e. bsub parent.ksh, parent.ksh waits for 180 seconds (that's the longest time taken by child number 2, i.e. script2.ksh) after submission. Please note I have excluded the time the job spends in pending status.
This is something I was not expecting; how can I ensure this does not happen?
I checked, and script1.ksh and script2.ksh were invoked in both cases.
parent.ksh:
#!/bin/ksh
/abc/def/script1.ksh &
/abc/def/script2.ksh &

script1.ksh:
#!/bin/ksh
sleep 80

script2.ksh:
#!/bin/ksh
sleep 180
I guess the reason is that LSF tracks the process tree of your job, so the LSF job does not complete until these two background processes exit. You can try creating a new process group for the background processes under a new session.
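A minimal sketch of what that could look like in parent.ksh, assuming the setsid utility is available on the execution host (the paths mirror the example above):

#!/bin/ksh
# Start each child in its own session (and hence its own process group) so the
# LSF job is no longer tied to their lifetime; redirecting the standard streams
# keeps the job's output files from being held open.
setsid /abc/def/script1.ksh </dev/null >/dev/null 2>&1 &
setsid /abc/def/script2.ksh </dev/null >/dev/null 2>&1 &
exit 0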