Scenario: Two jobs, JOB_1 and JOB_2, are in a box job (BOX_JOB). JOB_2 depends on JOB_1 succeeding and should fail whenever JOB_1 fails.
Observation: Once BOX_JOB starts, both JOB_1 and JOB_2 go to ACTIVATED, but JOB_2 stays ACTIVATED even after JOB_1 has completed with FAILURE.
JOB_2 JIL
job_type: cmd
....
condition: s(JOB_1)
box_name: BOX_JOB
Check out the box_terminator attribute. Set box_terminator: 1 (or y) in the JIL for JOB_1; if JOB_1 fails, it terminates BOX_JOB and JOB_2 goes to INACTIVE.
Check out page 204 (among others) of the Autosys Reference Guide (pdf) for more.
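For illustration, a minimal JIL sketch for JOB_1; the machine and command values are placeholders, and box_terminator is the only attribute that matters here:
/* Minimal JIL sketch for JOB_1 -- machine and command are placeholders */
insert_job: JOB_1
job_type: cmd
box_name: BOX_JOB
machine: prod_host
command: /path/to/job1.sh
/* If JOB_1 fails, terminate BOX_JOB; JOB_2 then goes to INACTIVE */
box_terminator: 1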
Related
I have a task in Airflow 2.1.2 that finishes with a success status, but after that the log shows a SIGTERM:
[2021-12-07 06:11:45,031] {python.py:151} INFO - Done. Returned value was: None
[2021-12-07 06:11:45,224] {taskinstance.py:1204} INFO - Marking task as SUCCESS. dag_id=DAG_ID, task_id=TASK_ID, execution_date=20211207T050000, start_date=20211207T061119, end_date=20211207T061145
[2021-12-07 06:11:45,308] {local_task_job.py:197} WARNING - State of this instance has been externally set to success. Terminating instance.
[2021-12-07 06:11:45,309] {taskinstance.py:1265} INFO - 0 downstream tasks scheduled from follow-on schedule check
[2021-12-07 06:11:45,310] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 6666
[2021-12-07 06:11:45,310] {taskinstance.py:1284} ERROR - Received SIGTERM. Terminating subprocesses.
[2021-12-07 06:11:45,362] {process_utils.py:66} INFO - Process psutil.Process(pid=6666, status='terminated', exitcode=1, started='06:11:19') (6666) terminated with exit code 1
As you can see, the first line shows Done, and the earlier lines of this log showed that the whole script ran fine and the data was inserted into the data warehouse.
The WARNING line shows that the task state was externally set to success, which triggers the SIGTERM, but I am sure that nobody used the API, the CLI, or the UI to mark it as success.
Any idea why this could be happening and how to avoid it?
I don't know whether increasing AIRFLOW__CORE__KILLED_TASK_CLEANUP_TIME might fix it, but I would like to understand what is going on.
Because I can't use the Airflow CLI, I'm parsing the scheduler logs with grep on Airflow 1 to retrieve information such as:
whether the DAG was triggered, whether it succeeded, and its start timestamp, using the pattern "INFO Marking run":
[2021-12-01 11:06:50,340] {logging_mixin.py:112} INFO - [2021-12-01 11:06:50,339] {dagrun.py:307} INFO - Marking run <DagRun prd_*** # 2021-12-01 10:02:00+00:00: scheduled__2021-12-01T10:02:00+00:00, externally triggered: False>successful
When the DAG is not triggered, I use the pattern 'INFO - Created' to retrieve the DAG's start timestamp:
[2021-12-01 11:04:49,213] {scheduler_job.py:1298} INFO - Created <DagRun prd_*** # 2021-12-01T10:02:00+00:00: scheduled__2021-12-01T10:02:00+00:00, externally triggered: False>
This works well on Airflow 1, but I can't find this information in the Airflow 2 scheduler logs after migration.
Does the configuration need to be changed?
Regards,
Troubadour
You should use the Airflow 2 REST API instead.
It was created precisely so that you do not have to parse logs: https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html
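For example, here is a minimal sketch of pulling the same information (run state, start timestamp, whether the run was externally triggered) from the stable REST API in Python. The URL, credentials, and DAG id are placeholders, and it assumes basic auth is enabled for the API; adapt to your deployment.
import requests

BASE_URL = "http://localhost:8080/api/v1"      # placeholder Airflow webserver URL
AUTH = ("api_user", "api_password")            # placeholder basic-auth credentials

def recent_dag_runs(dag_id, limit=5):
    """Return state, start timestamp, and trigger type for the latest runs of a DAG."""
    resp = requests.get(
        f"{BASE_URL}/dags/{dag_id}/dagRuns",
        params={"limit": limit},
        auth=AUTH,
    )
    resp.raise_for_status()
    return [
        {
            "run_id": run["dag_run_id"],
            "state": run["state"],                     # running / success / failed
            "start_date": run["start_date"],
            "externally_triggered": run["external_trigger"],
        }
        for run in resp.json()["dag_runs"]
    ]

if __name__ == "__main__":
    for run in recent_dag_runs("prd_example_dag"):   # placeholder DAG id
        print(run)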
I am using Airflow in a Docker container and run a DAG with multiple Jupyter notebooks. I get the following error every time after 60 minutes:
[2021-08-22 09:15:15,650] {local_task_job.py:198} WARNING - State of this instance has been externally set to skipped. Terminating instance.
[2021-08-22 09:15:15,654] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 277
[2021-08-22 09:15:15,655] {taskinstance.py:1284} ERROR - Received SIGTERM. Terminating subprocesses.
[2021-08-22 09:15:18,284] {taskinstance.py:1501} ERROR - Task failed with exception
I tried to tweak the config file but could not find the right option to remove the one-hour timeout.
Any help would be appreciated.
The default is no timeout. When your DAG defines dagrun_timeout=timedelta(minutes=60) and a run exceeds 60 minutes, its active tasks are stopped and the message "State of this instance has been externally set to skipped" is logged.
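For illustration, a minimal sketch of where that setting lives; the DAG and task names and the notebook command are placeholders. Dropping dagrun_timeout, or raising it above the notebooks' runtime, removes the one-hour cutoff.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="notebooks_dag",                    # placeholder
    start_date=datetime(2021, 8, 1),
    schedule_interval="@daily",
    catchup=False,
    # dagrun_timeout=timedelta(minutes=60),    # this is what triggers the cutoff after 1 hour
    dagrun_timeout=timedelta(hours=4),         # raise it, or omit the argument entirely
) as dag:
    run_notebooks = BashOperator(
        task_id="run_notebooks",
        bash_command="papermill input.ipynb output.ipynb",   # placeholder notebook command
    )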
I'm having a problem changing the status of a session from running to succeeded when a condition is met.
For example, I have a workflow as below:
start ---------> workA
  |                |---> workB
  |--------> timer_20mins
In the diagram above, the workA/workB process runs concurrently with the timer. If the sessions succeed before the 20 minutes configured in the timer elapse, the status of the timer should change from running to succeeded. I tried a post-session success command, but it is still not working. How should I fix this?
Have you reviewed my previous answer available here?
This problem is very similar; you just need the decision to fire once both of your sessions are done, so this time it should have one more Decision task with Treat the input links as set to AND.
Briefly, the flow should look like this:
Start--->s_sessionA---\
  \                    > Decision [AND]---\
   \--->s_sessionB----/                    \
    \                                       > Decision [OR] ---(False)---> Control Task [Fail parent]
     \-------------->timer-----------------/
We have jobs getting stuck on the AutoSys R11 screen because the application server is down.
So is there any way to monitor whether AutoSys itself is up and running?
Note: the jobs that got stuck show as completed in the database, but the dependent jobs cannot start because, from the front end, the stuck jobs still appear to be in RUNNING status.
Please help
The chk_auto_up command checks whether the application server, event server, scheduler, and agent are working fine.
The chase command checks whether the agent is running fine.
The autoping command checks whether the agent is able to communicate with the application server.
Check the log files of the components with the commands below:
autosyslog -e (scheduler)
autosyslog -s (server)
autosyslog -d j (job)
Check the status of each component manually with the commands below:
unisrvcntr status waae_server.$AUTOSERV
unisrvcntr status waae_agent-$AGENT_NAME
unisrvcntr status waae_webserver.$AUTOSERV
unisrvcntr status waae_sched.$AUTOSERV
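A minimal sketch that wraps these checks into a single cron-able monitoring script; the failure keywords, the mail address, and the exact flags (some releases require options for chk_auto_up and autoping) are assumptions to adjust for your environment:
#!/bin/sh
# Collect the health checks listed above into one log and alert if anything looks down.
LOG=/tmp/autosys_health.$(date +%Y%m%d%H%M).log

{
  echo "== chk_auto_up =="
  chk_auto_up                                  # app server, event server, scheduler, agent
  echo "== chase =="
  chase                                        # agent health
  echo "== autoping =="
  autoping -m "$(hostname)"                    # agent <-> application server communication
  echo "== component status =="
  unisrvcntr status waae_server.$AUTOSERV
  unisrvcntr status waae_sched.$AUTOSERV
  unisrvcntr status waae_webserver.$AUTOSERV
  unisrvcntr status waae_agent-$AGENT_NAME
} > "$LOG" 2>&1

# The failure keywords below are assumptions -- match them to what your release prints.
if grep -Eiq 'not running|stopped|failed|down' "$LOG"; then
  mailx -s "AutoSys health check alert on $(hostname)" oncall@example.com < "$LOG"
fi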