LSF parent job waiting for child - unix

I am using the LSF bsub command to submit jobs in a Unix environment. However, the LSF job waits for its child jobs to finish.
Here is an example (details about sample scripts below):
Without LSF: If I submit parent.ksh in Unix without using LSF, i.e. at the command prompt I type ./parent.ksh, parent.ksh gets submitted and completes in a second without waiting for the child scripts script1.ksh and script2.ksh, since those jobs are started in the background. This is typical Unix behaviour.
With LSF: However, if I submit parent.ksh using LSF, i.e. bsub parent.ksh, parent.ksh waits 180 seconds after submission (that's the longest time taken by child number 2, i.e. script2.ksh). Please note I have excluded the time the job spends in pending status.
This is something I was not expecting; how can I ensure this does not happen?
I checked that script1.ksh and script2.ksh were invoked in both cases.
parent.ksh:
#!/bin/ksh
/abc/def/script1.ksh &
/abc/def/script2.ksh &

script1.ksh:
#!/bin/ksh
sleep 80

script2.ksh:
#!/bin/ksh
sleep 180

I guess the reason is that LSF tracks the process tree of your job, so the LSF job only completes once these two background processes exit. You can try to create a new process group for the background processes under a new session.
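For example, a modified parent.ksh might start each child in its own session with setsid — a sketch only, assuming the setsid utility is available on the execution hosts:

#!/bin/ksh
# Sketch: start each child in its own session so it is detached from the
# job's process group; redirect stdio so nothing keeps the LSF job alive.
setsid /abc/def/script1.ksh < /dev/null > /dev/null 2>&1 &
setsid /abc/def/script2.ksh < /dev/null > /dev/null 2>&1 &
exit 0

Whether this is enough depends on how LSF's process tracking is configured; if it still follows the detached children, an alternative is to have parent.ksh submit script1.ksh and script2.ksh as separate bsub jobs.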

Related

Kernel cancelling an `input_request` at the end of the execution of a cell

I'm implementing a new Go kernel, using the ZMQ messages directly. But as an extra I want it to execute any bash command when a line is prefixed with !, similar to the usual IPython kernel.
One of the tricky parts seems to be bash scripts that take input -- there is no way (that I know of) to predict when I need to request input. So I took the following approach:
Whenever I execute a bash script, if it hasn't ended after 500ms (configurable), it issues an input_request.
If the kernel receives any input back (an input_reply message), it writes the contents to the bash program's piped stdin (concurrently, so as not to block), and immediately issues another input_request.
Now, at the end of the execution of the bash program, there is always one last input_request pending, with the corresponding widget expecting input from the user.
Jupyter doesn't drop the input_request after the execution of the cell has ended, and requires the user to press enter and send an input_reply before another cell can be executed. It complains with "Cell not executed due to pending input".
Is there a way to cancel the input_request (the pending input) if the execution of the last cell has already finished?
Maybe there is some undocumented message that can be sent once the bash program ends?
Any other suggested approach?
I know something similar works in colab.research.google.com, if I do:
!while read ii; do if [[ "${ii}" == "done" ]] ; then exit 0; fi ; echo "Input: $ii"; done
It correctly asks for inputs, and closes the last one.
But I'm not sure how that is achieved.
Jupyter's IPython notebook doesn't seem to have those smarts though; at least here the line above just locks up. I suppose it never sends an input_request message.
many thanks in advance!

Apache Airflow: rerun for tasks with date parameters

I have an hourly shell script job that takes a date and an hour as input params. The date and hour are used to construct the input path to fetch data for the logic contained in the job DAG. When a job fails and I need to rerun it (by clicking "Clear" on the failed task node to reset its status and trigger a new run), how can I make sure the date and hour used for the rerun are the same as for the failed run, given that the rerun could happen in a different hour than the original run?
You have three options to find the execution date and time of the failed run:
- Hover over the failed task you are about to clear; the tooltip shows a value with the key Run:, which is its execution date and time.
- Click on the failed task; the heading of the popup that contains the Clear option reads [taskname] on [execution date with time].
- Open the task log; the first line after the attempt count includes a string of the form Executing <Task([TaskName]): task_id> on [ExecutionDate withTime].
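Since a cleared task is re-run with its original execution_date, another option is to derive the date and hour from Airflow's templated execution_date rather than from the wall clock. A minimal sketch, assuming the script is launched from a BashOperator (the DAG id and script path below are hypothetical; {{ ds }} and execution_date are built-in Airflow template variables):

# Sketch (Airflow 1.x style): pass the schedule slot's date and hour to the
# script via templating, so a cleared task re-runs with the original values.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id="hourly_ingest",            # hypothetical DAG id
    start_date=datetime(2018, 1, 1),
    schedule_interval="@hourly",
)

ingest = BashOperator(
    task_id="ingest",
    # {{ ds }} is the execution date (YYYY-MM-DD); execution_date.strftime('%H')
    # is the hour of the schedule slot. Both stay fixed when the task is cleared.
    bash_command="/abc/def/hourly_job.sh {{ ds }} {{ execution_date.strftime('%H') }}",
    dag=dag,
)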

Manual DAG run set individual task state

I have a DAG without a schedule (it is run manually as needed). It has many tasks. Sometimes I want to 'skip' some initial tasks by changing the task state to SUCCESS manually. Changing the task state of a manually executed DAG fails, seemingly because of a bug in parsing the execution_date.
Is there another way to individually set task states for a manually executed DAG?
Example run below. The execution date of the task is 01-13T17:27:13.130427, and I believe the fractional seconds are not being parsed correctly.
Traceback
Traceback (most recent call last):
  File "/opt/conda/envs/jumpman_prod/lib/python3.6/site-packages/airflow/www/views.py", line 2372, in set_task_instance_state
    execution_date = datetime.strptime(execution_date, '%Y-%m-%d %H:%M:%S')
  File "/opt/conda/envs/jumpman_prod/lib/python3.6/_strptime.py", line 565, in _strptime_datetime
    tt, fraction = _strptime(data_string, format)
  File "/opt/conda/envs/jumpman_prod/lib/python3.6/_strptime.py", line 365, in _strptime
    data_string[found.end():])
ValueError: unconverted data remains: ..130427
It's not working from the Task Instances page, but you can do it from another page:
- open the DAG graph view
- select the needed Run (screen 1) and click Go
- select the needed task
- in the popup window click Mark Success (screen 2)
- then confirm.
PS: this relates to Airflow version 1.9.
Screen 1
Screen 2
What you may want to do to accomplish this is use branching, which, as the name suggests, allows you to follow different execution paths according to some condition, just like an if in any programming language.
You can use the BranchPythonOperator (documented here) to attain this goal: the idea is that this operator is configured by a python_callable, a function that outputs the task_id to execute next (which should, of course, be a task which is directly downstream from the BranchPythonOperator itself).
Using branching will set the skipped tasks to the proper state automatically, as mentioned in the documentation:
All other “branches” or directly downstream tasks are marked with a state of skipped so that these paths can’t move forward. The skipped states are propagated downstream to allow for the DAG state to fill up and the DAG run’s state to be inferred.
The resulting DAG would look something like the branching example diagram in the Airflow documentation (image source: apache.org).
Branching is documented here, on the official Apache Airflow documentation.
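As a concrete illustration, here is a minimal sketch of that layout (Airflow 1.x imports; the DAG id, task names, and the dag_run.conf flag used to decide the branch are hypothetical):

# Sketch of the branching pattern; everything downstream of the branch that is
# not selected is marked "skipped" automatically.
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import BranchPythonOperator

dag = DAG(dag_id="manual_dag", start_date=datetime(2018, 1, 1), schedule_interval=None)

def choose_path(**context):
    # Return the task_id of the branch to follow; the other direct
    # downstream task is skipped.
    conf = context["dag_run"].conf or {}
    return "skip_initial" if conf.get("skip_initial") else "initial_setup"

branch = BranchPythonOperator(
    task_id="branch", python_callable=choose_path, provide_context=True, dag=dag
)
initial_setup = DummyOperator(task_id="initial_setup", dag=dag)
skip_initial = DummyOperator(task_id="skip_initial", dag=dag)
# "one_success" lets main_work run whichever branch was taken.
main_work = DummyOperator(task_id="main_work", trigger_rule="one_success", dag=dag)

branch >> initial_setup >> main_work
branch >> skip_initial >> main_work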

Control-M : setting job status ok after specified time

I have a Control-M file watcher job which waits for a specific file. If the file arrives within the specified time, the job ends OK. However, I want to set the job status to OK when the file does not arrive within the specified time, instead of keeping it waiting for the file. Is this possible? How can I implement it?
Thank you.
There are two ways of setting up a file watcher:
- a File Watcher job
- the ctmfw file-watcher utility in Control-M
There are two possible consequences of an FW job completing:
- passing the out-condition to the next job, so that the successor jobs start executing;
- simply completing the job, so that it gets cleared off in the New Day process.
Now, if you want the first consequence, here is one option:
Assume that your FW job [ABC] runs between 06:00 and 18:00, and the out-condition it passes to the successor job is ABC-OK. The successor job [DEF] runs on receiving the condition ABC-OK. Keep a dummy job [ABC_DUMMY] which runs at 18:05 and sets the same condition ABC-OK. So, once ABC_DUMMY completes, DEF will get the condition it is looking for and will execute.
If the file arrives early, the FW job ABC will run and set the condition ABC-OK, and DEF will start running.
In both cases, ensure that ABC-OK is negated once DEF has completed.
If you are looking for the second consequence, then I believe that as long as the job is not failing, the FW job will stay in 'To Run' status and will get cleared off in the New Day process.
Happy to help further. Just post your doubts here.
JN
Edit your FileWatcher job
In the EXECUTION tab:
Submit between <your beginning time> and <your ending time>
In the STEPS tab:
ON (Statement=* CODE=COMPSTAT=0)
DO OK
DO CONDITION, NAME=FILE-FOUND
ON (Statement=* CODE=COMPSTAT!0)
DO OK
DO CONDITION, NAME=FILE-NOT-FOUND
Use the 'wait until' parameter in the file watcher. For example, if you want the job to watch for the file until 06:00 AM, set 06:00 in the 'wait until' parameter.
At exactly 06:00 AM the job will fail if it has not found the file; then you can use the STEPS tab to set the job to OK with either of the following options.
Option 1:
ON (Statement=* CODE=COMPSTAT!0)
DO OK
or
Option 2:
ON (Statement=* CODE=NOTOK)
DO OK

Return value of background process

I need to take some action based on the return value of a background process, i.e. whether it terminates in the first place.
Specifically: in ideal operation, the server which I run as the background process will just keep running forever. In that case keeping it in the background makes sense, since I want my shell script to do other things after spawning the server. But if the server terminates abnormally, I would preferably use the exit value from the server to decide whether to kill my main script. If that's not possible, I at least want to abort the main script rather than run it alongside a failed server.
I am looking for something in the nature of an asynchronous callback for shell scripts. One solution is to spawn a monitoring process that periodically checks whether the server has failed. Preferably I would want to do it within the main shell script itself, without such a monitor.
You can use a shell trap to invoke a function when a child exits by trapping SIGCHLD. If there is only one background process running, you can wait for it in the SIGCHLD handler and get the status there. If multiple background children are running it gets a little more complex; here is a code sample (only tested with bash):
#!/bin/bash
set -m    # enable job control so SIGCHLD is delivered for background jobs

# SIGCHLD handler: find the child that exited in the `jobs -l` listing,
# reap it with wait, and report its command line and exit status.
prtchld() {
    joblist=$(jobs -l | tr "\n" "^")
    while read -a jl -d "^"; do
        if [ "${jl[2]}" == "Exit" ]; then
            job=${jl[1]}        # PID of the exited child
            status=${jl[3]}     # its exit code
            task=${jl[*]:4}     # the command it was running
            break
        fi
    done <<< "$joblist"
    wait "$job"
    echo "job $task exited: $status"
}

trap prtchld SIGCHLD

(sleep 5; exit 5) &
(sleep 1; exit 7) &

echo stuff is running
wait
I like the first one better for my purpose. I presume that in the "do something here if process failed" branch I can kill the script that called this wrapper script for foo by using its name.
I think the first solution works well for multiple children. Anyway, I had to get this done quickly, so I used a hack which works for my application:
I start the process in the background as usual within the main script, then use $! to get its PID (since $! returns the last background PID), sleep for 2 seconds, and then do a ps -e | grep pid to check whether the process is still around, based on the return value of that pipeline. This works well for me because if my background process aborts, it does so immediately (because the address is already in use).
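That hack might look roughly like the following sketch (the server command and the two-second delay are illustrative; ps -p is used instead of grepping ps -e to avoid substring matches):

#!/bin/sh
# Start the server in the background, remember its PID, then check once
# after a short delay whether it died immediately.
/abc/def/server &
server_pid=$!

sleep 2
if ! ps -p "$server_pid" > /dev/null 2>&1
then
    echo "server (pid $server_pid) exited early; aborting" >&2
    exit 1
fi

# ... rest of the main script runs here while the server keeps running ...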
You could nest the background process inside a script. For example, if the process you wish to send to the background is called foo:
#!/bin/sh
foo
status=$?                 # capture foo's exit code
if [ $status -ne 0 ]      # non-zero means foo failed
then
    # do something here if process failed
    echo "foo exited with status $status" >&2
fi
Then just run the above script in the background instead of foo. You can kill it if you need to shut it down, but otherwise it will never terminate as long as foo continues to run, and if foo dies, you can do whatever you want based on its exit code.
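For the "kill the calling script by name" idea mentioned above, the failure branch might do something like this sketch (foo_wrapper.sh and main_script.sh are hypothetical names):

#!/bin/sh
# foo_wrapper.sh: run foo in the foreground; if it exits non-zero, stop the
# main script (assumed here to be named main_script.sh) by name.
foo
if [ $? -ne 0 ]
then
    echo "foo failed; stopping main_script.sh" >&2
    pkill -f main_script.sh
    exit 1
fi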
