Return value of background process - unix

I need to take some action based on the exit status of a background process, i.e., if it terminates at all.
Specifically: in ideal operation, the server that I run as the background process will just keep running forever. In that case keeping it in the background makes sense, since I want my shell script to do other things after spawning the server. But if the server terminates abnormally, I would preferably like to use the server's exit status to decide whether or not to kill my main script. If that's not possible, I at least want to abort the main script rather than let it run with a failed server.
I am looking for something like an asynchronous callback for shell scripts. One solution is to spawn a monitoring process that periodically checks whether the server has failed. Preferably, though, I would like to do this within the main shell script itself, without a separate monitor.

You can use shell traps to invoke a function when a child exits by trapping SIGCHLD. If there is only one background process running, you can wait for it in the SIGCHLD handler and get its status there. If there are multiple background children running, it gets a little more complex; here is a code sample (only tested with bash):
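set -m # enable job control so the SIGCHLD trap fires for background jobs
prtchld() {
    local joblist jl job="" status task
    joblist=$(jobs -l)
    # Each line looks like: [1]- 12345 Exit 5    ( sleep 5; exit 5 )
    # jl[0]=job spec, jl[1]=PID, jl[2]=state; "Exit" carries the code in jl[3]
    while read -r -a jl; do
        if [ "${jl[2]}" == "Exit" ]; then
            job=${jl[1]}
            status=${jl[3]}
            task=${jl[*]:4}
            break
        elif [ "${jl[2]}" == "Done" ]; then # "Done" means exit status 0
            job=${jl[1]}
            status=0
            task=${jl[*]:3}
            break
        fi
    done <<< "$joblist"
    [ -n "$job" ] || return # no finished job found
    wait "$job"
    echo "job $task exited: $status"
}
trap prtchld SIGCHLD
(sleep 5 ; exit 5) &
(sleep 1 ; exit 7) &
echo stuff is running
# wait may return early (status > 128) when the trap fires; repeat until done
while ! wait; do :; done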
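For the single-child case mentioned at the start of this answer, a minimal bash sketch might look like this. `server_pid`, `server_status` and `on_sigchld` are hypothetical names. In the test, `(sleep 1; exit 5) &` only stands in for the real server. Bash does not run a trap while a foreground command is still executing, so the handler fires between the main script's commands rather than in the middle of one.

```bash
#!/bin/bash
# Single-child variant (sketch): remember the server's PID and, in the
# SIGCHLD handler, collect its exit status once it has gone away.
set -m                 # job control, as in the multi-child example
server_pid=""          # hypothetical: PID of the background server
server_status=""       # set once the server's exit status is known

on_sigchld() {
    # Ignore SIGCHLD until the server is known, and after it was handled.
    [ -n "$server_pid" ] && [ -z "$server_status" ] || return 0
    # kill -0 succeeds only while the process still exists.
    kill -0 "$server_pid" 2>/dev/null && return 0
    wait "$server_pid"     # returns the server's exit status
    server_status=$?
    echo "server exited with status $server_status" >&2
}
trap on_sigchld CHLD
```

In the main script you would start the server with something like `my_server & server_pid=$!` (hypothetical command). To abort the main script on failure, end the handler with `exit "$server_status"` instead of only printing it.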

I like the first one better for my purpose. I presume that, in the "do something here if process failed" branch, I can kill the script that called this wrapper script for foo by using its name.
I think the first solution works well for multiple children. Anyway, I had to get this done quickly, so I used a hack that works for my application:
I start the process in the background as usual within the main script, then use $! to get its PID (since $! holds the PID of the most recent background job), sleep for 2 seconds, and check whether the process is still around based on the return value of ps -e | grep pid. This works well for me because if my background process aborts, it does so immediately (because the address is already in use).
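The start, sleep and check hack can be sketched in bash as follows. This version swaps `ps -e | grep pid` for `kill -0`, which only tests whether the PID still exists and sends no signal. That swap is a suggestion, not the method described above. `start_and_check` and `GRACE` are hypothetical names, and the 2-second default mirrors the delay used in the hack.

```bash
#!/bin/bash
# Start-sleep-check sketch: start_and_check CMD [ARGS...]
# Runs CMD in the background, waits $GRACE seconds, then
#   returns 0 if CMD is still running (its PID is left in server_pid), or
#   returns CMD's exit status (1 if it exited with 0) if it already died.
GRACE=${GRACE:-2}      # hypothetical knob; the hack sleeps 2 seconds

start_and_check() {
    "$@" &
    server_pid=$!
    sleep "$GRACE"
    if kill -0 "$server_pid" 2>/dev/null; then
        return 0       # still alive after the grace period
    fi
    wait "$server_pid" # already gone: collect its exit status
    local status=$?
    echo "server exited early with status $status" >&2
    # A server that exits at all has failed, even with status 0.
    [ "$status" -ne 0 ] && return "$status"
    return 1
}
```

For example, `start_and_check my_server --port 8080 || exit 1` aborts the main script if the server dies within the grace period. The server name and flag are hypothetical.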

You could nest the background process inside a script. For example, if the process you wish to send to the background is called foo:
#!/bin/sh
foo
status=$?
if [ "$status" -ne 0 ]
then
    # do something here if process failed; $status holds foo's exit code
    echo "foo exited with status $status" >&2
fi
Then just run the above script in the background instead of foo. You can kill it if you need to shut it down; otherwise it will never terminate as long as foo keeps running, and if foo dies, you can do whatever you want based on its exit code.
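This applies to the follow-up about killing the main script from the wrapper. Here is a bash sketch that runs the wrapper logic in a background subshell and, if the command fails, sends SIGTERM to the main script. It relies on `$$` still expanding to the main script's PID inside `( ... )`, so no script name is needed. `supervise` is a hypothetical helper name.

```bash
#!/bin/bash
# supervise CMD [ARGS...]: run CMD in a background subshell; if it exits
# with a non-zero status, report it and send SIGTERM to the main script.
supervise() {
    (
        "$@"
        status=$?
        if [ "$status" -ne 0 ]; then
            echo "$1 failed with status $status" >&2
            kill -TERM "$$"   # $$ is the main script's PID, even in ( )
        fi
    ) &
}
```

Typical use is `supervise my_server --port 8080` (hypothetical server) followed by the rest of the script. Install a `trap '...' TERM` first if the main script should clean up before it dies.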

Related

Kernel cancelling an `input_request` at the end of the execution of a cell

I'm implementing a new Go kernel that uses the ZMQ messages directly. As an extra, I want it to execute any line prefixed with ! as a bash command, similar to the usual IPython kernel.
One of the tricky parts seems to be bash scripts that take input: there is no way (that I know of) to predict when I need to request input. So I took the following approach:
Whenever I execute a bash script, if it hasn't ended after 500ms (configurable), the kernel issues an input_request.
If the kernel receives any input back (an input_reply message), it writes the contents to the bash program's piped stdin (concurrently, so as not to block), and immediately issues another input_request.
Now, at the end of the execution of the bash program, there is always a last input_request pending, with the corresponding widget still expecting input from the user.
Jupyter doesn't drop the input_request after the cell's execution has ended, and requires the user to press Enter and send an input_reply before another cell can be executed. It complains with "Cell not executed due to pending input".
Is there a way to cancel the input_request (the pending input) if the execution of the last cell has already finished?
Maybe there is some undocumented message that can be sent once the bash program ends?
Is there any other approach you would suggest?
I know something similar works in colab.research.google.com, if I do:
!while read ii; do if [[ "${ii}" == "done" ]] ; then exit 0; fi ; echo "Input: $ii"; done
It correctly asks for inputs, and closes the last one.
But I'm not sure how that is achieved.
Jupyter's IPython notebook doesn't seem to be that smart, though; at least here, the line above just hangs. I suppose it never sends an input_request message.
Many thanks in advance!

Why does Progress go back to the initial screen after a session crash?

Hello all and thanks for viewing this question,
I have a program that users access via a login screen. Once the user's credentials have been validated on the login screen, the main program is called (from the login screen) and the login screen disappears. All good. However, if the session crashes (or I press CTRL-PAUSE), the main program is terminated and I end up back at the initial login screen. I would have assumed that after a session crash, Progress (11.4) would take me back to the OS (Windows Server 2012), not to the initial screen. I have tried placing QUIT in different places in the program, but Progress still takes me back to the initial screen, while I need it to quit completely. Any thoughts would be greatly appreciated. Thanks!
It's the AVM's default behavior to rerun the startup procedure after a STOP condition has occurred that was not handled.
You can add an
ON STOP UNDO, RETURN "stopped" .
option to a DO, FOR or REPEAT block close to where your "crash" happens. The calling procedure can then check whether RETURN-VALUE is "stopped".
Assuming you are on a recent version (OpenEdge 12.x), you can also use CATCH blocks for Progress.Lang.Stop:
CATCH stopcon AS Progress.Lang.Stop:
QUIT.
END CATCH.
I think that your use of the word "crashed" is very, very confusing. If your session actually "crashes" in the usual sense, i.e. _progres (or prowin on Windows) terminates, then you would not have any locked records remaining. You would also have a protrace file that would help you identify where the issue occurs.
Incidentally, you could add error logging to the client startup to determine where the errors that QXtend cannot find are occurring:
_progres dbname -p startup.p -clientlog logname.log
You have not shared any code so I can only guess but, presumably, you are running your login program via the -p startup parameter.
Correct me if I am wrong but something along these lines:
_progres dbname -p startup.p
The startup program then runs whatever it runs to get you logged in and run the application. Maybe something like this:
/* startup.p
*/
message "(re)starting!".
pause.
run value( "login.p" ).
run value( "stuff.p" ).
message "all done".
pause.
quit.
And:
/* login.p
*/
message "hello, logging in!".
pause.
return.
Along with:
/* stuff.p
*/
message "hello, doing stuff!".
pause.
run value( "notthere.p" ).
message "hello, doing more stuff!".
pause.
return.
At some point an error occurs (you seem to want to call this a "crash"). I have arranged for a serious error to occur when stuff.p tries to "run notthere.p". So if you run my example, you will see the behavior that you have described: your session "crashes", the startup procedure re-runs, and you get to the login screen again.
To change that and trap the error, simply wrap a "DO ON STOP" block around the RUN statements, like this:
/* startup.p
*/
message "(re)starting!".
pause.
do on error undo, leave
on endkey undo, leave
on stop undo, leave
on quit undo, leave: /* "leave", exits this block when one of the named conditions arises */
run value( "login.p" ).
run value( "stuff.p" ).
/* we just leave because we finished normally */
end.
message "all done".
pause.
quit.
You mention QXtend so I am guessing that MFG/Pro is involved. If you cannot directly modify the MFG/Pro startup procedure (as I recall that would be "-p mfg.p") just adapt the code above to be a "shim" that runs mfg.p from within the "DO ON STOP..." block.
I believe I have found a way to quit the initial login screen when it appears as the result of a session crash, by using the ETIME function. Thanks again, Mike, for your response.

NetLogo: Can't "stop" forever button from another procedure?

I have simplified my problem below. I want to stop the execution of the forever button "go" when there are no robots, and I want to do this from another procedure ("test" in this case), like so:
to go
test
end
to test
if not any? robots [ stop ]
end
The reason for this is that I want to call stop where the robot dies, so that I can show an appropriate user message.
Sadly, you must reorganize your code so that you call if not any? robots [ stop ] directly in your go procedure. A stop inside test only exits test, and a forever button stops only when the procedure it calls directly stops. See the documentation:
A forever button can also be stopped from code. If the forever button
directly calls a procedure, then when that procedure stops, the button
stops. (In a turtle or patch forever button, the button won’t stop
until every turtle or patch stops – a single turtle or patch doesn’t
have the power to stop the whole button.)
Ref: http://ccl.northwestern.edu/netlogo/docs/programming.html#buttons
stop This agent exits immediately from the enclosing procedure, ask,
or ask-like construct (e.g. crt, hatch, sprout). Only the enclosing
procedure or construct stops, not all execution for the agent.
Ref: http://ccl.northwestern.edu/netlogo/docs/dict/stop.html
One alternative, hacky solution (which I was tempted not to post) is to raise an error in test and catch it in go, which then stops:
to go
carefully [ test ] [ user-message error-message stop ]
end
to test
if not any? robots [ error "no more robots!" ]
end

How to show in GNU Screen hardstatus tabs that have an activity?

Whenever I have more than 4 tabs, I'd really like to know which ones have activity.
Until now, I relied on rxvt's tabbing system. It displays a * next to tabs that are not shown but have activity. It's really useful when you're on an IRC channel, for example.
How can I do this with zsh/screen?
Here's my .zshrc:
function precmd {
echo -ne "\033]83;title zsh\007"
}
function preexec {
local foo="$2 "
local bar=${${=foo}[1]}
echo -ne "\033]83;title $bar\007"
}
and my .screenrc:
hardstatus off
hardstatus alwayslastline
hardstatus string '%{= kG}[ %{G}%H %{g}][%= %{= kw}%?%-Lw%?%{r}(%{W}%n*%f%t%?(%u)%?%{r})%{w}%?%+Lw%?%?%= %{g}][%{B} %m-%d %{W} %c %{g}]'
[...]
shell "/usr/bin/zsh"
aclchg :window: -rwx #?
aclchg :window: +x title
This is documented in the manual:
monitor [on|off]
Toggles activity monitoring of windows. When monitoring is
turned on and an affected window is switched into the background,
you will receive the activity notification message in the status
line at the first sign of output and the window will also be
marked with an `#' in the window-status display. Monitoring is
initially off for all windows.
You can manually toggle monitoring for the current window via C-a M, or, if you want monitoring on for all windows by default, add defmonitor on to your screenrc. Once it is on, any windows to the left or right of the current one in your hardstatus line (as expanded by %-Lw and %+Lw respectively in your hardstatus string line) will show a # symbol after the hyphen that follows the window number. You'll also get an alert message, which can be configured via the activity command.
On my system, the # doesn't appear until something else in the window changes. This can be fixed by removing hardstatus off from your config file.
Finally, I strongly recommend that you try tmux. Development on GNU screen has mostly stalled, and tmux is an actively maintained replacement whose functionality is essentially a superset of screen's.

LSF parent job waiting for child

I am using the LSF bsub command to submit jobs in a Unix environment. However, the LSF job waits for its child jobs to finish.
Here is an example (details about sample scripts below):
Without LSF: If I run parent.ksh in Unix without LSF, i.e. type ./parent.ksh at the command prompt, parent.ksh starts and completes within a second, without waiting for the child jobs script1.ksh and script2.ksh, since these were started in the background. This is typical behaviour in Unix.
With LSF: However, if I submit parent.ksh using LSF, i.e. bsub parent.ksh, parent.ksh waits for 180 seconds after submission (the longest time taken by a child, namely script2.ksh). Please note that I have excluded the time the job spent in pending status.
This is something I was not expecting, how can I ensure this does not happen?
I have checked that script1.ksh and script2.ksh were invoked in both cases.
parent.ksh:
#!/bin/ksh
/abc/def/script1.ksh &
/abc/def/script2.ksh &
script1.ksh:
#!/bin/ksh
sleep 80
script2.ksh:
#!/bin/ksh
sleep 180
I guess the reason is that LSF tracks the process tree of your job, so the LSF job only completes once these two background processes exit. You can try starting each background process in a new session, which also gives it a new process group.
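Here is a minimal sketch of that suggestion for parent.ksh. It assumes a Linux host with the util-linux `setsid` command. Each child starts in its own session, and its stdin, stdout and stderr are redirected, so the child also does not hold the job's output streams open. `launch_detached` is a hypothetical helper. Whether LSF then stops waiting depends on how your cluster tracks job processes, so treat this as something to try rather than a guaranteed fix.

```sh
#!/bin/ksh
# parent.ksh variant (sketch): start each child with setsid (util-linux),
# so it runs in its own session and process group, and detach its
# stdin/stdout/stderr from the LSF job.
launch_detached() {
    setsid "$@" </dev/null >/dev/null 2>&1 &
}

launch_detached /abc/def/script1.ksh
launch_detached /abc/def/script2.ksh
```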
