I made a zombie process with this code:
#include <iostream>
#include <unistd.h>
#include <cstdlib>
using namespace std;

int main() {
    pid_t child;
    cout << getpid() << endl;
    child = fork();
    if (child > 0)
        sleep(60);  // parent sleeps without calling wait(), so the child is not reaped
    else
        exit(0);    // child exits immediately and becomes a zombie
}
and I'm checking it with this command:
ps -e -o pid,ppid,stat,command
It works, but I expected to see Z in the STAT column for my process, and instead it shows Z+. What does that mean?
From the man page for ps, more specifically the process state codes:
Z defunct ("zombie") process, terminated but not reaped by its parent.
+ is in the foreground process group.
When the shell executes your program, it places it in its own foreground process group. Every child of your program belongs to that same process group, so even after the child exits and becomes a zombie, it is still a member of the foreground process group, which is why you see the +.
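To see the distinction yourself, here is a minimal sketch of the same fork-and-exit pattern, written in Python for brevity (the file name zombie.py is just an assumption for this example):

import os, time

pid = os.fork()
if pid > 0:
    time.sleep(60)   # parent never calls os.wait(), so the child is not reaped
else:
    os._exit(0)      # child exits immediately and becomes a zombie

Run python3 zombie.py in one terminal and ps -e -o pid,ppid,stat,command in another: the child shows Z+. Start it as python3 zombie.py & instead and the child shows plain Z, because a background job is not in the foreground process group.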
I want to execute a batch file using PeopleCode in an Application Engine program, but Exec returns a non-zero exit code (value: 1).
Below is the PeopleCode snippet.
Global File &FileLog;
Global string &LogFileName, &Servername, &commandline;
Local string &Footer;
If &Servername = "PSNT" Then
   &ScriptName = "D: && D:\psoft\PT854\appserv\prcs\RNBatchFile.bat";
End-If;
&commandline = &ScriptName;
/* Need to commit work or Exec will fail */
CommitWork();
&ExitCode = Exec("cmd.exe /c " | &commandline, %Exec_Synchronous + %FilePath_Absolute);
If &ExitCode <> 0 Then
MessageBox(0, "", 0, 0, ("Batch File Call Failed! Exit code returned by script was " | &ExitCode));
End-If;
Any help resolving this issue would be appreciated.
Your best bet is to trace the execution.
Thoughts:
Can you log on to the process scheduler server you are running this on and execute the script successfully by hand?
Is the AE being scheduled or called at run-time?
You should not need to change directory, since you are using a fully qualified path to the script.
You should not need to call "cmd /c", as this creates an additional shell for your application to run within, making debugging harder.
Run a trace and drop us the output. :) HTH
What about changing the working directory to D: inside the script instead? You are invoking two commands, and I'm wondering what the shell is returning to Exec. I'm assuming you wrote your script to give the appropriate return code and that isn't the problem.
I couldn't tell from the question text, but are you looking for a negative result, such as -1? I think return codes are usually positive: 0 for success, some other positive number for failure. Negative numbers may be acceptable, but I am wondering whether Exec dislikes them.
Perhaps the PeopleCode ChDir function still works as an alternative to two commands in one line? I haven't tried it for a LONG time.
Another alternative that gives you significant control over the process is to use java.lang.Runtime.exec from PeopleCode: http://jjmpsj.blogspot.com/2010/02/exec-processes-while-controlling-stdin.html.
When an out-of-memory error is raised in a parfor, is there any way to kill only one MATLAB slave to free some memory instead of having the entire script terminate?
By default, when an out-of-memory error occurs in a parfor, the entire script terminates. (A screenshot of the resulting error was originally attached here.)
I wish there were a way to just kill one slave (i.e., remove a worker from the parpool), or to stop using it, to release as much memory as possible from it.
If you get an out-of-memory error in the master process, there is no way to recover. For an out-of-memory error on a slave, this should do it:
The idea of the code: restart the parfor again and again with the missing data until you get all the results. If one iteration fails, a flag (a file) is written which makes all other iterations throw an error as soon as the first error has occurred. This way we get out of the loop without wasting time producing further out-of-memory errors.
%Your intended iterator
iterator = 1:10;
%flags which indicate which iterations succeeded
succeeded = false(size(iterator));
%result array
result = nan(size(iterator));
FLAG = 'ANY_WORKER_CRASHED';
while ~all(succeeded)
    fprintf('Another try\n')
    %determine which iterations still have to be done
    todo = iterator(~succeeded);
    %initialize array for the remaining results
    partresult = nan(size(todo));
    %initialize flags which indicate which iterations succeeded (we cannot
    %simply rethrow the errors; that would throw away the partial results)
    partsucceeded = false(size(todo));
    %flag file which indicates that some worker crashed. Have to use a
    %file-based solution; I don't know a better one.
    if exist(FLAG, 'file')
        delete(FLAG);
    end
    try
        parfor falseindex = 1:sum(~succeeded)
            realindex = todo(falseindex);
            try
                % The flag is used to let all other workers jump out of the
                % loop as soon as one calculation has crashed.
                if exist(FLAG, 'file')
                    error('some other worker crashed');
                end
                % insert your code here
                %dummy code which randomly throws an exception
                if rand < .5
                    error('hit out of memory')
                end
                partresult(falseindex) = realindex * 2;
                % End of user code
                partsucceeded(falseindex) = true;
                fprintf('trying to run %d and succeeded\n', realindex)
            catch ME
                % catch errors within workers to preserve the work done so far
                partresult(falseindex) = nan;
                partsucceeded(falseindex) = false;
                fprintf('trying to run %d but it failed\n', realindex)
                fclose(fopen(FLAG, 'w'));
            end
        end
    catch
        %reduce pool size by 1
        newsize = matlabpool('size') - 1;
        matlabpool close
        matlabpool(newsize)
    end
    %put the results of the current iteration into the full result
    result(~succeeded) = partresult;
    succeeded(~succeeded) = partsucceeded;
end
After quite a bit of research, and a lot of trial and error, I think I may have a decent, compact answer. What you're going to do is:
Declare some max memory value. You can set it dynamically using the MATLAB function memory, but I like to set it directly.
Call memory inside your parfor loop, which returns the memory information for that particular worker.
If the memory used by the worker exceeds the threshold, cancel the task that worker was working on. Now, here it gets a bit tricky. Depending on the way you're using parfor, you'll either need to delete or cancel either the task or the worker. I've verified that it works with the code below when there is one task per worker, on a remote cluster.
Insert the following code at the beginning of your parfor contents. Tweak as necessary.
memLimit = 280000000; % This doesn't have to be inside the parfor. Everything else does.
memData = memory;     % note: the memory function is only available on Windows
if memData.MemUsedMATLAB > memLimit
    task = getCurrentTask();
    cancel(task);
end
Enjoy! (Fun question, by the way.)
One other option to consider: since R2013b, you can open a parallel pool with 'SpmdEnabled' set to false (e.g. parpool('SpmdEnabled', false)), which allows MATLAB worker processes to die without the whole pool being shut down; see the doc at http://www.mathworks.co.uk/help/distcomp/parpool.html. Of course, you still need to arrange somehow to shut down the workers.
I have a script which logs certain data when a benchmark (some C code, like a matrix multiplication) runs.
I want to start the log script when the benchmark starts; this is easy, since I can just start the binary from the log script and then proceed to log the info.
But the real question is: when do I stop it? The benchmark can stop at any time (the log script shouldn't stop the benchmark). How do I get the information the log script can use to stop itself when the benchmark program stops?
I was thinking I could use the PID of the benchmark, but there should be a better solution than searching for and using the PID.
Thanks!
Perhaps you could try something like this:
#!/bin/bash
#
# Your main script
#
# Run your log program in the background
your_log_program &
# The PID of the last background program
LOGPROGRAMPID=$!
# Install an EXIT trap (EXIT is one of bash's pseudo-signals)
trap 'kill -15 $LOGPROGRAMPID; exit 0' EXIT
# Launch your benchmark program in the foreground
run_your_benchmark_program
# When the benchmark program ends, the EXIT trap fires and kills your log
# program.
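If you would rather drive this from Python than bash, a sketch of the same pattern (logger in the background, benchmark in the foreground, logger killed when the benchmark returns) might look like this; your_log_program and run_your_benchmark_program are the same placeholder names as in the script above:

import subprocess

# Start the log program in the background
logger = subprocess.Popen(["your_log_program"])
try:
    # Run the benchmark in the foreground; this blocks until it exits
    subprocess.run(["run_your_benchmark_program"])
finally:
    # The benchmark has ended, so stop the log program (SIGTERM) and reap it
    logger.terminate()
    logger.wait()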
I am trying to create a scripting widget for R using the tcltk package, but I don't know how to create a STOP button to interrupt a script coming from the widget. Basically, I would like to have a button, a menu option, and/or a key binding that will interrupt the current script execution, but I can't figure out how to make it work.
One (non-ideal) strategy is to just use the RGui STOP button (or <ESC> or <Ctrl-c> on the console), but this seems to cause the tk widget to hang permanently.
Here's a minimal example of the widget based on the tcl/tk examples (http://bioinf.wehi.edu.au/~wettenhall/RTclTkExamples/evalRcode.html):
require(tcltk)
tkscript <- function() {
    tt <- tktoplevel()
    txt <- tktext(tt, height=10)
    tkpack(txt)
    run <- function() {
        code <- tclvalue(tkget(txt, "0.0", "end"))
        e <- try(parse(text=code))
        if (inherits(e, "try-error")) {
            tkmessageBox(message="Syntax error", icon="error")
            return()
        }
        print(eval(e))
    }
    tkbind(txt, "<Control-r>", run)
}
tkscript()
In the scripting widget, if you try executing Sys.sleep(20) and then interrupt from the console, the widget hangs. The same thing happens if you run, for example, an infinite loop like while(TRUE) 2+2.
I think what I'm experiencing may be similar to the bug reported here: https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14730
Also, I should mention that I'm running this on R 3.0.0 on Windows (x64), so maybe the problem is platform-specific.
Any thoughts on how to interrupt the running script without causing the widget to hang?
It depends on what the script is doing; a script that is sitting waiting for the user to do something is easy to interrupt (since you can make it listen for your interruption message) but a script that is doing an intensive loop is rather more tricky. The possible solutions depend on the version of Tcl inside.
Interpreter Cancellation — Requires 8.6
If you are using Tcl 8.6, you can use interpreter cancellation to stop the script. All you have to do is arrange for:
interp cancel -unwind
to be run, and the script will return control back to you. A reasonable way of doing this would be to use the extra Tcl package TclX (or Expect) to install a signal handler that will run the command when a signal is received:
package require Tcl 8.6
package require TclX
# Our signal handler
proc doInterrupt {} {
    # Print a message so you can see what's happening
    puts "It goes boom!"
    # Unwind the stack back to the R code
    interp cancel -unwind
}
# Install it...
signal trap sigint doInterrupt
# Now evaluate the code which might try to run forever
Adding signal handling in earlier versions is possible, but not quite as easy, since you can't guarantee that things will return control to you so cleanly; the stack unwinding isn't there.
Execution Time Limiting — Requires 8.5 (or 8.6)
The other thing you could try is setting an execution time limit on a slave interpreter and running the user script in that slave. The time limit machinery will then guarantee a trap back to you every so often, giving you a chance to check for interruption and a way to do the stack unwinding. This is a considerably more complex method.
proc nextSecond {} {
    clock add [clock seconds] 1 second
}
interp create child
proc checkInterrupt {} {
    if {["decide if the R code wanted an interrupt"]} {
        # Do nothing; by not resetting the limit we let the slave be stopped
        return
    }
    # Reset the time limit to another second ahead
    interp limit child time -seconds [nextSecond]
}
interp limit child time -seconds [nextSecond] -command checkInterrupt
interp eval child "the user script"
Think of this mechanism as being a lot like how an operating system works; and yes, it can stop a tight loop.
Use a Subprocess — Any version of Tcl
The most portable mechanism is to run the script in a subprocess (with the tclsh program; the exact name varies by version, platform and distribution, but it's all variations on that) and just kill off that subprocess with pskill when it is no longer wanted. The downside of this is that you cannot (easily) carry any state over from one execution to another; subprocesses are pretty isolated from each other. The other methods described above can leave state accessible from another run: they do a real interrupt, whereas this destroys everything.
Also, I don't know exactly how to start the subprocess in such a way that you can communicate with it from R while it is still running; system and system2 don't seem to quite give enough control, and hacking something with forking is non-portable. Needs an R expert here. Alternatively, use a Tcl script (running inside the R process) to do it with:
set executable "tclsh"; # Adjust this line
set scriptfile "file/where/you/put/the_user/script.tcl"
# Open a bi-directional pipe to talk to the subprocess
set pipeline [open |[list $executable $scriptfile] "r+"]
# Get the subprocess's PID
set thePID [pid $pipeline]
That is actually reasonably portable to Windows (if not perfectly so) but intermediate states with forking are not.
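For comparison only, here is a sketch of the same spawn-then-destroy pattern in Python (not R; tclsh and the script path are carried over from the Tcl snippet above as assumptions):

import subprocess

# Launch the user script in a separate tclsh process with pipes attached
proc = subprocess.Popen(["tclsh", "file/where/you/put/the_user/script.tcl"],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
print("subprocess PID:", proc.pid)
# ...talk to the script over proc.stdin / proc.stdout while it runs...
# When the STOP button fires, destroy the subprocess outright and reap it
proc.kill()
proc.wait()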
I seem to have found a solution to prevent the tk widget from hanging by embedding the eval in a tryCatch that handles interrupt conditions. Unfortunately, it requires interruption from the console rather than the widget, but it does work. tryCatch is pretty poorly documented, so I'm putting this out here in case anyone else has similar needs in the future.
require(tcltk)
tkscript <- function() {
    tt <- tktoplevel()
    txt <- tktext(tt, height=10)
    tkpack(txt)
    run <- function() {
        code <- tclvalue(tkget(txt, "0.0", "end"))
        e <- try(parse(text=code))
        if (inherits(e, "try-error")) {
            tkmessageBox(message="Parse error", icon="error")
            tkfocus(txt)
            return()
        }
        # evaluate once, trapping errors and interrupts so the widget survives
        res <- tryCatch(eval(e),
            error = function(errmsg)
                tkmessageBox(message=as.character(errmsg), icon="error"),
            interrupt = function(errmsg)
                tkmessageBox(message=as.character(errmsg), icon="error")
        )
        print(res)
    }
    tkbind(txt, "<Control-r>", run)
}
tkscript()
Another strategy I stumbled across is using tools::pskill (in the form pskill(Sys.getpid(), SIGINT) as a tk menu option) to interrupt the process, but, at least on Windows, this terminates the entire R process (including the tk widget). So that's not a great solution, but it at least exits everything as an absolute fallback.
On OS X 10.4/5/6:
I have a parent process which spawns a child. I want to kill the parent without killing the child. Is it possible? I can modify source on either app.
As NSD asked, it really depends on how the child is spawned. If you are using a shell script, for example, you can use the nohup command to run the child.
If you are using fork/exec, then it is a little more complicated, but not too much so.
From http://code.activestate.com/recipes/66012/
import sys, os

def main():
    """ A demo daemon main routine, write a datestamp to
        /tmp/daemon-log every 10 seconds.
    """
    import time
    f = open("/tmp/daemon-log", "w")
    while 1:
        f.write('%s\n' % time.ctime(time.time()))
        f.flush()
        time.sleep(10)

if __name__ == "__main__":
    # do the UNIX double-fork magic, see Stevens' "Advanced
    # Programming in the UNIX Environment" for details (ISBN 0201563177)
    try:
        pid = os.fork()
        if pid > 0:
            # exit first parent
            sys.exit(0)
    except OSError, e:
        print >>sys.stderr, "fork #1 failed: %d (%s)" % (e.errno, e.strerror)
        sys.exit(1)

    # decouple from parent environment
    os.chdir("/")
    os.setsid()
    os.umask(0)

    # do second fork
    try:
        pid = os.fork()
        if pid > 0:
            # exit from second parent, print eventual PID before exiting
            print "Daemon PID %d" % pid
            sys.exit(0)
    except OSError, e:
        print >>sys.stderr, "fork #2 failed: %d (%s)" % (e.errno, e.strerror)
        sys.exit(1)

    # start the daemon main loop
    main()
This is one of the best books ever written. It covers these topics in great detail.
Advanced Programming in the UNIX Environment, Second Edition (Addison-Wesley Professional Computing Series) (Paperback)
ISBN-10: 0321525949
ISBN-13: 978-0321525949
Five-star Amazon reviews (I'd give it six).
If the parent is a shell, and you want to launch a long-running process and then log out, consider nohup(1) or disown.
If you control the coding of the child, you can trap SIGHUP and handle it in some non-default way (like ignoring it outright); read the signal(3) and sigaction(2) man pages for help with this. Either way, there are several existing questions on Stack Overflow with good help.
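If you go the trapping route, here is a minimal sketch of a child that ignores SIGHUP, written in Python for brevity (a C child would install SIG_IGN via signal or sigaction instead):

import signal
import time

# Ignore SIGHUP so the child survives the hangup sent when its parent's
# session or terminal goes away
signal.signal(signal.SIGHUP, signal.SIG_IGN)

while True:
    time.sleep(10)  # stand-in for the child's real work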