Can I recursively source a Tcl script indefinitely?

I have a Tcl script running inside a Tcl shell (Synopsys PrimeTime, if it makes any difference).
The script is initiated by source <script> from the shell.
After a specific time interval has passed, the script calls itself recursively by invoking source <script> at its end.
My question is a bit academic: could there be a stack-overflow issue if the script keeps calling itself in this manner?
To expand the question: what happens when a Tcl script sources another script? Does it fork a child process? If so, every call would fork another child, eventually stacking up into a pile of processes. But since the source command itself is not parallel, there is no fork (as I understand it).
Hope the question is clear.
Thanks.

Short answer: yes.
If you're using Tcl 8.5 or before, you'll run out of C stack. There's code to try to detect it and throw a soft (catchable) error if you do. There's also a (lower) limit on the number of recursions that can be done, controllable via interp recursionlimit. Note that this is counting recursive entries to the core Tcl script interpreter engine; it's not exactly recursion levels in your script, though it is very close.
# Set the recursion limit for the current interpreter to 2000
interp recursionlimit {} 2000
The default is 1000, which is enough for nearly any non-recursive algorithm.
In Tcl 8.6, a non-recursive execution engine is used for most commands (including source). This lets your code use much greater recursion depths, limited mainly by how much general memory you have. I've successfully run code with recursion depths of over a million on conventional hardware.
You'll still need to raise the interp recursionlimit though; the default 1000 limit remains because it catches more bugs (i.e., unintentional recursions) than not. It's just that you can meaningfully raise it much more.
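For illustration, a minimal sketch of pushing past the default limit (the depth here assumes Tcl 8.6's non-recursive engine and enough free memory; on 8.5 it would exhaust the C stack long before):
# Raise the limit well above the intended recursion depth
interp recursionlimit {} 1100000
proc countdown {n} {
    if {$n > 0} {
        countdown [expr {$n - 1}]
    }
}
countdown 1000000   ;# succeeds on 8.6; would blow the C stack on 8.5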

The source command doesn't fork a new process. It acts as if the lines of the sourced file were there in place of the invocation of source. They are interpreted by the current interpreter unless you specify otherwise.
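A quick way to convince yourself of this (helper.tcl is a hypothetical file written just for the demonstration): variables set in the sourced file are visible to the caller, and the process ID never changes.
# helper.tcl (hypothetical demonstration file)
set x 42
puts "in helper: pid = [pid]"

# in the main script / shell
puts "in caller: pid = [pid]"
source helper.tcl
puts "x = $x"   ;# prints 42: the sourced lines ran in the caller's interpreter
Both pid values are the same, confirming that no child process is forked.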

Related

nvprof R gputools code never ends

I am trying to run nvprof from the command line with R. Here is how I am doing it:
./nvprof --print-gpu-trace --devices 0 --analysis-metrics --export-profile /home/xxxxx/%p R
This gives me an R prompt where I write R code. I can do this with Rscript too.
The problem I see is that when I give the --analysis-metrics option, it prints lots of lines similar to
==44041== Replaying kernel "void ger_kernel(cublasGerParams)"
and the R process never ends. I am not sure what I am missing.
nvprof doesn't modify process exit behavior, so I think you're just suffering from slowness because your app invokes a lot of kernels. You have two options to speed this up.
1. Selectively profiling metrics
The --analysis-metrics option enables collection of a number of metrics, which requires kernels to be replayed, collecting a different set of metrics for each kernel run.
If your application has a lot of kernel invocations, this can take time. I'd suggest you query the available metrics with the nvprof --query-metrics command, and then manually choose the metrics you are interested in.
Once you know which metrics you want, you can query them using nvprof -m metric_1,metric_2,.... This way, the application will profile fewer metrics, hence requiring fewer replays and running faster.
2. Selectively profiling kernels
Alternatively, you can profile only a specific kernel using the --kernels <context id/name>:<stream id/name>:<kernel name>:<invocation> option.
For example, nvprof --kernels ::foo:2 --analysis-metrics ./your_cuda_app will profile all analysis metrics for the kernel whose name contains the string foo, and only on its second invocation. This option takes regular expressions, and is quite powerful.
You can mix and match the above two approaches to speed up profiling. You will be able to find more help about these and other nvprof options using the command nvprof --help.

Increasing stack size in browsers

Short question: I have a JavaScript program that recurses very deeply. How can I increase the stack size so that I can execute it (something like ulimit -s unlimited on Unix systems)?
Long story: I have to draw a graph, and I use Cytoscape JS (http://js.cytoscape.org/) coupled with the Dagre layout extension (https://github.com/cytoscape/cytoscape.js-dagre). The drawing algorithm recurses deeply, and I end up getting "Uncaught RangeError: Maximum call stack size exceeded" in Chrome and "too much recursion" in Firefox. How can I set the stack size to unlimited or very large (like ulimit -s unlimited on Unix systems) so that I can draw the graph?
Thank you!
Chrome has a flag for this:
chromium-browser --js-flags="--stack-size 2048"
You will also want to run ulimit -s unlimited before running the command above, though; otherwise, your deeply recursive JavaScript code will crash Chrome.
You cannot alter the stack size in browsers, but you can use a trick called trampolining.
You can find a working code solution here:
How to understand trampoline in JavaScript?
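To illustrate the idea, a minimal hedged sketch (names like sumBelowStep are illustrative, not from Cytoscape or Dagre): each step returns a thunk instead of recursing, and a driver loop keeps invoking thunks until a plain value comes back, so the call stack never grows.
function trampoline(fn) {
    return function (...args) {
        let result = fn(...args);
        // Keep calling as long as a thunk (function) comes back
        while (typeof result === "function") {
            result = result();
        }
        return result;
    };
}

// A deep "recursion" written in trampolined style
function sumBelowStep(n, acc = 0) {
    return n === 0 ? acc : () => sumBelowStep(n - 1, acc + n);
}

const sumBelow = trampoline(sumBelowStep);
console.log(sumBelow(1000000)); // 500000500000, with no stack overflow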
Try changing your algorithm to use less stack space on each invocation of the function. For instance:
Set local variables to null when they are no longer needed.
Use global variables for temporary calculations when possible. That way, the temporary variable won't be on the stack.
Use fewer variables in your recursive function. Reuse the same variables for different things in different parts of the function.
Break your recursive function into several functions. Some of those functions won't be recursive, so the local variables in those functions won't accumulate when the recursive function calls itself.
Create a global array of things to do and add items to this list instead of calling a function recursively, using the push and pop methods of the Array object (see the sketch after this list).
Have fewer parameters on your recursive function. Pass an object instead.
I hope these ideas help you.
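As a hedged sketch of the worklist idea above (a generic depth-first traversal; the node shape with a children array is assumed, not taken from Cytoscape):
// Depth-first traversal with an explicit stack instead of recursion
function traverse(root, visit) {
    const stack = [root];          // worklist of nodes still to process
    while (stack.length > 0) {
        const node = stack.pop();  // most recently added node first
        visit(node);
        // Push children onto the worklist rather than recursing into them,
        // so the JS call stack stays at constant depth however deep the tree is.
        for (const child of node.children || []) {
            stack.push(child);
        }
    }
}

// Usage with a tiny hypothetical tree:
const tree = { id: 1, children: [{ id: 2, children: [] }, { id: 3, children: [] }] };
traverse(tree, n => console.log(n.id));   // prints 1, 3, 2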

Is there a way to check how much memory an R statement is going to allocate?

I am tuning a data import script, and occasionally I find that an approach puts too much into memory in one call (usually because I am writing inefficient code). The "failed to allocate" message is only somewhat useful, in that it tells you how much memory was needed (without an informative traceback), and only if the allocation fails. Regular profiling requires that enough memory be available for allocation (and contiguously placed), which changes depending on the circumstances under which the code is run, and is very slow.
Is there a function that simulates a call to see how much memory will be used, or that otherwise efficiently profiles how much memory a line of R will need, whether it succeeds or fails? Something that could wrap an existing line of code in a script, like system.time, would be ideal.
Edit: lsos() does not work for this because it only describes what is stored after a command is run. (see: Reserved memory of R is twice the size of an allocated array)
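There is no built-in dry-run simulator, but one rough, hedged sketch is to reset R's peak-memory counters with gc(reset = TRUE) around the call. Here mem_profile is a made-up helper; it reports the garbage collector's "max used" cells, which approximates the peak during evaluation but is not exact, and it cannot report anything if the allocation itself fails. Rprofmem() is an alternative if your R was built with memory profiling.
# Hypothetical helper: approximate peak memory used while evaluating an expression
mem_profile <- function(expr) {
    gc(reset = TRUE)                          # reset the "max used" counters
    result <- eval.parent(substitute(expr))   # run the expression in the caller
    peak <- gc()                              # "max used" now reflects the peak
    print(peak[, c("used", "max used")])      # Ncells/Vcells rows, in cells
    invisible(result)
}

mem_profile(x <- rnorm(1e7))   # the Vcells row tracks the numeric allocation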

How can I label my sub-processes for logging when using multicore and doMC in R

I have started using the doMC package for R as the parallel backend for parallelised plyr routines.
The parallelisation itself seems to be working fine (though I have yet to properly benchmark the speedup); my problem is that the logging is now asynchronous and messages from different cores get mixed together. I could create different logfiles for each core, but I think a neater solution is to simply add a different label for each core. I am currently using the log4r package for my logging needs.
I remember that when using MPI, each processor got a rank, which was a way of distinguishing each process from the others, so is there a way to do this with doMC? I did have the idea of extracting the PID, but this seems messy and will change for every iteration.
I am open to ideas, so any suggestions are welcome.
EDIT (2011-04-08): Following the suggestion in one answer, I still have the issue of correctly identifying which subprocess I am currently inside: I would either need separate closures for each log() call so that each writes to the correct file, or a single log() function with logic inside it that determines which logfile to append to. In either case, I would still need some way of labelling the current subprocess, and I am not sure how to do this.
Is there an equivalent of the mpi_rank() function in the MPI library?
I think having multiple processes write to the same file is a recipe for disaster (it's just a log, though, so maybe "disaster" is a bit strong).
Oftentimes I parallelize work over chromosomes. Here is an example of what I'd do (I've mostly been using foreach/doMC):
foreach(chr=chromosomes, ...) %dopar% {
    cat("+++", chr, "+++\n")
    ## ... some undoubtedly amazing code would then follow ...
}
And it wouldn't be unusual to get output that tramples over each other ... something like (not exactly) this:
+++chr1+++
+++chr2+++
++++chr3++chr4+++
... you get the idea ...
If I were in your shoes, I think I'd split the logs for each process and set their respective filenames to be unique with respect to something happening in that process's loop (like chr in my case above). Collate them later if you must... i.e., map/reduce your log files :-)
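A minimal sketch of that suggestion (plain file connections instead of log4r for brevity; the filenames and core count are illustrative):
library(foreach)
library(doMC)
registerDoMC(cores = 4)

chromosomes <- paste0("chr", 1:8)

foreach(chr = chromosomes) %dopar% {
    # One logfile per task, named after the loop variable, so no two
    # workers ever write to the same file
    logfile <- file.path(tempdir(), paste0("worker-", chr, ".log"))
    con <- file(logfile, open = "a")
    cat(sprintf("[%s pid=%d] starting\n", chr, Sys.getpid()), file = con)
    ## ... real work here ...
    cat(sprintf("[%s pid=%d] done\n", chr, Sys.getpid()), file = con)
    close(con)
}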

What does the load-average used by parallel make represent?

Using GNU make on Windows, what exactly does the load-average value represent?
For example:
make -j --load-average=2.5
What does the 2.5 mean?
It means that make will not start any new job until the number of runnable processes, averaged over some period of time, is below 2.5.
Edit, following vines' remark
A runnable process, in Unix parlance, is a process that is either waiting for CPU time or actually running. Technically, it is a process in the TASK_RUNNING state.
However... this prompted me to re-read the original question, and note its "on Windows" part....
While my original answer is loosely correct for GNU Make on Unix-like hosts, it falls short of factual on Windows. The discrepancy in behavior is due to the fact that the two operating systems provide very different metrics to describe their "current" CPU load. Consequently, Make's logic has to interpret these CPU load readings differently to serve its --load-average feature.
The purpose of the --load-average parameter is to provide guidance to Make as to when it can start new jobs, causing Make to share CPU resources with other applications (and within itself) more gracefully.
On Linux, the semantics of this parameter are very close to its name: new Make jobs are allowed when the load average, as reported by the kernel (I'm assuming this is the one-minute load average, though it could be the five-minute one), is less than the parameter value.
On Windows, Make computes the load average from the weighted average of the CPU load (as reported by the GetSystemTimes function) and the memory load (e.g. from the GlobalMemoryStatusEx function).
On Windows: nothing, apparently. This is a UNIX term: http://en.wikipedia.org/wiki/Load_%28computing%29
My copy of Cygwin reports zero load averages when I run the uptime command. I don't think there is a quick way of calculating this on Windows; it was asked on the Cygwin mailing list in the past.
In other words: it's not implemented, so it's always zero.
Here's the implementation of getloadavg, directly from the GNU Make 3.81 sources:
# if !defined (LDAV_DONE) && (defined (__MSDOS__) || defined (WINDOWS32))
# define LDAV_DONE
  /* A faithful emulation is going to have to be saved for a rainy day. */
  for ( ; elem < nelem; elem++)
    {
      loadavg[elem] = 0.0;
    }
# endif /* __MSDOS__ || WINDOWS32 */
I haven't checked on newer versions of GNU make but I doubt it's changed.
