I am trying to control the CPU affinity of Julia using taskset:
taskset -c 1,2 julia Foo.jl
However, internally Julia spawns many child processes to which this affinity mask does not seem to apply.
So my question is: can I force a process and all its child processes to be limited to CPUs 1 and 2?
I can see the use case in an HPC environment, so this seems worth a feature request on GitHub.
That said, taskset takes a PID argument, so here is a potential interim solution:
# Collect the OS process IDs of the master and all worker processes
pids = map(x -> fetch(@spawnat x getpid()), procs())
# Pin each of them to CPUs 1 and 2
map(x -> run(`taskset -c 1,2 -p $x`), pids)
(untested though, currently on a mac)
I want to have a short script that opens a Julia REPL in a specific mode, for instance, the shell> mode or the C++ > (from Cxx.jl) mode. How can this be achieved?
Update:
After getting an answer I created a script to start Julia REPL in Cxx.jl C++ mode (and pre-run some C++ code). See it here: https://github.com/cdsousa/cxxrepl.jl.
Whatever this may be good for...
The easiest way (without having dug into the innards of Base.REPL) is to write the appropriate character to STDIN, e.g.
write(STDIN.buffer, '?');   # queue '?' so the REPL switches to help mode
If you want to start the REPL and drop to shell mode immediately, call julia as
julia -i -e "write(STDIN.buffer, ';')"
I am slowly getting closer to being able to read from and write to named pipes of a background process through SBCL. What I do is kick off the program I am trying to read from and write to:
todd#ubuntu:~/CoreNLP$ cat ./spin | /usr/bin/java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref -outputFormat text > ./spout &
[1] 24616
That all works out fine, so I kick off SBCL and do this:
(defparameter from-corenlp (open "./spout"))
Which also works out fine, but opening the stream causes SBCL to spill the stream's contents onto the screen (which is all the startup information from the background process). It does not wait until I read from the stream. Is that how things are supposed to work?
The solution, as I posted it to the Stanford parser mailing list (Stack Overflow reformatted a lot of it into something weird, but you get the idea):
It took quite a while, but I finally figured out embedding (for the most part) the CoreNLP program (while in interactive mode) in SBCL Lisp.
First of all, forget using (sb-ext:run-program ...). This combination of spawning Java with a quoted argument (like the asterisk), no matter how well escaped, simply makes the spawned program crash.
Inferior shell seems to kick off the parser but it is only good for a one-off parse, even in the interactive mode. Perhaps I could have done better, but inferior shell needs to be installed and it is poorly documented.
The initial attempted solution of using Unix named pipes ends up being the final one, but it took a bit of work, first with buffering, then with the order of operations, and finally understanding some nuances about the parser program.
First, turning off buffering completely when running the program is important, so running it looks like this:
stdbuf --i=0 --o=0 --e=0 cat ./spin | /usr/bin/java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref -outputFormat text > ./spout &
That is supposed to be running the parser in the background accepting input from spin and sending its output to spout. But if you look at the process table in Linux, you will not see it running. It is still waiting for something to pull from the output pipe before it can even run.
So, we run SBCL and start a stream pulling from the parser's pipe:
(defparameter *from-corenlp* (open "./spout"))
NOW the parser starts running. Here, oddly, it also starts dumping output to the screen, not to the pipe! That is because all of this banner stuff when the parser starts and stops (and apparently even the NLP> prompt) is sent to stderr, not stdout. This is actually a good thing.
So then we declare the stream from Lisp to the parser:
(defparameter *to-corenlp* (open "./spin" :direction :output :if-exists :append))
Then we send some text for the parser to parse:
(write-line "This is the first test." *to-corenlp*)
I ran into a problem here a few times, even. Remember that Lisp has its own buffering, so you have to flush the stream every time:
(finish-output *to-corenlp*)
You can then run the line below a whole bunch of times to verify that you obtain exactly the same behavior you would have gotten from an interactive session of the parser:
(format t "~a~%" (read-line *from-corenlp*))
Which, if you are a good boy scout, should not only be true, but you can carry on with your interactive slave parser session for as long as you like:
(write-line "This is the second test." *to-corenlp*)
(finish-output *to-corenlp*)
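For convenience, the write/flush/read steps can be wrapped in a couple of tiny helpers (a minimal sketch, assuming the *to-corenlp* and *from-corenlp* streams opened above; the function names are mine and error handling is omitted):
(defun corenlp-send (sentence)
  ;; Send one sentence to the parser and flush Lisp's own output buffer.
  (write-line sentence *to-corenlp*)
  (finish-output *to-corenlp*))

(defun corenlp-read-available ()
  ;; LISTEN is true while at least one character can be read without blocking;
  ;; the parser may still be working, so call this again until output stops arriving.
  (loop while (listen *from-corenlp*)
        collect (read-line *from-corenlp*)))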
Isn't that great? And notice I pulled all of that off being terrible at Unix, terrible at Lisp and being a terrible boy scout!
Now so can you!
Is there a way to avoid Google Performance Tools listing files as "??:?", that is, failing to locate which file contains the function it is reporting on? How can I work out which library contains the function being called?
$ env LD_PRELOAD="/usr/lib/libprofiler.so.0" \
CPUPROFILE=output.prof python script.py
$ google-pprof --text --files /usr/bin/python output.prof
Using local file /usr/bin/python.
Using local file output.prof.
Removing _L_unlock_13 from all stack traces.
Total: 433 samples
362 83.6% 83.6% 362 83.6% dtrsm_ ??:?
58 13.4% 97.0% 58 13.4% dgemm_ ??:?
1 0.2% 97.2% 1 0.2% PyDict_GetItem /.../Objects/dictobject.c
1 0.2% 97.5% 1 0.2% PyParser_AddToken /.../Parser/parser.c
...
I am aiming to be able to profile the C code in a python package that has many compiled C extension modules. In the toy example above, what would I do to track down where "dtrsm_" is defined? If there are multiple loaded libraries that contain functions with that same name, is there any way to tell which version is being called?
C/C++ won't compile if the same pre-processed source file (e.g. with #includes expanded) contains duplicate definitions for the same symbol. (Note that in the case of C++, symbols are mangled, according to compiler-specific schemes, to incorporate the argument signature so as to facilitate overloaded functions, which could not otherwise be differentiated.)
The linker is only concerned with unresolved symbols (so there ought to be nothing preventing multiple libraries from concurrently calling their own respective internally-defined functions with coincident names). If a file invokes a declared but undefined function, and multiple available libraries implement that symbol, then the linker is free to choose (say by precedence in a search path) which version gets substituted in. (Incidentally, this is the same mechanism by which profilers such as gperftools or HPCToolkit are able to inject themselves and alter the normal behaviour of another application.)
Since different libraries are mapped to separate pages of memory, it ought to be possible to identify (from memory addresses) which library contains the executing version of a function. Indeed, the GNU debugger can identify the library that the code comes from, even when it fails to name a function.
$ gdb python
(gdb) run -c "from numpy import *; linalg.inv(random.random((1000,1000)))"
CTRL-C
(gdb) backtrace
#0 0x00007ffff5ba9df8 in dtrsm_ () from /usr/lib/libblas.so.3
...
#3 0x00007ffff420df83 in ?? () from /.../numpy/linalg/_umath_linalg.so
Linux (or rather the GNU C library) provides the "backtrace" call (for getting a list of pointers from the call stack), and the "backtrace_symbols" call for automatically converting each of those pointers to a descriptive string such as:
"/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fc429929ec5]"
Gperftools can (judging from a query on the GitHub mirror) call the generic "backtrace", but instead of "backtrace_symbols" it "forks out to pprof to do the actual symbolizing". This is a fairly epic Perl script, and is likely where the "??" comes from.
Crucially, google-pprof is trying to report on the source-file (and line-number) which defines the function, not the binary-file containing the machine-code (that is typically quoted in stack traces). It invokes the "nm" utility. On my system it appears (by running "nm -l -D") that libblas, unlike libc and the python binary, has been stripped of such debugging symbols (presumably for optimisation), explaining the result.
To answer the original question: the call-stack samples should definitively and explicitly specify which version is being called. These can probably be dumped using an option which was added to google-pprof several months ago, or (for time-intensive functions) can be roughly ascertained by manual resampling using gdb. (It's even conceivable that google-pprof can be adjusted to explicitly identify the binaries' paths in its output summaries.) Alternatively one can run "nm" (and grep) on the candidate binaries/libraries (a short-list of which can be obtained by running "strings" on the profiler's raw output, among other methods). If the source is accessible (to grep) or the libraries are popular (on the web) then of course (and per Mike Dunlavey) it may be easiest to just query for the function name. In theory the "??:?" may be addressed by carefully recompiling the offending objects.
Just Google the offending function names. The ones you show above come from BLAS/LAPACK. dtrsm is for solving a (triangular) matrix equation. dgemm is for multiplying matrices.
What you need to know is 1) why they are being called, and 2) how big the matrices are.
To find out why they are being called, what I do is just examine individual stack samples, as here.
The reason matrix size matters is that, if the matrices are small, these routines can actually spend a relatively large fraction of their time just classifying their inputs, for example by calling the function LSAME.
I'm trying to read a massive sexp from file into memory, and it seems to be working out fine for smaller inputs, but on more deeply nested ones sbcl conks out with stack exhaustion. There seems to be a hard recursion limit (at 1000 functions deep) that sbcl simply cannot surpass (strangely, even when its stack size is increased). Example (code is here): make check-c works, but make check-cpp exhausts the stack as below:
INFO: Control stack guard page unprotected
Control stack guard page temporarily disabled: proceed with caution
Unhandled SB-KERNEL::CONTROL-STACK-EXHAUSTED in thread #<SB-THREAD:THREAD
"main thread" RUNNING
{10034E6DE3}>:
Control stack exhausted (no more space for function call frames).
This is probably due to heavily nested or infinitely recursive function
calls, or a tail call that SBCL cannot or has not optimized away.
PROCEED WITH CAUTION.
Backtrace for: #<SB-THREAD:THREAD "main thread" RUNNING {10034E6DE3}>
0: ((LAMBDA NIL :IN SB-DEBUG::FUNCALL-WITH-DEBUG-IO-SYNTAX))
1: (SB-IMPL::CALL-WITH-SANE-IO-SYNTAX #<CLOSURE (LAMBDA NIL :IN SB-DEBUG::FUNCALL-WITH-DEBUG-IO-SYNTAX) {100FC9006B}>)
2: (SB-IMPL::%WITH-STANDARD-IO-SYNTAX #<CLOSURE (LAMBDA NIL :IN SB-DEBUG::FUNCALL-WITH-DEBUG-IO-SYNTAX) {100FC9003B}>)
...
Why am I using recursion, then? Actually, I'm not, but unfortunately the builtin (read) uses recursion, and that's where the stack overflow is occurring. The other option (which I've started working on) is to write an iterative version of read which relies upon the more limited syntax that I'm feeding into it from a separate program to avoid the complexity of re-implementing read (my (currently broken) attempts at that are in the lisp branch of the above repository).
However, I'd prefer a more canonical solution. Are there alternatives to the builtin read that can parse deeply nested structures by avoiding recursion?
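For what it's worth, here is a minimal sketch of the iterative approach: an explicit stack of partially built lists instead of the call stack. It assumes the input contains only parentheses, whitespace and simple tokens (no strings, quote characters or reader macros), matching the limited syntax described above; the function name is mine and error handling for unbalanced parentheses is omitted.
(defun read-nested (stream)
  ;; STACK holds the unfinished enclosing lists; CURRENT collects the
  ;; elements of the list being built, in reverse order.
  (let ((stack '())
        (current '())
        (token (make-string-output-stream)))
    (flet ((finish-token ()
             (let ((text (get-output-stream-string token)))
               (unless (string= text "")
                 (push (read-from-string text) current)))))
      (loop for ch = (read-char stream nil nil)
            while ch
            do (case ch
                 (#\( (finish-token)
                      (push current stack)
                      (setf current '()))
                 (#\) (finish-token)
                      (let ((done (nreverse current)))
                        (setf current (pop stack))
                        (push done current)))
                 ((#\Space #\Tab #\Newline #\Return) (finish-token))
                 (t (write-char ch token))))
      (finish-token)
      (first (nreverse current)))))
With the 2000-deep file from the EDIT below, (with-open-file (file "file" :direction :input) (read-nested file)) returns the nested list without growing the control stack, since the intermediate state lives on the heap.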
EDIT: This appears to be an insurmountable issue with sbcl itself, not the input data. For a quick example, try running:
(for i in $(seq 1 2000); do
echo -n "("
done; echo -n "2"; for i in $(seq 1 2000); do
echo -n ")"
done; echo) > file
And then in sbcl:
(with-open-file (file "file" :direction :input) (read file))
The same failure occurs.
EDIT: Asked around on #sbcl, and apparently the control stack size setting really applies only to new threads; the stack size for the main thread is affected by a lot of other factors as well. So I tried putting the read in a separate thread. Still didn't work. Check out this repo and run make check if you're interested.
I don't know what you did (because you didn't show it exactly), but when I start sbcl as follows your example works fine for me:
sbcl --control-stack-size 100
Of course I recommended GNU CLISP and Embeddable Common Lisp (ECL), as they also work A-OK for your example.
I'll add a reference to this answer for future readers: https://stackoverflow.com/a/9002973/816536
I'll also mention that compiling the code with appropriate optimization options may be necessary in many CL implementations to benefit from tail-call optimization.
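As a hedged illustration (the details vary by implementation, and read itself is not tail-recursive): a self tail call like the one below is eliminated by SBCL under its default settings, while some implementations only do so when DEBUG is low enough.
(defun count-down (n)
  ;; With tail-call elimination this loops in constant stack space;
  ;; without it, a large N will exhaust the control stack.
  (declare (optimize (speed 3) (debug 0)))
  (if (zerop n)
      :done
      (count-down (1- n))))

;; (count-down 10000000) => :DONE when the tail call is eliminated.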
The following fragment of CL code does not work as I expected with CCL running SLIME. If I
first compile and load the file using C-c C-k, and then run
(rdirichlet #(1.0 2.0 3.0) 1.0)
in the SLIME/CCL REPL, I get the error
value 1.0 is not of the expected type DOUBLE-FLOAT.
[Condition of type TYPE-ERROR]
It works with SBCL. I expected the (setf *read-default-float-format* 'double-float) to allow me to use values like 1.0. If I load this file into CCL using LOAD at the REPL it works. What am I missing?
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :asdf)
  (require :cl-rmath)
  (setf *read-default-float-format* 'double-float))

(defun rdirichlet (alpha rownum)
  ;; (declare (fixnum rownum))
  (let* ((alphalen (length alpha))
         (dirichlet (make-array alphalen :element-type '(double-float 0.0 *)
                                :adjustable nil :fill-pointer nil :displaced-to nil)))
    (dotimes (i alphalen)
      (setf (elt dirichlet i) (cl-rmath::rgamma (elt alpha i) rownum)))
    ;; Divide dirichlet vector by its sum
    (map 'vector #'(lambda (x) (/ x (reduce #'+ dirichlet))) dirichlet)))
UPDATE: I forgot to mention my platform and versions. I'm using Debian squeeze x86. The version of SLIME is from Debian unstable, 1:20120525-2.
CCL is the 1.8 release. I tried it both with the upstream binaries from http://svn.clozure.com/publicsvn/openmcl/release/1.8/linuxx86/ccl, and a binary package created by me - see Package ccl at mentors.debian.net. The result was the same in each case.
It seems probable that this issue is SLIME specific. It would be helpful if people could comment whether they see this behavior or not. Also,
what is the equivalent of C-c C-k in SLIME, if one is running CCL in basic command line mode? (LOAD filename), or something else? Or, to ask a slightly different question, what CCL function is C-c C-k calling?
I notice that calling C-c C-k on the following code
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :asdf) (require :cl-rmath) (setf *read-default-float-format* 'double-float))
(print *read-default-float-format*)
produces DOUBLE-FLOAT, even though checking *read-default-float-format* at the REPL immediately afterwards gives SINGLE-FLOAT.
UPDATE 2: It looks like, as Rainer said, the compilation occurs in a separate thread.
Per the function all-processes in Threads Dictionary
printing all-processes from the buffer using C-c C-k gives
(#<PROCESS worker(188) [Active] #x18BF99CE> #<PROCESS repl-thread(12) [Semaphore timed wait] #x187A186E> #<PROCESS auto-flush-thread(11) [Sleep] #x187A1C9E> #<PROCESS swank-indentation-cache-thread(6) [Semaphore timed wait] #x186C128E> #<PROCESS reader-thread(5) [Active] #x186C164E> #<PROCESS control-thread(4) [Semaphore timed wait] #x186BE3BE> #<PROCESS Swank Sentinel(2) [Semaphore timed wait] #x186BD0D6> #<TTY-LISTENER listener(1) [Active] #x183577B6> #<PROCESS Initial(0) [Sleep] #x1805FCCE>)
and evaluating
CL-USER> (all-processes)
in the REPL gives
(#<PROCESS repl-thread(12) [Active] #x187A186E> #<PROCESS auto-flush-thread(11) [Sleep] #x187A1C9E> #<PROCESS swank-indentation-cache-thread(6) [Semaphore timed wait] #x186C128E> #<PROCESS reader-thread(5) [Active] #x186C164E> #<PROCESS control-thread(4) [Semaphore timed wait] #x186BE3BE> #<PROCESS Swank Sentinel(2) [Semaphore timed wait] #x186BD0D6> #<TTY-LISTENER listener(1) [Active] #x183577B6> #<PROCESS Initial(0) [Sleep] #x1805FCCE>)
so it seems #<PROCESS worker(188) [Active] #x18BF99CE> is the thread that is doing the compilation. Of course, there still remains the question of why these variables are local to a thread, and also why SBCL does not behave the same way.
I can see that with CCL and some older SLIME (which is what I use), too. Haven't tried it with a newer SLIME.
It does not happen with SBCL or LispWorks.
*read-default-float-format* is one of the I/O variables of Common Lisp. Something like WITH-STANDARD-IO-SYNTAX binds them to the standard values and, on exit, restores the previous values. So I suspect that CCL's SLIME code has such a binding in effect. This is confirmed by setting other I/O variables like *read-base* - they are also bound.
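A quick way to see the rebinding effect at any REPL (a small sketch, independent of SLIME):
(setf *read-default-float-format* 'double-float)

;; Inside WITH-STANDARD-IO-SYNTAX the I/O variables are rebound to their
;; standard values, so floats read here default to SINGLE-FLOAT again.
(with-standard-io-syntax
  (read-from-string "1.0"))     ; => 1.0  (a SINGLE-FLOAT)

(read-from-string "1.0")        ; => 1.0d0, the outer value is back in effect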
CCL, Emacs and SLIME have several layers of code, which makes it slightly complex to debug this.
Emacs runs the ELISP code of SLIME and talks to SWANK.
SWANK is the backend of SLIME and is Common Lisp code running in CCL.
SWANK has portable code and some CCL-specific code.
On the Emacs side the SLIME/ELISP function SLIME-COMPILE-AND-LOAD-FILE is used.
On the SWANK side the Common Lisp function swank:compile-file-for-emacs gets called.
Later SWANK:LOAD-FILE gets called.
I don't see where the I/O variables are bound - maybe it is in the CCL networking code?
This seems to be the answer:
If CCL has thread-local I/O variables, then the compilation, which happens in another thread, won't change the I/O bindings in the REPL thread, nor in any other thread.
This is a difference between various CL implementations. Since the CL standard does not specify threads or processes, it is also not specified if special bindings are global or have a thread-local binding by default...
If you think about it, protecting a thread's I/O variables against changes from other threads makes sense...
So, the correct way to deal with it should be to make sure in each thread independently that the right I/O variable values are in effect.
Let me expand a bit on why things are the way they are.
Typically a Common Lisp implementation can run more than one thread. Some implementations also allow threads to run concurrently on different cores. These threads can do very different things: one could run a REPL, another could answer an HTTP request, one could load data from the disk and another could read the contents of an email. Lisp in this case runs several different tasks inside one Lisp system.
Lisp has several variables which determine the behavior of I/O operations: for example, which format floats are read in, or which base integers are read in. The latter is controlled by *read-base*.
Now imagine that the above disk-reading thread has set its *read-base* to 16 for some purpose. If you then change the global value in another thread to 8, suddenly all other threads have base 8. The consequence: the disk-reading thread will see *read-base* 8 instead of 16 and behave differently.
So it makes sense to prevent this in some way. The simplest is for the code running in each thread to have its own bindings for the I/O variables; then changing *read-base* has no effect on other threads. These bindings are usually introduced by LET or a function call. Typically the code itself is responsible for binding the variables.
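For example, a hypothetical disk-reading function would shield itself like this:
(defun read-hex-record (stream)
  ;; The binding is dynamic but, in implementations with thread-local
  ;; special bindings (CCL, SBCL, LispWorks), it only affects this thread.
  (let ((*read-base* 16))
    (read stream)))   ; the token ff is read as 255 here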
Another way to prevent it is to give each thread a number of initial bindings, which could for example include the I/O variables. CCL does that. LispWorks does that too, but not for the I/O variables.
Now each Lisp might give you a non-portable way to change the thread-local top binding (CCL has that, too: for example (setf ccl:symbol-value-in-process)). Still, that would not necessarily change the binding in effect in the REPL, since the REPL itself is a piece of Lisp code running in a thread, and it could have set up its own bindings.
In CCL you can also set the global static binding: (CCL::%SET-SYM-GLOBAL-VALUE sym value). But if you use such functionality you are probably doing something wrong or you have a good reason.
Some background for CCL: http://clozure.com/pipermail/openmcl-devel/2011-June/012882.html
Lore
Don't try to change global bindings from one thread visible for other threads.
Shield your code from changes in other threads by binding the crucial variables to the correct values.
Write a function which sets up the variable values to some values in a controllable fashion.
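A minimal sketch of such a setup function, with hypothetical names and the double-float default from the original question:
(defmacro with-project-io-syntax (&body body)
  ;; Establish the I/O variable values this project expects, locally to the
  ;; calling thread, no matter what other threads do.
  `(let ((*read-default-float-format* 'double-float)
         (*read-base* 10))
     ,@body))

;; Any reading code, whether it runs in the REPL thread, the SLIME worker
;; thread or elsewhere, then wraps itself:
;; (with-project-io-syntax (read-from-string "1.0"))   ; => 1.0d0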