Override -j setting for one source file? - gnu-make

I have a test script that takes from hours to days to run. The test script repeatedly builds a library and runs its self tests under different configurations.
On desktops and servers, the script enjoys a speedup because it uses -j N, where N is the number of cores available; there it takes about 2 hours to run.
On dev-boards like a LeMaker Hikey (8-core ARM64/2GB RAM) and CubieTruck (8-core ARM/2GB RAM), I can't use -j N (even for N=2 or N=4) because one source file is a real monster and triggers an OOM kill. In that case the script can take days to run.
My question is, how can I craft a make recipe that tells GNU make to handle this one source file with -j 1? Is it even possible?

I'm not sure if it is possible. It isn't clear how Make splits jobs amongst cores.
Section 4.9, Special Built-in Target Names, mentions:
.NOTPARALLEL
If .NOTPARALLEL is mentioned as a target, then this invocation of make will be run serially, even if the -j option is given. Any recursively invoked make command will still run recipes in parallel (unless its makefile also contains this target). Any prerequisites on this target are ignored.
However, 5.7.3 Communicating Options to a Sub-make says:
The -j option is a special case (see Parallel Execution). If you set it to some numeric value N and your operating system supports it (most any UNIX system will; others typically won’t), the parent make and all the sub-makes will communicate to ensure that there are only N jobs running at the same time between them all. Note that any job that is marked recursive (see Instead of Executing Recipes) doesn’t count against the total jobs (otherwise we could get N sub-makes running and have no slots left over for any real work!)
If your operating system doesn’t support the above communication, then -j 1 is always put into MAKEFLAGS instead of the value you specified. This is because if the -j option were passed down to sub-makes, you would get many more jobs running in parallel than you asked for. If you give -j with no numeric argument, meaning to run as many jobs as possible in parallel, this is passed down, since multiple infinities are no more than one.
This suggests to me there is no way to assign a specific job to a single core. It's worth giving it a shot, though.

Make the large target first, then everything else afterwards in parallel.
.PHONY: all
all:
⋮

.PHONY: all-limited-memory
all-limited-memory:
	${MAKE} -j1 bigfile
	${MAKE} all
So now:
$ make -j16 all works as expected.
$ make -j4 all-limited-memory builds bigfile serially (exiting on error), then carries on to do the rest in parallel.
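An alternative suggested by the manual text quoted above: give the monster file its own small makefile containing .NOTPARALLEL and recurse into it. This is only a sketch, assuming a standalone makefile named bigfile.mk (both file names here are illustrative):
# bigfile.mk -- .NOTPARALLEL serializes only this invocation of
# make; the parent make and its other jobs still run in parallel.
.NOTPARALLEL:

bigfile: bigfile.c
	$(CC) $(CFLAGS) -o $@ $<
The parent makefile would then invoke ${MAKE} -f bigfile.mk bigfile in place of ${MAKE} -j1 bigfile.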

Related

MPI OpenMP hybrid

I am trying to run a program written for MPI and OpenMP on a cluster of Linux dual cores.
When I try to set the OMP_NUM_THREADS variable
export OMP_NUM_THREADS=2
I get a message
OMP_NUM_THREADS: Undefined variable.
I don't get better performance with OpenMP... I also tried:
mpiexec -n 10 -genv OMP_NUM_THREADS 2 ./binary
and omp_set_num_threads(2) inside the program, but it didn't get any better...
Any ideas?
Update: when I run mpiexec -n 1 ./binary with omp_set_num_threads(2), the execution time is 4 s; when I run mpiexec -f machines -n 1 ./binary, it is 8 s.
I would suggest doing an echo $OMP_NUM_THREADS first, and then querying the number of threads inside the program to make sure that threads are actually being spawned; use the omp_get_num_threads() function for this (see the sketch below). Further, if you're using macOS then this blog post can help:
https://whiteinkdotorg.wordpress.com/2014/07/09/installing-mpich-using-macports-on-mac-os-x/
The latter part of that post will help you compile and run hybrid programs successfully. Whether a hybrid program gets better performance or not depends a lot on contention for resources: excessive use of locks and barriers can slow the program down further. It would be great if you posted your code here so that others can view it and actually help you.
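For example, a minimal hybrid sanity check might look like the sketch below (the file and binary names are illustrative); each MPI rank prints the size of its OpenMP thread team, so you can see immediately whether OMP_NUM_THREADS is taking effect:
/* check.c -- build with something like: mpicc -fopenmp check.c -o check */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        /* omp_get_num_threads() reports the team size only when
           called inside an active parallel region. */
        #pragma omp master
        printf("rank %d is running %d OpenMP thread(s)\n",
               rank, omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
If OMP_NUM_THREADS=2 is exported correctly, mpiexec -n 2 ./check should report 2 threads on each rank.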

How to use ltrace for mpi programs?

I want to use ltrace to capture the library function calls of an MPI application, but simply running ltrace doesn't work and my mpirun cannot succeed.
Any idea?
You should be able to simply use:
$ mpiexec -n 4 -other_mpiexec_options ltrace ./executable
But that will create a huge mess since the outputs from the different ranks will merge. A much better option is to redirect the output of ltrace to a separate file for each rank. Getting the rank is easy with some MPI implementations. For example, Open MPI exports the world rank in the environment variable OMPI_COMM_WORLD_RANK. The following wrapper script would help:
#!/bin/sh
# Per-rank wrapper: write each rank's ltrace output to its own file.
exec ltrace --output "trace.$OMPI_COMM_WORLD_RANK" "$@"
Usage:
$ mpiexec -n 4 ... ltrace_wrapper ./executable
This will produce 4 trace files, one for each rank: trace.0, trace.1, trace.2, and trace.3.
MPICH and other MPI implementations based on it that use the Hydra process manager export PMI_RANK instead, so the script above has to be modified by replacing OMPI_COMM_WORLD_RANK with PMI_RANK. One could also write a universal wrapper that works with both families of MPI implementations.
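Such a universal wrapper might look like the following sketch (it falls back to rank 0 if neither variable is set):
#!/bin/sh
# Universal per-rank wrapper: Open MPI exports OMPI_COMM_WORLD_RANK,
# Hydra-based MPICH exports PMI_RANK.
rank="${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-0}}"
exec ltrace --output "trace.$rank" "$@"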

When using mpirun with R script, should I copy manually file/script on clusters?

I'm trying to understand how Open MPI's mpirun handles a script file associated with an external program, here an R process (doMPI/Rmpi).
I can't imagine that I have to copy my script to each host before running something like:
mpirun --prefix /home/randy/openmpi -H clust1,clust2 -n 32 R --slave -f file.R
But apparently it doesn't work until I copy the script 'file.R' onto the cluster nodes and then run mpirun. And when I do that, the results are written on the cluster nodes, whereas I expected them to be returned to the working directory on the localhost.
Is there another way to send an R job from the localhost to multiple hosts, including the script to be evaluated?
Thanks !
I don't think it's surprising that mpirun doesn't know details of how scripts are specified to commands such as "R", but the Open MPI version of mpirun does include the --preload-files option to help in such situations:
--preload-files <files>
    Preload the comma-separated list of files to the current working directory of the remote machines where processes will be launched, prior to starting those processes.
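In principle, it would be used like this for the question's command (a sketch only; as noted below, I could not get it to work myself):
mpirun --prefix /home/randy/openmpi -H clust1,clust2 -n 32 \
    --preload-files file.R R --slave -f file.R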
Unfortunately, I couldn't get it to work, which may be because I misunderstood something, but I suspect it isn't well tested, since very few people use that option: doing parallel computing without a distributed file system is quite painful.
If --preload-files doesn't work for you either, I suggest that you write a little script that calls scp repeatedly to copy the script to the cluster nodes (see the sketch below). There are some utilities that do that, but none seem to be very common or popular, which I again think is because most people prefer to use a distributed file system. Another option is to set up an sshfs file system.
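Such a script might look like the following sketch (the host names and script name come from the question; it assumes the same working directory exists on every node):
#!/bin/sh
# Copy the R script into the matching directory on every node,
# then launch mpirun as before.
for host in clust1 clust2; do
    scp file.R "$host:$PWD/" || exit 1
done
mpirun --prefix /home/randy/openmpi -H clust1,clust2 -n 32 \
    R --slave -f file.R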

Redirect Error Stream to File and Console in Windows

I want to redirect the error stream of a Java console application to both a file and the console. Normally the errors are displayed only in the console; I want them displayed in the console and written to a file as well. How can I achieve this? When I write:
java -classpath lib.jar com.hertz.test.Blad 2>error.log
then the errors are redirected to the file, but I don't see them on the console. Also, does anybody know how to add date and time stamps to the logs in this situation?
I'm working in Windows 2003 Server.
This is of course a simple exercise in piping the output through a filter, in this case the tee command, which is done in Microsoft's command interpreter much the same as in JP Software's TCC/LE and (non-C-shell family) Unix shells:
java -classpath lib.jar com.hertz.test.Blad 2>&1 | tee error-and-output.log
Treating standard output and standard error differently is little more than an exercise in redirection syntax, for which this example here is but one of several possibilities, and is a separate question.
java -classpath lib.jar com.hertz.test.Blad 2>&1 1>con | tee error.log
(Here 2>&1 first points standard error at the pipe, and the subsequent 1>con points standard output back at the console, so only the error stream reaches tee.)
All that remains is to obtain a tee command. There are several possibilities:
Use a port of a Unix tee command. There are several choices. Oft-mentioned are GNUWin32, cygwin, and unxutils. Less well known, but in some ways better, are the tools in the SFUA utility toolkit, which run in the Subsystem for UNIX-based Applications that comes right there in the box with Windows 7 Ultimate edition and Windows Server 2008 R2. (For Windows XP and Windows Server 2003, one can download and install Services for UNIX version 3.5.) This toolkit has a large number of command-line TUI tools, from mv and du, through the Korn and C shells, to perl and awk. It comes in both x86-64 and IA64 flavours as well as x86-32. The programs run in Windows' native proper POSIX environment, rather than with emulator DLLs (such as cygwin1.dll) layering things over Win32. And yes, the toolkit has tee, as well as some 300 others.
Use one of the many native Win32 tee commands that people have written and published. One such is Ritchie Lawrence's MTEE, which has /D and /T options to add time and date stamps to each line that it processes (see the sketch after this list).
Use a replacement command interpreter that comes with a built-in TEE command. JP Software's TCC/LE is one such; its TEE command likewise has /D and /T options to add time and date stamps to each line that it processes.
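For instance, with MTEE the question's command might become the following (a sketch, assuming mtee.exe is on the PATH; /D and /T add the date and time stamps):
java -classpath lib.jar com.hertz.test.Blad 2>&1 1>con | mtee /D /T error.log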
As an aside: It's better for your application to add date and time stamps itself than for them to be post-processed by the TEE command. For several reasons, relating both to how applications behave when their standard streams are pipes and to how pipes work, each line of output will not necessarily be processed by TEE at the time that your application generated it in the first place. The leeway will affect both the relative (to one another) and the absolute (to the wall clock) accuracy of the timestamps that you see.

GNU make's -j option

Ever since I learned about -j I've used -j8 blithely. The other day I was compiling an atlas installation and the make failed. Eventually I tracked it down to things being made out of order - and it worked fine once I went back to singlethreaded make. This makes me nervous. What sort of conditions do I need to watch for when writing my own make files to avoid doing something unexpected with make -j?
I think make -j will respect the dependencies you specify in your Makefile; i.e. if you specify that objA depends on objB and objC, then make won't start working on objA until objB and objC are complete.
Most likely your Makefile isn't specifying the necessary order of operations strictly enough, and it's just luck that it happens to work for you in the single-threaded case.
In short - make sure that your dependencies are correct and complete.
If you are using a single-threaded make, you can blindly ignore implicit dependencies between targets and still get away with it.
When using parallel make, you can't rely on implicit dependencies; they should all be made explicit. This is probably the most common trap, particularly when using .PHONY targets as dependencies.
This link is a good primer on some of the issues with parallel make.
Here's an example of a problem that I ran into when I started using parallel builds. I have a target called "fresh" that I use to rebuild the target from scratch (a "fresh" build). In the past, I coded the "fresh" target by simply indicating "clean" and then "build" as dependencies.
build: ## builds the default target
clean: ## removes generated files
fresh: clean build ## works for -j1 but fails for -j2
That worked fine until I started using parallel builds, but with parallel builds, it attempts to do both "clean" and "build" simultaneously. So I changed the definition of "fresh" as follows in order to guarantee the correct order of operations.
fresh:
	$(MAKE) clean
	$(MAKE) build
This is fundamentally just a matter of specifying dependencies correctly. The trick is that parallel builds are more strict about this than single-threaded builds are. My example demonstrates that a list of dependencies for a given target does not necessarily indicate the order of execution.
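As an aside that goes beyond the original answer: GNU make 4.4 and later offer the .WAIT special prerequisite, which enforces ordering inside a single prerequisite list even under -j, so the recursive workaround is not needed there:
# Requires GNU make >= 4.4: everything left of .WAIT finishes
# before anything to its right starts, even with -j.
fresh: clean .WAIT build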
If you have a recursive make, things can break pretty easily. If you're not doing a recursive make, then as long as your dependencies are correct and complete, you shouldn't run into any problems (save for a bug in make). See Recursive Make Considered Harmful for a much more thorough description of the problems with recursive make.
It is a good idea to have an automated test for the -j option of ALL your makefiles. Even the best developers have problems with make's -j option. The most common issue is also the simplest:
myrule: subrule1 subrule2
	echo done

subrule1:
	echo hello

subrule2:
	echo world
In normal make, you will see hello -> world -> done.
With make -j 4, you might see world -> hello -> done.
Where I have seen this happen most is with the creation of output directories. For example:
build: $(DIRS) $(OBJECTS)
	echo done

$(DIRS):
	-@mkdir -p $@

$(OBJECTS):
	$(CC) ...
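A common fix, sketched here rather than taken from the answer above, is to make the directories order-only prerequisites (listed after the |) of the objects: the directories are then guaranteed to exist before any compilation starts, but their ever-changing timestamps never force a rebuild.
# Order-only prerequisites: $(DIRS) must exist before any object
# is compiled, but a directory's timestamp never dirties the objects.
$(OBJECTS): | $(DIRS)

$(DIRS):
	-@mkdir -p $@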
Just thought I would add to subsetbrew's answer, as it does not show the effect clearly; adding some sleep commands does (this works on Linux, at least). Running the makefile below with make and with make -j4 then shows the difference:
all: toprule1

toprule1: botrule2 subrule1 subrule2
	@echo toprule 1 start
	@sleep 0.01
	@echo toprule 1 done

subrule1: botrule1
	@echo subrule 1 start
	@sleep 0.08
	@echo subrule 1 done

subrule2: botrule1
	@echo subrule 2 start
	@sleep 0.05
	@echo subrule 2 done

botrule1:
	@echo botrule 1 start
	@sleep 0.20
	@echo "botrule 1 done (good prerequisite in sub)"

botrule2:
	@echo "botrule 2 start"
	@sleep 0.30
	@echo "botrule 2 done (bad prerequisite in top)"
