How should I organise my OCaml project? - functional-programming

I know this question is quite general and I don't even know how to ask it better.
I don't have much experience in C, and I just wish I could do something similar in OCaml to what I do in Java.
For example, in Java I normally create a project (using Eclipse or another IDE), and then I have a src folder and a bin folder. All compiled stuff goes to bin.
So for starters, how could I do something as simple as that: just keep the source files and compiled files separate?
Normally, how do you organise your OCaml project files?
The final question is: should I use .mli files or just modules? I noticed that ocaml-batteries-included uses .mli files a lot.

There's plenty of differing opinion on this subject in the OCaml community, and it really depends on how much structure or flexibility you want. Personally, I use a Makefile and ocamlbuild when I'm being lazy, and oasis for everything else. You should check out a few random OCaml projects to see how it works and what you would like. For example, an oasis project might look like this: https://github.com/avsm/ocaml-github. You only need to look at the _oasis file (think of it like pom.xml if you have ever used Maven). Running oasis setup should generate all of the build files for you; ocaml setup.ml -all followed by ocaml setup.ml -install would then install the library on your system.
As for using .mli files: there's a tiny bit of controversy around them. Here's a discussion from the mailing lists: http://www.mentby.com/Group/caml-list/why-should-i-use-mli-files.html
My own opinion is that they are optional, unless your modules are part of a public API that others will use, in which case they are mandatory.
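To make the .mli idea concrete, here is a minimal sketch (the counter module is purely hypothetical): the interface exposes an abstract type, so nothing outside the module can depend on how it is represented.

(* counter.mli -- the public interface; the representation of [t] stays hidden *)
type t
val create : unit -> t
val incr : t -> t
val value : t -> int

(* counter.ml -- the implementation; callers cannot rely on [t] being an int *)
type t = int
let create () = 0
let incr n = n + 1
let value n = n

If you later change t to a record, no user code breaks, which is exactly the property you want from a public API.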

Nice question!
I will be really interested in seeing other answers, but here is how I organize my projects:
First of all, I use Eclipse with the excellent OcaIDE plugin; it's a really cool Eclipse plugin, actively maintained.
If you are an Emacs user, you can use TypeRex (which is dead, but since the OCaml community moves very slowly, you have plenty of time).
If you are a Vim user, there is always Omlet, but there isn't any really nice solution.
With Eclipse, you can choose "Managed OCaml Project", which basically means "I don't want to worry about compiling stuff and I will never share this project".
Start with that; it's good enough for personal projects and temporary tests. But if you can't, you will have to choose between "OCaml Makefile Project" and "OCaml project with ocamlbuild". Choose Makefile: it's the more flexible and easier solution.
Eclipse will provide you with a default Makefile for OCaml projects, very well explained in its comments. I recommend using it if you are not familiar with the OCaml build system. If you are, I recommend writing your own Makefile, because the default one is too huge and unreadable (I think).
Great! Now we have our project in our favorite editor and we are ready to set up a global structure.
At the root of the project, I follow the convention of a classic GNU tarball, that is:
/
|- src/ # source files
|- lib/ # dependencies
|- test/ # tests files and test binaries
|- _build/ # binaries and object files, sometimes managed by ocamlbuild
|- AUTHORS # who did that marvelous stuff
|- README # what is it
|- Makefile # *always* provide a Makefile, you never know...
|- _tags # when I use ocamlbuild
|- _oasis # when I use oasis
Sometimes there is no lib directory; that's fine. But you should provide AUTHORS and README files, because they reflect quite positively on your project.
That was the boring part; what about the src directory?
OCaml's module system is very helpful for keeping things in their own containers. From my own experience:
I don't use internal modules for anything but functors.
I keep my modules self-consistent (a sort of golden rule). For that, my module names mostly correspond to datatypes or to specific containers with internal state (what Java people call singletons).
I often make two or three obviously specific modules: the entry point, the module that contains all common types, and the module that contains all commonly used functions. This avoids circular dependencies (see the sketch after this list).
I keep everything flat in the src/ directory unless I can really see a structure in my modules (these do parsing, these do AI computation, these do networking...), the same way I do with C projects.
OCaml is a concise language, so you should have few files. As with C projects, try to keep your directory hierarchy as flat as possible; remember that directories aren't part of module names or namespaces, so they exist only for the programmer's convenience!
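To make the entry-point / common-types / common-functions split concrete, here is a hypothetical three-file sketch (all names are invented); dependencies only flow one way, so there is no risk of a cycle:

(* types.ml -- shared datatypes, depended on by everything else *)
type user = { name : string; age : int }

(* utils.ml -- commonly used functions, depends only on Types *)
let string_of_user (u : Types.user) =
  Printf.sprintf "%s (%d)" u.Types.name u.Types.age

(* main.ml -- the entry point, depends on the rest *)
let () =
  print_endline (Utils.string_of_user { Types.name = "ada"; age = 36 })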
The mli part: it really depends on the goal of your project. This is my methodology:
The only general rule is: document the .mli if it exists. .mli files are there to help both compilers and programmers.
Is the project just for you? Document the .ml, and only generate an .mli for typing constraints or for .ml files that are too large to browse.
Do you want to distribute your project? Document the .mli, because that's the first file we will read if we have the choice. Choose which files get interfaces; some may be internal and you don't want casual users to have to read them.
They are always needed when you deal with the OCaml object system, as it can become messy very quickly.
In the general case I keep as few .mli files as possible, so a visitor immediately knows which files are worth knowing and which are internal/advanced.
Moreover, it helps with refactoring, because I don't have type constraints stopping me from moving ahead.
Keep in mind that the test suite is there to make sure we don't break anything (see OUnit; your distro should have it packaged for you. It's very simple and efficient, a good project).
That's all I can think of. Hope it helps someone!

Related

Using MSBUILD like a classic MAKEfile -- how do I do this?

I'm frustrated by the lack of flexibility in the Visual Studio project/solution, but I realized that now that it uses MSBuild it might be quite powerful; it just doesn't expose that to the IDE. So I took a look at the MSBuild docs and don't know where to start! I wish there was a Nutshell book for it. Is there any good tutorial someone could point me to?
More specifically, here are the kinds of things I want to do:
Run a utility pre-processor to generate .CPP and .H files, which are then used by a regular C++ project. There are multiple inputs (whose dependencies need to be tracked; specifically, it should know if a normal .h file it uses has changed) and multiple outputs (at least one .cpp and one .h file) that are used as files in another project.
FWIW, the most complex case involves using Qt in a "normal" C++ project that can be built using VS Express 2010 or MSBUILD directly from a script on a server. Since that is a common library, there might be some guides or whatever to help? Note that a VS plug-in is not useful for the building stage, but could be used to initially generate project files that then rely only on MSBUILD and stuff included with the source code.
Would somebody please point me in the right direction?
--John
It gets worse from there, but that's my first goal.
I found the kind of information I was looking for in a book MSBuild Trickery: 99 Ways to Bend the Build Engine to Your Will by Brian Kretzler.
In the first 18 pages I found a few key pieces of information that, along with the online documentation I've already gone through, help clear things up enough to try tackling my project. Details of interest include the order in which MSBuild reads and operates on the things in the file, quick points on when wildcards in items are expanded and how to handle generated files, and how to see what's happening in some practical cases or even step through in the debugger.
FWIW, I managed to attack my problem without using the murky ".targets"/rules files that I have yet to understand, using only better documented/exampled features (in particular, a Target that has wildcard items doesn't care that the file name extension is not in any ".targets" file; it is simple enough to copy from an example and allows the files to be seen in the IDE project and added to the list using the IDE; again, the FileExtension there just works OK).

designing large projects in OCaml [closed]

What are the best practices for writing large software projects in OCaml?
How do you structure your projects?
What features of OCaml should and should not be used to simplify code management? Exceptions? First-class modules? GADTs? Object types?
Build system? Testing framework? Library stack?
I found great recommendations for Haskell, and I think it would be good to have something similar for OCaml.
I am going to answer for a medium-sized project in the conditions that I am familiar with, that is between 100K and 1M lines of source code and up to 10 developers. This is what we are using now, for a project started two months ago in August 2013.
Build system and code organization:
one source-able shell script defines PATH and other variables for our project
one .ocamlinit file at the root of our project loads a bunch of libraries when starting a toplevel session (see the sketch after this list)
omake, which is fast (with -j option for parallel builds); but we avoid making crazy custom omake plugins
one root Makefile contains all the essential targets (setup, build, test, clean, and more)
one level of subdirectories, not two
most subdirectories build into an OCaml library
some subdirectories contain other things (setup, scripts, etc.)
OCAMLPATH contains the root of the project; each library subdirectory produces a META file, making all OCaml parts of the project accessible from the toplevel using #require.
only one OCaml executable is built for the whole project (saves a lot of linking time; still not sure why)
libraries are installed via a setup script using opam
local opam packages are made for software that is not in the official opam repository
we use an opam switch which is an alias named after our project, avoiding conflicts with other projects on the same machine
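As an illustration of the .ocamlinit and #require points above, a minimal sketch (the exact package list is just an example):

(* .ocamlinit at the project root: load findlib, then the libraries we want
   available in every toplevel session *)
#use "topfind";;
#require "batteries";;
#require "lwt.unix";;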
Source-code editing:
emacs with opam packages ocp-indent and ocp-index
Source control and management:
we use git and github
all new code is peer-reviewed via github pull requests
tarballs for non-opam non-github libraries are stored in a separate git repository (that can be blown away if history gets too big)
bleeding-edge libraries existing on github are forked into our github account and installed via our own local opam package
Use of OCaml:
OCaml will not compensate for bad programming practices; teaching good taste is beyond the scope of this answer. http://ocaml.org/learn/tutorials/guidelines.html is a good starting point.
OCaml 4.01.0 makes it much easier than before to reuse record field labels and variant constructors (i.e. type t1 = {x:int} type t2 = {x:int;y:int} let t1_of_t2 ({x}:t2) : t1 = {x} now works)
we try to not use camlp4 syntax extensions in our own code
we do not use classes and objects unless mandated by some external library
in theory since OCaml 4.01.0 we should prefer classic variants over polymorphic variants
we use exceptions to indicate errors and let them go through happily until our main server loop catches them and interprets them as "internal error" (default), "bad request", or something else
exceptions such as Exit or Not_found can be used locally when it makes sense, but in module interfaces we prefer to use options.
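A minimal sketch of that last convention (the module is hypothetical): the implementation may rely on Not_found internally, but the interface exposes an option.

(* store.ml -- Not_found stays internal; the interface exposes an option *)
module Store : sig
  val add  : string -> int -> unit
  val find : string -> int option   (* no Not_found leaking to callers *)
end = struct
  let table : (string, int) Hashtbl.t = Hashtbl.create 16
  let add key value = Hashtbl.replace table key value
  let find key =
    try Some (Hashtbl.find table key)   (* Hashtbl.find raises Not_found *)
    with Not_found -> None
end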
Libraries, protocols, frameworks:
we use Batteries for all commodity functions that are missing from OCaml's standard library; for the rest we have a "util" library
we use Lwt for asynchronous programming, without the syntax extensions; the bind operator (>>=) is the only operator we use (if you have to know, we do reluctantly use camlp4 preprocessing for better exception tracking on bind points); see the sketch after this list.
we use HTTP and JSON to communicate with 3rd-party software and we expect every modern service to provide such APIs
for serving HTTP, we run our own SCGI server (ocaml-scgi) behind nginx
as an HTTP client we use Cohttp
for JSON serialization we use atdgen
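The plain-bind Lwt style mentioned above looks roughly like this (a sketch assuming the lwt.unix package; the function itself is invented for illustration):

(* plain (>>=) chaining, no pa_lwt syntax extension *)
let ( >>= ) = Lwt.bind

let delayed_greeting name =
  Lwt_unix.sleep 0.5 >>= fun () ->
  Lwt_io.printlf "hello, %s" name >>= fun () ->
  Lwt.return (String.length name)

let () = ignore (Lwt_main.run (delayed_greeting "world"))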
"Cloud" services:
we use quite a lot of them as they are usually cheap, easy to interact with, and solve scalability and maintenance problems for us.
Testing:
we have one make/omake target for fast tests and one for slow tests
fast tests are unit tests; each module may provide a "test" function, and a test.ml file runs the list of tests (see the sketch after this list)
slow tests are those that involve running multiple services; these are crafted specifically for our project, but they exercise as much as possible of a production service. Everything runs locally, either on Linux or MacOS, except for cloud services, for which we find ways to not interfere with production.
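A sketch of that per-module test convention (module names and the reporting are hypothetical): each module exposes a test : unit -> bool, and test.ml folds over the list of them.

module Parser = struct
  let parse = int_of_string
  (* the module's own unit test *)
  let test () =
    List.for_all (fun (s, n) -> parse s = n) [ "0", 0; "42", 42; "-7", -7 ]
end

(* test.ml gathers one (name, test) pair per module and reports failures *)
let () =
  let tests = [ ("parser", Parser.test) ] in
  let failures = List.filter (fun (_, run) -> not (run ())) tests in
  List.iter (fun (name, _) -> Printf.eprintf "FAILED: %s\n" name) failures;
  exit (if failures = [] then 0 else 1)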
Setting this all up is quite a bit of work, especially for someone not familiar with OCaml. There is no framework taking care of all that yet, but at least you get the choice of the tools.
OASIS
To add to Pavel's answer:
Disclaimer: I am the author of OASIS.
OASIS also has oasis2opam, which can help to create an OPAM package quickly, and oasis2debian to create Debian packages. This is extremely useful if you want to create a 'release' target that automates most of the tasks of uploading a package.
OASIS also ships with a script called oasis-dist.ml that automatically creates a tarball for upload.
You can see all of this at https://github.com/ocaml.org.
Testing
I use OUnit to do all my tests. This is simple and pretty efficient if you are used to xUnit testing.
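A minimal sketch of what an OUnit test file can look like (assuming the OUnit2 interface; the tests themselves are just placeholders):

open OUnit2

let test_add = "add" >:: (fun _ -> assert_equal 4 (2 + 2))
let test_rev = "rev" >:: (fun _ -> assert_equal [3; 2; 1] (List.rev [1; 2; 3]))

let suite = "example" >::: [ test_add; test_rev ]
let () = run_test_tt_main suite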
Source control/management
Disclaimer: I am the owner/maintainer of forge.ocamlcore.org (aka forge.o.o)
If you want to use git, I recommend to use github. This is really efficient for review.
If you use darcs or subversion, you can create an account on forge.o.o.
In both cases, having a public mailing list to which all commit notifications are sent is a must-have, so that everyone can see and review them. You can use either Google Groups or a mailing list on forge.o.o.
I recommend having a nice web page (on github or forge.o.o) with the OCamldoc documentation built every time you commit. If you have a huge code base, this will help you use the OCamldoc-generated documentation right from the beginning (and fix it quickly).
I recommend to create tarballs when you reach a stable stage. Don't just rely on checking out the latest git/svn version. This tip has saved me hours of work in the past. As said by Martin, store all your tarballs in a central place (a git repository is a good idea for that).
This one probably doesn't answer your question completely, but here is my experience regarding the build environment:
I really appreciate OASIS. It has a nice set of features, helping not only to build the project, but also to write documentation and to support a test environment.
Build system
OASIS generates a setup.ml file from the specification (the _oasis file), which works basically as a build script. It accepts -configure, -build, -test and -distclean flags. I got quite used to these while working with various GNU and other projects that usually use Makefiles, and I find it convenient that all of them are available automatically here.
Makefiles. Instead of generating setup.ml, it is also possible to generate a Makefile with all of the options described above available.
Structure
Usually a project of mine that is built by OASIS has at least four directories: src, _build, scripts and tests.
In the src directory, all source files are stored in one place: source (.ml) and interface (.mli) files are kept together. If the project grows too large, it may be worth introducing more subdirectories.
The _build directory is under the control of the OASIS build system. It stores both source and object files there, and I like that build files are not mixed in with source files, so I can easily delete it in case something goes wrong.
I store multiple shell scripts in the scripts directory. Some of them are for test execution and interface file generation.
I store all input and output files for tests in a separate directory.
Interfaces/Documentation
The use of interface files (.mli) has both advantages and drawbacks for me. It really helps to find type errors, but once you have them, you also have to edit them when making changes or improvements to your code. Forgetting to do so sometimes causes nasty errors.
But the main reason why I like interface files is documentation. I use ocamldoc to generate HTML documentation pages automatically (OASIS supports this feature with the -doc flag). In my opinion it is enough to write a comment describing each function in the interface and not to insert comments in the middle of the code. In OCaml, functions are usually short and concise, and if there is a need to insert extra comments there, maybe it is better to split the function.
Also be aware of the -i flag for ocamlc: the compiler can automatically generate an interface file for a module.
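For example, a hypothetical stack.mli written in that style, with an ocamldoc comment per exported value (the signature itself could be seeded with ocamlc -i stack.ml and then trimmed and documented):

(** Persistent stacks. *)

type 'a t
(** The type of stacks holding elements of type ['a]. *)

val empty : 'a t
(** The empty stack. *)

val push : 'a -> 'a t -> 'a t
(** [push x s] is [s] with [x] on top. *)

val pop : 'a t -> ('a * 'a t) option
(** [pop s] is the top element and the remaining stack, or [None] if [s] is empty. *)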
Tests
I didn't find a reasonable solution for supporting tests (I would like to have some kind of ocamltest application), so I am using my own scripts for executing and verifying use cases. Fortunately, OASIS supports executing custom commands when setup.ml is run with the -test flag.
I haven't been using OASIS for very long, so if anyone knows of other cool features, I would also like to hear about them.
Also, if you are not aware of OPAM, it is definitely worth looking at. Without it, installing and managing new packages is a nightmare.

What use does ./configure serve (other than checking dependencies)

Why does every source package that uses a makefile come with a ./configure script, what does it do? As far as I can tell, it actually generates the makefile?
Is there anything that can't be done in the makefile?
configure is usually a result of the 'autoconf' system. It can generate headers, makefiles, really anything. 'Usually,' since some are hand-crafted.
The reason for these things is to allow source code to be compiled in disparate environments. There are many variations on Unix / Linux, with slightly different system headers and libraries. configure scripts allow code to auto-adapt.
The configure step is a sort of meta build. It generates the makefile that will work on the specific hardware / distribution you're running. For instance it determines the name of the C or C++ compiler and embeds that in the makefile.
The configure step will also frequently take a set of parameters, the values of which may determine what libraries need to be linked against. For instance, if you compile Apache HTTP with SSL enabled, it needs to link against more shared libraries than if you don't. Since linking is handled by the makefile, you need an extra step to create a custom makefile (rather than requiring the make command to take dozens or hundreds of options).
Everything can be done from within the makefile but some build systems were implemented otherwise.
I don't personally use configure files for my projects but I admit I mostly export Erlang & Python based projects.
I don't think of the makefile as a script, I think of it as an input to the make utility.
The configure script (as its name suggests) configures the makefile, including as you say resolving dependencies.
If only from the idea of avoiding self-modifying code, the things in the configure script don't really belong in the makefile.
The point is that autoconf, autoheader and automake form an integrated system that makes cross-platform building on Unix relatively straightforward. The docs are really bad and there are lots of horrible gotchas, but on the other hand there are a lot of working samples.
When I first came across this stuff I thought "ha, I can do that with a nice clean makefile" and proceeded to rework the source that way. Big mistake. Learn to write and edit configure.ac and Makefile.am files; you will be happier in the end.
To answer your question, configure is good for:
checking whether function foo is available on this platform and, if so, which include and library are needed
letting the builder choose whether they want feature wizzbang included, in a nice, simple, consistent way

Tutorials for setting up a robust small-to-medium project *nix build

I'm mostly a spoiled Windows + Visual Studio (or Borland C++ or whatever, in the past) developer. Although my first contact with Unix was around 20 years ago, and I've used Linux on-and-off for some years, I have only a very limited idea of how to set up a build on a *nix system.
For example, I'm OK with the basics of make - I can get a number of files to compile and link. But I don't really know how to set things up to cope with multiple configurations - how to get all the object files and targets for the release version to go to different folders from the debug version etc etc. Yes, I can RTFM and improvise something, but it's a fair guess that I'd improvise something stupid, overcomplex, fragile and WTF, where it'd make so much more sense to copy a common convention if only I knew what the common conventions are.
Also, I can run a configure script, and I'm vaguely aware that they're associated with autoconf, whatever that is, but I have little idea if/why/how I should set this kind of stuff up in my own projects.
Hopefully, this is enough to give the general idea of what I'm looking for. Of course I could ask/search specific questions here, but that assumes I know all the right questions to ask which I almost certainly don't. So - any pointers?
EDIT
Just thought I'd update this with some longer-term experience.
I tried using premake for a while, but couldn't live with it in the long run. In substantial part there's a dislike of Lua behind that.
Now, I'm using cmake. It generates makefiles/visual studio projects/whatever. It has (so far) handled everything I've needed to do, including support for unit testing and custom build steps. And as I got used to the cmake way of doing things, I found it was a good way, allowing me to easily use multiple sets of tools at once - I can be checking test coverage in MinGW GCC while simultaneously debugging in Visual Studio.
That reveals, of course, that I'm still mostly working in Windows - but switching back and forth is easier than ever.
The downsides of cmake...
Although it generates makefiles/whatever, it can't really be seen as a makefile generator. The resulting makefiles are dependent on cmake being installed. To be honest, I don't really understand why they don't drop makefiles altogether for makefile platforms and just do the building directly, slightly reducing the potential for problems.
It wasn't easy to get started.
The second point has mostly been resolved by asking questions here...
How do I fix this cmake file? - problem linking to imported library
How to apply different compiler options for different compilers in cmake?
For the cmake "include" command, what is the difference between a file and a module?
How to adapt my unit tests to cmake and ctest?
I'd strongly suggest using one of the newer cousins of Make instead of autoconf or makefiles for smaller projects. One of the best ones for you (and the one I mostly love) could be premake4. Why do I suggest it? Because it's extremely simple to use, yet quite powerful, and capable of producing GNU Makefiles, Visual Studio projects, Code::Blocks projects and many more. And the premake files are very clear and readable, using the Visual Studio nomenclature that you're already familiar with.
Here's an example:
-- A solution contains projects, and defines the available configurations
solution "MyApplication"
    configurations { "Debug", "Release" }

    -- A project defines one build target
    project "MyApplication"
        kind "ConsoleApp"
        language "C++"
        files { "inc/**.h", "src/**.cpp", "main.cpp" }

        configuration "Debug"
            defines { "DEBUG" }
            flags { "Symbols" }

        configuration "Release"
            defines { "NDEBUG" }
            flags { "Optimize" }
I would stay away from autoconf until you actually need it as it is pretty complex to use, and in most small projects, make is all you need...
I don't have a tutorial, but I will give you an extremely amazing (and portable between both BSD and GNU make) makefile to start with for small projects. I dug it out of the original BSD 4.3 assembler (as):
HDRS = project.h
OBJS = main.o
LDFLAGS =

project: ${OBJS}
	${CC} ${LDFLAGS} ${OBJS} -o project

.c.o: ${HDRS}
	${CC} ${CFLAGS} -c $*.c

clean:
	rm -f *.o ${OBJS} project *.core a.out errs core
just replace "project" with your project's name and such..
edit:
technically, you need a BSD 3 clause license for Berkeley... as per:
* Copyright (c) 1982 Regents of the University of California.
* All rights reserved. The Berkeley software License Agreement
* specifies the terms and conditions for redistribution.

What is your experience with non-recursive make? [closed]

A few years ago, I read the Recursive Make Considered Harmful paper and implemented the idea in my own build process. Recently, I read another article with ideas about how to implement non-recursive make. So I have a few data points that non-recursive make works for at least a few projects.
But I'm curious about the experiences of others. Have you tried non-recursive make? Did it make things better or worse? Was it worth the time?
We use a non-recursive GNU Make system in the company I work for. It's based on Miller's paper and especially the "Implementing non-recursive make" link you gave. We've managed to refine Bergen's code into a system where there's no boiler plate at all in subdirectory makefiles. By and large, it works fine, and is much better than our previous system (a recursive thing done with GNU Automake).
We support all the "major" operating systems out there (commercially): AIX, HP-UX, Linux, OS X, Solaris, Windows, even the AS/400 mainframe. We compile the same code for all of these systems, with the platform dependent parts isolated into libraries.
There's more than two million lines of C code in our tree in about 2000 subdirectories and 20000 files. We seriously considered using SCons, but just couldn't make it work fast enough. On the slower systems, Python would use a couple of dozen seconds just parsing in the SCons files where GNU Make did the same thing in about one second. This was about three years ago, so things may have changed since then. Note that we usually keep the source code on an NFS/CIFS share and build the same code on multiple platforms. This means it's even slower for the build tool to scan the source tree for changes.
Our non-recursive GNU Make system is not without problems. Here are some of the biggest hurdles you can expect to run into:
Making it portable, especially to Windows, is a lot of work.
While GNU Make is almost a usable functional programming language, it's not suitable for programming in the large. In particular, there are no namespaces, modules, or anything like that to help you isolate pieces from each other. This can cause problems, although not as much as you might think.
The major wins over our old recursive makefile system are:
It's fast. It takes about two seconds to check the entire tree (2k directories, 20k files) and either decide it's up to date or start compiling. The old recursive thing would take more than a minute to do nothing.
It handles dependencies correctly. Our old system relied on the order subdirectories were built etc. Just like you'd expect from reading Miller's paper, treating the whole tree as a single entity is really the right way to tackle this problem.
It's portable to all of our supported systems, after all the hard work we've poured into it. It's pretty cool.
The abstraction system allows us to write very concise makefiles. A typical subdirectory which defines just a library is just two lines. One line gives the name of the library and the other lists the libraries this one depends on.
Regarding the last item in the list above: we ended up implementing a sort of macro expansion facility within the build system. Subdirectory makefiles list programs, subdirectories, libraries, and other common things in variables like PROGRAMS, SUBDIRS, LIBS. Each of these is then expanded into "real" GNU Make rules. This allows us to avoid much of the namespace problem. For example, in our system it's fine to have multiple source files with the same name; no problem there.
In any case, this ended up being a lot of work. If you can get SCons or something similar working for your code, I'd advise you to look at that first.
After reading the RMCH paper, I set out with the goal of writing a proper non-recursive Makefile for a small project I was working on at the time. After I finished, I realized that it should be possible to create a generic Makefile "framework" which can be used to very simply and concisely tell make what final targets you would like to build, what kind of targets they are (e.g. libraries or executables) and what source files should be compiled to make them.
After a few iterations I eventually created just that: a single boilerplate Makefile of about 150 lines of GNU Make syntax that never needs any modification -- it just works for any kind of project I care to use it on, and is flexible enough to build multiple targets of varying types with enough granularity to specify exact compile flags for each source file (if I want) and precise linker flags for each executable. For each project, all I need to do is supply it with small, separate Makefiles that contain bits similar to this:
TARGET := foo
TGT_LDLIBS := -lbar
SOURCES := foo.c baz.cpp
SRC_CFLAGS := -std=c99
SRC_CXXFLAGS := -fstrict-aliasing
SRC_INCDIRS := inc /usr/local/include/bar
A project Makefile such as the above would do exactly what you'd expect: build an executable named "foo", compiling foo.c (with CFLAGS=-std=c99) and baz.cpp (with CXXFLAGS=-fstrict-aliasing) and adding "./inc" and "/usr/local/include/bar" to the #include search path, with final linking including the "libbar" library. It would also notice that there is a C++ source file and know to use the C++ linker instead of the C linker. The framework allows me to specify a lot more than what is shown here in this simple example.
The boilerplate Makefile does all the rule building and automatic dependency generation required to build the specified targets. All build-generated files are placed in a separate output directory hierarchy, so they're not intermingled with source files (and this is done without use of VPATH so there's no problem with having multiple source files that have the same name).
I've now (re)used this same Makefile on at least two dozen different projects that I've worked on. Some of the things I like best about this system (aside from how easy it is to create a proper Makefile for any new project) are:
It's fast. It can virtually instantly tell if anything is out-of-date.
100% reliable dependencies. There is zero chance that parallel builds will mysteriously break, and it always builds exactly the minimum required to bring everything back up-to-date.
I will never need to rewrite a complete makefile again :D
Finally I'd just mention that, with the problems inherent in recursive make, I don't think it would have been possible for me to pull this off. I'd probably have been doomed to rewriting flawed makefiles over and over again, trying in vain to create one that actually worked properly.
Let me stress one argument of Miller's paper: when you start to manually resolve dependency relationships between different modules and have a hard time ensuring the build order, you are effectively reimplementing the logic the build system was made to solve in the first place. Constructing reliable recursive make build systems is very hard. Real-life projects have many interdependent parts whose build order is non-trivial to figure out, and thus this task should be left to the build system. However, it can only solve that problem if it has global knowledge of the system.
Furthermore, recursive make build-systems are prone to fall apart when building concurrently on multiple processors/cores. While these build systems may seem to work reliably on a single processor, many missing dependencies go undetected until you start to build your project in parallel. I've worked with a recursive make build system which worked on up to four processors, but suddenly crashed on a machine with two quad-cores. Then I was facing another problem: These concurrency issues are next to impossible to debug and I ended up drawing a flow-chart of the whole system to figure out what went wrong.
To come back to your question, I find it hard to think of good reasons why one wants to use recursive make. The runtime performance of non-recursive GNU Make build systems is hard to beat and, quite the contrary, many recursive make systems have serious performance problems (weak parallel build support is again a part of the problem). There is a paper in which I evaluated a specific (recursive) Make build system and compared it to a SCons port. The performance results are not representative because the build system was very non-standard, but in this particular case the SCons port was actually faster.
Bottom line: If you really want to use Make to control your software builds, go for non-recursive Make because it makes your life far easier in the long run. Personally, I would rather use SCons for usability reasons (or Rake - basically any build system using a modern scripting language and which has implicit dependency support).
I made a half-hearted attempt at my previous job at making the build system (based on GNU make) completely non-recursive, but I ran into a number of problems:
The artifacts (i.e. libraries and executables built) had their sources spread out over a number of directories, relying on vpath to find them
Several source files with the same name existed in different directories
Several source files were shared between artifacts, often compiled with different compiler flags
Different artifacts often had different compiler flags, optimization settings, etc.
One feature of GNU make which simplifies non-recursive use is target-specific variable values:
foo: FOO=banana
bar: FOO=orange
This means that when building target "foo", $(FOO) will expand to "banana", but when building target "bar", $(FOO) will expand to "orange".
One limitation of this is that it is not possible to have target-specific VPATH definitions, i.e. there is no way to uniquely define VPATH individually for each target. This was necessary in our case in order to find the correct source files.
The main missing feature of GNU make needed in order to support non-recursiveness is that it lacks namespaces. Target-specific variables can in a limited manner be used to "simulate" namespaces, but what you really would need is to be able to include a Makefile in a sub-directory using a local scope.
EDIT: Another very useful (and often under-used) feature of GNU make in this context is the macro-expansion facilities (see the eval function, for example). This is very useful when you have several targets which have similar rules/goals, but differ in ways which cannot be expressed using regular GNU make syntax.
I agree with the statements in the referenced article, but it took me a long time to find a good template that does all this and is still easy to use.
Currently I'm working on a small research project where I'm experimenting with continuous integration: automatically unit-test on a PC, and then run a system test on an (embedded) target. This is non-trivial in make, and I've searched for a good solution. Finding that make is still a good choice for portable multi-platform builds, I finally found a good starting point in http://code.google.com/p/nonrec-make
This was a true relief. Now my makefiles are
very simple to modify (even with limited make knowledge)
fast to compile
completely checking (.h) dependencies with no effort
I will certainly also use it for the next (big) project (assuming C/C++)
I have developed a non-recursive make system for a medium-sized C++ project, which is intended for use on Unix-like systems (including Macs). The code in this project is all in a directory tree rooted at a src/ directory. I wanted to write a non-recursive system in which it is possible to type "make all" from any subdirectory of the top-level src/ directory in order to compile all of the source files in the directory tree rooted at the working directory, as in a recursive make system. Because my solution seems to be slightly different from others I have seen, I'd like to describe it here and see if I get any reactions.
The main elements of my solution were as follows:
1) Each directory in the src/ tree has a file named sources.mk. Each such file defines a makefile variable that lists all of the source files in the tree rooted at that directory. The name of this variable is of the form [directory]_SRCS, in which [directory] represents a canonicalized form of the path from the top-level src/ directory to that directory, with slashes replaced by underscores. For example, the file src/util/param/sources.mk defines a variable named util_param_SRCS that contains a list of all source files in src/util/param and its subdirectories, if any. Each sources.mk file also defines a variable named [directory]_OBJS that contains a list of the corresponding object file *.o targets. In each directory that contains subdirectories, the sources.mk includes the sources.mk file from each of the subdirectories, and concatenates the [subdirectory]_SRCS variables to create its own [directory]_SRCS variable.
2) All paths are expressed in sources.mk files as absolute paths in which the src/ directory is represented by a variable $(SRC_DIR). For example, in the file src/util/param/sources.mk, the file src/util/param/Component.cpp would be listed as $(SRC_DIR)/util/param/Component.cpp. The value of $(SRC_DIR) is not set in any sources.mk file.
3) Each directory also contains a Makefile. Every Makefile includes a global configuration file that sets the value of the variable $(SRC_DIR) to the absolute path to the root src/ directory. I chose to use a symbolic form of absolute paths because this appeared to be the easiest way to create multiple makefiles in multiple directories that would interpret paths for dependencies and targets in the same way, while still allowing one to move the entire source tree if desired, by changing the value of $(SRC_DIR) in one file. This value is set automatically by a simple script that the user is instructed to run when the package is downloaded or cloned from the git repository, or when the entire source tree is moved.
4) The makefile in each directory includes the sources.mk file for that directory. The "all" target for each such Makefile lists the [directory]_OBJS file for that directory as a dependency, thus requiring compilation of all of the source files in that directory and its subdirectories.
5) The rule for compiling *.cpp files creates a dependency file for each source file, with a *.d suffix, as a side effect of compilation, as described here: http://mad-scientist.net/make/autodep.html. I chose to use the gcc compiler for dependency generation, using the -M option. I use gcc for dependency generation even when using another compiler to compile the source files, because gcc is almost always available on Unix-like systems, and because this helps standardize this part of the build system. A different compiler can be used to actually compile the source files.
6) The use of absolute paths for all files in the _OBJS and _SRCS variables required that I write a script to edit the dependency files generated by gcc, which creates files with relative paths. I wrote a Python script for this purpose, but another person might have used sed. The paths for dependencies in the resulting dependency files are literal absolute paths. This is fine in this context because the dependency files (unlike the sources.mk files) are generated locally rather than being distributed as part of the package.
7) The Makefile in each directory includes the sources.mk file from the same directory, and contains a line "-include $([directory]_OBJS:.o=.d)" that attempts to include the dependency file for every source file in the directory and its subdirectories, as described at the URL given above.
The main difference between this and other schemes that I have seen that allow "make all" to be invoked from any directory is the use of absolute paths, which allows the same paths to be interpreted consistently when Make is invoked from different directories. As long as these paths are expressed using a variable to represent the top-level source directory, this does not prevent one from moving the source tree, and is simpler than some alternative methods of achieving the same goal.
Currently, my system for this project always does an "in-place" build: the object file produced by compiling each source file is placed in the same directory as the source file. It would be straightforward to enable out-of-place builds by changing the script that edits the gcc dependency files so as to replace the absolute path to the src/ directory by a variable $(BUILD_DIR) that represents the build directory in the expression for the object file target in the rule for each object file.
Thus far, I've found this system easy to use and maintain. The required makefile fragments are short and comparatively easy for collaborators to understand.
The project for which I developed this system is written in completely self-contained ANSI C++ with no external dependencies. I think that this sort of homemade non-recursive makefile system is a reasonable option for self-contained, highly portable code. I would consider a more powerful build system such as CMake or gnu autotools, however, for any project that has nontrivial dependencies on external programs or libraries or on non-standard operating system features.
I know of at least one large scale project (ROOT), which advertises using [powerpoint link] the mechanism described in Recursive Make Considered Harmful. The framework exceeds a million lines of code and compiles quite smartly.
And, of course, all the largish projects I work with that do use recursive make are painfully slow to compile. ::sigh::
I wrote a not-very-good non-recursive make build system, and since then a very clean, modular recursive make build system for a project called Pd-extended. It's basically kind of like a scripting language with a bunch of libraries included. Now I'm also working with Android's non-recursive system, so that's the context of my thoughts on this topic.
I can't really say much about the differences in performance between the two, I haven't really paid attention since full builds are really only ever done on the build server. I am usually working either on the core language, or a particular library, so I am only interested in building that subset of the whole package. The recursive make technique has the huge advantage of making the build system be both standalone and integrated into a larger whole. This is important to us since we want to use one build system for all libraries, whether they are integrated in or written by an external author.
I'm now working on building a custom version of Android internals, for example a version of Android's SQLite classes that is based on the SQLCipher encrypted SQLite. So I have to write non-recursive Android.mk files that wrap all sorts of weird build systems, like SQLite's. I can't figure out how to make the Android.mk execute an arbitrary script, whereas this would be easy in a traditional recursive make system, in my experience.
