Recursive Make - friend or foe? [closed]

I'm using (GNU) Make in my project. I'm currently putting one makefile per directory and specifying the subdirectories using SUBDIRS.
It's been suggested to me that this is not the ideal way of using make, and that a single top-level makefile (or several, split up using include) is preferable. I've tried migrating to that layout in the past, but it seems unnecessarily complicated to me.
What are the benefits/drawbacks of using recursive makefiles?

The first thing you should keep in mind (just to eliminate any misunderstanding) is that we're not talking about a single vs. multiple makefiles. Splitting your makefile into one per subdirectory is probably a good idea in any case.
Recursive makefiles are bad primarily because you partition your dependency tree into several trees. This prevents dependencies between make instances from being expressed correctly. It also causes (parts of) the dependency tree to be recalculated multiple times, which is ultimately a performance issue (although usually not a big one).
There are a couple of tricks you need to use in order to properly use the single-make approach, especially when you have a large code base:
First, use GNU make (you already do, I see). GNU make has a number of features which simplify things, and you won't have to worry about compatibility.
Second, use target-specific variable values. This will allow you to have, for example, different values of CFLAGS for different targets, instead of forcing you to use a single CFLAGS for your entire build:
main: CFLAGS=-O2
lib: CFLAGS=-O2 -g
Third, make sure you use VPATH/vpath to the full extent supported by GNU make.
You also want to make sure that you do not have multiple source files with the same name. One limitation of VPATH is that it does not allow you to have target-specific VPATH definitions, so the names of your source files will have to co-exist in a single "VPATH namespace".
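To make those tricks concrete, here is a minimal sketch of a single top-level makefile that uses them (the directory names, source files and targets are made up for illustration):
CFLAGS = -O2 -Iinclude

# One make instance sees all sources; vpath tells it where to look for them
vpath %.c src/main src/lib
vpath %.h include

.PHONY: all
all: main

main: main.o util.o lib.a
	$(CC) $(CFLAGS) -o $@ $^

# Target-specific value: the library (and the objects built for it) also get debug info
lib.a: CFLAGS += -g
lib.a: foo.o bar.o
	$(AR) rcs $@ $^

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<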

An article entitled "Recursive Make Considered Harmful" can be found here: http://miller.emu.id.au/pmiller/books/rmch/. (Or at the Aegis project at SourceForge.)
It explores the problems with recursive makefiles, and recommends a single-makefile approach.

I use recursion extensively. Each leaf node will have its own makefile; consider:
LIBS = libfoo libbar

.PHONY: $(LIBS)
$(LIBS):
	cd $@ && $(MAKE)
On large systems it would be quite a challenge not to have this structure. What folks say about "recursive make considered harmful" is an accurate assessment, but I think each situation is a little different and we all make some compromises.

Run, don't walk, to cmake.org and get CMake, one of the best build tools available.
You will still be using GNU make, but in this case CMake will generate the makefiles for you.
I can't guarantee 100%, but I have yet to come across a case where it has not handled dependencies between subdirectories correctly (i.e. the problem that plagues recursive make). At the very least it is a lot easier to maintain CMake files than makefiles. Highly recommended.
Do not use GNU autotools - that way madness lies!

The benefit that I've gotten from recursive make in the past is that it's easier to build files in a single subdirectory. You can do this with dependencies, but it's a bit more work to keep all of the targets straight. Basically, this makes it easier to make changes and test one library without having to deal with the full complexity of the larger project.
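For example (the directory and target names here are hypothetical), with a makefile per directory that convenience is just "cd libfoo && make", whereas in a single top-level makefile you usually get the same effect by adding a phony alias per directory:
.PHONY: libfoo
libfoo: libfoo/libfoo.a libfoo/tests
so that "make libfoo" builds only that library and its tests, while the full dependency information is still available to make.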

To throw in a third option, you could use GNU Autotools.
Mostly used for other reasons, but it may also be helpful for organizing a multi-directory build.
http://www.lrde.epita.fr/~adl/autotools.html
It has to be noted, though, that the result is a recursive make setup.

Makepp is your friend.
http://makepp.sourceforge.net/
Backwards compatible with make
Automatic scanning for header dependencies
Graceful handling of directory trees

The issue with recursive make is the time overhead of evaluating all the different make files vs. evaluating one large make file. Part of this is just spawning processes, but also (IIRC) you tend to be forced into assuming that the other make files did something, and into rebuilding when you don't really need to.
My take on it is to have a single make file per "Unit", which more or less amounts to having a make file for each chunk of code that you expect could be used on its own (e.g. as an independent library).
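One way to wire such per-unit make files together without recursion is to make each unit a makefile fragment that describes only its own targets, and to include them all from a single top-level make, so there is still just one dependency tree. A sketch (all file and directory names invented):
# Makefile (top level)
.PHONY: all
all: libfoo/libfoo.a libbar/libbar.a

include libfoo/module.mk
include libbar/module.mk

# libfoo/module.mk (the unit's own make file)
libfoo_SRCS := libfoo/a.c libfoo/b.c
libfoo/libfoo.a: $(libfoo_SRCS:.c=.o)
	$(AR) rcs $@ $^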
OTOH my current project breaks this all over the place as I'm generating make files during the build. :b

Related

Why is .Net Core's PublishSingleFile doing self extraction instead of resource embedding assemblies?

To me, merging .Net binaries so that I'm left with a single-file executable was always both an interesting and important feature. From what I've read, there are three approaches to this:
ILMerge
Really merges the assemblies. Comparable to creating one single .csproj and pulling in all source code.
Fody.Costura
Keeps the assemblies separate but includes their files as resources and bends the assembly resolution accordingly.
.Net Core 3+ PublishSingleFile / Warp
From what I understand the approaches are similar. Extract the files to a temp dir on first execution and just run them normally.
While the topics are quite related and sometimes do overlap I don't intend to mix up single file, self-contained deployments and trimming - I'm mainly thinking of the single file part here.
I think each of the three approaches has its own advantages and disadvantages, and none of them is "the" way to go.
ILMerge
+ Seems like the most natural option for "merging"
+ Requires no runtime loading; if everything is there and works, then that's it; once the image is in memory we're done
- Loading everything at once every time might lead to lower startup performance, which is relevant to desktop apps
- Changes quite a lot about the internals: assembly names change, library internals probably get more entangled with app code, etc.; this might lead to hard-to-debug errors, in the worst case popping up in production, I guess. It just changes things developers may have taken for granted (i.e. the name of their assembly).
- Probably not a conceptual limitation, but it doesn't support .Net Core => dealbreaker
Fody.Costura
+ Enables compression for dependencies
+ Changes far fewer internals, probably leading to fewer unforeseen situations
+ Probably requires the OS to load the full image, but not the CLR to load all the libs at once, so it might improve start-up performance
- Requires de-gzipping the dependencies every time; might be a performance penalty CPU-wise, could be a performance benefit IO-wise
Self extracting
+ Compressed too
+ Once the code is extracted and running, the only difference is where it's located. Thus, lib developers can even still find their assembly on disk etc., making for the fewest runtime differences, I guess.
- First-time startup can take some noticeable time and is quite IO intensive, kind of like an installation.
+ Subsequent runs behave like non-single-file deployments, having their respective (dis)advantages compared to the other two.
- Comes with some non-trivial problems. When to update the extracted files? Who deletes old versions of the extracted files? What if the disk is full/...?
- Executing is not side-effect free anymore.
- More than 2x the disk space needed.
In my personal, obviously non-exhaustive and non-scientific, experience, (3) sometimes had some rough edges and showed strange behaviour, while (2) worked really well in the years I've been using it (comparing Framework vs Core, though). But probably that's either due to the implementation of (3) being quite new, or my misunderstanding.
The intention of me posting this question is to (a) understand this better in general and (b) understand why Microsoft chose (3). So, it'd be great
if I missed differences between the three to point them out (feel free to edit the question to keep things in one place).
if I made false assumptions to point them out.
if there are more options in general or if I'm wrong about one of the concepts to point them out
if there's content about the reasoning, why Microsoft selected (3) as the superior way (that would probably deepen my understanding for the pros and cons) to point me in the right direction.
While I understand it's hard to compete with a framework feature, I'm a bit sad that Costura is in maintenance mode, given that I think it has advantages over the framework's approach and makes the second option a viable choice. Thus, one of the reasons I'm asking this question is to understand whether there are any severe advantages of (3) over (2) that I'm overlooking.

gnu file tree...is this advisable to beginners?

I am in academics, where most of the coding is done in fortran, C(++) (and little bit of python). My question is about projects of 2-3k lines of code using fortran.
In GNU projects, we generally see a very standard set of directories, e.g. src, doc, etc., and files like README, ChangeLog, etc. (I have not found anything that says these are standard directories, so I assume they are good practice).
Now, for teaching new students (undergrads taking their 1st coding course), is it advisable to introduce them to this "standard" tree with the exact names? I have not seen many projects in the academic world follow this.
So, for the beginners, what is better?
These are not standards in the sense of "standardised by some organization", but best practices ("coding conventions") that entered common use many years ago. So I think it is good to introduce students to these practices if they will have to not only write, but also install and use, software already made by others.
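For illustration, a typical small layout along these lines might look like the following (the names are conventional good practice, not required by any standard):
project/
  src/        Fortran/C/C++ source files
  doc/        documentation
  test/       test programs or example inputs
  README      what the project does, how to build and run it
  ChangeLog   user-visible changes between versions
  Makefile    build rules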

'make'-like dependency-tracking library?

There are many nice things to like about Makefiles, and many pains in the butt.
In the course of doing various projects (I'm a research scientist, "data scientist", or whatever) I often find myself starting out with a few data objects on disk, generating various artifacts from those, generating artifacts from those artifacts, and so on.
It would be nice if I could just say "this object depends on these other objects", and "this object is created in the following manner from these objects", and then ask a Make-like framework to handle the details of actually building them, figuring out which objects need to be updated, farming out work to multiple processors (like Make's -j option), and so on. Makefiles can do all this - but the huge problem is that all the actions have to be written as shell commands. This is not convenient if I'm working in R or Perl or another similar environment. Furthermore, a strong assumption in Make is that all targets are files - there are some exceptions and workarounds, but if my targets are e.g. rows in a database, that would be pretty painful.
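For context, here is roughly the kind of pipeline I mean, written as a plain Makefile (the file and script names are invented): each artifact lists the artifacts and script it is built from, and make -j can rebuild just the out-of-date pieces in parallel.
# each (invented) Rscript call reads its listed inputs and writes the target file
clean.csv: raw.csv clean.R
	Rscript clean.R raw.csv clean.csv
model.rds: clean.csv fit.R
	Rscript fit.R clean.csv model.rds
report.pdf: model.rds report.R
	Rscript report.R model.rds report.pdf
This works, but every action is a shell command and every target is a file, which is exactly the part that doesn't fit my workflow.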
To be clear, I'm not after a software-build system. I'm interested in something that (more generally?) deals with dependency webs of artifacts.
Anyone know of a framework for these kinds of dependency webs? Seems like it could be a nice tool for doing data science, & visually showing how results were generated, etc.
One extremely interesting example I saw recently was IncPy, but it looks like it hasn't been touched in quite a while, and it's very closely coupled with Python. It's probably also much more ambitious than I'm hoping for, which is why it has to be so closely coupled with Python.
Sorry for the vague question, let me know if some clarification would be helpful.
A new system called "Drake" was announced today that targets this exact situation: http://blog.factual.com/introducing-drake-a-kind-of-make-for-data . Looks very promising, though I haven't actually tried it yet.
This question is several years old, but I thought adding a link to remake here would be relevant.
From the GitHub repository:
The idea here is to re-imagine a set of ideas from make but built for R. Rather than having a series of calls to different instances of R (as happens if you run make on R scripts), the idea is to define pieces of a pipeline within an R session. Rather than being language agnostic (like make must be), remake is unapologetically R focussed.
It is not on CRAN yet, and I haven't tried it, but it looks very interesting.
I would give Bazel a try for this. It is primarily a software build system, but with its genrule type of artifacts it can perform pretty arbitrary file generation, too.
Bazel is very extendable, using its Python-like Starlark language which should be far easier to use for complicated tasks than make. You can start by writing simple genrule steps by hand, then refactor common patterns into macros, and if things become more complicated even write your own rules. So you should be able to express your individual transformations at a high level that models how you think about them, then turn that representation into lower level constructs using something that feels like a proper programming language.
Where make depends on timestamps, Bazel checks fingerprints. So if any one step produces the same output even though one of its inputs changed, subsequent steps won't need to be re-computed. If some of your data processing steps project or filter data, there might be a high probability of this kind of thing happening.
I see your question is tagged for R, even though it doesn't mention it much. Under the hood, R computations would in Bazel still boil down to R CMD invocations on the shell. But you could have complicated multi-line commands assembled in complicated ways, to read your inputs, process them and store the outputs. If the cost of initializing the R binary is a concern, Rserve might help, although using it would make the setup depend on a locally accessible Rserve instance, I believe. Even with that, I see nothing that would avoid the cost of storing the data to file and loading it back from file. If you want something that avoids that cost by keeping things in memory between steps, then you'd be looking at a very R-specific tool, not a generic tool like you requested.
In terms of “visually showing how results were generated”, bazel query --output graph can be used to generate a graphviz dot file of the dependency graph.
Disclaimer: I'm currently working at Google, which internally uses a variant of Bazel called Blaze. Actually Bazel is the open-source released version of Blaze. I'm very familiar with using Blaze, but not with setting up Bazel from scratch.
Red-R has a concept of data flow programming. I have not tried it yet.

Goto is considered harmful, but did anyone attempt to make code using goto re-usable and maintainable?

Everyone is aware of Dijkstra's "Letters to the editor: go to statement considered harmful". I was wondering if anyone has attempted to find a way to make code using gotos re-usable, maintainable and not harmful, by adding other language extensions or developing a language which allows for gotos.
The reason I ask the question is that it occurs to me that code written in assembly language often used gotos and global variables to make the program work well within a limited space. The Atari 2600, for example, had 128 bytes of RAM, and the program was loaded from a ROM cartridge. In that case it was better to use unstructured programming and to exploit the freedoms this allows in order to make the most of a very limited space for the program.
When you compare this with a game programmed today without the use of gotos, the game takes up much more space.
Then it occurs to me that perhaps it's possible to program with gotos if some rules or other language changes are made to support this; then the negative effects of gotos could be reduced or eliminated. Has anyone tried to find a way to make gotos NOT considered harmful by creating a language, or some rules to follow, which allow gotos to be used without harm?
If no one looked for a way to use gotos in a non-harmful way, then perhaps we adopted structured programming unnecessarily, based solely on this paper? Perhaps there is another solution which allows for the use of gotos without the downside.
Comparing gotos to structured programming is comparing a situation where the programmer has to remember what every label in the code actually means and does, and where it is, to a situation where the conditional branches are explicitly described.
As for the advantage of the goto statement regarding the space a program might take, I think that games today are big because of the graphics and sound resources they use, i.e. showing 1,000,000 polygons. The cost of a goto compared to that is totally negligible.
Moreover, structured control statements are ultimately compiled into goto ("jmp") instructions by the compiler when outputting assembly.
To answer the question, it might be possible to make goto less harmful by creating naming and syntax conventions. Enforcing these conventions as rules is, however, pretty much what structured programming does.
Linus Torvalds once argued that goto can make source code clearer, but goto is useful in such special cases that I would not dare use it as a programmer.
This question is somewhat related to yours, since I think it covers one of the most common situations where a goto is needed.

Why is functional programming good? [closed]

I've noticed that there are certain core concepts that a lot of functional programming fanatics cling to:
Avoiding state
Avoiding mutable data
Minimizing side effects
etc...
I'm not just wondering what other things make up functional programming, but why these core ideas are good. Why is it good to avoid state, and the rest?
The simple answer is that if you don't have extra state to worry about, your code is simpler to reason about. Simpler code is easier to maintain. You don't need to worry about things outside a particular piece of code (like a function) to modify it. This has really useful ramifications for things like testing. If your code does not depend on some state, it becomes much easier to create automated tests for that code, since you do not need to worry about initializing some state.
Having stateless code makes it simpler to create threaded programs as well, since you don't need to worry about two threads of execution modifying/reading a shared piece of data at the same time. Your threads can run independent code, and this can save loads of development time.
Essentially, avoiding state creates simpler programs. In a way, there are fewer "moving parts" (i.e., ways lines of code can interact), so this will generally mean that the code is more reliable and contains fewer faults. Basically, the simpler the code, the less can go wrong. To me this is the essence of writing stateless code.
There are plenty of other reasons to create stateless, "functional" code, but they all boil down to simplicity for me.
In addition to what @Oleksi said, there is another important thing: referential transparency and transactional data structures. Of course, you do not need a functional programming language for these, but it's a bit easier with one.
Purely functional data structures are guaranteed to remain the same - if one function returned a tree, it will always be the same tree, and all the further transforms would create new copies of it. It's much easier to backtrack to any previous version of a data structure this way, which is important for many essential algorithms.
Very generally, functional programming means:
encouraging the use of (first-class) functions
discouraging the use of (mutable) state
Why is mutation a problem? Think about it: mutation is to data structures what goto is to control flow. I.e., it allows you to arbitrarily "jump" to something completely different in a rather unstructured manner. Consequently, it is occasionally useful, but most of the time rather harmful to readability, testability, and compositionality.
One typical functional feature is "no subtyping". While it sounds a little bit odd to call this a feature, it is, for two (somewhat related) reasons:
Subtyping relationships lead to a bunch of not-so-obvious problems. If you don't limit yourself to single or mixin inheritance, you end up with the diamond problem. More importantly, you have to deal with variance (covariance, contravariance, invariance), which quickly becomes a nightmare, especially for type parameters (a.k.a. generics). There are several more reasons, and even in OO languages you hear statements like "prefer composition over inheritance".
On the other hand, if you simply leave out subtyping, you can reason about your type system in much more detail, which makes (almost) full type inference possible, usually implemented using extensions of Hindley-Milner type inference.
Of course sometimes you'll miss subtyping, but languages like Haskell have found a good answer to that problem: type classes, which allow you to define a kind of common "interface" (or "set of common operations") for several otherwise unrelated types. The difference from OO languages is that type classes can be defined "afterwards", without touching the original type definitions. It turns out that you can do almost everything with type classes that you can do with subtyping, but in a much more flexible way (and without preventing type inference). That's why other languages are starting to employ similar mechanisms (e.g. implicit conversions in Scala or extension methods in C# and Java 8).
