How to make a single makefile that applies the same command to sub-directories? - recursion

For clarity, I am running this on Windows with GnuWin32 make.
I have a set of directories with Markdown files in them at several different levels. Theoretically they could be in the branch nodes, but I think currently they are only in the leaf nodes. I have a set of pandoc/LaTeX commands to run to turn the Markdown files into PDFs, and obviously I only want to recreate a PDF if its Markdown file has been updated, so a makefile seems appropriate.
What I would like is a single makefile in the root, which iterates over any and all sub-directories (to any depth) and applies the make rule I'll specify for running pandoc.
From what I've been able to find, recursive makefiles require you to have a makefile in each sub-directory (which seems like an administrative overhead that I would like to avoid) and/or require you to list out all the sub-directories at the start of the makefile (again, I would prefer to avoid this).
Theoretical folder structure:
root
|-make
|-Folder AB
| |-File1.md
| \-File2.md
|-Folder C
| \-File3.md
\-Folder D
  |-Folder E
  | \-File4.md
  |-Folder F
  \-File5.md
How do I write a makefile to deal with this situation?

Here is a small set of Makefile rules that will hopefully get you going:
%.pdf : %.md
	pandoc -o $@ --pdf-engine=xelatex $^
PDF_FILES=FolderA/File1.pdf FolderA/File2.pdf \
FolderC/File3.pdf FolderD/FolderE/File4.pdf FolderD/FolderF/File5.pdf
all: ${PDF_FILES}
Let me explain what is going on here. First we have a pattern rule that tells make how to convert a Markdown file to a PDF file. The --pdf-engine=xelatex option is here just for the purpose of illustration.
Then we need to tell Make which files to consider. We put the names together in a single variable, PDF_FILES. The value of this variable can be built by a separate script that scans all subdirectories for .md files.
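Note that one has to be extra careful if file or directory names contain spaces: make splits lists such as PDF_FILES on whitespace, so a name like Folder AB would become two separate words.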
Then we ask Make to check if any of the PDF_FILES should be updated.
If you have other targets in your makefile, make sure that all is the first non-pattern target (make builds the first such target by default), or invoke make as make all.
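One way to build the PDF_FILES value with a separate script, as suggested above, is a small shell function. This is a minimal sketch assuming POSIX sh with find, sed, sort and tr available and, per the caveat about spaces, no spaces in file or directory names; gen_pdf_list is a hypothetical name:

```shell
#!/bin/sh
# gen_pdf_list DIR: print a PDF_FILES= line naming one .pdf target for
# every .md file found under DIR, at any depth. Paths are relative to
# DIR. Assumes no spaces (or newlines) in file and directory names.
gen_pdf_list() (
    cd "$1" || exit 1
    # find prints ./sub/file.md: strip the leading ./, swap the trailing
    # .md for .pdf, sort for a stable order, and join with spaces.
    list=$(find . -type f -name '*.md' \
        | sed -e 's|^\./||' -e 's|\.md$|.pdf|' \
        | sort \
        | tr '\n' ' ')
    # Drop the single trailing space left behind by tr.
    printf 'PDF_FILES=%s\n' "${list% }"
)
```

Redirecting its output to a file and pulling that file in with GNU make's include directive (for example gen_pdf_list . > pdf_files.mk plus include pdf_files.mk, where pdf_files.mk is a hypothetical name) keeps the generated list separate from the hand-written rules.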
Updating the Makefile
If the $(shell ...) function works for you and basic utilities such as sed and find are available, you can make your makefile dynamic with a single line:
%.pdf : %.md
	pandoc -o $@ --pdf-engine=xelatex $^
PDF_FILES:=$(shell find . -name "*.md" | sed 's/\.md$$/.pdf/')
all: ${PDF_FILES}
MadScientist suggested just that in the comments
Otherwise, you could implement a script using the tools available on your operating system and add an extra update: target that computes the list of files and replaces the line starting with PDF_FILES with the updated list.
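The update: idea can be sketched in the shell as well. This hypothetical helper recomputes the list and swaps out the line beginning with PDF_FILES. It assumes the definition fits on one line (a definition continued with a backslash would leave stray continuation lines behind) and that names contain no spaces or backslashes:

```shell
#!/bin/sh
# update_pdf_list MAKEFILE DIR: recompute the .pdf targets for every .md
# file under DIR and replace the single line starting with PDF_FILES in
# MAKEFILE. All other lines are copied through unchanged.
update_pdf_list() (
    mk=$1 dir=$2
    list=$(cd "$dir" && find . -type f -name '*.md' \
        | sed -e 's|^\./||' -e 's|\.md$|.pdf|' | sort | tr '\n' ' ')
    new="PDF_FILES=${list% }"
    tmp="$mk.tmp.$$"
    # Matches PDF_FILES=, PDF_FILES := and similar assignment forms.
    awk -v line="$new" '/^PDF_FILES[ :]*=/ { print line; next } { print }' \
        "$mk" > "$tmp" && mv "$tmp" "$mk"
)
```

Writing to a temporary file and renaming it avoids relying on sed -i, whose syntax differs between GNU and BSD sed.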

The final version of the code that worked on Windows, based on @DmitiChubarov's and @MadScientist's suggestions, is as follows:
%.pdf: %.md
	pandoc $^ -o $@
PDF_FILES:=$(shell dir /s /b *.md | sed "s/\.md/\.pdf/g")
all: ${PDF_FILES}

Related

Rsync all files (recursively) from one dir to another, maintaining only a portion of the original dir structure

I have two directories:
Directory #1, 'C'
C's absolute path:
/A/B/C
Directory #2, 'T'
T's absolute path:
/Q/R/T
I want to use rsync to copy all files recursively from C into T, while maintaining the original directory structure, but only from B onwards.
Example to make it clearer: suppose 'B' has only 3 files nested within it:
/A/B/f1.txt
/A/B/C/f2.txt
/A/B/C/D/f3.txt
Then I want to end up with only f2.txt and f3.txt being copied over, with the final filepaths as follows (notice how I keep the directory structure, only from B onwards):
/Q/R/T/B/C/f2.txt
/Q/R/T/B/C/D/f3.txt
Here is the catch: I must execute the rsync command from within /Q/R/. So when I execute this command, my pwd must be /Q/R/.
Can anyone help me figure out how to do this?
[If I did not have this constraint on where my cwd must be, I could cd to /A/B and then execute: rsync . /Q/R/T/ --recursive --relative. Unfortunately, I cannot do that, for reasons that would take a lot of pointless explaining here. And when I try to execute rsync /A/. /Q/R/T/ --recursive --relative, I end up with not only everything within A, but also that first part of the directory structure (/A/) that I don't want. (Note: in the real-life scenario the directory structure is much more complex than this; this is just the general problem.)]
The rsync command includes a couple of options which are suitable for this scenario. They are:
--include=PATTERN - Don't exclude files matching PATTERN
--exclude=PATTERN - Exclude files matching PATTERN
An excellent description and examples of the --exclude flag can be found here.
Solution
Given the directory structures in your question, and with your pwd set to /Q/R/, running the following command will meet your requirement:
rsync ../../A/ T/ --recursive --include 'A/B/**' --exclude 'B/*.*'
Edit:
If you do want /A/B/f1.txt copied to /Q/R/T/B/f1.txt (it's unclear from your question, because you don't show it in the "I want to end up with" example), then omit the --exclude B/*.* part, so the complete command is reduced to:
rsync ../../A/ T/ --recursive --include 'A/B/**'
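or, letting the shell expand the glob (in a default bash, ../../A/** expands like ../../A/*, i.e. to ../../A/B here, and rsync copies that directory into T/ under its own name), reduced even further to just: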
rsync ../../A/** T/ --recursive
Explanation of the command
../../A/
The first argument is the path to the source directory, relative to your pwd (/Q/R). The trailing slash tells rsync to copy the contents of A rather than the directory A itself.
T/
The second argument is the path to the destination directory, again relative to your pwd (/Q/R).
--recursive
This option tells rsync to recurse into directories.
--include A/B/**
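This is meant to include all the assets (files/folders), however many levels deep, from within the folder named B inside folder A. Note, though, that rsync matches filter patterns against paths relative to the transfer root, which here is the contents of A/ (so a file is seen as B/C/f2.txt, not A/B/C/f2.txt). The pattern therefore matches nothing, and the command works because rsync copies every file that no rule excludes.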
--exclude B/*.*
This excludes any assets (files/folders) whose name contains a dot, such as f1.txt, that reside directly inside folder B. Because * does not match a /, deeper files such as B/C/f2.txt are not affected. This prevents the file named f1.txt from being copied. You could be even more specific here and use --exclude B/f1.txt instead; however, I'm assuming that in real life you have additional files you want to exclude here too.
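To check the behaviour without touching real data, the layout from the question can be rebuilt in a scratch directory. This is a sketch assuming rsync and mktemp are installed; the scratch paths stand in for /A and /Q, and demo_rsync is a hypothetical name:

```shell
#!/bin/sh
# Rebuild the question's tree under a scratch directory, run the answer's
# command from Q/R (patterns quoted so the shell cannot glob-expand them
# first), and list the files that arrived in T, relative to T.
demo_rsync() (
    root=$(mktemp -d) || exit 1
    mkdir -p "$root/A/B/C/D" "$root/Q/R/T"
    touch "$root/A/B/f1.txt" "$root/A/B/C/f2.txt" "$root/A/B/C/D/f3.txt"
    cd "$root/Q/R" || exit 1
    rsync ../../A/ T/ --recursive --include 'A/B/**' --exclude 'B/*.*'
    (cd T && find . -type f | sort)
    cd / && rm -rf "$root"
)
```

On the example tree this should print only ./B/C/D/f3.txt and ./B/C/f2.txt, i.e. /A/B/f1.txt is left behind as the question requires.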
Additional notes
Both the --include and --exclude options can be utilized multiple times. This can be very useful for some scenarios too as it enables you to be specific about what to include and/or exclude during the copy process.
For example, let's assume that your source directory /A/B/ (as described in your question) also contains a folder named X, so its path is /A/B/X.
Let's say that we also do not want to copy this folder named X (in the same way as you currently do not want to copy /A/B/f1.txt).
For this scenario we add another --exclude option as follows:
rsync ../../A/ T/ --recursive --include 'A/B/**' --exclude 'B/*.*' --exclude X/
Note the additional --exclude X/ at the end.
You mention...
(Note - in the real life scenario the dir structure is much more complex then this, this is just the general problem.
... in your question, so you may find it necessary to add additional --exclude=PATTERN to truly meet your requirements.
Grunt
As you have included the gruntjs tag with your question, you may want to consider plug-ins that can run shell commands such as rsync, for example:
grunt-shell
grunt-exec

writing a recursive make recipe with prerequisite on parent directory

I am trying to write a recursive make recipe. In this recipe, each target is dependent on a file with an equal name on the parent directory. A minimal (non-working) example:
foo/.dirstamp:
	mkdir $(dir $@)
	touch $@
.SECONDEXPANSION:
%/.dirstamp: $$(dir $$*).dirstamp
	mkdir $(dir $@)
	touch $@
With this example, I would expect make foo/bar/qux/lol/.dirstamp to generate the whole directory tree (if it does not exist), touching all .dirstamp files along the way. However, it does not work:
$ ls # note that there is nothing, make is meant to create the dir tree
Makefile
$ make --debug=v foo/bar/qux/lol/.dirstamp
GNU Make 4.0
[...]
Reading makefiles...
Reading makefile 'Makefile'...
Updating goal targets....
Considering target file 'foo/bar/qux/lol/.dirstamp'.
File 'foo/bar/qux/lol/.dirstamp' does not exist.
Finished prerequisites of target file 'foo/bar/qux/lol/.dirstamp'.
Must remake target 'foo/bar/qux/lol/.dirstamp'.
make: *** No rule to make target 'foo/bar/qux/lol/.dirstamp'. Stop.
It works fine as long as the recursive recipe only needs to be expanded twice, e.g., make foo/bar/.dirstamp works fine.
How can this work for an arbitrary number of levels? How can I handle a recursive expansion for the target and prerequisites names?
Note: my real problem is that the prerequisites of my recipes are in a root directory different from the target, so I am using the recipe above to duplicate the directory tree. I know about mkdir -p, which seems to work fine on GNU systems, but it no longer works because part of the team is using Macs and mounting these directories over SMB. I am still interested in knowing how I would solve the recursion problem for arbitrary levels.
More details on the actual problem: prerequisites are in data/x/y/z while targets go into results/x/y/z. However, the results directory tree does not exist and needs to be created as needed. To solve this, I made the creation of parent directories an order-only prerequisite (via the .dirstamp files on my minimal example above).
can't copy data into results, that's several TB of data;
can't have the targets created in data, that's read-only;
can't use mkdir -p because the results directory will not be local, mounted over smb, and others may use non-GNU systems;
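Purely to illustrate the level-by-level recursion the .dirstamp chain is meant to perform, here is a shell sketch that creates each missing ancestor with a plain mkdir call, one level at a time. It is not from the thread; mkdir_levels is a hypothetical name, and whether plain mkdir behaves better than mkdir -p on the SMB mount is an assumption:

```shell
#!/bin/sh
# mkdir_levels PATH: create PATH and any missing parents one directory
# at a time with plain mkdir (no -p), like one .dirstamp rule per level.
mkdir_levels() (
    prefix=
    IFS=/
    set -f                      # no globbing while splitting on "/"
    for part in $1; do
        if [ -z "$part" ]; then
            # An absolute path yields an empty first field: start at "/".
            if [ -z "$prefix" ]; then prefix=/; fi
            continue
        fi
        case $prefix in
            ''|*/) prefix=$prefix$part ;;
            *)     prefix=$prefix/$part ;;
        esac
        if [ ! -d "$prefix" ]; then
            mkdir "$prefix" || exit 1
        fi
    done
)
```

Calling it again on an existing tree is a no-op, which matches how the .dirstamp rules only fire for directories that do not exist yet.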
After a hint from @EtanReisner on the question:
make won't apply a rule more than once. That's a built-in (intentional) limitation. Without working around that with manual recursion or manually building the set of targets and using a static pattern rule (which may or may not actually work I'm not sure) there's not much you can do about this.
I worked up this solution:
RESULT_DIRS := $(patsubst data/%, results/%, $(shell find data/* -type d -print))
DIRSTAMPS := $(addsuffix /.dirstamp, $(RESULT_DIRS))
results/.dirstamp:
	mkdir $(dir $@)
	touch $@
.SECONDEXPANSION:
$(DIRSTAMPS): $$(dir $$(patsubst %/.dirstamp, %, $$@)).dirstamp
	mkdir $(dir $@)
	touch $@
It duplicates the data directory tree in results as the .dirstamp files become needed, and they become needed by being listed as prerequisites of the other rules (note the |, which makes them order-only prerequisites):
results/%/foo.analysis: data/%/foo.data | results/%/.dirstamp
	$(SOME_ANALYSIS_PROGRAM) $^ > $@
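To see which dependency edges the DIRSTAMPS list and the secondary-expansion prerequisite produce, the same mapping can be computed in the shell. This is a sketch meant to be run from the directory that contains data/; dirstamp_edges is a hypothetical name, and names are assumed to contain no spaces or newlines:

```shell
#!/bin/sh
# dirstamp_edges: for each directory below data/, print the results/
# .dirstamp target and the parent .dirstamp it depends on, mirroring
# patsubst data/% -> results/% and $(dir ...) in the makefile above.
dirstamp_edges() {
    find data/* -type d -print | sort | while IFS= read -r d; do
        res=results/${d#data/}          # data/x/y -> results/x/y
        parent=$(dirname "$res")        # results/x/y -> results/x
        printf '%s/.dirstamp: %s/.dirstamp\n' "$res" "$parent"
    done
}
```

Each printed line corresponds to one explicit target and its parent prerequisite, which is why this version is not limited in depth: make never has to chain the same pattern rule with itself.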

How to write a makefile executing make one directory level up

Can I write a wrapper makefile that will cd one level up and execute there make with all the command options I have given the wrapper?
In more detail:
Directory project contains a real Makefile with some different targets.
Directory project/resources contains the wrapper Makefile which should call Makefile in project.
When I am in my shell in directory project/resources, I execute
make TARGET
and the Makefile there just cds one directory up and calls
make TARGET
in the directory project.
Is this possible? And how?
You could use a very simple Makefile for all your sub-directories:
%:
	$(MAKE) -C .. $@
% is a last-resort match-anything pattern rule that matches any target for which there is no implicit rule (GNU make has a very large number of built-in implicit rules). So, if none of your targets is covered by an implicit rule, this should work. Otherwise, you have to tell make not to use the implicit rules it knows. With GNU make, this can be done by calling make with the -r option:
cd project/resources
make -r <anything>
will call make in project for target <anything>. The main drawback is that the -r flag is passed to the sub-make, so the implicit rules will not apply in project either, which can be a problem. If it is, you can obtain the same effect by adding an empty .SUFFIXES target to the Makefile in project/resources:
.SUFFIXES:
%:
	$(MAKE) -C .. $@
With my version of GNU make (3.82) it works like a charm and the sub-make has all the default implicit rules.
Yes, you can have a makefile which works for "any" target.
The GNU make manual discusses this in the Overriding Part of Another Makefile section:
Sometimes it is useful to have a makefile that is mostly just like another makefile. You can often use the ‘include’ directive to include one in the other, and add more targets or variable definitions. However, it is invalid for two makefiles to give different recipes for the same target. But there is another way.
In the containing makefile (the one that wants to include the other), you can use a match-anything pattern rule to say that to remake any target that cannot be made from the information in the containing makefile, make should look in another makefile. See Pattern Rules, for more information on pattern rules.
For example, if you have a makefile called Makefile that says how to make the target ‘foo’ (and other targets), you can write a makefile called GNUmakefile that contains:
foo:
	frobnicate > foo
%: force
	@$(MAKE) -f Makefile $@
force: ;
If you say ‘make foo’, make will find GNUmakefile, read it, and see that to make foo, it needs to run the recipe ‘frobnicate > foo’. If you say ‘make bar’, make will find no way to make bar in GNUmakefile, so it will use the recipe from the pattern rule: ‘make -f Makefile bar’. If Makefile provides a rule for updating bar, make will apply the rule. And likewise for any other target that GNUmakefile does not say how to make.
The way this works is that the pattern rule has a pattern of just ‘%’, so it matches any target whatever. The rule specifies a prerequisite force, to guarantee that the recipe will be run even if the target file already exists. We give the force target an empty recipe to prevent make from searching for an implicit rule to build it—otherwise it would apply the same match-anything rule to force itself and create a prerequisite loop!
One option: use a wrapper file to execute the commands to do that. Just be sure your target make files don't include the child directory that has the wrapper, or else you can create an endless loop. For example,
# each recipe line runs in its own /bin/sh, so no pushd/popd is needed
clean:
	cd .. && $(MAKE) clean
Using user Renaud Pacalet's comment and the answer to a different question, the following one-liner is as close as I could get. The whole Makefile reads:
IGNORE := $(shell $(MAKE) -C .. $(MAKECMDGOALS))
This solution comes with a few caveats:
Command line option -B does not get passed through to the subsequent make call.
The output of the subsequently called make process (in the project directory) is not printed to stdout.
At the end, the wrapper make process reports, for any given target:
make: *** No rule to make target TARGET. Stop.
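If a makefile in project/resources is not strictly required, the forwarding can also be done from the shell, which avoids those caveats: every argument (targets, VAR=value pairs, flags such as -B) reaches the real make, and its output goes to the terminal as usual. A sketch; make_up and the MAKE_CMD override are hypothetical names, and -C is a GNU make option:

```shell
#!/bin/sh
# make_up [ARGS...]: run make in the parent directory with every argument
# passed through unchanged. MAKE_CMD (hypothetical) defaults to make and
# exists only so the forwarding can be tested without a real build.
make_up() {
    "${MAKE_CMD:-make}" -C .. "$@"
}
```

For example, make_up -B TARGET from project/resources rebuilds TARGET in project, with -B honoured.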

Makefile rule depend on directory content changes

Using Make, is there a nice way to depend on a directory's contents?
Essentially I have some generated code which the application code depends on. The generated code only needs to change if the contents of a directory changes, not necessarily if the files within change their content. So if a file is removed or added or renamed I need the rule to run.
My first thought is generate a text file listing of the directory and diff that with the last listing. A change means rerun the build. I think I will have to pass off the generate and diff part to a bash script.
I am hoping someone in their infinite intelligence might have an easier solution.
Kudos to gjulianm who got me on the right track. His solution works perfectly for a single directory.
To get it working recursively I did the following.
ASSET_DIRS = $(shell find ../../assets/ -type d)
ASSET_FILES = $(shell find ../../assets/ -type f -name '*')
codegen: ../../assets/ $(ASSET_DIRS) $(ASSET_FILES)
	generate-my-code
It appears that now any change to the directory or files (add, delete, rename, modify) will cause this rule to run. File names containing spaces will likely cause problems here, since make splits these lists on whitespace.
Let's say your directory is called dir, then this makefile will do what you want:
FILES = $(wildcard dir/*)
codegen: dir # Add $(FILES) here if you want the rule to run on file changes too.
	generate-my-code
As the comment says, you can also add the FILES variable if you want the code to depend on file contents too.
A disadvantage of having the rule depend on a directory is that any change to that directory will cause the rule to be out-of-date — including creating generated files in that directory. So unless you segregate source and target files into different directories, the rule will trigger on every make.
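The reason is that a directory's own modification time changes when entries are added, removed or renamed, but not when an existing file's contents are edited. A small shell sketch of that behaviour (dir_newer and demo_dir_mtime are hypothetical names; touch -t pins the timestamps so the result does not depend on the filesystem's timestamp resolution):

```shell
#!/bin/sh
# dir_newer DIR REF: succeed if DIR's own mtime is newer than REF's.
# -prune stops find from descending, so only DIR itself is tested.
dir_newer() {
    [ -n "$(find "$1" -prune -newer "$2")" ]
}

# Print whether editing a file, then adding one, changes the directory.
demo_dir_mtime() (
    t=$(mktemp -d) || exit 1
    mkdir "$t/dir"
    echo a > "$t/dir/existing"
    touch -t 200001010000 "$t/ref"
    touch -t 199901010000 "$t/dir"      # dir is now older than ref

    echo more >> "$t/dir/existing"      # edit an existing file's contents
    if dir_newer "$t/dir" "$t/ref"; then echo "edit: dir changed"
    else echo "edit: dir unchanged"; fi

    touch "$t/dir/generated"            # add an entry, e.g. a build output
    if dir_newer "$t/dir" "$t/ref"; then echo "add: dir changed"
    else echo "add: dir unchanged"; fi

    rm -rf "$t"
)
```

The second case is exactly the false trigger described above: a generated file written into dir makes any rule that depends on dir out of date.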
Here is an alternative approach that allows you to specify a subset of files for which additions, deletions, and changes are relevant. Suppose for example that only *.foo files are relevant.
# replace indentation with tabs if copy-pasting
.PHONY: codegen
codegen:
	find . -name '*.foo' |sort >.filelist.new
	diff .filelist.current .filelist.new || cp -f .filelist.new .filelist.current
	rm -f .filelist.new
	$(MAKE) generate
generate: .filelist.current $(shell cat .filelist.current)
	generate-my-code
.PHONY: clean
clean:
	rm -f .filelist.*
The second line in the codegen rule ensures that .filelist.current is only modified when the list of relevant files changes, avoiding false-positive triggering of the generate rule.
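The list-and-compare step in the codegen recipe is the same idea the question floated (generate a listing and diff it with the last one), so it can also be tried as a standalone shell function. A sketch with the hypothetical name filelist_changed; it uses cmp -s instead of diff only to stay quiet, and assumes no newlines in file names:

```shell
#!/bin/sh
# filelist_changed DIR STATE: write the sorted list of *.foo files under
# DIR and compare it with STATE from the previous run. Exit 0 if files
# were added, removed or renamed (STATE is replaced), 1 if the list is
# unchanged (STATE and its mtime are left alone).
filelist_changed() (
    dir=$1 state=$2
    find "$dir" -name '*.foo' | sort > "$state.new"
    if [ -f "$state" ] && cmp -s "$state" "$state.new"; then
        rm -f "$state.new"
        exit 1
    fi
    mv -f "$state.new" "$state"
)
```

Editing the contents of a listed file does not count as a change here, which matches the question's requirement that only additions, removals and renames matter.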

Reproducible Research: Convert sas7bdat data files to csv files by invoking statTransfer using GNU make

QUESTION:
I'm very new to GNU Make. Is there a better way to programmatically convert statistical datasets from sas7bdat to csv files and keep them in sync with each other using GNU Make to promote reproducible research? Would you approach this problem differently from a coding perspective or is there a better way to promote reproducible research? Can I add an additional pre-requisite (i.e. statTransferOptions.txt) while using static pattern rules?
The solution needs to:
Find all sas7bdat files in all subdirectories
Read statTransfer options
Convert the sas7bdat file to csv file using statTransfer command line tool with options
Given the current limitations of statTransfer, I think this will require a two step process:
Build statTransfer command file (.stcmd) for each SAS data file (.sas7bdat)
Build csv file for each stcmd file by executing statTransfer (st) using options in stcmd file
target stcmd and csv files should reside in same subdirectory as pre-requisite sas7bdat file
Find out-of-date stcmd and csv files and update them if a new sas7bdat file exists or if base option file changes
CONTEXT:
I have inherited a large statistical report which is published annually. In previous years, analysis was done in SAS. We are now using R. Some of the sas7bdat files generated by SAS Enterprise Guide do not import correctly with the sas7bdat package. StatTransfer, a commercial product, has a command-line interface and does convert sas7bdat files to csv files properly; however, there are options that improve conversion (e.g., writing of date formats). The sas7bdat files are in multiple subdirectories corresponding to the type of dataset and the year.
This approach was further prompted by:
Gandrud, Christopher (2013-06-21). Reproducible Research with R and RStudio (Chapman & Hall/CRC The R Series) (pp. 104-105). Chapman and Hall/CRC. Kindle Edition.
TROUBLESHOOTING:
This almost does what I want: Recursive wildcards in GNU make?
SUGGESTED MAKEFILE?
RDIR := .
######
#PREP#
######
# Use BASH shell to create list of source sas7bdat files
SASDATA = $(shell find $(RDIR) -type f -name '*.sas7bdat')
# Use pattern substring functions to define variable list of filenames
# to be used as targets in recipes
STCMD_OUT = $(patsubst $(RDIR)/%.sas7bdat, $(RDIR)/%.stcmd, $(SASDATA))
CSV_OUT = $(patsubst $(RDIR)/%.sas7bdat, $(RDIR)/%.csv, $(SASDATA))
#########
#TARGETS#
#########
all: $(STCMD_OUT) $(CSV_OUT)
# I think the name "static pattern rules" is misleading
# but I found this to be helpful:
# http://www.gnu.org/software/make/manual/make.html#Static-Pattern
# can I add statTransferOptions.txt as a pre-requisite while using static pattern rules?
$(STCMD_OUT): $(RDIR)/$(@D)/%.stcmd: $(RDIR)/$(@D)/%.sas7bdat
	cp $(RDIR)/statTransferOptions.txt $@
	echo copy $(RDIR)/$< delim $(RDIR)/$(basename $<).csv -v >> $@
	echo quit >> $@
$(CSV_OUT): $(RDIR)/$(@D)/%.csv: $(RDIR)/$(@D)/%.stcmd
	st $(RDIR)/$<
clean:
	rm $(STCMD_OUT)
	rm $(CSV_OUT)
REVISED MAKEFILE AFTER INPUT FROM SO:
RDIR := .
######
#PREP#
######
# Create list of source sas7bdat files
SASDATA := $(shell find $(RDIR) -type f -name '*.sas7bdat')
STCMD_OUT := $(patsubst $(RDIR)/%.sas7bdat, $(RDIR)/%.stcmd, $(SASDATA))
CSV_OUT := $(patsubst $(RDIR)/%.sas7bdat, $(RDIR)/%.csv, $(SASDATA))
#########
#TARGETS#
#########
all: $(STCMD_OUT) $(CSV_OUT)
$(STCMD_OUT): %.stcmd: %.sas7bdat statTransferOptions.txt
	cp $(RDIR)/statTransferOptions.txt $@
	echo copy $(RDIR)/$< delim $(RDIR)/$(basename $<).csv -v -y >> $@
	echo quit >> $@
$(CSV_OUT): %.csv: %.stcmd
	st $(RDIR)/$<
clean:
	rm $(STCMD_OUT)
	rm $(CSV_OUT)
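The .stcmd recipe is just three shell steps, so it can be tried outside make. A sketch that mirrors the revised recipe; make_stcmd is a hypothetical name, and the copy ... delim ... -v -y line is copied from the makefile rather than checked against StatTransfer's documentation:

```shell
#!/bin/sh
# make_stcmd SAS OPTIONS: write the .stcmd file next to SAS, as the
# revised recipe does: the shared options file first, then one copy
# command targeting a .csv in the same directory, then quit.
make_stcmd() (
    sas=$1 opts=$2
    base=${sas%.sas7bdat}               # same role as $(basename $<)
    out=$base.stcmd
    cp "$opts" "$out" || exit 1
    echo "copy $sas delim $base.csv -v -y" >> "$out"
    echo quit >> "$out"
)
```

Because the options file is copied into every .stcmd, listing statTransferOptions.txt as an extra prerequisite (as the revised rule does) is what makes a change to it regenerate all command files.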
However, the correct option might be to debug the CRAN sas7bdat package, so that the entire toolchain is available, rather than invoking the proprietary StatTransfer.
In SO, we generally don't have the time or energy (or, often, interest) to go read related papers, options, alternatives, etc. It works best if you clearly state the code you have problems with (in this case the makefile, which you provided, so that's great), the exact problem including error messages or incorrect output (this is not obvious from your question), what you expected to happen that did not (this is not always clear), and any other directions you've tried that have not worked.
I'm not sure exactly what the problem you're having is, but I see a number of issues with your makefile. First, this will work but is highly inefficient:
SASDATA = $(shell find $(RDIR) -type f -name '*.sas7bdat')
You should use the := form of assignment here. Probably you should use it when setting STCMD_OUT and CSV_OUT as well, although this is less critical.
Most important, though, these rules are not right:
$(STCMD_OUT): $(RDIR)/$(@D)/%.stcmd: $(RDIR)/$(@D)/%.sas7bdat
You cannot use automatic variables like $@ (or any of their alternative forms) in the target or prerequisite lists. The automatic variables are only defined within the recipe of the rule. You can use secondary expansion for this, but I'm not sure why you're trying to do this. Why not just use:
$(STCMD_OUT): %.stcmd: %.sas7bdat
? The same applies to the other static pattern rule.
As for your question, yes, it's perfectly fine to add extra prerequisites such as statTransferOptions.txt to the static pattern rule.

Resources