How can a Makefile replacement pattern produce more than one output per input?

In our code base we have a code generator which takes foo.xyz and produces two source files foo-in.c and foo-out.c.
In an application's Makefile I would like to list the sources as:
SOURCES=main.c gadget.c foo.xyz
Then the corresponding OBJECTS variable should expand to:
OBJECTS=main.o gadget.o foo-in.o foo-out.o
but I'm unable to find whether it is possible to do this expansion generically using GNU Make. The common $(SOURCES:.c=.o) replacement pattern replaces a single source file with a single object file.
How can I write a substitution pattern which will produce multiple output files per input file?

Well, while writing the question I found a usable solution.
SOURCES=main.c gadget.c foo.xyz
OBJECTS=$(patsubst %.c,%.o,$(filter %.c,$(SOURCES))) \
$(patsubst %.xyz,%-in.o,$(filter %.xyz,$(SOURCES))) \
$(patsubst %.xyz,%-out.o,$(filter %.xyz,$(SOURCES)))
app: $(OBJECTS)
    $(LD) -o $@ $(LDFLAGS) $(OBJECTS)
%-in.c %-out.c: %.xyz
    # Very special codegen rule
    touch $(patsubst %.xyz,%-in.c,$<)
    touch $(patsubst %.xyz,%-out.c,$<)
When converting from $(SOURCES) to $(OBJECTS), use two separate patsubst calls on the filtered-out .xyz files. This way, both the %-in.o and %-out.o files end up in the object list.
Another solution could be to create an intermediate sources list using the same trick but substituting xyz with the corresponding -in.c and -out.c patterns. Then the objects list could be created in the traditional way. An added benefit of this method would be that creating a rule which generates all source code files is trivial.
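A minimal sketch of that alternative, using a hypothetical GENSOURCES variable for the intermediate list:

# Rewrite each .xyz entry into the two generated .c names, then derive
# the objects in the traditional way.
GENSOURCES := $(filter %.c,$(SOURCES)) \
              $(patsubst %.xyz,%-in.c,$(filter %.xyz,$(SOURCES))) \
              $(patsubst %.xyz,%-out.c,$(filter %.xyz,$(SOURCES)))
OBJECTS    := $(GENSOURCES:.c=.o)

# "make sources" now regenerates every code-generated source file in one go.
.PHONY: sources
sources: $(GENSOURCES)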

Related

How to make a single makefile that applies the same command to sub-directories?

For clarity, I am running this on Windows with GnuWin32 make.
I have a set of directories with markdown files in them at several different levels - theoretically they could be in the branch nodes, but I think currently they are only in the leaf nodes. I have a set of pandoc/LaTeX commands to run to turn the markdown files into PDFs - and obviously I only want to recreate the PDFs if the markdown file has been updated, so a makefile seems appropriate.
What I would like is a single makefile in the root, which iterates over any and all sub-directories (to any depth) and applies the make rule I'll specify for running pandoc.
From what I've been able to find, recursive makefiles require you to have a makefile in each sub-directory (which seems like an administrative overhead that I would like to avoid) and/or require you to list out all the sub-directories at the start of the makefile (again, would prefer to avoid this).
Theoretical folder structure:
root
|-make
|-Folder AB
| |-File1.md
| \-File2.md
|-Folder C
| \-File3.md
\-Folder D
  |-Folder E
  | \-File4.md
  \-Folder F
    \-File5.md
How do I write a makefile to deal with this situation?
Here is a small set of Makefile rules that hopefully will get you going.
%.pdf : %.md
    pandoc -o $@ --pdf-engine=xelatex $^
PDF_FILES=FolderA/File1.pdf FolderA/File2.pdf \
FolderC/File3.pdf FolderD/FolderE/File4.pdf FolderD/FolderF/File5.pdf
all: ${PDF_FILES}
Let me explain what is going on here. First we have a pattern rule that tells make how to convert a Markdown file to a PDF file. The --pdf-engine=xelatex option is here just for the purpose of illustration.
Then we need to tell Make which files to consider. We put the names together in a single variable, PDF_FILES. The value of this variable can be built by a separate script that scans all subdirectories for .md files.
Note that one has to be extra careful if filenames or directory names contain spaces.
Then we ask Make to check if any of the PDF_FILES should be updated.
If you have other targets in your makefile, make sure that all is the first non-pattern target, or invoke make as make all.
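Alternatively, GNU make also lets you name the default goal explicitly, regardless of rule order:

.DEFAULT_GOAL := all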
Updating the Makefile
If the shell function works for you and basic utilities such as sed and find are available, you can make your makefile dynamic with a single line.
%.pdf : %.md
    pandoc -o $@ --pdf-engine=xelatex $^
PDF_FILES:=$(shell find -name "*.md" | xargs echo | sed 's/\.md/\.pdf/g' )
all: ${PDF_FILES}
MadScientist suggested just that in the comments.
Otherwise you could implement a script using the tools available on your operating system and add an additional target update: that would compute the list of files and replace the line starting with PDF_FILES with an updated list of files.
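A minimal sketch of that idea, assuming find and sed are available and writing the list into a separate include file (filelist.mk is a hypothetical name) rather than editing the Makefile in place:

# The include must appear before "all: ${PDF_FILES}" so the variable is set
# in time; -include tolerates the file being absent on the first run.
-include filelist.mk

.PHONY: update
update:
    find . -name '*.md' | sed 's/\.md$$/.pdf/' | sed 's/^/PDF_FILES += /' > filelist.mk

Running make update refreshes the list; an ordinary make then rebuilds whatever is out of date.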
The final version of the code that worked for Windows, based on @DmitriChubarov's and @MadScientist's suggestions, is as follows:
%.pdf: %.md
    pandoc $^ -o $@
PDF_FILES:=$(shell dir /s /b *.md | sed "s/\.md/\.pdf/g")
all: ${PDF_FILES}

GNU make pattern rules with different file base names

I have a data processing job that I would like to automate with Make. Hundreds of files need to be processed, in several steps.
Unfortunately, the base name will change for at least one of the steps, but it would be easy to write these dependencies into a separate file that is then included.
However, I'd like to avoid also writing the build instructions (which are quite complicated) for all these files separately.
I envisage something along these lines:
# automatically generated rules, included into make file
dir1/test.bb: dir2/test_other_name.aa
# (many more rules like the above, linking xxx.bb to yyy.aa)
# pattern rule
%.bb: %.aa
    # build step using $@, $<
What I would like is for the pattern rule to provide the build instructions and the explicit rules to define the dependencies. Can something like this be achieved?
When make's noddy patterns don't cut the mustard, just write out the rules explicitly.
(This has the happy side effect of not using pattern rules.)
Let's say you have a function src-to-target which will generate the target filename (i.e., $(call src-to-target,dir2/test_other_name.aa) expands to dir1/test.bb).
Also, you have a list of sources in ${srcs}, and ${recipe} is a list of shell commands using $@, $< etc.
define src-to-target = ... # $1:source
define recipe =
echo Building $@ from $<
⋮
endef
define generate-rule = # $1:source
target := $(call src-to-target,$1)
targets += $${target}
$${target}: $1 ; $${recipe}
endef
$(foreach _,${srcs},$(eval $(call generate-rule,$_)))
.PHONY: all
all: ${targets} ; : $@ Success
The $(foreach ...) does all the work here.
So, looking at that in painful detail,
First expand ${srcs}
Set $_ to the first in the list (dir2/test_other_name.aa say)
Expand $(call generate-rule,$_)
Expand $(call generate-rule,dir2/test_other_name.aa)
$1 is set to dir2/test_other_name.aa, and the expansion of $(generate-rule) follows, leading to this block of text
target := dir1/test.bb
targets += ${target}
${target}: dir2/test_other_name.aa ; ${recipe}
As a side effect, $(eval) swallows the above text. The expansion of the $(eval) though is empty.
$_ is set to the next source file.
Wash, lather, rinse, repeat
Once the $(foreach) is complete,
${targets} contains the complete list of targets.
Parallel safe too.
What's not to like?
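For illustration, src-to-target might look like the sketch below. The mapping assumed here (drop the source directory, swap .aa for .bb, and put the result under dir1/) is purely hypothetical; if the real base names change irregularly, the function could instead look the target up in a table.

define src-to-target =
$(patsubst %.aa,dir1/%.bb,$(notdir $1))
endef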

How does GNU make's "file" function work?

I am thinking I may need to use the file function in GNU make, and I just cannot follow the example they give. I have looked online, but don't see any post with more explanation. Here is the example they give:
program: $(OBJECTS)
    $(file >$@.in,$^)
    $(CMD) $(CMDFLAGS) @$@.in
    @rm $@.in
I think I know what it is doing at a high level as it is explained in the manual.
$@.in is a list of all the target files, and $^ is a list of the source files.
I am not sure how @$@.in is used on the third line, or why there is an @ sign at the beginning. What does that mean, please? What is it supposed to do?
The key to the operation of that recipe is given in the prose immediately preceding it in the manual:
Many commands use the convention that an argument prefixed with an @ specifies a file containing more arguments. Then you might write your recipe in this way:
program: $(OBJECTS)
    $(file >$@.in,$^)
    $(CMD) $(CMDFLAGS) @$@.in
    @rm $@.in
$@ is the target file (there is only one of those in any given recipe).
$@.in is the target file with .in added to the end of the name.
$^ is the list of all the prerequisites of the target.
@$@.in is the name of the target with .in at the end and @ at the start.
So the $(file ...) call in that recipe writes the list of prerequisites of the target into a file called program.in in "overwrite" mode, and then passes that file name to the $(CMD) command using the @filename convention that was mentioned.
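As a concrete sketch, with hypothetical file names and GNU ar standing in for $(CMD) because it understands the @file convention:

OBJECTS := main.o util.o

libfoo.a: $(OBJECTS)
    $(file >$@.in,$^)    # writes "main.o util.o" into libfoo.a.in
    ar rcs $@ @$@.in     # ar reads the member names from libfoo.a.in
    @rm $@.in            # this leading @ only stops make echoing the rm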

Reproducible Research: Convert sas7bdat data files to csv files by invoking statTransfer using GNU make

QUESTION:
I'm very new to GNU Make. Is there a better way to programmatically convert statistical datasets from sas7bdat to csv files and keep them in sync with each other using GNU Make to promote reproducible research? Would you approach this problem differently from a coding perspective or is there a better way to promote reproducible research? Can I add an additional pre-requisite (i.e. statTransferOptions.txt) while using static pattern rules?
The solution needs to:
Find all sas7bdat files in all subdirectories
Read statTransfer options
Convert the sas7bdat file to csv file using statTransfer command line tool with options
Given the current limitations of statTransfer, I think this will require a two-step process:
Build statTransfer command file (.stcmd) for each SAS data file (.sas7bdat)
Build csv file for each stcmd file by executing statTransfer (st) using options in stcmd file
target stcmd and csv files should reside in same subdirectory as pre-requisite sas7bdat file
Find out-of-date stcmd and csv files and update them if a new sas7bdat file exists or if base option file changes
CONTEXT:
I have inherited a large statistical report which is published annually. In previous years, analysis was done in SAS. We are now using R. Some of the sas7bdat files generated by SAS Enterprise Guide do not import correctly with the sas7bdat package. StatTransfer, a commercial product, has a command-line interface and does convert sas7bdat files to csv files properly; however, there are options that improve conversion (e.g., writing of date formats). The sas7bdat files are in multiple subdirectories corresponding to the type of dataset and the year.
This approach was further prompted by:
Gandrud, Christopher (2013-06-21). Reproducible Research with R and RStudio (Chapman & Hall/CRC The R Series) (pp. 104-105). Chapman and Hall/CRC. Kindle Edition.
TROUBLESHOOTING:
This almost does what I want: Recursive wildcards in GNU make?
SUGGESTED MAKEFILE?
RDIR := .
######
#PREP#
######
# Use BASH shell to create list of source sas7bdat files
SASDATA = $(shell find $(RDIR) -type f -name '*.sas7bdat')
# Use pattern substring functions to define variable list of filenames
# to be used as targets in recipes
STCMD_OUT = $(patsubst $(RDIR)/%.sas7bdat, $(RDIR)/%.stcmd, $(SASDATA))
CSV_OUT = $(patsubst $(RDIR)/%.sas7bdat, $(RDIR)/%.csv, $(SASDATA))
#########
#TARGETS#
#########
all: $(STCMD_OUT) $(CSV_OUT)
# I think the name "static pattern rules" is misleading
# but I found this to be helpful:
# http://www.gnu.org/software/make/manual/make.html#Static-Pattern
# can I add statTransferOptions.txt as a pre-requisite while using static pattern rules?
$(STCMD_OUT): $(RDIR)/$(@D)/%.stcmd: $(RDIR)/$(@D)/%.sas7bdat
    cp $(RDIR)/statTransferOptions.txt $@
    echo copy $(RDIR)/$< delim $(RDIR)/$(basename $<).csv -v >> $@
    echo quit >> $@

$(CSV_OUT): $(RDIR)/$(@D)/%.csv: $(RDIR)/$(@D)/%.stcmd
    st $(RDIR)/$<

clean:
    rm $(STCMD_OUT)
    rm $(CSV_OUT)
REVISED MAKEFILE AFTER INPUT FROM SO:
RDIR := .
######
#PREP#
######
# Create list of source sas7bdat files
SASDATA := $(shell find $(RDIR) -type f -name '*.sas7bdat')
STCMD_OUT := $(patsubst $(RDIR)/%.sas7bdat, $(RDIR)/%.stcmd, $(SASDATA))
CSV_OUT := $(patsubst $(RDIR)/%.sas7bdat, $(RDIR)/%.csv, $(SASDATA))
#########
#TARGETS#
#########
all: $(STCMD_OUT) $(CSV_OUT)
$(STCMD_OUT): %.stcmd: %.sas7bdat statTransferOptions.txt
    cp $(RDIR)/statTransferOptions.txt $@
    echo copy $(RDIR)/$< delim $(RDIR)/$(basename $<).csv -v -y >> $@
    echo quit >> $@

$(CSV_OUT): %.csv: %.stcmd
    st $(RDIR)/$<

clean:
    rm $(STCMD_OUT)
    rm $(CSV_OUT)
However, the better option might be to debug the CRAN sas7bdat package so that the entire toolchain is freely available, rather than invoking the proprietary statTransfer.
On SO we generally don't have the time or energy (or, often, the interest) to go and read related papers, options, alternatives, etc. It works best if you simply and clearly specify: the code you have problems with (in this case the makefile, which is provided, so that's great); the exact problem you have, including error messages or incorrect outputs (this is not obvious from your question); what you wanted to happen that did not happen, because this is not always clear; and perhaps any additional thoughts or directions you've tried that have not worked.
I'm not sure exactly what the problem you're having is, but I see a number of issues with your makefile. First, this will work but is highly inefficient:
SASDATA = $(shell find $(RDIR) -type f -name '*.sas7bdat')
You should use the := form of assignment here. Probably you should use it when setting STCMD_OUT and CSV_OUT as well, although this is less critical.
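The difference, in a minimal sketch:

# With "=" (recursive assignment), the find would run again every time
# SASDATA is expanded; with ":=" it runs once, when the Makefile is read.
SASDATA := $(shell find $(RDIR) -type f -name '*.sas7bdat')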
Most important, though, these rules are not right:
$(STCMD_OUT): $(RDIR)/$(@D)/%.stcmd: $(RDIR)/$(@D)/%.sas7bdat
You cannot use automatic variables like $@ (or any of their alternative forms) in the target or prerequisite lists. The automatic variables are only defined within the recipe of the rule. You can use secondary expansion for this, but I'm not sure why you're trying to do this. Why not just use:
$(STCMD_OUT): %.stcmd: %.sas7bdat
? Ditto for the other static pattern rule?
As for your question, yes, it's perfectly fine to add extra prerequisites such as statTransferOptions.txt to the static pattern rule.
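For reference, if per-target directory names really were needed in the prerequisite list, secondary expansion is the mechanism alluded to above. A minimal sketch, where notes.txt is a hypothetical per-directory file:

.SECONDEXPANSION:
$(STCMD_OUT): %.stcmd: %.sas7bdat $$(@D)/notes.txt
    cp $(RDIR)/statTransferOptions.txt $@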

How can I convert a dictionary file (.dic) with an affix file (.aff) to create a list of words?

I'm looking at a dictionary file (".dic") and its associated "aff" file. What I'm trying to do is combine the rules in the "aff" file with the words in the "dic" file to create a global list of all words contained within the dictionary file.
The documentation behind these files is difficult to find. Does anyone know of a resource that I can learn from?
Is there any code out there that will already do this (am I duplicating an effort that I don't need to)?
thanks!
According to Pillowcase, here is an example of usage:
# Download dictionary
wget -O ./dic/es_ES.aff "https://raw.githubusercontent.com/sbosio/rla-es/master/source-code/hispalabras-0.1/hispalabras/es_ES.aff"
wget -O ./dic/es_ES.dic "https://raw.githubusercontent.com/sbosio/rla-es/master/source-code/hispalabras-0.1/hispalabras/es_ES.dic"
# Compile program
wget -O ./dic/unmunch.cxx "https://raw.githubusercontent.com/hunspell/hunspell/master/src/tools/unmunch.cxx"
wget -O ./dic/unmunch.h "https://raw.githubusercontent.com/hunspell/hunspell/master/src/tools/unmunch.h"
g++ -o ./dic/unmunch ./dic/unmunch.cxx
# Generate dictionary
./dic/unmunch ./dic/es_ES.dic ./dic/es_ES.aff 2> /dev/null > ./dic/es_ES.txt.bk
sort ./dic/es_ES.txt.bk > ./dic/es_ES.txt # Optional
rm ./dic/es_ES.txt.bk # Optional
You need a utility called munch.exe to apply the aff rules to the dic file.
These could be Hunspell dictionary files. Unfortunately, the command to create a "global" or unmunched wordlist only fully supports simple .aff and .dic files.
From the documentation:
unmunch: list all recognized words of a MySpell dictionary
Syntax:
unmunch dic_file affix_file
Try it and see what happens. For generating all wordforms for one word only, look here.

Resources