Making multiple files from multiple files with one command in GNU make

Assume 1000 files with extension .xhtml are in directory input, and that a certain subset of those files (with output paths in $(FILES), say) need to be transformed via xslt to files with the same name in directory output. A simple make rule would be:
$(FILES): output/%.xhtml : input/%.xhtml
	saxon -s:$< -o:$@ foo.xslt
This works, of course, doing the transform one file at a time. The problem is that I want to use saxon's batch processing to do all the files at one time, since, given the number of files, that would be much faster, considering the overhead of loading java and saxon for each file. Saxon allows the -s (source) option to be a directory and processes all files in that directory, placing the results with the same name in the directory specified in the -o: option.
I'm aware of the well-known technique to get GNU make to do a single command to update multiple files by using pattern rules:
output/%.xhtml: input/%.xhtml
	saxon -s:input -o:output foo.xslt
But in my case this suffers from two problems. First, it will run the transform on all files in the input directory, not just the ones that have changed; and second, it will not limit the transform to the subset of files specified in $(FILES). The GNU make feature of running a recipe given in a pattern rule only once for all matched targets does not work in the case of so-called "static pattern rules" (see [here]), as the rule given at the top of the post is known.
In order to use the saxon batching feature, I need to create a temporary directory, copy to it only those files to be processed, then run the transform with that temporary directory as the input directory. I tried creating a temporary directory and remembering its name in a target-specific variable for later use:
$(FILES): TMPDIR:=$(shell mktemp -d)
but this creates a new temporary directory for every single target that is out-of-date. In any case, I'm not sure how to structure the rule that would then copy the necessary files into that directory. I don't want to create the temporary directory at the time the makefile is parsed, since I have a non-recursive make system that will parse all make files, even those not related to the current top-level target, and don't want to create the temporary directory for situations in which it is not necessary/will not be used.
I'm well aware that many questions have been asked on SO in the past about creating multiple files from a single input; one solution is (non-static) pattern rules; other solutions involve phony targets. However, in this case I'm stuck as to how to put all this together.
I can identify the files that changed and copy them using the static pattern rule
$(FILES): output/%.xhtml : input/%.xhtml
	TMPDIR=`mktemp -d`
	cp $< $(TMPDIR)
but actually I would prefer to copy the files with a single cp command, whereas this copies them one by one. Perhaps there is some application here of cp -u?
I also considered using an ad-hoc extension for those files needing updating but could not see how to get this to work either. I'm about to give up and just run the saxon transform on all files when any of them have changed, but is there any better way?

Personally, I wouldn't try to do this from the command line. That's partly because I'm not a shell scripting wizard. I'm not an Ant wizard either, but because the requirement is to process only files that have changed, this seems to fall very much into Ant territory. On the other hand, Ant will recompile the stylesheet for each transformation, which is an overhead you might want to avoid; if that's the case then your best bet is probably to write a little Java application. It's probably only 100 lines or less.
A final possibility is to do the processing within Saxon: that is, a single transformation that reads multiple input files using the collection() function and generates multiple result files using xsl:result-document. Saxon (commercial editions) offers an extension function last-modified that allows you to filter the files to be processed. With 1000 files you might also want the extension function saxon:discard-document() to prevent the heap filling.

Personally, I like your original one-compiler-per-file formulation. Doesn't this work well with make's -j n flag?
You can of course batch up files by copying, and then running saxon at the end. Recursive make (ugh!) can sort out the ordering. Something like:
.PHONY: all
all:
	rm -rf tmpdir
	# Recreate the staging directory so that ln has somewhere to link into.
	mkdir tmpdir
	${MAKE} tmpdir/sentinel
	saxon -s:tmpdir -o:output foo.xslt
tmpdir/sentinel: $(FILES) ; touch $@
$(FILES): output/%.xhtml: input/%.xhtml
	ln $< $(patsubst input/%,tmpdir/%,$<)
This does work, though I am very queasy about lying to make (the static pattern rule purports to create the target in output/, but in fact does its dirty deed in tmpdir/).
Note in the recipe for tmpdir/sentinel, that $? is correctly set to the list of output files that are out of date. This might be useful if you can pass a bunch of files to saxon rather than a folder.

I think one issue here is that 'saxon' supports either one file or all files in a directory, so isn't suitable for batch processing without copying to temporary directories.
Otherwise, this is quite simple to do by using a timestamp marker file as a proxy target. For example:
output/.timestamp : $(FILES)
	mkdir -p $(@D)
	$(COMMAND) -outputdir=output $?
	touch $@
The three commands:
1. Ensure that the output directory exists.
2. Run the batch command on the files newer than the timestamp file.
3. Update the timestamp file (creating it if necessary).
Remember that each line of a recipe is executed in its own subshell, and that if any line fails, the subsequent lines are not run.
This approach is useful with Java builds.
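For a tool like saxon that only accepts a whole directory, the same timestamp idea can be combined with a staging directory. The following is only a sketch in POSIX shell: the function name batch_changed, the flat (no subdirectories) layout of the input directory and the .xhtml filter are assumptions, and it does not restrict the run to the subset in $(FILES).

```shell
#!/bin/sh
# batch_changed INPUT_DIR STAMP_FILE COMMAND [ARGS...]
# Copies every *.xhtml under INPUT_DIR that is newer than STAMP_FILE into a
# fresh staging directory, runs COMMAND once with the staging directory
# appended as its last argument, then updates STAMP_FILE on success.
batch_changed() {
    in_dir=$1
    stamp=$2
    shift 2
    stage=$(mktemp -d) || return 1
    # sh -c receives the staging directory as $0 and the found files as "$@",
    # so all files are copied with as few cp invocations as possible.
    if [ -e "$stamp" ]; then
        find "$in_dir" -type f -name '*.xhtml' -newer "$stamp" \
            -exec sh -c 'cp "$@" "$0"' "$stage" {} +
    else
        # No stamp yet: treat every input file as changed.
        find "$in_dir" -type f -name '*.xhtml' \
            -exec sh -c 'cp "$@" "$0"' "$stage" {} +
    fi
    status=0
    # Skip the batch run entirely when nothing was staged.
    if [ -n "$(ls -A "$stage")" ]; then
        "$@" "$stage" || status=$?
    fi
    [ "$status" -eq 0 ] && touch "$stamp"
    rm -rf "$stage"
    return "$status"
}
```

With the saxon invocation used in the question, a call might look like: batch_changed input output/.timestamp sh -c 'saxon -s:"$1" -o:output foo.xslt' sh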

Related

Folderstructure with rsync in bash

I searched the forum but didn't find an article that matches my problem. Maybe there is one, and you can point me to it.
My problem is that I want to sync a folder with the command rsync -a -v. I have 5 different machines. Each machine has a scratch folder that I want to sync into the folder ~/work_dir/scratch_maschines, and inside scratch_maschines there should be a folder for maschine_a, maschine_b and so on.
On the machines the path is always the same: /scratch/my_name. So when I now use this command for the first two machines:
rsync -a -v --exclude='*.chk' --exclude='*.rwf' --exclude='*.fchk' --delete sp02:/scratch/my_name ~/work_dir/scratch_maschine01; rsync -a -v --exclude='*.chk' --exclude='*.rwf' --exclude='*.fchk' --delete maschine02:/scratch/my_name ~/work_dir/scratch_maschine02
I get the folders scratch_maschine01 and scratch_maschine02 in my working directory, but my data is not directly inside them: each contains a folder my_name, and that folder contains the data. So my question is: how can I use the rsync command to get the files from the scratch directories straight into the folder for each machine?
You might want to consider reformulating your commands similar to the following:
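START=$(pwd)
# An array keeps each pattern as one argument; quotes nested inside a string would reach rsync literally.
EXCLUDES=(--exclude='*.chk' --exclude='*.rwf' --exclude='*.fchk')
{ SOURCE="sp02:/scratch/my_name"
REMOTE="${HOME}/work_dir/scratch_maschine01"
# SOURCE is on another host, so it is passed to rsync directly instead of using cd.
rsync --recursive -v --delete "${EXCLUDES[@]}" "${SOURCE}/" "${REMOTE}/"
}>"${START}/job.log" 2>"${START}/job.err"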
The key elements there are:
--recursive, which makes rsync descend into all content and subdirectories of the SOURCE directory.
the / after ${SOURCE}, which tells rsync to copy the contents of the SOURCE directory, but not the directory itself.
the / after ${REMOTE}, which marks the destination as a directory into which the content is deposited. rsync creates the last component of that path if it is missing, but not its parent directories, so make sure ${HOME}/work_dir exists before running the command.
The above approach lends itself to a function form that could be placed into a loop with pre-attempt condition checks, along with having a complementary case for all variable assignments grouped under a destination heading (i.e. case statements).
Using such an approach with meaningful labels for variables lends itself to a type of implicit documentation, making the code more meaningful to someone not familiar with the code, as well as a refresher for yourself after a long period of not working or using the code.
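The function form mentioned above might look roughly like the sketch below. The names sync_scratch and sync_all and the RSYNC override variable are made up for illustration; the remote path and exclude patterns are taken from the question, and the options are the long forms of the -a -v --delete used there.

```shell
#!/bin/sh
# sync_scratch HOST DEST_DIR
# Mirrors HOST:/scratch/my_name into DEST_DIR.
# Set RSYNC=echo to print the arguments instead of transferring anything.
sync_scratch() {
    host=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # The trailing slash on the source copies the directory's contents,
    # not the directory itself, so no extra my_name level appears in DEST_DIR.
    "${RSYNC:-rsync}" --archive --verbose --delete \
        --exclude='*.chk' --exclude='*.rwf' --exclude='*.fchk' \
        "${host}:/scratch/my_name/" "${dest}/"
}

# sync_all BASE_DIR HOST...
# Gives every host its own folder below BASE_DIR and keeps going on failure.
sync_all() {
    base=$1
    shift
    for machine in "$@"; do
        sync_scratch "$machine" "${base}/${machine}" ||
            echo "sync failed for ${machine}" >&2
    done
}
```

A call such as sync_all "${HOME}/work_dir/scratch_maschines" sp02 maschine02 would then produce one folder per machine, using the two host names that appear in the question.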
I try to avoid the "~" because I prefer to always enclose definitions for variables in double quotes, to avoid issues that might arise from paths that may include unexpected characters or spaces. That way, you are sure to have your defined paths correctly interpreted by commands in scripts.
Lastly, I prefer to use the long form for the rsync options (and almost every other command) so that I don't have to refer to the manual every time to translate the single-character options when trying to understand what is coded, if the need arises for troubleshooting unexpected errors (I have always had poor memory).
My own backup command is as follows. The only reason why the
${PathMirror}${dirC}/
is not encapsulated in single quotes within the double quotes for COM is because I know those variables all evaluate to non-complex strings which cannot be misinterpreted.

How to build a rpm that installs host dependent files

I have to build one rpm that copies the contents of file A to /path/to/targetfile if the hostname is A. In all other cases the contents of B should be copied to /path/to/targetfile. I'm aware that this may be a misuse of rpm, but I still have to do it like this. Do you have any ideas how to get this done in an elegant way?
My solution at the moment would be to create an empty /path/to/targetfile in my BUILD directory as well as a /tmp/contents.tar.gz that contains the files A and B. In the post-install routine I would then extract the relevant parts of /tmp/contents.tar.gz to /path/to/targetfile and delete the tarball afterwards. In the pre-uninstall routine I'd then touch /tmp/contents.tar.gz to suppress rpm reporting errors for an already deleted file.
To me this seems to be a very dirty way to get this done. Do you have better ones?
If you plan on abusing rpm for things it was not designed for, you'll have to do dirty tricks.
I don't see another workaround for you. I fail to see the use of removing the tar.gz etc, unless that (little?) extra space is really a problem for you. I would propose:
1. Package all files (A and B) into some specific directory (/usr/lib/your-package or whatever), not in compressed format.
2. In the %post section, just create a symlink so that /path/to/targetfile points to /usr/lib/your-package/A or /usr/lib/your-package/B (symlinks take up almost no space). This has the additional value that ls -l /path/to/targetfile will show you which file it points to, telling you whether this is file A or B.
3. In your %files section, declare %ghost /path/to/targetfile for a nice cleanup upon removal.
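The %post step could be sketched as the shell function below. The function name link_targetfile is hypothetical, and the comparison assumes that the hostname command prints exactly A on the special host; adjust it if the host reports a fully qualified name.

```shell
#!/bin/sh
# link_targetfile PAYLOAD_DIR TARGET [HOSTNAME]
# Points TARGET at PAYLOAD_DIR/A on host "A" and at PAYLOAD_DIR/B everywhere else.
# HOSTNAME defaults to the output of the hostname command.
link_targetfile() {
    payload_dir=$1
    target=$2
    this_host=${3:-$(hostname)}
    if [ "$this_host" = "A" ]; then
        chosen="$payload_dir/A"
    else
        chosen="$payload_dir/B"
    fi
    # -f replaces an existing link; -n avoids descending into a link to a directory.
    ln -sfn "$chosen" "$target"
}
```

Inside %post the call would be link_targetfile /usr/lib/your-package /path/to/targetfile, using the paths suggested above.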

VIM: how to get the file path/directory of opened buffer and do something?

my scenario is: I'm using vim to open some .cpp files, for example
vim 1.cpp src/2.cpp root/src/3.cpp
Sometimes, I wish to rebuild 3.cpp so I have to use another window to
"rm root/src/3.o"
and inside vim, type
":make"
This works fine, NP. But I am looking for a .vimrc function/command that:
When I switch to buffer, e.g. "root/src/3.cpp" and press this command, vim will detect the directory of "root/src" and the file name without suffix "3", and automatically execute a command of "rm root/src/3.o".
In this case, I can casually switch to any buffer and re-trigger the build of this very file.
Note I don't wish to map gmake tool command like "make clean" because we use several different make utilities like scons, cmake, etc.
So how to write this function/command in .vimrc? Thanks.
:call system('rm '.expand('%:p:r').'.o'), as @Kent said, or even simply :!rm %:p:r.o.
But I'm quite surprised you need to do that. Tools in charge of compilation chains usually understand dependencies (which ever the tool is), and you shouldn't need to remove the object file that often to need a mapping to do it for you.
PS: it's perfectly possible (but I need to update the doc) to support CMake, or out-of-source compilation, from vim. But indeed, with out-of-source compilation you wouldn't need to delete those files manually; a :make clean would do, if :make already works.
you can get root/src/3 from root/src/3.cpp buffer by:
expand('%:p:r')
Then you are free to concatenate the .o to end, and build the command.

Rsync previous half-copied files?

I found rsync behaves differently in the following two situations:
(1) All the files are copied by using rsync, then using rsync again will be fast (skip all the files);
(2) Use cp to copy files, then using rsync will be slow (or may be run freshly?)
So my confusion is "Does rsync generate any internal things on the files so that it can refer to avoid duplicate checking?"
rsync -a (archive mode, which I presume you ran) preserves most attributes of a file, including its modification time. Plain cp does not: the copy gets a new modification time. By default rsync skips a file only when its size and modification time match on both sides, so after cp the later modification time on the destination files makes rsync treat them as different, and it either recopies them or has to check their contents.
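The effect can be reproduced without rsync. The sketch below imitates the size-and-modification-time comparison with a hypothetical helper looks_unchanged; it is only an approximation of the idea, not rsync's actual code, and it relies on the -nt and -ot operators of test, which common shells provide although POSIX does not require them.

```shell
#!/bin/sh
# looks_unchanged SRC DST
# Succeeds when both files have the same size and neither is newer than the
# other, which is roughly the condition under which rsync skips a file.
looks_unchanged() {
    [ "$(($(wc -c < "$1")))" -eq "$(($(wc -c < "$2")))" ] &&
        [ ! "$1" -nt "$2" ] &&
        [ ! "$1" -ot "$2" ]
}
```

A copy made with cp -p keeps the modification time and passes this check, while a plain cp of an older file does not, which matches the difference seen in situations (1) and (2).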

Add last n lines of files to tar/zip

I need to regularly send a collection of log files that can grow quite large, so I would like to send only the last n lines of each of the files.
for example:
/usr/local/data_store1/file.txt (500 lines)
/usr/local/data_store2/file.txt (800 lines)
Given a file with a list of needed files named files.txt, I would like to create an archive (tar or zip) with the last 100 lines of each of those files.
I can do this by creating a separate directory structure with the tail-ed files, but that seems like a waste of resources when there's probably some piping magic that can happen to accomplish it. Full directory structure also must be preserved since files can have the same names in different directories.
I would like the solution to be a shell script if possible, but perl (without added modules) is also acceptable (this is for Solaris machines that don't have ruby/python/etc.. installed on them.)
You could try
tail -n 10 your_file.txt | while read line; do zip /tmp/a.zip $line; done
where a.zip is the zip file and 10 is n or
tail -n 10 your_file.txt | xargs tar -czvf test.tar.gz --
for tar.gz
You are focusing on a specific implementation instead of looking at the bigger picture.
If the final goal is to have an exact copy of the files on the target machine while minimizing the amount of data transferred, what you should use is rsync, which automatically sends only the parts of the files that have changed and can also compress while sending and decompress while receiving.
Running rsync doesn't need any more daemons on the target machine than the standard sshd one, and to set up automatic transfers without passwords you just need to use public key authentication.
There is no piping magic for that, you will have to create the folder structure you want and zip that.
mkdir tmp
for i in /usr/local/*/file.txt; do
    # ${i:1} drops the leading / from the path (bash or ksh93 syntax).
    mkdir -p "`dirname tmp/${i:1}`"
    tail -n 100 "$i" > "tmp/${i:1}"
done
zip -r zipfile tmp/*
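Since the question starts from a list in files.txt, the same staging idea can be driven by that list. This is a sketch only: the function name stage_tails is made up, the list is assumed to hold one absolute path per line, and a shell that understands $(...) and a tail that accepts -n are assumed to be available.

```shell
#!/bin/sh
# stage_tails LIST_FILE N STAGE_DIR
# Writes the last N lines of every file named in LIST_FILE below STAGE_DIR,
# keeping the full original path so that equal file names do not collide.
stage_tails() {
    list=$1
    n=$2
    stage=$3
    while IFS= read -r path; do
        # Ignore entries that do not name an existing regular file.
        [ -f "$path" ] || continue
        mkdir -p "$stage$(dirname "$path")"
        tail -n "$n" "$path" > "$stage$path"
    done < "$list"
}
```

The staged tree can then be archived in one step, for example with stage_tails files.txt 100 /tmp/stage followed by (cd /tmp/stage && tar cf /tmp/logs.tar .), where both /tmp paths are placeholders.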
Use logrotate.
Have a look inside /etc/logrotate.d for examples.
Why not put your log files in SCM?
Your receiver creates a repository on his machine from where he retrieves the files by checking them out.
You send the files just by committing them. Only the diff will be transmitted.
