Chained string substitutions in gmake - gnu-make

In GNU make, I'd like to perform multiple string substitutions on a blob of text containing several "placeholders", e.g.:
MYTEXT:= blabla _FIRST_PLACEHOLDER_ blabla _SECOND_PLACEHOLDER_whateverblabla_THIRD_PLACEHOLDER_blablabla
So I'd like to replace the "placeholders" with values as follows:
_FIRST_PLACEHOLDER_ => FIRST_VAL
_SECOND_PLACEHOLDER_ => SECOND_VAL
_THIRD_PLACEHOLDER_ => THIRD_VAL
...
The following shows a hideous way of obtaining the result I'd like:
$(subst _FIRST_PLACEHOLDER_,FIRST_VAL, $(subst _SECOND_PLACEHOLDER_,SECOND_VAL, $(subst _THIRD_PLACEHOLDER_,THIRD_VAL, $(MYTEXT))))
A solution would be straightforward to find outside the make world, but is there a better way than the above to perform such a recursive substitution while remaining within the confines of make? I tried using $(foreach), but this simply concatenates the result of each substitution applied once to the initial $(MYTEXT).

Iterative solution
This solution requires overwriting of the variables _p and _x.
# -*- gnu-make -*-
ORIGINAL := 123__PLACE_HOLDER__1567__PLACE_HOLDER__2890
REPLACEMENT_LIST := \
__PLACE_HOLDER__1=ABC \
__PLACE_HOLDER__2=DEF
_replace1 = $(eval _x := $(subst $(word 1,$(1)),$(word 2,$(1)),$(_x)))
replace = $(strip \
$(eval _x := $(strip $(2))) \
$(foreach _p,$(strip $(1)),$(call _replace1,$(subst =, ,$(_p)))) \
$(_x) \
$(eval _x :=) \
)
$(info ORIGINAL: '$(ORIGINAL)')
$(info REPLACEMENT: '$(call replace,$(REPLACEMENT_LIST),$(ORIGINAL))')
.PHONY: all
all:
Example run:
$ make
ORIGINAL: '123__PLACE_HOLDER__1567__PLACE_HOLDER__2890'
REPLACEMENT: '123ABC567DEF890'
make: Nothing to be done for 'all'.
Recursive solution
This solution has the advantage of not modifying any variable.
_replace2 = $(subst $(word 1,$(1)),$(word 2,$(1)),$(2))
_replace1 = $(call replace,$(2),$(call _replace2,$(subst =, ,$(1)),$(3)))
replace = $(if $(1),$(call _replace1,$(firstword $(1)),$(wordlist 2,1000000,$(1)),$(2)),$(2))
or
_replace1 = $(subst $(word 1,$(1)),$(word 2,$(1)),$(2))
replace = $(if $(1),$(call replace,$(wordlist 2,1000000,$(1)),$(call _replace1,$(subst =, ,$(firstword $(1))),$(2))),$(2))

Related

having R print a system call that contains "", '', and escape character \

I need to run a perl command from within an R script. I would normally do this via:
system(paste0('my command'))
However, the command I want to paste contains both single and double quotes and an escape character. Specifically, I would like to paste this command:
perl -pe '/^>/ ? print "\n" : chomp' in.fasta | tail -n +2 > out.fasta
I have tried escaping the double quotes with more escape characters, which allows me to pass the command, but it then prints all 3 escape characters, which causes the command to fail. Is there a good way around this, such that I can save the above perl line as a string in R, that I can then pass to the system() function?
Hey, I haven't tested your particular perl call (since it involves particular files and directories), but I tried something trivial with escaped quotes and it seems to work. You might want to refer to this question for more as well.
My approach,
# shouldn't have any text except for an empty string
my_text <- try(system(" perl -e 'print \"\n\"' ", intern = TRUE))
my_text
[1] ""
# should contain the string - Hello perl from R!
my_text2 <- try(system(" perl -e 'print \"Hello perl from R!\"' ", intern = TRUE))
my_text2
[1] "Hello perl from R!"
So based on the above trials I think this should work for you -
try(system(command = "perl -pe '/^>/ ? print \"\n\" : chomp' in.fasta | tail -n +2 > out.fasta", intern = TRUE))
Note - intern = TRUE just captures the output as a character vector in R.

Define specific output count in EXPR command

I have a scenario wherein I want to have 9 character count in expr.
I have sample code which is:
var1=012345678 #this is 9 characters
sum=`expr $var1 + 1`
echo "$sum"
Here is the result:
./sample.sh : 12345679 #this is only 8 characters
My expected output:
./sample.sh : 012345679
Any help on this?
The leading zero is removed when doing the math.
You can force a 9 length output using printf "%09d" 123.
When you try to use the syntax ((sum=${var1} + 1 )) you have another problem: when the first digit is 0, bash treats the number as octal, and the digit 8 is not valid in that radix.
You can remove the first 0 with
var1=012345678
echo "${var1#0}"
This only helps with your input, not with 00012.
Any number of leading zeroes can be removed by forcing base 10 with $((10#$var1)); the sum can then be computed and printed:
var1=00012345678
((sum=$((10#$var1)) + 1))
printf "%09d\n" $sum
This can be solved more easily with awk:
var1=00012345678
echo "${var1} 1" |awk '{ printf("%09d\n", $1 + $2) }'
You can avoid the echo with
awk -v var1=$var1 'BEGIN { printf("%09d\n", var1 + 1) }'
The BEGIN is used for parsing without an inputfile.
The option -v is a clean way to use a shell variable inside an awk script.
Do not try to splice the shell variable into the script by breaking out of the quotes; one day it will shoot you in the foot:
# Don't do this
awk 'BEGIN { printf("%09d\n", '${var1}' + 1) }' # Just do not do it
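The strip-then-pad idea can also be sketched in plain POSIX sh, without the bash-specific 10# prefix. This is only an illustration: the function names strip_zeros and increment_padded are made up here, and the default width of 9 is taken from the question.

```shell
#!/bin/sh
# Hypothetical helper: remove every leading zero so the shell does not
# read the value as octal; an all-zero input becomes "0".
strip_zeros() {
    n=$1
    while [ "${n#0}" != "$n" ]; do
        n=${n#0}
    done
    printf '%s\n' "${n:-0}"
}

# Hypothetical helper: add 1 to a zero-padded number and pad the result
# back to a fixed width (second argument, default 9).
increment_padded() {
    width=${2:-9}
    value=$(strip_zeros "$1")
    printf "%0${width}d\n" "$((value + 1))"
}

increment_padded 012345678    # prints 012345679
increment_padded 00012 5      # prints 00013
```

Stripping the zeros before the arithmetic keeps the value out of octal territory, and printf restores the padding afterwards.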

Better string interpolation in R

I need to build up long command lines in R and pass them to system(). I find it is very inconvenient to use paste0/paste function, or even sprintf function to build each command line. Is there a simpler way to do like this:
Instead of this hard-to-read-and-too-many-quotes:
cmd <- paste("command", "-a", line$elem1, "-b", line$elem3, "-f", df$Colum5[4])
or:
cmd <- sprintf("command -a %s -b %s -f %s", line$elem1, line$elem3, df$Colum5[4])
Can I have this:
cmd <- buildcommand("command -a %line$elem1 -b %line$elem3 -f %df$Colum5[4]")
For a tidyverse solution see https://github.com/tidyverse/glue. Example
name="Foo Bar"
glue::glue("How do you do, {name}?")
With version 1.1.0 (CRAN release on 2016-08-19), the stringr package has gained a string interpolation function str_interp() which is an alternative to the gsubfn package.
# sample data
line <- list(elem1 = 10, elem3 = 30)
df <- data.frame(Colum5 = 1:4)
# do the string interpolation
stringr::str_interp("command -a ${line$elem1} -b ${line$elem3} -f ${df$Colum5[4]}")
#[1] "command -a 10 -b 30 -f 4"
This comes pretty close to what you are asking for. When any function f is prefaced with fn$, i.e. fn$f, character interpolation will be performed, replacing `...` (text enclosed in backticks) with the result of running ... as an R expression.
library(gsubfn)
cmd <- fn$identity("command -a `line$elem1` -b `line$elem3` -f `df$Colum5[4]`")
Here is a self contained reproducible example:
library(gsubfn)
# test inputs
line <- list(elem1 = 10, elem3 = 30)
df <- data.frame(Colum5 = 1:4)
fn$identity("command -a `line$elem1` -b `line$elem3` -f `df$Colum5[4]`")
## [1] "command -a 10 -b 30 -f 4"
system
Since any function can be used we could operate directly on the system call like this. We have used echo here to make it executable but any command could be used.
exitcode <- fn$system("echo -a `line$elem1` -b `line$elem3` -f `df$Colum5[4]`")
## -a 10 -b 30 -f 4
Variation
This variation would also work. fn$f also performs substitution of $whatever with the value of variable whatever. See ?fn for details.
with(line, fn$identity("command -a $elem1 -b $elem3 -f `df$Colum5[4]`"))
## [1] "command -a 10 -b 30 -f 4"
Another option would be to use whisker.render from https://github.com/edwindj/whisker which is a {{Mustache}} implementation in R. Usage example:
require(dplyr); require(whisker)
bedFile="test.bed"
whisker.render("processing {{bedFile}}") %>% print
Not really a string interpolation solution, but still a very good option for the problem is to use the processx package instead of system() and then you don't need to quote anything.
Another option is the qq() function from the GetoptLong package:
library(GetoptLong)
str = qq("region = (#{region[1]}, #{region[2]}), value = #{value}, name = '#{name}'")
cat(str)
qqcat("region = (#{region[1]}, #{region[2]}), value = #{value}, name = '#{name}'")
https://cran.r-project.org/web/packages/GetoptLong/vignettes/variable_interpolation.html

How to turn strings on the command line into individual positional parameters

My main question is how to split strings on the command line into parameters using a terminal command in Linux?
For example
on the command line:
./my_program hello world "10 20 30"
The parameters are set as:
$1 = hello
$2 = world
$3 = 10 20 30
But I want:
$1 = hello
$2 = world
$3 = 10
$4 = 20
$5 = 30
How can I do it correctly?
You can reset the positional parameters $@ by using the set builtin. If you do not double-quote $@, the shell will word-split it, producing the behavior you desire:
$ cat my_program.sh
#! /bin/sh
i=1
for PARAM; do
echo "$i = $PARAM";
i=$(( $i + 1 ));
done
set -- $@
echo "Reset \$@ with word-split params"
i=1
for PARAM; do
echo "$i = $PARAM";
i=$(( $i + 1 ));
done
$ sh ./my_program.sh foo bar "baz buz"
1 = foo
2 = bar
3 = baz buz
Reset $@ with word-split params
1 = foo
2 = bar
3 = baz
4 = buz
As an aside, I find it mildly surprising that you want to do this. Many shell programmers are frustrated by the shell's easy, accidental word-splitting — they get "John", "Smith" when they wanted to preserve "John Smith" — but it seems to be your requirement here.
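The re-splitting step can be wrapped in a function as a minimal sketch. The name split_all is hypothetical, and the set -f / set +f pair is an addition not present in the answer above: it stops an argument such as "*" from being expanded into file names while the unquoted $@ is split.

```shell
#!/bin/sh
# Hypothetical helper: re-split every argument on whitespace and print
# the resulting positional parameters one per line.
split_all() {
    set -f      # turn off globbing so a "*" in an argument stays literal
    set -- $@   # unquoted on purpose: the shell word-splits each argument
    set +f      # turn globbing back on
    for param in "$@"; do
        printf '%s\n' "$param"
    done
}

split_all hello world "10 20 30"
```

Called this way, the function sees five parameters instead of three.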
Use xargs:
echo "10 20 30" | xargs ./my_program hello world
xargs is a command on Unix and most Unix-like operating systems used
to build and execute command lines from standard input. Commands such as
grep and awk can accept the standard input as a parameter, or argument
by using a pipe. However, others such as cp and echo disregard the
standard input stream and rely solely on the arguments found after the
command. Additionally, under the Linux kernel before version 2.6.23,
and under many other Unix-like systems, arbitrarily long lists of
parameters cannot be passed to a command,[1] so xargs breaks the list
of arguments into sublists small enough to be acceptable.
(source)
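To see the splitting without a real ./my_program, a small sketch can substitute printf as the command; the choice of printf is an assumption made only so that each resulting argument is visible on its own line.

```shell
#!/bin/sh
# xargs splits its standard input on whitespace and appends the pieces
# as separate arguments after the fixed ones given on its command line.
# printf stands in for ./my_program here so the result is visible.
echo "10 20 30" | xargs printf '%s\n' hello world
```

This prints hello, world, 10, 20 and 30 on five separate lines, showing that the quoted string arrived as three distinct arguments.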

Get specific lines from a text file

I am working on a UNIX box, and trying to run an application, which gives some debug logs to the standard output. I have redirected this output to a log file, but now wish to get the lines where the error is being shown.
My problem here is that a simple
cat output.log | grep FAIL
does not help out. As this shows only the lines which have FAIL in them. I want some more information along with this. Like the 2-3 lines above this line with FAIL. Is there any way to do this via a simple shell command? I would like to have a single command line (can have pipes) to do the above.
grep -C 3 FAIL output.log
Note that this also gets rid of the useless use of cat (UUOC).
grep -A $NUM
This will print $NUM lines of trailing context after matches.
-B $NUM prints leading context.
man grep is your best friend.
So in your case:
cat log | grep -A 3 -B 3 FAIL
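A small self-contained sketch of the context options follows. The log contents are invented for the demonstration, and it assumes a grep that supports -A and -B (GNU and BSD grep both do).

```shell
#!/bin/sh
# Build a small sample log; the file contents are invented for the demo.
log=$(mktemp)
printf '%s\n' 'step 1 ok' 'step 2 ok' 'step 3 FAIL' 'step 4 ok' 'step 5 ok' > "$log"

# -B 2: the two lines before each match plus the matching line itself.
before=$(grep -B 2 FAIL "$log")
# -A 1: the matching line plus one line after it.
after=$(grep -A 1 FAIL "$log")

printf 'With -B 2:\n%s\n' "$before"
printf 'With -A 1:\n%s\n' "$after"
rm -f "$log"
```

With -B 2 the output covers "step 1 ok" through "step 3 FAIL"; with -A 1 it covers "step 3 FAIL" and "step 4 ok".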
I have two implementations of what I call sgrep, one in Perl, one using just pre-Perl (pre-GNU) standard Unix commands. If you've got GNU grep, you've no particular need of these. It would be more complex to deal with forwards and backwards context searches, but that might be a useful exercise.
Perl solution:
#!/usr/perl/v5.8.8/bin/perl -w
#
# @(#)$Id: sgrep.pl,v 1.6 2007/09/18 22:55:20 jleffler Exp $
#
# Perl-based SGREP (special grep) command
#
# Print lines around the line that matches (by default, 3 before and 3 after).
# By default, include file names if more than one file to search.
#
# Options:
# -b n1 Print n1 lines before match
# -f n2 Print n2 lines following match
# -n Print line numbers
# -h Do not print file names
# -H Do print file names
use strict;
use constant debug => 0;
use Getopt::Std;
my(%opts);
sub usage
{
print STDERR "Usage: $0 [-hnH] [-b n1] [-f n2] pattern [file ...]\n";
exit 1;
}
usage unless getopts('hnf:b:H', \%opts);
usage unless @ARGV >= 1;
if ($opts{h} && $opts{H})
{
print STDERR "$0: mutually exclusive options -h and -H specified\n";
exit 1;
}
my $op = shift;
print "# regex = $op\n" if debug;
# print file names if -h omitted and more than one argument
$opts{F} = (defined $opts{H} || (!defined $opts{h} and scalar @ARGV > 1)) ? 1 : 0;
$opts{n} = 0 unless defined $opts{n};
my $before = (defined $opts{b}) ? $opts{b} + 0 : 3;
my $after = (defined $opts{f}) ? $opts{f} + 0 : 3;
print "# before = $before; after = $after\n" if debug;
my @lines = (); # Accumulated lines
my $tail = 0; # Line number of last line in list
my $tbp_1 = 0; # First line to be printed
my $tbp_2 = 0; # Last line to be printed
# Print lines from @lines in the range $tbp_1 .. $tbp_2,
# leaving $leave lines in the array for future use.
sub print_leaving
{
my ($leave) = @_;
while (scalar(@lines) > $leave)
{
my $line = shift @lines;
my $curr = $tail - scalar(@lines);
if ($tbp_1 <= $curr && $curr <= $tbp_2)
{
print "$ARGV:" if $opts{F};
print "$curr:" if $opts{n};
print $line;
}
}
}
# General logic:
# Accumulate each line at end of @lines.
# ** If current line matches, record range that needs printing
# ** When the line array contains enough lines, pop line off front and,
# if it needs printing, print it.
# At end of file, empty line array, printing requisite accumulated lines.
while (<>)
{
# Add this line to the accumulated lines
push @lines, $_;
$tail = $.;
printf "# array: N = %d, last = $tail: %s", scalar(@lines), $_ if debug > 1;
if (m/$op/o)
{
# This line matches - set range to be printed
my $lo = $. - $before;
$tbp_1 = $lo if ($lo > $tbp_2);
$tbp_2 = $. + $after;
print "# $. MATCH: print range $tbp_1 .. $tbp_2\n" if debug;
}
# Print out any accumulated lines that need printing
# Leave $before lines in array.
print_leaving($before);
}
continue
{
if (eof)
{
# Print out any accumulated lines that need printing
print_leaving(0);
# Reset for next file
close ARGV;
$tbp_1 = 0;
$tbp_2 = 0;
$tail = 0;
@lines = ();
}
}
Pre-Perl Unix solution (using plain ed, sed, and sort - though it uses getopt which was not necessarily available back then):
#!/bin/ksh
#
# @(#)$Id: old.sgrep.sh,v 1.5 2007/09/15 22:15:43 jleffler Exp $
#
# Special grep
# Finds a pattern and prints lines either side of the pattern
# Line numbers are always produced by ed (substitute for grep),
# which allows us to eliminate duplicate lines cleanly. If the
# user did not ask for numbers, these are then stripped out.
#
# BUG: if the pattern occurs in the first line or two and
# the number of lines to go back is larger than the line number,
# it fails dismally.
set -- `getopt "f:b:hn" "$@"`
case $# in
0) echo "Usage: $0 [-hn] [-f x] [-b y] pattern [files]" >&2
exit 1;;
esac
# Tab required - at least with sed (perl would be different)
# But then the whole problem would be different if implemented in Perl.
number="'s/^\\([0-9][0-9]*\\) /\\1:/'"
filename="'s%^%%'" # No-op for sed
f=3
b=3
nflag=no
hflag=no
while [ $# -gt 0 ]
do
case $1 in
-f) f=$2; shift 2;;
-b) b=$2; shift 2;;
-n) nflag=yes; shift;;
-h) hflag=yes; shift;;
--) shift; break;;
*) echo "Unknown option $1" >&2
exit 1;;
esac
done
pattern="${1:?'No pattern'}"
shift
case $# in
0) tmp=${TMPDIR:-/tmp}/`basename $0`.$$
trap "rm -f $tmp ; exit 1" 0
cat - >$tmp
set -- $tmp
sort="sort -t: -u +0n -1"
;;
*) filename="'s%^%'\$file:%"
sort="sort -t: -u +1n -2"
;;
esac
case $nflag in
yes) num_remove='s/[0-9][0-9]*://';;
no) num_remove='s/^//';;
esac
case $hflag in
yes) fileremove='s%^$file:%%';;
no) fileremove='s/^//';;
esac
for file in $*
do
echo "g/$pattern/.-${b},.+${f}n" |
ed - $file |
eval sed -e "$number" -e "$filename" |
$sort |
eval sed -e "$fileremove" -e "$num_remove"
done
rm -f $tmp
trap 0
exit 0
The shell version of sgrep was written in February 1989, and bug fixed in May 1989. It then remained unchanged except for an administrative change (SCCS to RCS transition) in 1997 until 2007, when I added the -h option. I switched to the Perl version in 2007.
http://thedailywtf.com/Articles/The_Complicator_0x27_s_Gloves.aspx
You can use sed to print specific lines. Let's say you want line 20:
sed '20 p' -n FILE_YOU_WANT_THE_LINE_FROM
Done.
-n prevents echoing lines from the file. The part in quotes is a sed rule to apply, it specifies that you want the rule to apply to line 20, and you want to print.
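The same sed rule also accepts a line range, so pairing it with grep -n gives a rough way to print context lines without grep's -A/-B/-C options. The sketch below is only an illustration: context_around is a hypothetical name, and it handles just the first match in the file.

```shell
#!/bin/sh
# Hypothetical helper: print the first line matching a pattern together
# with N lines on either side, using only grep -n and sed -n.
# Usage: context_around PATTERN FILE N
context_around() {
    # grep -n prefixes each match with "LINENO:"; keep the first number.
    n=$(grep -n -- "$1" "$2" | head -n 1 | cut -d: -f1)
    [ -n "$n" ] || return 1          # no match: nothing to print
    start=$((n - $3))
    [ "$start" -ge 1 ] || start=1    # clamp at the top of the file
    end=$((n + $3))
    sed -n "${start},${end}p" "$2"   # a range past the last line is fine
}

# Demo on an invented ten-line file with the match on line 6.
demo=$(mktemp)
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -eq 6 ]; then echo "line $i FAIL"; else echo "line $i"; fi
    i=$((i + 1))
done > "$demo"
context_around FAIL "$demo" 2
rm -f "$demo"
```

The demo prints lines 4 through 8 of the sample file. Clamping the start at 1 avoids the failure mode described in the BUG comment of the ksh script above, where the match sits too close to the top of the file.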
With GNU grep on Windows:
$ grep --context 3 FAIL output.log
$ grep --help | grep context
-B, --before-context=NUM print NUM lines of leading context
-A, --after-context=NUM print NUM lines of trailing context
-C, --context=NUM print NUM lines of output context
-NUM same as --context=NUM

Resources