Get the first letter of a make variable - gnu-make

Is there a better way to get the first character of a GNU make variable than
FIRST=$(shell echo $(VARIABLE) | head -c 1)
(which is not only unwieldy but also calls the external shell)?

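This is pretty horrible, but at least it doesn't invoke the shell:
# variable minus the first char
$(eval REMAINDER := $$$(VAR))
# variable minus that
FIRST := $(subst $(REMAINDER),,$(VAR))
The comments sit on their own lines because a comment after an assignment leaves the spaces before the # in the value, which would give FIRST a trailing space. The trick is that $$$(VAR) hands eval the text REMAINDER := $hello when VAR is hello; make reads $h as a reference to the one-character variable h, which is normally undefined and expands to nothing, so REMAINDER becomes ello.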
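The subst step can be checked with a small Python model. This is an illustration of the logic, not make itself, and first_letter is a hypothetical name. It assumes, as above, that the one-character variable named after the first character is undefined. Because $(subst) removes every occurrence of REMAINDER, a value such as aa, whose remainder a also appears at the start, comes out empty.

```python
def first_letter(value: str) -> str:
    """Model of FIRST := $(subst $(REMAINDER),,$(VAR)) for a given VAR value."""
    # Assumption: "$" followed by value[0] expands to nothing,
    # so REMAINDER is everything after the first character.
    remainder = value[1:]
    if not remainder:
        # one-character value: there is nothing to remove
        return value
    # $(subst from,to,text) replaces every occurrence of from, like str.replace
    return value.replace(remainder, "")


for v in ["hello", "x", "abab", "aa"]:
    print(repr(v), "->", repr(first_letter(v)))
# 'hello' -> 'h'
# 'x' -> 'x'
# 'abab' -> 'a'
# 'aa' -> ''   (the remainder also matched the first character)
```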

The GNU Make Standard Library provides a substr function (include gmsl in your makefile first):
substr
Arguments: 1: A string
           2: Start offset (first character is 1)
           3: Ending offset (inclusive)
Returns:   Returns a substring
I haven't tested it, but $(call substr,$(VARIABLE),1,1) should work.
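The offsets documented above are 1-based and inclusive. A tiny Python model of that contract (not GMSL's implementation; out-of-range offsets are not covered) shows how it maps onto a 0-based slice:

```python
def substr(s: str, start: int, end: int) -> str:
    """Model of the documented substr contract: 1-based offsets, end inclusive."""
    # 1-based inclusive [start, end] maps to the 0-based, end-exclusive slice [start-1:end]
    return s[start - 1:end]


print(substr("hello", 1, 1))  # h
print(substr("hello", 2, 4))  # ell
```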

Since I came across this in my own search and didn't find what I was looking for, here is what I ended up using to parse a hex number; it can be applied to any known set of characters:
letters := 0 1 2 3 4 5 6 7 8 9 a b c d e f
nextletter = $(strip $(foreach v,$(letters),$(word 2,$(filter $(1)$(v)%,$(2)) $v)))
then
INPUT := 40b3
firstletter := $(call nextletter,,$(INPUT))
secondletter := $(call nextletter,$(firstletter),$(INPUT))
thirdletter := $(call nextletter,$(firstletter)$(secondletter),$(INPUT))
etc.
It's ugly, but it doesn't depend on the shell.
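How nextletter picks a character: for each candidate v, $(filter $(1)$(v)%,$(2)) returns the input only if it starts with the prefix followed by v, and $(word 2,... $v) then yields v; otherwise the list has a single word and word 2 is empty. A Python model of that logic, for illustration only (the function mirrors the make variable's name but is not make):

```python
from typing import Sequence

LETTERS = tuple("0 1 2 3 4 5 6 7 8 9 a b c d e f".split())


def nextletter(prefix: str, text: str, letters: Sequence[str] = LETTERS) -> str:
    """Model of $(call nextletter,prefix,text): the letter that follows prefix in text."""
    # $(filter $(1)$(v)%,$(2)) is non-empty only when text starts with prefix+v;
    # $(word 2,<that> $v) then yields v, otherwise it yields nothing.
    # $(strip $(foreach ...)) joins the results with spaces and trims them.
    return " ".join(v for v in letters if text.startswith(prefix + v)).strip()


INPUT = "40b3"
first = nextletter("", INPUT)
second = nextletter(first, INPUT)
third = nextletter(first + second, INPUT)
print(first, second, third)  # 4 0 b
```

Once the prefix is the whole input, no candidate matches and the result is empty, which gives a natural stopping condition.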

Related

csplit in zsh: splitting file based on pattern

I would like to split the following file based on the pattern ABC:
ABC
4
5
6
ABC
1
2
3
ABC
1
2
3
4
ABC
8
2
3
to get file1:
ABC
4
5
6
file2:
ABC
1
2
3
etc.
Looking at the docs of man csplit: csplit my_file /regex/ {num}.
I can split this file using csplit my_file '/^ABC$/' {2}, but this requires me to put in a number for {num}. When I try {*}, which is supposed to repeat the pattern as many times as possible, I get the error:
csplit: *}: bad repetition count
I am using zsh.
To split a file on a pattern like this, I would turn to awk:
awk 'BEGIN { i=0; }
/^ABC/ { ++i; }
{ print >> ("file" i) }' < input
This reads lines from the file named input; before reading any lines, the BEGIN section explicitly initializes an "i" variable to zero; variables in awk default to zero, but it never hurts to be explicit. The "i" variable is our index to the serial filenames.
Subsequently, each line that starts with "ABC" will increment this "i" variable.
Any and every line in the file will then be printed (in append mode) to the file name that's generated from the text "file" and the current value of the "i" variable.
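The same grouping rule, sketched in Python for comparison (split_on_pattern is a hypothetical name). To keep the sketch testable it collects the chunks in a dictionary instead of writing files; appending each line to open(name, "a") would reproduce the awk behaviour.

```python
import re
from typing import Dict, Iterable, List


def split_on_pattern(lines: Iterable[str], pattern: str = r"^ABC") -> Dict[str, List[str]]:
    """Group lines like the awk script: file1, file2, ... (file0 for lines before the first match)."""
    regex = re.compile(pattern)
    chunks: Dict[str, List[str]] = {}
    i = 0  # plays the role of awk's i
    for line in lines:
        if regex.search(line):
            i += 1  # every line starting with ABC opens the next chunk
        chunks.setdefault(f"file{i}", []).append(line)
    return chunks


data = "ABC 4 5 6 ABC 1 2 3 ABC 1 2 3 4 ABC 8 2 3".split()
for name, chunk in split_on_pattern(data).items():
    print(name, chunk)
```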

Loop and process over blocks of lines between two patterns in awk?

This is actually a continued version of this question.
I have a file
1
2
PAT1
3 - first block
4
PAT2
5
6
PAT1
7 - second block
PAT2
8
9
PAT1
10 - third block
and I use awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag'
to extract the blocks of lines.
Extracting them works OK, but I'm trying to iterate over these blocks one by one and do some processing with each block (e.g. save it to a file, process it with other scripts, etc.).
How can I construct such a loop?
The problem is not very clear, but you could do something like this:
awk '/PAT1/ {
flag = 1
++n
s = ""
next
}
/PAT2/ {
flag = 0
printf "Processing record # %d =>\n%s", n, s
}
flag {
s = s $0 ORS
}' file
Processing record # 1 =>
3 - first block
4
Processing record # 2 =>
7 - second block
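If the loop is meant to live in a general-purpose language, the same flag logic fits in a generator. This is a Python sketch, not part of either answer; blocks is a hypothetical name, and PAT1/PAT2 are treated as regular expressions. Like the awk script, it restarts a block at every PAT1 and skips the third block, which has no closing PAT2.

```python
import re
from typing import Iterable, Iterator, List, Optional


def blocks(lines: Iterable[str], start: str = "PAT1", end: str = "PAT2") -> Iterator[List[str]]:
    """Yield the lines strictly between a start line and the next end line."""
    start_re, end_re = re.compile(start), re.compile(end)
    current: Optional[List[str]] = None  # None means we are outside a block
    for line in lines:
        if start_re.search(line):
            current = []  # (re)start a block, as awk's /PAT1/ rule does
        elif end_re.search(line):
            if current is not None:
                yield current  # a complete block: hand it to the caller
            current = None
        elif current is not None:
            current.append(line)
    # a block still open at end of input is dropped


text = """1 2 PAT1 3_-_first_block 4 PAT2 5 6 PAT1 7_-_second_block PAT2 8 9 PAT1 10_-_third_block"""
for n, block in enumerate(blocks(text.split()), 1):
    print(f"Processing record # {n} => {block}")
```

Each yielded list can then be written to its own file or piped to another script inside the for loop; with a real file, pass (line.rstrip("\n") for line in open("file")) instead of the sample list.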
This might work for you (GNU sed):
sed -ne '/PAT1/!b;:a;N;/PAT2/!ba;e echo process:' -e 's/.*/echo "&"|wc/pe;p' file
Gather up the lines between PAT1 and PAT2 and process the collection.
In the example above, the literal process: is printed.
The command to print the result of the wc command for the collection is built and printed.
The result of the evaluation of the above command is printed.
N.B. The position of the p flag in the substitution command is critical. If p comes before the e flag, the pattern space is printed before the evaluation; if it comes after the e flag, the pattern space is printed after the evaluation.

Split one line into multiple lines using number of bytes

I want to split a single line into multiple lines of 8 bytes each. I am using the fold command, but since this file contains special characters, fold breaks in the middle of a multibyte character.
File Content
あいbbえおかcc髙①こさし㈱㈱ちつて髙aabbc
Command Used
fold -b8 dummy_file.dat
Appreciate any help on this.
The problem here is that your text contains multibyte characters, which the fold command breaks when it splits them across two lines.
echo "あいbbえおかcc髙①こさし㈱㈱ちつて髙aabbc" | fold -b8
あいbb
えお��
�cc髙��
�こさ�
��㈱㈱
ちつ��
�髙aabb
c
If you want to have 8 characters per line you can use the following sed command:
echo "あいbbえおかcc髙①こさし㈱㈱ちつて髙aabbc" | sed 's/.\{8\}/&\n/g'
あいbbえおかc
c髙①こさし㈱㈱
ちつて髙aabb
c
This adds a line break after every 8 characters (it relies on . matching one whole character, as GNU sed does in a UTF-8 locale).
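For comparison, the character-based split that the sed command performs is a slice loop in Python, assuming the text has already been decoded from UTF-8 (split_chars is a hypothetical name):

```python
from typing import List


def split_chars(text: str, width: int = 8) -> List[str]:
    """Split text into pieces of `width` characters (not bytes), like the sed command."""
    return [text[i:i + width] for i in range(0, len(text), width)]


for piece in split_chars("あいbbえおかcc髙①こさし㈱㈱ちつて髙aabbc"):
    print(piece)
```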
If you do not want 8 characters per line but want each line to be at most 8 bytes without breaking any character, you can use this Python 3 script (max8bytes.py):
import sys
def utf8len(s):
    # number of bytes s takes up when encoded as UTF-8
    return len(s.encode('utf-8'))
entry = sys.stdin.buffer.read().decode('utf-8')
tmp = ''
for c in entry:
    if utf8len(tmp) + utf8len(c) > 8:
        # c would push the line past 8 bytes: flush it and start a new line with c
        print(tmp)
        tmp = c
    elif utf8len(tmp) + utf8len(c) == 8:
        # c fills the line exactly
        print(tmp + c)
        tmp = ''
    else:
        tmp += c
if tmp:
    print(tmp)
output:
echo -n "あいbbえおかcc髙①こさし㈱㈱ちつて髙aabbc" | python3 max8bytes.py
あいbb
えお
かcc髙
①こ
さし
㈱㈱
ちつ
て髙aa
bbc
Explanations:
The utf8len function counts how many bytes a string takes up in UTF-8.
The script reads stdin character by character and starts a new line whenever the next character would push the current line past 8 bytes. If you need every line to be exactly 8 bytes, you can pad the shorter lines with spaces.
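The padding suggestion can be folded into a reusable function. This is a sketch of the same greedy byte-limited split with an optional pad step; utf8_chunks and its pad parameter are hypothetical names, not part of the script above.

```python
from typing import List


def utf8_chunks(text: str, limit: int = 8, pad: bool = False) -> List[str]:
    """Split text into lines of at most `limit` UTF-8 bytes, never splitting a character."""
    lines: List[str] = []
    current, size = "", 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        if current and size + n > limit:
            # ch does not fit: close the current line and start a new one
            lines.append(current)
            current, size = "", 0
        current += ch
        size += n
    if current:
        lines.append(current)
    if pad:
        # pad with spaces so that every line is exactly `limit` bytes long
        lines = [line + " " * (limit - len(line.encode("utf-8"))) for line in lines]
    return lines


for line in utf8_chunks("あいbbえおかcc髙①こさし㈱㈱ちつて髙aabbc", pad=True):
    print(repr(line))
```

Padding counts bytes rather than characters, so str.ljust (which pads to a character count) would not give the same result here.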

Extract data before and after matching (BIG FILE)

I have got a big file (around 80K lines).
My main goal is to find the patterns and print, for example, 10 lines before and 10 lines after the pattern.
The pattern occurs multiple times across the file.
Using the grep command:
grep -i <my_pattern>* -B 10 -A 10 <my_file>
I get only some of the data; I think it must be something related to the buffer size.
I need a command (grep, sed, awk) that will handle all the matches
and print 10 lines before and after the pattern.
Example:
My patterns hide here:
a
b
c
pattern_234
c
b
a
a
b
c
pattern_567
c
b
a
This happens multiple times across the file.
Running this command:
grep -i pattern_* -B 3 -A 3 <my_file>
will get the right output:
a
b
c
c
b
a
a
b
c
c
b
It works, but not every time:
if I have 80 patterns, not all 80 are shown.
awk to the rescue
awk -v n=4 '   # n = number of context lines
{
    # keep the previous n lines in p[1..n] and the current line in p[n+1]
    for (i = 1; i <= n; i++)
        p[i] = p[i+1]
    p[n+1] = $0
}
/pattern/ {    # the action brace must be on the same line as the pattern
    c = n + 1  # counter: the match line itself plus the n lines after it
    for (i = 1; i <= n; i++)   # print the saved lines before the match
        print p[i]
}
c-- > 0        # print the current line while the counter is positive
' my_file
This prints 4 lines before and 4 lines after each line matching pattern; change the value of n as needed.
If the before and after counts differ, you need two variables:
awk -vb=2 -va=3 '{for(i=1;i<=b;i++) p[i]=p[i+1];p[b+1]=$0} /pattern/{c=a+1;for(i=1;i<=b;i++) print p[i]} c-->0'
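For comparison, here is the same before/after idea as a Python sketch with separate before and after counts (context_lines is a hypothetical helper). Unlike the awk scripts, it only buffers lines that have not been printed yet, so matches that sit close together do not reprint lines, and nothing is printed for context that does not exist at the top of the file.

```python
import re
from collections import deque
from typing import Iterable, Iterator


def context_lines(lines: Iterable[str], pattern: str, before: int, after: int) -> Iterator[str]:
    """Yield each line matching `pattern` plus up to `before`/`after` lines around it."""
    regex = re.compile(pattern)
    window = deque(maxlen=before)  # the last `before` lines that were not printed
    remaining = 0                  # trailing-context lines still to print
    for line in lines:
        if regex.search(line):
            yield from window      # leading context
            window.clear()
            yield line
            remaining = after
        elif remaining > 0:
            yield line
            remaining -= 1
        else:
            window.append(line)


sample = "a b c pattern_234 c b a a b c pattern_567 c b a".split()
print(list(context_lines(sample, r"pattern_", 3, 3)))
```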

Grep examples - can't understand

Given the following commands:
ls | grep ^b[^b]*b[^b]
ls | grep ^b[^b]*b[^b]*
I know that ^ marks the start of the line, but can anyone give me a brief explanation about
these commands? What do they do, step by step?
Thanks!
^ can mean two things:
mark the beginning of a line
or it negates the character set (within [])
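So, the pattern means:
lines starting with 'b'
followed by any number (0+) of characters other than 'b'
followed by another 'b'
followed by a character that is not 'b' (first command) or by any number of non-'b' characters, possibly none (second command)
The second command will match
bb
bzzzzzb
bzzzzzbzzzzzzz
while the first matches only bzzzzzbzzzzzzz, because it needs a non-'b' character right after the second 'b'.
Neither command matches
zzzzbb
bzzzzzxzzzzzz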
1) Starts with b, continues with 0 or more characters which are not b, then a b, and then a character which is not b.
2) Starts with b, continues with 0 or more characters which are not b, then a b, and then 0 or more characters which are not b. Since neither pattern ends with $, whatever follows the matched part does not matter, and because [^b]* can match nothing, 2) matches any name that starts with b and contains a second b.
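The two patterns can be tried directly, since ^, [^b] and * mean the same in grep's basic regular expressions and in Python's re module. grep prints a line if the pattern matches anywhere in it, which corresponds to re.search rather than a full match. The helper name matches is hypothetical.

```python
import re

FIRST = re.compile(r"^b[^b]*b[^b]")    # ls | grep ^b[^b]*b[^b]
SECOND = re.compile(r"^b[^b]*b[^b]*")  # ls | grep ^b[^b]*b[^b]*


def matches(regex, names):
    """Return the names grep would print for this pattern (search anywhere, not full match)."""
    return [name for name in names if regex.search(name)]


names = ["bb", "bzzzzzb", "bzzzzzbzzzzzzz", "zzzzbb", "bzzzzzxzzzzzz"]
print(matches(FIRST, names))   # ['bzzzzzbzzzzzzz']
print(matches(SECOND, names))  # ['bb', 'bzzzzzb', 'bzzzzzbzzzzzzz']
```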
