Unix awk Substring string comparison - unix

I want to check whether a substring is contained in a string using the Unix awk command.
E.g., in pseudocode:
a= commandline
b=line
if(b is contained in a)
print "success "

$ awk 'BEGIN{a="commandline";b="line";if (a ~ b){print "success"}}'
success
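Note that `a ~ b` treats b as a regular expression, so characters such as `.` or `*` in b act as metacharacters. If a literal substring test is what is wanted, index() may be a better fit. The following is only a minimal sketch; the function name `contains` is made up for illustration, and the strings are passed through ARGV so that awk does not process backslash escapes in them.

```shell
# contains HAYSTACK NEEDLE: print "success" if NEEDLE occurs literally in HAYSTACK.
# "contains" is a hypothetical helper name, not part of awk.
contains() {
    awk 'BEGIN {
        a = ARGV[1]; b = ARGV[2]
        # index() returns the 1-based position of b in a, or 0 if b is absent
        if (index(a, b) > 0) print "success"
    }' "$1" "$2"
}

contains "commandline" "line"    # prints: success
contains "commandline" "l.ne"    # prints nothing: "." is not a wildcard for index()
```

With `a ~ b` the second call would also report success, because `l.ne` matches `line` as a regular expression.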

Related

Match function in unix to find if string ends with particular input value?

I need a regular expression that I can use in the match() function to check whether the value given as a command-line argument occurs at the end of a given string.
I am using awk and trying to use the match function to get this result:
while read line; do
cat $line | awk -v value="$2.$" '{ if ( match("$1,value) != 0 ) print match("arvind","ind.$") " " "arvind" }'
done < xreffilelist.txt
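One possible way to test for a suffix, sketched here under the assumption that a literal (non-regex) comparison is acceptable, is to compare the tail of the string with substr() instead of building a pattern for match(). The function name `ends_with` is hypothetical, and this is not a drop-in replacement for the loop above.

```shell
# ends_with STRING SUFFIX: print "match" if STRING ends with SUFFIX, else "no match".
# "ends_with" is a made-up helper name used only for this sketch.
ends_with() {
    awk 'BEGIN {
        s = ARGV[1]; suffix = ARGV[2]
        n = length(suffix)
        # compare the last n characters of s with the suffix
        if (n <= length(s) && substr(s, length(s) - n + 1) == suffix)
            print "match"
        else
            print "no match"
    }' "$1" "$2"
}

ends_with "arvind" "ind"    # prints: match
ends_with "arvind" "arv"    # prints: no match
```

If a regular expression is really needed, the end-of-string anchor is `$` appended directly to the pattern; a `.` before it matches one arbitrary character, which is probably not intended.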

jq: How to output quotes on raw output on windows

Using raw output I have to quote some values of the output.
echo [{"a" : "b"}] | jq-win64.exe --raw-output ".[] | \"Result is: \" + .a + \".\""
generates
Result is: b.
but how can I generate
Result is: "b".
Unfortunately it has to run on Windows called from inside a CMD file.
You need to escape the backslash that escapes each inner ", so that \" reaches jq:
$ echo [{"a" : "b"}] | jq-win64.exe --raw-output ".[] | \"Result is: \\\"\" + .a + \"\\\".\""
Result is: "b".
A hacky workaround with less backslashing could be:
jq -r ".[] | \"Result is: \" + (.a|tojson) + \".\""
[REVISED to reflect OP goal.]
Since you're trying to output double quotes in a double quoted string, you need to escape the inner quotes. And to escape the inner quotes, you need to also escape the escaping backslashes. So a literal double quote would have to be entered as \\\". You can do this a little cleaner by using string interpolation instead of regular string concatenation.
jq -r ".[] | \"Result is: \\\"\(.a)\\\".\""

Find the number of occurrences of a word using awk and a variable

I am trying to find the number of fields that contain the word entered by the user. I do not know the syntax for using the variable in my awk statement. If I just use a literal string, as in $i == "Washington", it works, but I need it to use the input. When I try this it returns nothing:
read Choice
awk '{
for (i=1;i<=NF;i++)
if ( $i == "$Choice")
c++
}
END{
print c}' DC_Area.csv
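The shell does not expand variables inside the single-quoted awk program, so awk compares each field with the literal string "$Choice". That is why the code in the question prints nothing. Use the -v option to pass the shell variable to awk: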
read Choice
awk -v Choice="$Choice" '{
for (i=1;i<=NF;i++)
if ( $i == Choice)
c++
}
END{
print c}' DC_Area.csv
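To see the -v approach in action without the original DC_Area.csv, here is a small sketch with made-up sample data (whitespace-separated, since the command uses awk's default field separator). The function name `count_fields` is hypothetical. It also prints `c + 0`, so that a count of zero shows up as 0 rather than an empty line.

```shell
# count_fields WORD FILE: count the fields in FILE that are exactly equal to WORD.
# "count_fields" and the sample data below are assumptions for illustration.
count_fields() {
    awk -v Choice="$1" '
        { for (i = 1; i <= NF; i++) if ($i == Choice) c++ }
        END { print c + 0 }   # c + 0 forces a number, so no match prints 0
    ' "$2"
}

tmp=$(mktemp)
printf '%s\n' 'Washington Arlington' 'Bethesda Washington' > "$tmp"
count_fields "Washington" "$tmp"    # prints: 2
count_fields "Fairfax" "$tmp"       # prints: 0
rm -f "$tmp"
```

For a real comma-separated file, a field separator such as `-F,` would probably also be needed.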

How do I fetch this substring using awk?

I have a string let's say
k=CHECK_${SOMETHING}_CUSTOM_executable.acs
Now I want to fetch only CUSTOM_executable from the above string. This is what I have tried so far in Unix
echo $k|awk -F '_' '{print $2}'
Can you explain how I can do this?
Try this :
$ echo "$k"
CHECK_111_CUSTOM_executable.acs
code:
echo "$k" | awk 'BEGIN{FS=OFS="_"}{sub(/\.acs$/, "");print $3, $4}'
Assume the variable ${SOMETHING} has the value SOMETHING just for simplicity.
The following assignment, therefore,
k=CHECK_${SOMETHING}_CUSTOM_executable.acs
sets the value of k to CHECK_SOMETHING_CUSTOM_executable.acs.
When this string is split into fields on _ by awk -F '_' (note that the single quotes aren't necessary here), you get the following fields:
$ echo "$k" | awk -F _ '{for (i=0; i<=NF; i++) {print i"="$i}}'
0=CHECK_SOMETHING_CUSTOM_executable.acs
1=CHECK
2=SOMETHING
3=CUSTOM
4=executable.acs
So to get the output you want, strip the .acs extension and print the third and fourth fields:
echo "$k" | awk -F _ -v OFS=_ '{sub(/\.acs$/, ""); print $3,$4}'
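If the SOMETHING variable itself contains underscores, e.g. 111_222_333 or 111_222_333_444, count the fields from the end instead. This prints the last two fields, with the extension still attached:
$ k=CHECK_${SOMETHING}_CUSTOM_executable.acs
$ echo "$k" | awk 'BEGIN{FS=OFS="_"}{ print $(NF-1),$NF }'
(Or)
echo "$k" | awk -F_ '{ print $(NF-1), $NF }' OFS=_
Explanation:
NF - The number of fields in the current input record, so $NF is the last field and $(NF-1) is the one before it.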
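A minimal sketch of the NF-based approach that also removes the extension, assuming the wanted result is always the last two underscore-separated fields minus whatever follows the final dot. The function name `last_two` and the sample values are made up for illustration.

```shell
# last_two STRING: print the last two _-separated fields of STRING without the extension.
# "last_two" is a hypothetical helper name.
last_two() {
    printf '%s\n' "$1" | awk -F_ -v OFS=_ '{
        sub(/\.[^.]*$/, "", $NF)   # drop ".ext" from the last field only
        print $(NF-1), $NF
    }'
}

last_two "CHECK_111_CUSTOM_executable.acs"            # prints: CUSTOM_executable
last_two "CHECK_111_222_333_CUSTOM_executable.acs"    # prints: CUSTOM_executable
```

Because the fields are counted from the right, the number of underscores inside SOMETHING does not matter.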
Try this simple awk:
awk -F'[._]' '{print $3"_"$4}' <<<"$k"
CUSTOM_executable
The -F'[._]' option makes both the dot and the underscore field separators (the quotes keep the shell from treating the brackets as a glob pattern). awk then prints fields 3 and 4 of $k.
If k contains a value such as k='CHECK_${111_111}_CUSTOM_executable.acs', use fields $4 and $5 instead:
awk -F'[._]' '{print $4"_"$5}' <<<"$k"
The fields are then split as follows:
$1=CHECK  $2=${111  $3=111}  $4=CUSTOM  $5=executable  $6=acs
You do not need awk; this can be done easily in bash. I assume that $SOMETHING does not contain any _ characters (and that the CUSTOM and executable parts are plain text that contain no _ either). Then:
k=CHECK_${SOMETHING}_CUSTOM_executable.acs
l=${k#*_}; l=${l#*_}; l=${l%.*};
This removes everything from the beginning up to and including the second _ character, and strips everything from the last . character onward. The result is stored in the shell variable l.
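To make the three expansions easier to follow, here is the same sequence traced step by step. The value 111 for SOMETHING is only an assumed example.

```shell
# Trace of the parameter expansions, assuming SOMETHING=111 (made-up value).
k=CHECK_111_CUSTOM_executable.acs
l=${k#*_}    # shortest prefix ending in _ removed: 111_CUSTOM_executable.acs
l=${l#*_}    # removed once more:                   CUSTOM_executable.acs
l=${l%.*}    # shortest suffix starting at . removed: CUSTOM_executable
echo "$l"    # prints: CUSTOM_executable
```

These expansions are plain POSIX shell features, so the sequence is not limited to bash.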
If $SOMETHING may contain _, a little more work is needed (I assume the CUSTOM and executable parts do not contain _):
k=CHECK_${SOMETHING}_CUSTOM_executable.acs
l=${k%_*}; l=${l%_*}; l=${k#${l}_*}; l=${l%.*};
This strips everything from the second-to-last _ character onward, then removes that remaining prefix from the original string. The last statement strips the extension. The result is in the shell variable l.
Or it can be done using regex:
[[ $k =~ ([^_]+_[^_]+)\.[^.]+$ ]] && l=${BASH_REMATCH[1]}
This matches two words separated by _ and followed by .<extension> at the end of the string. The extension is dropped and the result is stored in the shell variable l.
I hope this helps!

AWK : Comparing difference of files with different delimiter

1.txt
1|2|3
4|5|6
7|3|6
2.txt (double pipe)
1||2||3
4||5||6
expected
7|3|6
I want to compare 1.txt and 2.txt and print the difference. Note that the number of columns can vary each time.
awk -F"|" 'NR==FNR{a[$0]++;next} !(a[$0])' 2.txt 1.txt
How can I modify the code to handle the different delimiter in each file?
The code below works for the first field alone, but I am not sure how it separates the fields at the double pipe:
awk -F"|" 'NR==FNR{a[$1]++;next} !(a[$1])' 2.txt 1.txt
One simple workaround would be to squeeze the double delimiters in the second file before feeding to awk:
awk -F"|" 'NR==FNR{a[$0]++;next} !(a[$0])' <(tr -s '|' < 2.txt) 1.txt
For your sample input, it'd produce:
7|3|6
EDIT: You assert that
awk -F"|" 'NR==FNR{a[$1]++;next} !(a[$1])' 2.txt 1.txt
works. It doesn't do what you expect. It compares only the first field and not the entire line.
You can use this awk command:
awk -F"|" 'NR==FNR{gsub(/\|\|/,"|",$0);a[$0]++;next} !(a[$0])' 2.txt 1.txt
I typically use bash features to accomplish this:
diff 1.txt <(sed 's/||/|/g' < 2.txt)
In gawk you can use a regular expression as the field separator. If you don't mind the output being unordered (awk arrays are traversed in an unspecified order), you can do it with a single command:
gawk 'BEGIN {FS="\\|\\|*"} {gsub(FS,"|") ; a[$0]++} END {for (k in a) {if ( a[k] == 1 ) { print k } } }' 1.txt 2.txt
BEGIN {FS="\\|\\|*"} ==> The field separator is one or more |
{gsub(FS,"|") ; a[$0]++} ==> On every line, collapse each run of | separators to a single | and store the line in an array, or, if it is already in the array, increment its count
END {for (k in a) {if ( a[k] == 1 ) { print k } } } ==> Finally, print every line that was seen exactly once, i.e. the lines that appear in only one of the files.
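The answers above can be combined into one order-preserving variant. This is only a sketch under the assumption that collapsing every run of | into a single | is safe for the data (i.e. there are no intentionally empty fields). The function name `only_in_first` is hypothetical, and the sample files are recreated in a temporary directory.

```shell
# Recreate the sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' '1|2|3' '4|5|6' '7|3|6' > "$dir/1.txt"
printf '%s\n' '1||2||3' '4||5||6'     > "$dir/2.txt"

# only_in_first FILE1 FILE2: print lines of FILE1 that do not occur in FILE2,
# after collapsing runs of | in both files. Hypothetical helper name.
only_in_first() {
    awk '
        { gsub(/[|]+/, "|") }          # normalize the delimiter on every line
        NR == FNR { seen[$0]; next }   # first file on the command line: remember lines
        !($0 in seen)                  # second file: print lines not remembered
    ' "$2" "$1"
}

result=$(only_in_first "$dir/1.txt" "$dir/2.txt")
echo "$result"    # prints: 7|3|6
rm -rf "$dir"
```

Unlike the array-traversal version, this prints the differing lines in the order in which they appear in the first file.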
