Assign awk output to awk variable - unix

How can I call an awk script from an awk script and assign the output of the first to a variable in the second?
I have an awk script that reads files each night, checks each line of data and writes to a new file. I need to add in some additional formatting to one of the fields. I already have a standalone awk script that does the formatting so all I need to do is call this awk script for the appropriate fields and assign the value that is normally printed to a variable.
To put it in context, the following prints the required formatting to the screen (because that's what title_case.awk does), but I can’t use the value for further processing.
print old_name | ("/production/bin/title_case.awk")
so I need something like the following:
new_name = old_name | ("/production/bin/title_case.awk")
Thanks,
Ger

You can try using getline into a variable (http://www.gnu.org/software/gawk/manual/gawk.html#Getline_002fVariable_002fPipe):
("/production/bin/title_case.awk "old_name) | getline new_name
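A minimal, runnable sketch of this pattern. Since title_case.awk isn't shown here, `tr a-z A-Z` stands in for the external formatting script:

```shell
#!/bin/sh
# "tr a-z A-Z" is a stand-in for the (not shown) /production/bin/title_case.awk.
echo "ger" | awk '{
    old_name = $1
    # Build a command string that feeds the value through the
    # external command, then read its one line of output into
    # new_name via getline.
    cmd = "echo " old_name " | tr a-z A-Z"
    cmd | getline new_name
    close(cmd)                  # close so the command can be reused
    print new_name
}'
```

Note that splicing a field into a command string like this is only safe for trusted input; for arbitrary data, write the value to the command's stdin instead.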

Related

How to encrypt every name in a list ZSH scripting using a for loop

I'm new to zsh scripting and I was wondering if it's possible to use the sha256sum function to encrypt every value in a list.
Here is what I have tried so far:
#!/bin/zsh
filenames=`cat filenames.txt`
output='shaNames.txt'
for name in $filenames
do
echo -n $name | sha256sum >> $output
done
What I'm trying to accomplish is to encrypt every name in the list and append it to a new text file.
Any suggestions on what I am doing wrong are appreciated.
You are assigning the output of cat filenames.txt to a single multiline string variable, so the for loop only iterates once, over the whole content.
What you want to do instead is e.g.:
for name in $(cat filenames.txt)
do
echo -n "$name" | sha256sum >> "$output"
done
Note that while you can still use them, backticks are deprecated in favor of $(somecommand).
Also note that you should always put variables in double quotes, as they could contain spaces.
Your method would fail anyway if a line of your text file contained a space.
You could use the following instead:
while IFS= read -r name
do
echo -n "$name" | sha256sum >> "$output"
done < filenames.txt
To anyone who might need the same: what I was doing wrong was assigning the values in the file to a single string variable instead of an array.
To correct that, one must use:
filenames=(`cat filenames.txt`)
The parentheses indicate that an array is stored in the filenames variable.
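A self-contained sketch of the robust line-by-line loop, with a sample filenames.txt created inline (assumes GNU coreutils' sha256sum; the second name contains a space on purpose):

```shell
#!/bin/sh
# Sample input, standing in for the real filenames.txt.
printf '%s\n' "file one.txt" "file2.txt" > filenames.txt
output="shaNames.txt"
: > "$output"                       # start with an empty output file
# Read line by line; unlike looping over $(cat ...), this keeps
# lines containing spaces intact.
while IFS= read -r name; do
    printf '%s' "$name" | sha256sum >> "$output"
done < filenames.txt
wc -l < "$output"                   # one hash line per input name
```

`printf '%s'` is used instead of `echo -n` because `echo -n` is not portable across shells.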

Calling a function from awk with variable input location

I have a bunch of different files, all using "|" as the delimiter. All files contain a column titled CARDNO, but not necessarily in the same position in each file. I have a function called data_mask that I want to apply to CARDNO in all of the files, changing the column name to NEWCARDNO.
I know that if I pass in the column number of CARDNO I can do this pretty simply, say it's the 3rd column in a 5 column file with something like:
awk -v column=$COLNUMBER '{print $1, $2, FUNCTION($column), $4, $5}' FILE
However, if all of my files have hundreds of columns and it's somewhere arbitrary in each file, this is incredibly tedious. I am looking for a way to do something along the lines of this:
awk -v column=$COLNUMBER '{print #All columns before $column, FUNCTION($column), #All columns after $column}' FILE
My function takes a string as input and returns a new one; it takes the value of the column as input, not the column number. Please suggest a Unix command that can pass the column value to the function and produce the desired output.
Thanks in advance
If I understand your problem correctly, the first row of the file is a header and one of those columns is named CARDNO. If that is the case, you can just search the header for that name and process accordingly:
awk 'BEGIN{FS=OFS="|";c=1}
(NR==1){while($c != "CARDNO" && c<=NF) c++
if(c>NF) exit
$c="NEWCARDNO" }
(NR!=1){$c=FUNCTION($c)}
{print}' <file>
As per the comment, if there is no header in the file but you know, per file, which column number it is, then you can simply do:
awk -v c="$column" 'BEGIN{FS=OFS="|"}{$c=FUNCTION($c)}1' <file>
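Since data_mask itself isn't shown, here is a self-contained sketch with a hypothetical mask() function standing in for it (it keeps the last four digits and replaces the rest with X):

```shell
#!/bin/sh
# Sample pipe-delimited file; CARDNO happens to be column 2 here.
cat > cards.txt <<'EOF'
NAME|CARDNO|AMOUNT
alice|1234567890123456|10
bob|9876543210987654|20
EOF
# mask() is a hypothetical stand-in for data_mask.
awk 'function mask(s,    n) {
         n = length(s)
         return substr("XXXXXXXXXXXXXXXX", 1, n - 4) substr(s, n - 3)
     }
     BEGIN { FS = OFS = "|"; c = 1 }
     NR == 1 { while ($c != "CARDNO" && c <= NF) c++
               if (c > NF) exit
               $c = "NEWCARDNO" }
     NR != 1 { $c = mask($c) }
     { print }' cards.txt
```

The header search runs once on line 1; every later line only rewrites field c, so columns before and after it pass through untouched.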

Median Calculation in Unix

I need to calculate the median value for the input file below. It works fine for an odd number of values but not for an even number. Below are the input file and the script used. Could you please check what is wrong with this command and correct it?
Input file:
col1,col2
AR,2.52
AR,3.57
AR,1.29
AR,6.66
AR,3.05
AR,5.52
Desired Output:
AR,3.31
Unix command:
cat test.txt | sort -t"," -k2n,2 | awk '{arr[NR]=$1} END { if (NR%2==1) print arr[(NR+1)/2]; else print (arr[NR/2]+arr[NR/2+1])/2}'
Don't forget that your input file has an additional line containing the header. You need an extra step in your awk script to skip that first line.
Also, because you're using the default field separator, $1 contains the whole line, so your code (arr[NR/2]+arr[NR/2+1])/2 is never going to work. I would suggest you change it so that awk splits the input on a comma, then use the second field, $2.
sort -t, -k2n,2 file | awk -F, 'NR>1{a[++i]=$2}END{if(i%2==1)print a[(i+1)/2];else print (a[i/2]+a[i/2+1])/2}'
I also removed your useless use of cat. Most tools, including sort and awk, are capable of reading in files directly, so you don't need to use cat with them.
Testing it out:
$ cat file
col1,col2
AR,2.52
AR,3.57
AR,1.29
AR,6.66
AR,3.05
AR,5.52
$ sort -t, -k2n,2 file | awk -F, 'NR>1{a[++i]=$2}END{if(i%2==1)print a[(i+1)/2];else print (a[i/2]+a[i/2+1])/2}'
3.31
It shouldn't be too difficult to modify the script slightly to change the output to whatever you want.
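If you'd rather avoid the external sort entirely, a portable-awk variant can sort the values itself. This is a sketch using a small insertion sort; the %.2f format is an assumption to match the two-decimal sample output:

```shell
#!/bin/sh
printf '%s\n' 'col1,col2' 'AR,2.52' 'AR,3.57' 'AR,1.29' \
              'AR,6.66' 'AR,3.05' 'AR,5.52' > test.txt
awk -F, 'NR > 1 {
    v = $2 + 0                               # force numeric comparison
    for (j = i; j >= 1 && a[j] > v; j--)     # insertion sort into a[1..i]
        a[j+1] = a[j]
    a[j+1] = v
    i++
}
END {
    if (i % 2) print a[(i+1)/2]
    else printf "%.2f\n", (a[i/2] + a[i/2+1]) / 2
}' test.txt
```

For a handful of values per file the O(n^2) sort is irrelevant; the win is needing only one process and no pipeline.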

How to extract the given list of lines from a file without writing a script

The task is this:
For a given list of numbers (potentially very long) like this:
1
5
8
...(very long)
extract the corresponding lines from a second file.
I had to write a simple Python script to accomplish this task, but I was wondering if there is a way to do it without resorting to a script, something along the lines of process substitution and a combination of coreutils:
SOME_COMMANDLINE_FU <(cat first_file) second_file
Here is the Python code I wrote:
#!/usr/bin/env python
import sys
# select.py <LINE_INDEX> <FILE>
line_numbers = open(sys.argv[1], "r").readlines()
line_numbers = map(int, line_numbers)
with open(sys.argv[2], "r") as f:
    index = 1
    for line in f:
        if index in line_numbers:
            print line,
        index = index + 1
Just loop through the numbers file and store them in an array. Then, read the second file and check for each line whether its number is in the stored array:
awk 'FNR==NR {a[$1]; next} FNR in a' file1 file2
The FNR==NR {} trick makes the action execute only while reading the first file. Then, the rest is executed while reading the second one. More info in Idiomatic awk.
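A self-contained demonstration of the two-file idiom, with both sample files created inline:

```shell
#!/bin/sh
printf '%s\n' 1 3 > nums.txt                     # line numbers to keep
printf '%s\n' alpha beta gamma delta > data.txt  # file to extract from
# While reading nums.txt (FNR==NR), store each number as an array
# key. While reading data.txt, print a line whenever its line
# number (FNR) is one of those keys.
awk 'FNR==NR { a[$1]; next } FNR in a' nums.txt data.txt
```

This reads each file exactly once, so it stays fast even when both files are long.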
I guess that depends on what is considered a script. One way to extract the lines is to use awk:
awk '{system("awk NR=="$1" second_file")}' first_file

Field separator used only when it is not escaped, in awk

I have a question: suppose I am using "=" as the field separator. If my string contains, for example,
abc=def\=jkl
then splitting on "=" yields three fields:
abc def\ jkl
But since I escaped the second "=", the output should be two fields:
abc def\=jkl
Can anyone suggest how I can achieve this?
Thanks in advance
I find it simplest to just convert the offending string to some other string or character that doesn't appear in your input records, process as normal, and then convert back within each field when necessary. I tend to use RS (if it's not a regexp*), since that cannot appear within a record, or the awk built-in SUBSEP otherwise, since if that appears in your input you have other problems. E.g.:
$ cat file
abc=def\=jkl
$ awk -F= '{
gsub(/\\=/,RS)
for (i=1; i<=NF; i++) {
gsub(RS,"\\=",$i)
print i":"$i
}
}' file
1:abc
2:def\=jkl
* The issue with using RS when it is a regexp (i.e. multiple characters) is that the gsub(RS...) within the loop could match a string that wasn't originally resolved to a record separator, e.g.:
$ echo "aa" | gawk -v RS='a$' '{gsub(RS,"foo",$1); print "$1=<"$1">"}'
$1=<afoo>
When RS is a single character, e.g. the default newline, that cannot happen, so it's safe to use.
If your data is like the example in your question, it can be done.
awk doesn't support look-around regexes, so it would be difficult to get what you want by setting FS.
If I were you, I would do some preprocessing to make the data easier for awk to handle. Or you could read each line and use other awk functions, e.g. gensub(), to remove the = signs you don't want in the result, then split. But since you want to achieve the goal by playing with the field separator, I won't go into those solutions.
It can, however, be done with the FPAT variable (GNU awk):
awk -vFPAT='\\w*(\\\\=)?\\w*' '...' file
This will work for your example; I am not sure whether it will work for your real data.
As a fuller example, let's split the string "abc=def\=jkl=foo\=bar=baz":
kent$ echo "abc=def\=jkl=foo\=bar=baz"|awk -vFPAT='\\w*(\\\\=)?\\w*' '{for(i=1;i<=NF;i++)print $i}'
abc
def\=jkl
foo\=bar
baz
I think you want that result, don't you?
my awk version:
kent$ awk --version|head -1
GNU Awk 4.0.2