Search list of ids in a log file in unix

Sorry if this is a trivial question; I tried an awk script as well, but I am new to it.
I have a list of IDs in a file, ids.txt:
1xre23
223dsf
234ewe
and a log file of FIX messages which might contain those IDs.
Sample log file, abc.log:
35=D^A54=1xre23^A22=s^A120=GBP^A
35=D^A54=abcd23^A22=s^A120=GBP^A
35=D^A54=234ewe^A22=s^A120=GBP^A
35=D^A54=xyzw23^A22=s^A120=GBP^A
35=D^A54=223dsf^A22=s^A120=GBP^A
I want to check how many of those IDs appear in that log file. The ID list is large, almost 10K entries, and the log file is around 300MB.
The output I am looking for is:
35=D^A54=1xre23^A22=s^A120=GBP^A
35=D^A54=234ewe^A22=s^A120=GBP^A
35=D^A54=223dsf^A22=s^A120=GBP^A

Try something like this with grep:
grep -w -f ids.txt abc.log
Output:
35=D^A54=1xre23^A22=s^A120=GBP^A
35=D^A54=234ewe^A22=s^A120=GBP^A
35=D^A54=223dsf^A22=s^A120=GBP^A
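Since the IDs are literal strings rather than regular expressions, adding -F should make this noticeably faster with a 10K-pattern list, because grep then matches fixed strings instead of compiling each pattern as a regex:
grep -Fw -f ids.txt abc.log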

If you prefer to use awk, this should do:
awk -F"[=^]" 'FNR==NR {a[$0];next} $4 in a' ids.txt abc.log
35=D^A54=1xre23^A22=s^A120=GBP^A
35=D^A54=234ewe^A22=s^A120=GBP^A
35=D^A54=223dsf^A22=s^A120=GBP^A
This stores the IDs from ids.txt in array a.
If the fourth field (fields being separated by = and ^) is one of those IDs, the line is printed.
You can also do it the other way around, scanning each line for the IDs instead of comparing a specific field:
awk 'FNR==NR {a[$0];next} {for (i in a) if ($0~i) print}' ids.txt abc.log
35=D^A54=1xre23^A22=s^A120=GBP^A
35=D^A54=234ewe^A22=s^A120=GBP^A
35=D^A54=223dsf^A22=s^A120=GBP^A
This stores all the IDs from ids.txt in array a, then tests each log line against every stored ID.
If the line contains one of the IDs, it is printed.
Note this compares every line against every ID, so it is slower than the field lookup above.
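The question also asks for a count. Assuming a grep with -o support (GNU grep, for instance), you can print just the matched IDs, de-duplicate them, and count how many distinct IDs occur:
grep -owFf ids.txt abc.log | sort -u | wc -l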


Awk command to perform action on lines excluding 1st and last

I have multiple MS Excel files in CSV format in a particular directory. I want to update the value of one particular column in all the rows of the CSV files, except that the change should not be applied to the first and last lines.
So far I have come up with the below code:
awk -F, 'NR>2{$2=300;}1' OFS=, test.csv
But I am facing difficulty in excluding the last line.
Also, I need to perform the same change for all the files in the directory.
I have tried a couple of other approaches as well, but was not able to get the replacement working in awk.
This may do:
awk -F, 't{print t} {a=t=$0} NR>1{$2=300;t=$0} END {print a}' OFS=, test.csv
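The trick: t delays printing by one line, so the last line is never printed in the main loop; a keeps an unmodified copy of the current line; and the modified version is stored into t only for lines after the first. END then prints the final line untouched.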
$ cat file
1,a,b
2,c,d
3,e,f
$ awk 'BEGIN{FS=OFS=","} NR>1{print (NR>2 ? chgd : orig)} {orig=$0; $2=300; chgd=$0} END{print orig}' file
1,a,b
2,300,d
3,e,f
You could simplify the script a bit by reading the file twice:
awk 'BEGIN{FS=OFS=","} NR==FNR {c=NR;next} !(FNR==1||FNR==c){$2=200} 1' file file
This uses the NR==FNR section merely to count lines, giving you a simple expression for determining whether to update the field in question.
And if you have GNU awk available, you might save a few CPU cycles by not reassigning the c variable for every line, using something like this:
gawk 'BEGIN{FS=OFS=","} ENDFILE {c=FNR} NR==FNR{next} !(FNR==1||FNR==c){$2=200} 1' file file
This still reads the file twice, but assigns c only after each file is read.
If you want, you can emulate the ENDFILE condition in non-GNU awk using NR>FNR && FNR==1 (valid when reading exactly two files), setting c=NR-1 at that point. It won't perform as well.
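That emulation might look like this (a sketch under the same two-file assumption):
awk 'BEGIN{FS=OFS=","} NR>FNR && FNR==1 {c=NR-1} NR==FNR{next} !(FNR==1||FNR==c){$2=200} 1' file file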
I haven't tested the speed difference between the plain and ENDFILE versions, but I suspect it would be negligible except in cases of truly obscenely large files.
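You can also obtain the line count up front with wc and make a single awk pass; the data is still read twice (once by wc), but the script stays simple. A minimal sketch:
awk -v c="$(wc -l < file)" 'BEGIN{FS=OFS=","} FNR>1 && FNR<c {$2=300} 1' file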
Thanks all,
I got it to work. Below is the command (sq holds a literal single quote):
awk -v sq="'" -F, 't{print t} {a=t=$0} NR>2{$3=sq"ops_data"sq;t=$0} END {print a}' OFS=, test1.csv

Calling a function from awk with variable input location

I have a bunch of different files that use "|" as the delimiter. All of them contain a column titled CARDNO, but not necessarily in the same location in each file. I have a function called data_mask that I want to apply to the CARDNO column in all of the files, changing it into NEWCARDNO.
I know that if I pass in the column number of CARDNO I can do this pretty simply; say it's the 3rd column in a 5-column file, with something like:
awk -v column=$COLNUMBER '{print $1, $2, FUNCTION($column), $4, $5}' FILE
However, if all of my files have hundreds of columns and it's somewhere arbitrary in each file, this is incredibly tedious. I am looking for a way to do something along the lines of this:
awk -v column=$COLNUMBER '{print #All columns before $column, FUNCTION($column), #All columns after $column}' FILE
My function takes a string as input and returns a new one; it works on the value of the column, not the column number. Please suggest a Unix command that can pass the column value to the function and produce the desired output.
Thanks in advance
If I understand your problem correctly, the first row of the file is a header and one of its columns is named CARDNO. If that is the case, you can just search the header for the column position and process accordingly:
awk 'BEGIN{FS=OFS="|";c=1}
(NR==1){while($c != "CARDNO" && c<=NF) c++
if(c>NF) exit
$c="NEWCARDNO" }
(NR!=1){$c=FUNCTION($c)}
{print}' <file>
As per comment, if there is no header in the file, but you know per file, which column number it is, then you can simply do:
awk -v c=$column 'BEGIN{FS=OFS="|"}{$c=FUNCTION($c)}1' <file>
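Since data_mask itself isn't shown, here is a sketch with a hypothetical mask written as an awk function (it stars everything except the last four characters), so the whole job stays in one awk process:
awk 'BEGIN{FS=OFS="|"; c=1}
function data_mask(s,   n, out, i) {
    # hypothetical mask: keep the last 4 characters, star the rest
    n = length(s)
    for (i = 1; i <= n-4; i++) out = out "*"
    return out substr(s, (n > 4 ? n-3 : 1))
}
NR==1 { while ($c != "CARDNO" && c <= NF) c++
        if (c > NF) exit
        $c = "NEWCARDNO"; print; next }
{ $c = data_mask($c); print }' <file>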

Median Calculation in Unix

I need to calculate the median value for the below input file. My command works fine for an odd number of values but not for an even number. Below are the input file and the script used. Could you please check what is wrong with the command and correct it?
Input file:
col1,col2
AR,2.52
AR,3.57
AR,1.29
AR,6.66
AR,3.05
AR,5.52
Desired Output:
AR,3.31
Unix command:
cat test.txt | sort -t"," -k2n,2 | awk '{arr[NR]=$1} END { if (NR%2==1) print arr[(NR+1)/2]; else print (arr[NR/2]+arr[NR/2+1])/2}'
Don't forget that your input file has an additional line containing the header; you need an extra step in your awk script to skip it.
Also, because you're using the default field separator, $1 contains the whole line, so your expression (arr[NR/2]+arr[NR/2+1])/2 is never going to produce a numeric average. I would suggest having awk split the input on commas and then using the second field, $2.
sort -t, -k2n,2 file | awk -F, 'NR>1{a[++i]=$2}END{if(i%2==1)print a[(i+1)/2];else print (a[i/2]+a[i/2+1])/2}'
I also removed your useless use of cat. Most tools, including sort and awk, are capable of reading in files directly, so you don't need to use cat with them.
Testing it out:
$ cat file
col1,col2
AR,2.52
AR,3.57
AR,1.29
AR,6.66
AR,3.05
AR,5.52
$ sort -t, -k2n,2 file | awk -F, 'NR>1{a[++i]=$2}END{if(i%2==1)print a[(i+1)/2];else print (a[i/2]+a[i/2+1])/2}'
3.31
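With these six values sorted as 1.29, 2.52, 3.05, 3.57, 5.52, 6.66, the count i is even, so the script averages the middle pair: (3.05+3.57)/2 = 3.31. With an odd count, the first branch would print the single middle element instead.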
It shouldn't be too difficult to modify the script slightly if you want the AR,3.31 format shown in the question.

Value / pair update in awk

I have a basic CSV that contains key/value pairs: the first two columns form the key and the third column is the value.
Example file1:
12389472,1,136-7402
23247984,1,136-7402
23247984,2,136-7402
34578897,1,136-7402
And in another file I have a list of keys that need their value changed in the first file. I'm trying to change the value to 136-7425
Example file2:
23247984,1
23247984,2
Here's what I'm currently doing:
/usr/xpg4/bin/awk '{FS=",";OFS=","}NR==FNR{a[$1,$2]="136-7425";next}{$3=a[$1,$2]}1' file2 file1 > output
Which is working but it's leaving the value blank for keys not found in file2. I'd like to only change the value for keys present in file2, and leave the current value for keys not found.
Can anyone point out what I'm doing wrong? Or perhaps there's an easier way to accomplish this.
Thanks!
Looks like you're just zapping the third field for keys that don't exist in file2. Also, FS needs to be set before the first line is split, i.e. in a BEGIN block. Try this:
awk 'BEGIN{FS=OFS=","}NR==FNR{a[$1,$2]="136-7425";next} ($1,$2) in a{$3=a[$1,$2]} 1' file2 file1 > output
or (see comments below):
awk '{FS=OFS=","}NR==FNR{seen[$1,$2]++;next} seen[$1,$2]{$3="136-7425"} 1' file2 file1 > output
FYI an array named seen[] is also commonly used to remove duplicates from input, e.g.:
awk '!seen[$0]++' file
which prints a line only the first time it appears, because the post-increment leaves the expression true only while the count is still zero.
This one-liner should also work for you (the trailing 7, like the more common 1, is simply a true pattern that triggers the default print):
awk -F, -v OFS="," 'NR==FNR{a[$1,$2]=1;next}a[$1,$2]{$3="136-7425"}7' file2 file1
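With the sample files above, either variant produces:
12389472,1,136-7402
23247984,1,136-7425
23247984,2,136-7425
34578897,1,136-7402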

grep: how to show the next lines after the matched one until a blank line [not possible!]

I have a dictionary (not a Python dict) consisting of many text files like this:
##Berlin
-capital of Germany
-3.5 million inhabitants
##Earth
-planet
How can I show a single dictionary entry together with its facts?
Thank you!
You can't. grep doesn't have a way of showing a variable amount of context. You can use -A to show a set number of lines after the match, such as -A3 to show three lines after a match, but it can't be a variable number of lines.
You could write a quick Perl program to read from the file in "paragraph mode" and then print blocks that match a regular expression.
As Andy Lester pointed out, you can't make grep show a variable amount of context, but a short awk statement can do what you're hoping for.
If your example file were named file.dict:
awk -v term="earth" 'BEGIN{IGNORECASE=1}{if($0 ~ "##"term){loop=1} if($0 ~ /^$/){loop=0} if(loop == 1){print $0}}' *.dict
returns:
##Earth
-planet
Just change the variable term to the entry you're looking for. (Note that IGNORECASE only has an effect in GNU awk.)
This assumes two things:
dictionary files share the same extension (.dict, for example purposes)
dictionary files are all in the same directory (where the command is called)
If your grep supports Perl-compatible regular expressions, you can do it like this:
grep -iPzo '(?s)##Berlin.*?\n(\n|$)' file.dict
Here -z makes grep treat the whole file as a single record (lines are taken to be NUL-terminated), -o prints only the matched part, and (?s) lets . match newlines so the match can span several lines.
You could also do it with GNU sed like this:
query=berlin
sed -n "/$query/I"'{ :a; $p; N; /\n$/!ba; p; }' file.dict
That is, when the case-insensitive $query is found, keep printing until an empty line (/\n$/) or the end of the file ($p) is reached.
Output in both cases (minor difference in whitespace):
##Berlin
-capital of Germany
-3.5 million inhabitants
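The paragraph-mode idea also translates to plain awk, which treats blank-line-separated blocks as records when RS is set to the empty string; a minimal sketch:
awk -v RS= -v term=berlin 'tolower($0) ~ ("##" term)' *.dict
Each matching block (the header line plus its facts) is printed as a whole.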
