Search a pattern in one column and remove those lines - unix

I am trying to search one column for a particular pattern, eliminate the rows that match, and create a new file without them.
Sample Data:
col1|col2|col3|col4
abc|test123|demo|test
def|test345|exam|write
ghf|456|test|account
ijk|789|travel|destination
Expected Output:
col1|col2|col3|col4
ghf|456|test|account
ijk|789|travel|destination
I want to search for the pattern "test" in the 2nd column, remove those rows from the source file, and create a new file as shown in the expected output.
The file is "|"-delimited.

awk -F"|" '{if(index($2,"test")==0) printf "%s\n", $0}' test > test_out
Here, test is the original file and test_out is the expected output file.
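That command works as-is. A slightly shorter, equivalent form lets awk's regex match do the work of index() (assuming, like index(), that "test" may appear anywhere in the 2nd column):
awk -F'|' '$2 !~ /test/' test > test_out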

Related

Calling a function from awk with variable input location

I have a bunch of different files. We have used "|" as the delimiter. All files contain a column titled CARDNO, but not necessarily in the same location in all of the files. I have a function called data_mask that I want to apply to CARDNO in all of the files to change it into NEWCARDNO.
I know that if I pass in the column number of CARDNO I can do this pretty simply; say it's the 3rd column in a 5-column file, with something like:
awk -v column=$COLNUMBER '{print $1, $2, FUNCTION($column), $4, $5}' FILE
However, if all of my files have hundreds of columns and it's somewhere arbitrary in each file, this is incredibly tedious. I am looking for a way to do something along the lines of this:
awk -v column=$COLNUMBER '{print #All columns before $column, FUNCTION($column), #All columns after $column}' FILE
My function takes a string as input and changes it into a new one. It takes the value of the column as input, not the column number. Please suggest a Unix command that can pass the column value to the function and give the desired output.
Thanks in advance
If I understand your problem correctly, the first row of the file is a header and one of those columns is named CARDNO. If that is the case, then you just search the header for that column and process accordingly.
awk 'BEGIN{FS=OFS="|"; c=1}
(NR==1){ while($c != "CARDNO" && c<=NF) c++   # scan the header for the CARDNO column
         if(c>NF) exit                        # no CARDNO column found: stop
         $c="NEWCARDNO" }                     # rename the header field
(NR!=1){ $c=FUNCTION($c) }                    # transform the value on every data row
{print}' <file>
As per comment, if there is no header in the file, but you know per file, which column number it is, then you can simply do:
awk -v c=$column 'BEGIN{FS=OFS="|"}{$c=FUNCTION($c)}1' <file>
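Note that awk cannot call a shell function such as data_mask directly, so FUNCTION above is a placeholder: the masking logic has to be written as an awk function (or each value piped out to an external command). A minimal self-contained sketch, assuming a hypothetical rule that simply replaces every digit with X:
awk 'BEGIN{FS=OFS="|"}
     function data_mask(s) { gsub(/[0-9]/, "X", s); return s }   # hypothetical masking rule
     NR==1 { for (i=1; i<=NF; i++) if ($i == "CARDNO") { c = i; $i = "NEWCARDNO" } }
     NR>1 && c { $c = data_mask($c) }
     { print }' <file>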

How to accept dynamic attributes (no of columns present in csv file)

How do I accept dynamic attributes (the number of columns present in a csv file)? I want to do aggregation on particular columns, e.g. cust_id, notification_type, count, and the output should be stored in another csv file.
I have tried this
awk 'BEGIN{FS=OFS=","}{a[$2 OFS $3]+=$4}END{for(i in a)print i,a[i]}' file_name
This is a single-line command; I want a proper script, wrapped so it can be invoked like:
script_name.sh input_file output_folder
(see the wrapper sketch after the expected output below).
Sample file (the actual file may be GBs in size):
1,A,OTC,1
2,B,RC,1
3,C,PB,1
4,A,OTC,1
5,A,RC,1
6,B,RC,1
The output should be:
A,OTC,2
A,RC,1
B,RC,2
C,PB,1
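A minimal wrapper sketch around that one-liner (the output file name inside the folder is an illustrative assumption):
#!/bin/sh
# usage: script_name.sh input_file output_folder
infile="$1"
outdir="$2"
awk 'BEGIN{FS=OFS=","} {a[$2 OFS $3]+=$4} END{for (i in a) print i, a[i]}' "$infile" > "$outdir/aggregated.csv"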

Search for lines matching pattern in specific column in UNIX

I am using HDFS to get data that matches a pattern in a specific column, and I want to output the entire line (expecting ~2 million of 7 million lines to match).
Here is my exact situation:
I would like the entire line in a file where the data in the 4th column starts with a "5"
For example my data set:
HK|20151010|65|5005
KR|20151009|38|5092
MD|20150925|98|1943
BG|20150826|82|4892
HK|20151017|14|5002
I want the command to yield the following results:
HK|20151010|65|5005
KR|20151009|38|5092
HK|20151017|14|5002
Thank you so much! (Note: I cannot simply search the entire line, because data in other columns can also begin with a 5.)
How about:
awk -F'|' '$4~/^5/' file
If the 4th column is always the last column, this line works too:
grep '|5[^|]*$' file
grep can also do this with the usual ([^x]+x){n} field-skipping idiom. Here is the regular expression in both basic and extended forms:
grep '^\([^|]\+|\)\{3\}5'
egrep '^([^|]+\|){3}5'
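The {3} simply skips the first three "field plus delimiter" groups, so the idiom generalizes to any column N by skipping N-1 groups; for the 2nd column, for example:
egrep '^([^|]+\|){1}5' file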

Search list of ids in a log file in unix

Sorry to ask this; it might be a trivial question. I tried an awk script as well, but I am new to that.
I have a list of Ids in a file i.e. ids.txt
1xre23
223dsf
234ewe
and a log file with FIX messages which might contain those ids.
Sample log file, abc.log:
35=D^A54=1xre23^A22=s^A120=GBP^A
35=D^A54=abcd23^A22=s^A120=GBP^A
35=D^A54=234ewe^A22=s^A120=GBP^A
35=D^A54=xyzw23^A22=s^A120=GBP^A
35=D^A54=223dsf^A22=s^A120=GBP^A
I want to check how many ids match in that log file.
There are almost 10K ids, and the log file is around 300 MB.
The sample output I am looking for is:
35=D^A54=1xre23^A22=s^A120=GBP^A
35=D^A54=234ewe^A22=s^A120=GBP^A
35=D^A54=223dsf^A22=s^A120=GBP^A
Try something like this with the grep command:
grep -w -f ids.txt abc.log
Output:
35=D^A54=1xre23^A22=s^A120=GBP^A
35=D^A54=234ewe^A22=s^A120=GBP^A
35=D^A54=223dsf^A22=s^A120=GBP^A
If you'd like to use awk, this should do:
awk -F"[=^]" 'FNR==NR {a[$0];next} $4 in a' ids.txt abc.log
35=D^A54=1xre23^A22=s^A120=GBP^A
35=D^A54=234ewe^A22=s^A120=GBP^A
35=D^A54=223dsf^A22=s^A120=GBP^A
This stores the ids from ids.txt in array a.
If the fourth field (fields are separated by = and ^) is one of the stored ids, the line is printed.
You can also do it the other way around:
awk 'FNR==NR {a[$0];next} {for (i in a) if (i~$0) print i}' abc.log ids.txt
35=D^A54=1xre23^A22=s^A120=GBP^A
35=D^A54=234ewe^A22=s^A120=GBP^A
35=D^A54=223dsf^A22=s^A120=GBP^A
This stores all lines from abc.log in array a.
Then, for each id in ids.txt, it tests whether a stored line contains that id.
If yes, it prints the line. Note that this holds the whole log in memory, so the first variant scales better.
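Since the ids are literal strings rather than regular expressions, grep's fixed-string mode may be noticeably faster on a 300 MB log:
grep -Ff ids.txt abc.log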

Value / pair update in awk

I have a basic CSV that contains key/value pairs: the first two columns are the key and the third column is the value.
Example file1:
12389472,1,136-7402
23247984,1,136-7402
23247984,2,136-7402
34578897,1,136-7402
In another file I have a list of keys whose value needs to be changed in the first file. I'm trying to change the value to 136-7425.
Example file2:
23247984,1
23247984,2
Here's what I'm currently doing:
/usr/xpg4/bin/awk '{FS=",";OFS=","}NR==FNR{a[$1,$2]="136-7425";next}{$3=a[$1,$2]}1' file2 file1 > output
It is working, but it leaves the value blank for keys not found in file2. I'd like to change the value only for keys present in file2, and leave the current value alone for keys that are not found.
Can anyone point out what I'm doing wrong? Or perhaps there's an easier way to accomplish this.
Thanks!
Looks like you're just zapping the third field for keys that don't exist in file2. Try this:
awk 'BEGIN{FS=OFS=","}NR==FNR{a[$1,$2]="136-7425";next} ($1,$2) in a{$3=a[$1,$2]} 1' file2 file1 > output
or:
awk 'BEGIN{FS=OFS=","}NR==FNR{seen[$1,$2]++;next} seen[$1,$2]{$3="136-7425"} 1' file2 file1 > output
FYI, an array named seen[] is also commonly used to remove duplicates from input, e.g.:
awk '!seen[$0]++' file
This line should work for you (the trailing 7, like the more common 1, is just a true condition that triggers awk's default print):
awk -F, -v OFS="," 'NR==FNR{a[$1,$2]=1;next}a[$1,$2]{$3="136-7425"}7' file2 file1
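Applied to the sample file1 and file2 above, either answer should produce:
12389472,1,136-7402
23247984,1,136-7425
23247984,2,136-7425
34578897,1,136-7402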
