Multiple field separators each of which appears multiple times using awk - unix

I am trying to separate fields using awk but have run into problems when I have multiple separators, each of which appears multiple times.
For example, if I type
echo "aa#######=#3413.5" | awk -F "#+|#+|=" '{print $1","$2","$3","$4","$5}'
then the results are:
aa,,,,3413.5
but what I want is
aa,3413.5
I have searched online for a long time, but other questions cover either multiple separators that each appear once, i.e. "#|#", or a single separator that appears multiple times, i.e. "#+".
Does anyone have ideas about how to separate fields in my case?
Thanks a lot!

awk -F '[#=]+'
seems to work.
awk -F "#+|#+|="
this one matches either a run of # characters or a single =, so in "#######=#" the run of #, the =, and the final # each count as a separate separator, leaving empty fields between them.
See the following URL for details:
http://www.math.utah.edu/docs/info/gawk_5.html#SEC28
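For example, with the bracket expression the whole run of # and = collapses into one separator:
echo "aa#######=#3413.5" | awk -F '[#=]+' '{print $1","$2}'
aa,3413.5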

Related

Bash/R searching columns in a huge table

I have a huge table I want to extract information from. Firstly, I want to extract a certain line based on a pattern; I've done that successfully with grep. However, this line has loads of columns and I'm interested only in a couple of them that have a certain pattern in them (a partial match at the beginning of the string). Is it possible to extract only those columns and the number of each column (the nth column) for such partial matches? Hope I was clear enough.
Languages: Preferably in bash but I can also work in R, alternatively I'm open to suggestions if you think another language can be more helpful.
Thanks!
Awk is perfect for stuff like this. To help you write a script I think we need more details, but I'm guessing you'll want to use the print feature of awk. To print out the nth column of a file "your_file" (replace n with an actual column number, e.g. $3), do:
awk '{print $n}' your_file
In solving your problem you may also want to loop over all N columns (replace N with a literal number; bash does not expand {1..N} when N is a variable), which you can do via the loop below; a single-pass alternative that also reports column numbers is sketched after it:
for i in {1..N} ;
do
awk -v col=${i} '{print $col}' your_file ;
done
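Since the question also asks for the number of each matching column, a single awk pass can report both the index and the value. In this sketch the line pattern /mypattern/ and the prefix ^abc are placeholders for your actual patterns:
awk '/mypattern/ { for (i = 1; i <= NF; i++) if ($i ~ /^abc/) print i, $i }' your_file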

AWK Command to Add values

I have a scenario where I am trying to add three fields in a line that I receive in my input file. I have written something which I feel doesn't follow best unix practice and I need some suggestions on how to write it better. My sample lines are attached below.
My questions are:
Is it possible to add all three fields using one awk command?
The input file may not contain some of these fields (based on the scenario); is awk able to handle this? Should I do any checks?
Sometimes the values may contain "~" at the end; how do I consider only the numeric part?
Input file 1 sample line
CLP*4304096*20181231*BH*0*AH>444*158330.97~*FB*0*SS>02*0*SS>03*0*J1*0~
Input file 2 sample line
CLP*4304096*20181231*BH*0*AH>444*158330.97*FB*0
Script I have written
clp=$(awk -F'[*]' '/CLP/{print $7}' $file)
ss02=$(awk -F'[*]' '/CLP/{print $11}' $file)
ss03=$(awk -F'[*]' '/CLP/{print $13}' $file)
clpsum=clp+ss02+ss03
I know it's not the best way; please let me know how I can handle both the input file 1 scenario (it has 158330.97~) and the file 2 scenario.
Thanks!
In one awk command:
awk 'BEGIN{FS="*"}{var1=$7;var2=$11;var3=$13; var4=var1+var2+var3; printf("var4 = %.2f\n",var4)}' file.txt
This works as long as the values stay in the same fields; you might want a more robust answer if you need to handle files that come in with the numbers in different fields, etc. Hope this helps in any way.
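Regarding the trailing ~: awk's string-to-number conversion stops at the first non-numeric character, so "158330.97~" already adds as 158330.97. If you'd rather strip it explicitly, here is a sketch that assumes the field positions from the sample lines; fields past the end of a shorter line (like $11 and $13 in the second sample) evaluate as 0, so both files are handled:
awk -F'*' '/^CLP/{gsub(/~/,""); printf("clpsum = %.2f\n", $7+$11+$13)}' file.txt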

Unix redact data

I want to mask only the 2nd column of the data.
Input:
First_name,second_name,phone_number
ram,prakash,96174535
hari,pallavi,98888234
anurag,aakash,82783784
Output Expected:
First_name,second_name,phone_number
ram,*******,96174535
hari,*******,98888234
anurag,******,82783784
The sed program will do this just fine:
sed '2,$s/,[^,]*,/,*****,/'
The 2,$ only operates on lines 2 through to the end of the file (to leave the header line alone) and the substitute command s/,[^,]*,/,*****,/ will replace anything between the first and second comma with the mask *****.
Note that I've specifically used a fixed number of asterisks in the replacement string. Whether you're hiding passwords or anonymising data (as seems to be the case here), you don't want to leak any information, including the size of the names being replaced.
If you really want to use the same number of characters as in the original data, and you also want to cater for the possibility of replacing multiple fields, you can use something like:
awk -F, 'BEGIN{OFS=FS}NR==1{print;next}{gsub(/./,"*",$2);gsub(/./,"*",$4);print}'
This will also leave the first line untouched but will anonymise columns two and four (albeit with the information leakage previously mentioned):
echo 'First_name,second_name,phone_number,other
ram,prakash,96174535,abc
hari,pallavi,98888234,def
anurag,aakash,82783784,g
bob,santamaria,124,xyzzy' | awk -F, 'BEGIN{OFS=FS}NR==1{print;next}{gsub(/./,"*",$2);gsub(/./,"*",$4);print}'
First_name,second_name,phone_number,other
ram,*******,96174535,***
hari,*******,98888234,***
anurag,******,82783784,*
bob,**********,124,*****
Doing multiple columns with full anonymising would entail using $2="*****" rather than the gsub (for both columns of course).
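That variant would look something like this (the same header-skipping structure, with assignment instead of gsub):
awk -F, 'BEGIN{OFS=FS}NR==1{print;next}{$2="*****";$4="*****";print}' file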
Another in awk. Using gsub to replace every char in $2 with an *:
$ awk 'BEGIN{FS=OFS=","}NR>1{gsub(/./,"*",$2)}1' file
First_name,second_name,phone_number
ram,*******,96174535
hari,*******,98888234
anurag,******,82783784
Try the following too and let me know if it helps you:
awk -F"," 'NR>1{$2="*******"} 1' OFS=, Input_file

Search for multiple patterns in a file not necessarily on the same line

I need a unix command that searches for multiple patterns (basically an AND); however, those patterns need not be on the same line (otherwise I could use a grep AND command). For example, suppose I have a file like the following:
This is first line.
This is second line.
This is last line.
If I search for the words 'first' and 'last', the above file should be included in the result.
Try this question; it seems to be the same as yours, with plenty of solutions: How to find patterns across multiple lines using grep?
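For a literal AND across lines, one simple sketch is to test each pattern separately and chain the exit statuses, printing the file name only when both patterns occur somewhere in the file:
grep -q 'first' file.txt && grep -q 'last' file.txt && echo "file.txt"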
I think instead of AND you actually mean OR:
grep 'first\|last' file.txt
Results:
This is first line.
This is last line.
If you have a large number of patterns, add them to a file; for example if patterns.txt contains:
first
last
Run:
grep -f patterns.txt file.txt

extracting first line from file using awk command

I've been going through an online UNIX course and have come across this question which I'm stuck on. Would appreciate any help!
You are provided with a set of files each one of which contains personal details about an individual. Each file is laid out in the following format, with one file per individual:
name:Niko Tanaka
age:41
occupation:Doctor
I know the answer has to be in the form:
n=$(awk -F: ' / /{print }' filename)
n=$(awk -F: '/name/{print $2}' infile)
Whatever is inside / / is a regular expression. In this case you just want to match the line that contains 'name'.
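With the sample file above, the command substitution captures everything after the colon:
echo "$n"
Niko Tanaka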
