How to use awk script to run through a textfile - unix

I need to run through a textfile, doctors.txt, which is written in the format:
Sarah,Jenny,Charles;Dr. Hampton
Jenny,Lucy,Harry;Dr. Fritz
Ben,Kaitlyn,Connor,Charles;Dr. Hampton
and have it output:
Dr. Hampton: Sarah Jenny Charles Ben Kaitlyn Connor
Dr. Fritz: Jenny Lucy Harry
(if someone is mentioned more than once I can't have them repeat)
I need to do this using awk; currently I'm having issues even getting it to print anything:
My code is:
#!/user/bin/awk -f
awk 'BEGIN {for i in $(doctors.txt) {
split(i,doctors,";");}
END{print doctors[1]}'
When I run it, I get
awk: 3: unexpected character '''
awk: 5: unexpected character '''
Could someone help me with this please?

Try this awk
awk -F\; '{gsub(/,/," ");a[$2]=a[$2]?a[$2]" "$1:$1} END {for (i in a) print i": "a[i]}' doctors.txt
Dr. Fritz: Jenny Lucy Harry
Dr. Hampton: Sarah Jenny Charles Ben Kaitlyn Connor Charles
To use it in a script:
#!/bin/bash
awk -F\; '{gsub(/,/," ");a[$2]=a[$2]?a[$2]" "$1:$1} END {for (i in a) print i": "a[i]}' doctors.txt > doctors2.txt
How does it work:
a[$2]=        # assign array element a[$2] the following value
a[$2]         # test whether a[$2] already has data
?             # if yes, then
a[$2]" "$1    # append $1 to the value already stored there
:             # if not, then
$1            # just set a[$2] to the value in $1
This part a[$2]=a[$2]?a[$2]" "$1:$1 can be replaced by
if (a[$2]) a[$2]=a[$2]" "$1; else a[$2]=$1
It can be shortened a little (the test is not needed, since the extra leading space is OK):
awk -F\; '{gsub(/,/," ");a[$2]=a[$2]" "$1} END {for (i in a) print i":"a[i]}' doctors.txt
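For readability, here is the same idea written out as a standalone awk script; this is just a sketch, and the seen[] check is an addition (not in the one-liners above) so that a repeated name like Charles is only listed once, as the question requires. Saved as, say, doctors.awk:
#!/usr/bin/awk -f
# run as: awk -f doctors.awk doctors.txt
BEGIN { FS = ";" }                    # each line is "name,name,...;doctor"
{
    n = split($1, names, ",")         # split the patient list on commas
    for (i = 1; i <= n; i++) {
        # only append a name the first time it appears for this doctor
        if (!seen[$2, names[i]]++)
            a[$2] = a[$2] ? a[$2] " " names[i] : names[i]
    }
}
END { for (d in a) print d ": " a[d] }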

Maybe you can use perl for this:
perl -F";" -lane '#a=split /,/,$F[0];
$x{$F[1]}.="#a";
END{print "$_:$x{$_}" for(keys %x)}' your_file
If you insist on awk:
awk -F';' '{
gsub(/,/," ",$1);
a[$2]=a[$2]" "$1}
END{for(i in a)print i":"a[i]
}' yourfile

awk -F ";" '{print $1}' doctors.txt

Related

Compare 2nd columns from 2 files - unix

Compare the 2nd columns of 2 files; write the first file's records that do not match into an output file.
Example:
# delimiter
Filename_clientid.txt
RIA00024_MA_plan_BTR_09282022_4.xml#RIA00025
RIA00024_MA_plan_BTR_09282022_5.xml#RIA00024
RIA00026_MA_plan_BTR_09282022_6.xml#RIA00026
Client_id.txt
ramesh#RIA000025
suresh#RIA000024
vamshi#RIA000027
Expected output:
RIA00026_MA_plan_BTR_09282022_6.xml#RIA00026
I used this awk command but it is not working; can you help me?
awk -F '#' 'NR==FNR{a[$2]; next} FNR==1 || !($1 in a)' Client_id.txt Filename_clientid.txt
alternative
$ join -t# -j2 <(sort -t# -k2 file1) <(sort -t# -k2 file2)
RIA000026#RIA000026_MA_plan_BTR_09282022_6.xml#ramesh
The number of zeroes is not the same in both files. If it is made the same, you can check that the field 2 value of Filename_clientid.txt does not occur in the array a built from Client_id.txt:
Filename_clientid.txt
RIA00024_MA_plan_BTR_09282022_4.xml#RIA00025
RIA00024_MA_plan_BTR_09282022_5.xml#RIA00024
RIA00026_MA_plan_BTR_09282022_6.xml#RIA00026
Client_id.txt
ramesh#RIA00025
suresh#RIA00024
vamshi#RIA00027
Example
awk -F'#' 'NR==FNR{a[$2]; next} !($2 in a)' Client_id.txt Filename_clientid.txt
Output
RIA00026_MA_plan_BTR_09282022_6.xml#RIA00026
With corrected inputs (was wrong with number of zeroes):
file1
RIA00024_MA_plan_BTR_09282022_4.xml#RIA00025
RIA00024_MA_plan_BTR_09282022_5.xml#RIA00024
RIA000026_MA_plan_BTR_09282022_6.xml#RIA000026
file2
ramesh#RIA000025
suresh#RIA000024
vamshi#RIA000027
ramesh#RIA000026
code
awk -F'#' 'NR==FNR{a[$2]=$1;next} $2 in a{print a[$2]}' file1 file2
Output
RIA000026_MA_plan_BTR_09282022_6.xml

awk: sum $4 column if column 1 = value with characters thereafter

I have a file with the following data within for example:
20 V 70000003d120f88 1 2
20 V 70000003d120f88 2 2
20x00 V 70000003d120f88 2 2
10020 V 70000003d120f88 1 5
I want to get the sum of the 4th column data.
Using the command below, I can achieve this; however, the 20x00 row is excluded. I want everything that starts with 20 to be summed and nothing else, so 20* for example:
cat testdata.out | awk '{if ($1 == '20') print $4;}' | awk '{s+=$1}END{printf("%.0f\n", s)}'
The output value must be:
5
How can I achieve this using awk? The below attempt also does not work:
cat testdata.out | awk '$1 ~ /'20'/ {print $4;}' | awk '{s+=$1}END{printf("%.0f\n", s)}'
There is no need to use 3 processes; everything can be done by one awk process. Check it out:
awk '$1 ~ /^20/ { a+=$4 } END { print a }' testdata.out
explanation:
$1 ~ /^20/ checks to see if $1 starts with 20
if yes, we add $4 in the variable a
finally, we print the variable a
result 5
EDIT:
Ed Morton rightly points out that the result should always be of the same type, which can be solved by adding 0 to the result.
You can set the exit status if it is necessary to distinguish whether a result of 0 is due to no matches (exit status 0) or to matching only zero values (exit status 1).
The exit status can then be checked with, e.g., echo $?.
The code would look like this:
awk '$1 ~ /^20/ { a+=$4 } END { print a+0; exit(a!="") }' testdata.out
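A quick way to see the exit-status behaviour (the two test files here are made up for illustration):
printf '30 V 70000003d120f88 1 2\n' > nomatch.out   # no line starts with 20
awk '$1 ~ /^20/ { a+=$4 } END { print a+0; exit(a!="") }' nomatch.out   # prints 0
echo $?   # 0 (no matches)
printf '20 V 70000003d120f88 0 2\n' > zeros.out      # a match, but $4 is 0
awk '$1 ~ /^20/ { a+=$4 } END { print a+0; exit(a!="") }' zeros.out     # prints 0
echo $?   # 1 (matched, but the sum is 0)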
Figured it out:
cat testdata.out | awk '$1 ~ /'^20'/ {print $4;}' | awk '{s+=$1}END{printf("%.0f\n", s)}'
The above might not work for all cases, but below will suffice:
i=20
cat testdata.out | awk '{if ($1 == "'"$i"'" || $1 == "'"${i}"'x00") print $4;}' | awk '{s+=$1}END{printf("%.0f\n", s)}'

No results when making a precise match using awk

I am having rows like this in my source file:
"Sumit|My Application|PROJECT|1|6|Y|20161103084527"
I want to make a precise match on column 3, i.e. I do not want to use the '~' operator in my awk command. However, the command:
awk -F '|' '($3 ~ /'"$Var_ApPJ"'/) {print $3}' ${Var_RDR}/${Var_RFL};
is fetching me correct result but the command:
awk -F '|' '($3 == "${Var_ApPJ}") {print $3}' ${Var_RDR}/${Var_RFL};
fails to do so. Can anyone help explain why this happens? I want to use '==' because I do not want a match if the value in the source file is "PROJECT1".
Parameter Var_ApPJ="PROJECT"
${Var_RDR}/${Var_RFL} -> Refers to source file.
Refer to the awk documentation on the -v option to see how to pass a shell variable into awk.
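For example, a minimal sketch using -v (the awk variable name pj is an arbitrary choice):
Var_ApPJ="PROJECT"
awk -F '|' -v pj="$Var_ApPJ" '$3 == pj {print $3}' "${Var_RDR}/${Var_RFL}"
Because pj is a real awk variable, == makes an exact comparison, so PROJECT matches but PROJECT1 does not.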
I found an alternative to '==' using '~' with an anchored regex:
awk -F '|' '($3 ~ "^'"${Var_ApPJ}"'$") {print $3}' ${Var_RDR}/${Var_RFL};
Here is the problem: inside the single-quoted awk program the shell never expands "${Var_ApPJ}", so awk compares $3 against that literal string. Pass the value in as an awk variable instead, for example:
awk -F '|' -v Var_ApPJ="$Var_ApPJ" '$3 == Var_ApPJ {print $3}' ${Var_RDR}/${Var_RFL};
vipin@kali:~$ cat kk.txt
a 5 b cd ef gh
vipin@kali:~$ awk -v var1="5" '$2 == var1 {print $3}' kk.txt
b
vipin@kali:~$
OR
#cat kk.txt
a 5 b cd ef gh
#var1="5"
#echo $var1
5
#awk '$2 == "'"$var1"'" {print $3}' kk.txt ### without "{}"
b
#
#awk '$2 == "'"${var1}"'" {print $3}' kk.txt ### with "{}"
b
#

Unix utilities, sum the data under the same entries

I have this little problem that I want to ask:
So I have a file named "quest", which has:
Tom 100 John 10 Tom 100
How do I use awk to output something like:
Tom 200
I'd appreciate your help. I tried to look it up online but I am not sure what to look for. Thanks ahead!
I do know how to use the regular expression /Tom/ to grep the entry, but I am not sure how to proceed from there.
You can try something like:
$ awk '{
for(i=1; i<=NF; i+=2)
names[$i] = ((names[$i]) ? names[$i]+$(i+1) : $(i+1))
}
END{
for (name in names) print name, names[name]
}' quest
Tom 200
John 10
You basically iterate over the fields, creating keys from the odd fields and adding the even fields' values to them. If a key already exists, you just add to the existing value.
This expects your file to have names in the odd fields (e.g. 1, 3, 5, ...) and values in the even fields (e.g. 2, 4, 6, ...).
In the END block, you just print entire array content.
I guess you need to calculate every user's total, not only Tom's; here is the code:
xargs -n2 < file|awk '{a[$1]+=$2}END{for (i in a) print i,a[i]}'
Tom 200
John 10
and one-liner of awk
awk '{for (i=1;i<=NF;i+=2) a[$i]+=$(i+1)}END{for (i in a) print i,a[i]}' file
Tom 200
John 10
$ echo 'Tom 100 John 10 Tom 100' | grep -o '[0-9]*' | paste -sd+ | bc
210
grep -o '[0-9]*' produces
100
10
100
paste -sd+ produces
100+10+100
bc calculates the result.
However, this only works for small input, since bc has a limit on input size.
In that case you can use awk '{s+=$0}END{print s}' instead of paste -sd+ | bc.
Note, however, that GNU awk treats all numbers as floating point, so it produces inaccurate results when the numbers are very large.
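Putting that substitution together, the bc-free version of the pipeline would look like this:
echo 'Tom 100 John 10 Tom 100' | grep -o '[0-9]*' | awk '{s+=$0} END{print s}'
210
Note that, like the bc version, this still sums every number on the line, not just Tom's.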
awk '/Tom/{
for(i=1;i<=NF;i++)
if($i=="Tom")s+=$(i+1);
print "Tom",s;s=0}' your_file
Here is a way to do it in awk (no loop):
awk -v RS=" " '{n=$1;getline;a[n]+=$1} END {for (i in a) print i,a[i]}' quest
Tom 200
John 10
If there is more than one line, like this:
cat quest
Tom 100 John 10 Tom 100
Paul 20 Tom 40 John 10
Then do this with gnu awk:
awk -v RS=" |\n" '{n=$1;getline;a[n]+=$1} END {for (i in a) print i,a[i]}' quest
Paul 20
Tom 240
John 20
And if you do not like getline
awk -v RS=" |\n" 'NR%2 {n=$1;next}{a[n]+=$1} END {for (i in a) print i,a[i]}' quest

Joining two consecutive lines using awk or sed

How would I join two lines using awk or sed?
I have data that looks like this:
abcd
joinabcd
efgh
joinefgh
ijkl
joinijkl
I need an output like the one below:
joinabcdabcd
joinefghefgh
joinijklijkl
awk '!(NR%2){print$0p}{p=$0}' infile
You can use printf with a ternary:
awk '{printf (NR%2==0) ? $0 "\n" : $0}'
awk 'BEGIN{i=1}{line[i++]=$0}END{j=1; while (j<i) {print line[j+1] line[j]; j+=2}}' yourfile
No need for sed.
Here it is in sed:
sed 'h;s/.*//;N;G;s/\n//g' < filename
They say imitation is the sincerest form of flattery.
Here's a Perl solution inspired by Dimitre's awk code:
perl -lne 'print "$_$p" if $. % 2 == 0; $p = $_' infile
$_ is the current line
$. is the line number
Here is an improvement to the sed script above that will take the following:
1008
-2734406.132904
2846
-2734414.838455
4636
-2734413.594009
6456
-2734417.316269
8276
-2734414.779617
and make it :
1008 -2734406.132904
2846 -2734414.838455
4636 -2734413.594009
6456 -2734417.316269
8276 -2734414.779617
the "sed" is : "sed 'h;s/.*//;G;N;s/\n/ /g'"
This also answers the question of how to make the count and the file name appear on the same line in the output of the command:
find . -type f -exec fgrep -ci "MySQL" {} \; -print
Bryon Nicolson's answer produced the best result.
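For reference, a sketch of that use case with the improved sed join; this assumes every file has at least one match, since fgrep exits non-zero when it finds nothing and -print is then skipped, which would break the count/name pairing:
find . -type f -exec fgrep -ci "MySQL" {} \; -print | sed 'h;s/.*//;G;N;s/\n/ /g'
Each output line then contains the match count followed by the file name.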
