I would like to sort Input.csv by country name in A-Z order.
The country name is in field $1 or field $5; if one of the two is blank, the sort should use the other.
Input.csv
Country,Amt,Des,Details,Country,Amt,Des,Network,Details
abc,10,03-Apr-14,Aug,abc,10,DL,ABC~XYZ,Sep
,,,,mno,50,DL,ABC~XYZ,Sep
abc,10,22-Jan-07,Aug,abc,10,DL,ABC~XYZ,Sep
jkl,40,11-Sep-13,Aug,,,,,
,,,,ghi,30,AL,DEF~PQZ,Sep
abc,10,03-Apr-14,Aug,abc,10,MN,ABC~XYZ,Sep
abc,10,19-Feb-14,Aug,abc,10,MN,ABC~XYZ,Sep
def,20,02-Jul-13,Aug,,,,,
def,20,02-Aug-13,Aug,,,,,
Desired Output.csv
Country,Amt,Des,Details,Country,Amt,Des,Network,Details
abc,10,03-Apr-14,Aug,abc,10,DL,ABC~XYZ,Sep
abc,10,22-Jan-07,Aug,abc,10,DL,ABC~XYZ,Sep
abc,10,03-Apr-14,Aug,abc,10,MN,ABC~XYZ,Sep
abc,10,19-Feb-14,Aug,abc,10,MN,ABC~XYZ,Sep
def,20,02-Jul-13,Aug,,,,,
def,20,02-Aug-13,Aug,,,,,
,,,,ghi,30,AL,DEF~PQZ,Sep
jkl,40,11-Sep-13,Aug,,,,,
,,,,mno,50,DL,ABC~XYZ,Sep
I tried the command below, but it does not produce the desired output. Please suggest a fix.
head -1 Input.csv > Output.csv; sort -t, -k1,1 -k5,5 <(tail -n +2 Input.csv) >> Output.csv
awk to the rescue!
$ awk -F, '{print ($1==""?$5:$1) "\t" $0}' file | sort | cut -f2-
Country,Amt,Des,Details,Country,Amt,Des,Network,Details
abc,10,03-Apr-14,Aug,abc,10,DL,ABC~XYZ,Sep
abc,10,03-Apr-14,Aug,abc,10,MN,ABC~XYZ,Sep
abc,10,19-Feb-14,Aug,abc,10,MN,ABC~XYZ,Sep
abc,10,22-Jan-07,Aug,abc,10,DL,ABC~XYZ,Sep
def,20,02-Aug-13,Aug,,,,,
def,20,02-Jul-13,Aug,,,,,
,,,,ghi,30,AL,DEF~PQZ,Sep
jkl,40,11-Sep-13,Aug,,,,,
,,,,mno,50,DL,ABC~XYZ,Sep
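This relies on the header starting with an uppercase letter while the data is lowercase, so the header sorts first (true for byte-order collation such as LC_ALL=C; other locales may order it differently). If that assumption does not hold, the header needs special handling, as you did above, or better, within awk: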
$ awk -F, 'NR==1{print; next} {print ($1==""?$5:$1) "\t" $0 | "sort | cut -f2-"}' file
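The desired output in the question keeps rows of the same country in their original order, which sorting on the whole line does not guarantee. A minimal sketch of one way to get that, assuming a sort that supports the -s (stable) option, as GNU and BSD sort do; the scratch directory and the variable names tmp and tab are only for illustration:

```shell
# Recreate the sample Input.csv from the question in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/Input.csv" <<'EOF'
Country,Amt,Des,Details,Country,Amt,Des,Network,Details
abc,10,03-Apr-14,Aug,abc,10,DL,ABC~XYZ,Sep
,,,,mno,50,DL,ABC~XYZ,Sep
abc,10,22-Jan-07,Aug,abc,10,DL,ABC~XYZ,Sep
jkl,40,11-Sep-13,Aug,,,,,
,,,,ghi,30,AL,DEF~PQZ,Sep
abc,10,03-Apr-14,Aug,abc,10,MN,ABC~XYZ,Sep
abc,10,19-Feb-14,Aug,abc,10,MN,ABC~XYZ,Sep
def,20,02-Jul-13,Aug,,,,,
def,20,02-Aug-13,Aug,,,,,
EOF

tab=$(printf '\t')
{
  # Pass the header through untouched.
  head -n 1 "$tmp/Input.csv"
  # Prefix each data row with its country key, sort on that key only
  # (-s keeps equal keys in input order), then strip the key again.
  tail -n +2 "$tmp/Input.csv" |
    awk -F, '{ key = ($1 == "" ? $5 : $1); print key "\t" $0 }' |
    LC_ALL=C sort -s -t "$tab" -k1,1 |
    cut -f2-
} > "$tmp/Output.csv"

cat "$tmp/Output.csv"
```

Because only the key column is compared, the two def rows stay in the Jul, Aug order of the input instead of being reordered by the rest of the line.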
Is this what you want? (Omitted first line)
cat file_containing_your_lines | awk 'NR != 1' | sed "s/,/\t/g" | sort -k 1 -k 5 | sed "s/\t/,/g"
Related
I have rows like this in my source file:
"Sumit|My Application|PROJECT|1|6|Y|20161103084527"
I want an exact match on column 3, i.e. I do not want to use the '~' operator in my awk command. However, the command:
awk -F '|' '($3 ~ /'"$Var_ApPJ"'/) {print $3}' ${Var_RDR}/${Var_RFL};
returns the correct result, but the command:
awk -F '|' '($3 == "${Var_ApPJ}") {print $3}' ${Var_RDR}/${Var_RFL};
fails to do so. Can anyone explain why? I want to use '==' because the row must not match when the value in the source file is "PROJECT1".
Parameter Var_ApPJ="PROJECT"
${Var_RDR}/${Var_RFL} -> Refers to source file.
Refer to this part of documentation to know how to pass variable to awk.
I found an alternative to '==' using '~' with anchors:
awk -F '|' '($3 ~ /^'"$Var_ApPJ"'$/) {print $3}' ${Var_RDR}/${Var_RFL};
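Here is the problem: the shell does not expand ${Var_ApPJ} inside single quotes, so awk compares $3 against the literal string "${Var_ApPJ}".
Try the command below, which passes the value in with -v:
awk -F '|' -v Var_ApPJ="$Var_ApPJ" '$3 == Var_ApPJ {print $3}' ${Var_RDR}/${Var_RFL};
Inside the awk program, drop the curly braces and the parentheses.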
vipin@kali:~$ cat kk.txt
a 5 b cd ef gh
vipin@kali:~$ awk -v var1="5" '$2 == var1 {print $3}' kk.txt
b
vipin@kali:~$
OR
#cat kk.txt
a 5 b cd ef gh
#var1="5"
#echo $var1
5
#awk '$2 == "'"$var1"'" {print $3}' kk.txt ### without "{}"
b
#
#awk '$2 == "'"${var1}"'" {print $3}' kk.txt ### with "{}"
b
#
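To tie this back to the original question, here is a small sketch of the -v approach on pipe-delimited rows. The second row (with PROJECT1) and the names want and matches are made up for illustration; only the first row comes from the question:

```shell
# Scratch copy of the source file; the PROJECT1 row is a hypothetical
# addition used to show that '==' does not match it.
tmp=$(mktemp -d)
cat > "$tmp/source.txt" <<'EOF'
Sumit|My Application|PROJECT|1|6|Y|20161103084527
Amit|Other Application|PROJECT1|2|7|N|20161103084528
EOF

Var_ApPJ="PROJECT"

# -v copies the shell value into the awk variable 'want', so the awk
# program itself can stay inside single quotes.
matches=$(awk -F '|' -v want="$Var_ApPJ" '$3 == want {print $1, $3}' "$tmp/source.txt")
echo "$matches"
```

Only the row whose third column is exactly PROJECT is printed. One caveat: awk interprets backslash escapes in values passed with -v, so values containing backslashes may need a different route.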
Is it possible to count the occurrence of each word like using uniq -c but with the count after the word rather than before?
Example scenario
Input file named test1.txt, which contains the following data
Renault:cilo:84563
Renault:cilo:84565
M&M:Thar:84566
Tata:nano:84567
M&M:quanto:84568
M&M:quanto:84569
The fields used in the above data are car_company:car_model:customerID
Desired result
cilo 2
Thar 1
nano 1
quanto 2
(car_model and number of cars sold grouped by car_model)
My code
cat test1.txt | cut -d: -f2 | uniq -c
Actual Result
2 cilo
1 Thar
1 nano
2 quanto
Is it possible to do the above process without using uniq -c ,so that I can swap the order of the fields (columns)?
You can use uniq, and simply post-process its output to swap the columns:
cut -d: -f2 test1.txt | uniq -c | awk '{printf "%s\t%s\n", $2, $1}'
EDIT: Added \n, as noted in a comment.
Save your command's output into a file named "badresult":
cat test1.txt | cut -d: -f2 | uniq -c > badresult
Then cut the seventh field and save it into a file named "counts" (use a space (" ") as the separator). The field numbers depend on the padding that uniq -c puts in front of the count, so they hold here only because every count has a single digit:
cut -d" " -f7 badresult > counts
Then cut the eighth field and save it into a file named "models" (again with a space (" ") as the separator):
cut -d" " -f8 badresult > models
Now you have your counts and models in separate files. All you have to do is show these two files side by side with the "pr" command (-m: one file per column, -T: no header information):
pr -m -T models counts
Using awk:
cat test1.txt | cut -d: -f2 | uniq -c | awk '{ t = $1; $1 = $2; $2 = t; print }'
The little awk code exchanges fields 1 and 2 using a temporary variable.
You just need awk for this:
$ awk -F: '{a[$2]++} END {for (i in a) print i, a[i]}' file
cilo 2
quanto 2
nano 1
Thar 1
This goes through every line, counting how many times each value of the second field has appeared. Everything is stored in the array a, so the END block only has to loop over it and print its contents. Note that the order in which for (i in a) visits the keys is unspecified.
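If the models should come out in the order they first appear, as in the desired result, one option is to remember that order in a second array. A sketch under that assumption; the array names order and count and the variable counts are illustrative:

```shell
# Sample data from the question.
tmp=$(mktemp -d)
cat > "$tmp/test1.txt" <<'EOF'
Renault:cilo:84563
Renault:cilo:84565
M&M:Thar:84566
Tata:nano:84567
M&M:quanto:84568
M&M:quanto:84569
EOF

# order[] records each model the first time it is seen;
# count[] holds the running total per model.
counts=$(awk -F: '
!($2 in count) { order[++n] = $2 }
{ count[$2]++ }
END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
' "$tmp/test1.txt")
echo "$counts"
```

Unlike uniq -c, this does not require equal values to be adjacent, so no sort is needed beforehand.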
I have a requirement to find duplicates based on three columns in a .txt file in unix which is delimited by ,.
Input:
a,b,c,d,e,f,gf,h
a,bd,cg,dd,ey,f,g,h
a,b,df,d,e,fd,g,h
a,b,ck,d,eg,f,g,h
Let's say we are finding duplicates based on fields 1, 2 and 5.
Expected output:
a,b,c,d,e,f,gf,h
a,b,df,d,e,fd,g,h
Can anyone help to write a script for this or is there a command already available?
I tried this, but it did not work:
awk -F, '!x[$1,$2,$3]++' file.txt
One way using awk:
awk -F, 'FNR==NR { x[$1,$2,$5]++; next } x[$1,$2,$5] > 1' a.txt a.txt
This is simple, but reads the file two times. On the first pass (FNR==NR), it maintains counts based on the key fields. During the second pass, it prints the line if its key was found more than once.
Another way using awk:
awk -F, '{if (x[$1$2$5]) { y[$1$2$5]++; print $0; if (y[$1$2$5] == 1) { print x[$1$2$5] } } x[$1$2$5] = $0}' a.txt
Explanation:
1 awk -F,
2 '{if (x[$1$2$5])
3 { y[$1$2$5]++; print $0;
4 if (y[$1$2$5] == 1)
5 { print x[$1$2$5] }
6 } x[$1$2$5] = $0
7 }'
Line 2: If x has $1$2$5, this key was seen before, so do steps 3-5.
Line 3: Increment the count and print the line, because it is a dup.
Line 4: This means we are seeing this key for the 2nd time, so we need to print the first line with this key. Last time we saw this key we did not know whether it was a dup or not, so we print that first line in step 5.
Line 6: Store the current line against the key so we can use it in step 2.
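A variation on the same single-pass idea, sketched here as an assumption rather than as the answer's exact method: it joins the key fields with SUBSEP, so that fields such as "a","bc" and "ab","c" cannot collide, and it prints the stored first line before the duplicate that reveals it. The array names first and printed are illustrative:

```shell
# Sample data from the question.
tmp=$(mktemp -d)
cat > "$tmp/a.txt" <<'EOF'
a,b,c,d,e,f,gf,h
a,bd,cg,dd,ey,f,g,h
a,b,df,d,e,fd,g,h
a,b,ck,d,eg,f,g,h
EOF

dups=$(awk -F, '
{
  key = $1 SUBSEP $2 SUBSEP $5
  if (key in first) {
    # Second or later occurrence: emit the remembered first line once,
    # then the current line.
    if (!(key in printed)) { print first[key]; printed[key] = 1 }
    print
  } else {
    first[key] = $0
  }
}' "$tmp/a.txt")
echo "$dups"
```

Lines within a duplicate group come out in input order; groups are emitted at the point where their second member is read.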
Another way using sort, uniq and awk
Note: the uniq command has an option '-f' to skip the specified number of fields before it starts comparison.
sort -t, -k1,1 -k2,2 -k5,5 a.txt | awk -F, 'BEGIN { OFS = " "} {print $0, $1, $2, $5}' | sed 's/,/ /g' | uniq -f8 -D | sed 's/ /,/g' | cut -d',' -f 1-8
This sorts based on fields 1,2,5. awk prints the original line and appends fields 1,2,5. sed changes the delimiter because uniq does not have an option to specify a delimiter. uniq skips the first 8 fields (the original line), compares the appended key, and prints all duplicate lines. The final sed and cut restore the commas and drop the appended key. This assumes the fields contain no spaces and are not empty.
I had a similar issue
I needed to eliminate duplicate detail records while preserving the flat-file record formatting and the sequence of the records.
The duplication was caused by a time expansion of the date field in column 2 of the detail records only.
The receiving system was reporting duplication on columns 4 and 5.
I cobbled together this quick hack to resolve it.
First, read the file data into an array.
Then read and manipulate the individual records (crudely, with a counter), as demonstrated in this snippet, which uses a case statement to handle each record type.
Cheers!
infile="[input file name]"
readarray -t inrecs < "$infile"
filebase=`echo "$infile" | cut -d '.' -f1`
i=1
for inrec in "${inrecs[@]}";do
field1=`echo "${inrecs[$i-1]}" | cut -d',' -f1`
field2=`echo "${inrecs[$i-1]}" | cut -d',' -f2`
field3=`echo "${inrecs[$i-1]}" | cut -d',' -f3`
field4=`echo "${inrecs[$i-1]}" | cut -d',' -f4`
field5=`echo "${inrecs[$i-1]}" | cut -d',' -f5`
field6=`echo "${inrecs[$i-1]}" | cut -d',' -f6`
field7=`echo "${inrecs[$i-1]}" | cut -d',' -f7`
field8=`echo "${inrecs[$i-1]}" | cut -d',' -f8`
case $field1 in
'H')
echo "$field1,$field2,$field3">${filebase}.new
;;
'D')
dupecount=0
dupecount=`zegrep -c -e "${field4},${field5}" ${infile}`
if [[ "$dupecount" -gt 1 ]];then
writtencount=0
writtencount=`zegrep -c -e "${field4},${field5}" ${filebase}.new`
if [[ "${writtencount}" -eq 0 ]];then
echo "$field1,$field2,$field3,$field4,$field5,$field6,$field7,$field8,">>${filebase}.new
fi
else
echo "$field1,$field2,$field3,$field4,$field5,$field6,$field7,$field8,">>${filebase}.new
fi
;;
'T')
dcount=`zegrep -c '^D' ${filebase}.new`
echo "$field1,$field2,$dcount,$field4">>${filebase}.new
;;
esac
((i++))
done
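The same idea (keep the first detail record per column 4 and 5 pair, then rewrite the trailer count) can probably be expressed in a single awk pass. This is only a sketch: the H/D/T sample rows, the file name records.csv, and the assumption that the trailer carries the detail count in field 3 are all hypothetical, inferred from the script above. Unlike the script, it copies the header record through unchanged.

```shell
# Hypothetical sample: one header, three detail rows (two share the
# same columns 4 and 5), and a trailer whose 3rd field is the D count.
tmp=$(mktemp -d)
cat > "$tmp/records.csv" <<'EOF'
H,20240101,FILE1
D,20240101000000,x,K1,V1,a,b,c
D,20240101000001,x,K1,V1,a,b,c
D,20240101000000,x,K2,V2,a,b,c
T,20240101,3,END
EOF

deduped=$(awk -F, -v OFS=, '
$1 == "D" {
  if (seen[$4 FS $5]++) next   # drop later rows with the same columns 4,5
  dcount++                     # count the detail rows that are kept
}
$1 == "T" { $3 = dcount }      # rewrite the trailer count
{ print }
' "$tmp/records.csv")
echo "$deduped"
```

This reads the file once and keeps records in their original sequence, whereas the loop above runs several cut and grep processes per record.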
Please help me count, grouped by the second field, the lines where the third field is zero and the lines where it is not zero.
A group with no matching lines should still be reported, with "0" as its "count!=0" or "count=0" value.
Input file: f1.txt
ppp1,abc,10,qqq
ppp2,abc,5,qqq
ppp3,abc,0,qqq
ppp4,abc,18,qqq
ppp5,abc,0,qqq
mmm1,xyz,0,rrr
mmm2,xyz,55,rrr
nnn1,ijk,12,sss
nnn2,ijk,89,sss
nnn3,ijk,62,sss
bbb1,lmn,0,ttt
bbb2,lmn,0,ttt
Output.txt
abc,count!=0,3
abc,count=0,2
xyz,count!=0,1
xyz,count=0,1
ijk,count!=0,3
ijk,count=0,0
lmn,count!=0,0
lmn,count=0,2
I split the command across lines to make it easier to read:
awk -F, -v OFS="," '{c[$2];
if($3!=0)a[$2]++;else b[$2]++}
END{for(x in c){
print x,"count!=0",a[x]*1;
print x,"count=0",b[x]*1}}' input
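Since for (x in c) visits keys in an unspecified order, the groups may not come out in the order shown in Output.txt. A sketch that keeps first-seen order, using illustrative array names (order, seen, nonzero, zero) that are not part of the answer above:

```shell
# Sample data from the question.
tmp=$(mktemp -d)
cat > "$tmp/f1.txt" <<'EOF'
ppp1,abc,10,qqq
ppp2,abc,5,qqq
ppp3,abc,0,qqq
ppp4,abc,18,qqq
ppp5,abc,0,qqq
mmm1,xyz,0,rrr
mmm2,xyz,55,rrr
nnn1,ijk,12,sss
nnn2,ijk,89,sss
nnn3,ijk,62,sss
bbb1,lmn,0,ttt
bbb2,lmn,0,ttt
EOF

report=$(awk -F, -v OFS=, '
!($2 in seen) { seen[$2] = 1; order[++n] = $2 }   # remember group order
{ if ($3 != 0) nonzero[$2]++; else zero[$2]++ }
END {
  for (i = 1; i <= n; i++) {
    k = order[i]
    # Adding 0 turns a never-set counter into a printed 0.
    print k, "count!=0", nonzero[k] + 0
    print k, "count=0", zero[k] + 0
  }
}' "$tmp/f1.txt")
echo "$report"
```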
awk -F, '{ if ($3 == 0) print $2, "count=0"; else print $2, "count!=0" }' f1.txt |
sort | uniq -c | awk '{print $2","$3","$1}' >> Output.txt
Groups with no matching lines (for example ijk with count=0) do not appear in this output.
I have fileA, fileB data as shown below
fileA
,,"user1","email"
,,"user2","email"
,,"user3","email"
,,"user4","email"
fileB
,,user2,location
,,user4,location
,,user1,location
,,user3,location
I want to look up each fileA user in fileB, get only the location, and append it to fileA (or write the result to another file).
Expected output:
,,"user1","email",location
,,"user2","email",location
,,"user3","email",location
,,"user4","email",location
I'm trying a while loop that reads each username from fileA and searches fileB for the location, but I fail to add it back to fileA.
Your help is much appreciated.
This should work:
for user in `awk -F\" '{print $2}' fileA`
do
loc=`grep ${user} fileB | awk -F',' '{print $4}'`
sed -i "/${user}/ s/$/,${loc}/" fileA
done
Adding the example:
$ cat fileA
,,"user1","email"
,,"user2","email"
,,"user3","email"
,,"user4","email"
$ cat fileB
,,user2,location2
,,user4,location4
,,user1,location1
,,user3,location3
$ for user in `awk -F\" '{print $2}' fileA`; do echo ${user}; loc=`grep ${user} fileB | awk -F',' '{print $4}'`; echo ${loc}; sed -i "/${user}/ s/$/,${loc}/" fileA; done
$ cat fileA
,,"user1","email",location1
,,"user2","email",location2
,,"user3","email",location3
,,"user4","email",location4
The description is not clear, but based on the question you can use the following command to append a value to the end of each matching row:
sed -i '/search_pattern/ s/$/string_to_be_appended/' filename
You can do this entirely in awk
awk -F, '
NR==FNR{a[$3]=$4;next}
{for(x in a) if(index($3,x)>0) print $0","a[x]}' file2 file1
Test:
$ cat file1
,,"user1","email"
,,"user2","email"
,,"user3","email"
,,"user4","email"
$ cat file2
,,user2,location2
,,user4,location4
,,user1,location1
,,user3,location3
$ awk -F, 'NR==FNR{a[$3]=$4;next}{for(x in a) if(index($3,x)>0) print $0","a[x]}' file2 file1
,,"user1","email",location1
,,"user2","email",location2
,,"user3","email",location3
,,"user4","email",location4
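The index() test above also matches when one user name is a prefix of another (user1 inside "user10"). A sketch of an exact-key variant that strips the quotes before the lookup; the array name loc and the variables user and joined are illustrative, and rows with no match get an empty location:

```shell
# Sample data from the example above.
tmp=$(mktemp -d)
cat > "$tmp/fileA" <<'EOF'
,,"user1","email"
,,"user2","email"
,,"user3","email"
,,"user4","email"
EOF
cat > "$tmp/fileB" <<'EOF'
,,user2,location2
,,user4,location4
,,user1,location1
,,user3,location3
EOF

joined=$(awk -F, '
NR == FNR { loc[$3] = $4; next }        # first file: user -> location
{
  user = $3
  gsub(/"/, "", user)                   # fileA quotes the user name
  print $0 "," ((user in loc) ? loc[user] : "")
}' "$tmp/fileB" "$tmp/fileA")
echo "$joined"
```

Each fileA row is printed exactly once, in fileA order, with one array lookup per row instead of a loop over every fileB entry.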