I have a csv file. I would like to put the fields into different variables. Supposed there are three fields in each line of the csv file. I have this code:
csvfile=test.csv
while read inline; do
var1=`echo $inline | awk -F',' '{print $1}'`
var2=`echo $inline | awk -F',' '{print $2}'`
var3=`echo $inline | awk -F',' '{print $3}'`
.
.
.
done < $csvfile
This code is good. However, if a field is coded with an embedded comma, then, it would not work. Any suggestion? For example:
how,are,you
I,"am, very",good
this,is,"a, line"
This may not be the perfect solution but it will work in your case.
[cloudera#quickstart Documents]$ cat cd.csv
a,b,c
d,"e,f",g
File content
csvfile=cd.csv
while read inline; do
var1=`echo $inline | awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", "*", $i) }1' | awk -F',' '{print $1}' | sed 's/*/,/g'`
var2=`echo $inline | awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", "*", $i) }1' | awk -F',' '{print $2}' | sed 's/*/,/g'`
var3=`echo $inline | awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", "*", $i) }1' | awk -F',' '{print $3}' | sed 's/*/,/g'`
echo $var1 " " $var2 " " $var3
done< $csvfile
Output :
[cloudera#quickstart Documents]$ sh a.sh
a b c
d e,f g
So basically first we are trying to handle "," in data and then replacing the "," with "*" to get desired column using awk and then reverting * to "," again to get actual field value
I need to get this result having this format :
"hadoop fs -ls -d -C -t /hdfs/data/t1/t11/34/1EX4/ | grep indicateurs-PUB_1ELPC | grep "^d" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8 "
So I tried to use this instruction :
paste0("hadoop fs -ls -d -C -t /hdfs/data/t1/t11/34/1EX4/ | grep indicateurs-PUB_","1ELPC",cat(" grep \"^d\" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8 "),sep = "")
But, this return
grep "^d" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8 [1] "hadoop fs -ls -d -C -t /hdfs/data/t1/t11/34/1EX4/ | grep indicateurs-PUB_1EPSE"
So, the problem is about using the cat function, In fact I need that its result will be in quoted format. In other way, I can't understand why the result was inversed here ?
I'm assuming you split up the arguments to paste0 for a specific reason. As #RuiBarradas mentions - cat is for printing, but not returning an actual object (always returns NULL):
paste0("hadoop fs -ls -d -C -t /hdfs/data/t1/t11/34/1EX4/ | grep indicateurs-PUB_",
"1ELPC",
" grep \"^d\" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8 ",
sep = "")
returns:
[1] "hadoop fs -ls -d -C -t /hdfs/data/t1/t11/34/1EX4/ | grep indicateurs-PUB_1ELPC grep \"^d\" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8 "
which looks to me like what you want.
Do note that, in the output \" is one character (a double quote). i.e.,
> nchar("\"")
[1] 1
To further illustrate the point:
temp <- paste0("hadoop fs -ls -d -C -t /hdfs/data/t1/t11/34/1EX4/ | grep indicateurs-PUB_",
"1ELPC",
" grep \"^d\" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8 ",
sep = "")
> cat(temp)
hadoop fs -ls -d -C -t /hdfs/data/t1/t11/34/1EX4/ | grep indicateurs-PUB_1ELPC grep "^d" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8
> print(temp, quote = FALSE)
[1] hadoop fs -ls -d -C -t /hdfs/data/t1/t11/34/1EX4/ | grep indicateurs-PUB_1ELPC grep "^d" | sort -k6,7 | tail -1 | tr -s ' ' | cut -d' ' -f8
Would like to sort Input.csv file based on fields $1 and $5 and generate country wise A-Z order.
While doing sort need to consider country name either from $1 or $5 if any of the fields are blank.
Input.csv
Country,Amt,Des,Details,Country,Amt,Des,Network,Details
abc,10,03-Apr-14,Aug,abc,10,DL,ABC~XYZ,Sep
,,,,mno,50,DL,ABC~XYZ,Sep
abc,10,22-Jan-07,Aug,abc,10,DL,ABC~XYZ,Sep
jkl,40,11-Sep-13,Aug,,,,,
,,,,ghi,30,AL,DEF~PQZ,Sep
abc,10,03-Apr-14,Aug,abc,10,MN,ABC~XYZ,Sep
abc,10,19-Feb-14,Aug,abc,10,MN,ABC~XYZ,Sep
def,20,02-Jul-13,Aug,,,,,
def,20,02-Aug-13,Aug,,,,,
Desired Output.csv
Country,Amt,Des,Details,Country,Amt,Des,Network,Details
abc,10,03-Apr-14,Aug,abc,10,DL,ABC~XYZ,Sep
abc,10,22-Jan-07,Aug,abc,10,DL,ABC~XYZ,Sep
abc,10,03-Apr-14,Aug,abc,10,MN,ABC~XYZ,Sep
abc,10,19-Feb-14,Aug,abc,10,MN,ABC~XYZ,Sep
def,20,02-Jul-13,Aug,,,,,
def,20,02-Aug-13,Aug,,,,,
,,,,ghi,30,AL,DEF~PQZ,Sep
jkl,40,11-Sep-13,Aug,,,,,
,,,,mno,50,DL,ABC~XYZ,Sep
I have tried below command but not getting desired output. Please suggest..
head -1 Input.csv > Output.csv; sort -t, -k1,1 -k5,5 <(tail -n +2 Input.csv) >> Output.csv
awk to the rescue!
$ awk -F, '{print ($1==""?$5:$1) "\t" $0}' file | sort | cut -f2-
Country,Amt,Des,Details,Country,Amt,Des,Network,Details
abc,10,03-Apr-14,Aug,abc,10,DL,ABC~XYZ,Sep
abc,10,03-Apr-14,Aug,abc,10,MN,ABC~XYZ,Sep
abc,10,19-Feb-14,Aug,abc,10,MN,ABC~XYZ,Sep
abc,10,22-Jan-07,Aug,abc,10,DL,ABC~XYZ,Sep
def,20,02-Aug-13,Aug,,,,,
def,20,02-Jul-13,Aug,,,,,
,,,,ghi,30,AL,DEF~PQZ,Sep
jkl,40,11-Sep-13,Aug,,,,,
,,,,mno,50,DL,ABC~XYZ,Sep
here the header starting with uppercase and data is lowercase. If this is not a valid assumption special handling of header required as you did above or better with awk
$ awk -F, 'NR==1{print; next} {print ($1==""?$5:$1) "\t" $0 | "sort | cut -f2-"}' file
Is this what you want? (Omitted first line)
cat file_containing_your_lines | awk 'NR != 1' | sed "s/,/\t/g" | sort -k 1 -k 5 | sed "s/\t/,/g"
I need to compare field1, field5 in fileA to field5, field6 in fileB
and print out when there are no matches:
file A
ZEROC_ZAR,MKT,M,ZAR,3YEAR,7.59
ZEROC_AED,MKT,M,ZAR,4YEAR,7.84
ZEROC_ZAR,MKT,M,ZAR,5YEAR,8.03
ZEROC_AED,MKT,M,ZAR,7YEAR,8.33
file B
TKS,010690226,02977,AED,ZEROC_AED,3YEAR
TKS,010690231,02977,AED,ZEROC_AED,4YEAR
TKS,010690233,02977,AED,ZEROC_AED,5YEAR
TKS,010690235,02977,AED,ZEROC_AED,7YEAR
TKS,010690236,02977,AED,ZEROC_AED,10YEAR
This oneliner prints the non-matching lines of fileB:
$ cut -d, -f1,5 fileA | xargs -n1 -I{} grep {} fileB | cat - fileB | sort | uniq -u
TKS,010690226,02977,AED,ZEROC_AED,3YEAR
TKS,010690233,02977,AED,ZEROC_AED,5YEAR
TKS,010690236,02977,AED,ZEROC_AED,10YEAR
Explanation:
First combine fields 1 and 5 of fileA:
$ cut -d, -f1,5 fileA
ZEROC_ZAR,3YEAR
ZEROC_AED,4YEAR
ZEROC_ZAR,5YEAR
ZEROC_AED,7YEAR
Use these strings to grep for matching lines in fileB:
$ cut -d, -f1,5 fileA | xargs -n1 -I{} grep {} fileB
TKS,010690231,02977,AED,ZEROC_AED,4YEAR
TKS,010690235,02977,AED,ZEROC_AED,7YEAR
Then use cat - fileB | sort to combine these two lines with the content of fileB:
$ cut -d, -f1,5 fileA | xargs -n1 -I{} grep {} fileB | cat - fileB | sort
TKS,010690226,02977,AED,ZEROC_AED,3YEAR
TKS,010690231,02977,AED,ZEROC_AED,4YEAR
TKS,010690231,02977,AED,ZEROC_AED,4YEAR
TKS,010690233,02977,AED,ZEROC_AED,5YEAR
TKS,010690235,02977,AED,ZEROC_AED,7YEAR
TKS,010690235,02977,AED,ZEROC_AED,7YEAR
TKS,010690236,02977,AED,ZEROC_AED,10YEAR
Finally, use uniq -u to remove duplicate lines:
$ cut -d, -f1,5 fileA | xargs -n1 -I{} grep {} fileB | cat - fileB | sort | uniq -u
TKS,010690226,02977,AED,ZEROC_AED,3YEAR
TKS,010690233,02977,AED,ZEROC_AED,5YEAR
TKS,010690236,02977,AED,ZEROC_AED,10YEAR