Ignore case when removing duplicates using awk - unix

I am using the following command to remove duplicates from a file.
awk -F"," '!x[$1]++' test.csv
How can I make it ignore the case of column 1?
I tried awk -F"," '{IGNORECASE = 1} !x[$1]++' test.csv but it does not seem to work.

IGNORECASE is a gawk extension that affects regexp matching and string comparisons, not array subscripts, so setting it here has no effect. Normalize the key yourself instead, using toupper:
awk -F"," '!x[toupper($1)]++' test.csv

awk -F"," '!x[tolower($1)]++' test.csv

Related

awk: write to file with a column separator

I am reading a file and writing its first 2 columns into an output file.
The input is comma-separated, and I want to write with "|" as the column separator.
I tried
awk -F"," -OFS"|" '{print $1 , $2}' filename
but the output file doesn't have the | separator.
It will not print the | because -OFS"|" is not valid awk syntax. Here are the ways to set OFS in an awk program.
1st way: pass it as a variable with -v OFS="|":
awk -F"," -v OFS="|" '{print $1,$2}' filename
2nd way: set it in awk's BEGIN section (which is the recommended approach):
awk 'BEGIN{FS=",";OFS="|"}{print $1,$2}' filename
3rd way: as per ghoti's comment, assign OFS among the file arguments. An assignment placed before an input file name takes effect for that file, so you can set different FS and OFS values for different input files (awk can read multiple input files in one run). E.g.:
awk '{print $1,$2}' FS="," OFS="|" Input_file1 FS=":" OFS=";" Input_file2
In the above command, FS is , and OFS is | for Input_file1, while FS is : and OFS is ; for Input_file2 (thanks to ghoti for pointing this out in the comments).
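A quick illustration with hypothetical input files (names and contents made up):
$ cat Input_file1
a,b,c
$ cat Input_file2
x:y:z
$ awk '{print $1,$2}' FS="," OFS="|" Input_file1 FS=":" OFS=";" Input_file2
a|b
x;y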

delete first and last hyphen character from each column

I am trying to remove the first and last characters from two separate columns prior to them being saved to a file. The characters I need to remove are the hyphens. Due to hyphens in the results, I am unable to just remove all of them. Is there a more effective way to use awk for this?
My current attempt is something like this command:
cat file.txt | awk -F '|' '{print $2, $4}' | sed 's/.//;s/.$//' > newfile.txt
file example
1-|-40939-23-|-column-3-|-column-4-|
2-|-9832651-23-|-column-3-|-column-4-|
current output
40939-23- -column-4
9832651-23- -column-4
desired output
40939-23 column-4
9832651-23 column-4
Make the field separator itself a regex that consumes the surrounding hyphens: -[|](-|$) matches a hyphen, a literal |, and then either another hyphen or the end of the line.
$ awk -F'-[|](-|$)' '{print $2, $4}' file
40939-23 column-4
9832651-23 column-4
Could you please try the following and let me know if this helps. It strips leading and trailing hyphens from the two wanted fields with gsub:
awk -F"|" '{gsub(/^-|-$/,"",$2);gsub(/^-|-$/,"",$(NF-1));print $2,$(NF-1)}' Input_file
2nd solution: use fixed field numbers, assuming your Input_file layout is always the same:
awk 'BEGIN{FS="[-|]";OFS="-"}{print $4 OFS $5 " " $12 OFS $13}' Input_file
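If more fields ever need the same cleanup, a small helper function keeps the first solution readable (just a sketch; trim is a hypothetical user-defined function, not a built-in):
awk -F'|' 'function trim(s){gsub(/^-|-$/,"",s); return s} {print trim($2), trim($(NF-1))}' Input_file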

Combining two awk commands in single command

I want to combine these two commands and invoke a single command.
In the first command, I store the 4th column of x.CSV (separator: ,) in the file z.csv.
awk -F, '{print $4}' x.CSV > z.csv
In the second command, I find the unique first-column values of z.csv (separator: space).
awk -F\ '{print $1}' z.csv|sort|uniq
How can I combine these two commands into a single one?
Pipe the output of the first awk to the second awk:
awk -F, '{print $4}' x.CSV | awk -F\ '{print $1}' |sort|uniq
or, as Avinash Raj suggested,
awk -F, '{print $4}' x.CSV | awk -F\ '{print $1}' | sort -u
If the content of z.csv is actually wanted, rather than just an artefact of the way you're currently implementing your program, you can use:
awk -F, '{ print $4 > "z.csv"      # keep writing z.csv as before
           split($4, f, " ")       # break field 4 on spaces
           f4[f[1]] = 1            # record the first word as a key
         }
         END { for (i in f4) print i }' x.CSV
The split function breaks field 4 on spaces, and (associative) array f4 records the key value. The loop at the end prints out the distinct values, unsorted. If you need them sorted, you can either use GNU awk's built-in sort functions or (if you don't have an awk with built-in sort functions) write your own in awk, or pipe the output to sort.
With GNU awk, you can replace the END block with:
END { n = asorti(f4); for (i = 1; i <= n; i++) print f4[i] }
(asorti returns the element count; indexing from 1 to n guarantees sorted output, whereas a plain for (i in f4) would iterate in unspecified order.)
If you don't want the z.csv file, then (a) you could have used a pipe in the first place, and (b) you can simply remove the print $4 > "z.csv" line.
awk '{split($4,b," "); a[b[1]]=1} END { for( i in a) print i }' FS=, x.CSV
This does not sort the data, but it's not clear if you actually want it sorted or merely needed that to get unique entries. If you do want it sorted, pipe it to sort.
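As a sanity check, with a hypothetical x.CSV (contents made up):
$ cat x.CSV
1,a,b,alpha one
2,c,d,beta two
3,e,f,alpha three
$ awk '{split($4,b," "); a[b[1]]=1} END { for( i in a) print i }' FS=, x.CSV | sort
alpha
beta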

need some help on awk command

I need some help with awk. I am reading a CSV file and doing some substitution on some of the columns. The 9th column (string type) should be replaced by the value of (the 9th column itself + the value of the 4th column (integer)), the 15th column by $15+$12, and the 26th column by $26+$23. The same has to be done line by line for all the records. Suggestions please.
Below is the sample input/output; the first line, which is the header, must be left as is.
sample Input
EmpID|Empname|Empadd|roleId|roleDesc|Dept
100|mst|Del|20|SD|DA
101|ms|Del|21|XS|DA
Sample output
EmpID|Empname|Empadd|roleId|roleDesc|Dept
100|mst100|Del|20|SD20|DA
101|ms101|Del|21|XS21|DA
It's like Empname has been concatenated with EmpID, and roleDesc with roleId. Hope that's helpful :)
This will perform the needed transformation:
$ awk 'NR>1{$2=$2$1;$5=$5$4}1' FS='|' OFS='|' file
EmpID|Empname|Empadd|roleId|roleDesc|Dept
100|mst100|Del|20|SD20|DA
101|ms101|Del|21|XS21|DA
If you have to do this for many columns, you can use a for loop like so (provided the target columns are evenly spaced; here each is 3 past the previous one, and each source column is the one just before it):
$ awk 'NR>1{for(i=2;i<=5;i+=3)$i=$i$(i-1)}1' FS='|' OFS='|' file
EmpID|Empname|Empadd|roleId|roleDesc|Dept
100|mst100|Del|20|SD20|DA
101|ms101|Del|21|XS21|DA
When you say +, I'm assuming you mean string concatenation. In awk, there is no specific concatenation operator; you just put two strings side by side.
awk -F, -v OFS=, '{$9 = $9 $4; $15=$15$12; $26=$26$23; print}' file.csv
Also assuming that by "csv", you actually mean comma-separated.
If you want to edit the file in-place, you need to do this:
awk ... file.csv > newfile && mv file.csv file.csv.bak && mv newfile file.csv
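Alternatively, if you have GNU awk 4.1 or later, its inplace extension can overwrite the file directly (a gawk-only feature, shown here as an alternative to the rename dance):
gawk -i inplace -F, -v OFS=, '{$9 = $9 $4; $15=$15$12; $26=$26$23; print}' file.csv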
Edit: to leave the first line untouched:
awk -F, -v OFS=, 'NR>1 {$9 = $9 $4; $15=$15$12; $26=$26$23} {print}' file.csv
Now the columns are modified for the 2nd and subsequent lines, but every line is printed.
You'll sometimes see that written this way, where the trailing 1 is an always-true pattern whose default action is to print the line:
awk -F, -v OFS=, 'NR>1 {$9 = $9 $4; $15=$15$12; $26=$26$23} 1' file.csv

How to append one file to the other when the appended file must first be edited, hence the usual cat command can't be used

Suppose that I have two files, each of which has a header in the first line and records in the remaining lines. I want to concatenate the two files into one, but without including the header twice.
I tried the following command while googling for the answer (so it may not be an optimal approach).
cat awk 'NR!=1 {printf "%s\n", $1}' file2.csv >| file.csv
However, I got the following error.
cat: awk: No such file or directory
cat: NR!=1 {printf "%s\n",$1}: No such file or directory
It looks like cat treated awk and the program text as file names, not a command. I want the result of awk to become the content of the file, so I also tried to pipe it to the argument of cat.
awk 'NR!=1 {printf "%s\n", $1}' file2.csv > cat file.csv
However, this way I got a file named cat containing the result of awk (the shell parsed > cat as a redirection to a file called cat, and file.csv as a second input file for awk)...
So how can I solve it?
You need some grouping:
{
    cat file1          # first file, header included
    sed '1d' file2     # second file with its first line (the header) deleted
} > file.csv
As one line
{ cat file1; sed '1d' file2; } > file.csv
The semicolon before the ending brace is required.
{ cat file1; tail -n +2 file2; } > out
(tail -n +2 prints from line 2 onward; note the space after the opening brace and, as above, the semicolon before the closing one.)
Print the first line of the first file, then lines 2 through the end of every file:
awk 'NR==1||FNR>1' file1 file2 (file3 file4 ..) > outfile
NR numbers records across all input files while FNR restarts at 1 for each file, so NR==1 keeps only the very first header line and FNR>1 skips each subsequent file's header.
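For example, with hypothetical files that share the same header:
$ cat file1
id,name
1,a
$ cat file2
id,name
2,b
$ awk 'NR==1||FNR>1' file1 file2 > outfile
$ cat outfile
id,name
1,a
2,b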
