Could you please help with fetching duplicates from a specific column in a CSV file in Unix?
I tried the uniq utility, but it only seems to work with txt files.
Please suggest.
Try sorting the values before applying uniq:
awk -F ',' '{print $1}' <filename> | sort | uniq
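Since the goal is to fetch the duplicates themselves, uniq -d is worth knowing: it prints one copy of each value that occurs more than once. A minimal sketch with an inline sample file (the data.csv name and its contents are invented for illustration):

```shell
# Build a small sample CSV (hypothetical data).
printf 'alice,10\nbob,20\nalice,30\ncarol,40\nbob,50\n' > data.csv

# Extract column 1, sort it so duplicates become adjacent,
# then print only the values that appear more than once.
awk -F ',' '{print $1}' data.csv | sort | uniq -d
# prints:
# alice
# bob
```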
Contents of the sample input file (input.txt), starting from the following line:
Name|Class|School Name
Deepu|First|Meridian
Neethu|Second|Meridian
Sethu|First|DAV
Theekshana|Second|DAV
Teju|First|Sangamithra
I need to output the details of the student with the school name Sangamithra
in the format below. I am new to Unix, so I need help.
Desired output:
Sangamithra|First|Teju
I think you are looking for something like this:
awk -F\| '{print $3"|"$2"|"$1}' filename
School Name|Class|Name
Meridian|First|Deepu
Meridian|Second|Neethu
DAV|First|Sethu
DAV|Second|Theekshana
Sangamithra|First|Teju
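If only the Sangamithra row is wanted, as in the desired output, the filter and the reordering can be combined in a single awk condition. A sketch assuming the same pipe-delimited layout, with NR>1 skipping the header line:

```shell
# Print School|Class|Name, but only for rows whose third field
# (the school) equals Sangamithra; NR>1 skips the header line.
awk -F'|' 'NR>1 && $3=="Sangamithra" {print $3"|"$2"|"$1}' input.txt
```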
If you're just interested in the output, this can be achieved using grep:
grep "Sangamithra" input.txt
If you want the school name first, as in the desired output, you can pipe through awk:
grep "Sangamithra" input.txt | awk -F "|" '{print $3"|"$2"|"$1}'
I want to validate the file. As part of the validation, I need to check the length of each column, whether it is null or not null, and the primary key constraint of the file.
cat File_name| awk -F '|' '{print NF}' | sort | uniq
This command splits each line of the file into tokens using the pipe | as the delimiter, prints the number of tokens on each row (the NF variable), sorts the output (sort), and finally keeps only the unique counts (uniq).
The pipeline can be optimised by getting rid of the cat command, letting awk read the file directly, and using sort's -u option to get unique records:
awk -F '|' '{print NF}' file_name | sort -u
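The NF check above only verifies the column count. For the length and null checks the question mentions, a per-field awk pass is one option; a rough sketch (the column numbers, the 20-character limit, and the file name are assumptions, not from the original):

```shell
# Flag rows where column 2 is empty, or column 1 exceeds 20 chars
# (both rules are illustrative; adjust to the real file spec).
awk -F'|' '
    $2 == ""        { print "row " NR ": column 2 is null" }
    length($1) > 20 { print "row " NR ": column 1 too long" }
' file_name
```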
I can delete duplicate lines in files using the sort -u and uniq commands.
Is that possible using sed or awk?
There's a "famous" awk idiom:
awk '!seen[$0]++' file
It has to keep the unique lines in memory, but it preserves the file order.
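To see the order-preserving behaviour, you can run the idiom on a few sample lines (invented for illustration):

```shell
# First occurrences survive, in their original order;
# sort -u would instead emit a, b, c.
printf 'b\na\nb\nc\na\n' | awk '!seen[$0]++'
# prints:
# b
# a
# c
```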
sort and uniq together also remove the duplicates:
cat filename | sort | uniq >> filename2
If the file consists of numbers, use sort -n.
After sorting, we can use this sed command:
sed -E '$!N; /^(.*)\n\1$/!P; D' filename
If the file is unsorted, you can use it in combination with sort:
sort filename | sed -E '$!N; /^(.*)\n\1$/!P; D'
Is there a one-liner for sort and uniq given a filename in Unix?
I googled and found the following, but it does not sort, and I am not sure what the command below is doing. Are there better ways using awk or any other Unix tool?
cut -d, -f1 file | uniq | xargs -I{} grep -m 1 "{}" file
On a side note, is there one that can be used on both Windows and Unix? This is not important, just checking.
C:\Users\Chola>sort -t "#" -k2,2 email-list.txt
Input text file:
436485
422636
429228
427041
433414
425810
422636
431526
428808
If your file consists only of numbers, one per line:
sort -n FILENAME | uniq
or
sort -u -n FILENAME
(You can add -u to the sort command instead of piping through uniq; but note that when sort is given a -k key, -u deduplicates on the key fields alone, not the whole line.)
If you want to extract just one column of a file, and then sort that column numerically removing duplicates, you could do this:
cut -f7 FILENAME | sort -n | uniq
cut assumes that there is a single tab between columns. If your file is comma-separated, you might be able to do this:
cut -f7 -d, FILENAME | sort -n | uniq
but that won't work if there is a , inside a quoted text field (where CSV protects it with "s).
If you want to sort by the column but remove only completely duplicate lines, then you can do this:
sort -k7,7n FILENAME | uniq
sort assumes that columns are separated by whitespace. Again, if the columns are separated by commas, you can use:
sort -k7,7n -t, FILENAME | uniq
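One caveat if you combine -u with a -k key: sort then deduplicates on the key fields alone, which is not the same as piping through uniq. A quick illustration with invented two-column data:

```shell
# Two lines share the key "1" in column 1 but differ in column 2;
# with -u and a key, only one line per key survives.
printf '1 a\n1 b\n2 c\n' | sort -k1,1 -u
```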
I am using following command to remove duplicates from file.
awk -F"," '!x[$1]++' test.csv
How can I make it to ignore case of column 1?
I tried awk -F"," '{IGNORECASE = 1} !x[$1]++' test.csv but it does not seem to work.
IGNORECASE in gawk affects regex matching and string comparisons, not array subscripts, so normalize the key instead. Using toupper:
awk -F"," '!x[toupper($1)]++' test.csv
or, using tolower:
awk -F"," '!x[tolower($1)]++' test.csv
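A quick check of the toupper variant on case-varying input (the sample rows are invented):

```shell
# "Apple", "apple", and "APPLE" all map to the same key,
# so only the first of them survives.
printf 'Apple,1\napple,2\nBanana,3\nAPPLE,4\n' |
awk -F"," '!x[toupper($1)]++'
# prints:
# Apple,1
# Banana,3
```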