sort and uniq oneliner - unix

Is there a one-liner for sort and uniq given a filename in unix?
I googled and found the following, but it's not sorting, and I'm also not sure what the command below is doing. Are there better ways using awk or any other unix tool?
cut -d, -f1 file | uniq | xargs -I{} grep -m 1 "{}" file
On a side note, is there one that can be used on both Windows and Unix? This is not important, just checking.
C:\Users\Chola>sort -t "#" -k2,2 email-list.txt
Input text file:-
436485
422636
429228
427041
433414
425810
422636
431526
428808

If your file consists only of numbers, one per line:
sort -n FILENAME | uniq
or
sort -u -n FILENAME
(You can add -u to the sort command instead of piping through uniq in all of the following.)
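For instance, assuming the sample numbers shown in the question are in email-list.txt, this collapses the duplicated 422636 and orders the values numerically:
sort -n email-list.txt | uniq
422636
425810
427041
428808
429228
431526
433414
436485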
If you want to extract just one column of a file, and then sort that column numerically removing duplicates, you could do this:
cut -f7 FILENAME | sort -n | uniq
Cut assumes that there is a single tab between columns. If your file is CSV, you might be able to do this:
cut -f7 -d, FILENAME | sort -n | uniq
but that won't work if there is a , inside a text field in the file (where CSV protects it with double quotes).
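If you do have quoted fields containing commas, one possible workaround (a rough sketch assuming GNU awk, whose FPAT variable defines fields by their content rather than by a delimiter) is:
gawk -v FPAT='([^,]+)|("[^"]+")' '{ print $7 }' FILENAME | sort -n | uniq
Note that a quoted field is printed with its quotes intact, so treat this only as a starting point.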
If you want to sort by the column but remove only completely duplicate lines, then you can do this:
sort -k7,7n FILENAME | uniq
sort assumes that columns are separated by whitespace. Again, if your columns are separated by a comma, you can use:
sort -k7,7n -t, FILENAME | uniq

Related

find similar rows in a text file in unix system

I have a file named tt.txt and the contents of this file are as follows:
fdgs
jhds
fdgs
I am trying to get the similar row as the output in a text file.
my expected output is:
fdgs
fdgs
to do so, I used this command:
uniq -u tt.txt > output.txt
but it returns:
fdgs
jhds
fdgs
do you know how to fix it?
If by "similar row" you mean rows with the same content:
According to the uniq manpage, uniq only filters adjacent matching lines from the input. So you need to sort the input first and use the -D option to print all duplicated lines, as below. Note that -D is limited to the GNU implementation, and sorting means the output comes out in a different order than the input.
sort tt.txt | uniq -D
If you want the output to keep the original order, you need to remember each input line number and sort by that number again at the end, like this:
cat -n tt.txt | sort -k 2 | uniq -f 1 -D | sort -k 1,1 | sed -E 's/^[[:space:]]*[0-9]+[[:space:]]+//'
cat -n would prepend the line number to each line
sort -k 2 would sort the input starting at the 2nd column
uniq -f 1 would ignore the first column when comparing
sort -k 1,1 would sort the output back by the original line number
sed -E 's/^[[:space:]]*[0-9]+[[:space:]]+//' would delete the leading line-number column again
The uniq -u command would output only the lines that occur exactly once, which is exactly the opposite of what you want.
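For a quick comparison of the flags on the sorted sample (this assumes GNU uniq, which is needed for -D):
sort tt.txt | uniq -u    # jhds (lines that occur exactly once)
sort tt.txt | uniq -d    # fdgs (one copy of each duplicated line)
sort tt.txt | uniq -D    # fdgs, fdgs (every copy of each duplicated line)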
One in awk (on the second occurrence of a line both patterns fire, so it is printed twice, covering the first copy that was skipped; on later occurrences only the second pattern fires, so each duplicated line is printed as many times as it appears):
$ awk '++seen[$0]==2;seen[$0]>1' file
fdgs
fdgs

Can someone explain the following unix command to me?

I want to validate the file. As part of the validation, I need to check the length of each column, whether it is null or not null, and the primary constant of that file.
cat File_name | awk -F '|' '{print NF}' | sort | uniq
This command splits each line of the file into fields using the pipe | as the delimiter, prints the number of fields on each row (the NF variable), sorts the output (sort command) and at the end keeps only the unique numbers (uniq command).
The pipeline can be optimised by getting rid of the cat command (awk can read the file itself) and by using sort's -u parameter to get unique records:
awk -F '|' '{print NF}' file_name | sort -u
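As a small illustration of what the output means, take a hypothetical three-line pipe-delimited input; every distinct number in the result is a distinct field count, so more than one line of output indicates inconsistently shaped rows:
printf 'a|b|c\nd|e\nf|g|h\n' | awk -F '|' '{print NF}' | sort -u
2
3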

sorting ls -l owners in Unix

I want to sort the owners in alphabetical order from a call to ls -l and cannot figure out a way to do it. I know something like ls -l | sort would sort the file names, but how do I sort the owners in order?
The owner is the third field, so use -k 3:
ls -l | sort -k 3
You can extend this idea to sorting based on other fields, and you can have multiple -k options. For instance, maybe you want to sort by owner, and then size in descending order:
ls -l | sort -k 3,3 -k 5rn
I am not sure if you want only the owners or the whole information sorted by owner. In the former case superfo's solution is almost correct.
Additionally, you need to squeeze the repeated spaces out of ls's output with tr, because cut treats every single space as a delimiter and would otherwise not pick the right field in all directories.*
So in the end you get this:
ls -l | tr -s ' ' | cut -d ' ' -f 3 | sort | uniq
*Some directories have a two-digit value in the second field (the hard-link count); the lines with a single digit there get an additional padding space to preserve the layout, which shifts the field that cut sees.
How about ...
ls -l | cut -d ' ' -f 3 | sort | uniq
Try this:
ls -l | awk '{print $3, $4, $8}' | sort
It will print the user name, the group name and the file name (this assumes file names without spaces).
ls -l | awk '{print $3, $4, $0}' | sort
This will print the user name, group name and the full ls -l output, sorted by the user name first, then the group name, then by whatever ls -l prints first.

Unix uniq utility: What is wrong with this code?

What I want to accomplish: print duplicated lines
This is what uniq man says:
SYNOPSIS
uniq [OPTION]... [INPUT [OUTPUT]]
DESCRIPTION
Discard all but one of successive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output).
...
-d, --repeated
only print duplicate lines
This is what I try to execute:
root@laptop:/var/www# cat file.tmp
Foo
Bar
Foo
Baz
Qux
root@laptop:/var/www# cat file.tmp | uniq --repeated
root@laptop:/var/www#
So I was expecting Foo in this example, but it returns nothing.
What is wrong with this snippet?
uniq only checks consecutive lines against each other. So you can only expect to see something printed if there are two or more Foo lines in a row, for example.
If you want to get around that, sort the file first with sort.
$ sort file.tmp | uniq -d
Foo
If you really need to have all the non-consecutive duplicate lines printed in the order they occur in the file, you can use awk for that:
$ awk '{ if ($0 in lines) print $0; lines[$0]=1; }' file.tmp
but for a large file, that may be less efficient than sort and uniq. (May be - I haven't tried.)
cat file.tmp | sort | uniq --repeated
or
sort file.tmp | uniq --repeated
cat file.tmp | sort | uniq --repeated
the lines need to be sorted first
uniq operates on adjacent lines, so what you want is
cat file.tmp | sort | uniq --repeated
On OS X, I would actually use
sort file.tmp | uniq -d
I've never tried this myself, but I think the word "successive" is the key.
This would probably work if you sorted the input before running uniq over it.
Something like
sort file.tmp | uniq -d

How to keep a file's format if you use the uniq command (in shell)?

In order to use the uniq command, you have to sort your file first.
But in the file I have, the order of the information is important, thus how can I keep the original format of the file but still get rid of duplicate content?
Another awk version:
awk '!_[$0]++' infile
This awk keeps the first occurrence of each line: the counter _[$0] starts at zero, so !_[$0]++ is true (and the line is printed, the default action) only the first time a given line is seen. Same algorithm as other answers use:
awk '!($0 in lines) { print $0; lines[$0]; }'
Here's one that only needs to store duplicated lines (as opposed to all lines) using awk:
sort file | uniq -d | awk '
    # first pass ("-", the piped uniq -d output): remember every duplicated line
    FNR == NR { dups[$0]; next }
    # second pass (file): print non-duplicates, and only the first copy of each duplicate
    !($0 in dups) || !lines[$0]++
' - file
There's also the "line-number, double-sort" method: number the lines, sort on the content (field 2 onward) keeping a single copy of each, re-sort by the original line number, then strip the numbers off again. It reads from standard input:
nl -n ln | sort -u -k 2 | sort -k 1n | cut -f 2-
You can run uniq -d on the sorted version of the file to find the duplicate lines, then run some script that says:
if this_line is in duplicate_lines {
    if not i_have_seen[this_line] {
        output this_line
        i_have_seen[this_line] = true
    }
} else {
    output this_line
}
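A rough awk rendering of that pseudo-code (a sketch assuming the input file is called file; it is essentially the same two-pass approach as the awk command shown earlier in this thread):
sort file | uniq -d | awk '
    NR == FNR { dup[$0]; next }      # pass 1: the duplicated lines coming from uniq -d
    !($0 in dup) || !seen[$0]++      # pass 2: print non-dups, and only the first copy of each dup
' - file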
Using only uniq and grep:
Create d.sh:
#!/bin/sh
sort "$1" | uniq > "$1"_uniq
for line in $(cat "$1"); do
    grep -m1 "$line" "$1"_uniq >> "$1"_out
    grep -v "$line" "$1"_uniq > "$1"_uniq2
    mv "$1"_uniq2 "$1"_uniq
done
rm "$1"_uniq
Example:
./d.sh infile
You could use some horrible O(n^2) thing, like this (Pseudo-code):
file2 = EMPTY_FILE
for each line in file1:
    if not line in file2:
        file2.append(line)
This is potentially rather slow, especially if implemented at the Bash level. But if your files are reasonably short, it will probably work just fine, and would be quick to implement (not line in file2 is then just grep -v, and so on).
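For what it's worth, a minimal shell sketch of that loop (assuming the input is in file1 and that file2 may be overwritten; grep -qxF does the exact, fixed-string "is this line already in file2" test):
: > file2
while IFS= read -r line; do
    grep -qxF -- "$line" file2 || printf '%s\n' "$line" >> file2
done < file1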
Otherwise you could of course code up a dedicated program, using some more advanced data structure in memory to speed it up.
for line in $(sort file1 | uniq); do
    grep -n -m1 "$line" file1 >> out
done
sort -n out
first do the sort,
for each unique value grep for the first match (-m1)
and preserve the line numbers (-n)
sort the output numerically (-n) by line number.
you could then remove the line numbers with sed or awk, as in the sketch below
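For example (grep -n prefixes every match with NUMBER:, so stripping the leading digits and the first colon restores the original lines):
sort -n out | sed 's/^[0-9]*://'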

Resources