Can someone explain the following Unix command?

I want to validate a file. As part of the validation, I need to check the length of each column, whether values are null or not null, and the primary key constraint of that file.
cat File_name| awk -F '|' '{print NF}' | sort | uniq

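This command splits each line of the file into fields, using the pipe character | as the delimiter, prints the number of fields on each line (the NF variable), sorts the output (sort) and finally keeps only the unique counts (uniq). In other words, it lists the distinct column counts found in the file: if every line has the same number of columns, it prints a single number.
The pipeline can be simplified by dropping cat (awk can read the file directly) and by using sort's -u option instead of a separate uniq: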
awk -F '|' '{print NF}' file_name | sort -u
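Since the goal is validation, a natural next step is to report which lines have an unexpected number of columns, rather than only the set of distinct counts. The sketch below assumes that the first line has the correct column count; check_field_counts is a hypothetical helper name, and the delimiter defaults to | as in the question.

```shell
# check_field_counts FILE [DELIM]
# Hypothetical helper (not a standard tool). Treats the first line's field
# count as the expected one, prints every line whose count differs, and
# returns 1 if any such line was found, 0 otherwise.
check_field_counts() {
  awk -F "${2:-|}" '
    NR == 1 { expected = NF; next }
    NF != expected {
      printf "line %d: %d fields (expected %d)\n", NR, NF, expected
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

For example, check_field_counts File_name prints nothing and succeeds when every line has as many columns as the first one.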

Related

Combining two awk commands in single command

I want to combine these two commands and invoke them as a single command.
In the first command, I store the 4th column of x.CSV (separator: comma) in z.csv:
awk -F, '{print $4}' x.CSV > z.csv
In the second command, I want to find the unique values of the first column of z.csv (separator: space):
awk -F\ '{print $1}' z.csv|sort|uniq
How can I combine these two commands into a single command?
Pipe the output of the first awk to the second awk:
awk -F, '{print $4}' x.CSV | awk -F\ '{print $1}' |sort|uniq
or, as Avinash Raj suggested,
awk -F, '{print $4}' x.CSV | awk -F\ '{print $1}' | sort -u
Assuming that the content of z.csv is actually wanted, rather than just an artefact of the way you're currently implementing your program, then you can use:
awk -F, '{ print $4 > "z.csv"
split($4, f, " ")
f4[f[1]] = 1
}
END { for (i in f4) print i }' x.CSV
The split function breaks field 4 on spaces, and (associative) array f4 records the key value. The loop at the end prints out the distinct values, unsorted. If you need them sorted, you can either use GNU awk's built-in sort functions or (if you don't have an awk with built-in sort functions) write your own in awk, or pipe the output to sort.
With GNU awk, you can replace the END block with:
END { n = asorti(f4); for (i = 1; i <= n; i++) print f4[i] }
If you don't want the z.csv file, then (a) you could have used a pipe in the first place, and (b) you can simply remove the print $4 > "z.csv" line.
awk '{split($4,b," "); a[b[1]]=1} END { for( i in a) print i }' FS=, x.CSV
This does not sort the data, but it's not clear if you actually want it sorted or merely needed that to get unique entries. If you do want it sorted, pipe it to sort.
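If you would rather get the distinct values in the order in which they first appear (instead of the unspecified order of for (i in f4)), you can print each key the first time it is seen instead of in an END block. A minimal sketch, assuming the same comma-separated x.CSV layout; the function name first_word_col4_unique is hypothetical, and unlike the versions above it skips lines whose 4th field is empty.

```shell
# first_word_col4_unique FILE
# Hypothetical helper: prints the first space-separated word of column 4
# (columns separated by commas), once per distinct value, in first-seen order.
first_word_col4_unique() {
  awk -F, '
    {
      if (split($4, f, " ") == 0) next          # empty or blank 4th field: skip
      if (!(f[1] in seen)) { seen[f[1]] = 1; print f[1] }
    }
  ' "$1"
}
```

Pipe the result through sort if you want it in alphabetical order instead.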

using awk to get column values and then running another command on values and printing them

I've always used Stack Overflow to get help with issues, but this is my first post. I am new to UNIX scripting and I was given a task to get the values of column two and then run a command on each of them. The command I am supposed to run is 'echo -n "$2" | openssl dgst -sha1;', which hashes a value. My problem is not hashing one value, but hashing them all and then printing the results. Can someone help me figure this out? This is how I am starting, but I think I am going down the wrong path.
NOTE: this is a CSV text file and I know I need to use the awk command for this.
awk 'BEGIN { FS = "," } ; { print $2 }'
while [ "$2" != 0 ];
do
echo -n "$2" | openssl dgst -sha1
done
This prints the second column in its entirety and also prints some kind of hash value.
Sorry for the long first post, just trying to be as specific as possible. Thanks!
You don't really need awk just to extract the second column. You can do it with the bash read builtin by setting IFS to the delimiter:
while IFS=, read -ra line; do
  # printf '%s' (unlike echo) adds no trailing newline, matching echo -n in the question
  [[ ${line[1]} != 0 ]] && printf '%s' "${line[1]}" | openssl dgst -sha1
done < inputFile
You should probably post some sample input data and the error you are getting so that someone can debug your existing code better.
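This will do the trick:
$ awk '{print $2}' file | while IFS= read -r v; do printf '%s' "$v" | openssl dgst -sha1; done
Use awk to print the second field of each line, then feed each value separately to openssl on standard input; printf '%s' avoids the trailing newline that plain echo would add to the hashed data. Note that xargs -n1 openssl dgst -sha1 does not work here: xargs passes each value as a command-line argument, and openssl dgst treats its arguments as file names, so it would try to hash files named after the values rather than the values themselves.
If by CSV you mean that the fields are separated by commas, then you need to add -F, to awk:
$ awk -F, '{print $2}' file | while IFS= read -r v; do printf '%s' "$v" | openssl dgst -sha1; done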
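Another option keeps everything inside awk: awk can write to a command with printf ... | cmd, and close(cmd) waits for that command to finish, so the next value starts a fresh process. Each value reaches the command on standard input, without a trailing newline and without being re-parsed by the shell. A sketch, assuming comma-separated input; hash_column2 and its optional second argument (the command to run, defaulting to openssl dgst -sha1) are hypothetical names introduced only for illustration.

```shell
# hash_column2 FILE [CMD]
# Hypothetical helper: runs CMD (default: openssl dgst -sha1) once per line,
# feeding it the 2nd comma-separated field on stdin with no trailing newline.
hash_column2() {
  awk -F, -v cmd="${2:-openssl dgst -sha1}" '
    {
      printf "%s", $2 | cmd   # start cmd if needed and send it the value
      close(cmd)              # close its input and wait, so each value is hashed on its own
    }
  ' "$1"
}
```

Passing a different command, for example hash_column2 file 'wc -c', is a cheap way to check that each value is sent separately and without a newline.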

sort and uniq oneliner

Is there a one-liner for sort and uniq, given a file name, in Unix?
I googled and found the following, but it doesn't sort, and I'm also not sure what it does. Are there better ways using awk or any other Unix tool?
cut -d, -f1 file | uniq | xargs -I{} grep -m 1 "{}" file
On a side note, is there one that works on both Windows and Unix? This is not important, I'm just checking.
C:\Users\Chola>sort -t "#" -k2,2 email-list.txt
Input text file:
436485
422636
429228
427041
433414
425810
422636
431526
428808
If your file consists only of numbers, one per line:
sort -n FILENAME | uniq
or
sort -u -n FILENAME
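Applied to the sample input from the question, where 422636 appears twice, both commands print the eight distinct values in ascending order. A small sketch; the file name /tmp/numbers.txt is just an example.

```shell
# Recreate the sample input from the question (one number per line).
printf '%s\n' 436485 422636 429228 427041 433414 425810 422636 431526 428808 \
  > /tmp/numbers.txt

sort -n /tmp/numbers.txt | uniq
sort -u -n /tmp/numbers.txt
# Both print:
# 422636
# 425810
# 427041
# 428808
# 429228
# 431526
# 433414
# 436485
```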
(You can add -u to the sort command instead of piping through uniq in all of the following.)
If you want to extract just one column of a file, and then sort that column numerically removing duplicates, you could do this:
cut -f7 FILENAME | sort -n | uniq
cut assumes by default that columns are separated by a single tab. If your file is CSV, you might be able to do this:
cut -f7 -d, FILENAME | sort -n | uniq
but that won't work if a text field contains a comma (CSV protects such fields by enclosing them in double quotes).
If you want to sort by the column but remove only completely duplicate lines, then you can do this:
sort -k7,7n FILENAME | uniq
By default, sort assumes that columns are separated by blanks (spaces or tabs). Again, if your columns are separated by commas, you can use:
sort -k7,7n -t, FILENAME | uniq
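The difference between uniq and sort -u matters once a key is involved: sort ... | uniq removes only lines that are identical in full, whereas sort -u with -k keeps a single line per distinct key value. A small demonstration with a made-up two-column file, using column 2 instead of column 7 to keep it short:

```shell
# Made-up demo file: column 2 holds a numeric key.
printf 'a,2\nb,1\nc,2\nb,1\n' > /tmp/keys_demo.csv

# Removes only lines that are identical in full:
sort -t, -k2,2n /tmp/keys_demo.csv | uniq
# b,1
# a,2
# c,2

# Keeps one line per distinct key (here: one with key 1, one with key 2).
# Which of the tied lines survives is up to sort, so don't rely on it.
sort -t, -k2,2n -u /tmp/keys_demo.csv
```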

sorting ls-l owners in Unix

I want to sort the output of ls -l by owner in alphabetical order and cannot figure out a way to do it. I know something like ls -l | sort sorts the lines, but how do I sort them by owner?
The owner is the third field, so use -k 3:
ls -l | sort -k 3
You can extend this idea to sorting based on other fields, and you can have multiple -k options. For instance, maybe you want to sort by owner, and then size in descending order:
ls -l | sort -k 3,3 -k 5rn
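To see how the two keys interact without depending on the files in your directory, here is the same sort applied to a few made-up ls -l style lines (the owners, sizes and names are invented for the example):

```shell
# Made-up ls -l style lines: owner is field 3, size is field 5, name is field 9.
printf '%s\n' \
  '-rw-r--r-- 1 bob staff 100 Jan 1 10:00 a' \
  '-rw-r--r-- 1 alice staff 20 Jan 1 10:00 b' \
  '-rw-r--r-- 1 bob staff 3000 Jan 1 10:00 c' \
  '-rw-r--r-- 1 alice staff 500 Jan 1 10:00 d' \
  > /tmp/ls_demo.txt

# Owner ascending, then size descending within each owner:
sort -k 3,3 -k 5rn /tmp/ls_demo.txt
# -rw-r--r-- 1 alice staff 500 Jan 1 10:00 d
# -rw-r--r-- 1 alice staff 20 Jan 1 10:00 b
# -rw-r--r-- 1 bob staff 3000 Jan 1 10:00 c
# -rw-r--r-- 1 bob staff 100 Jan 1 10:00 a
```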
I am not sure if you want only the owners or the whole information sorted by owner. In the former case superfo's solution is almost correct.
Additionally, you need to squeeze repeated spaces in ls's output with tr -s; otherwise cut, which treats every single space as a delimiter, won't pick the right field in all directories.*
So in the end you get this:
ls -l | tr -s ' ' | cut -d ' ' -f 3 | sort | uniq
*In some directories the second field (the link count) has two digits on some lines, so ls pads the lines with a single-digit count with an extra space to keep the columns aligned.
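If you only need the owners, awk can do the splitting itself: its default field separator treats any run of blanks as a single delimiter, so the tr -s step is not needed. A sketch, assuming the usual ls -l layout with the owner in the 3rd column; list_owners is a hypothetical name, and the NF > 2 test skips the "total N" line that ls -l prints for a directory.

```shell
# list_owners [DIR]
# Hypothetical helper: prints each distinct owner in DIR (default: .) once,
# in sorted order.
list_owners() {
  # awk splits on runs of blanks; NF > 2 skips the "total N" summary line.
  ls -l "${1:-.}" | awk 'NF > 2 { print $3 }' | sort -u
}
```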
How about ...
ls -l | cut -d ' ' -f 3 | sort | uniq
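Try this:
ls -l | awk '{print $3, $4, $9}' | sort
It will print the user name, the group name and the file name, sorted by user name. This assumes the usual ls -l layout, in which the date takes three fields and the file name is the 9th field; file names containing spaces will be cut off at the first space.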
ls -l | awk '{print $3, $4, $0}' | sort
This will print the user name, the group name and the full ls -l line, sorted by user name first, then by group name, then by the rest of the original ls -l line.

How to keep a file's format if you use the uniq command (in shell)?

In order to use the uniq command, you have to sort your file first, because uniq only removes adjacent duplicate lines.
But in my file the order of the lines is important. How can I keep the original order of the file but still get rid of duplicate lines?
Another awk version:
awk '!_[$0]++' infile
This awk keeps the first occurrence, using the same algorithm as the other answers:
awk '!($0 in lines) { print $0; lines[$0]; }'
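The '!_[$0]++' idiom is compact but a little cryptic. Spelled out, it relies on an unset array element evaluating to 0 (false) the first time a line is seen; the post-increment then makes it non-zero for every later copy. A sketch under a hypothetical function name, dedupe_keep_first:

```shell
# dedupe_keep_first [FILE...]
# Hypothetical helper: same logic as awk '!_[$0]++', written out in full.
dedupe_keep_first() {
  awk '
    {
      # count[$0] is 0 the first time this line appears, so the test is
      # true exactly once per distinct line; later copies are skipped.
      if (count[$0]++ == 0)
        print
    }
  ' "$@"
}
```

With no file arguments it reads standard input, so printf 'b\na\nb\n' | dedupe_keep_first prints b and a.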
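Here's one that only needs to store duplicated lines (as opposed to all lines) using awk:
sort file | uniq -d | awk '
!second { dups[$0]; next }        # first input (stdin): the duplicated lines
!($0 in dups) || !lines[$0]++     # second input: unique lines, plus the first copy of each duplicate
' - second=1 file
The second=1 operand is a variable assignment that awk performs just before it starts reading file, so it marks where the second input begins. Unlike an FNR == NR test, this still works when uniq -d finds no duplicates and the first input is empty.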
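There's also the "line-number, double-sort" method: number every line, sort on the content keeping one copy of each line (-u), restore the original order by sorting on the line number, then strip the numbers:
nl -b a -n rn file | sort -u -k 2 | sort -k 1n | cut -f 2-
(-b a numbers blank lines too, and -n rn right-justifies the numbers so that the text after them is compared identically on every line; with -n ln the padding before the tab varies with the number of digits, and duplicates can slip through.)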
You can run uniq -d on the sorted version of the file to find the duplicate lines, then run some script that says:
if this_line is in duplicate_lines {
if not i_have_seen[this_line] {
output this_line
i_have_seen[this_line] = true
}
} else {
output this_line
}
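Using only sort, uniq and grep:
Create d.sh:
#!/bin/sh
# Keep the first occurrence of each line of "$1", preserving the order.
sort "$1" | uniq > "$1_uniq"          # one copy of every distinct line
: > "$1_out"                          # start with an empty output file
while IFS= read -r line; do
  # -x: whole-line match, -F: literal string, not a regex
  if grep -qxF -- "$line" "$1_uniq"; then        # not output yet?
    printf '%s\n' "$line" >> "$1_out"
    grep -vxF -- "$line" "$1_uniq" > "$1_uniq2"  # mark it as done
    mv "$1_uniq2" "$1_uniq"
  fi
done < "$1"
rm "$1_uniq"
Example:
./d.sh infile
The result is written to infile_out.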
You could use some horrible O(n^2) thing, like this (Pseudo-code):
file2 = EMPTY_FILE
for each line in file1:
if not line in file2:
file2.append(line)
This is potentially rather slow, especially if implemented at the Bash level. But if your files are reasonably short, it will probably work just fine and would be quick to implement (the "not line in file2" test is then just a negated grep -qxF, and so on).
Otherwise you could of course code up a dedicated program, using some more advanced data structure in memory to speed it up.
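A direct translation of the pseudocode above into shell, as a sketch rather than a tuned implementation: dedupe_quadratic is a hypothetical name, and each line triggers one grep over the output built so far, which is where the O(n^2) cost comes from.

```shell
# dedupe_quadratic IN OUT
# Hypothetical helper: appends each line of IN to OUT unless OUT already
# contains that exact line.
dedupe_quadratic() {
  : > "$2"                                         # file2 = EMPTY_FILE
  while IFS= read -r line || [ -n "$line" ]; do    # for each line in file1
    # -q quiet, -x whole-line match, -F literal string (no regex)
    if ! grep -qxF -- "$line" "$2"; then           # if not line in file2
      printf '%s\n' "$line" >> "$2"                # file2.append(line)
    fi
  done < "$1"
}
```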
sort file1 | uniq | while IFS= read -r line; do
  grep -n -m1 -xF -- "$line" file1   # first whole-line match, prefixed with "N:"
done > out
sort -n out
First sort and deduplicate the lines; then, for each unique value, grep for its first match (-m1) and keep its line number (grep -n); finally, sort the output numerically (sort -n) by line number. You could then remove the "N:" line-number prefixes with sed or awk.
