Unix command for the file below - unix

I have a pipe-delimited file like the one below:
05032020
Col1|col2|col3|col4|col5
Infosys
Tcs
Wipro
Accenture
Deloitte
I want the record count, skipping the date line and the header line.
Expected output: record count 5, along with line numbers.
cat FF_Json_to_CSV_MAY03.txt
05032020
requestId|accountBranch|accountNumber|guaranteeGuarantor|accountPriority|accountRelationType|accountType|updatedDate|updatedBy
0000000001|5BW|52206|GG1|02|999|CHECKING|20200503|BTCHLCE
0000000001|55F|80992|GG2|02|1999|IRA|20200503|0QLC
0000000001|55F|24977|CG|01|3999|CERTIFICAT|20200503|SRIKANTH
0000000002|5HJ|03349|PG|01|777|SAVINGS|20200503|BTCHLCE
0000000002|5M8|999158|GG3|01|900|CORPORATE|20200503|BTCHLCE
0000000002|5LL|49345|PG|01|999|CORPORATE|20200503|BTCHLCE
0000000002|5HY|15786|PG|01|999|CORPORATE|20200503|BTCHLCE
0000000003|55F|34956|CG|01|999|CORPORATE|20200503|SRIKANTH
0000000003|5BY|14399|GG10|03|10|MONEY MARK|20200503|BTCHLCE
0000000003|5PE|32100|PG|04|999|JOINT|20200503|BTCHLCE
0000000003|5LB|07888|GG25|02|999|BROKERAGE|20200503|BTCHLCE
0000000004|55F|36334|CG|02|999|JOINT|20200503|BTCHLCE
0000000005|55F|06739|GG9|02|999|SAVINGS|20200503|BTCHLCE
0000000005|5CP|39676|PG|01|999|SAVINGS|20200503|BTCHLCE
0000000006|55V|62452|CG|01|10|CORPORATE|20200503|SRIKANTH
0000000007|55V|H9889|CG|01|999|SAVINGS|20200503|BTCHLCE
0000000007|5L2|03595|PG|02|999|CORPORATE|20200503|BTCHLCE
0000000007|55V|C1909|GG8|01|10|JOINT|20200503|BTCHLCE
I need line numbers starting from the first data record (0000000001).

There are two ways to solve your issue:
Count only the records you want to count.
Count all records and subtract the ones you don't want to count.
From your example alone it's not possible to tell which fits, but let me give you some ideas:
Imagine that your file starts with 3 header lines, then you can do something like:
wc -l inputfile | awk '{print $1-3}'
Imagine that the lines you want to count all start with a number followed by a dot; then you can do something like:
grep "^[0-9][0-9]*\." inputfile | wc -l

Related

How can I use unix count command

I have a text file with 2 fields separated by a colon:
i3583063:b3587412
i3583064:b3587412
i3583065:b3587412
i3583076:b3587421
i3583077:b3587421
i3583787:b3587954
i3584458:b3588416
i3584459:b3588416
i3584460:b3588416
i3584461:b3588416
i3584462:b3588416
i3584463:b3588416
i3584464:b3588416
i3584465:b3588416
Field 1 is always unique, but field 2 can be repeated. How can I identify the first, 2nd, 3rd, etc. occurrence of each field 2 value? Can I use count?
Thanks
I don't know if I've ever heard of a standard Unix count utility, but you can do this with Awk. Here's an Awk script that adds the count as a third column:
awk -F: 'BEGIN {OFS=":"} {$3=++count[$2]; print}' input.txt
It should generate the output:
i3583063:b3587412:1
i3583064:b3587412:2
i3583065:b3587412:3
i3583076:b3587421:1
i3583077:b3587421:2
i3583787:b3587954:1
i3584458:b3588416:1
i3584459:b3588416:2
i3584460:b3588416:3
i3584461:b3588416:4
i3584462:b3588416:5
i3584463:b3588416:6
i3584464:b3588416:7
i3584465:b3588416:8
The heart of the script {$3=++count[$2]; print} simply increments a counter indexed by the value of the second field, stores it in a new third field, and then outputs the line with this new field. Awk is a great little language and still well worth learning.
You can use the sort command with the -u flag; it removes duplicate values. Since field 1 is always unique here, extract field 2 first:
cut -d: -f2 filename.txt | sort -u
If you want to count the distinct values:
cut -d: -f2 filename.txt | sort -u | wc -l
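If instead you want to see how many times each field-2 value occurs, extract the field first and let uniq -c do the counting (a sketch using the colon delimiter from the question):
cut -d: -f2 filename.txt | sort | uniq -c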

parsing tab separated header of a file in unix

I'm trying to work out a generic script for getting the tab-separated column values (the header line of the file). Splitting with awk is fine for getting the column names, but I can't see how to get each tab-separated value up to the last field, NF (when using awk). The number of columns in the file isn't fixed; sometimes it might come with 20 columns, sometimes 100, etc.
For ex: the tab separated columns in the file are-
abc ttr nnc r32 inc ...
If I write a simple awk as:
head -1 file | awk 'BEGIN {FS="\t"} {print $1, $2, etc}'
It prints each tab-separated column as $1, $2, and so on. I tried an incremental version, replacing $1, $2, etc. with $i, but it wouldn't work.
Any ideas on this?
If I understand correctly, you are asking how to loop over the fields from 1 to NF. Here is an example of such a loop:
$ head -1 file | awk -F"\t" '{for (i=1;i<=NF;i++)printf "%s ",$i; print"";}'
abc ttr nnc r32 inc
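If you also want each header name tagged with its column index (handy when the column count varies), a small variation of the same loop:
head -1 file | awk -F"\t" '{for (i=1; i<=NF; i++) printf "col %d: %s\n", i, $i}'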

Search a log file for total count of unique ips

I'm using the following command to count how many times each unique IP hits my website.
zcat *file* | awk '{print $1}' | sort | uniq -c | sort -n
This gives me a list of IPs and their occurrence counts:
1001 109.165.113.xxx
1001 178.137.88.xxx
1001 178.175.13.xxx
1001 81.4.217.xxx
1060 74.122.180.xxx
1103 67.201.52.xxx
1203 81.144.138.xxx
1670 54.240.158.xxx
1697 54.239.137.xxx
2789 39.183.147.xxx
4630 93.158.143.xxx
What I want to find out is simple: can this be done in a single command line?
I just want the count of this list, so from the above example I want the command to tell me 11. I thought I could use a second awk command to count the unique occurrences in the second field of the output, but I guess you cannot use awk twice in a single command line.
Obviously I can write the above output to a log file and then run a second awk command to count the unique occurrences of the 2nd field (the IPs), but I was hoping to get this done in a single command.
You might want:
zcat ... |
awk '{cnt[$1]++} END{for (ip in cnt) {unq++; print cnt[ip], ip}; print unq+0}'
If you have GNU awk you can add BEGIN{PROCINFO["sorted_in"]="#ind_num_asc"} at the front to get the loop output sorted, see http://www.gnu.org/software/gawk/manual/gawk.html#Controlling-Scanning.
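Spelled out, that variant would look like this (a sketch; requires GNU awk for PROCINFO):
zcat *file* | awk 'BEGIN{PROCINFO["sorted_in"]="#ind_num_asc"} {cnt[$1]++} END{for (ip in cnt) {unq++; print cnt[ip], ip}; print unq+0}'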
Here is the awk code to get the total count of unique IPs (note that length() on an array is a GNU awk extension):
zcat *file* | awk '{a[$1]} END {print length(a)}'
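And if you don't need the per-IP breakdown at all, a plain pipeline in the style of the original command gives just the number:
zcat *file* | awk '{print $1}' | sort -u | wc -l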

How to get the count of duplicate strings in a set using grep, uniq and awk in unix?

I have a very large set of strings, one per line in a file. Many strings occur more than once, at different locations in the file.
I want a frequency count of the strings using Unix commands like awk, grep, uniq and so on. I tried a few combinations but couldn't get it to work.
What is the exact command to get the frequency count?
To count the occurrences of lines in a file, the simplest thing to do is:
$ sort file | uniq -c
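To see the most frequent strings first, add a numeric reverse sort:
sort file | uniq -c | sort -rn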

Difference between linenumbers of cat file | nl and wc -l file

I have a file with e.g. 9818 lines. When I use wc -l file, I see 9818 lines. When I open the file in vi, I see 9818 lines. When I :set number, I see 9818 lines. But when I run cat file | nl, the final line number is 9750 (e.g.). Basically, I'm asking why the line numbers from cat file | nl and wc -l file do not match.
wc -l: counts all lines
nl: numbers only nonempty lines by default
Try:
nl -ba: numbers all lines, blank ones included
nl(1) says that by default header and footer lines are not numbered (-hn -fn), and those sections are delimited by repetitions of \: on lines of their own. Perhaps your input file includes some of these?
I suggest reading the output of nl line by line against cat -n output and see where things diverge. Or use diff -u if you want to take the fun out of reading 9818 lines. :)
nl does not number blank lines, so this is almost certainly the reason. If you can point us to the file, we can confirm that, but I suspect this is the case.
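A quick way to see the behavior for yourself on a three-line sample with one blank line:
printf 'a\n\nb\n' | wc -l      # prints 3: every line counts
printf 'a\n\nb\n' | nl         # numbers only 'a' and 'b'
printf 'a\n\nb\n' | nl -ba     # numbers all three lines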
