BASH SHELL print columns with specific order - unix

I have this file:
933|Mahinda|Perera|male|1989-12-03|2010-03-17T13:32:10.447+0000|192.248.2.123|Firefox
1129|Carmen|Lepland|female|1984-02-18|2010-02-28T04:39:58.781+0000|81.25.252.111|Internet Explorer
4194|Hồ Chí|Do|male|1988-10-14|2010-03-17T22:46:17.657+0000|103.10.89.118|Internet Explorer
8333|Chen|Wang|female|1980-02-02|2010-03-15T10:21:43.365+0000|1.4.16.148|Internet Explorer
8698|Chen|Liu|female|1982-05-29|2010-02-21T08:44:41.479+0000|14.103.81.196|Firefox
8853|Albin|Monteno|male|1986-04-09|2010-03-19T21:52:36.860+0000|178.209.14.40|Internet Explorer
10027|Ning|Chen|female|1982-12-08|2010-02-22T17:59:59.221+0000|1.2.9.86|Firefox
and with this invocation:
./tool.sh --browsers -f <file>
I want to count the browsers and print them in a specific order, for example:
Chrome 143
Firefox 251
Internet Explorer 67
I use this command:
if [ "$1" == "--browsers" -a "$2" == "-f" -a "$4" == "" ]
then
awk -F'|' '{print $8}' $3 | sort | uniq -c | awk ' {print $2 , $3 , $1} '
fi
but it only works for browser names of up to two words. How can I make it work for longer names, for example a browser name with four words or more?

Seems like an awk one-liner to count your browsers:
$ awk -F'|' '{a[$8]++} END{for(i in a){printf("%s %d\n",i,a[i])}}' inputfile
Firefox 3
Internet Explorer 4
This increments elements of an array, then at the end of the file steps through the array and prints the totals. If you want the output sorted, you can just pipe it through sort. I don't see a problem with multiple words in a browser name.
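For example, piping through sort gives the counts in browser-name order (a minimal sketch, run against the sample data above):
$ awk -F'|' '{a[$8]++} END{for(i in a){printf("%s %d\n",i,a[i])}}' inputfile | sort
Firefox 3
Internet Explorer 4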

try this:
awk -F"|" '{print $8}' in | sort | uniq -c | awk '{print $2,$1}'
where in is the input file.
output
[myShell] ➤ awk -F"|" '{print $8}' in | sort | uniq -c | awk '{print $2,$1}'
Firefox 3
Internet 4
Also, for parsing arguments it is better to use getopts, i.e.:
#!/bin/bash

function usage {
    echo "usage: ..."
}

while getopts b:o:h opt; do
    case $opt in
        b)
            fileName=$OPTARG
            echo "filename[$fileName]"
            awk -F"|" '{print $8}' "$fileName" | sort | uniq -c | awk '{print $2,$1}'
            ;;
        o)
            otherargs=$OPTARG
            echo "otherargs[$otherargs]"
            ;;
        h)
            usage && exit 0
            ;;
        ?)
            usage && exit 2
            ;;
    esac
done
output
[myShell] ➤ ./arg -b in
filename[in]
Firefox 3
Internet 4

Your final awk hard-codes two fields; you could just continue with $4, $5, $6, etc. to print more fields, but names with fewer words would then get a spurious trailing space for each extra comma.
Better yet, since the count field is fixed width (that's the output format from uniq -c; GNU uniq right-aligns the count in a 7-character column), you can print the repeated text directly with print substr($0,9), $1
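Put together, that gives the following (a sketch; it assumes GNU uniq, where the count occupies a fixed 7-character column so the repeated text starts at column 9):
awk -F'|' '{print $8}' file | sort | uniq -c | awk '{print substr($0,9), $1}'
This keeps multi-word names like Internet Explorer intact, however many words they contain.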

I'd do it in perl:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my %count_of;
while ( <> ) {
    chomp;
    $count_of{ (split /\|/)[7] }++;
}
print Dumper \%count_of;
This can be cut down to a one-liner:
perl -F'\|' -lane '$c{$F[7]}++; END{ print "$_ => $c{$_}" for keys %c }'
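Run against the sample file from the question, the one-liner prints, for example (hash key order is arbitrary):
$ perl -F'\|' -lane '$c{$F[7]}++; END{ print "$_ => $c{$_}" for keys %c }' inputfile
Firefox => 3
Internet Explorer => 4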

Related

Merge a string to a line extracted from a text file in UNIX

I wanted to merge a string, ABC, into a line that I have extracted from a file.
The following command extracts lines 20-25 of file_ABC, takes only the first column, and transposes it to become a row (or line):
sed -n '20,25p' < file_ABC | awk '{print $1}' | paste -s
This is the result:
2727778 14734 0 0 0 2713044
I would like to add the string ABC at the first position of this line:
ABC 2727778 14734 0 0 0 2713044
Any suggestion on how to do that?
A quick hack would be to use something like
printf 'ABC\t%s\n' "$(sed -n '20,25p' < file_ABC | awk '{print $1}' | paste -s)"
You could modify your initial command instead to use awk for everything, though:
awk '
    BEGIN {printf "ABC"}
    NR>=20 && NR<=25 {printf "\t%s", $1}
    END {print ""}
' file_ABC
This might work for you (GNU sed):
sed '20,25{s/\s.*//;H};$!d;x;s/^/ABC/;s/\n/ /g' file
Gather up the first column fields by appending them to the hold space for rows 20 to 25 only. At the end of the file prepend ABC and replace the introduced newlines by spaces.
For fun, bash only:
filename=file_ABC
words=("${filename##*_}")
i=0
while read -r word rest_of_line; do
    (( ++i < 20 )) && continue
    (( i > 25 )) && break
    words+=("$word")
done < "$filename"

join() { local IFS=$1; shift; echo "$*"; }
join $'\t' "${words[@]}"
But this will be much slower than a single awk call.
If you want to keep it all in one script:
$ awk 'BEGIN {line="ABC"}
       NR>=20 && NR<=25 {line=line FS $1}
       NR==25 {print line; exit}' file
Improved version as suggested by @EdMorton:
$ awk 'NR>=20 {line=line OFS $1}
       NR==25 {print "ABC" line; exit}' file

Extract file string from left side but following 2nd delimiter from right

Below are the full file names.
qwertyuiop.abcdefgh.1234567890.txt
qwertyuiop.1234567890.txt
I am trying to use:
awk -F'.' '{print $1}'
How can I use an awk command to extract the output below?
qwertyuiop.abcdefgh
qwertyuiop
Edit: I have a list of files in a directory, and I am trying to extract time, size, owner, and filename into separate variables. For the filenames:
NAME=$(ls -lrt /tmp/qwertyuiop.1234567890.txt | awk -F'/' '{print $3}' | awk -F'.' '{print $1}')
$ echo $NAME
qwertyuiop
$
NAME=$(ls -lrt /tmp/qwertyuiop.abcdefgh.1234567890.txt | awk -F'/' '{print $3}' | awk -F'.' '{print $1}')
$ echo $NAME
qwertyuiop
$
Expected:
qwertyuiop.abcdefgh
With GNU awk and other versions that allow manipulation of NF
$ awk -F. -v OFS=. '{NF-=2} 1' ip.txt
qwertyuiop.abcdefgh
qwertyuiop
NF-=2 effectively deletes the last two fields.
1 is an awk idiom to print the contents of $0.
Note that this assumes there are at least two fields in every line; otherwise you'd get an error.
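If such short lines are possible, a guarded variant (a sketch) trims only when there are more than two fields and passes other lines through unchanged:
awk -F. -v OFS=. 'NF>2{NF-=2} 1' ip.txt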
Similar concept with perl; this prints an empty line if the number of fields in the line is less than 3:
$ perl -F'\.' -lane 'print join ".", @F[0..$#F-2]' ip.txt
qwertyuiop.abcdefgh
qwertyuiop
With sed, you can preserve lines whose number of fields is less than 3:
$ sed 's/\.[^.]*\.[^.]*$//' ip.txt
qwertyuiop.abcdefgh
qwertyuiop
EDIT: Taking inspiration from Sundeep's solution, adding the following to this mix too:
awk 'BEGIN{FS=OFS="."} {$(NF-1)=$NF="";sub(/\.+$/,"")} 1' Input_file
Could you please try the following:
awk -F'.' '{for(i=(NF-1);i<=NF;i++){$i=""};sub(/\.+$/,"")} 1' OFS="." Input_file
OR
awk 'BEGIN{FS=OFS="."} {for(i=(NF-1);i<=NF;i++){$i=""};sub(/\.+$/,"")} 1' Input_file
Explanation: adding an explanation for the above code as well:
awk '
BEGIN{                       ##Mentioning BEGIN section of awk program here.
  FS=OFS="."                 ##Setting FS and OFS variables for awk to DOT here as per OPs sample Input_file.
}                            ##Closing BEGIN section here.
{
  for(i=(NF-1);i<=NF;i++){   ##Starting for loop from i value of (NF-1) to NF for all lines.
    $i=""                    ##Setting value of respective field to NULL.
  }                          ##Closing for loop block here.
  sub(/\.+$/,"")             ##Substituting all DOTs till end of line with NULL in current line.
}
1                            ##Mentioning 1 here to print the edited/non-edited current line.
' Input_file                 ##Mentioning Input_file name here.

Why my awk string match not working?

$ echo foooobazbarrrrr |
> gawk 'match($0, /(fo+).+(bar*)/, arr)
> {print arr[1], arr[2] }'
The output of this code should be foooo barrrrr, but on my Ubuntu it does not work and fails.
If I write this code:
> gawk 'match($0, /(fo+).+(bar*)/)
> {print }'
then it works. Why does the first version not work?
Your command is slightly different from the example in the GNU manual. There, the opening { is at the very start, so there is no pattern to match, and the newline merely separates the two awk statements inside the action. In your version, the newline after the match(...) pattern ends that rule (an action must begin on the same line as its pattern), so {print arr[1], arr[2]} becomes a second, unconditional rule.
$ echo foooobazbarrrrr | gawk '{ match($0, /(fo+).+(bar*)/, arr)
> print arr[1], arr[2] }'
foooo barrrrr
Alternatively, you could use a semi-colon instead of a newline to separate the commands:
$ echo foooobazbarrrrr | gawk '{ match($0, /(fo+).+(bar*)/, arr); print arr[1], arr[2] }'
foooo barrrrr
Your version of the command will work if it’s entered as one line:
$ echo foooobazbarrrrr | gawk 'match($0, /(fo+).+(bar*)/, arr) {print arr[1], arr[2] }'
foooo barrrrr

counting records in unix file

This was an interview question, but it is still a programming question.
I have a unix file with two columns, name and score. I need to display how many times each score occurs, like:
jhon 100
dan 200
rob 100
mike 100
the output should be
100 3
200 1
You only need built-in unix utilities to solve it, so I am assuming shell scripts, regexes, or unix commands.
I understand looping would be one way to do it: store all the values you have already seen, then grep every record for unseen values. Is there any other, more efficient way of doing it?
Try this:
cut -d ' ' -f 2 < /tmp/foo | sort -n | uniq -c \
| (while read n v ; do printf "%s %s\n" "$v" "$n" ; done)
The initial cut could be replaced with another while read loop, which would be more resilient to input file format variations (extra whitespace). If some of the names consist of several words, simple field extraction will not work as easily, but sed can do it.
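For instance, a sketch of that idea: strip everything up to the last whitespace so only the score remains, no matter how many words the name has:
sed 's/.*[[:space:]]//' /tmp/foo | sort -n | uniq -c \
| (while read n v ; do printf "%s %s\n" "$v" "$n" ; done)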
Otherwise, use your favorite programming language. Perl would probably shine. It is not difficult either in Java or even in C or Forth.
$ cat foo.txt
jhon 100
dan 200
rob 100
mike 100
$ awk '{print $2}' foo.txt | sort | uniq -c
3 100
1 200
It's a pity you can't do the count with sort or uniq alone.
Edit: I just noticed I have the count in front; to get it exactly the same you can do:
$ awk '{print $2}' foo.txt | sort | uniq -c | awk '{ print $2 " " $1 }'
Not very complicated in perl:
#!/usr/bin/perl
use strict;
use warnings;

my %count = ();
while (<>) {
    chomp;
    my ($name, $score) = split(/ /);
    $count{$score}++;
}
foreach my $key (sort keys %count) {
    print "$key ", $count{$key}, "\n";
}
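For example, assuming the script is saved as count.pl (a hypothetical name) and the data is in foo.txt:
$ perl count.pl foo.txt
100 3
200 1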
You could go with awk:
awk '/.*/ { a[$2] = a[$2] + 1; } END { for (x in a) { print x, " ", a[x] } }' record_file.txt
Alternatively with shell commands:
for i in `awk '{print $2}' inputfile | sort -u`
do
    echo -n "$i "
    grep "$i" inputfile | wc -l
done
The first awk command gives a list of all the different scores (e.g. 100 and 200), which the for loop then iterates over, counting each separately. Not very efficient, but simple; if the file is not too big, it should not be a problem.
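A slightly more robust sketch anchors the comparison to the second column, so a score string appearing elsewhere on a line cannot inflate the count:
for i in `awk '{print $2}' inputfile | sort -u`
do
    echo -n "$i "
    awk -v s="$i" '$2 == s' inputfile | wc -l
done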

Parsing each field and process it using 'awk'/'gawk'

Here is a query:
grep bar 'foo.txt' | awk '{print $3}'
The field values emitted by the awk query are mangled C++ symbol names. I want to pass each one to dem and finally output dem's output, i.e. the demangled symbols.
Assume that the field separator is a ' ' (space).
awk is a pattern-matching language. The grep is totally unnecessary:
awk '/bar/{print $3}' foo.txt
does what your example does.
Edit: fixed up a bit after reading the comments on the preceding answer (I don't know a thing about dem...).
You can make use of the system call in awk with something like:
awk '/bar/{cline="dem " $3; system(cline)}' foo.txt
but this would spawn an instance of dem for each symbol processed. Very inefficient.
So let's get more clever:
awk '/bar/{list = list " " $3;}END{cline="dem " list; system(cline)}' foo.txt
BTW-- Untested as I don't have dem or your input.
Another thought: if you're going to use the xargs formulation offered by other posters, cut might well be more efficient than awk. At that point, however, you would need grep again.
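That combination would look something like this (a sketch; it assumes the single-space field separator stated in the question):
grep bar foo.txt | cut -d' ' -f3 | xargs dem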
How about
grep bar 'foo.txt' | awk '{ print $3 }' | xargs dem | awk '{ print $3 }'
This will print the demangled symbols, complete with argument lists in the case of methods:
awk '/bar/ { print $3 }' foo.txt | xargs dem | sed -e 's:.* == ::'
This will print the demangled symbols, without argument lists in the case of methods:
awk '/bar/ { print $3 }' foo.txt | xargs dem | sed -e 's:.* == \([^(]*\).*:\1:'
