Unix command for counting the number of words which contain a letter combination (with repeats and other letters in between)

How would you count the number of words in a text file which contain all of the letters a, b, and c? These letters may occur more than once in the word and the word may contain other letters as well. (For example, "cabby" should be counted.)
Using sample input which should return 2:
abc abb cabby
I tried both:
grep -E "[abc]" test.txt | wc -l
grep 'abcdef' testCount.txt | wc -l
both of which return 1 instead of 2.
Thanks in advance!

You can use awk and check the return value of the sub function: sub returns the number of substitutions made, so it returns 1 when the substitution succeeds and 0 when it does not.
$ echo "abc abb cabby" |
awk '{
  for (i = 1; i <= NF; i++)
    if (sub(/a/, "", $i) > 0 && sub(/b/, "", $i) > 0 && sub(/c/, "", $i) > 0) {
      count += 1
    }
}
END { print count }'
2
We require the return value to be greater than 0 for all three letters. The for loop iterates over every word of every line, incrementing the counter when all three letters are found in the word.
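For the record, a near-equivalent sketch (not from the original answer) that tests each word with plain regex matches instead of sub(), so the fields are left untouched:
$ echo "abc abb cabby" |
awk '{ for (i = 1; i <= NF; i++) if ($i ~ /a/ && $i ~ /b/ && $i ~ /c/) count++ } END { print count }'
2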

I don't think you can get around using multiple invocations of grep. Thus I would go with (GNU grep):
<file grep -ow '\w+' | grep a | grep b | grep c
Output:
abc
cabby
The first grep puts each word on a line of its own.
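Since the question asks for a count rather than the words themselves, you can append wc -l (same idea, just an extra step on top of the pipeline above):
<file grep -ow '\w+' | grep a | grep b | grep c | wc -l
which prints 2 for the sample input.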

Try this; it will work:
sed 's/ /\n/g' test.txt |grep a |grep b|grep c
$ cat test.txt
abc abb cabby
$ sed 's/ /\n/g' test.txt |grep a |grep b|grep c
abc
cabby
Hope this helps.

Related

Use awk to replace word in file

I have a file with some lines:
a
b
c
d
I would like to cat this file into an awk command to produce something like this:
letter is a
letter is b
letter is c
letter is d
using something like this:
cat file.txt | awk 'letter is $1'
But it's not printing out as expected:
$ cat raw.txt | awk 'this is $1'
a
b
c
d
At the moment, you have no { action } block, so your condition evaluates the two empty variables this and is, concatenating them with the first field $1, and checks whether the result is true (a non-empty string). It is, so the default action prints each line.
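As a quick illustration of that default action (a made-up one-liner, not part of the question): a bare condition with no { action } block prints every line for which the condition is true, so
awk '$1 == "a"' raw.txt
prints only the line a.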
It sounds like you want to do this instead:
awk '{ print "letter is", $1 }' raw.txt
Although in this case, you might as well just use sed:
sed 's/^/letter is /' raw.txt
This command matches the start of each line and adds the string.
Note that I'm passing the file as an argument, rather than using cat with a pipe.
Not sure if you wanted sed or awk, but this is in awk:
$ awk '{print "letter is " $1}' file
letter is a
letter is b
letter is c
letter is d

What is an alternative way to count the occurrences of each word without using the 'uniq -c' command?

Is it possible to count the occurrences of each word, as with uniq -c, but with the count after the word rather than before?
Example scenario
An input file named test1.txt contains the following data:
Renault:cilo:84563
Renault:cilo:84565
M&M:Thar:84566
Tata:nano:84567
M&M:quanto:84568
M&M:quanto:84569
The fields used in the above data are car_company:car_model:customerID
Desired result
cilo 2
Thar 1
nano 1
quanto 2
(car_model and number of cars sold grouped by car_model)
My code
cat test1.txt | cut -d: -f2 | uniq -c
Actual Result
2 cilo
1 Thar
1 nano
2 quanto
Is it possible to do the above process without using uniq -c, so that I can swap the order of the fields (columns)?
You can use uniq, and simply post-process its output to swap the columns:
cut -d: -f2 test1.txt | uniq -c | awk '{print $2 "\t" $1 "\n" }'
EDIT: Added \n, as noted in a comment.
Save your command's output into a file named "badresult":
cat test1.txt | cut -d: -f2 | uniq -c > badresult
Then cut the seventh field and save it into a file named "counts" (use a space " " as the separator):
cut -d" " -f7 badresult > counts
Then cut the eighth field and save it into a file named "models" (again with a space " " as the separator):
cut -d" " -f8 badresult > models
Now you have your counts and models in separate files. All you have to do is display these two files side by side with the "pr" command (-m: merge files, one per column; -T: omit page headers):
pr -m -T models counts
Using awk:
cat test1.txt | cut -d: -f2 | uniq -c | awk '{ t = $1; $1 = $2; $2 = t; print }'
The little awk code exchanges fields 1 and 2 using a temporary.
You just need awk for this:
$ awk -F: '{a[$2]++} END {for (i in a) print i, a[i]}' file
cilo 2
quanto 2
nano 1
Thar 1
This goes through every line keeping track of how many times the second field has appeared. Since everything is stored in the array a, then it is just a matter of looping through it and printing its content.
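The order of a for (i in a) loop is unspecified in awk, so if you want the models ordered by count you can pipe the output through sort (a small add-on, not part of the original answer):
awk -F: '{a[$2]++} END {for (i in a) print i, a[i]}' file | sort -k2,2nr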

Grep and split in unix

I need to first grep the exact line and then capture the required value.
e.g.
Total logical records skipped: 0
Total logical records read: 500
Total logical records rejected: 3
Total logical records discarded: 0
I need to capture the value 500. How can I do that?
One way using awk:
awk '/Total logical records read:/ { print $NF }' file.txt
If by capture, you mean store as a shell variable:
variable=$(awk '/Total logical records read:/ { print $NF }' file.txt)
Use grep + awk:
cat log.txt | grep read | awk '{print $5;}'
or just awk:
cat log.txt | awk '/read/ {print $5;}'
Look at these examples: Awk Introduction Tutorial – 7 Awk Print Examples
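If your grep supports Perl-compatible regular expressions (GNU grep's -P option), you can also pull the number out without awk; a sketch, assuming the label is exactly as shown in the question:
grep -oP 'Total logical records read:\s*\K\d+' log.txt
which prints 500.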

How to interleave lines from two text files

What's the easiest/quickest way to interleave the lines of two (or more) text files? Example:
File 1:
line1.1
line1.2
line1.3
File 2:
line2.1
line2.2
line2.3
Interleaved:
line1.1
line2.1
line1.2
line2.2
line1.3
line2.3
Sure, it's easy to write a little Perl script that opens them both and does the task. But I was wondering if it's possible to get away with less code, maybe a one-liner using Unix tools?
paste -d '\n' file1 file2
Here's a solution using awk:
awk '{print; if(getline < "file2") print}' file1
produces this output:
line 1 from file1
line 1 from file2
line 2 from file1
line 2 from file2
...etc
Using awk can be useful if you want to add some extra formatting to the output, for example if you want to label each line based on which file it comes from:
awk '{print "1: "$0; if(getline < "file2") print "2: "$0}' file1
produces this output:
1: line 1 from file1
2: line 1 from file2
1: line 2 from file1
2: line 2 from file2
...etc
Note: this code assumes that file1 is at least as long as file2.
If file1 contains more lines than file2 and you want to output blank lines for file2 after it finishes, add an else clause to the getline test:
awk '{print; if(getline < "file2") print; else print ""}' file1
or
awk '{print "1: "$0; if(getline < "file2") print "2: "$0; else print"2: "}' file1
@Sujoy's answer points in a useful direction. You can add line numbers, sort, and strip the line numbers:
(cat -n file1 ; cat -n file2 ) | sort -n | cut -f2-
Note (of interest to me) this needs a little more work to get the ordering right if instead of static files you use the output of commands that may run slower or faster than one another. In that case you need to add/sort/remove another tag in addition to the line numbers:
(cat -n <(command1...) | sed 's/^/1\t/' ; cat -n <(command2...) | sed 's/^/2\t/' ; cat -n <(command3) | sed 's/^/3\t/' ) \
| sort -n | cut -f2- | sort -n | cut -f2-
With GNU sed:
sed 'R file2' file1
Output:
line1.1
line2.1
line1.2
line2.2
line1.3
line2.3
Here's a GUI way to do it: Paste them into two columns in a spreadsheet, copy all cells out, then use regular expressions to replace tabs with newlines.
cat file1 file2 | sort -t. -k 2.1
Here it's specified that the separator is "." and that we are sorting on the first character of the second field. (This relies on the sample lines being named lineN.M, so it won't generalize to arbitrary content.)

Advanced grep unix

Usually the grep command is used to display lines containing the specified pattern. Is there any way to display n lines before and after the line which contains the specified pattern?
Can this be achieved using awk?
Yes, use
grep -B num1 -A num2
to include num1 lines of context before the match, and num2 lines of context after the match.
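For example, to show 2 lines of context before and 3 lines after each match of 'pattern' (assuming your grep supports these options):
grep -B 2 -A 3 'pattern' file.txt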
EDIT:
It seems the OP is using AIX, which has a different set of options that doesn't include -B and -A.
This link describes grep on AIX 4.3 (it doesn't look promising).
Matt's Perl script might be a better solution.
Here is what I usually do on AIX:
before=2   # the number of lines to be shown before the match
after=2    # the number of lines to be shown after the match
grep -n <pattern> <filename> | cut -d':' -f1 | xargs -n1 -I % awk "NR<=%+$after && NR>=%-$before" <filename>
If you do not want the extra 2 variables you can always do it in one line:
grep -n <pattern> <filename> | cut -d':' -f1 | xargs -n1 -I % awk 'NR<=%+<<after>> && NR>=%-<<before>>' <filename>
Suppose I have the pattern 'stack' and the filename is flow.txt.
I want 2 lines before and 3 lines after. Then the command will be:
grep -n 'stack' flow.txt | cut -d':' -f1 | xargs -n1 -I % awk 'NR<=%+3 && NR>=%-2' flow.txt
If I want only the 2 lines before, the command will be:
grep -n 'stack' flow.txt | cut -d':' -f1 | xargs -n1 -I % awk 'NR<=% && NR>=%-2' flow.txt
If I want only the 3 lines after, the command will be:
grep -n 'stack' flow.txt | cut -d':' -f1 | xargs -n1 -I % awk 'NR<=%+3 && NR>=%' flow.txt
Multiple files - switch to awk alone. As above, for the pattern 'stack' with the filenames flow.*, 2 lines before and 3 lines after, the command will be:
awk 'BEGIN {
before=1; after=3; pattern="stack";
i=0; hold[before]=""; afterprints=0}
{
#Print the lines from the previous Match
if (afterprints > 0)
{
print FILENAME ":" FNR ":" $0
afterprints-- #keep a track of the lines to print after - this can be reset if a match is found
if (afterprints == 0) print "---"
}
#Look for the pattern in current line
if ( match($0, pattern) > 0 )
{
# print the lines in the hold round robin buffer from the current line to line-1
# if (before >0) => user wants lines before avoid divide by 0 in %
# and afterprints => 0 - we have not printed the line already
for(j=i; j < i+before && before > 0 && afterprints == 0 ; j++)
print hold[j%before]
if (afterprints == 0) # print the line if we have not printed the line already
print FILENAME ":" FNR ":" $0
afterprints=after
}
if (before > 0) # Store the lines in the round robin hold buffer
{ hold[i]=FILENAME ":" FNR ":" $0
i=(i+1)%before }
}' flow.*
From the tags, it's likely that the system has a grep that may not support providing context (Solaris is one system that doesn't and I can't remember about AIX). If that is the case, there's a perl script that may help at http://www.sun.com/bigadmin/jsp/descFile.jsp?url=descAll/cgrep__context_grep.
If you have sed you could use this shell script
BEFORE=2
AFTER=3
FILE=file.txt
PATTERN=pattern
for i in $(grep -n $PATTERN $FILE | sed -e 's/\:.*//')
do head -n $(($AFTER+$i)) $FILE | tail -n $(($AFTER+$BEFORE+1))
done
What it does: grep -n prefixes each match with the line number it was found at, and the sed strips everything but that line number. Then head takes the lines up to the matching line plus an additional $AFTER lines, and that is piped to tail to keep just $BEFORE + $AFTER + 1 lines (that is, the matching line plus the number of lines before and after).
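For example, with BEFORE=2, AFTER=3 and a match on line 10, head -n $((3+10)) keeps lines 1-13 and tail -n $((3+2+1)) then keeps the last 6 of those, i.e. lines 8-13: the matching line plus 2 lines before and 3 after.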
Sure there is (from the grep man page):
-B NUM, --before-context=NUM
Print NUM lines of leading context before matching lines.
Places a line containing a group separator (--) between
contiguous groups of matches. With the -o or --only-matching
option, this has no effect and a warning is given.
-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines.
Places a line containing a group separator (--) between
contiguous groups of matches. With the -o or --only-matching
option, this has no effect and a warning is given.
and if you want the same number of lines before AND after the match, use:
-C NUM, -NUM, --context=NUM
Print NUM lines of output context. Places a line containing a
group separator (--) between contiguous groups of matches. With
the -o or --only-matching option, this has no effect and a
warning is given.
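For example (assuming a grep that supports these options):
grep -C 2 'pattern' file.txt
or, equivalently per the excerpt above, grep -2 'pattern' file.txt, prints each matching line with 2 lines of context on both sides.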
You can use awk:
awk 'BEGIN{t=4}
c--&&c>=0
/pattern/{ c=t; for(i=NR;i<NR+t;i++)print a[i%t] }
{ a[NR%t]=$0}
' file
Example run:
$ more file
1
2
3
4
5
pattern
6
7
8
9
10
11
$ ./shell.sh
2
3
4
5
6
7
8
9
