grep for a pattern occurring 2 or 3 times - unix

I am looking for a regular expression that finds a repeated sequence such as 696969 in 2345679696969.
I don't want to search for the literal 696969; I want to express it as 69 occurring 3 times.
Something like this:
grep '[0-9]\{7\}69\{3\}'
but that looks for 9 occurring three times instead.
Could somebody help?

Group 69 with parentheses:
grep -E '(69){3}'
Test
$ echo "2345679696969" | grep -E '(69){3}'
2345679696969
All together:
$ echo "2345679696969" | grep -E '[0-9]{7}(69){3}'
2345679696969
Or with basic (BRE) syntax, where the parentheses and braces are escaped (thanks, Avinash):
grep '[0-9]\{7\}\(69\)\{3\}'
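To see why the grouping matters, here is a small sketch that compares the ungrouped and grouped intervals. The sample strings are made up for illustration, and -o is assumed to be available (it is in GNU and BSD grep). The last command uses a range interval for the "2 or 3 times" case from the title.

```shell
# Ungrouped: \{3\} applies only to the 9, so this matches 6 followed by 999.
ungrouped=$(printf '%s\n' 6999 696969 | grep -o '69\{3\}')
echo "$ungrouped"

# Grouped: {3} applies to the pair 69 as a whole.
grouped=$(printf '%s\n' 6999 696969 | grep -oE '(69){3}')
echo "$grouped"

# Range interval: whole lines made of 69 repeated 2 or 3 times.
ranged=$(printf '%s\n' 69 6969 696969 | grep -E '^(69){2,3}$')
echo "$ranged"
```

The first command prints only 6999, the second only 696969, and the third prints 6969 and 696969.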

Related

Get specific line from unix command output

Let's say I run a command in the shell, cmd doSomething, and it prints several lines of output, for example:
> cmd doSomething
outputLine1
outputLine2
outputLine3
Is there a way to assign the 2nd line (outputLine2) to a variable (e.g. testdir)?
Ideally I would like to be able to use $testdir.
You can combine head and tail, as follows:
doSomething | head -n 2 | tail -n 1
head -n 2 keeps the first two output lines, and tail -n 1 keeps the last of those two, i.e. the second line.
For putting this into a variable:
variable=$(doSomething | head -n 2 | tail -n 1)
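As a possible alternative, a single sed call can pick the line by number. This is only a sketch: doSomething is defined here as a stand-in function that prints the three sample lines, and testdir is the variable name from the question.

```shell
# Stand-in for the real command (an assumption for this example).
doSomething() { printf '%s\n' outputLine1 outputLine2 outputLine3; }

# Print line 2, then quit so the rest of the output is not read.
testdir=$(doSomething | sed -n '2{p;q;}')
echo "$testdir"
```

Changing the 2 selects a different line, which is a little easier to adjust than a head/tail pair.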

Replace last 9 delimiters "," with "|" in Unix

I want to replace the last 9 "," delimiters with "|" in a file.
For example, from:
abcd,3,5,5,7,7,1,2,3,4
"ashu,pant,something",3,5,5,7,7,8,7,8,8,8
to:
abcd|3|5|5|7|7|1|2|3|4
"ashu,pant,something"|3|5|5|7|7|8|7|8|8|8
Help would be really appreciated.
Not exactly the same, but this replaces every comma from the second occurrence onward, using GNU sed:
$ echo \"ashu,pant\",3,5,5,7,7,87,8,8,8 |
sed 's/,/|/2g'
"ashu,pant"|3|5|5|7|7|87|8|8|8
Edit to match your changed requirements:
Hackish, but first reverse each line and replace all commas with pipes, then turn the pipes back into commas from the 10th occurrence onward, and reverse the line again:
$ echo -e \"ashu,pant\",3,5,5,7,7,87,8,8,8\\nabcd,3,5,5,7,7,1,2,3,4 |
rev |
sed 's/,/|/g; s/|/,/10g' |
rev
"ashu,pant"|3|5|5|7|7|87|8|8|8
abcd|3|5|5|7|7|1|2|3|4
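The same "last N commas" idea can also be sketched without rev, by splitting each line on commas in awk and choosing the separator while re-joining. The function name replace_last_commas is hypothetical, and the input is the sample data from this answer.

```shell
# replace_last_commas is a hypothetical helper name; N defaults to 9.
replace_last_commas() {
  awk -v n="${1:-9}" '{
    nf = split($0, f, ",")      # a line with nf pieces has nf - 1 commas
    out = f[1]
    for (i = 2; i <= nf; i++)   # the comma before piece i is among the last n when i > nf - n
      out = out (i > nf - n ? "|" : ",") f[i]
    print out
  }'
}

result=$(printf '%s\n' '"ashu,pant",3,5,5,7,7,87,8,8,8' 'abcd,3,5,5,7,7,1,2,3,4' |
  replace_last_commas 9)
echo "$result"
```

Because it counts from the end of the line, commas inside the leading quoted field are left alone as long as exactly N delimiters follow it.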
You could also use GNU awk and FPAT to replace all commas outside of quotes:
$ echo -e \"ashu,pant\",3,5,5,7,7,87,8,8,8\\nabcd,3,5,5,7,7,1,2,3,4 |
awk 'BEGIN{FPAT="([^,]+)|(\"[^\"]+\")";OFS="|"}{$1=$1}1'
"ashu,pant"|3|5|5|7|7|87|8|8|8
abcd|3|5|5|7|7|1|2|3|4
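Another approach, which relies on every field after the first being numeric, is to mark the start of each number and then strip the leftover commas:
awk '{gsub(/[[:digit:]]+/," |&"); gsub(/, /,"")}1' file
Output: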
abcd|3|5|5|7|7|1|2|3|4
"ashu,pant,something"|3|5|5|7|7|8|7|8|8|8

Compare 2 files in unix file1(2M numbers/rows/lines) , file2(2,000,480 numbers/rows/lines)

How can I compare these 2 big files in Unix?
I've already tried 'grep -Fxvf file1.txt file2.txt | wc -l', but the output is 2,000,480, and when I switch file1 and file2 the output is 1,999,999.
How can I get the output '480', which is what I am expecting?
I've also tried the diff and cmp commands, but their output is too complicated.
I think you want the absolute value of the difference between the line counts of the 2 files. You can get that easily with awk: read each file's line count into an array, then subtract the values in the END block. In pure shell it gets a bit more complex. Suppose you generate some test data (a 10-line and a 14-line file):
$ seq 1 10 > ten
$ seq 1 14 > fourteen
And then you do (-J is a BSD xargs option; GNU xargs does not have it):
$ ( wc -l ten ; wc -l fourteen ) | awk '{ print $1}' | sort -rn | xargs -J % echo % - p | dc
The result:
4
But a much better way would be to do it in 3 lines: get the line count for file1, then for file2, and then subtract.
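A minimal sketch of that three-step version, reusing the ten/fourteen test files from above. It assumes mktemp -d and seq are available; the variable names a, b and delta are illustrative.

```shell
# Work in a scratch directory so the sample files do not clutter anything.
tmp=$(mktemp -d)
seq 1 10 > "$tmp/ten"
seq 1 14 > "$tmp/fourteen"

a=$(wc -l < "$tmp/ten")        # reading from stdin keeps the file name out of the output
b=$(wc -l < "$tmp/fourteen")
delta=$(( $a > $b ? $a - $b : $b - $a ))   # absolute difference
echo "$delta"

rm -r "$tmp"
```

Note that this only compares line counts; it says nothing about which lines differ.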

Getting the last x digits from output with grep command

I need help getting the last 16 digits from the output of this command:
cat q5data.txt | grep -o '[0-9]*[0-9]\{16\}'
The output I get is :
6420029454020029
26787889786973463
92272417810036027222591368318424
1147142436072964
I'd want only the last 16 digits of each of the numbers above, so the result would look like this:
6420029454020029
6787889786973463
7222591368318424
1147142436072964
So the question is: how would I get the last 16 digits?
q5data contains this:
0111102.82575525572371251FriThuSat32169716436971243.1415 foo100001$$$3.14153
foo`3.1415Green100010blah2.8
2.85720948213811501Purple`WedTueBLACK1869228491762178BLACK$$3.14100001Feb010000
taoblahfoopiGreen010111
VOIDchiOrangeSatNILLVOIDBLACK$$$Sat3.14155378825854705118Mar$WHITEAug`Tue
4421929582063064
2.8$$$$BLACKSun$"blah$ThublahJun2057411253659033Orange$$Sun$$fubar'
BLACKSun8061215743158569Jul'010101`2.8MayFri$$'blah
100001$3.141533.14153taoBLACKWHITE3.141532.8'foo"chi`BLACK$$$3300209361826966
5976364681345632YellowFri"JanWHITEWedWHITE3652470302503667WHITE
1237496282374608WHITEpiNILLVOID110111WHITEApr'$$$2.83536505910579946111010
54891762211716313.14$$RedWedtaoMonFri110010$$3068508931421361$PurpleNILLWHITE9242959892278294Sep
000110BlueOct2582940799974379
phifoo$
Purple3.1415Green '
3.14BLACKTuepiYellowWHITEchi35798399298233973.14153.1415WHITEpitao$SunBlue010110
NULLBLACKTue1650665049652872`2.8$'$$$NULL3.14SatGreen$$3.141533.14153GreenVOIDJul"
chichifubarWedpiBLACK3.14153BLACKpiWHITEThu$ BLACK
blah2.8fubar4411479881441554$$`BLACKWHITE1101113.14SepWHITEJanThuGreen
$$WHITE'"3675572769992033fooBlueNULL100000'
BLACK 3.14WHITEDecfubarOrangeMay NILLWHITE2570850288634750 101011$$$Mon
Tue" 3.143.14phiSat7665425103246257MayphiTue'0010110101112.8BLACK$fubar"
0358649831711525100010'FriJunThu"3.14SunGreenfubarMonWHITEVOID$$$VOID1877369637528056Jan$010010
GreenTue000111ThuBLACKApr011010
Jun6244216458497289`PurpleAug$$$2685357800265115''2.8taopi101100$$chiFeb
9471418620899225VOID8617331495319240NULLWHITEblah5461478451014026
6352741666667105
WHITEfooOct011010pi$$$110100BLACKBLACKTuePurpleWHITE9093492271343727SepNovchi
Orange3.144596443153024361`"'$$78253311502390510101103.14153Friphi $Mon
1385825179552755YellowBLACK001011Sep$$RedFebfubarMon010010000010fubar"Jul0110117544560082562350
3.141653642540032022chi'Orange
1253542283769081tao4876457038962098MonSunMayWHITEYellow3.14153$Orange000101blah
RedSatNILLphiVOIDWedfubarGreen chi$$piphiJul$$$111001`9540185369262601NILLVOID
7006440921851679Wed3.14152.8chiGreenThu$$Tuefoofooblahpi$$$taopi$ May 'Feb
MayNILLblah8007182476768737JantaophiThutao$'Jul AprNILLBLACK'3.14153Feb3.1415
57067714600406493.141537231229468300261Mon$`SunNILL `NULL3.14153foochi1000109494160741986074
6577869219715310JulJanBLACKfubarBLACK2.8phiGreen0091496849086433
SunBlue2355648762601053 3.1415NULL$$$BLACK100011 ThuDecJun2.83.1415phiFeb"
9173525733960126BLACK 3.14153`110001PurpleRedFebfubarVOIDfoo$$$blah9330024102534139
Jun$$VOIDVOID4099554992034342Julpi9976331355660412taoWHITEGreen$$100010NILLVOID
3.14153phiSatphi43658305924319679197159994746838phipiApr
3.1415RedblahMayfooJul100011NovtaoMon3.141533.14JanGreen$$ OctNILLfooWHITE3.1415
96027197435535111011013.14VOID3583462878046156NULL3.1415blahOrangefoo 100101taofoo3.14153"3.1415
$$Red3.14Marblah'
3797758515388131tao $$$101010NULL2268984774582096BlueBlue3.14153Oct`
74321533961822933.14153994759453326425$$Jul001111PurpleGreenTueNovJan2742714540787707Blue$$$
0010003.14blah3.14ThuWHITE$$$$blah
3997313793176662 3.141463510697622121Yellow 3.1415'Jul`3.14153NILL2.8Thuphi
3134920264311067fooNov`NULL1111119335359393623483Tue$$$GreenVOIDtaoRedTueAug$$3.141532.8Sat'
3.14153Oct100010FebJan$$3.1415pi$$'chiRed$$$NILL8614261680268364
fubarBLACKpi110001110101pichi0126011887834143GreenNILLYellow NILLfoo101000 $$$
RedTueNULLThu2.814091424413091162.8 WHITE$WHITE60620358244865230211111773156587'pi
Yellow3.1415$$$$$
"Aug3.1415VOIDBLACK0810996065354809$$$NULLfoo$$Orange6850772642048628WedBLACK
BLACKBluepi 70173555329860651869981769139132phi$$$$$$3.14Feb2.86083883638401362
6420029454020029WHITE26787889786973463.14 3.14 Mon`92272417810036027222591368318424$$$tao
fooTue"1147142436072964AprPurpleSep
Okay, so at the beginning of q5data we see 0111102. and right after that we see 82575525572371251 (17 digits).
I'd like it to output the last 16 digits (2575525572371251).
Thank you :)
To anchor the end of the match, use \b:
grep -o '[0-9]\{16\}\b' q5data.txt
This matches 16 digits that end at a word boundary.
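A short sketch of what the word boundary does and does not catch, assuming a grep that supports \b and -o (such as GNU grep). The input lines are fragments taken from q5data.txt.

```shell
# Letters count as word characters, so there is no boundary between
# the digits and "Fri" on the first line; only the second line matches.
with_boundary=$(printf '%s\n' '82575525572371251Fri' '4421929582063064' |
  grep -o '[0-9]\{16\}\b')
echo "$with_boundary"

# A 17-digit run followed by a space does end at a boundary,
# and the match is its last 16 digits.
before_space=$(echo '26787889786973463 Mon' | grep -o '[0-9]\{16\}\b')
echo "$before_space"
```

So \b is enough when each number is followed by a space, punctuation, or the end of the line, but not when letters follow directly.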
If you want to capture digits in runs terminated by non-numeric characters, you need a negative lookahead (this requires the -P option of GNU grep, which POSIX grep does not have):
$ grep -Po '[0-9]{16}(?![0-9])' q5data.txt
e.g.
$ echo "12345678901234567890aaa" | grep -Po '[0-9]{16}(?![0-9])'
5678901234567890
If you want the last 16 digits from every run of 16 or more digits, then you could filter through grep twice:
grep -Eo '[0-9]{16,}' <q5data.txt | grep -Eo '.{16}$'
The first selects all runs of 16 or more digits, and the second selects the last 16 characters from each run.
Testing this on the first line of your input file gives:
$ grep -Eo '[0-9]{16,}' <<<'0111102.82575525572371251FriThuSat32169716436971243.1415 foo100001$$$3.14153' | grep -Eo '.{16}$'
2575525572371251
2169716436971243
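The two-pass pipeline can also be sketched as a single awk program that walks each line run by run. This is an illustration only: last16 is a hypothetical function name, and the program uses just match, RSTART and RLENGTH from POSIX awk.

```shell
# last16 is a hypothetical helper: print the last 16 digits of every
# run of 16 or more digits found in the input.
last16() {
  awk '{
    s = $0
    while (match(s, /[0-9]+/)) {              # leftmost, longest run of digits
      if (RLENGTH >= 16)
        print substr(s, RSTART + RLENGTH - 16, 16)
      s = substr(s, RSTART + RLENGTH)         # continue after this run
    }
  }' "$@"
}

result=$(printf '%s\n' '0111102.82575525572371251FriThuSat32169716436971243.1415 foo100001$$$3.14153' | last16)
echo "$result"
```

Called as last16 q5data.txt it reads the file directly; on the first line of the file it prints the same two numbers as the pipeline above.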
Or, to pick up only a run of digits that ends the line:
grep -Eo '([0-9]{16})$' q5data.txt

Unix Command for counting number of words which contains letter combination (with repeats and letters in between)

How would you count the number of words in a text file that contain all of the letters a, b, and c? These letters may occur more than once in the word, and the word may contain other letters as well. (For example, "cabby" should be counted.)
Using sample input which should return 2:
abc abb cabby
I tried both:
grep -E "[abc]" test.txt | wc -l
grep 'abcdef' testCount.txt | wc -l
both of which return 1 instead of 2.
Thanks in advance!
You can use awk and rely on the return value of the sub function: it returns 1 if a substitution was made and 0 otherwise.
$ echo "abc abb cabby" |
awk '{
  # sub removes the first match from the word and returns 1 on success
  for(i=1;i<=NF;i++)
    if(sub(/a/,"",$i)>0 && sub(/b/,"",$i)>0 && sub(/c/,"",$i)>0) {
      count+=1
    }
}
END{print count+0}'
2
We require the return value to be greater than 0 for all three letters. The for loop iterates over every word of every line, incrementing the counter when all three letters are found in the word.
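If the words should stay intact, the same loop can be sketched with the regex match operator instead of sub, which tests for each letter without changing the field. The counter name n is arbitrary.

```shell
count=$(echo 'abc abb cabby' | awk '{
  for (i = 1; i <= NF; i++)
    if ($i ~ /a/ && $i ~ /b/ && $i ~ /c/)   # word holds all three letters
      n++
}
END { print n + 0 }')   # n + 0 prints 0 rather than an empty line when nothing matched
echo "$count"
```

For the sample input this counts abc and cabby, and skips abb because it has no c.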
I don't think you can get around using multiple invocations of grep. Thus I would go with (GNU grep; -E is needed so that + means "one or more"):
<file grep -Eow '\w+' | grep a | grep b | grep c
Output:
abc
cabby
The first grep puts each word on a line of its own.
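Since the question asks for a count and not the words themselves, the last grep can count instead of print. A sketch that uses tr to split on whitespace (a swapped-in tool, not part of the answer above), with the sample input from the question:

```shell
# One word per line, keep words containing a, then b, then count those containing c.
count=$(printf '%s\n' 'abc abb cabby' |
  tr -s '[:space:]' '\n' | grep a | grep b | grep -c c)
echo "$count"
```

grep -c reports the number of matching lines, which equals the number of words here because each word is on its own line.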
Try this (\n in the replacement text requires GNU sed):
sed 's/ /\n/g' test.txt |grep a |grep b|grep c
$ cat test.txt
abc abb cabby
$ sed 's/ /\n/g' test.txt |grep a |grep b|grep c
abc
cabby
Hope this helps.
