Grep examples - can't understand - unix

Given the following commands:
ls | grep ^b[^b]*b[^b]
ls | grep ^b[^b]*b[^b]*
I know that ^ marks the start of the line, but can anyone briefly explain
these commands? What do they do, step by step?
Thanks!

^ can mean two things:
it marks the beginning of a line,
or it negates the character set (within [], as the first character).
So, it means:
lines starting with 'b'
matching any (0+) characters Other than 'b'
matching another 'b'
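followed by one character that is not 'b' (first command), or by zero or more such characters (second command)
The second command will match all of the following (the first matches only the last one, since it needs a non-'b' character after the second 'b')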
bb
bzzzzzb
bzzzzzbzzzzzzz
but not
zzzzbb
bzzzzzxzzzzzz

1) The name starts with b, continues with zero or more characters that are not b, then another b, then one character that is not b.
2) The name starts with b, continues with zero or more characters that are not b, then another b, then zero or more characters that are not b. Since the pattern is not anchored at the end, this matches any name that starts with b and contains a second b.
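The two patterns can be compared on a fixed list of names instead of real ls output. This is a minimal sketch: the names are made up for illustration, and the patterns are quoted so that the shell cannot expand the brackets as a file glob before grep sees them.

```shell
# Hypothetical file names, one per line, standing in for the output of ls.
names='bb
bzzzzzb
bzzzzzbzzzzzzz
zzzzbb
bzzzzzxzzzzzz
bbb'

# First pattern: a non-b character is required after the second b.
first=$(printf '%s\n' "$names" | grep '^b[^b]*b[^b]')

# Second pattern: nothing is required after the second b.
second=$(printf '%s\n' "$names" | grep '^b[^b]*b[^b]*')

printf 'first pattern:\n%s\n' "$first"
printf 'second pattern:\n%s\n' "$second"
```

With this list, the first pattern keeps only bzzzzzbzzzzzzz, while the second keeps bb, bzzzzzb, bzzzzzbzzzzzzz and bbb.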

Related

Removing empty lines and duplicate lines from text file

I recently used the awk command to remove duplicate lines, and spaces between lines but I am not getting the desired output file.
Input file:
a b
a b
c d
c d
e f
e f
Desired output (I want to remove duplicate lines and all blank lines between them):
a b
c d
e f
I used the following code:
awk '!x[$0]++' input file > output file
And got this output:
a b
c d
e f
The space between the first line and all the rest is still in the output file.
Help please and thank you.
awk 'NF && !seen[$0]++' inputfile.txt > outputfile.txt
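NF is zero for empty lines and for lines that contain only spaces or tabs, so the NF condition drops them.
!seen[$0]++ is true only the first time a given line appears, so it removes duplicates while keeping the original order.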
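A small, self-contained run of that one-liner may help. The input below is invented for illustration (it is not the asker's file) and is deliberately unsorted, to show that the first occurrence of each line keeps its position.

```shell
# Hypothetical input: duplicates, an empty line and a spaces-only line.
deduped=$(printf '%s\n' 'c d' '' 'a b' '  ' 'c d' 'e f' 'a b' |
    awk 'NF && !seen[$0]++')   # NF drops blank lines, seen[] drops repeats

printf '%s\n' "$deduped"
```

This prints c d, a b and e f, in that order.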
If the original line order of the input is important, then the following will not work for you. If you don't care about the order, then read on.
For me, awk is not the best tool for this problem.
Since you are trying to use awk, I assume you are in a unix-like environment, so:
When I hear "eliminate blank lines" I think "grep".
When I hear "eliminate duplicate lines" I think "uniq" (which only removes adjacent duplicates, so it normally needs sorted input, though not in your example since it is already sorted).
So, given a file 'in.txt' that duplicates your example, the following produces the desired output.
grep -v "^[[:space:]]*$" in.txt | uniq
Now, if your real data is not sorted, that won't work. Instead use:
grep -v "^[[:space:]]*$" in.txt | sort -u
Your output may be in a different order than the input in this case.
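As a rough check of the unsorted case, the pipeline can be fed a few made-up lines. The input is hypothetical, and LC_ALL=C is set only so that the sort order is predictable in this sketch.

```shell
# Hypothetical unsorted input with an empty line and a spaces-only line.
unique=$(printf '%s\n' 'c d' '' 'a b' '  ' 'c d' 'e f' 'a b' |
    grep -v '^[[:space:]]*$' |   # drop empty and whitespace-only lines
    LC_ALL=C sort -u)            # sort and keep one copy of each line

printf '%s\n' "$unique"
```

The result is a b, c d, e f: duplicates and blank lines are gone, but the lines come out sorted rather than in input order.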
cat test
a b
a b
c d
c d
e f
e f
awk '$0 !~ /^[[:space:]]*$/' test
a b
a b
c d
c d
e f
e f

Regular Expression To exclude sub-string name(job corps) Includes at least 1 upper case letter, 1 lower case letter, 1 number and 1 symbol except "#"

Regular Expression To exclude sub-string name(job corps)
Includes at least 1 upper case letter, 1 lower case letter, 1 number and 1 symbol except "#"
I have written something like below :
^((?!job corps).)(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[!#$%^&*]).*$
I tested the regular expression above, but it does not work for the special-character requirement.
Can anyone give me some guidance on this?
If I understand your requirements correctly, you can use this pattern:
^(?![^a-z]*$|[^A-Z]*$|[^0-9]*$|[^!#$%^&*]*$|.*?job corps)[^#]*$
If you only want to allow characters from [a-zA-Z0-9^#$%&*], change the pattern to:
^(?![^a-z]*$|[^A-Z]*$|[^0-9]*$|[^!#$%^&*]*$|.*?job corps)[a-zA-Z0-9^#$%&*]*$
details:
^ # start of the string
(?! # not followed by any of these cases
[^a-z]*$ # non lowercase letters until the end
|
[^A-Z]*$ # non uppercase letters until the end
|
[^0-9]*$ # non-digits until the end
|
[^!#$%^&*]*$ # none of the listed symbols until the end
|
.*?job corps # any characters and "job corps"
)
[^#]* # characters that are not a #
$ # end of the string
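One way to try the first pattern from the command line is sketched below. It assumes a grep with Perl-compatible regex support (the -P option of GNU grep), since lookaheads are not part of basic or extended grep syntax. The candidate strings are invented for illustration.

```shell
# Pattern copied from the answer; single quotes keep $ and ! away from the shell.
pattern='^(?![^a-z]*$|[^A-Z]*$|[^0-9]*$|[^!#$%^&*]*$|.*?job corps)[^#]*$'

# Hypothetical candidates, one per line:
#   valid, no upper case, no symbol, contains #, contains "job corps".
candidates='Passw0rd!
passw0rd!
Passw0rd
Pass#w0rd!
my job corps Passw0rd!'

# Assumes GNU grep built with PCRE support (-P).
accepted=$(printf '%s\n' "$candidates" | grep -P "$pattern")
printf '%s\n' "$accepted"
```

Only Passw0rd! passes: each of the other lines trips one branch of the negative lookahead or the final [^#]* check.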
Note: you can write the range #$%& as #-& to save a character.
stribizhev, your answer is correct
^(?!.*job corps)(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[!#$%^&*])(?!.*#).*$
You can verify the expression at the following URL:
http://www.freeformatter.com/regex-tester.html

Extract data before and after matching (BIG FILE )

I have a big file (around 80K lines).
My main goal is to find a pattern and print, for example, the 10 lines before and the 10 lines after each match.
The pattern occurs multiple times across the file.
Using the grep command:
grep -i <my_pattern>* -B 10 -A 10 <my_file>
I get only some of the data. I think it must be something related to the buffer size.
I need a command (grep, sed, awk) that will handle all the matches
and print the 10 lines before and after each one.
Example:
My pattern hides here:
a
b
c
pattern_234
c
b
a
a
b
c
pattern_567
c
b
a
This happens multiple times across the file.
Running this command:
grep -i pattern_* -B 3 -A 3 <my_file>
gives the right output:
a
b
c
c
b
a
a
b
c
c
b
It works, but not every time:
if I have 80 matches, not all 80 are shown.
awk to the rescue
awk -v n=4 '                     # n is the number of context lines
{
    for (i = 1; i <= n; i++)     # keep the previous n lines in p[1..n]
        p[i] = p[i+1]
    p[n+1] = $0                  # p[n+1] is the current line
}
/pattern/ {                      # if the pattern matched
    c = n + 1                    # print this line and the n lines after it
    for (i = 1; i <= n; i++)     # print the previously saved lines
        print p[i]
}
c-- > 0' my_file                 # print while the counter is positive
This prints 4 lines before and 4 lines after each match of the pattern; change the value of n as needed.
If the before and after counts differ, you need two variables:
awk -vb=2 -va=3 '{for(i=1;i<=b;i++) p[i]=p[i+1];p[b+1]=$0} /pattern/{c=a+1;for(i=1;i<=b;i++) print p[i]} c-->0'
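The two-variable form can be checked on a few lines of made-up input. This is only a sketch: the sample text and the choice of 2 lines before and 1 line after are assumptions for illustration.

```shell
# Hypothetical sample with one match in the middle.
sample='one
two
three
pattern_234
five
six
seven'

# b lines of context before each match, a lines after it.
context=$(printf '%s\n' "$sample" | awk -v b=2 -v a=1 '
    { for (i = 1; i <= b; i++) p[i] = p[i+1]; p[b+1] = $0 }
    /pattern/ { c = a + 1; for (i = 1; i <= b; i++) print p[i] }
    c-- > 0')

printf '%s\n' "$context"
```

This prints two, three, pattern_234 and five. Note that when a match falls within the first b lines of the input, the saved slots are still empty, so the script prints empty lines for them.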

Get the first letter of a make variable

Is there a better way to get the first character of a GNU make variable than
FIRST=$(shell echo $(VARIABLE) | head -c 1)
(which is not only unwieldy but also calls the external shell)?
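This is pretty horrible, but at least it doesn't invoke the shell:
# REMAINDER is the variable minus its first character
$(eval REMAINDER := $$$(VAR))
# FIRST is the variable minus REMAINDER (no trailing comment here, or the spaces before it would end up in the value)
FIRST := $(subst $(REMAINDER),,$(VAR))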
The GNU Make Standard Library provides a substr function
substr
Arguments: 1: A string
2: Start offset (first character is 1)
3: Ending offset (inclusive)
Returns: Returns a substring
I haven't tested it, but $(call substr,$(VARIABLE),1,1) should work
I came across this in my own search and didn't find what I was looking for, so here is what I ended up using to parse a hex number. It can be applied to any known set of characters:
letters := 0 1 2 3 4 5 6 7 8 9 a b c d e f
nextletter = $(strip $(foreach v,$(letters),$(word 2,$(filter $(1)$(v)%,$(2)) $v)))
then
INPUT := 40b3
firstletter := $(call nextletter,,$(INPUT))
secondletter := $(call nextletter,$(firstletter),$(INPUT))
thirdletter := $(call nextletter,$(firstletter)$(secondletter),$(INPUT))
etc.
It's ugly but it's shell agnostic

unix grep command

I have a text file named "file1" containing the following data :
apple
appLe
app^e
app\^e
Now the commands given are :
1.)grep app[\^lL]e file1
2.)grep "app[\^lL]e" file1
3.)grep "app[l\^L]e" file1
4.)grep app[l\^L]e file1
output in 1st case : app^e
output in 2nd case :
apple
appLe
app^e
output in 3rd case :
apple
appLe
app^e
output in 4th case :
apple
appLe
app^e
why so..?
Please help..!
1.)grep app[\^lL]e file1
The pattern is unquoted, so the shell removes the backslash before grep sees it, and grep receives app[^lL]e. Because ^ is now the first character inside the brackets, it negates the set, which matches any character other than l or L.
2.)grep "app[\^lL]e" file1
This time the quotes keep the backslash, so ^ is no longer the first character in the brackets and is taken literally. Inside a bracket expression the backslash is an ordinary character too, so the set matches \, ^, l or L.
3.)grep "app[l\^L]e" file1
^ negates the set only if it is the first character, so this matches l, \, ^ or L.
4.)grep app[l\^L]e file1
The shell removes the backslash, but since ^ is not the first character in the brackets it makes no difference: the set matches l, ^ or L.
In the first case, grep app[\^lL]e file1, the pattern is not quoted on the command line, so the shell processes it first and removes the backslash. The search pattern effectively becomes
app[^lL]e
and means: "app", then any character other than "l" or "L", then "e". The only line that fits is
app^e
In the other cases, ^ is not the first character inside the brackets when grep sees the pattern, so it is matched literally.
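The four cases can be reproduced in a scratch directory. The sketch below assumes a POSIX-style shell (sh or bash) and that mktemp -d is available. The scratch directory matters because an unquoted pattern is also a file glob, and it is passed to grep unchanged only when no file name matches it.

```shell
# Recreate file1 from the question in an otherwise empty directory.
workdir=$(mktemp -d)
startdir=$PWD
cd "$workdir" || exit 1
printf '%s\n' 'apple' 'appLe' 'app^e' 'app\^e' > file1

# What grep actually receives as its pattern argument in cases 1 and 2.
arg1=$(printf '%s' app[\^lL]e)     # unquoted: the shell strips the backslash
arg2=$(printf '%s' "app[\^lL]e")   # quoted: the backslash is kept

out1=$(grep app[\^lL]e file1)      # negated set: anything but l or L
out2=$(grep "app[\^lL]e" file1)    # set of \, ^, l and L
out3=$(grep "app[l\^L]e" file1)    # same set, different order
out4=$(grep app[l\^L]e file1)      # set of l, ^ and L (^ is not first)

printf 'case 1 pattern: %s\ncase 2 pattern: %s\n' "$arg1" "$arg2"
printf 'case 1 output:\n%s\ncase 2 output:\n%s\n' "$out1" "$out2"

cd "$startdir" && rm -rf "$workdir"
```

Case 1 reports only app^e; cases 2 to 4 report apple, appLe and app^e. The line app\^e never matches, because the brackets stand for exactly one character and that line has two characters between app and e.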

Resources