What does this grep command do? - unix

I'm trying to understand this Unix command, but I'm not an expert. Could someone explain it in more detail?
grep '^.\{167\}02'
What does it do?

It finds lines that start (^) with any (.) 167 characters (\{167\}) followed by 02. In other words, it matches lines whose 168th and 169th characters are 0 and 2.
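To see the column position at work, here is a small sketch that builds two test lines with printf (the variable names line_ok and line_bad are just for illustration). Only the line whose 168th and 169th characters are 0 and 2 is counted, and cut -c168-169 pulls out exactly those two characters:

```shell
# Build two test lines: 167 spaces followed by "02", and 168 spaces followed by "02".
line_ok=$(printf '%167s02' '')
line_bad=$(printf '%168s02' '')

# Count the lines matching the pattern; only line_ok qualifies.
printf '%s\n' "$line_ok" "$line_bad" | grep -c '^.\{167\}02'    # prints 1

# The same two characters, extracted by position with cut.
printf '%s\n' "$line_ok" | cut -c168-169                         # prints 02
```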

From the man page (man grep)
grep searches the named input FILEs (or standard input if no files are named, or if a single hyphen-minus (-) is given as file name) for lines containing a match to the given PATTERN. By default, grep prints the matching lines.
Note the part about standard input: if you don't name any files to search, grep reads standard input, which in an interactive terminal means it waits for your keyboard input and tests each line you type against the regex.
If you want to test it, I suggest a simpler regex with a smaller count, such as ^.\{3\}02, and see what happens:
$ grep '^.\{3\}02'
02
002
0002
00002   <-- this line matches, so grep prints it back (highlighted if colour is enabled)
00002
You don't normally type lines into grep yourself to see whether they match; instead you give it files as arguments, or pipe in the output of another command:
ls -la | grep '^.\{167\}02'
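If you'd rather not type the lines by hand, the same experiment can be run non-interactively. This sketch pipes the four test lines in with printf, then does the same thing with a temporary file passed as an argument (assuming mktemp is available on your system):

```shell
# Feed the test lines to grep on standard input; only 00002 is printed.
printf '%s\n' 02 002 0002 00002 | grep '^.\{3\}02'

# The same lines in a temporary file, given to grep as an argument.
tmp=$(mktemp)
printf '%s\n' 02 002 0002 00002 > "$tmp"
grep '^.\{3\}02' "$tmp"
rm -f "$tmp"
```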

Related

Unix Text Processing - how to remove part of a file name from the results?

I'm searching through text files using grep and sed commands and I also want the file names displayed before my results. However, I'm trying to remove part of the file name when it is displayed.
The file names are formatted like this: aja_EPL_1999_03_01.txt
I want to have only the date without the beginning letters and without the .txt extension.
I've been searching for an answer, and it seems possible to do that with sed or grep using a lookbehind and a lookahead like this, to extract the part between _ and .txt:
(?<=_)\d+(?=\.)
But I must be doing something wrong, because it hasn't worked for me. I probably also need to change it so that it extracts the whole date, not just the first number. Thanks in advance.
Edit: I'm also adding the working command I've used, just in case. I imagine whatever command is needed would have to go at the beginning?
sed '/^$/d' *.txt | grep -P '(^([A-ZÖÄÜÕŠŽ].*)?[Pp][Aa][Ll]{2}.*[^\.]$)' *.txt --colour -A 1
The results look like this:
aja_EPL_1999_03_02.txt:PALLILENNUD : korraga üritavad ümbermaailmalendu kaks meeskonda
A desired output would be this:
1999_03_02:PALLILENNUD : korraga üritavad ümbermaailmalendu kaks meeskonda
First off, you might want to think about your regular expression. While you say the one you have works, I wonder if it could be simplified. You told us:
(^([A-ZÖÄÜÕŠŽ].*)?[Pp][Aa][Ll]{2}.*[^\.]$)
It looks to me as if this is intended to match lines containing a case-insensitive "PALL", either at the very start of the line or somewhere after an initial capital letter (including Ö, Ä, Ü, Õ, Š and Ž), and that lines must not end in a dot. (With -P, the \. inside the brackets is simply an escaped dot.) So valid lines might be any of:
PALLILENNUD : korraga üritavad etc etc
Õlu on kena. Do I have appalling speling?
Peeter Pall is a limnologist at EMU!
If you'd care to narrow down this description a little and perhaps provide some examples of lines that should be matched or skipped, we may be able to do better. For instance, your outer parentheses are probably unnecessary.
Now, let's clarify what your pipe isn't doing.
sed '/^$/d' *.txt
This reads all your .txt files as an input stream, deletes any empty lines, and prints the output to stdout.
grep -P 'regex' *.txt --otheroptions
This reads all your .txt files, and prints any lines that match regex. It does not read stdin.
So in the command line you're using right now, your sed command is effectively ignored: because you give grep *.txt as arguments, it reads those files and never reads sed's output from the pipe. You COULD instruct grep to read from both files and stdin:
$ echo "hello" > x.txt
$ echo "world" | grep "o" x.txt -
x.txt:hello
(standard input):world
But that's not what you're doing.
By default, when grep reads from multiple files, it precedes each match with the name of the file the match came from. That's also what you're seeing in my example above: two inputs, x.txt and - (a.k.a. stdin, which grep labels "(standard input)"), each followed by a colon and the match it supplied.
While grep does include a minimal capability for filtering (with -o, or with \K in GNU grep's optional Perl-compatible regular expressions), it does NOT provide any options for formatting the filename, other than -h, which suppresses it entirely. Since you can't change how grep formats its output, you're limited to either parsing the output you've got or using some other tool.
Parsing is easy, if your filenames are predictably structured as they seem to be from the two examples you've provided.
For this, we can ignore that these lines contain a file name and data. For the purpose of the filter, they are a stream that follows a pattern. It looks like you want to strip off all characters from the beginning of each line up to, but not including, the first digit, and drop the .txt extension before the separator (: for a match, - for an -A context line). You can do this by piping through sed:
sed 's/^[^0-9]*//; s/\.txt\([-:]\)/\1/'
Or you can strip just the leading part using grep's minimal filtering, returning every match from the first digit onward (this keeps the .txt extension):
grep -o '[0-9].*'
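Since I don't have your files, here's a quick way to check the sed filter on a couple of made-up lines shaped like grep's output: a match line uses : after the file name, and an -A 1 context line uses - instead.

```shell
# Two made-up lines in the format grep -A 1 prints when searching several files.
printf '%s\n' \
    'aja_EPL_1999_03_02.txt:PALLILENNUD : korraga' \
    'aja_EPL_1999_03_02.txt-the following line' |
sed 's/^[^0-9]*//; s/\.txt\([-:]\)/\1/'
# prints:
# 1999_03_02:PALLILENNUD : korraga
# 1999_03_02-the following line
```

Note that the "--" separator lines grep prints between context groups contain no digits, so this sed turns them into empty lines.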
If this kind of pipe-fitting is not to your liking, you may want to replace your entire grep with something in awk that combines functionality:
$ awk '
/\.$/ {next}                               # skip lines ending in a dot
/^([A-ZÖÄÜÕŠŽ].*)?[Pp][Aa][Ll][Ll]/ {      # lines to match
  f=FILENAME
  sub(/^[^0-9]*/,"",f)                     # strip the leading "aja_EPL_", like sed
  sub(/\.txt$/,"",f)                       # strip the ".txt" extension
  printf "%s:%s\n", f, $0
  if ((getline) > 0)                       # simulate the "-A 1" from grep
    printf "%s:%s\n", f, $0
}' *.txt
Note that I haven't tested this, because I don't have your data to work with.
Also, awk doesn't include any of the fancy terminal-dependent colourization that GNU grep provides through the --colour option.
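Another option, not from the answer above, is to skip parsing altogether: loop over the files yourself, work out the date from each file name with shell parameter expansion, and prefix grep's output with it. This is a sketch assuming names shaped like aja_EPL_1999_03_02.txt; the function name label_matches is hypothetical, and it takes an extended regex (grep -E) rather than your -P pattern:

```shell
# Prefix every matching line with the date taken from its file name.
# Assumes names like aja_EPL_1999_03_02.txt (two underscore-separated
# prefixes, then the date, then .txt). label_matches is a hypothetical name.
label_matches() {
    pattern=$1
    shift
    for f in "$@"; do
        d=${f##*/}    # drop any leading directory
        d=${d#*_*_}   # drop the "aja_EPL_" prefix (up to the second underscore)
        d=${d%.txt}   # drop the ".txt" extension
        grep -E -e "$pattern" -- "$f" | while IFS= read -r line; do
            printf '%s:%s\n' "$d" "$line"
        done
    done
}

# Usage, with a simplified pattern:
# label_matches '[Pp][Aa][Ll]{2}' *.txt
```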

grep matches between two files and convert to lower case

I need a fast and efficient approach to the following problem (I am working with many files). For example:
I have two files: file2
Hello
Goodbye
Salut
Bonjour
and file1
Hello, is it Me you're looking for?
I would like to find any word in file2 that also exists in file1, and then convert that word to lower case.
I can grep the words in a file by doing:
grep -f file2.txt file1.txt
and it matches
Hello
So now I want to convert to
hello
so that the final output is
hello, is it Me you're looking for?
If I match against multiple files:
grep -f file2.txt *_infile.txt
I'd like each file's output stored in its own separate outfile.
I know I can convert to lower case using something like tr, but I only know how to do this on every instance of an uppercase letter. I only want to convert words common between two files from uppercase to lowercase.
Thanks.
I would solve the problem a bit differently.
First, I would mark the matches with grep. --color=always works well, although it's somewhat cumbersome, and the escape sequences it emits can vary (for example with the GREP_COLORS setting), so detecting them is fragile. Then I would change the marked matches with sed or perl:
grep --color=always -F -f file2.txt file1.txt | \
perl -p -e 's/\x1b.*?\[K(.*?)\x1b.*?\[K/\L$1/g'
The cryptic regex matches the colouring escape sequence before the match and the de-colouring escape sequence right after it, and captures everything in between into group 1. Then \L converts the capture to lower case. GNU sed can do the same with its \L extension, but perl is probably more portable.
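If you'd rather not depend on grep's colour escape sequences, awk can do the lowercasing directly with its built-in tolower(). This is a sketch, not the answer's method: it assumes file2.txt holds one word per line with no regular-expression metacharacters (awk uses each word as a regex), and, like grep -F without -w, it also changes matches inside longer words. The function name lowercase_listed_words and the _out.txt naming are hypothetical:

```shell
# Lower-case every occurrence of the words listed in $1 within file $2.
# First pass (NR == FNR) collects the non-empty words; second pass rewrites lines.
# Caveat: if $1 is empty, NR == FNR also holds for $2, so nothing is printed.
lowercase_listed_words() {
    awk 'NR == FNR { if (length($0)) words[$0]; next }
         { for (w in words) gsub(w, tolower(w)); print }' "$1" "$2"
}

# Usage, one output file per input (output naming is hypothetical):
# for f in *_infile.txt; do
#     lowercase_listed_words file2.txt "$f" > "${f%.txt}_out.txt"
# done
```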

Delete line containing a specific string starting with dollar sign using unix sed

I am very new to Unix.
I have a parameter file Parameter.prm containing the following lines.
$$ErrorTable1=ErrorTable1
$$Filename1_New=FileNew.txt
$$Filename1_Old=FileOld.txt
$$ErrorTable2=ErrorTable2
$$Filename2_New=FileNew.txt
$$Filename2_Old=FileOld.txt
$$ErrorTable3=ErrorTable3
$$Filename3_New=FileNew.txt
$$Filename3_Old=FileOld.txt
I want to get the output as
$$ErrorTable1=ErrorTable1
$$ErrorTable2=ErrorTable2
$$ErrorTable3=ErrorTable3
Basically, I need to delete the lines starting with $$Filename.
Since $ is a special character, I am not able to match it as a literal string. How can I accomplish this using sed?
With sed:
$ sed '/$$Filename/d' infile
$$ErrorTable1=ErrorTable1
$$ErrorTable2=ErrorTable2
$$ErrorTable3=ErrorTable3
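The /$$Filename/ part is the address, i.e., the command following it is executed for every line matching this regex. The command is d, which deletes the line. Lines that don't match are printed as is. This works because in a basic regular expression $ only acts as an end-of-line anchor at the end of the pattern; elsewhere it is an ordinary character. Writing /^\$\$Filename/d instead is more explicit and also restricts the match to lines that start with $$Filename.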
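One trap worth checking: inside double quotes the shell expands $$ to its own process ID before sed ever sees it, so keep the sed script in single quotes. This sketch runs the escaped, anchored form on a few of the sample lines from the question (the helper name sample_params is hypothetical):

```shell
# Some sample lines from Parameter.prm. The single quotes stop the shell
# from expanding $$ (its own process ID) before printf or sed sees it.
sample_params() {
    printf '%s\n' \
        '$$ErrorTable1=ErrorTable1' \
        '$$Filename1_New=FileNew.txt' \
        '$$Filename1_Old=FileOld.txt' \
        '$$ErrorTable2=ErrorTable2'
}

# Delete lines that start with a literal $$Filename.
sample_params | sed '/^\$\$Filename/d'
# prints the two $$ErrorTable lines
```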
Extracting information from a textfile based on pattern search is a job for grep:
grep ErrorTable file
or even
grep -F '$$ErrorTable' file
-F tells grep to treat the search term as a fixed string instead of a regular expression.
Just to answer your question, if a regular expression needs to search for characters which have a special meaning in the regex language, you need to escape them:
grep '\$\$ErrorTable' file
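To compare the grep forms side by side, here's a sketch on some of the question's sample lines (the helper name sample_lines is hypothetical). All three commands print the same two $$ErrorTable lines; the last one uses -v to invert the match, which is the grep counterpart of the sed d command:

```shell
# Sample lines from the question (single quotes keep $$ literal).
sample_lines() {
    printf '%s\n' '$$ErrorTable1=ErrorTable1' '$$Filename1_New=FileNew.txt' \
        '$$ErrorTable2=ErrorTable2' '$$Filename2_Old=FileOld.txt'
}

sample_lines | grep -F '$$ErrorTable'     # fixed string, no escaping needed
sample_lines | grep '^\$\$ErrorTable'     # regex, with the dollar signs escaped
sample_lines | grep -v -F '$$Filename'    # inverse: drop the $$Filename lines instead
```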

Learning GREP, command does not work as I've been reading

So I've been given an assignment and the question is:
What command would you enter to see 5-letter words that begin with 'd' (upper or lower-case), followed by a lower-case vowel, and ending in 's'?
grep '^[Dd][aeiouy]..[s]' /usr/share/dict/words
^[Dd] Means that the first letter is D or d. Perfect.
[aeiouy] Means that the next letter will be one of those. Perfect.
Two dots means that the next two characters can be anything that they want. Perfect.
And s because it ends in an s. Perfect.
But when I hit enter, I'm getting things like debasements and debases. It's as if part of my pattern is being ignored and grep is matching too many words, and I can't figure out what I've done wrong.
You need to anchor the end of the pattern with $, like this:
grep '^[Dd][aeiouy]..[s]$' /usr/share/dict/words
Otherwise you're matching all words that merely start with [Dd][aeiouy]..s, which is why you get things like "dumpster".
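Since /usr/share/dict/words varies between systems, here's a sketch with a small made-up word list (the helper name words is hypothetical) that shows the difference the $ anchor makes:

```shell
# A few made-up sample words, one per line as in a dictionary file.
words() {
    printf '%s\n' debases debasements dumpster dives Doves dress
}

words | grep '^[Dd][aeiouy]..[s]'     # unanchored: also matches debases, debasements, dumpster
words | grep '^[Dd][aeiouy]..[s]$'    # anchored: prints only dives and Doves
```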
^ and $ anchor the match to the start and end of the line, so unless the line contains ONLY the word you're looking for, you won't find it. That works for the dictionary file, which has one word per line, but not for general text files. For those, use \b on both sides; it matches a word boundary (it isn't part of POSIX regular expressions, but GNU grep supports it).
\b[Dd][aeiouy]..[s]\b
However, grep won't return only these words; it returns the whole line that matches the expression, for example:
~$ grep "\b[Dd][aeiouy]..[s]\b" test
aacd danis daniel danis Dunns daniedanilsanielfk
In this case, use the -o option to print only the matching words, one per line.
~$ grep -o "\b[Dd][aeiouy]..[s]\b" test
danis
danis
Dunns
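Another way to get whole-word matching, assuming GNU grep, is its -w option, which lets grep handle the word boundaries instead of writing \b yourself. This sketch runs it on the example line from above, fed on stdin instead of from the file "test":

```shell
# The example line from above, combined with -o to print only the matches.
printf '%s\n' 'aacd danis daniel danis Dunns daniedanilsanielfk' |
grep -o -w '[Dd][aeiouy]..s'
# prints danis, danis and Dunns, one per line
```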

Another grep advanced

Q1. I want to grep something like that:
grep -Ir --exclude-dir="some*dirs" "my-text" ~/somewhere
but I don't want to see the whole lines containing "my-text"; I only want the list of files.
Q2. I want to see list of files containing "my-text" but not containing "another-text". How to do that?
Sorry, but I couldn't find the answer in man grep, nor on Google.
Q1. You mustn't have googled very hard on that one.
man grep
-l, --files-with-matches
Suppress normal output; instead print the name of each input
file from which output would normally have been printed. The
scanning will stop on the first match.
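Q2. Since the two patterns may be on different lines, you'll need two invocations of grep: one that lists the files containing my-text, and a second with -L, which lists the files that do NOT contain another-text. (grep -vl would instead list files with any line not matching another-text, which is almost every file.) --null and xargs -0 keep file names with spaces intact. Something like:
$ grep -rl --null my-text ~/somewhere | xargs -0 grep -L another-text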
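Here's a sketch you can try in a scratch directory. The function name files_with_but_without is hypothetical, and it assumes GNU grep's --null option and an xargs that supports -0:

```shell
# List files under directory $3 that contain pattern $1 but not pattern $2.
# --null ends each file name with a NUL byte, and xargs -0 splits on it.
# Note: if nothing matches $1, GNU xargs still runs grep -L once with no
# files; GNU xargs -r skips that.
files_with_but_without() {
    grep -rl --null -e "$1" -- "$3" | xargs -0 grep -L -e "$2" --
}

# Usage:
# files_with_but_without my-text another-text ~/somewhere
```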
