grep -w with only space as delimiter - unix

grep -w treats punctuation and whitespace as word delimiters.
How can I make grep use only whitespace as the word delimiter?

If you only care about spaces, use grep " foo " instead of grep -w foo. If you also want to match at the start or end of a line, you can start doing things like grep '\(^\| \)foo\($\| \)' (\| in a basic regular expression is a GNU extension), and to cover tabs as well you're probably better off with perl -ne 'print if /(^|\s)foo(\s|$)/'
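As a rough sketch of the idea above, the same whitespace-only test can be written with grep -E and the [[:space:]] class. The sample lines and variable names are made up for illustration.

```shell
# Made-up sample: "foo" is whitespace-delimited only on the first and last lines.
sample=$(printf 'foo bar\na foo:b\nx-foo y\ntab\tfoo\n')

# grep -w also accepts ':' and '-' as delimiters, so all four lines match.
with_w_count=$(printf '%s\n' "$sample" | grep -cw 'foo')

# Only whitespace or a line edge may surround the token.
ws_only=$(printf '%s\n' "$sample" | grep -E '(^|[[:space:]])foo([[:space:]]|$)')

printf '%s\n' "$ws_only"
```

This prints only the "foo bar" line and the tab-separated line, while grep -w reports all four.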

You cannot change the way grep -w works. However, you can replace punctuation with a word character, say X, using tr or sed, and then run grep -w on the result. Note that the matching lines are then printed in their modified form.
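A minimal sketch of that masking trick, assuming sed with the [[:punct:]] class and made-up sample lines:

```shell
# Made-up sample input; only the first line has "foo" delimited by whitespace.
sample=$(printf 'foo bar\na foo:b\nx-foo y\n')

# Turn every punctuation character into X so it counts as part of a word,
# then let grep -w apply its usual whole-word test.
masked=$(printf '%s\n' "$sample" | sed 's/[[:punct:]]/X/g' | grep -w 'foo')

printf '%s\n' "$masked"   # prints: foo bar
```

This is only safe when X cannot itself produce a false match, for example when the search word does not contain X.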

The --word-regexp flag is useful, but limited. The grep man page says:
-w, --word-regexp
Select only those lines containing matches that form whole
words. The test is that the matching substring must either be
at the beginning of the line, or preceded by a non-word
constituent character. Similarly, it must be either at the end
of the line or followed by a non-word constituent character.
Word-constituent characters are letters, digits, and the
underscore.
If you want to use custom field separators, awk may be a better fit for you. Or you could write an extended regular expression with grep -E (or grep --extended-regexp; the older egrep spelling is deprecated in GNU grep), which gives you more control over your search pattern.
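One way the awk suggestion might look, relying on awk's default field splitting on runs of blanks. The sample lines and the word variable are placeholders.

```shell
# Made-up sample input.
sample=$(printf 'foo bar\na foo:b\nx-foo y\n  lead foo\n')

# Print a line as soon as one of its whitespace-separated fields
# is exactly equal to the word we are looking for.
awk_hits=$(printf '%s\n' "$sample" |
  awk -v word='foo' '{ for (i = 1; i <= NF; i++) if ($i == word) { print; next } }')

printf '%s\n' "$awk_hits"
```

The comparison is a plain string equality, so the word is never interpreted as a regular expression (apart from backslash escapes processed by -v).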

Use tr to replace spaces with newlines, then grep for your string. The contiguous string I needed was being split up by grep -w because it contains colons. Furthermore, I only knew the first part; the second part was the unknown data I needed to pull. The following worked for me:
echo "$your_content" | tr ' ' '\n' | grep 'string'
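For instance, assuming a made-up line in which the token of interest starts with key: (both the contents and the key: prefix are placeholders, not from the original question):

```shell
# Hypothetical input line; the wanted token contains colons.
your_content='ts=12:00 key:abc:xyz status=ok'

# One token per line, then keep the token that starts with the known prefix.
token=$(printf '%s\n' "$your_content" | tr ' ' '\n' | grep '^key:')

printf '%s\n' "$token"   # prints: key:abc:xyz
```

Because each token ends up on its own line, grep prints just the token rather than the whole original line.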

Related

How to get pipe (|) delimiters between FIX tags in a UNIX command for FIX logs?

I am able to get spaces between tags by running:
tail -f filename | tr '\001' ' '
but I would like the tail output to have | delimiters, i.e.
35=D|49=sender|56=recipient
Does anyone know how? Thanks.
Don't you simply want this?
tail -f filename | tr '\001' '|'
(The second argument to tr is now a pipe instead of a space.)
\001 is ASCII character 1, also known as SOH ("start of heading"). FIX uses this character as the field separator, i.e. it follows every "tag=value" element.
The unix tr command simply replaces all instances of the first parameter (\001 above) with the second parameter (|).
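A small self-contained check of the tr approach, using a made-up FIX message built with printf instead of a real log (a live log would come from tail -f filename as above):

```shell
# Build a sample message with SOH (octal 001) after every tag=value pair,
# then translate each SOH into a pipe.
fix_line=$(printf '35=D\00149=sender\00156=recipient\001\n' | tr '\001' '|')

printf '%s\n' "$fix_line"   # prints: 35=D|49=sender|56=recipient|
```

Since SOH follows every field, the output also ends with a trailing pipe.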

how to grep nth string

How can I use the grep shell command to print a specific field from a line that starts with a specific word?
Example:
I want to print the string "myFTPpath/folderName/" from the line starting with searchStr, as in the line below.
searchStr:somestring:myFTPpath/folderName/:somestring
Something like this with awk:
awk -F: '/^searchStr/{print $3}' File
From all the lines starting with searchStr, print the 3rd field (field separator set to :).
Sample:
AMD$ cat File
someStr:somestring:myFTPpath/folderName/:somestring
someStr:somestring:myFTPpath/folderName/:somestring
searchStr:somestring:myFTPpath/folderName/:somestring
someStr:somestring:myFTPpath/folderName/:somestring
AMD$ awk -F: '/^searchStr/{print $3}' File
myFTPpath/folderName/
Remember that grep isn't the only tool that can usefully do searches.
In this particular case, where the lines are naturally broken into fields, awk is probably the best solution, as A.M.D's answer suggests.
For more general case edits, however, remember sed's -n option, which suppresses printing out a line after edits:
sed -n 's/^searchStr:[^:]*:\([^:]*\):.*/\1/p' input-file
The -n suppresses automatic printing of the line, and the trailing /p flag explicitly prints out lines on which there is a substitution.
This matching pattern is fiddly – use awk in this fielded case – but don't forget sed -n.
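A quick side-by-side of the awk and sed approaches on made-up input, with the sed pattern anchored to the start of the line. The variable names are only for illustration.

```shell
# Two sample lines; only the second starts with searchStr.
input=$(printf '%s\n' \
  'someStr:somestring:otherPath/:somestring' \
  'searchStr:somestring:myFTPpath/folderName/:somestring')

# awk: split on ':' and print the third field of matching lines.
via_awk=$(printf '%s\n' "$input" | awk -F: '/^searchStr/ { print $3 }')

# sed: capture the third field and print only lines where the substitution happened.
via_sed=$(printf '%s\n' "$input" | sed -n 's/^searchStr:[^:]*:\([^:]*\):.*/\1/p')

printf '%s\n%s\n' "$via_awk" "$via_sed"
```

Both commands print myFTPpath/folderName/ for this input.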
You could get the desired output with grep itself, but you need to enable the -P and -o options (both available in GNU grep).
$ echo 'searchStr:somestring:myFTPpath/folderName/:somestring' | grep -oP '^searchStr:[^:]*:\K[^:]*'
myFTPpath/folderName/
\K drops everything matched so far from the reported match, so only the text matched by the pattern after \K is printed. Here \K is used in place of a variable-length positive lookbehind assertion.

grep -wF "example1#domain1.org" filename.txt and special characters

Inside filename.txt I've:
example1#domain1.org example1#domain1.org
example2#domain1.org example2#domain1.org
example3#domain1.org newexample3#domain1.org
example4#domain1.org oldexample4#domain1.org
example5#domain1.org otherexample5#domain1.org
I need to search it for an exact match. The result of:
grep -wF "example1#domain1.org" filename.txt
is correct.
My problem is that grep also returns a result if I run:
grep -wF "example1" filename.txt
Maybe the # (a special character) is the problem.
-w, --word-regexp
-F, --fixed-strings
Word characters in grep consist of letters, digits, and underscores. So # ends a word, and example1 followed by # counts as a whole-word match.
http://www.gnu.org/software/grep/manual/html_node/Matching-Control.html#Matching-Control
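One possible workaround, sketched here as an assumption rather than something from the answer above: split the line into whitespace-separated tokens and require a whole-token match with grep -Fx. The sample data mirrors two lines of filename.txt.

```shell
# Two lines in the same shape as filename.txt.
data=$(printf '%s\n' \
  'example1#domain1.org example1#domain1.org' \
  'example3#domain1.org newexample3#domain1.org')

# -w stops at '#', so the bare prefix still matches one line.
loose=$(printf '%s\n' "$data" | grep -cwF 'example1')

# One token per line; -x requires the whole token to equal the fixed string.
strict_prefix=$(printf '%s\n' "$data" | tr -s ' ' '\n' | grep -cFx 'example1' || true)
strict_full=$(printf '%s\n' "$data" | tr -s ' ' '\n' | grep -cFx 'example1#domain1.org')

printf '%s %s %s\n' "$loose" "$strict_prefix" "$strict_full"   # prints: 1 0 2
```

The trade-off is that grep now reports matching tokens rather than the original lines.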

Check if file contains some text (not regex) in Unix

I want to check if a multiline text matches an input. grep comes close, but I couldn't find a way to make it interpret pattern as plain text, not regex.
How can I do this, using only Unix utilities?
Use grep -F:
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified by
POSIX.)
EDIT: Initially I didn't understand the question well enough. If the pattern itself contains newlines, use -z option:
-z, --null-data
Treat the input as a set of lines, each terminated by a zero
byte (the ASCII NUL character) instead of a newline. Like the
-Z or --null option, this option can be used with commands like
sort -z to process arbitrary file names.
I've tested it, multiline patterns worked.
From man grep
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by
newlines, any of which is to be matched. (-F is specified by
POSIX.)
If the input string you are trying to match does not contain a blank line (i.e., it does not have two consecutive newlines), you can do:
awk 'index( $0, "needle\nwith no consecutive newlines" ) { m=1 }
END{ exit !m }' RS= input-file && echo matched
If you need to find a string with consecutive newlines, set RS to some string that is not in the file. (Note that the behavior of awk is unspecified if you set RS to more than one character, but most awk implementations allow it to be a string.) If you are willing to make the sought string a regex, and if your awk supports setting RS to more than one character, you could do:
awk 'END{ exit NR == 1 }' RS='sought regex' input-file && echo matched
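The paragraph-mode index() check can be wrapped in a small helper. This is only a sketch: contains_block is a hypothetical name, and passing the needle through ENVIRON (so awk does not process backslashes in it) is an assumption, not part of the answer above.

```shell
# Hypothetical helper: succeed if stdin contains the literal text in $1.
# RS= puts awk in paragraph mode, so the needle must not contain a blank line.
contains_block() {
  NEEDLE=$1 awk 'index($0, ENVIRON["NEEDLE"]) { m = 1 } END { exit !m }' RS=
}

# Made-up input with two paragraphs.
haystack=$(printf 'alpha\nneedle\nwith no consecutive newlines\nomega\n\nnext paragraph\n')

present=$(printf 'needle\nwith no consecutive newlines')
missing=$(printf 'needle\nomega')

if printf '%s\n' "$haystack" | contains_block "$present"; then r1=matched; else r1=nomatch; fi
if printf '%s\n' "$haystack" | contains_block "$missing"; then r2=matched; else r2=nomatch; fi

printf '%s %s\n' "$r1" "$r2"   # prints: matched nomatch
```

The second needle fails because its two lines are not adjacent in the input, which is the behavior a literal multiline match should have.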

grep for special characters in Unix

I have a log file (application.log) which might contain the following string of normal & special characters on multiple lines:
*^%Q&$*&^#$&*!^#$*&^&^*&^&
I want to search for the line number(s) which contains this special character string.
grep '*^%Q&$*&^#$&*!^#$*&^&^*&^&' application.log
The above command doesn't return any results.
What would be the correct syntax to get the line numbers?
Tell grep to treat your input as fixed string using -F option.
grep -F '*^%Q&$*&^#$&*!^#$*&^&^*&^&' application.log
Add the -n option to get the line numbers:
grep -Fn '*^%Q&$*&^#$&*!^#$*&^&^*&^&' application.log
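To get just the line numbers, the output of grep -Fn can be cut at the first colon. A short sketch with made-up log contents (the variable names are placeholders):

```shell
# The literal string from the question, kept in single quotes so the shell leaves it alone.
needle='*^%Q&$*&^#$&*!^#$*&^&^*&^&'

# Made-up log: the string appears on lines 2 and 4.
log=$(printf '%s\n' 'plain line' "$needle" 'another line' "prefix $needle suffix")

# -F: fixed string, -n: prefix each match with "LINE:"; cut keeps only the number.
line_numbers=$(printf '%s\n' "$log" | grep -Fn -- "$needle" | cut -d: -f1)

printf '%s\n' "$line_numbers"
```

This prints 2 and 4, one per line.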
The one that worked for me is:
grep -e '->'
The -e means that the next argument is the pattern, so it won't be interpreted as an option even though it starts with a dash.
From: http://www.linuxquestions.org/questions/programming-9/how-to-grep-for-string-769460/
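A brief sketch comparing -e with the standard -- end-of-options marker, which should also work here; the sample lines are made up.

```shell
# Made-up input: two of the three lines contain "->".
lines=$(printf '%s\n' 'a -> b' 'no arrow here' 'ptr->field')

# -e marks the next argument as the pattern.
with_e=$(printf '%s\n' "$lines" | grep -c -e '->')

# -- ends option parsing, so '->' is taken as the pattern.
with_dashdash=$(printf '%s\n' "$lines" | grep -c -- '->')

printf '%s %s\n' "$with_e" "$with_dashdash"   # prints: 2 2
```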
A related note
To grep for carriage return, namely the \r character, or 0x0d, we can do this:
grep -F $'\r' application.log
Alternatively, use printf, or echo, for POSIX compatibility
grep -F "$(printf '\r')" application.log
And we can use hexdump, or less to see the result:
$ printf "a\rb" | grep -F $'\r' | hexdump -c
0000000 a \r b \n
Regarding the use of $'\r' and other supported characters, see Bash Manual > ANSI-C Quoting:
Words of the form $'string' are treated specially. The word expands to string, with backslash-escaped characters replaced as specified by the ANSI C standard
grep -n "\*\^\%\Q\&\$\&\^\#\$\&\!\^\#\$\&\^\&\^\&\^\&" test.log
1:*^%Q&$&^#$&!^#$&^&^&^&
8:*^%Q&$&^#$&!^#$&^&^&^&
14:*^%Q&$&^#$&!^#$&^&^&^&
You could instead exclude every line that consists only of alphanumeric characters and spaces; adding -n gives you the line numbers. Try the following:
grep -vn "^[a-zA-Z0-9 ]*$" application.log
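To illustrate what that inverted match reports, here is a small sketch on made-up log lines:

```shell
# Made-up log: line 1 is plain, lines 2 and 3 contain other characters.
log=$(printf '%s\n' 'plain text 123' '*^%Q&$' 'mixed: text!')

# -v drops lines made up only of letters, digits and spaces;
# -n numbers what is left, and cut keeps just the numbers.
odd_lines=$(printf '%s\n' "$log" | grep -vn '^[a-zA-Z0-9 ]*$' | cut -d: -f1)

printf '%s\n' "$odd_lines"
```

This prints 2 and 3. Note that it flags any line containing punctuation, not just the specific string from the question.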
Try vi with the -b option; this will show special end-of-line characters
(I typically use it to see Windows line endings in a text file on a Unix OS).
But if you want a scripted solution, vi obviously won't work, so you can try the -F or -E options with grep and pipe the result into sed or awk.
From grep man page:
Matcher Selection
-E, --extended-regexp
Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified
by POSIX.)
