grep not giving expected result - unix

I am having trouble understanding following grep operation
a=jQuery.Uno
echo $a | grep -i "jquerya*"
why is above query returning jQuery.Uno?

The * quantifier matches 0 (zero) or more.
In the string, jQuery.Uno there is 0 a after y. As such, the regex jquerya* matches the string.
If you wanted one or more of a, then instead say:
grep -i "jquerya\{1,\}"
or, if your version of grep supports extended regular expressions:
grep -iE "jquerya+"
Moreover, instead of echo "$var" | grep ..., it is better to make use of herestrings if your shell supports those:
grep -iE "jquerya+" <<< "$a"

Related

How to grep the output from first grep of the date?

I would like to search from the first output of grep. First grep I search all the files from yesterday that has the word "abc". I would like to also find if it has "def" from first search.
This is what I have
find /tttt/aaaa/bbb -type f -mtime -1 -mtime 0 -print0| xargs -0 grep -l "abc"
I would like to find the output that also contains "def".
It sounds like you are wanting to search for multiple strings in a list. This can be accomplished with the -E switch on grep. This allows you to provide search strings separated with a pipe.
grep -E 'abc|def'
Another option is to use a backslash on the pipe:
grep 'abc\|def'
EDIT: Oops. I thought you wanted an OR, but you want an AND.
You just need to add another pipe to grep at the end of your one-liner:
| xargs grep -l 'def'

Divide the result of two grep and word count

I have a log file and I would like to divide the result of one grep and count by another grep and count.
$ echo $((cat log2.txt | grep timed\|error\|Error | wc -l)/(cat log2.txt | grep Duration | wc -l))
zsh: bad math expression: operator expected at `log2.txt |...'
It's ugly, doesn't work and I can probably do it in a better way but I don't know how.
Also I would like to know if it possible to id incrementaly on a log stream read by tail for example.
First of all, you should know that, both grep|wc -l will count number of matched lines instead of occurrences, I hope this is what you really want.
Regarding your requirement, indeed, your approach is ugly (7 processes), apart from the mistakes. The job can be done by a single awk line:
awk '/timed|[Ee]rror/{a++}/Duration/{b++}END{printf "%.2f\n",a/b}' log2.txt
The above line calculates the result based on matched number of lines, same as your grep|wc -l.
You have several problems:
You are trying to run shell commands directly inside an arithmetic expression.
You aren't passing the correct regular expression to grep.
You need to make sure at least one of the operands is a floating-point value to trigger zsh's floating-point division.
Each pipeline can also be reduced to a single command; use input redirection instead of cat, and use the -c option to get the number of lines that match the regular expression.
echo $(( 1.0 * $(grep -c 'timed\|error\|Error' log2.txt) / $(grep -c Duration log2.txt))
Basic regular expressions treat unescaped | as a literal character, not an alteration operator.
$ echo foo | grep foo\|bar
$ echo foo | grep foo\\\|bar # Pass a literal backslash as part of the regex
foo
$ echo foo | grep 'foo\|bar' # Use '...' instead of explicitly escaping \ and |
foo
$ echo foo | grep -E 'foo|bar' # Use extended regular expressions instead

unix awk repeated pattern

I have the following variable. i want to search with pattern "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/"
export str='16/02/02 11:29:22 INFO mortbay.log: State being saved: {"#class":"com.paypal.fpti.hadoop.copy.FPTICopyState","timestamp":0,"state":"Running","name":"com.paypal.fpti.hadoop.copy.FPTICopyState","id":"99c7cba7-d211-4845-97a1-c34168a91b22","subStates":{"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-4_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"99034acb-cfad-41a0-89ed-e2731b1f82ec","subStates":null,"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-4","sourceDir":"/fpti/v2/hdfs_writer_4//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"40325dec-0fe2-4025-8258-f896f957ddf0","subStates":null,"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data","sourceDir":"/fpti/v2/hdfs_writer//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-1_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"5216f8c1-2cfa-4eac-a390-f4d2bcd6584f","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-1","sourceDir":"/fpti/v2/hdfs_writer_1//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-2_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"5fcd0b6e-3df9-4f82-a76f-bc8ff1493623","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-2","sourceDir":"/fpti/v2/hdfs_writer_2//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-3_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"6ec9223a-fcf0-447a-b9ae-2020e3232f6d","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-3","sourceDir":"/fpti/v2/hdfs_writer_3//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-5_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"d123742c-8a55-4e25-bfa0-0a97f6ed25d7","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-5","sourceDir":"/fpti/v2/hdfs_writer_5//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//"}},"copystate":"CopyToLocalDone","start":"2016-02-02T11:21:24.678Z","end":null,"window":"2016-02-02T10:00:00.000Z","retryCount":0}'
I tried like below it gives the first occurence alone
[ggangadharan#phxbastion2 ~]$ echo $str | awk '{match($0, "/x/home[/,a-z,0-9,_]+*", a)}END{print a[0]}'
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
but i want output like below.
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//
Can somebody help me how to use awk for this scenario?
thanks in advance
I'm not sure how to hack this in awk, but you can safely use egrep here:
$ echo $str | egrep -o /x/home[/,a-z,0-9,_]+*
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//
Using "significant splitting" in AWK:
$ awk -v RS="\"" '/\/x\/home\/pp_dt_fpti_batch\/stampy_copy_orchestration\//' <<< "$str"
which gives
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//
You specified /x/home/pp_dt_fpti_batch/stampy_copy_orchestration/ for your search pattern, so I used that. If you want something different, use something different.
This separates input into records by a quote " (set RS to ", escaped in the shell). Any record matching the regular expression is printed. Input is given from the shell with the string $str. Maybe this is more readable:
$ awk -v RS='"' '/regexp/' <<< "$str"
Here are two approaches using a JSON-aware command-line tool, here jq.
In both cases we assume that the string of interest is embedded in the
JSON object contained in $str
(1) In the following, we simply pretty-print the JSON object and grep for
the string of interest in case it appears in a surprising spot. Further trimming of the result can easily be done (e.g. using sed) as desired:
$ sed 's/^[^{]*//' <<< "$str" | jq '.[]' | fgrep /x/home/pp_dt_fpti_batch/stampy_copy_orchestration/
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//"
(2) The following query is appropriate if we are only interested in a
match if it occurs in an object as a value corresponding to the key "localDir":
sed 's/^[^{]*//' <<< "$str" |
jq -r '..
| select(.localDir?)
| .localDir
| select(test("/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/"))'
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//

How to grep for the whole word

I am using the following command to grep stuff in subdirs
find . | xargs grep -s 's:text'
However, this also finds stuff like <s:textfield name="sdfsf"...../>
What can I do to avoid that so it just finds stuff like <s:text name="sdfsdf"/>
OR for that matter....also finds <s:text somethingElse="lkjkj" name="lkkj"
basically s:text and name should be on same line....
You want the -w option to specify that it's the end of a word.
find . | xargs grep -sw 's:text'
Use \b to match on "word boundaries", which will make your search match on whole words only.
So your grep would look something like
grep -r "\bSTRING\b"
adding color and line numbers might help too
grep --color -rn "\bSTRING\b"
From http://www.regular-expressions.info/wordboundaries.html:
There are three different positions that qualify as word boundaries:
Before the first character in the string, if the first character is a
word character.
After the last character in the string, if the last
character is a word character.
Between two characters in the string,
where one is a word character and the other is not a word character.
You can drop the xargs command by making grep search recursively. And you normally don't need the 's' flag. Hence:
grep -wr 's:text'
you could try rg, https://github.com/BurntSushi/ripgrep :
rg -w 's:text' .
should do it
Use -w option for whole word match. Sample given below:
[binita#ubuntu ~]# a="abcd efg"
[binita#ubuntu ~]# echo $a
abcd efg
[binita#ubuntu ~]# echo $a | grep ab
abcd efg
[binita#ubuntu ~]# echo $a | grep -w ab
[binita#ubuntu ~]# echo $a | grep -w abcd
abcd efg
This is another way of setting the boundaries of the word, note that it doesn't work without the quotes around it:
grep -r '\<s:text\>' .
If you just want to filter out the remainder text part, you can do this.
xargs grep -s 's:text '
This should find only s:text instances with a space after the last t. If you need to find s:text instances that only have a name element, either pipe your results to another grep expression, or use regex to filter only the elements you need.

output of one command is argument of another

Is there any way to fit in 1 line using the pipes the following:
output of
sha1sum $(xpi) | grep -Eow '^[^ ]+'
goes instead of 456
sed 's/#version#/456/' input.txt > output.txt
Um, I think you can nest $(command arg arg) occurances, so if you really need just one line, try
sed "s/#version#/$(sha1sum $(xpi) | grep -Eow '^[^ ]+')/" input.txt \
> output.txt
But I like Trey's solution putting it one two lines; it's less confusing.
This is not possible using pipes. Command nesting works though:
sed 's/#version#/'$(sha1sum $(xpi) | grep -Eow '^[^ ]+')'/' input.txt > output.txt
Also note that if the results of the nested command contain the / character you will need to use a different character as delimiter (#, |, $, and _ are popular ones) or somehow escape the forward slashes in your string. This StackOverflow question also has a solution to the escaping problem. The problem can be solved by piping the command to sed and replacing all forward slashes (for escape characters) and backslashes (to avoid conflicts with using / as the outer sed delimiter).
The following regular expression will escape all \ characters and all / characters in the command:
sha1sum $(xpi) | grep -Eow '^[^ ]+' | sed -e 's/\(\/\|\\\|&\)/\\&/g'
Nesting this as we did above we get this solution which should properly escape slashes where needed:
sed 's/#version#/'$(sha1sum $(xpi) | grep -Eow '^[^ ]+' | sed -e 's/\(\/\|\\\|&\)/\\&/g')'/' input.txt > output.txt
Personally I think that looks like a mess as one line, but it works.

Resources