Divide the result of two grep and word count

Divide the result of two grep and word count - zsh

I have a log file and I would like to divide the result of one grep and count by another grep and count.
$ echo $((cat log2.txt | grep timed\|error\|Error | wc -l)/(cat log2.txt | grep Duration | wc -l))
zsh: bad math expression: operator expected at `log2.txt |...'
It's ugly, doesn't work and I can probably do it in a better way but I don't know how.
Also I would like to know if it possible to id incrementaly on a log stream read by tail for example.

First of all, you should know that, both grep|wc -l will count number of matched lines instead of occurrences, I hope this is what you really want.
Regarding your requirement, indeed, your approach is ugly (7 processes), apart from the mistakes. The job can be done by a single awk line:
awk '/timed|[Ee]rror/{a++}/Duration/{b++}END{printf "%.2f\n",a/b}' log2.txt
The above line calculates the result based on matched number of lines, same as your grep|wc -l.

You have several problems:
You are trying to run shell commands directly inside an arithmetic expression.
You aren't passing the correct regular expression to grep.
You need to make sure at least one of the operands is a floating-point value to trigger zsh's floating-point division.
Each pipeline can also be reduced to a single command; use input redirection instead of cat, and use the -c option to get the number of lines that match the regular expression.
echo $(( 1.0 * $(grep -c 'timed\|error\|Error' log2.txt) / $(grep -c Duration log2.txt))
Basic regular expressions treat unescaped | as a literal character, not an alteration operator.
$ echo foo | grep foo\|bar
$ echo foo | grep foo\\\|bar # Pass a literal backslash as part of the regex
foo
$ echo foo | grep 'foo\|bar' # Use '...' instead of explicitly escaping \ and |
foo
$ echo foo | grep -E 'foo|bar' # Use extended regular expressions instead

Related

How do I use grep to find words of specified or unspecified length?

In the Unix command line (CentOS7) I have to use the grep command to find all words with:
At least n characters
At most n characters
Exactly n characters
I have searched the posts here for answers and came up with grep -E '^.{8}' /sample/dir but this only gets me the words with at least 8 characters.
Using the $ at the end returns nothing. For example:
grep -E '^.{8}$' /sample/dir
I would also like to trim the info in /sample/dir so that I only see the specific information. I tried using a pipe:
cut -f1,7 -d: | grep -E '^.{8}' /sample/dir
Depending on the order, this only gets me one or the other, not both.
I only want the usernames at the beginning of each line, not all words in each line for the entire file.
For example, if I want to find userids on my system, these should be the results:
1.
tano-ahsoka
skywalker-a
kenobi-obiwan
ahsoka-t
luke-s
leia-s
ahsoka-t
kenobi-o
grievous
I'm looking for two responses here as I have already figured out number 1.
Numbers 2 and 3 are not working for some reason.
If possible, I'd also like to apply the cut for all three outputs.
Any and all help is appreciated, thank you!

You can run one grep for extracting the words, and another for filtering based on length.
grep -oE '(\w|-)+' file | grep -Ee '^.{8,}$'
grep -oE '(\w|-)+' file | grep -Ee '^.{,8}$'
grep -oE '(\w|-)+' file | grep -Ee '^.{8}$'
Update the pattern based on requirements and maybe use -r and specify a directory instead of a file. Adding -h option may also be needed to prevent the filenames from being printed.

Depending on your implementation of grep, it might work to use:
grep -o -E '\<\w{8}\>' # exactly 8
grep -o -E '\<\w{8,}\>' # 8 or more
grep -o -E '\<\w{,8}\>' # 8 or less

unix awk repeated pattern

I have the following variable. i want to search with pattern "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/"
export str='16/02/02 11:29:22 INFO mortbay.log: State being saved: {"#class":"com.paypal.fpti.hadoop.copy.FPTICopyState","timestamp":0,"state":"Running","name":"com.paypal.fpti.hadoop.copy.FPTICopyState","id":"99c7cba7-d211-4845-97a1-c34168a91b22","subStates":{"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-4_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"99034acb-cfad-41a0-89ed-e2731b1f82ec","subStates":null,"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-4","sourceDir":"/fpti/v2/hdfs_writer_4//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"40325dec-0fe2-4025-8258-f896f957ddf0","subStates":null,"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data","sourceDir":"/fpti/v2/hdfs_writer//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-1_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"5216f8c1-2cfa-4eac-a390-f4d2bcd6584f","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-1","sourceDir":"/fpti/v2/hdfs_writer_1//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-2_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"5fcd0b6e-3df9-4f82-a76f-bc8ff1493623","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-2","sourceDir":"/fpti/v2/hdfs_writer_2//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-3_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"6ec9223a-fcf0-447a-b9ae-2020e3232f6d","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-3","sourceDir":"/fpti/v2/hdfs_writer_3//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-5_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"d123742c-8a55-4e25-bfa0-0a97f6ed25d7","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-5","sourceDir":"/fpti/v2/hdfs_writer_5//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//"}},"copystate":"CopyToLocalDone","start":"2016-02-02T11:21:24.678Z","end":null,"window":"2016-02-02T10:00:00.000Z","retryCount":0}'
I tried like below it gives the first occurence alone
[ggangadharan#phxbastion2 ~]$ echo $str | awk '{match($0, "/x/home[/,a-z,0-9,_]+*", a)}END{print a[0]}'
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
but i want output like below.
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//
Can somebody help me how to use awk for this scenario?
thanks in advance

I'm not sure how to hack this in awk, but you can safely use egrep here:
$ echo $str | egrep -o /x/home[/,a-z,0-9,_]+*
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//

Using "significant splitting" in AWK:
$ awk -v RS="\"" '/\/x\/home\/pp_dt_fpti_batch\/stampy_copy_orchestration\//' <<< "$str"
which gives
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//
You specified /x/home/pp_dt_fpti_batch/stampy_copy_orchestration/ for your search pattern, so I used that. If you want something different, use something different.
This separates input into records by a quote " (set RS to ", escaped in the shell). Any record matching the regular expression is printed. Input is given from the shell with the string $str. Maybe this is more readable:
$ awk -v RS='"' '/regexp/' <<< "$str"

Here are two approaches using a JSON-aware command-line tool, here jq.
In both cases we assume that the string of interest is embedded in the
JSON object contained in $str
(1) In the following, we simply pretty-print the JSON object and grep for
the string of interest in case it appears in a surprising spot. Further trimming of the result can easily be done (e.g. using sed) as desired:
$ sed 's/^[^{]*//' <<< "$str" | jq '.[]' | fgrep /x/home/pp_dt_fpti_batch/stampy_copy_orchestration/
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//"
(2) The following query is appropriate if we are only interested in a
match if it occurs in an object as a value corresponding to the key "localDir":
sed 's/^[^{]*//' <<< "$str" |
jq -r '..
| select(.localDir?)
| .localDir
| select(test("/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/"))'
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//

grep not giving expected result

I am having trouble understanding following grep operation
a=jQuery.Uno
echo $a | grep -i "jquerya*"
why is above query returning jQuery.Uno?

The * quantifier matches 0 (zero) or more.
In the string, jQuery.Uno there is 0 a after y. As such, the regex jquerya* matches the string.
If you wanted one or more of a, then instead say:
grep -i "jquerya\{1,\}"
or, if your version of grep supports extended regular expressions:
grep -iE "jquerya+"
Moreover, instead of echo "$var" | grep ..., it is better to make use of herestrings if your shell supports those:
grep -iE "jquerya+" <<< "$a"

How to grep for the whole word

I am using the following command to grep stuff in subdirs
find . | xargs grep -s 's:text'
However, this also finds stuff like <s:textfield name="sdfsf"...../>
What can I do to avoid that so it just finds stuff like <s:text name="sdfsdf"/>
OR for that matter....also finds <s:text somethingElse="lkjkj" name="lkkj"
basically s:text and name should be on same line....

You want the -w option to specify that it's the end of a word.
find . | xargs grep -sw 's:text'

Use \b to match on "word boundaries", which will make your search match on whole words only.
So your grep would look something like
grep -r "\bSTRING\b"
adding color and line numbers might help too
grep --color -rn "\bSTRING\b"
From http://www.regular-expressions.info/wordboundaries.html:
There are three different positions that qualify as word boundaries:
Before the first character in the string, if the first character is a
word character.
After the last character in the string, if the last
character is a word character.
Between two characters in the string,
where one is a word character and the other is not a word character.

You can drop the xargs command by making grep search recursively. And you normally don't need the 's' flag. Hence:
grep -wr 's:text'

you could try rg, https://github.com/BurntSushi/ripgrep :
rg -w 's:text' .
should do it

Use -w option for whole word match. Sample given below:
[binita#ubuntu ~]# a="abcd efg"
[binita#ubuntu ~]# echo $a
abcd efg
[binita#ubuntu ~]# echo $a | grep ab
abcd efg
[binita#ubuntu ~]# echo $a | grep -w ab
[binita#ubuntu ~]# echo $a | grep -w abcd
abcd efg

This is another way of setting the boundaries of the word, note that it doesn't work without the quotes around it:
grep -r '\<s:text\>' .

If you just want to filter out the remainder text part, you can do this.
xargs grep -s 's:text '
This should find only s:text instances with a space after the last t. If you need to find s:text instances that only have a name element, either pipe your results to another grep expression, or use regex to filter only the elements you need.

Generate a random filename in unix shell

I would like to generate a random filename in unix shell (say tcshell). The filename should consist of random 32 hex letters, e.g.:
c7fdfc8f409c548a10a0a89a791417c5
(to which I will add whatever is neccesary). The point is being able to do it only in shell without resorting to a program.

Assuming you are on a linux, the following should work:
cat /dev/urandom | tr -cd 'a-f0-9' | head -c 32
This is only pseudo-random if your system runs low on entropy, but is (on linux) guaranteed to terminate. If you require genuinely random data, cat /dev/random instead of /dev/urandom. This change will make your code block until enough entropy is available to produce truly random output, so it might slow down your code. For most uses, the output of /dev/urandom is sufficiently random.
If you on OS X or another BSD, you need to modify it to the following:
cat /dev/urandom | env LC_CTYPE=C tr -cd 'a-f0-9' | head -c 32

why do not use unix mktemp command:
$ TMPFILE=`mktemp tmp.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX` && echo $TMPFILE
tmp.MnxEsPDsNUjrzDIiPhnWZKmlAXAO8983

One command, no pipe, no loop:
hexdump -n 16 -v -e '/1 "%02X"' -e '/16 "\n"' /dev/urandom
If you don't need the newline, for example when you're using it in a variable:
hexdump -n 16 -v -e '/1 "%02X"' /dev/urandom
Using "16" generates 32 hex digits.

uuidgen generates exactly this, except you have to remove hyphens. So I found this to be the most elegant (at least to me) way of achieving this. It should work on linux and OS X out of the box.
uuidgen | tr -d '-'

As you probably noticed from each of the answers, you generally have to "resort to a program".
However, without using any external executables, in Bash and ksh:
string=''; for i in {0..31}; do string+=$(printf "%x" $(($RANDOM%16)) ); done; echo $string
in zsh:
string=''; for i in {0..31}; do string+=$(printf "%x" $(($RANDOM%16)) ); dummy=$RANDOM; done; echo $string
Change the lower case x in the format string to an upper case X to make the alphabetic hex characters upper case.
Here's another way to do it in Bash but without an explicit loop:
printf -v string '%X' $(printf '%.2s ' $((RANDOM%16))' '{00..31})
In the following, "first" and "second" printf refers to the order in which they're executed rather than the order in which they appear in the line.
This technique uses brace expansion to produce a list of 32 random numbers mod 16 each followed by a space and one of the numbers in the range in braces followed by another space (e.g. 11 00). For each element of that list, the first printf strips off all but the first two characters using its format string (%.2) leaving either single digits followed by a space each or two digits. The space in the format string ensures that there is then at least one space between each output number.
The command substitution containing the first printf is not quoted so that word splitting is performed and each number goes to the second printf as a separate argument. There, the numbers are converted to hex by the %X format string and they are appended to each other without spaces (since there aren't any in the format string) and the result is stored in the variable named string.
When printf receives more arguments than its format string accounts for, the format is applied to each argument in turn until they are all consumed. If there are fewer arguments, the unmatched format string (portion) is ignored, but that doesn't apply in this case.
I tested it in Bash 3.2, 4.4 and 5.0-alpha. But it doesn't work in zsh (5.2) or ksh (93u+) because RANDOM only gets evaluated once in the brace expansion in those shells.
Note that because of using the mod operator on a value that ranges from 0 to 32767 the distribution of digits using the snippets could be skewed (not to mention the fact that the numbers are pseudo random in the first place). However, since we're using mod 16 and 32768 is divisible by 16, that won't be a problem here.
In any case, the correct way to do this is using mktemp as in Oleg Razgulyaev's answer.

Tested in zsh, should work with any BASH compatible shell!
#!/bin/zsh
SUM=`md5sum <<EOF
$RANDOM
EOF`
FN=`echo $SUM | awk '// { print $1 }'`
echo "Your new filename: $FN"
Example:
$ zsh ranhash.sh
Your new filename: 2485938240bf200c26bb356bbbb0fa32
$ zsh ranhash.sh
Your new filename: ad25cb21bea35eba879bf3fc12581cc9

Yet another way[tm].
R=$(echo $RANDOM $RANDOM $RANDOM $RANDOM $RANDOM | md5 | cut -c -8)
FILENAME="abcdef-$R"

This answer is very similar to fmarks, so I cannot really take credit for it, but I found the cat and tr command combinations quite slow, and I found this version quite a bit faster. You need hexdump.
hexdump -e '/1 "%02x"' -n32 < /dev/urandom

Another thing you can add is running the date command as follows:
date +%S%N
Reads nonosecond time and the result adds a lot of randomness.

The first answer is good but why fork cat if not required.
tr -dc 'a-f0-9' < /dev/urandom | head -c32

Grab 16 bytes from /dev/random, convert them to hex, take the first line, remove the address, remove the spaces.
head /dev/random -c16 | od -tx1 -w16 | head -n1 | cut -d' ' -f2- | tr -d ' '
Assuming that "without resorting to a program" means "using only programs that are readily available", of course.

If you have openssl in your system you can use it for generating random hex (also it can be -base64) strings with defined length. I found it pretty simple and usable in cron in one line jobs.
openssl rand -hex 32
8c5a7515837d7f0b19e7e6fa4c448400e70ffec88ecd811a3dce3272947cb452

Hope to add a (maybe) better solution to this topic.
Notice: this only works with bash4 and some implement of mktemp(for example, the GNU one)
Try this
fn=$(mktemp -u -t 'XXXXXX')
echo ${fn/\/tmp\//}
This one is twice as faster as head /dev/urandom | tr -cd 'a-f0-9' | head -c 32, and eight times as faster as cat /dev/urandom | tr -cd 'a-f0-9' | head -c 32.
Benchmark:
With mktemp:
#!/bin/bash
# a.sh
for (( i = 0; i < 1000; i++ ))
do
fn=$(mktemp -u -t 'XXXXXX')
echo ${fn/\/tmp\//} > /dev/null
done
time ./a.sh
./a.sh 0.36s user 1.97s system 99% cpu 2.333 total
And the other:
#!/bin/bash
# b.sh
for (( i = 0; i < 1000; i++ ))
do
cat /dev/urandom | tr -dc 'a-zA-Z0-9' | head -c 32 > /dev/null
done
time ./b.sh
./b.sh 0.52s user 20.61s system 113% cpu 18.653 total

If you are on Linux, then Python will come pre-installed. So you can go for something similar to the below:
python -c "import uuid; print str(uuid.uuid1())"
If you don't like the dashes, then use replace function as shown below
python -c "import uuid; print str(uuid.uuid1()).replace('-','')"

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Divide the result of two grep and word count - zsh

Related

How do I use grep to find words of specified or unspecified length?

unix awk repeated pattern

grep not giving expected result

How to grep for the whole word

Generate a random filename in unix shell

Categories

Resources