Multiple keyword grep where all keywords must be found - unix

I'm trying to find a better way to determine which files in a given directory contain all of a given set of search strings. The way I'm currently doing it seems awkward.
For example, if I want to find which files contain "aaa", "bbb", "ccc", and "ddd" I would do:
grep -l "aaa" * > out1
grep -l "bbb" `cat out1` > out2
grep -l "ccc" `cat out2` > out3
grep -l "ddd" `cat out3` > out4
cat out4
rm out1 out2 out3 out4
As you can see, this seems clumsy. Any better ideas?
EDIT: I'm on a Solaris 10 machine

You can use xargs to chain the grep calls together:
grep -l "aaa" * | xargs grep -l "bbb" | xargs grep -l "ccc" | xargs grep -l "ddd"

Something along these lines may help:
for file in * ; do
    matchall=1
    for pattern in aaa bbb ccc ddd ; do
        grep "$pattern" "$file" >/dev/null || { matchall=0; break; }
    done
    if [ "$matchall" -eq 1 ]; then echo "matching all : $file"; fi
done
(you can add patterns by replacing aaa bbb ccc ddd with something like $(cat patternfile))
For those interested: it 1) loops over each file, and 2) for each file, assumes it will match all patterns and loops over the patterns. As soon as a pattern does not appear in the file, the pattern loop is exited, that file's name is not printed, and it moves on to the next file. In other words, it only prints a file that made it through all the patterns without any of them setting matchall to 0.
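If the pattern list grows, a single awk pass can avoid spawning one grep per pattern per file. A minimal sketch with the four fixed patterns (it needs a "new" awk with FNR and FILENAME, so on Solaris 10 use nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk):
nawk '
FNR == 1 {                        # entering a new file
    if (NR > 1 && a && b && c && d) print fname
    fname = FILENAME
    a = b = c = d = 0
}
/aaa/ { a = 1 }
/bbb/ { b = 1 }
/ccc/ { c = 1 }
/ddd/ { d = 1 }
END { if (a && b && c && d) print fname }
' *
Each flag records whether its pattern has appeared in the current file; a filename is printed only when all four are set by the time the file ends.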

Related

Bash delete all folders except one that might be nested

I am trying to work out a way to delete all folders but keep one, even if it is nested.
./release/test-folder
./release/test-folder2
./release/feature/custom-header
./release/feature/footer
If I run something like:
shopt -s extglob
rm -rf release/!(test-folder2)/
or
find ./release -type d -not -regex ".*test-folder2.*" -delete
it works OK, but not in cases where the path to keep is nested, like feature/footer: both command lines match release/feature and it gets deleted.
Can you suggest any other option that would keep the folder, no matter how nested it is?
This is not the best solution, but it works.
# $1 = root path
# $2 = pattern
findex(){
    # create a temp dir
    T=$(mktemp -d)
    # find all dirs inside "$1" and save the list in file "a"
    find "$1" -type d >$T/a
    # filter list "a" by pattern "$2" and save it in file "b"
    grep "$2" $T/a >$T/b
    # for each path in file "b", add the paths of its parent
    # directories, and save the result in file "c"
    while read P; do
        echo $P
        while [[ ${#1} -lt ${#P} ]]; do
            P=$(dirname "$P")
            echo $P
        done
    done <$T/b >$T/c
    # make the list in file "c" unique and save it in file "d"
    sort -u $T/c >$T/d
    # print every path from list "a" that is missing from list "d"
    awk 'NR==FNR{a[$0];next} !($0 in a)' $T/d $T/a
    # remove the temporary directory
    rm -rf $T
}
# find all dirs inside ./path except those matching "pattern"
# and remove them (rm -rf, since these are directories and
# parents may be removed before their children)
findex ./path "pattern" | xargs -L1 rm -rf
Test it
findex(){
    T=$(mktemp -d)
    find "$1" -type d >$T/a
    grep "$2" $T/a >$T/b
    while read P; do
        echo $P
        while [[ ${#1} -lt ${#P} ]]; do
            P=$(dirname "$P")
            echo $P
        done
    done <$T/b >$T/c
    sort -u $T/c >$T/d
    # save the result in file "e"
    awk 'NR==FNR{a[$0];next} !($0 in a)' $T/d $T/a >$T/e
    # output the path of the temporary directory
    echo $T
}
cd $TMPDIR
for I in {000..999}; do
    mkdir -p "./test/${I:0:1}/${I:1:1}/${I:2:1}"
done
T=$(findex ./test "5")
cat $T/a | wc -l # => 1111 dirs total
cat $T/d | wc -l # => 382 dirs matching "5" or containing a match below
cat $T/e | wc -l # => 729 dirs to delete
rm -rf $T ./test
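For comparison, here is a sketch that needs no temporary files: walk the tree top-down and remove a directory only if it is unrelated to the kept folder. The keep variable name is mine, and it assumes a find with -mindepth (GNU or BSD):
keep=test-folder2
find ./release -mindepth 1 -type d ! -name "$keep" ! -path "*/$keep/*" |
while IFS= read -r d; do
    [ -d "$d" ] || continue   # already removed along with a parent
    # keep any directory that still contains the kept folder below it
    if ! find "$d" -type d -name "$keep" | grep -q .; then
        rm -rf "$d"
    fi
done
The inner find makes this quadratic in the worst case, but for a release tree of ordinary size that hardly matters.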

Unix command to list all files, grouped and sorted by file type and name

There are lots of files in a directory, and the output should be grouped and sorted like below: first executable files without any file extension, then sql files ending with "body", then sql files ending with "spec", then the other sql files, then "sh", then "txt" files.
abc
1_spec.sql
1_body.sql
2_body.sql
other.sql
a1.sh
a1.txt
find . -maxdepth 1 -type f ! -name "*.*"
find . -type f -name "*body*.sql"
find . -type f -name "*spec*.sql"
It is getting difficult to combine all of these and keep each group sorted in order.
With ls, grep and sort you could do something like this script I hacked together:
#!/bin/sh
ls | grep -v '\.[a-zA-Z0-9]*$' | sort
ls | grep '_body\.sql$' | sort
ls | grep '_spec\.sql$' | sort
ls | grep -vE '_body\.sql$|_spec\.sql$' | grep '\.sql$' | sort
ls | grep '\.sh$' | sort
ls | grep '\.txt$' | sort
Normal ls:
$ ls -1
1_body.sql
1_spec.sql
2_body.sql
a1.sh
a1.txt
abc
bar.sql
def
foo.sh
other.sql
script
$
Sorting script:
$ ./script
abc
def
script
1_body.sql
2_body.sql
1_spec.sql
bar.sql
other.sql
a1.sh
foo.sh
a1.txt
$
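An alternative that avoids listing the directory six times: tag each name with a group number, sort by group and then by name, and strip the tag. A sketch, assuming filenames without embedded newlines or tabs:
ls -1 | awk '{
    if      ($0 !~ /\./)         g = 1   # no extension
    else if ($0 ~ /_body\.sql$/) g = 2
    else if ($0 ~ /_spec\.sql$/) g = 3
    else if ($0 ~ /\.sql$/)      g = 4
    else if ($0 ~ /\.sh$/)       g = 5
    else if ($0 ~ /\.txt$/)      g = 6
    else                         g = 7   # anything unclassified goes last
    print g "\t" $0
}' | sort -k1,1n -k2,2 | cut -f2-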

grep for a pattern occurring 2 or 3 times

I am looking for a regular expression which finds occurrences like 696969 in 2345679696969.
I don't want to search for 696969 literally but, to simplify it, for something like 69 occurring 3 times.
Something like this:
grep '[0-9]\{7\}69\{3\}'
but the quantifier applies only to the 9, so it searches for a 6 followed by three 9s.
Could somebody help?
Group 69 with parentheses:
grep -E '(69){3}'
Test
$ echo "2345679696969" | grep -E '(69){3}'
2345679696969
All together:
$ echo "2345679696969" | grep -E '[0-9]{7}(69){3}'
2345679696969
or with a basic grep (thanks Avinash):
grep '[0-9]\{7\}\(69\)\{3\}'
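The title also mentions 2 or 3 occurrences; an interval quantifier handles that directly, though note it matches any line containing at least two consecutive 69s (requiring an exact count would need surrounding context in the pattern):
grep -E '(69){2,3}'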

Unix Command for counting number of words which contains letter combination (with repeats and letters in between)

How would you count the number of words in a text file that contain all of the letters a, b, and c? These letters may occur more than once in the word, and the word may contain other letters as well. (For example, "cabby" should be counted.)
Using sample input which should return 2:
abc abb cabby
I tried both:
grep -E "[abc]" test.txt | wc -l
grep 'abcdef' testCount.txt | wc -l
both of which return 1 instead of 2.
Thanks in advance!
You can use awk and the return value of its sub function: sub returns the number of substitutions made, so a successful substitution returns a value greater than 0.
$ echo "abc abb cabby" |
awk '{
    for(i=1;i<=NF;i++)
        if(sub(/a/,"",$i)>0 && sub(/b/,"",$i)>0 && sub(/c/,"",$i)>0) {
            count+=1
        }
}
END{print count}'
2
We require the return value to be greater than 0 for all three letters. The for loop iterates over every word of every line, incrementing the counter whenever a word is found to contain all three.
I don't think you can get around using multiple invocations of grep. Thus I would go with (GNU grep):
<file grep -ow '\w+' | grep a | grep b | grep c
Output:
abc
cabby
The first grep puts each word on a line of its own.
Try this, it will work:
sed 's/ /\n/g' test.txt |grep a |grep b|grep c
$ cat test.txt
abc abb cabby
$ sed 's/ /\n/g' test.txt |grep a |grep b|grep c
abc
cabby
Hope this helps.
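Note that the question asks for a count while the answers above print the matching words; pipe any of them into wc -l, e.g. with tr doing the word splitting:
$ tr -s ' ' '\n' < test.txt | grep a | grep b | grep c | wc -l
2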

Read only part of a file / cut at a specific symbol

I have 100 files which all have a similar structure
line1
line2
stuff
RR
important stuff
The problem is that I want to cut when RR appears (which it does in each file). However, it is not always on the same line (it can be line 20, it can be line 35), but it is always there. Hence, is there any way in bash or R (when reading in the file) to do that (just cutting off the header)? I would prefer R.
You can read all rows and remove the unnecessary ones:
dat <- readLines(textConnection(
"line1
line2
stuff
RR
important stuff"))
# dat <- readLines("file.name")
dat[seq(which.max(dat == "RR") + 1, length(dat))]
# [1] "important stuff"
If you have awk available through bash you could do:
awk '(/RR/){p=1; next} (p){print}' < file.txt
$ cat file.txt
line1
line2
stuff
RR
important stuff
$ awk '(/RR/){p=1; next} (p){print}' < file.txt
important stuff
This sets a flag p when the RR line is found; next makes awk skip the remaining rules for that line (so RR itself is never printed) and read the next one. Once p is set, all subsequent lines are printed.
Here are a few ways:
Using basic tools:
$ tail -n+$((1 + $(grep -n '^RR$' file.txt | cut -d: -f1))) file.txt
important stuff
$
Using pure bash:
$ { while read ln; do [ "$ln" == RR ] && break; done; cat; } < file.txt
important stuff
$
And another way, assuming you can guarantee no more than 9999 lines in a file:
$ grep -A9999 '^RR$' file.txt | tail -n+2
important stuff
$
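A sed alternative, deleting everything from line 1 through the RR line (assuming RR is not the very first line, since the end of a 1,/re/ range is only searched for from line 2 on):
$ sed '1,/^RR$/d' file.txt
important stuff
$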
