grepping lines from a document using xargs - unix

Let's say I have queries.txt.
queries.txt:
cat
dog
123
Now I want to use them as queries to find lines in myDocument.txt using grep.
cat queries.txt | xargs grep -f myDocument.txt
myDocument has lines like
cat
i have a dog
123
mouse
It should return the first 3 lines, but it doesn't. Instead, grep tries to open them as file names. What am I doing wrong?

Here, you just need:
grep -f queries.txt myDocument.txt
This causes grep to read the regular expressions from the file queries.txt and then apply them to myDocument.txt.
In the xargs version, you were effectively writing:
grep -f myDocument.txt cat dog 123
If you absolutely must use xargs, then you'll need to write:
xargs -I % grep -e % myDocument.txt < queries.txt
This avoids a UUOC (Useless Use of cat) award by redirecting standard input from queries.txt. It uses the -I % option to specify where the replacement text should go in the command line. Using the -e option means that if the pattern is, say, --help, you won't run into problems with (GNU) grep treating that as an option (and therefore printing its help message). Note that this runs grep once per query, so matching lines come out grouped by query rather than in document order.
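To compare the two approaches side by side, here is a minimal sketch that recreates the sample files from the question in a scratch directory and captures what each command prints. The temporary directory and the variable names (tmp, with_f, with_xargs) are made up for illustration.

```shell
# Recreate the question's files in a scratch directory (hypothetical location).
tmp=$(mktemp -d)
printf '%s\n' cat dog 123 > "$tmp/queries.txt"
printf '%s\n' cat 'i have a dog' 123 mouse > "$tmp/myDocument.txt"

# One grep run: every line of queries.txt is used as a pattern.
with_f=$(grep -f "$tmp/queries.txt" "$tmp/myDocument.txt")

# One grep run per query line, with % replaced by that query.
with_xargs=$(xargs -I % grep -e % "$tmp/myDocument.txt" < "$tmp/queries.txt")

printf '%s\n' "$with_f"
rm -rf "$tmp"
```

With this particular input both variables end up holding the same three lines; on other inputs the xargs form can repeat a line that matches more than one query.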

The grep -e option will take a pattern string as an argument. -f treats the argument as a file name of a file with patterns in it.

Related

How do I use grep to find words of specified or unspecified length?

In the Unix command line (CentOS7) I have to use the grep command to find all words with:
At least n characters
At most n characters
Exactly n characters
I have searched the posts here for answers and came up with grep -E '^.{8}' /sample/dir but this only gets me the words with at least 8 characters.
Using the $ at the end returns nothing. For example:
grep -E '^.{8}$' /sample/dir
I would also like to trim the info in /sample/dir so that I only see the specific information. I tried using a pipe:
cut -f1,7 -d: | grep -E '^.{8}' /sample/dir
Depending on the order, this only gets me one or the other, not both.
I only want the usernames at the beginning of each line, not all words in each line for the entire file.
For example, if I want to find userids on my system, these should be the results:
1.
tano-ahsoka
skywalker-a
kenobi-obiwan
ahsoka-t
luke-s
leia-s
ahsoka-t
kenobi-o
grievous
I'm looking for two responses here as I have already figured out number 1.
Numbers 2 and 3 are not working for some reason.
If possible, I'd also like to apply the cut for all three outputs.
Any and all help is appreciated, thank you!
You can run one grep for extracting the words, and another for filtering based on length.
grep -oE '(\w|-)+' file | grep -Ee '^.{8,}$'
grep -oE '(\w|-)+' file | grep -Ee '^.{,8}$'
grep -oE '(\w|-)+' file | grep -Ee '^.{8}$'
Update the pattern based on requirements and maybe use -r and specify a directory instead of a file. Adding -h option may also be needed to prevent the filenames from being printed.
Depending on your implementation of grep, it might work to use:
grep -o -E '\<\w{8}\>' # exactly 8
grep -o -E '\<\w{8,}\>' # 8 or more
grep -o -E '\<\w{,8}\>' # 8 or less
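For the usernames specifically, a sketch along these lines may be closer to what the question asks for. It assumes a passwd-style file with the username in the first colon-separated field; the file contents and variable names are invented for the example. The key point is that cut gets the file name and grep only reads from the pipe, and {0,8} is used instead of {,8} because the latter is not accepted by every grep.

```shell
tmp=$(mktemp -d)
# A passwd-style sample (made-up entries); only the first field matters here.
printf '%s\n' \
  'tano-ahsoka:x:1001:1001::/home/tano-ahsoka:/bin/bash' \
  'luke-s:x:1002:1002::/home/luke-s:/bin/bash' \
  'grievous:x:1003:1003::/home/grievous:/bin/bash' \
  'kenobi-o:x:1004:1004::/home/kenobi-o:/bin/bash' > "$tmp/users"

# cut reads the file; grep filters whatever cut writes to the pipe.
at_least=$(cut -d: -f1 "$tmp/users" | grep -E '^.{8,}$')   # 8 or more characters
at_most=$(cut -d: -f1 "$tmp/users" | grep -E '^.{0,8}$')   # 8 or fewer characters
exactly=$(cut -d: -f1 "$tmp/users" | grep -E '^.{8}$')     # exactly 8 characters

printf '%s\n' "$exactly"
rm -rf "$tmp"
```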

unix combine grep w and v command

I want to search a file and include the text #!/bin/bash, but exclude any other line that has a # sign. These two commands: grep -w '#!/bin/bash' file and grep -v '^#' file each do one part of this job. I would like this to be a single command, so here's what I've tried.
grep -w '#!/bin/bash' | grep -v '^#' file
This excludes lines beginning with #, but doesn't include the line #!/bin/bash
grep -w '#!/bin/bash' -v '^#' file
This just prints every line but #!/bin/bash
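grep '^[^#]\|^#!/bin/bash$' test.sh
Explanation:
^[^#] matches a line that starts with something other than # (so empty lines are dropped too)
\| is an "or" (a GNU grep extension in basic regular expressions)
^#!/bin/bash$ matches the exact line #!/bin/bash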
So .. it looks as if you're trying to strip comments from bash files without removing their shebang.
The grep command can search for regular expressions, but isn't so good at applying rules of logic. You could do something like this:
grep -v '^#[^!]' input.sh
But you'd fail to strip comments that are affixed to the ends of lines. Note that I'm being a little more liberal with this regex, since it's entirely possible that a script might use something other than /bin/bash for its shebang. :-)
Another possibility would be to use awk. This lets you apply logic that cannot be expressed within a regular expression. For example, if you want to keep the commented line only if it is a shebang on the first line of the file, and remove all other comments, awk can express that as follows:
awk '
NR==1 && /^#!/; # on the first line, print a shebang if there is one.
/^#/ { next } # if this is a comment line, skip it.
1 # print everything else.
' input.sh
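The difference between the grep and awk approaches shows up when a #! line appears somewhere other than line 1. The following is a small sketch on a made-up input.sh; the file contents and variable names are illustrative only, and the awk program tests NR (the current line number) to decide whether a #! line is a real shebang.

```shell
tmp=$(mktemp -d)
printf '%s\n' '#!/bin/bash' '# a comment' 'echo hi' '#!not a shebang' 'echo bye' > "$tmp/input.sh"

# grep keeps every line that starts with #!, wherever it appears.
grep_out=$(grep -v '^#[^!]' "$tmp/input.sh")

# awk keeps a #! line only when it is the first line of the file.
awk_out=$(awk '
NR == 1 && /^#!/ { print; next }
/^#/ { next }
{ print }
' "$tmp/input.sh")

printf '%s\n' "$awk_out"
rm -rf "$tmp"
```

Here grep_out still contains the stray "#!not a shebang" line, while awk_out does not.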

ls and xargs to output specific file extensions

I am trying to use ls and xargs to print files with specific extensions, .bam and .vcf, without the path. The command below is close, but when I | the two ls commands I get the error below. Run separately, each works fine, except that each file is printed on a new line (my actual data has hundreds of files, and two columns would make it easier to read). Thank you :).
files in directory
1.bam
1.vcf
2.bam
2.vcf
command with error
ls /home/cmccabe/Desktop/NGS/test/R_folder/*.bam | xargs -n1 basename | ls /home/cmccabe/Desktop/NGS/test/R_folder/*.vcf | xargs -n1 basename >> /home/cmccabe/Desktop/NGS/test/log
xargs: basename: terminated by signal 13
desired output
1.bam 1.vcf
2.bam 2.vcf
You cannot pipe output into ls and have it print that with its other output. You should give the parameters to the first one and it will output everything.
ls *.a *.b *.c | xargs ...
ls isn't really doing anything for you currently, it's the shell that's listing all your files. Since you're piping ls's output around, you're actually vulnerable to dangerous file names.
basename can take multiple arguments with the -a option:
basename -a "path/to/files/"*.{bam,vcf}
To print that in two columns, you could use printf via xargs, with sort for... sorting. The -z or -0 flags throughout cause null bytes to be used as the filename separators:
basename -az "path/to/files/"*.{bam,vcf} | sort -z | xargs -0n 2 printf "%b\t%b\n"
If you're going to be doing any more processing after printing to columns, you may want to replace the %bs in the printf format with %qs. That will escape non-printable characters in the output, but might look a bit ugly to human eyes.
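As a rough end-to-end check of the two-column pipeline, the sketch below creates the four files from the question in a scratch directory and captures the output. It assumes GNU basename (for -a and -z), and it swaps %b for %s in the printf format so that names are printed verbatim rather than having backslash escapes interpreted; the directory and variable names are made up.

```shell
tmp=$(mktemp -d)
touch "$tmp/1.bam" "$tmp/1.vcf" "$tmp/2.bam" "$tmp/2.vcf"

# NUL-separated base names are sorted, then printed two per row.
columns=$(basename -az "$tmp"/*.bam "$tmp"/*.vcf |
  LC_ALL=C sort -z |
  xargs -0 -n 2 printf '%s\t%s\n')

printf '%s\n' "$columns"
rm -rf "$tmp"
```

Pairing relies on the sort order putting each .bam next to its .vcf, so a sample that has only one of the two files would shift every later row.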

Unix Pipes for Command Argument [duplicate]

This question already has answers here:
How to pass command output as multiple arguments to another command
(5 answers)
Read expression for grep from standard input
(1 answer)
Closed last month.
I am looking for insight as to how pipes can be used to pass standard output as the arguments for other commands.
For example, consider this case:
ls | grep Hello
The structure of grep follows the pattern: grep SearchTerm PathOfFileToBeSearched. In the case I have illustrated, the word Hello is taken as the SearchTerm and the result of ls is used as the file to be searched. But what if I want to switch it around? What if I want the standard output of ls to be the SearchTerm, with the argument following grep being PathOfFileToBeSearched? In a general sense, I want to have control over which argument the pipe fills with the standard output of the previous command. Is this possible, or does it depend on how the script for the command (e.g., grep) was written?
Thank you so much for your help!
grep itself will be built such that if you've not specified a file name, it will open stdin (and thus get the output of ls). There's no real generic mechanism here - merely convention.
If you want the output of ls to be the search term, you can do this via the shell. Make use of a subshell and substitution thus:
$ grep $(ls) filename.txt
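In this scenario ls is run in a subshell, and its stdout is captured and inserted into the command line as arguments for grep. Note that the unquoted substitution is split on whitespace: if ls outputs more than one name, or a name containing spaces, only the first word becomes the search term and the rest are treated as files to search.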
There are basically two options for this: shell command substitution and xargs. Brian Agnew has just written about the former. xargs is a utility which takes its stdin and turns it into arguments of a command to execute. So you could run
ls | xargs -n1 -J % grep -- % PathOfFileToBeSearched
and it would, for each file output by ls, run grep -- filename PathOfFileToBeSearched to grep for the filename output by ls within the other file you specify. This is an unusual xargs invocation; usually it's used to add one or more arguments at the end of a command, while here it should add exactly one argument in a specific place, so I've used the -n and -J arguments to arrange that. (-J is a BSD xargs option; GNU xargs does not have it, and -I % is the closest equivalent there.) The more common usage would be something like
ls | xargs grep -- term
to search all of the files output by ls for term. Although of course if you just want files in the current directory, you can do this more simply without a pipeline:
grep -- term *
and likewise in your reversed arrangement,
for filename in *; do
grep -- "$filename" PathOfFileToBeSearched
done
There's one important xargs caveat: whitespace characters in the filenames generated by ls won't be handled well. To cope with them, provided your find and xargs support -print0 and -0 (the GNU and BSD versions both do), you can use find instead:
find . -mindepth 1 -maxdepth 1 -print0 | xargs -0 -n1 -J % grep -- % PathOfFileToBeSearched
This uses NUL characters to separate filenames instead of whitespace.
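For comparison, here is a sketch of two ways to make the output of ls act as the search terms. The first feeds the names to grep as a pattern list on standard input with -f - (GNU grep accepts - here; treat support elsewhere as an assumption), and the second uses xargs -I. The directory layout, log.txt, and the variable names are invented for the example, and -F makes the dots in the names literal.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/names"
touch "$tmp/names/alpha.txt" "$tmp/names/beta.txt"
printf '%s\n' 'backup of alpha.txt done' 'nothing here' 'beta.txt skipped' > "$tmp/log.txt"

# Each line written by ls becomes one fixed-string pattern read from stdin.
via_stdin=$(ls "$tmp/names" | grep -F -f - "$tmp/log.txt")

# One grep run per name; % marks where the name is substituted.
via_xargs=$(ls "$tmp/names" | xargs -I % grep -F -e % "$tmp/log.txt")

printf '%s\n' "$via_stdin"
rm -rf "$tmp"
```

The -f - form runs grep once and reports matches in the order they occur in log.txt, while the xargs form groups them by name.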

How do I perform a recursive directory search for strings within files in a UNIX TRU64 environment?

Unfortunately, due to the limitations of our Unix Tru64 environment, I am unable to use the grep -r switch to search for strings within files across multiple directories and subdirectories.
Ideally, I would like to pass two parameters. The first will be the directory I want my search to start from. The second is a file containing a list of all the strings to be searched for. This list will consist of various directory path names and will include special characters:
ie:
/aaa/bbb/ccc
/eee/dddd/ggggggg/
etc..
The purpose of this exercise is to identify all shell scripts that may have specific hard coded path names identified in my list.
There was one example I found during my investigations that perhaps comes close, but I am not sure how to customize this to accept a file of string arguments:
eg: find etb -exec grep test {} \;
where 'etb' is the directory and 'test', a hard coded string to be searched.
This should do it:
find dir -type f -exec grep -F -f strings.txt {} \;
dir is the directory from which searching will commence
strings.txt is the file of strings to match, one per line
-F means treat search strings as literal rather than regular expressions
-f strings.txt means use the strings in strings.txt for matching
You can add -l to the grep switches if you just want filenames that match.
Footnote:
Some people prefer a solution involving xargs, e.g.
find dir -type f -print0 | xargs -0 grep -F -f strings.txt
which is perhaps a little more robust/efficient in some cases.
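A small self-contained run of the same idea, on a made-up directory tree, might look like the sketch below. It uses -l to list only the matching files and the POSIX -exec ... {} + form to pass many files to each grep; whether the Tru64 find supports {} + is an assumption, and \; can be substituted if it does not. All file names and contents are invented for illustration.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/scripts/sub"
printf '%s\n' '/aaa/bbb/ccc' '/eee/dddd/ggggggg/' > "$tmp/strings.txt"
printf '%s\n' 'cd /aaa/bbb/ccc' > "$tmp/scripts/a.sh"
printf '%s\n' 'ls /tmp' > "$tmp/scripts/sub/b.sh"
printf '%s\n' 'cat /eee/dddd/ggggggg/file' > "$tmp/scripts/sub/c.sh"

# List every file under scripts/ that contains any of the literal strings.
matches=$(cd "$tmp" &&
  find scripts -type f -exec grep -l -F -f strings.txt {} + |
  LC_ALL=C sort)

printf '%s\n' "$matches"
rm -rf "$tmp"
```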
From reading the question, I assume we cannot use GNU coreutils, and that egrep is not available.
I also assume (for some reason) that the system is broken, and escapes do not work as expected.
Under normal circumstances, grep -rf patternfile.txt /some/dir/ is the way to go.
a file containing a list of all the strings to be searched
Assumptions: GNU coreutils are not available, grep -r does not work, and handling of special characters is broken.
Do you have a working awk? No? It would make life so much easier, but let's be on the safe side.
Assume: a working sed, and one of od, hexdump, or xxd (from the vim package), is available.
Let's call the list patternfile.txt.
1. Convert list into a regexp that grep likes
Example patternfile.txt contains
/foo/
/bar/doe/
/root/
(The example does not show special characters, but assume they are there.) We must turn it into something like
(/foo/|/bar/doe/|/root/)
Assuming echo -en command is not broken, and xxd , or od, or hexdump is available,
Using hexdump
cat patternfile.txt |hexdump -ve '1/1 "%02x \n"' |tr -d '\n'
Using od
cat patternfile.txt |od -A none -t x1|tr -d '\n'
and pipe it into (common for both hexdump and od)
|sed 's:[ ]*0a[ ]*$::g'|sed 's: 0a:\\|:g' |sed 's:^[ ]*::g'|sed 's:^: :g' |sed 's: :\\x:g'
then pipe result into
|sed 's:^:\\(:g' |sed 's:$:\\):g'
and you have a regexp pattern that is escaped.
2. Feed the escaped pattern into broken regexp
Assuming the bare minimum shell escape is available,
we use grep "$(echo -en "ESCAPED_PATTERN" )" to do our job.
3. To sum it up
Building an escaped regexp pattern (using hexdump as an example):
grep "$(echo -en "$( cat patternfile.txt |hexdump -ve '1/1 "%02x \n"' |tr -d '\n' |sed 's:[ ]*0a[ ]*$::g'|sed 's: 0a:\\|:g' |sed 's:^[ ]*::g'|sed 's:^: :g' |sed 's: :\\x:g'|sed 's:^:\\(:g' |sed 's:$:\\):g')")"
will escape all characters and enclose it with (|) brackets so a regexp OR match will be performed.
4. Recursive directory lookup
Under normal circumstances, even when grep -r is broken, find /dir/ -exec grep REGEXP_PATTERN {} \; should work.
Some may prefer xargs instead (unless you happen to have a buggy xargs).
We prefer the find /somedir/ -type f -print0 | xargs -0 grep -f 'patternfile.txt' approach, but since
this is not available (for whatever valid reason),
we need to exec grep for each file, and this is normally the wrong way.
But let's do it.
Assume: find -type f works.
Assume: xargs is broken OR not available.
First, if you have a buggy pipe, it might not handle a large number of files.
So we avoid xargs on such systems (I know, I know, let's just pretend it is broken).
find /whatever/dir/to/start/looking/ -type f > list-of-all-file-to-search-for.txt
If your shell handles large lists nicely,
for file in `cat list-of-all-file-to-search-for.txt` ; do grep REGEXP_PATTERN "$file" ;
done ; is a nice way to get by. Unfortunately, some systems do not like that,
and in that case, you may need
cat list-of-all-file-to-search-for.txt | split -a 4 -d -l 2000 - file-smaller-chunk.part.
to turn it into smaller chunks. Now this is for a seriously broken system.
Then a for file in file-smaller-chunk.part.* ; do for single_line in `cat "$file"` ; do grep REGEXP_PATTERN "$single_line" ; done ; done ;
should work.
A
cat filelist.txt | while read -r file ; do grep REGEXP_PATTERN "$file" ; done ;
may be used as a workaround on some systems.
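The read loop has one advantage over the for loops above: it takes one file name per line, so names containing spaces survive as long as "$file" is quoted. Here is a minimal sketch, assuming a shell with IFS= read -r; the file names, the search string, and the variable names are made up for the example.

```shell
tmp=$(mktemp -d)
printf 'uses /aaa/bbb/ccc\n' > "$tmp/with space.sh"
printf 'clean\n' > "$tmp/plain.sh"

# Build the list of files to search, one path per line.
find "$tmp" -type f -name '*.sh' | LC_ALL=C sort > "$tmp/filelist.txt"

# Read the list line by line; quoting "$file" keeps embedded spaces intact.
hits=$(while IFS= read -r file; do
  grep -l -F '/aaa/bbb/ccc' "$file"
done < "$tmp/filelist.txt")

printf '%s\n' "$hits"
expected_hit="$tmp/with space.sh"
rm -rf "$tmp"
```

File names that contain newlines would still break this, since the list is line-based.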
What if my shell does not handle quotes?
You may have to escape the file list beforehand.
It can be done much more nicely in awk, perl, whatever, but since we restrict ourselves to
sed, let's do it.
We assume 0x27, the ' code, will actually work.
cat list-of-all-file-to-search-for.txt |sed 's#['\'']#'\''\\'\'\''#g'|sed 's:^:'\'':g'|sed 's:$:'\'':g'
The only time I had to use this was when feeding output into bash again.
What if my shell does not handle that?
xargs fails, grep -r fails, the shell's for loop fails.
Do we have other options? Yes.
Escape all input suitably for your shell, and make a script.
But you know what, I got bored, and writing automated scripts for csh just seems
wrong. So I am going to stop here.
Take-home note
Use the right tool for the job. Writing an interpreter in bc is perfectly
possible, but it is just plain wrong. Install coreutils, perl, a better grep,
whatever. It makes life better.

Resources