egrep issue- need to search different patterns? - unix

Need a grep one liner [ without pipe ] which would check multiple expressions in a single command
cat FILE | egrep OH|OI is not working

What you're looking for is a simple:
egrep 'OH|OI' FILE
The command you have (without the quotes):
cat FILE | egrep OH|OI
will attempt to cat the file through egrep looking for OH then pipe the results of that through an executable OI (which probably doesn't exist).
The quotes will fix that for you so that OH|HI is a single argument to the egrep rather than something the shell processes.

Just try:
cat FILE | egrep 'OH|OI'

Eliminate the pipe by supplying the filename.
egrep 'OH|OI' filename

Related

Unix command to parse string

I'm trying to figure out a command to parse the following file content:
Operation=GET
Type=HOME
Counters=CacheHit=0,Exception=1,Validated=0
I need to extract Exception=1 into its own line. I'm fiddling with awk, sed and grep but not making much progress. Does anyone have any tips on using any unix command to perform this?
Thanks
Since your file is close to bash syntax, there is a fun little trick you can do to make bash itself parse the file. First, use some program like tr to transform the input into a something bash can parse, and then "source" that, which will create shell variables you can expand later to get the values.
source <(tr , $'\n' < file_name_goes_here)
echo $Exception
Many ways to do this. Here is one assuming the file is called "file.txt". Grab the line you want, replace everything from the start of the line up to Except with just Except, then pull out the first field using comma as the delimiter.
$ grep Exception file.txt | sed 's/.*Except/Except/g' | cut -d, -f 1
Exception=1
If you wanted to use gawk:
$ grep Exception file.txt | sed 's/.*Except/Except/g' | gawk -F, '{print $1}'
Exception=1
or just using grep and sed:
$ grep Exception file.txt | sed 's/.*\(Exception=[0-9]*\).*/\1/g'
Exception=1
or as #sheltter reminded me:
$ egrep -o "Exception=[0-9]+" file.txt
Exception=1
No need to use a mix of commands.
awk -F, 'NR==2 {print RS$1}' RS="Exception" file
Exception=1
Here we split the line by the keyword we look for RS="Exception"
If the line has two record (only when keyword is found), then
print first field, separated using command, with Record selector.
PS This only works if you have one Exception field

ls and xargs to output specific file extentions

I am trying to use ls and xargs to print specific file extensions .bam and .vcf witout the path. The below is close but when I | the two ls commands I get the error below. Separated it works fine except each file is printed on a newline (my actual data has hundreds of files and make it easier to read). Thank you :).
files in directory
1.bam
1.vcf
2.bam
2.vcf
command with error
ls /home/cmccabe/Desktop/NGS/test/R_folder/*.bam | xargs -n1 basename | ls /home/cmccabe/Desktop/NGS/test/R_folder/*.vcf | xargs -n1 basename >> /home/cmccabe/Desktop/NGS/test/log
xargs: basename: terminated by signal 13
desired output
1.bam 1.vcf
2.bam 2.vcf
You cannot pipe output into ls and have it print that with its other output. You should give the parameters to the first one and it will output everything.
ls *.a *.b *.c | xargs ...q
ls isn't really doing anything for you currently, it's the shell that's listing all your files. Since you're piping ls's output around, you're actually vulnerable to dangerous file names.
basename can take multiple arguments with the -a option:
basename -a "path/to/files/"*.{bam,vcf}
To print that in two columns, you could use printf via xargs, with sort for... sorting. The -z or -0 flags throughout cause null bytes to be used as the filename separators:
basename -az "path/to/files/"*.{bam,vcf} | sort -z | xargs -0n 2 printf "%b\t%b\n"
If you're going to be doing any more processing after printing to columns, you may want to replace the %bs in the printf format with %qs. That will escape non-printable characters in the output, but might look a bit ugly to human eyes.

Unix - Using ls with grep

How can I use ls (or other commands) and grep together to search from specific files for a certain word inside that file?
Example I have a file - 201503003_315_file.txt and I have other files in my dir.
I only want to search files that have a file name that contains _315_ and inside that file, search for the word "SAMPLE".
Hope this is clear and thanks in advance for any help.
You can do:
ls * _315_* | xargs grep "SAMPLE"
The first part: ls * _315_* will list only files that have 315 as part of the file name, this list of files is piped to grep which will scan each one of them and look for "SAMPLE"
UPDATE
A bit easier (and actually safer) approach was mentioned by David in the comments bellow:
grep "SAMPLE" *_315_*.txt
The reason why it's safer is that ls doesn't handle well special characters.
Another option, as mentioned by Charles Duffy in the comments below:
printf '%s\0' *_315_* | xargs -0 grep
Change to that directory (using cd dir) and try:
grep SAMPLE *_315_*
If you really MUST use ls AND grep try this:
ls *_315_* | xargs grep SAMPLE
The first example, however, requires less typing...

grepping lines from a document using xargs

Let's say I have queries.txt.
queries.txt:
cat
dog
123
now I want to use them are queries to find lines in myDocument.txt using grep.
cat queries.txt | xargs grep -f myDocument.txt
myDocument has lines like
cat
i have a dog
123
mouse
it should return the first 3 lines. but it's not. instead, grep tries to find them as file names. what am i doing wrong?
Here, you just need:
grep -f queries.txt myDocument.txt
This causes grep to read the regular expressions from the file queries.txt and then apply them to myDocument.txt.
In the xargs version, you were effectively writing:
grep -f myDocument.txt cat dog 123
If you absolutely must use xargs, then you'll need to write:
xargs -I % grep -e % myDocument.txt < queries.txt
This avoids a UUOC — Useless Use of cat – award by redirecting standard input from queries.txt. It uses the -I % option to specify where the replacement text should go in the command line. Using the -e option means that if the pattern is, say --help, you won't run into problems with (GNU) grep treating that as an argument (and therefore printing its help message).
The grep -e option will take a pattern string as an argument. -f treats the argument as a file name of a file with patterns in it.

Splitting unix output

I'm trying to extract an address from a file.
grep keyword /path/to/file
is how I'm finding the line of code I want. The output is something like
var=http://address
Is there a way I can get only the part directly after the = i.e. http://address , considering the keyword I'm greping for is both in the var and http://address parts
grep keyword /path/to/file | cut -d= -f2-
Just pipe to cut:
grep keyword /path/to/file | cut -d '=' -f 2
You can avoid the needless pipes:
awk -F= '/keyword/{print $2}' /path/to/file

Resources