Show unique filenames only in grep results [duplicate] - unix

This question already has answers here:
How can I use grep to show just filenames on Linux? [closed]
(3 answers)
Closed 4 years ago.
I'm trying to show only unique filenames when I grep a certain string. Currently I'm getting multiple results with the same filename if a certain string appears several times inside a file.
for example:
If I have a string "string12345" that appears 3 times on several lines inside filename1.txt and 3 times on several lines inside filename2.txt as well, then when I use grep 'string12345' *.txt it shows 3 occurrences each of filename1.txt and filename2.txt.
Now what I'm trying to achieve is to show only 1 occurrence of filename1.txt and filename2.txt. Thank you in advance.

use the -l flag.
test.txt:
Hello, World!
Hello, World!
Grep search:
$ grep ./ -re "Hello"
./test.txt:Hello, World!
./test.txt:Hello, World!
$ grep ./ -re "Hello" -l
./test.txt
From the manual:
-l, --files-with-matches
Suppress normal output; instead print the name of each input file from which output would normally have been printed. The scanning will stop on the first match.
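Applied to the question above, a minimal sketch (assuming the files really are the .txt files in the current directory) would be:
$ grep -l 'string12345' *.txt    # -l prints each matching filename once
filename1.txt
filename2.txt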

Unix substitute multiple strings using a reference file [duplicate]

This question already has answers here:
Match exact word with awk on Mac OS X
(4 answers)
Closed 11 months ago.
I have a reference file, and using it I want to replace strings in multiple files in a directory. I am using awk's gsub for that; however, it is not replacing only the exact word, but all occurrences. How can I stop that behaviour and replace just the word? In this case the word is "IT".
My reference file
$ cat dev_to_prod.config
nonprod_DATA_PATH PROD_DATA_PATH
nonprod_ENCRYPTKEY PROD_ENCRYPTKEY
IT Business
My current data file
$ cat file.txt
IT
WITH
/IT/DFGh/erfe
/WITH/IT/sjfgh/hjIT/dfdsf/ITvjkl
My current code and its output:
awk 'FNR==NR{A[$1]=$2;next}{for(i in A)gsub(i,A[i])}1' dev_to_prod.config file.txt
Business
WBusinessH
/Business/DFGh/erfe
/WBusinessH/Business/sjfgh/hjBusiness/dfdsf/Businessvjkl
man awk says:
\< matches the empty string at the beginning of a word.
\> matches the empty string at the end of a word.
Then would you please try:
awk 'FNR==NR{A[$1]=$2;next}{for(i in A)gsub("\\<"i"\\>",A[i])}1' dev_to_prod.config file.txt
Output:
Business
WITH
/Business/DFGh/erfe
/WITH/Business/sjfgh/hjIT/dfdsf/ITvjkl
The remaining ITs (in WITH, hjIT, and ITvjkl) are left alone because IT is not a standalone word there, which is exactly the behaviour asked for.
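Since the question mentions replacing the word in multiple files in a directory, a minimal sketch of a loop (assuming the data files match *.txt; writing to a .new copy is just for illustration) could be:
for f in *.txt; do
    # apply the word-boundary mapping to each file, keeping the original untouched
    awk 'FNR==NR{A[$1]=$2;next}{for(i in A)gsub("\\<"i"\\>",A[i])}1' dev_to_prod.config "$f" > "$f.new"
done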

Unix - sed get value from a line after the first colon [duplicate]

This question already has answers here:
Extract one word after a specific word on the same line [duplicate]
(4 answers)
Closed 6 years ago.
I have a file (newline.txt) that contains the following line
Footer - Count: 00034300, Facility: TRACE, File Created: 20160506155539
I am trying to get the value after Count: up to the comma (in the example 00034300) from this line.
I tried this, but all I get is the numbers concatenated into one large string:
grep -i "Count:" newline.txt | sed 's/[^0-9]//g'
output: 0003430020160506155539
How do I get just the digits after Count:, up to the first non-digit character?
I just need 00034300.
Using sed
$ sed '/[Cc]ount/ s/[^:]*: *//; s/,.*//' newline.txt
00034300
How it works:
/[Cc]ount/ selects lines containing Count or count. This eliminates the need for grep.
s/[^:]*: *// removes everything up to the first colon including any spaces after the colon.
In what remains, s/,.*// removes everything after the first comma.
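A single substitution with a capture group would also do the same thing (a sketch, not part of the original answer):
$ sed -n 's/.*Count: *\([0-9][0-9]*\).*/\1/p' newline.txt
00034300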
Using awk
$ awk -F'[[:blank:],]' '/[Cc]ount/ {print $4}' newline.txt
00034300
How it works:
-F'[[:blank:],]' tells awk to treat spaces, tabs, and commas as field separators.
/[Cc]ount/ selects lines that contain Count or count.
print $4 prints the fourth field on the selected lines.
Using grep
$ grep -oiP '(?<=Count: )[[:digit:]]+' newline.txt
00034300
This looks for any numbers following Count: and prints them.
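An equivalent form uses PCRE's \K to drop the matched prefix from what is printed (a sketch, not part of the original answer):
$ grep -oiP 'Count: \K[[:digit:]]+' newline.txt
00034300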

AWK to check a string pattern and extract it from a file [duplicate]

This question already has answers here:
How to print matched regex pattern using awk?
(9 answers)
How to print regexp matches using awk? [duplicate]
(3 answers)
Closed 7 years ago.
Below are the file contents:
{30001002|XXparameter|XSD_LOC|$\{FILES_DIR\}/xsd/EDXFB_mbr_demo.xsd|3|2|$|#{0|}}
{30001002|XXparameter|source_files|$XSD/EDXFB_mbr_demo.xsd|3|1|l|#{0|}}
I am trying to accomplish the following using awk:
First, I want to search for the string pattern "EDXFB*.xsd".
If it exists, then extract the strings that start with "EDXFB" and end with ".xsd".
Output:
EDXFB_mbr_demo.xsd
EDXFB_mbr_demo.xsd
The basic awk pattern to extract the expression and print out the matched data is the following:
gawk 'match($0, /EDXFB.+\.xsd/, a) { print a[0] }'
Though, you should really spend some time reading the awk manual.
And the regular expression could be changed to /EDXFB[a-z_]+\.xsd/ if it contains only lower-cased characters and _.
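With that tighter expression, the full command would look like this (the filename is just a placeholder, since the question does not name the file):
gawk 'match($0, /EDXFB[a-z_]+\.xsd/, a) { print a[0] }' file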
[EDIT]: Updated with cleaner code from #JID. Thanks :)
Here is one way to do it:
awk -F/ '/EDXFB.*\.xsd/ {split($NF,a,"|");print a[1]}' file
EDXFB_mbr_demo.xsd
EDXFB_mbr_demo.xsd
It separates the line by / and then prints the last field up to the first |.
In your example, probably grep would do what you want:
grep -o 'EDXFB.*\.xsd'
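Run against the two lines shown above (the filename is just a placeholder, since the question does not give one), that produces:
$ grep -o 'EDXFB.*\.xsd' file
EDXFB_mbr_demo.xsd
EDXFB_mbr_demo.xsd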

passing variable to this particular sed command [duplicate]

This question already has answers here:
bash sed fail in while loop
(2 answers)
Closed 8 years ago.
I am first searching for a keyword, and once that keyword is found in a file, from that particular line I am supposed to delete till the end of the file.
#! /bin/csh -f
set sa = `grep -n -m 1 "^Pattern" file`
set s = `echo "$sa" | cut -d':' -f1`
set m = `sed '$s,$d' file | tee see > /dev/null`
So the first line gives me the matching line with its line number, in the second line I am getting the line number, and in the third line I am trying to delete from line $s (say 20) till the last line, but it is not working. I have tried all combinations but it does not take the variable $s. Please help.
But you can do it much more easily with a single line of sed:
sed -n '/SEARCHPATTERN/q;p' file
-n tells sed not to print lines by default
/SEARCHPATTERN/q quits as soon as the search pattern is found
p otherwise prints the lines (i.e. everything before the match)
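Applied to the question (a sketch; the pattern ^Pattern, the input file, and the output file see are taken from the script above):
sed -n '/^Pattern/q;p' file > see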
You need to take $s out of the quotes so it will be expanded.
set m = `sed $s',$d' file | tee see > /dev/null`

Using grep to search DNA sequence files

I am trying to use Unix's grep to search for specific sequences within files. The files are usually very large (~1 GB) of 'A's, 'T's, 'C's, and 'G's. These files also span many, many lines, with each line being a word of 60-ish characters. The problem I am having is that when I search for a specific sequence within these files, grep will return results for a pattern that occurs on a single line, but not if the pattern spans a line (has a line break somewhere in the middle). For example:
Using
$ grep -i -n "GACGGCT" grep3.txt
To search the file grep3.txt (I put the target 'GACGGCT's in double stars)
GGGCTTCGA**GACGGCT**GACGGCTGCCGTGGAGTCT
CCAGACCTGGCCCTCCCTGGCAGGAGGAGCCTG**GA
CGGCT**AGGTGAGAGCCAGCTCCAAGGCCTCTGGGC
CACCAGGCCAGCTCAGGCCACCCCTTCCCCAGTCA
CCCCCCAAGAGGTGCCCCAGACAGAGCAGGGGCCA
GGCGCCCTGAGGC**GACGGCT**CTCAGCCTCCGCCCC
Returns
3:GGGCTTCGAGACGGCTGACGGCTGCCGTGGAGTCT
8:GGCGCCCTGAGGCGACGGCTCTCAGCCTCCGCCCC
So, my problem here is that grep does not find the GACGGCT that spans the end of line 2 and the beginning of line 3.
How can I use grep to find target sequences that may or may not include a linebreak at any point in the string? Or how can I tell grep to ignore linebreaks in the target string? Is there a simple way to do this?
pcregrep can match across line boundaries with -M (multi-line mode), so you can allow an optional newline between every character of the pattern:
pcregrep -nM "G[\n]?A[\n]?C[\n]?G[\n]?G[\n]?C[\n]?T" grep3.txt
1:GGGCTTCGAGACGGCTGACGGCTGCCGTGGAGTCT
2:CCAGACCTGGCCCTCCCTGGCAGGAGGAGCCTGGA
CGGCTAGGTGAGAGCCAGCTCCAAGGCCTCTGGGC
6:GGCGCCCTGAGGCGACGGCTCTCAGCCTCCGCCCC
I assume that each of your lines is 60 characters long. Then the command below should work:
tr '\n' ' ' < grep3.txt | sed -e 's/ //g' -e 's/.\{60\}/&^/g' | tr '^' '\n' | grep -i -n "GACGGCT"
Output:
1:GGGCTTCGA**GACGGCT**GACGGCTGCCGTGGAGTCTCCAGACCTGGCCCTCCCTGGC
2:AGGAGGAGCCTG**GACGGCT**AGGTGAGAGCCAGCTCCAAGGCCTCTGGGCCACCAGG
4:CCAGGCGCCCTGAGGC**GACGGCT**CTCAGCCTCCGCCCC
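If the line numbers are not essential, another option (a sketch, not from the original answers) is to strip the newlines before searching, so a match can never be split across lines:
$ tr -d '\n' < grep3.txt | grep -o -i "GACGGCT" | wc -l
4
(4 matches in the excerpt above: two on the first line, one spanning lines 2 and 3, and one on the last line.)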
