This question already has answers here:
How can I use grep to show just filenames on Linux? [closed]
(3 answers)
Closed 4 years ago.
I'm trying to show only unique filenames when I grep for a certain string. Currently I'm getting multiple results with the same filename if the string appears several times inside a file.
For example:
If I have a string "string12345" that appears on 3 lines in filename1.txt and on 3 lines in filename2.txt, then grep 'string12345' *.txt shows 3 occurrences each of filename1.txt and filename2.txt.
Now what I'm trying to achieve is to show only 1 occurrence of filename1.txt and filename2.txt. Thank you in advance.
Use the -l flag.
test.txt:
Hello, World!
Hello, World!
Grep search:
$ grep ./ -re "Hello"
./test.txt:Hello, World!
./test.txt:Hello, World!
$ grep ./ -re "Hello" -l
./test.txt
From the manual:
-l, --files-with-matches
Suppress normal output; instead print the name of each input file from which output would normally have been printed. The scanning will stop on the first match.
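To tie this back to the question, here is a small sketch with two files that each contain the string three times. The temporary directory and the contents of the files are made up for illustration; only the file names and the search string come from the question.

```shell
#!/bin/sh
# Hypothetical demo directory, created only for this example.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Two files with three matching lines each, and one file with none.
printf 'string12345\nstring12345\nstring12345\n' > filename1.txt
printf 'string12345\nother\nstring12345\nstring12345\n' > filename2.txt
printf 'nothing to see\n' > filename3.txt

# -l prints each matching file name once, however many lines match.
matches=$(grep -l 'string12345' *.txt)
printf '%s\n' "$matches"
```

Each matching file is listed once, and filename3.txt is not listed at all.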
This question already has answers here:
Match exact word with awk on Mac OS X
(4 answers)
Closed 11 months ago.
I have a reference file, and I want to use it to replace words in multiple files in a directory. I am using awk's gsub for that; however, it does not replace only the exact word but every occurrence of the string. How can I stop that behaviour and replace just the whole word? In this case the word is "IT".
My reference file
$ cat dev_to_prod.config
nonprod_DATA_PATH PROD_DATA_PATH
nonprod_ENCRYPTKEY PROD_ENCRYPTKEY
IT Business
My current data file
$ cat file.txt
IT
WITH
/IT/DFGh/erfe
/WITH/IT/sjfgh/hjIT/dfdsf/ITvjkl
Output with current code
awk 'FNR==NR{A[$1]=$2;next}{for(i in A)gsub(i,A[i])}1' dev_to_prod.config file.txt
Business
WBusinessH
/Business/DFGh/erfe
/WBusinessH/Business/sjfgh/hjBusiness/dfdsf/Businessvjkl
man awk (GNU awk) says:
\< matches the empty string at the beginning of a word.
\> matches the empty string at the end of a word.
Then would you please try:
awk 'FNR==NR{A[$1]=$2;next}{for(i in A)gsub("\\<"i"\\>",A[i])}1' dev_to_prod.config file.txt
Output:
Business
WITH
/Business/DFGh/erfe
/WITH/Business/sjfgh/hjIT/dfdsf/ITvjkl
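The \< and \> operators are GNU awk extensions, so the command above may not work with other awk implementations (the linked duplicate is about macOS). As a rough sketch of a portable alternative, the following uses only index() and substr(). The helper name replace_word is hypothetical, and it assumes a "word character" means a letter, digit, or underscore. The files are recreated in a temporary directory just for the demo.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
demo_dir=$(mktemp -d)
cat > "$demo_dir/dev_to_prod.config" <<'EOF'
nonprod_DATA_PATH PROD_DATA_PATH
nonprod_ENCRYPTKEY PROD_ENCRYPTKEY
IT Business
EOF
cat > "$demo_dir/file.txt" <<'EOF'
IT
WITH
/IT/DFGh/erfe
/WITH/IT/sjfgh/hjIT/dfdsf/ITvjkl
EOF

result=$(awk '
# Hypothetical helper: replace occurrences of the literal string w in s
# with r, but only where w is not adjacent to a letter, digit, or underscore.
function replace_word(s, w, r,    out, pos, before, after, prev) {
    out = ""; prev = ""
    while ((pos = index(s, w)) > 0) {
        # Character before the match; prev covers a match at the cut point.
        before = (pos > 1) ? substr(s, pos - 1, 1) : prev
        after  = substr(s, pos + length(w), 1)
        if (before !~ /[A-Za-z0-9_]/ && after !~ /[A-Za-z0-9_]/)
            out = out substr(s, 1, pos - 1) r              # whole word: replace
        else
            out = out substr(s, 1, pos + length(w) - 1)    # part of a longer word: keep
        prev = substr(w, length(w), 1)
        s = substr(s, pos + length(w))
    }
    return out s
}
FNR == NR { A[$1] = $2; next }     # first file: load the mapping
{ for (i in A) $0 = replace_word($0, i, A[i]); print }
' "$demo_dir/dev_to_prod.config" "$demo_dir/file.txt")
printf '%s\n' "$result"
```

Because the keys are matched with index() rather than as regular expressions, characters such as . or / in a key are taken literally, which may or may not be what you want.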
This question already has answers here:
Extract one word after a specific word on the same line [duplicate]
(4 answers)
Closed 6 years ago.
I have a file (newline.txt) that contains the following line
Footer - Count: 00034300, Facility: TRACE, File Created: 20160506155539
I am trying to get the value after Count: up to the comma (in the example 00034300) from this line.
I tried this, but all I get is all the numbers concatenated into one large string:
grep -i "Count:" newline.txt | sed 's/[^0-9]//g'
output: 0003430020160506155539
How do I get just the digits after Count: up to the first non-digit character?
I just need 00034300.
Using sed
$ sed -n '/[Cc]ount/{s/[^:]*: *//; s/,.*//; p;}' newline.txt
00034300
How it works:
-n suppresses automatic printing, and /[Cc]ount/ selects lines containing Count or count, so only those lines are processed. This eliminates the need for grep.
s/[^:]*: *// removes everything up to the first colon including any spaces after the colon.
In what remains, s/,.*// removes everything from the first comma onward, and p prints the result.
Using awk
$ awk -F'[[:blank:],]' '/[Cc]ount/ {print $4}' newline.txt
00034300
How it works:
-F'[[:blank:],]' tells awk to treat spaces, tabs, and commas as field separators.
/[Cc]ount/ selects lines that contain Count or count.
print $4 prints the fourth field on the selected lines.
Using grep
$ grep -oiP '(?<=Count: )[[:digit:]]+' newline.txt
00034300
This looks for the digits immediately following "Count: " and prints only those digits.
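If grep -P is not available (it depends on how grep was built), a capture group in sed is another option. This is only a sketch: it feeds the sample line from the question through a pipe instead of reading newline.txt, and it assumes a sed that accepts -E for extended regular expressions.

```shell
#!/bin/sh
# Sample line copied from the question.
line='Footer - Count: 00034300, Facility: TRACE, File Created: 20160506155539'

# -n plus the p flag prints only lines where the substitution succeeded;
# the parentheses capture the run of digits after "Count:".
count=$(printf '%s\n' "$line" | sed -nE 's/.*[Cc]ount: *([0-9]+).*/\1/p')
printf '%s\n' "$count"
```

To run it against the file, replace the printf pipeline with the file name as the last argument to sed.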
This question already has answers here:
How to print matched regex pattern using awk?
(9 answers)
How to print regexp matches using awk? [duplicate]
(3 answers)
Closed 7 years ago.
Below are the file contents:
{30001002|XXparameter|XSD_LOC|$\{FILES_DIR\}/xsd/EDXFB_mbr_demo.xsd|3|2|$|#{0|}}
{30001002|XXparameter|source_files|$XSD/EDXFB_mbr_demo.xsd|3|1|l|#{0|}}
I am trying to accomplish the following using awk:
First, I want to search for the string pattern "EDXFB*.xsd".
If it exists, then extract the strings that start with "EDXFB" and end with ".xsd".
Output:
EDXFB_mbr_demo.xsd
EDXFB_mbr_demo.xsd
The basic awk pattern to extract the expression and print the matched data is the following:
gawk 'match($0, /EDXFB.+\.xsd/, a) { print a[0] }'
That said, you should really spend some time reading the awk manual.
The regular expression could be changed to /EDXFB[a-z_]+\.xsd/ if the name contains only lowercase letters and _.
[EDIT]: Updated with cleaner code from @JID. Thanks :)
Here is one way to do it:
awk -F/ '/EDXFB.*\.xsd/ {split($NF,a,"|");print a[1]}' file
EDXFB_mbr_demo.xsd
EDXFB_mbr_demo.xsd
It splits the line on /, then prints the last field up to the first |.
In your example, probably grep would do what you want:
grep -o 'EDXFB.*\.xsd' file
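The three-argument form of match() used in the gawk answer is a GNU extension. As a sketch of the same idea for other awk implementations, the two-argument match() sets RSTART and RLENGTH, which substr() can then use. The pattern [^|]* is my assumption that a file name never contains |; the temporary file just holds the sample lines from the question.

```shell
#!/bin/sh
# Sample input copied from the question into a scratch file.
demo_file=$(mktemp)
cat > "$demo_file" <<'EOF'
{30001002|XXparameter|XSD_LOC|$\{FILES_DIR\}/xsd/EDXFB_mbr_demo.xsd|3|2|$|#{0|}}
{30001002|XXparameter|source_files|$XSD/EDXFB_mbr_demo.xsd|3|1|l|#{0|}}
EOF

# match() returns nonzero on success and records where the match starts
# (RSTART) and how long it is (RLENGTH).
names=$(awk 'match($0, /EDXFB[^|]*\.xsd/) { print substr($0, RSTART, RLENGTH) }' "$demo_file")
printf '%s\n' "$names"
```

Like the other answers, this prints only the first match on each line.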
This question already has answers here:
bash sed fail in while loop
(2 answers)
Closed 8 years ago.
I am first searching for a keyword; once that keyword is found in a file, I am supposed to delete everything from that line to the end of the file.
#! /bin/csh -f
set sa = `grep -n -m 1 "^Pattern" file`
set s = `echo "$sa" | cut -d':' -f1`
set m = `sed '$s,$d' file | tee see > /dev/null`
So the first line gives me the matching line with its line number, the second line extracts the line number, and the third line tries to delete from line $s (say 20) to the last line, but it is not working. I have tried all combinations, but sed does not pick up the variable $s. Please help.
You can do it much more easily with a single line of sed:
sed -n '/SEARCHPATTERN/q;p' file
-n tells sed not to print lines automatically
/SEARCHPATTERN/q quits at the first line that matches the search pattern
;p otherwise prints the line
You need to take $s out of the quotes so it will be expanded.
set m = `sed $s',$d' file | tee see > /dev/null`
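For comparison, here is a small sketch of two pattern-based ways to drop everything from the first matching line onward, without computing a line number first. It is written for a POSIX shell rather than csh, and the sample file contents are invented for the demo.

```shell
#!/bin/sh
# Hypothetical sample file: two lines to keep, then the marker line.
demo_file=$(mktemp)
printf 'keep 1\nkeep 2\nPattern starts here\ndrop 1\ndrop 2\n' > "$demo_file"

# Variant 1: print lines until the pattern matches, then quit.
kept_q=$(sed -n '/^Pattern/q;p' "$demo_file")

# Variant 2: delete the range from the first match through the last line.
kept_d=$(sed '/^Pattern/,$d' "$demo_file")

printf '%s\n' "$kept_q"
```

Both variants leave the input file untouched and write the result to standard output. The q form stops reading at the match, which can matter for large files.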
I am trying to use Unix's grep to search for specific sequences within files. The files are usually very large (~1 GB) and consist of 'A's, 'T's, 'C's, and 'G's. They also span many, many lines, each line being a word of 60ish characters. The problem I am having is that when I search for a specific sequence, grep returns matches where the pattern occurs on a single line, but not where the pattern spans a line break. For example:
Using
$ grep -i -n "GACGGCT" grep3.txt
To search the file grep3.txt (I put the target 'GACGGCT's in double stars)
GGGCTTCGA**GACGGCT**GACGGCTGCCGTGGAGTCT
CCAGACCTGGCCCTCCCTGGCAGGAGGAGCCTG**GA
CGGCT**AGGTGAGAGCCAGCTCCAAGGCCTCTGGGC
CACCAGGCCAGCTCAGGCCACCCCTTCCCCAGTCA
CCCCCCAAGAGGTGCCCCAGACAGAGCAGGGGCCA
GGCGCCCTGAGGC**GACGGCT**CTCAGCCTCCGCCCC
Returns
3:GGGCTTCGAGACGGCTGACGGCTGCCGTGGAGTCT
8:GGCGCCCTGAGGCGACGGCTCTCAGCCTCCGCCCC
So, my problem here is that grep does not find the GACGGCT that spans the end of line 2 and the beginning of line 3.
How can I use grep to find target sequences that may or may not include a linebreak at any point in the string? Or how can I tell grep to ignore linebreaks in the target string? Is there a simple way to do this?
pcregrep -nM "G[\n]?A[\n]?C[\n]?G[\n]?G[\n]?C[\n]?T" grep3.txt
1:GGGCTTCGAGACGGCTGACGGCTGCCGTGGAGTCT
2:CCAGACCTGGCCCTCCCTGGCAGGAGGAGCCTGGA
CGGCTAGGTGAGAGCCAGCTCCAAGGCCTCTGGGC
6:GGCGCCCTGAGGCGACGGCTCTCAGCCTCCGCCCC
I assume each of your lines is 60 characters long. In that case the command below should work:
tr '\n' ' ' < grep3.txt | sed -e 's/ //g' -e 's/.\{60\}/&^/g' | tr '^' '\n' | grep -i -n "GACGGCT"
output :
1:GGGCTTCGA**GACGGCT**GACGGCTGCCGTGGAGTCTCCAGACCTGGCCCTCCCTGGC
2:AGGAGGAGCCTG**GACGGCT**AGGTGAGAGCCAGCTCCAAGGCCTCTGGGCCACCAGG
4:CCAGGCGCCCTGAGGC**GACGGCT**CTCAGCCTCCGCCCC
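If line numbers are not essential, another sketch is to strip the newlines and ask grep for byte offsets into the joined sequence. This assumes the file contains only sequence characters and newlines (no headers, no carriage returns) and a grep that supports -o and -b. The sample data is the question's example without the star markers.

```shell
#!/bin/sh
# Sample sequence from the question, written to a scratch file.
demo_file=$(mktemp)
cat > "$demo_file" <<'EOF'
GGGCTTCGAGACGGCTGACGGCTGCCGTGGAGTCT
CCAGACCTGGCCCTCCCTGGCAGGAGGAGCCTGGA
CGGCTAGGTGAGAGCCAGCTCCAAGGCCTCTGGGC
CACCAGGCCAGCTCAGGCCACCCCTTCCCCAGTCA
CCCCCCAAGAGGTGCCCCAGACAGAGCAGGGGCCA
GGCGCCCTGAGGCGACGGCTCTCAGCCTCCGCCCC
EOF

# Join all lines, then print each match with its 0-based byte offset.
hits=$(tr -d '\n' < "$demo_file" | grep -i -o -b 'GACGGCT')
hit_count=$(printf '%s\n' "$hits" | wc -l | tr -d ' ')
printf '%s\n' "$hits"
```

Each output line has the form offset:match, so a hit that straddles a line break in the original file is reported like any other. Note that the joined input is one very long line, which grep may need to hold in memory for a ~1 GB file.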