AWK to check a string pattern and extract it from a file [duplicate] - unix

Below are the file contents:
{30001002|XXparameter|XSD_LOC|$\{FILES_DIR\}/xsd/EDXFB_mbr_demo.xsd|3|2|$|#{0|}}
{30001002|XXparameter|source_files|$XSD/EDXFB_mbr_demo.xsd|3|1|l|#{0|}}
I am trying to accomplish the following using awk:
First, I want to search for the string pattern "EDXFB*.xsd".
If it exists, extract the strings that start with "EDXFB" and end with ".xsd".
Output:
EDXFB_mbr_demo.xsd
EDXFB_mbr_demo.xsd

The basic awk pattern to extract the match and print it is the following:
gawk 'match($0, /EDXFB.+\.xsd/, a) { print a[0] }' file
Note that the three-argument form of match() is a gawk extension. That said, you should really spend some time reading the awk manual.
The regular expression could be tightened to /EDXFB[a-z_]+\.xsd/ if the name contains only lowercase letters and _.
[EDIT]: Updated with cleaner code from @JID. Thanks :)
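The array argument to match() used above is specific to gawk. A minimal sketch of a portable variant, assuming a POSIX awk and relying on the built-in RSTART and RLENGTH variables; the character class and the variable names input and result are illustrative choices, not part of the original answer:

```shell
# Two sample lines copied from the question.
input='{30001002|XXparameter|XSD_LOC|$\{FILES_DIR\}/xsd/EDXFB_mbr_demo.xsd|3|2|$|#{0|}}
{30001002|XXparameter|source_files|$XSD/EDXFB_mbr_demo.xsd|3|1|l|#{0|}}'

# match() sets RSTART and RLENGTH; substr() then cuts the matched text out.
result=$(printf '%s\n' "$input" |
  awk 'match($0, /EDXFB[A-Za-z0-9_]*\.xsd/) { print substr($0, RSTART, RLENGTH) }')

printf '%s\n' "$result"
```

Only the first match on each line is printed, which is enough for the sample data.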

Here is one way to do it:
awk -F/ '/EDXFB.*\.xsd/ {split($NF,a,"|");print a[1]}' file
EDXFB_mbr_demo.xsd
EDXFB_mbr_demo.xsd
It splits each line on / and then prints the last field up to the first |.

In your example, probably grep would do what you want:
grep -o 'EDXFB.*\.xsd'
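One caveat is that .* is greedy: if a line held two .xsd names, the match would run from the first EDXFB to the last .xsd. A sketch that narrows the match, assuming file names never contain / or | (an assumption based on the sample data) and that grep supports -o, as GNU and BSD grep do; the sample line is invented for illustration:

```shell
# Hypothetical line with two names, to show the effect of the narrower class.
line='a|x/EDXFB_one.xsd|y/EDXFB_two.xsd|'

# [^/|]* stops the match at the next path or field delimiter.
result=$(printf '%s\n' "$line" | grep -o 'EDXFB[^/|]*\.xsd')

printf '%s\n' "$result"
```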


Unix substitute multiple strings using a reference file [duplicate]

I have a reference file, and I want to use it to replace strings in multiple files in a directory. I am using awk's gsub for that; however, it replaces every occurrence of the string rather than only whole-word matches. How can I stop that behaviour and replace just the word? In this case the word is "IT".
My reference file
$ cat dev_to_prod.config
nonprod_DATA_PATH PROD_DATA_PATH
nonprod_ENCRYPTKEY PROD_ENCRYPTKEY
IT Business
My current data file
$ cat file.txt
IT
WITH
/IT/DFGh/erfe
/WITH/IT/sjfgh/hjIT/dfdsf/ITvjkl
Output with current code
awk 'FNR==NR{A[$1]=$2;next}{for(i in A)gsub(i,A[i])}1' dev_to_prod.config file.txt
Business
WBusinessH
/Business/DFGh/erfe
/WBusinessH/Business/sjfgh/hjBusiness/dfdsf/Businessvjkl
man awk says:
\< matches the empty string at the beginning of a word.
\> matches the empty string at the end of a word.
These word-boundary operators are GNU awk extensions, so with gawk you could try:
awk 'FNR==NR{A[$1]=$2;next}{for(i in A)gsub("\\<"i"\\>",A[i])}1' dev_to_prod.config file.txt
Output:
Business
WITH
/Business/DFGh/erfe
/WITH/Business/sjfgh/hjIT/dfdsf/ITvjkl
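Where \< and \> are not available (they are gawk extensions), the same effect can be approximated by checking the neighbouring characters explicitly. A minimal sketch, assuming a POSIX awk, that a "word" is made of letters, digits and underscores, and that the keys in the reference file contain no regex metacharacters; replace_word and map are hypothetical names introduced here:

```shell
# Scratch copies of the two files from the question.
dir=$(mktemp -d)
printf '%s\n' 'nonprod_DATA_PATH PROD_DATA_PATH' 'nonprod_ENCRYPTKEY PROD_ENCRYPTKEY' 'IT Business' > "$dir/dev_to_prod.config"
printf '%s\n' 'IT' 'WITH' '/IT/DFGh/erfe' '/WITH/IT/sjfgh/hjIT/dfdsf/ITvjkl' > "$dir/file.txt"

result=$(awk '
# replace_word is a hypothetical helper: it replaces "word" in s only where
# the characters on both sides are not letters, digits or underscores.
function replace_word(s, word, repl,    out, re) {
    s = " " s " "                              # sentinels give every word two neighbours
    re = "[^A-Za-z0-9_]" word "[^A-Za-z0-9_]"
    out = ""
    while (match(s, re)) {
        out = out substr(s, 1, RSTART) repl    # text up to and including the left neighbour
        s = substr(s, RSTART + RLENGTH - 1)    # resume at the right neighbour
    }
    out = out s
    return substr(out, 2, length(out) - 2)     # drop the two sentinels
}
FNR == NR { map[$1] = $2; next }               # first file: load the mapping
{ for (w in map) $0 = replace_word($0, w, map[w]); print }
' "$dir/dev_to_prod.config" "$dir/file.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```

Resuming the scan at the right-hand neighbour lets one delimiter close one word and open the next, so adjacent words such as IT/IT are both replaced.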

Unix command for replacing strings which have "/" [duplicate]

I know that to replace a string (in files that contain matchstring), we can use the following command:
grep -rl matchstring somedir/ | xargs sed -i 's/string1/string2/g'
How can I use/change the command if my string has special characters like "/"?
For example:
string1: "/home/folder1"
string2: "/home/folder1/folder2"
As @jamieguinan mentioned in his comment, almost any delimiter character can be used. So I changed the command as follows:
grep -rl matchstring somedir/ | xargs sed -i 's,string1,string2,g'
where string1 and string2 are /home/folder1 and /home/folder1/folder2, respectively.
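A small sketch of the two usual options, run on an invented sample line rather than on real files: choosing a delimiter that does not occur in either string, or keeping / and escaping each slash. The variable names are illustrative only.

```shell
line='path=/home/folder1'

# Option 1: use | as the delimiter, so the slashes need no escaping.
result=$(printf '%s\n' "$line" | sed 's|/home/folder1|/home/folder1/folder2|g')

# Option 2: keep / as the delimiter and escape every slash in both strings.
result_escaped=$(printf '%s\n' "$line" | sed 's/\/home\/folder1/\/home\/folder1\/folder2/g')

printf '%s\n' "$result" "$result_escaped"
```

The chosen delimiter must not appear in either string, otherwise it has to be escaped in turn.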

How to check for special characters in UNIX? [duplicate]

I have a fixed-width file with 10-15 columns. The file contains alphanumeric values. How do I check for any special characters (like !, @, #, $, % etc.) in the entire file in UNIX?
Try this:
grep -vn "^[a-zA-Z0-9]*$" yourFile
or
grep -vn "^[[:alnum:]]*$" yourFile
From man grep:
-v, --invert-match
    Invert the sense of matching, to select non-matching lines.
-n, --line-number
    Prefix each line of output with the 1-based line number within its input file. (-n is specified by POSIX.)
[[:alnum:]] is the character class of letters and digits in the current locale.
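Both commands above also report lines that contain spaces, which a fixed-width file is likely to use as padding. A sketch that tolerates whitespace, assuming blanks are acceptable and only other non-alphanumeric characters should be flagged; the sample data is invented for illustration:

```shell
data='abc 123
ab#c 45
xyz'

# Print, with line numbers, every line holding a character that is
# neither alphanumeric nor whitespace.
result=$(printf '%s\n' "$data" | grep -n '[^[:alnum:][:space:]]')

printf '%s\n' "$result"
```

This matches on the offending character directly instead of inverting a whole-line match, so -v is not needed.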

Unix - sed get value from a line after a first colon [duplicate]

I have a file (newline.txt) that contains the following line
Footer - Count: 00034300, Facility: TRACE, File Created: 20160506155539
I am trying to get the value after Count: up to the comma (in the example 00034300) from this line.
I tried the following, but all I get is every number on the line concatenated into one long string:
grep -i "Count:" newline.txt | sed 's/[^0-9]//g'
Output: 0003430020160506155539
How do I get just the digits after Count: up to the first non-digit character?
I just need 00034300.
Using sed
$ sed '/[Cc]ount/ s/[^:]*: *//; s/,.*//' newline.txt
00034300
How it works:
/[Cc]ount/ selects lines containing Count or count. This eliminates the need for grep.
s/[^:]*: *// removes everything up to the first colon including any spaces after the colon.
In what remains, s/,.*// removes everything after the first comma.
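On a file with more than one line, the command above would also print the lines that do not contain Count, since sed prints every line by default. A sketch that prints only the extracted value, using -n with the p flag and a capture group; the first sample line is invented to show the filtering:

```shell
data='Header - Type: TRACE
Footer - Count: 00034300, Facility: TRACE, File Created: 20160506155539'

# -n suppresses the default output; the p flag prints only lines where the
# substitution succeeded, and \1 is the run of digits after "Count:".
result=$(printf '%s\n' "$data" | sed -n 's/.*[Cc]ount: *\([0-9][0-9]*\).*/\1/p')

printf '%s\n' "$result"
```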
Using awk
$ awk -F'[[:blank:],]' '/[Cc]ount/ {print $4}' newline.txt
00034300
How it works:
-F'[[:blank:],]' tells awk to treat spaces, tabs, and commas as field separators.
/[Cc]ount/ selects lines that contain Count or count.
print $4 prints the fourth field on the selected lines.
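The fourth-field approach relies on the exact number of words before the value. A sketch that does not depend on field position, assuming the label is spelled Count: exactly, by making the label itself the field separator:

```shell
line='Footer - Count: 00034300, Facility: TRACE, File Created: 20160506155539'

# Everything after "Count: " becomes $2; sub() then trims it at the first non-digit.
result=$(printf '%s\n' "$line" |
  awk -F'Count: *' 'NF > 1 { sub(/[^0-9].*/, "", $2); print $2 }')

printf '%s\n' "$result"
```

The NF > 1 test skips lines that do not contain the label at all.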
Using grep
$ grep -oiP '(?<=Count: )[[:digit:]]+' newline.txt
00034300
This looks for any numbers following Count: and prints them.

Field separator to be used only when not escaped, using awk

I have one question. Suppose I am using "=" as the field separator, and my string contains, for example:
abc=def\=jkl
If I use = as the field separator, it will split into 3 fields:
abc def\ jkl
But as I have escaped the 2nd "=", my output should be:
abc def\=jkl
Can anyone please suggest how I can achieve this?
Thanks in advance
I find it simplest to convert the offending string to some other string or character that doesn't appear in your input records, process the record as normal, and convert back within each field when necessary. I tend to use RS as the placeholder if it's not a regexp*, since it cannot appear within a record, or the awk builtin SUBSEP otherwise, since if that appears in your input you have other problems. For example:
$ cat file
abc=def\=jkl
$ awk -F= '{
    gsub(/\\=/,RS)
    for (i=1; i<=NF; i++) {
        gsub(RS,"\\=",$i)
        print i":"$i
    }
}' file
1:abc
2:def\=jkl
* The issue with using RS if it is an RE (i.e. multiple characters) is that the gsub(RS...) within the loop could match a string that didn't get resolved to a record separator initially, e.g.
$ echo "aa" | gawk -v RS='a$' '{gsub(RS,"foo",$1); print "$1=<"$1">"}'
$1=<afoo>
When the RS is a single character, e.g. the default newline, that cannot happen so it's safe to use.
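The SUBSEP alternative mentioned above can be sketched as follows, assuming a POSIX awk and that the SUBSEP character never occurs in the input. The helper restore and the array name field are hypothetical; the fields are rebuilt with split() and concatenation so the result does not depend on how gsub() treats backslashes in a replacement string:

```shell
line='abc=def\=jkl=foo\=bar=baz'

result=$(printf '%s\n' "$line" | awk '
# restore is a hypothetical helper that turns each placeholder back into \=
function restore(s,    n, parts, i, out) {
    n = split(s, parts, SUBSEP)
    out = parts[1]
    for (i = 2; i <= n; i++) out = out "\\=" parts[i]
    return out
}
{
    gsub(/\\=/, SUBSEP)              # hide every escaped separator
    n = split($0, field, "=")        # split on the remaining, unescaped ones
    for (i = 1; i <= n; i++) print i ":" restore(field[i])
}')

printf '%s\n' "$result"
```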
For input like the example in your question, it can be done.
awk doesn't support look-around in regular expressions, so it is difficult to get what you want just by setting FS.
If I were you, I would do some preprocessing to make the data easier for awk to handle. Or you could read the line and use other awk functions, e.g. gensub(), to remove the = characters you don't want in the result, and then split... But I guess you want to achieve the goal via the field separator, so I won't give those solutions here.
However, it can be done with gawk's FPAT variable:
awk -vFPAT='\\w*(\\\\=)?\\w*' '...' file
This will work for your example. I am not sure if it will work for your real data.
Let's take an example and split this string: "abc=def\=jkl=foo\=bar=baz"
kent$ echo "abc=def\=jkl=foo\=bar=baz"|awk -vFPAT='\\w*(\\\\=)?\\w*' '{for(i=1;i<=NF;i++)print $i}'
abc
def\=jkl
foo\=bar
baz
I think you want that result, don't you?
My awk version:
kent$ awk --version|head -1
GNU Awk 4.0.2
