Can't extract pattern from filename

Can't extract pattern from filename - unix

Getting errors from the following sed command:
echo 20130521_OnePKI_p107336_APP.pfx | sed -e 's/_\([pP][0-9]+\)_/\1/'
Instead of returning p107336, it is returning the full filename 20130521_OnePKI_p107336_APP.pfx.
Any ideas why this is happening, and how I can restrict the output to only the pattern I would like?

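Capture groups in sed's default (basic) syntax are written with escaped parentheses, and GNU sed also accepts an I flag for case-insensitive matching. In basic syntax an unescaped + is a literal plus sign, so your pattern never matches and the input is printed unchanged. Even if it did match, you would be replacing only the matched portion with the captured text, so the rest of the filename would stay in place. This one matches the entire line and replaces it with the captured pattern: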
sed -e 's/.*_\([pP][0-9][0-9]*\)_.*/\1/'
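A small sketch of the same substitution, assuming a POSIX shell; the variable names name, extracted and missing are illustrative only. Adding -n and the p flag makes sed print nothing when the pattern does not match, rather than echoing the input unchanged:

```shell
name='20130521_OnePKI_p107336_APP.pfx'

# Match the whole line and keep only the captured p<digits> token.
# -n suppresses default output; the trailing p prints only after a successful substitution.
extracted=$(printf '%s\n' "$name" | sed -n 's/.*_\([pP][0-9][0-9]*\)_.*/\1/p')
printf '%s\n' "$extracted"   # p107336

# A name without the token yields an empty result instead of the full input.
missing=$(printf '%s\n' 'report_APP.pfx' | sed -n 's/.*_\([pP][0-9][0-9]*\)_.*/\1/p')
printf '[%s]\n' "$missing"   # []
```

An empty result is easier to test for in a script than a filename that came back untouched.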

An easier way might be to use grep:
echo 20130521_OnePKI_p107336_APP.pfx | egrep -o "[pP][0-9]+"
The "-o" tells grep to only print the matching part of the input.

The regex [pP][0-9]+ in principle matches any substring that begins with either p or P followed by one or more digits. The string "20130521_OnePKI_p107336_APP.pfx" has a substring matching that pattern so the whole string matches the regex.
When grouping with parentheses around the whole regex on the left side and referring to it on the right side like you did in 's/([pP][0-9]+)/\1/' you're basically saying "replace the match with itself", which will naturally result in the same string as you had in the first place.
What you need here is to match the whole string from beginning and then group a part of that string, as already indicated. Then you can refer to that part on the right side to extract it from the bigger string.
You will need to appropriately escape the expression when working in a shell.
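To see the difference described above, the following sketch runs both forms with sed -E (extended regular expressions, available in GNU and BSD sed), where parentheses and + need no backslashes. The variable names are illustrative.

```shell
name='20130521_OnePKI_p107336_APP.pfx'

# Replaces the match with itself: the string comes back unchanged.
same=$(printf '%s\n' "$name" | sed -E 's/([pP][0-9]+)/\1/')
printf '%s\n' "$same"    # 20130521_OnePKI_p107336_APP.pfx

# Matches the whole line and keeps only the group.
part=$(printf '%s\n' "$name" | sed -E 's/^.*_([pP][0-9]+)_.*$/\1/')
printf '%s\n' "$part"    # p107336
```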

You must escape the parentheses and the + (\+ is a GNU sed extension). Also match the whole string, with .* before and after the part you want, and replace all of it with just the captured part:
... | sed -e 's/^.*\([pP][0-9]\+\).*$/\1/'

Related

Replace double consonant letters with one using sed command

How can I replace double consonants with a single letter using the sed Linux command? Example: WILLIAM -> WILIAM. The command grep -E '(.)\1+' finds words that contain the same letter twice in a row, but how do I replace them with only one occurrence of the letter?
I tried
cat test.txt | head | tr -s '[^AEUIO\n]' '?'

tr is all or nothing; it will replace all occurrences of the selected characters, regardless of context. For regex replacement, look at sed - you even included this in your question's tags, but you don't seem to have explored how it might be useful?
sed 's/\(.\)\1/\1/g' test.txt
The dot matches any character; to restrict to only consonants, change it to [b-df-hj-np-tv-xz] or whatever makes sense (maybe extend to include upper case; perhaps include accented characters?)
The regex dialect understood by sed is more like the one understood by grep without -E (hence all the backslashes); though some sed implementations also support this option to select the POSIX extended regular expression dialect.
Neither sed nor tr needs cat to read standard input for it (though tr obscurely does not accept a file name argument). See tangentially also Useless use of cat?
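As a sketch of the consonant-only variant suggested above, with upper case included: the sample words and the variable name result are made up for the illustration, and LC_ALL=C is an added assumption so that the bracket ranges cover plain ASCII letters only.

```shell
# \( \) captures one consonant, \1 requires the same letter again,
# and the replacement keeps a single copy; g repeats this along the line.
result=$(printf '%s\n' 'WILLIAM' 'bookkeeper' |
  LC_ALL=C sed 's/\([b-df-hj-np-tv-xzB-DF-HJ-NP-TV-XZ]\)\1/\1/g')
printf '%s\n' "$result"
```

Doubled vowels such as the oo and ee in bookkeeper are left alone, because they are not in the bracket expression.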

Match one consonant, remember it in \( \), then match it again with \1 and replace the pair with the single captured letter. The g flag repeats this for every pair on the line.
sed 's/\([bcdfghjklmnpqrstvwxzBCDFGHJKLMNPQRSTVWXZ]\)\1/\1/g'

How can I use grep to get all files and words in each file which contains the suffix (ASC. or DEFG. or CDW.)

At the moment with - grep -row "ASC.*\| DEFG.*\|"
I get below result:
/data/de_pgms/programs/00_individuals/programs/parts:ASC */
/data/de_pgms/programs/00_individuals/programs/parts:ASC.LKP_DAILY_DATES
/data/de_pgms/programs/00_individuals/programs/parts:DEFG Analysts\DATA_REQUEST.XLSX";
/data/de_pgms/programs/00_individuals/programs/parts:DEFG_AA/Constrained Supplier List";
How do I make sure I only get results such as
/data/de_pgms/programs/00_individuals/programs/parts:ASC.LKP_DAILY_DATES
/data/de_pgms/programs/00_individuals/programs/acm:DEFG.EDS_MONTHLY_RUN
Question:
how can I use grep to get all files and words in each file which contain the suffix ASC. or DEFG. or CDW.?

grep -e ":ASC\..*$\|:DEFG\..*$" file
Try this. It turns your pattern into a regular expression with more context. Most of it is literal, like your expression, but \. matches a literal dot and the $ at the end anchors the match to the end of the line. Prepending the ":" to each alternative prevents some false matches too. Finally, .* matches zero or more of any character.
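One possible refinement, sketched here under assumptions: it searches the files directly, requires a literal dot after the prefix followed by at least one word character, and includes CDW. The temporary directory, its two files and their contents are invented for the demonstration.

```shell
# Hypothetical sample tree, created only for this demonstration.
demo=$(mktemp -d)
printf '%s\n' '/* ASC */' 'from ASC.LKP_DAILY_DATES;' \
  'path "DEFG Analysts\DATA_REQUEST.XLSX";' > "$demo/parts"
printf '%s\n' 'run DEFG.EDS_MONTHLY_RUN now' 'load CDW.SALES' > "$demo/acm"

# -r recurse, -o print only the match, -w whole words, -E extended syntax.
# \. is a literal dot; [[:alnum:]_]+ requires a name after it.
matches=$(grep -rowE '(ASC|DEFG|CDW)\.[[:alnum:]_]+' "$demo" | sort)
printf '%s\n' "$matches"
rm -rf "$demo"
```

Lines such as "ASC */" or "DEFG Analysts" no longer match, because the prefix is not followed by a dot and a name.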

Substitution with backreferencing in Atom editor

I have a frequent pattern in my text, say
(Eq. \ref{XXXX})
where XXXX is some word, and I'd like to change all this simply to
\refp{XXXX}
I can't make it work through Ctrl+F, even with Regex. The syntax
\(Eq. \\ref{.*}\)
works for finding the occurrences (if with some bugs...) but the traditional backreferencing
\\refp{\1}
won't work for the replacement.
I tried to create a custom command with the atom-shell-commands package, the idea would be to use sed on the current selection. But the package won't accept octal escape sequences.
Any thoughts?

The replacement tokens use a $ sigil, not \. So you want $1, $2, $3, ...
The replacement in this case should be:
\\refp{$1}
As is common with regex matching, these tokens match the contents of paren groups, from left to right. So you need to add matching parens also. Your match string would be:
\(Eq. \\ref{(.*)}\)
Note there are parens around the .* match, so whatever is inside those parens is stored in $1. If there were a second and third set of parens, those would become $2 and $3.
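The same capture-and-reuse idea can be tried from a shell with sed, which keeps the \1 notation. This is only a sketch of an equivalent substitution, not something Atom itself runs; the sample line is made up, and [^}]* is used instead of .* so that each match stops at the first closing brace.

```shell
line='see (Eq. \ref{energy}) and (Eq. \ref{mass})'

# In sed's basic syntax ( ) { } are literal; \( \) form the capture group.
# \\ matches one backslash, and \\refp in the replacement emits \refp.
fixed=$(printf '%s\n' "$line" | sed 's/(Eq\. \\ref{\([^}]*\)})/\\refp{\1}/g')
printf '%s\n' "$fixed"   # see \refp{energy} and \refp{mass}
```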

Delete line containing a specific string starting with dollar sign using unix sed

I am very new to Unix.
I have a parameter file Parameter.prm containing following lines.
$$ErrorTable1=ErrorTable1
$$Filename1_New=FileNew.txt
$$Filename1_Old=FileOld.txt
$$ErrorTable2=ErrorTable2
$$Filename2_New=FileNew.txt
$$Filename2_Old=FileOld.txt
$$ErrorTable3=ErrorTable3
$$Filename3_New=FileNew.txt
$$Filename3_Old=FileOld.txt
I want to get the output as
$$ErrorTable1=ErrorTable1
$$ErrorTable2=ErrorTable2
$$ErrorTable3=ErrorTable3
Basically, I need to delete the lines starting with $$Filename.
Since $ is a keyword, I am not able to interpret it as a string. How can I accomplish this using sed?

With sed:
$ sed '/$$Filename/d' infile
$$ErrorTable1=ErrorTable1
$$ErrorTable2=ErrorTable2
$$ErrorTable3=ErrorTable3
The /$$Filename/ part is the address, i.e., for all lines matching this, the command following it will be executed. The command is d, which deletes the line. Lines that don't match are just printed as is.
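A slightly more defensive variant, sketched with a made-up temporary file: escaping each $ and anchoring with ^ avoids relying on how a given sed treats an unescaped $ in the middle of a pattern, and deletes only lines that start with the prefix.

```shell
# Temporary parameter file, created only for this demonstration.
prm=$(mktemp)
printf '%s\n' '$$ErrorTable1=ErrorTable1' '$$Filename1_New=FileNew.txt' \
  '$$Filename1_Old=FileOld.txt' '$$ErrorTable2=ErrorTable2' > "$prm"

# \$ is a literal dollar sign; ^ anchors the match at the start of the line.
kept=$(sed '/^\$\$Filename/d' "$prm")
printf '%s\n' "$kept"
rm -f "$prm"
```

The single quotes matter here too: inside double quotes the shell would expand $$ to its own process ID before sed ever saw the pattern.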

Extracting information from a textfile based on pattern search is a job for grep:
grep ErrorTable file
or even
grep -F '$$ErrorTable' file
-F tells grep to treat the search term as a fixed string instead of a regular expression.
Just to answer your question, if a regular expression needs to search for characters which have a special meaning in the regex language, you need to escape them:
grep '\$\$ErrorTable' file
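If the goal is to drop the $$Filename lines rather than select the others, grep -v inverts the match. A minimal sketch, with sample input supplied inline instead of read from a file:

```shell
# -v keeps lines that do NOT match; -F treats $$Filename as a fixed string.
kept=$(printf '%s\n' '$$ErrorTable1=ErrorTable1' '$$Filename1_New=FileNew.txt' \
  '$$ErrorTable2=ErrorTable2' | grep -vF '$$Filename')
printf '%s\n' "$kept"
```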

Unix syntax for the grep command for only an ending character

For the file james, when I run this command:
cat james | grep ["."]
I get only the lines that contain a dot.
How do I get only the lines that end with a dot?

To find lines that end with a . character:
grep '\.$' james
Your cat command is unnecessary; grep is able to read the file itself, and doesn't need cat to do that job for it.
A . character by itself is special in regular expressions, matching any one character; you need to escape it with a \ to match a literal . character.
And you need to enclose the whole regular expression in single quotes because the \ and $ characters are special to the shell. In a regular expression, $ matches the end of a line. (You're dealing with some characters that are treated specially by the shell, and others that are treated specially by grep; the single quotes get the shell out of the way so you can control what grep sees.)
As for the square brackets you used in your question, that's another way to escape the ., but it's unusual. In a regular expression, [abc] matches a single character that's any of a, b, or c. [.] matches a single literal . character, since . loses its special meaning inside square brackets. The double quotes you used: ["."] are unnecessary, since . isn't a shell metacharacter -- but square brackets are special to the shell, with a similar meaning to their meaning in a regular expression. So your
grep ["."]
is equivalent to
grep [.]
The shell would normally expand [.] to a list of every visible file name that contains the single character .. There's always such a file, namely the current directory . -- but the shell's [] expansion ignores files whose names start with .. So since there's nothing to expand [.] to, it's left alone, and grep sees [.] as an argument, which just happens to work, matching lines that contain a literal . character. (Using a different shell, or the same shell with different settings, could mess that up.)
Note that the shell doesn't (except in some limited contexts) deal with regular expressions; rather it uses file matching patterns, which are less powerful.
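The quoting points above can be checked with a short sketch; the temporary file and its three lines are invented for the demonstration.

```shell
sample=$(mktemp)
printf '%s\n' 'ends with a dot.' 'dot. in the middle' 'no dot at all' > "$sample"

# Single quotes stop the shell from touching \ and $; grep sees \.$ exactly.
ending=$(grep '\.$' "$sample")
printf '%s\n' "$ending"   # ends with a dot.

# The bracket form is equivalent once it is quoted as well.
bracket=$(grep '[.]$' "$sample")
rm -f "$sample"
```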

You need to use $, which signals the end of the line:
cat james | grep '[.]$'
This also works:
cat james |grep "\.$"

You can use a regular expression. This is what you need:
cat james | grep "\.$"
See the grep man page for more information about regular expressions.
