Why won't the plus work properly with this sed command? - unix

I can't get the ([^/]+) sed regex to work properly.
Instead of returning all non-forward slash characters, it only returns one.
Command:
echo '/test/path/file.log' | sed -r 's|^.*([^/]+)/(.*)$|\1.\2|g'
Expected:
path.file.log
Result:
h.file.log
I also tried this, but got the same result:
echo '/test/path/file.log' | sed -r 's|^.*([^/]{1,})/(.*)$|\1.\2|g'

The problem is not with [^/]+, but with the preceding .*. Because .* is greedy, it consumes as much input as it can while still letting the rest of the pattern match, which leaves only the single character h for [^/]+. My usual suggestion would be to use .*? to make it non-greedy, but POSIX regexes don't support that syntax.
If there will always be a slash, you could add one to the regex to stop it from consuming too much.
$ echo '/test/path/file.log' | sed -r 's|^.*/([^/]+)/(.*)$|\1.\2|g'
path.file.log
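To see where the input actually goes, it can help to wrap the leading .* in its own group and print every group in brackets. This is only a diagnostic sketch; the function names greedy_groups and anchored_groups are made up for illustration, and -E is used here as the portable spelling of -r.

```shell
# Hypothetical helpers: print each capture group in [brackets]
# so the effect of the greedy leading .* becomes visible.

# Original pattern, with the leading .* captured as group 1.
greedy_groups() {
  sed -E 's|^(.*)([^/]+)/(.*)$|[\1][\2][\3]|'
}

# Fixed pattern: the extra slash stops .* from eating into "path".
anchored_groups() {
  sed -E 's|^(.*)/([^/]+)/(.*)$|[\1][\2][\3]|'
}

echo '/test/path/file.log' | greedy_groups     # [/test/pat][h][file.log]
echo '/test/path/file.log' | anchored_groups   # [/test][path][file.log]
```

In the first case .* takes everything up to the last character before the final slash, so [^/]+ is left with the minimum it needs: one character.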

Different operating systems ship different versions of sed. sed uses basic regexp (BRE) syntax by default; if you need extended regexp (ERE) syntax, of which + is one feature, you have to switch it on with -E (GNU sed also accepts -r).
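A quick way to check how a given sed treats + is to run the same substitution in both modes. The sketch below assumes a sed that accepts -E (current GNU and BSD versions do); the function names are invented for this example.

```shell
# Hypothetical helpers comparing + in basic and extended mode.

# BRE (default): an unescaped + is an ordinary plus sign.
plus_bre() { sed 's/a+/X/'; }

# ERE: + means "one or more of the preceding item".
plus_ere() { sed -E 's/a+/X/'; }

# BRE spelling of "one or more" that does not rely on extensions.
plus_bre_interval() { sed 's/a\{1,\}/X/'; }

echo 'caaat' | plus_bre            # caaat (no literal "a+" in the input)
echo 'a+b'   | plus_bre            # Xb    (matches the literal text "a+")
echo 'caaat' | plus_ere            # cXt
echo 'caaat' | plus_bre_interval   # cXt
```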

Related

Extracting a specific length substring using SED

I have the following SED command
echo "abcd_2222222233333333_jdkj" | sed -e 's/^\(.*\)_\(.*\)_\(.*\)$/\2_\1_\3/'
that returns
2222222233333333_abcd_jdkj
That's great, but I really want
22222222-33333333_abcd_jdkj
Is this possible with an easy tweak or do I need some non-sed solution? Basically, I know the number is 16 bytes, but I need to break it into two 8 byte numbers.
Instead of .* to match any number of characters, you can use .{8} to match exactly eight characters.
The below also uses sed -r to allow ERE syntax, which requires fewer backslashes and is generally easier to read than the default BRE. (On systems with BSD-style tools, this might be sed -E instead).
sed -re 's/^(.*)_(.{8})(.*)_(.*)$/\2-\3_\1_\4/' <<<"abcd_2222222233333333_jdkj"
By the way -- I would strongly suggest using [^_]* instead of .* so your regex can't match underscores where you don't want it to. (. means "any character"; [^_] means "any character except _"). That's not just a correctness enhancement -- it can also make your regex faster to evaluate by avoiding backtracking (where the regex engine realizes it's matched too much content and needs to undo some of its prior matches).
Also consider bash's built-in regex support:
string='abcd_2222222233333333_jdkj'
re='([^_]+)_([[:digit:]]{8})([[:digit:]]+)_(.*)'
if [[ $string =~ $re ]]; then
result=${BASH_REMATCH[2]}-${BASH_REMATCH[3]}_${BASH_REMATCH[1]}_${BASH_REMATCH[4]}
echo "Result is: $result"
else
echo "No match found"
fi
The solution based on the above commenter's tip works:
echo "abcd_2222222233333333_jdkj" | sed -e 's/^\(.*\)_\(.\{8\}\)\(.\{8\}\)_\(.*\)$/\2-\3_\1_\4/'
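For completeness, here is a sketch that combines the fixed-width groups with the [^_]* suggestion from the answer above. It assumes the middle field is exactly 16 digits (the question only says 16 bytes), and split_id is a made-up name.

```shell
# Hypothetical helper: split the 16-digit middle field into 8-8
# and move it to the front. [^_]* cannot run across an underscore.
split_id() {
  sed -E 's/^([^_]*)_([0-9]{8})([0-9]{8})_(.*)$/\2-\3_\1_\4/'
}

echo 'abcd_2222222233333333_jdkj' | split_id   # 22222222-33333333_abcd_jdkj
echo 'abcd_12345_jdkj' | split_id              # abcd_12345_jdkj (no match, unchanged)
```

Lines that do not fit the expected shape pass through untouched, which makes malformed input easy to spot.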

How to swap two words that are connected via hyphen using sed

I need to swap two words that are connected with a hyphen, e.g., super-fast -> fast-super, using extended regular expressions.
I have already searched through the internet and the solution I came up with:
sed -r "s/^\(.*\) \(.-\) \(.*\)/\3 \2 \1/" inputfile
doesn't work.
If you know for sure you'll have exactly one -:
$ sed -E 's/(.*)-(.*)/\2-\1/' <<< 'super-fast'
fast-super
If you have more than one hyphen, this will swap the parts before and after the last one:
$ sed -E 's/(.*)-(.*)/\2-\1/' <<< 'one-two-three'
three-one-two
If you want to swap around the first one:
$ sed -E 's/([^-]*)-(.*)/\2-\1/' <<< 'one-two-three'
two-three-one
This uses [^-]* in the first capture group: zero or more of "not a hyphen".
Notice that if you use -r (or the equivalent -E) for extended regular expressions in sed, capture groups must not be escaped.
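If the hyphenated words sit inside a longer line, the same idea can be applied per word. This is a sketch that assumes a "word" consists only of letters and digits; swap_hyphenated is a made-up name.

```shell
# Hypothetical helper: swap the two halves of every word-word pair
# on a line. Assumes words contain only alphanumeric characters.
swap_hyphenated() {
  sed -E 's/([[:alnum:]]+)-([[:alnum:]]+)/\2-\1/g'
}

echo 'a super-fast and well-known car' | swap_hyphenated
# a fast-super and known-well car
```

Using [[:alnum:]]+ instead of .* keeps each match confined to a single pair, so the g flag can handle several pairs on one line.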

Testing a SED command for replacing text, it gives no error, but it isn't working as intended

I'm trying to search all files for a piece of text and replace it with the word EXAMPLE. I do the following:
for f in /home/testu/zz*; do
sed -i "s/&VAR1\s*=\s*'?[1]{4}'?/EXAMPLE/g" "$f"
done
It gives no error, and the files seem to be "updated" in the filesystem, but their contents don't change. If I test that regexp with the grep command it works fine, so something must be wrong with sed. Could it be the sed version?
Thanks in advance.
Your current sed command parses the regular expression as a POSIX BRE compliant pattern.
In BRE POSIX, ? matches a literal ? char, and { / } also match literal { / } chars. To make a range quantifier in a BRE POSIX pattern, you need to escape {...}, \{min,max\}.
The [1] is equal to 1, so the brackets are quite redundant here.
To fix your pattern, you may replace ? with \{0,1\} (0 or 1 occurrences) and {4} with \{4\}:
sed -i "s/&VAR1\s*=\s*'\{0,1\}1\{4\}'\{0,1\}/EXAMPLE/g" "$f"
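Before running with -i, it is worth testing the expression on a few sample lines fed through stdin. The sketch below does that; replace_var1 is a made-up name, the sample lines are invented, and [[:space:]] is used as the POSIX stand-in for the GNU-specific \s (an assumption that your sed handles bracket classes, which not every old build does).

```shell
# Hypothetical helper: the BRE version of the substitution, reading stdin.
# \{0,1\} is the BRE spelling of "?", \{4\} of "{4}".
replace_var1() {
  sed "s/&VAR1[[:space:]]*=[[:space:]]*'\{0,1\}1\{4\}'\{0,1\}/EXAMPLE/g"
}

# Try it on sample lines first; only add -i once the output looks right.
printf '%s\n' "&VAR1 = '1111'" '&VAR1=1111' 'VAR2=1111' | replace_var1
# EXAMPLE
# EXAMPLE
# VAR2=1111
```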
Thanks to Wiktor Stribiżew's tips, we got the solution (SSED GNU 4.1.5). The resulting regexp works with both grep and sed. The final code ended up as a mix of the suggested solutions:
sed -i "s/&VAR1\s*=\s*'\{0,\}1\{4\}'\{0,\};\{0,\}/EXAMPLE/g" "$f"
A few things:
Character classes like [[:blank:]] caused an input file error.
My sed version didn't support -E, so the {} had to be escaped; I didn't know that :)
Thanks again, Wiktor!

Using sed to replace text with curly braces

I am trying to find the following text
get_pins {
and replace it with
get_pins -hierarchical {proc_top_*/
I've tried using sed but I'm not sure what I'm doing wrong. I know that you need # in front of curly braces but I still can't get the command to work properly.
The closest I've come is to this:
sed 's/get_pins #{#/get_pins -hierarchical #{#proc_top_*\//g' filename.txt > output
but it doesn't do the replacement I wanted above.
@merlin2011's answer shows you how to do it with alternative delimiters, but as for why your command didn't work:
Your command is actually perfectly fine if you just remove all the # characters from it:
sed 's/get_pins {/get_pins -hierarchical {proc_top_*\//g' filename.txt > output
There are two distinct escaping requirements involved here:
Escaping literal use of the regex delimiter: this is what you did correctly, by escaping the / as \/.
Escaping characters with special meaning inside a regex in general: this escaping is always done with \-prefixing, but in your case there is NO need for such escaping: since you're NOT using -E or -r to indicate use of extended regexes - and are therefore using a basic regex - { is actually NOT a special character, so you need NOT escape it. If, by contrast, you had used -E (-r), then you should have escaped { as \{.
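The two escaping rules can be put side by side in a small sketch. The function names are invented; -E is assumed to be available in your sed.

```shell
# Hypothetical helpers showing the brace in basic vs. extended mode.

# BRE (no -E): { is an ordinary character; only the delimiter / is escaped.
braces_bre() {
  sed 's/get_pins {/get_pins -hierarchical {proc_top_*\//g'
}

# ERE (-E): { can start an interval, so escape it in the pattern.
# The replacement side is plain text, so { and * need no escaping there.
braces_ere() {
  sed -E 's/get_pins \{/get_pins -hierarchical {proc_top_*\//g'
}

echo 'get_pins {' | braces_bre   # get_pins -hierarchical {proc_top_*/
echo 'get_pins {' | braces_ere   # get_pins -hierarchical {proc_top_*/
```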
The problem is not in the curly braces, it's in the /.
This is exactly why sed lets you do alternate delimiters.
The line below uses ! as a delimiter instead, and works correctly for a simple file with get_pins { in it.
sed 's!get_pins {!get_pins -hierarchical {proc_top_*/!g' Input.txt
Output:
get_pins -hierarchical {proc_top_*/
Update: Based on mklement0's comment, and after testing with the csh shell, the following should work in csh.
sed 's#get_pins {#get_pins -hierarchical {proc_top_*/#g' Input.txt
This awk command should do the replacement:
awk '{sub(/get_pins {/,"get_pins -hierarchical {proc_top_*/")}1'

sed extract something from string

I have a string " r1/pkg/amd64/misc/hash/hash-r1.r5218.tbz"
but, I only want "hash-r1.r5218.tbz"
so, I try this
unix$ a="r1/pkg/amd64/misc/hash/hash-r1.r5218.tbz"
unix$ echo $a | sed 's/.*\/\([^\/]*\)\.tbz/\1/' //[1]
hash-r1.r5218 //I know this should work
unix$ echo $a | sed 's/.*\/\([^\/]+\)\.tbz/\1/' //[2]
r1/pkg/amd64/misc/hash/hash-r1.r5218.tbz //however I do not know why it does not work.
As far as I remember, + in a regexp means matching the previous expression 1 or more times, and * means matching it 0 or more times.
Could anyone explain why [2] fails? Thanks a lot.
a="r1/pkg/amd64/misc/hash/hash-r1.r5218.tbz"
echo "$a" | sed 's:.*/::; s:\.tbz$::'
hash-r1.r5218
You don't need to use '/' as the pattern/replacement delimiter; you can use other characters. The ':' is very popular.
Also, you don't have to use capture buffers when you know the exact text on both sides of your target data.
I have substituted out all characters up to the last '/', relying on .* to match all characters and on '/' to terminate sed's standard greedy search. Then you substitute the trailing \.tbz with nothing.
IHTH.
Not all versions of sed support + in the regex. Some that do support it require -r to be specified. But why use sed instead of basename or echo ${a##*/}?
Using this submatch via parentheses will grab everything after the last slash to the end of your line.
str="r1/pkg/amd64/misc/hash/hash-r1.r5218.tbz"
echo $str | sed -n -E -e 's/.+\/(.+)$/\1/p'
returns hash-r1.r5218.tbz
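Oh, and your [2] fails because, without -E, an unescaped + is a literal plus sign, so the pattern never matches; sed by default prints every line whether or not the substitution applied, which is why you get the input back unchanged. Using the -n flag suppresses that default output, and the trailing 'p' on this substitution prints only the lines where a replacement was made.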
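Putting the suggestions from this thread together, here are three ways to get hash-r1.r5218 out of the path. This is a sketch with made-up function names; the BRE variant uses \{1,\} rather than \+ because the latter is a GNU extension.

```shell
# Hypothetical helpers; each takes the path as $1.

# Basic regex: spell "one or more" as \{1,\} instead of +.
strip_bre() {
  printf '%s\n' "$1" | sed 's|.*/\([^/]\{1,\}\)\.tbz$|\1|'
}

# Extended regex: + works once -E is given.
strip_ere() {
  printf '%s\n' "$1" | sed -E 's|.*/([^/]+)\.tbz$|\1|'
}

# No sed at all: shell parameter expansion.
strip_shell() {
  set -- "${1##*/}"            # drop everything up to the last slash
  printf '%s\n' "${1%.tbz}"    # drop the .tbz suffix
}

a="r1/pkg/amd64/misc/hash/hash-r1.r5218.tbz"
strip_bre "$a"     # hash-r1.r5218
strip_ere "$a"     # hash-r1.r5218
strip_shell "$a"   # hash-r1.r5218
```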
