sed extract something from string - unix

I have a string "r1/pkg/amd64/misc/hash/hash-r1.r5218.tbz"
but I only want "hash-r1.r5218.tbz",
so I tried this:
unix$ a="r1/pkg/amd64/misc/hash/hash-r1.r5218.tbz"
unix$ echo $a | sed 's/.*\/\([^\/]*\)\.tbz/\1/' //[1]
hash-r1.r5218 //I know this should work
unix$ echo $a | sed 's/.*\/\([^\/]+\)\.tbz/\1/' //[2]
r1/pkg/amd64/misc/hash/hash-r1.r5218.tbz //however I do not know why it does not work.
As far as I remember, + in a regexp means one or more repetitions of the preceding expression, and * means zero or more repetitions.
Could anyone explain why [2] fails? Thanks a lot.

a="r1/pkg/amd64/misc/hash/hash-r1.r5218.tbz"
echo $a | sed 's:.*/::; s:\.tbz$::'
hash-r1.r5218
You don't need to use '/' as the pattern/replacement delimiter; you can use other characters. The ':' is very popular.
Also, you don't have to use capture groups when you know the exact text on both sides of your target data.
I have substituted out all characters up to the last '/', relying on .* to match all characters and on '/' to terminate sed's standard greedy search. Then you substitute the trailing \.tbz with nothing.
IHTH.

Not all versions of sed support + in the regex. Some that do support it require -r to be specified. But why use sed instead of basename or echo ${a##*/}?
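To make the basename and ${a##*/} suggestion concrete, here is a minimal sketch using only POSIX shell features. The variable names file, stem and base are just illustrative.

```shell
a="r1/pkg/amd64/misc/hash/hash-r1.r5218.tbz"

# Strip the longest prefix that ends in a slash.
file=${a##*/}

# Strip the shortest suffix matching .tbz.
stem=${file%.tbz}

# basename removes the directory part and, optionally, a suffix.
base=$(basename "$a" .tbz)

printf '%s\n' "$file" "$stem" "$base"
```

Neither approach starts a sed process, and neither depends on which regex dialect the local sed uses.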

Using this submatch via parentheses will grab everything after the last slash to the end of your line.
str="r1/pkg/amd64/misc/hash/hash-r1.r5218.tbz"
echo $str | sed -n -E -e 's/.+\/(.+)$/\1/p'
returns hash-r1.r5218.tbz
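Oh, and your #2 fails because, in sed's default basic regex syntax, + is an ordinary literal character rather than a quantifier. The pattern therefore does not match, no substitution happens, and sed prints the line unchanged. With -E (extended syntax), + means "one or more". The -n flag suppresses sed's automatic printing of every line, and the trailing 'p' prints only the lines on which the substitution succeeded.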
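A small sketch of the two regex dialects side by side, assuming a sed that accepts -E (GNU and BSD sed both do). In basic syntax the portable way to say "one or more" is \{1,\}; in extended syntax it is +. The variable names bre and ere are illustrative only.

```shell
a="r1/pkg/amd64/misc/hash/hash-r1.r5218.tbz"

# Basic regex (default): groups are \( \), "one or more" is \{1,\}.
bre=$(printf '%s\n' "$a" | sed 's|.*/\([^/]\{1,\}\)\.tbz|\1|')

# Extended regex (-E): groups are ( ), "one or more" is +.
ere=$(printf '%s\n' "$a" | sed -E 's|.*/([^/]+)\.tbz|\1|')

printf '%s\n' "$bre" "$ere"
```

Using | as the delimiter avoids having to escape the slashes in the pattern.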

Related

Extracting a specific length substring using SED

I have the following SED command
echo "abcd_2222222233333333_jdkj" | sed -e 's/^\(.*\)_\(.*\)_\(.*\)$/\2_\1_\3/'
that returns
2222222233333333_abcd_jdkj
That's great, but I really want
22222222-33333333_abcd_jdkj
Is this possible with an easy tweak or do I need some non-sed solution? Basically, I know the number is 16 bytes, but I need to break it into two 8 byte numbers.
Instead of .* to match any number of characters, you can use .{8} to match exactly eight characters.
The below also uses sed -r to allow ERE syntax, which requires fewer backslashes and is generally easier to read than the default BRE. (On systems with BSD-style tools, this might be sed -E instead).
sed -re 's/^(.*)_(.{8})(.*)_(.*)$/\2-\3_\1_\4/' <<<"abcd_2222222233333333_jdkj"
By the way -- I would strongly suggest using [^_]* instead of .* so your regex can't match underscores where you don't want it to. (. means "any character"; [^_] means "any character except _"). That's not just a correctness enhancement -- it can also make your regex faster to evaluate by avoiding backtracking (where the regex engine realizes it's matched too much content and needs to undo some of its prior matches).
Also consider bash's built-in regex support:
string='abcd_2222222233333333_jdkj'
re='([^_]+)_([[:digit:]]{8})([[:digit:]]+)_(.*)'
if [[ $string =~ $re ]]; then
result=${BASH_REMATCH[2]}-${BASH_REMATCH[3]}_${BASH_REMATCH[1]}_${BASH_REMATCH[4]}
echo "Result is: $result"
else
echo "No match found"
fi
The solution based on the above commenter's tip works:
echo "abcd_2222222233333333_jdkj" | sed -e 's/^\(.*\)_\(.\{8\}\)\(.\{8\}\)_\(.*\)$/\2-\3_\1_\4/'
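As a sketch of the [^_]* advice given earlier in this thread, the same rearrangement can be written so that no group can run across an underscore. This assumes sed -E is available and that the middle field is always exactly 16 digits, as the question states; the variable names are illustrative.

```shell
input="abcd_2222222233333333_jdkj"

# [^_]* cannot cross an underscore, and [0-9]{8} matches exactly eight digits.
out=$(printf '%s\n' "$input" |
    sed -E 's/^([^_]*)_([0-9]{8})([0-9]{8})_([^_]*)$/\2-\3_\1_\4/')

printf '%s\n' "$out"
```

If the input does not have this shape, the substitution simply does not apply and the line passes through unchanged.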

Why won't the plus work properly with this sed command?

I can't get the ([^/]+) sed regex to work properly.
Instead of returning all non-forward slash characters, it only returns one.
Command:
echo '/test/path/file.log' | sed -r 's|^.*([^/]+)/(.*)$|\1.\2|g'
Expected:
path.file.log
Result:
h.file.log
Also Tried this but got the same result:
echo '/test/path/file.log' | sed -r 's|^.*([^/]{1,})/(.*)$|\1.\2|g'
The problem is not with [^/]+, but with the preceding .*. .* is greedy, and will consume a maximal amount of input. My usual suggestion would be to use .*? to make it non-greedy, but POSIX regexes don't support that syntax.
If there will always be a slash, you could add one to the regex to stop it from consuming too much.
$ echo '/test/path/file.log' | sed -r 's|^.*/([^/]+)/(.*)$|\1.\2|g'
path.file.log
Different operating systems ship different versions of sed. Some sed versions use basic regexp syntax by default; if you need extended regexp syntax (+ is one of its features), you have to enable it with the -E option.
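The effect of the greedy .* can be seen by running both patterns on the same input. This is only a sketch, assuming sed -E is available; the variable names greedy and anchored are illustrative.

```shell
p='/test/path/file.log'

# .* takes as much as it can, leaving a single character for [^/]+.
greedy=$(printf '%s\n' "$p" | sed -E 's|^.*([^/]+)/(.*)$|\1.\2|')

# The extra slash forces .* to stop at a directory boundary.
anchored=$(printf '%s\n' "$p" | sed -E 's|^.*/([^/]+)/(.*)$|\1.\2|')

printf '%s\n' "$greedy" "$anchored"
```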

How to remove this string in file

I want to remove a string in a file, the string I want to remove is
"/package/myname:". I try to use sed to do that but could not.
Note there are a '/' at beginning and ':' at the end of the string which I do not know how to handle.
e.g. I was able to remove "package/myname" using:
echo 'diff a/package/myname:/src/com/abc' | sed -e 's/\<package\/myname\>//g'
But when I run:
echo 'diff a/package/myname:/src/com/abc' | sed -e 's/\<\/package\/myname\:\>//g'
the result does not replace anything.
What is the right way to remove "/package/myname:" in my case?
Problem:
echo 'diff a/package/myname:/src/com/abc' | sed -e 's/\<\/package\/myname\:\>//g'
                                                      ^                    ^
                                                      |                    |
The problem is mainly caused by the two word boundaries.
\< - Boundary which matches between a non-word character (or the start of the line) and a word character.
\> - Boundary which matches between a word character and a non-word character (or the end of the line).
So in the first case,
echo 'diff a/package/myname:/src/com/abc' | sed -e 's/\<package\/myname\>//g'
The \< before the package string matches the boundary between / (non-word character) and p (word character). Likewise, \> matches the boundary between e (word character) and : (non-word character). So a match occurs, and the matched characters are replaced by the empty string.
But in the second case,
echo 'diff a/package/myname:/src/com/abc' | sed -e 's/\<\/package\/myname\:\>//g'
\< fails to match the boundary between a and the forward slash /, because \< matches only between a non-word character (left) and a word character (right). Likewise, \> fails to match the boundary between : and the forward slash /, because there is no word character on the left followed by a non-word character on the right.
Solution:
So I suggest you remove the word boundaries \< and \>. Or you could do it like this:
$ echo 'diff a/package/myname:/src/com/abc' | sed -e 's/\>\/package\/myname\://g'
diff a/src/com/abc
This works because \> here matches the boundary between a (word character) and / (non-word character).
sed -i "s/\/package\/myname\://g;" [__YOUR_FILE_NAME__]
That removes the phrase.
It does not remove the line.
grep -v # removes the line
sed 's%/package/myname:%%g'
using % instead of / to mark the ends of the sections of the substitute command. You can use any character that doesn't appear in the string. It can be quite effective to use Control-A as the delimiter, even.
You could also use:
sed 's/\/package\/myname://g'
but I prefer to avoid messing around with backslashes when there's an easy way to avoid them.
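A short sketch contrasting the two operations mentioned in this thread: removing the phrase with sed (using % as the delimiter) versus dropping the whole line with grep -v. The sample input and the variable names out and kept are made up for illustration.

```shell
line='diff a/package/myname:/src/com/abc'

# Remove only the phrase; % as delimiter means no slash needs escaping.
out=$(printf '%s\n' "$line" | sed 's%/package/myname:%%g')

# Drop every line that contains the phrase, keeping the others.
kept=$(printf 'keep\n%s\n' "$line" | grep -v '/package/myname:')

printf '%s\n' "$out" "$kept"
```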

Find parent directory recursively

I am writing a script which requires me to find the parent directory of the existing directory.
Suppose I have a directory called deep which is under /path/is/very/deep/. I need to find the "path" directory but not "/".
I have tried a couple of combinations of code, but I seem to be stuck in a deadlock.
I would appreciate any help.
Here is my code, which is not working:
if [ $dir_name != "/" || $path != "/" ]
then
path=`dirname $dir_name`
dir_name=`dirname $path`
echo $path
else
dir_name=$path
echo $dirname
fi
Here is an alternative way to solve your problem:
Generally, all you need to do is remove every character in the string from the second / character onwards (including that slash). The sed utility lets you do that.
Try this line in your terminal or script:
echo '/path/is/very/deep' | sed -r 's:^(/[^/]*).*$:\1:'
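You can remove the echo command and assign the result to any variable in your code as you wish.
Edit: In sed the forward slash (/) is commonly used as the delimiter character, so a literal slash in the pattern has to be escaped with a backslash (\) unless you pick another delimiter, such as the : used here. In basic regex syntax (without -r), the grouping parentheses must also be escaped, as \( and \).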
path="$( echo "${dir_name}" | sed -e 's|/[^/]*/\{0,1\}$||;s|^$|/|' )"
This is POSIX compliant and works for /, /bla, /bla/, bla/bla, ...
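The two answers solve slightly different problems: one extracts the top-level directory, the other strips the last path component. A sketch wrapping each in a function, using only basic regex syntax; the function names top_dir and parent_dir are hypothetical and not from the original answers.

```shell
# Keep only the first path component of an absolute path.
top_dir() {
    printf '%s\n' "$1" | sed 's|^\(/[^/]*\).*$|\1|'
}

# Remove the last path component; an empty result becomes "/".
parent_dir() {
    printf '%s\n' "$1" | sed -e 's|/[^/]*/\{0,1\}$||' -e 's|^$|/|'
}

top_dir /path/is/very/deep
parent_dir /path/is/very/deep
parent_dir /bla/
```

For the asker's case (finding "path" from /path/is/very/deep), top_dir is the relevant one; no loop is needed.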

Replace a string in shell script using a variable

I am using the below code for replacing a string
inside a shell script.
echo $LINE | sed -e 's/12345678/"$replace"/g'
but it's getting replaced with $replace instead of the value of that variable.
Could anybody tell what went wrong?
If you want to interpret $replace, you should not use single quotes since they prevent variable substitution.
Try:
echo $LINE | sed -e "s/12345678/${replace}/g"
Transcript:
pax> export replace=987654321
pax> echo X123456789X | sed "s/123456789/${replace}/"
X987654321X
pax> _
Just be careful to ensure that ${replace} doesn't have any characters of significance to sed (like / for instance) since it will cause confusion unless escaped. But if, as you say, you're replacing one number with another, that shouldn't be a problem.
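If the replacement may contain such characters, one option is to escape them before handing the string to sed. The following is a sketch under the assumption that the outer command uses / as its delimiter and that the replacement is a single line; the function name escape_replacement is hypothetical.

```shell
# Prefix &, / and \ with a backslash so sed treats them literally
# in the replacement part of s/.../.../.
escape_replacement() {
    printf '%s\n' "$1" | sed 's|[&/\]|\\&|g'
}

replace='a/b&c'
escaped=$(escape_replacement "$replace")

result=$(printf '%s\n' 'X12345678X' | sed "s/12345678/$escaped/")
printf '%s\n' "$result"
```

Inside a bracket expression the backslash is an ordinary character, which is why [&/\] lists all three characters without further escaping.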
you can use the shell (bash/ksh).
$ var="12345678abc"
$ replace="test"
$ echo ${var//12345678/$replace}
testabc
Not specific to the question, but for folks who need the same kind of functionality expanded for clarity from previous answers:
# create some variables
str="someFileName.foo"
find=".foo"
replace=".bar"
# notice that str isn't prefixed with $ inside the braces
# this is just how this feature works :/
result=${str//$find/$replace}
echo $result
# result is: someFileName.bar
str="someFileName.sally"
find=".foo"
replace=".bar"
result=${str//$find/$replace}
echo $result
# result is: someFileName.sally because ".foo" was not found
Found a graceful solution.
echo ${LINE//12345678/$replace}
Single quotes are very strong. Once inside, there's nothing you can do to invoke variable substitution, until you leave. Use double quotes instead:
echo $LINE | sed -e "s/12345678/$replace/g"
Let me give you two examples.
Using sed:
#!/bin/bash
LINE="12345678HI"
replace="Hello"
echo $LINE | sed -e "s/12345678/$replace/g"
Without Using sed:
LINE="12345678HI"
str_to_replace="12345678"
replace_str="Hello"
result=${LINE//$str_to_replace/$replace_str}
echo $result
Hope you will find it helpful!
echo $LINE | sed -e 's/12345678/'"$replace"'/g'
You can still use single quotes, but you have to close them at the point where you want the variable expanded and reopen them afterwards; otherwise the string is taken literally. (As @paxdiablo correctly stated; his answer is correct as well.)
To let your shell expand the variable, you need to use double-quotes like
sed -i "s#12345678#$replace#g" file.txt
This will break if $replace contains characters that are special in the replacement (#, \, &). But you can preprocess $replace to quote them:
replace_quoted=$(printf '%s' "$replace" | sed 's/[#\&]/\\&/g')
sed -i "s#12345678#$replace_quoted#g" file.txt
I had a similar requirement to this but my replace var contained an ampersand. Escaping the ampersand like this solved my problem:
replace="salt & pepper"
echo "pass the salt" | sed "s/salt/${replace/&/\&}/g"
Use # as the delimiter if the strings you want to replace contain characters like /.
result=$(echo $str | sed "s#$oldstr#$newstr#g")
The above code replaces all occurrences of the search term.
If you want only the first occurrence to be replaced, remove the trailing g.
Do not use this:
echo $LINE | sed -e 's/12345678/$replace/g'
Simply removing the inner double quotes does not work: the whole script is still inside single quotes, so sed receives the literal text $replace.
I prefer to use double quotes. Single quotes are very strong: nothing inside them is changed, so variable substitution cannot happen there.
So use double quotes instead.
echo $LINE | sed -e "s/12345678/$replace/g"
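The three quoting styles discussed in this thread can be compared directly. This is a minimal sketch with made-up sample values; the variable names single, double and mixed are illustrative.

```shell
LINE="12345678HI"
replace="Hello"

# Single quotes: the shell passes $replace to sed as literal text.
single=$(printf '%s\n' "$LINE" | sed -e 's/12345678/$replace/g')

# Double quotes: the shell expands $replace before sed runs.
double=$(printf '%s\n' "$LINE" | sed -e "s/12345678/$replace/g")

# Mixed: close the single quotes, expand in double quotes, reopen.
mixed=$(printf '%s\n' "$LINE" | sed -e 's/12345678/'"$replace"'/g')

printf '%s\n' "$single" "$double" "$mixed"
```

The mixed form is handy when the rest of the sed script contains characters such as $ or backslashes that double quotes would otherwise interpret.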

Resources