I want to remove a string from a file. The string I want to remove is
"/package/myname:". I tried to use sed to do that but could not.
Note there is a '/' at the beginning and a ':' at the end of the string, which I do not know how to handle.
e.g. I was able to remove "package/myname" using:
echo 'diff a/package/myname:/src/com/abc' | sed -e 's/\<package\/myname\>//g'
But when I run:
echo 'diff a/package/myname:/src/com/abc' | sed -e 's/\<\/package\/myname\:\>//g'
the result does not replace anything.
What is the right way to remove "/package/myname:" in my case?
Problem:
echo 'diff a/package/myname:/src/com/abc' | sed -e 's/\<\/package\/myname\:\>//g'
The problem is mainly the two word boundaries, \< at the start of the pattern and \> at the end.
\< - Boundary which matches between a non-word character (or the start of the line) and a word character.
\> - Boundary which matches between a word character and a non-word character (or the end of the line).
So in the first case,
echo 'diff a/package/myname:/src/com/abc' | sed -e 's/\<package\/myname\>//g'
The \< before the package string matches the boundary which exists after / (non-word character) and p (word character). Likewise \> matches the boundary which exists between e (word character) and : (non-word character). So finally a match would occur and the characters which are matched are replaced by the empty string.
But in the second case,
echo 'diff a/package/myname:/src/com/abc' | sed -e 's/\<\/package\/myname\:\>//g'
\< fails to match the boundary which exists between a and the forward slash / because \< matches only between a non-word character (left) and a word character (right), and here the order is reversed. Likewise \> fails to match between : and the forward slash / because both are non-word characters: there isn't a word character on the left.
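The three cases can be compared side by side. This is only a sketch, assuming GNU sed (where \< and \> are available as word-boundary extensions); the variable names are made up for illustration.

```shell
# Assumes GNU sed: \< and \> are extensions, not plain POSIX BRE.
input='diff a/package/myname:/src/com/abc'

# \< sits between "/" (non-word) and "p" (word), \> between "e" and ":": match.
with_boundaries=$(printf '%s\n' "$input" | sed 's/\<package\/myname\>//g')

# \< is followed by "/" and \> is preceded by ":", so neither boundary exists: no match.
bad_boundaries=$(printf '%s\n' "$input" | sed 's/\<\/package\/myname:\>//g')

# Without any boundaries the literal text is simply removed.
no_boundaries=$(printf '%s\n' "$input" | sed 's/\/package\/myname://g')

printf '%s\n' "$with_boundaries" "$bad_boundaries" "$no_boundaries"
```

The second line of output is the unchanged input, which is the behaviour described in the question.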
Solution:
So, I suggest you remove the word boundaries \< and \>. Or, you could do it like this,
$ echo 'diff a/package/myname:/src/com/abc' | sed -e 's/\>\/package\/myname\://g'
diff a/src/com/abc
This works because \> matches the boundary between a (word character) and / (non-word character).
sed -i "s/\/package\/myname\://g;" [__YOUR_FILE_NAME__]
That removes the phrase.
It does not remove the line.
grep -v # removes the line
sed 's%/package/myname:%%g'
using % instead of / to mark the ends of the sections of the substitute command. You can use any character that doesn't appear in the string. It can be quite effective to use Control-A as the delimiter, even.
You could also use:
sed 's/\/package\/myname://g'
but I prefer to avoid messing around with backslashes when there's an easy way to avoid them.
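For editing a file rather than a pipe, something like the following should work with any POSIX sed, since it avoids -i (whose syntax differs between GNU and BSD). It is a sketch only: the temporary file and its contents are invented for the demonstration.

```shell
# Hypothetical sample file, created only for this demonstration.
tmpfile=$(mktemp)
printf '%s\n' 'diff a/package/myname:/src/com/abc' 'unrelated line' > "$tmpfile"

# % is the delimiter, so the slashes in the pattern need no escaping.
# Write to a second file, then move it over the original.
sed 's%/package/myname:%%g' "$tmpfile" > "$tmpfile.new" && mv "$tmpfile.new" "$tmpfile"

result=$(cat "$tmpfile")
rm -f "$tmpfile"
printf '%s\n' "$result"
```

Only the phrase is removed; both lines are still present afterwards.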
I had to download 15GB of data and for some reason during the downloading process the filenames were messed up in a way so that instead of
test_file.txt
the filenames are doubled, so it's
test_file.txttest_file.txt
instead. My only idea was to count the letters and then rename each file by deleting the first or second half of the filename. The filenames are not consistent, so for example in the same folder there might also be files named
files_are_great.txtfiles_are_great.txt
so I'm struggling to find a way to loop over them.
Thanks a lot!
The command sed 's/^\(.*\)\1$/\1/' will replace a string that consists of two identical halves with a single copy, without requiring a certain part of the file name like .txt. It allows spaces in the string.
Example:
echo 'abc defabc def' | sed 's/^\(.*\)\1$/\1/'
prints
abc def
Explanation of the sed command:
^ anchors the pattern to the beginning of the line
.* is 0 or more occurrences of any character
\(...\) captures what matches the pattern in between
\1 is a reference to the first capture group, i.e. the text that was found before
$ anchors the search pattern to the end of the line
This results in a search pattern that matches a whole line that consists of any text followed by the same text.
\1 in the replacement is the same reference to the matched text, i.e. a single occurrence of the duplicated text.
Any input that does not match the pattern will remain unchanged.
Assuming you want to rename all files in the current directory you can use it like this
for file in *
do
new=$(printf '%s\n' "$file" | sed 's/^\(.*\)\1$/\1/')
[ "$file" = "$new" ] || mv "$file" "$new"
done
As the sed command does not change non-matching input, $new will be the same as $file for file names that don't consist of a duplicated string. Calling mv with identical source and destination names would produce an error message, which is why the renaming is skipped in this case.
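Before running a loop like this on 15GB of real data, it may be worth trying it in a scratch directory first. A minimal sketch, assuming nothing about your actual files: the directory, the helper name dedupe_name and the sample file names are all made up here.

```shell
# Hypothetical helper: collapse a name made of two identical halves.
dedupe_name() {
    printf '%s\n' "$1" | sed 's/^\(.*\)\1$/\1/'
}

# Scratch directory with two doubled names and one normal name.
workdir=$(mktemp -d)
touch "$workdir/test_file.txttest_file.txt" \
      "$workdir/files_are_great.txtfiles_are_great.txt" \
      "$workdir/keep_me.txt"

for path in "$workdir"/*
do
    name=${path##*/}                      # strip the directory part
    new=$(dedupe_name "$name")
    [ "$name" = "$new" ] || mv -- "$path" "$workdir/$new"
done

listing=$(ls "$workdir")
rm -rf "$workdir"
printf '%s\n' "$listing"
```

The anchors ^ and $ matter: without them a name such as aab.txt would also be touched, because the leading "aa" is itself a doubled string.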
Using sed
sed 's#\(\.txt\)#& #g'
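Explanation: & in the replacement stands for the whole matched text, so a space is appended after every .txt. The \( \) group is not actually needed here; a group would be referenced with \1 rather than &.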
Demo:
echo "files_are_great.txtfiles_are_great.txt" | sed 's#\(\.txt\)#& #g'
files_are_great.txt files_are_great.txt
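To see the difference between & and \1 in a replacement, here is a small sketch. The input report.txt is just an invented example, and the group deliberately covers only the dot so the two results differ.

```shell
# & is the entire match (".txt"); \1 is only what the \( \) group captured (".").
whole=$(echo 'report.txt' | sed 's#\(\.\)txt#[&]#')
group=$(echo 'report.txt' | sed 's#\(\.\)txt#[\1]#')

printf '%s\n' "$whole" "$group"
```

This prints report[.txt] followed by report[.].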
For renaming:
for file_name in *txt*txt
do
new_file_name=$(echo "$file_name" | sed 's#\(\.txt\)#& #g' | cut -d' ' -f1)
mv "$file_name" "$new_file_name"
done
I need a sed command to change a phone number format from 999-999-9999 to (999)999-9999.
Here is what I've been trying:
sed 's/[[:digit:]]\-[[:digit:]]\-[[:digit:]]/\([[:digit:]]\)[[:digit:]]\-[[:digit:]]/gp'
I've also tried this:
sed 's/([0-9]{3})\-([0-9]{3})\-([0-9]{4})/\(([0-9]{3}\))([0-9]{3})\-([0-9]{4})/gp'
The notation [[:digit:]] matches a single digit; you need to match repeated digits, which you do by wrapping the repeat count in \{3\} (for a fixed count; there are variable counted ranges too, but they're not relevant here, and * and so on too). And you need to capture what you match in \(…\) so you can reference them in the replacement. In the replacement, you use \1 etc to refer to captured fragments. The captures are numbered left-to-right in the order of the \( symbols.
sed 's/\([[:digit:]]\{3\}\)-\([[:digit:]]\{3\}-[[:digit:]]\{4\}\)/(\1)\2/g'
Or:
sed 's/\([0-9]\{3\}\)-\([0-9]\{3\}-[0-9]\{4\}\)/(\1)\2/g'
This is classic sed notation; you can find variants using extended regular expressions too, but you need different options depending on platform, unlike this notation. The patterns look for 3 digits (first capture), a dash, then 3 more digits, another dash and 4 digits as the second capture, and replace all that with open bracket (parenthesis in American), the first 3 digits, close bracket, and the remaining 3 digits, dash, 4 digits.
BSD (Mac OS X):
sed -E 's/([0-9]{3})-([0-9]{3}-[0-9]{4})/(\1)\2/g'
GNU:
sed -r 's/([0-9]{3})-([0-9]{3}-[0-9]{4})/(\1)\2/g'
Note that all of these regular expressions would convert
9876-345-54321
to:
9(876)345-54321
Fixing that is less trivial, especially in sed. Using Perl:
$ echo "987-654-3210 and 2987-654-543210 and 222-333-4444 and 543-432-5544" |
> perl -p -e 's/\b([0-9]{3})-([0-9]{3}-[0-9]{4})\b/(\1)\2/g'
(987)654-3210 and 2987-654-543210 and (222)333-4444 and (543)432-5544
$
The \b marks a word boundary in PCRE. That does mean that a222-333-4444 is not matched by the Perl; you can refine things to insist on non-digit or start of string before, and non-digit or end of string after, the matching string.
$ echo "987-654-3210 and 2987-654-543210 and a222-333-4444 and 543-432-5544" |
> perl -p -e 's/(^|\D)([0-9]{3})-([0-9]{3}-[0-9]{4})(\D|$)/\1(\2)\3\4/g'
(987)654-3210 and 2987-654-543210 and a(222)333-4444 and (543)432-5544
$
Or with (BSD or GNU) sed extended regular expressions (BSD shown):
$ echo "987-654-3210 and 2987-654-543210 and a222-333-4444 and 543-432-5544" |
> sed -E 's/(^|[^0-9])([0-9]{3})-([0-9]{3}-[0-9]{4})([^0-9]|$)/\1(\2)\3\4/g'
(987)654-3210 and 2987-654-543210 and a(222)333-4444 and (543)432-5544
$
Note that the negated digit character class notation can be written [^[:digit:]] if you wish.
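One caveat with the non-digit guards: each guard consumes a character, so two numbers separated by a single space cannot both match in one pass with g, because the space is used up as the trailing guard of the first number. A possible workaround, shown here only as a sketch, is to drop g and loop with a label until nothing changes. The sample input is invented.

```shell
input='111-222-3333 444-555-6666'
pattern='s/(^|[^0-9])([0-9]{3})-([0-9]{3}-[0-9]{4})([^0-9]|$)/\1(\2)\3\4/'

# One pass with g: the separating space is consumed by the first match.
single_pass=$(printf '%s\n' "$input" | sed -E "${pattern}g")

# Loop: substitute once, and branch back to label a while a substitution succeeded.
looped=$(printf '%s\n' "$input" | sed -E -e ':a' -e "$pattern" -e 'ta')

printf '%s\n' "$single_pass" "$looped"
```

The loop terminates because a converted number such as (111)222-3333 no longer matches the pattern.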
Iterative development helps.
$ echo 123-456-7890 | sed -r 's/([0-9]{3})-([0-9]{3}-[0-9]{4})/(\1)\2/'
(123)456-7890
Have
08-01-12|07-30-13|08-09-32|12-43-56|
Want
08-01-12|07-30-13|08-09-32|12-43-56
I want to remove just the last |.
Quick and dirty:
sed 's/.$//' YourFile
A bit safer:
sed 's/[|]$//' YourFile
Allowing trailing whitespace:
sed 's/[|][[:space:]]*$//' YourFile
The same for only the last character of the last line (thanks #amelie for this comment):
add a $ address in front, so the quick and dirty version becomes sed '$ s/.$//' YourFile
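The effect of the leading $ address is easiest to see on input with more than one line. A small sketch with invented two-line data:

```shell
data=$(printf '%s\n' 'a|b|' 'c|d|')

# No address: the trailing | is removed on every line.
every_line=$(printf '%s\n' "$data" | sed 's/[|]$//')

# $ address: the substitution runs on the last line only.
last_only=$(printf '%s\n' "$data" | sed '$ s/[|]$//')

# Trailing whitespace after the | is removed together with it.
padded=$(printf '%s\n' 'x|y|  ' | sed 's/[|][[:space:]]*$//')

printf '%s\n' "$every_line" "$last_only" "$padded"
```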
Since you tagged with awk, find here two approaches, one for every interpretation of your question:
If you want to remove | if it is the last character:
awk '{sub(/\|$/,"")}1' file
Equivalent to sed 's/|$//' file, except that | has to be escaped in awk because it has a special meaning in that regex context ("or").
If you want to remove the last character, no matter what it is:
awk '{sub(/.$/,"")}1' file
Equivalent to sed 's/.$//' file, since . matches any character.
Test
$ cat a
08-01-12|07-30-13|08-09-32|12-43-56|
rrr.
$ awk '{sub(/\|$/,"")}1' a
08-01-12|07-30-13|08-09-32|12-43-56
rrr. # . is kept
$ awk '{sub(/.$/,"")}1' a
08-01-12|07-30-13|08-09-32|12-43-56
rrr # . is also removed
I have a string " r1/pkg/amd64/misc/hash/hash-r1.r5218.tbz"
but, I only want "hash-r1.r5218.tbz"
so, I try this
unix$ a="r1/pkg/amd64/misc/hash/hash-r1.r5218.tbz"
unix$ echo $a | sed 's/.*\/\([^\/]*\)\.tbz/\1/' //[1]
hash-r1.r5218 //I know this should work
unix$ echo $a | sed 's/.*\/\([^\/]+\)\.tbz/\1/' //[2]
r1/pkg/amd64/misc/hash/hash-r1.r5218.tbz //however I do not know why it does not work.
as far as I remember, + in regexp, means using previous regexp 1 or more times. * in regexp, means using previous regexp 0 or more times.
Could anyone explain why [2] fails, thanks a lot.
a="r1/pkg/amd64/misc/hash/hash-r1.r5218.tbz"
echo "$a" | sed 's:.*/::; s:\.tbz$::'
hash-r1.r5218
You don't need to use '/' as the pattern/replacement delimiter; you can use other characters. The ':' is very popular.
Also, you don't have to use capture groups when you know the exact text on both sides of your target data.
I have substituted out all characters up to the last '/', relying on the greedy .* to match as much as possible before the final '/'. Then the trailing \.tbz is replaced with nothing.
IHTH.
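In sed's default basic regular expressions, + is an ordinary character, so not all versions of sed support it as a quantifier. Those that do usually require -E or -r to be specified (GNU sed also accepts \+). But why use sed instead of basename or echo ${a##*/}?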
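For completeness, the non-sed alternatives mentioned above might look like this. It is a sketch using the path from the question; the variable names file, stem and via_basename are just illustrative.

```shell
a="r1/pkg/amd64/misc/hash/hash-r1.r5218.tbz"

# Parameter expansion: ## removes the longest prefix matching */
file=${a##*/}
# % removes the shortest suffix matching .tbz
stem=${file%.tbz}
# basename strips the directory and, optionally, a given suffix
via_basename=$(basename "$a" .tbz)

printf '%s\n' "$file" "$stem" "$via_basename"
```

Neither approach starts a regex engine, and the parameter expansions do not even start a process.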
Using this submatch via parentheses will grab everything after the last slash to the end of your line.
str="r1/pkg/amd64/misc/hash/hash-r1.r5218.tbz"
echo $str | sed -n -E -e 's/.+\/(.+)$/\1/p'
returns hash-r1.r5218.tbz
Oh, and your #2 fails because without -E (or -r) the + is a literal plus sign, so the pattern never matches and sed prints the line unchanged. In the command above, the -n flag suppresses sed's default printing of every line, and the trailing 'p' prints only the lines where the substitution was made.
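The behaviour of + can be checked directly. A sketch using the string from the question: the first command is the failing attempt [2], the other two are ways of writing "one or more" that should work in both GNU and BSD sed.

```shell
a="r1/pkg/amd64/misc/hash/hash-r1.r5218.tbz"

# Basic regex: + is a literal plus sign, nothing matches, the line is printed as is.
literal_plus=$(printf '%s\n' "$a" | sed 's/.*\/\([^\/]+\)\.tbz/\1/')

# Basic regex: the interval \{1,\} means "one or more".
interval=$(printf '%s\n' "$a" | sed 's/.*\/\([^\/]\{1,\}\)\.tbz/\1/')

# Extended regex (-E): + is a quantifier and groups are written without backslashes.
extended=$(printf '%s\n' "$a" | sed -E 's/.*\/([^\/]+)\.tbz/\1/')

printf '%s\n' "$literal_plus" "$interval" "$extended"
```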
grep -w uses punctuations and whitespaces as delimiters.
How can I set grep to only use whitespaces as a delimiter for a word?
If you want to match just spaces, use grep " foo " instead of grep -w foo. If you also want to match line beginnings, line endings or tabs you can start doing things like: grep '\(^\| \)foo\($\| \)', but you're probably better off with perl -ne 'print if /(^|\s)foo(\s|$)/'
You cannot change the way grep -w works. However, you can replace punctuation characters with, say, an X character using tr or sed and then use grep -w; that will do the trick.
The --word-regexp flag is useful, but limited. The grep man page says:
-w, --word-regexp
Select only those lines containing matches that form whole
words. The test is that the matching substring must either be
at the beginning of the line, or preceded by a non-word
constituent character. Similarly, it must be either at the end
of the line or followed by a non-word constituent character.
Word-constituent characters are letters, digits, and the
underscore.
If you want to use custom field separators, awk may be a better fit for you. Or you could just write an extended regular expression with egrep or grep --extended-regexp that gives you more control over your search pattern.
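Both suggestions could look roughly like this. It is a sketch with invented sample lines, using foo as the search word: the extended regular expression only accepts whitespace or a line edge next to the word, and the awk version compares whole whitespace-separated fields.

```shell
sample=$(printf '%s\n' 'foo bar' 'x foo:bar y' 'a foo' 'foo-bar')

# grep -w treats ":" and "-" as word delimiters too, so all four lines match.
dash_w=$(printf '%s\n' "$sample" | grep -w 'foo')

# ERE: foo must be surrounded by whitespace or the start/end of the line.
space_only=$(printf '%s\n' "$sample" | grep -E '(^|[[:space:]])foo([[:space:]]|$)')

# awk: default field splitting is on whitespace; print a line if any field equals foo.
awk_fields=$(printf '%s\n' "$sample" |
    awk '{ for (i = 1; i <= NF; i++) if ($i == "foo") { print; next } }')

printf '%s\n' "$space_only"
```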
Use tr to replace spaces with new lines. Then grep your string. The contiguous string I needed was being split up with grep -w because it has colons in it. Furthermore, I only knew the first part, and the second part was the unknown data I needed to pull. Therefore, the following helped me.
echo "$your_content" | tr ' ' '\n' | grep 'string'
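As a concrete illustration of the tr approach, here is a sketch with made-up content in which only the prefix host: is known and the rest of the token is the data to pull out.

```shell
# Hypothetical content; only the "host:" prefix is known in advance.
your_content='id=7 host:alpha.example port:8080'

# One token per line, then keep the token that starts with the known prefix.
token=$(echo "$your_content" | tr ' ' '\n' | grep '^host:')

printf '%s\n' "$token"
```

Because the split is on spaces only, the colon stays inside the token instead of ending the word as it would with grep -w.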