Replace groups of characters by newline [duplicate] - unix

I have a line with ', ' in my file, and I want to replace this with a new line.
Input:
['siteed01pg|10.229.16.153|10.229.0.0|19|test / crt|BACKUP_MUT_SD Vlan981 (PVLAN 1981) New Backup Subnet #1 (site SD)', 'siteed01pg|10.129.135.53|10.129.135.0|26|test / crt|Fmer bopreprodback Vlan 754', '
[...]
My sed command:
sed "s/\', \'/\n/g"
Output:
['siteed01pg|10.229.16.153|10.229.0.0|19|test / crt|BACKUP_MUT_SD Vlan981 (PVLAN 1981) New Backup Subnet #1 (site SD)nsiteed01pg|10.129.135.53|10.129.135.0|26|test / crt|Fmer bopreprodback Vlan 754n
In my output, the line break has been replaced by the character n.
Why?

You can use sed like this to use \n in replacement:
sed "s/', '/"$'\\\n'"/g" file
Here we are using $'\n' to put a newline character in the replacement. We ended up using $'\\\n', spliced between the double-quoted parts of the sed command, so that sed receives a backslash followed by a newline.
As per man bash:
Words of the form $'string' are treated specially. The word expands to string, with backslash-escaped characters replaced as specified by the ANSI C standard
or else with multiline sed:
sed "s/', '/\\
/g" file
This will work with both GNU and POSIX sed versions in bash.
PS: If you're using GNU sed then a simplified command would be:
sed "s/', '/\n/g" file
['siteed01pg|10.229.16.153|10.229.0.0|19|test / crt|BACKUP_MUT_SD Vlan981 (PVLAN 1981) New Backup Subnet #1 (site SD)
siteed01pg|10.129.135.53|10.129.135.0|26|test / crt|Fmer bopreprodback Vlan 754
[...]
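Both portable forms above hand sed the same two characters in the replacement: a backslash followed by a literal newline. The sketch below builds that pair with plain single quotes, so it does not depend on bash; the host names and addresses are shortened, made-up sample values, and the variable name bsnl is only for illustration.

```shell
# Shortened, hypothetical sample input in the same shape as the question's data
input="['host1|10.0.0.1', 'host2|10.0.0.2', 'host3|10.0.0.3']"

# A backslash followed by a literal newline (two characters).
# In bash, $'\\\n' expands to the same two characters.
bsnl='\
'

# Splice the escaped newline into the replacement part of the s command
result=$(printf '%s\n' "$input" | sed "s/', '/${bsnl}/g")
printf '%s\n' "$result"
```

Each ', ' separator becomes a line break, so the sample prints three lines.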

Related

Unix ksh replace 2nd to last character of a variable

I am on a UNIX system that uses the Korn Shell. I'm a UNIX beginner. Here is what I want to do:
var1=user/hYbMj8d#RM1
I would like to replace only the 2nd to last character, which will always be an M if that matters, change it to an O, and store the result in a new variable.
So the new variable will contain:
var2=user/hYbMj8d#RO1
Try this with sed:
sed 's/M\(.$\)/O\1/'
Full solution for storing updated var1 in var2:
var2=`echo $var1 | sed 's/M\(.$\)/O\1/'`
Explanation: this uses sed's substitute command, s/<find>/<replace>/. The pattern M\(.$\) matches an M followed by exactly one character and then the end of the line ($), so the M must be the 2nd character from the end. The escaped parentheses capture the character after the M (in sed's basic regular expressions, capture groups are written with escaped parentheses), and the replacement writes an O followed by \1, the contents of capture group #1.
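A quick way to check the anchoring is to run the same substitution on two values. This is a small sketch: the second value is a made-up variation with no M in the second-to-last position, and printf is used in place of echo only to keep the shell from interpreting the data.

```shell
var1='user/hYbMj8d#RM1'
# M followed by exactly one character and end of line: only the last M qualifies
var2=$(printf '%s\n' "$var1" | sed 's/M\(.$\)/O\1/')
printf '%s\n' "$var2"    # user/hYbMj8d#RO1

# Hypothetical value without an M in the second-to-last position: left unchanged
other=$(printf '%s\n' 'user/hYbMj8d#RX1' | sed 's/M\(.$\)/O\1/')
printf '%s\n' "$other"   # user/hYbMj8d#RX1
```

The earlier M in hYbMj8d is never touched because it is not followed by exactly one character before the end of the line.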

I need a sed command to change a phone number format from 999-999-9999 to (999)999-9999

I need a sed command to change a phone number format from 999-999-9999 to (999)999-9999.
Here is what I've been trying:
sed 's/[[:digit:]]\-[[:digit:]]\-[[:digit:]]/\([[:digit:]]\)[[:digit:]]\-[[:digit:]]/gp'
I've also tried this:
sed 's/([0-9]{3})\-([0-9]{3})\-([0-9]{4})/\(([0-9]{3}\))([0-9]{3})\-([0-9]{4})/gp'
The notation [[:digit:]] matches a single digit; you need to match repeated digits, which you do by wrapping the repeat count in \{3\} (for a fixed count; there are variable counted ranges too, but they're not relevant here, and * and so on too). And you need to capture what you match in \(…\) so you can reference them in the replacement. In the replacement, you use \1 etc to refer to captured fragments. The captures are numbered left-to-right in the order of the \( symbols.
sed 's/\([[:digit:]]\{3\}\)-\([[:digit:]]\{3\}-[[:digit:]]\{4\}\)/(\1)\2/g'
Or:
sed 's/\([0-9]\{3\}\)-\([0-9]\{3\}-[0-9]\{4\}\)/(\1)\2/g'
This is classic sed notation; you can find variants using extended regular expressions too, but you need different options depending on platform, unlike this notation. The patterns look for 3 digits (first capture), a dash, then 3 more digits, another dash and 4 digits as the second capture, and replace all that with open bracket (parenthesis in American), the first 3 digits, close bracket, and the remaining 3 digits, dash, 4 digits.
BSD (Mac OS X):
sed -E 's/([0-9]{3})-([0-9]{3}-[0-9]{4})/(\1)\2/g'
GNU:
sed -r 's/([0-9]{3})-([0-9]{3}-[0-9]{4})/(\1)\2/g'
Note that all of these regular expressions would convert
9876-345-54321
to:
9(876)345-54321
Fixing that is less trivial, especially in sed. Using Perl:
$ echo "987-654-3210 and 2987-654-543210 and 222-333-4444 and 543-432-5544" |
> perl -p -e 's/\b([0-9]{3})-([0-9]{3}-[0-9]{4})\b/(\1)\2/g'
(987)654-3210 and 2987-654-543210 and (222)333-4444 and (543)432-5544
$
The \b marks a word boundary in PCRE. That does mean that a222-333-4444 is not matched by the Perl; you can refine things to insist on non-digit or start of string before, and non-digit or end of string after, the matching string.
$ echo "987-654-3210 and 2987-654-543210 and a222-333-4444 and 543-432-5544" |
> perl -p -e 's/(^|\D)([0-9]{3})-([0-9]{3}-[0-9]{4})(\D|$)/\1(\2)\3\4/g'
(987)654-3210 and 2987-654-543210 and a(222)333-4444 and (543)432-5544
$
Or with (BSD or GNU) sed extended regular expressions (BSD shown):
$ echo "987-654-3210 and 2987-654-543210 and a222-333-4444 and 543-432-5544" |
> sed -E 's/(^|[^0-9])([0-9]{3})-([0-9]{3}-[0-9]{4})([^0-9]|$)/\1(\2)\3\4/g'
(987)654-3210 and 2987-654-543210 and a(222)333-4444 and (543)432-5544
$
Note that the negated digit character class notation can be written [^[:digit:]] if you wish.
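For illustration, here is the same extended-regex command rewritten with the POSIX character classes throughout. It is a sketch that assumes a sed accepting -E (BSD sed, and recent GNU sed); with older GNU versions, -r would be the option to try.

```shell
sample='987-654-3210 and 2987-654-543210 and a222-333-4444 and 543-432-5544'

# [[:digit:]] replaces [0-9], and [^[:digit:]] replaces [^0-9]
formatted=$(printf '%s\n' "$sample" |
    sed -E 's/(^|[^[:digit:]])([[:digit:]]{3})-([[:digit:]]{3}-[[:digit:]]{4})([^[:digit:]]|$)/\1(\2)\3\4/g')
printf '%s\n' "$formatted"
# (987)654-3210 and 2987-654-543210 and a(222)333-4444 and (543)432-5544
```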
Iterative development helps.
$ echo 123-456-7890 | sed -r 's/([0-9]{3})-([0-9]{3}-[0-9]{4})/(\1)\2/'
(123)456-7890

Insert a new line at nth character after nth occurence of a pattern via a shell script

I have a big single-line string which has '~|~' as the delimiter. 10 fields make up a row and the 10th field is 9 characters long. I want to insert a new line after each row, meaning insert a \n at the 10th character after the (9,18,27 ..)th occurrence of '~|~'.
Is there any quick single line sed/awk option available without looping through the string?
I have used
sed -e's/\(\([^~|~]*~|~\)\{9\}[^~|~]*\)~|~/\1\n/g'
but it will replace every 10th occurrence with a new line. I want to keep the delimiter but add a new line after 9 characters in field 10
cat test.txt
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
sed -e's/\(\([^~|~]*~|~\)\{9\}[^~|~]*\)~|~/\1\n/g' test.txt
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one
2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two
3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
Below is what I want
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten123456
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
Let's try awk:
awk 'BEGIN{FS="[~|~]+"; OFS="~|~"}
{for(i=10; i<NF; i+=9){
str=$i
$i=substr(str, 1, 9)"\n"substr(str, 10, length(str))
}
print $0}' t.txt
Input:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
The output:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
I assume there is some error in your comment: If your input contains ten1234562one and 2ten1234563one, then the line break has to be inserted after 2 in the first case and after 6 in the second case (as this is the tenth character). But your expected output is different from this.
Your sed script wasn't too far off. This seems to do the job you want:
sed -e '/^$/d' \
-e 's/\([^~|]*~|~\)\{9\}.\{9\}/&\' \
-e '/' \
-e 'P;D' \
data
For your input file (I called it data), I get:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345
6
The script requires a little explanation, I fear. It uses some obscure shell and some obscure sed behaviour. The obscure shell behaviour is that within a single-quoted string, backslashes have no special meaning, so the backslash before the second single quote in the second -e appears to sed as a backslash at the end of the argument. The obscure sed behaviour is that it treats the argument for each -e option as if it is a line. So, the trailing backslash plus the / after the third -e is treated as if there was a backslash, newline, slash sequence, which is how BSD sed (and POSIX sed) requires you to add a newline. GNU sed treats \n in the replacement as a newline, but POSIX (and BSD) says:
The escape sequence '\n' shall match a <newline> embedded in the pattern space.
It doesn't say anything about \n being treated as a <newline> in the replacement part of a s/// substitution. So, the first two -e options combine to add a newline after what is matched. What's matched? Well, that's a sequence of 'zero or more non-tilde, non-pipe characters followed by ~|~', repeated 9 times, followed by 9 'any characters'. This is an approximation to what you want. If you had a field such as ~|~tilde~pipe|bother~|~, the regex would fail because of the ~ between 'tilde' and 'pipe' and also because of the | between 'pipe' and 'bother'. Fixing it to handle all possible sequences like that is non-trivial, and not warranted by the sample data.
The remainder of the script is straight-forward: the -e '/^$/d' deletes an empty line, which matters if the data is exactly the right length, and in -e 'P;D' the P prints the initial segment of the pattern space up to the first newline (the one we just added); the D deletes the initial segment of the pattern space up to the first newline and starts over.
I'm not convinced this is worth the complexity. It might be simpler to understand if the script was in a file, script.sed:
/^$/d
s/\([^~|]*~|~\)\{9\}.\{9\}/&\
/
P
D
and the command line was:
$ sed -f script.sed data
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345
6
$
Needless to say, it produces the same output. Without the /^$/d, the script only works because of the odd 6 at the end of the input. With exactly 9 characters after the third record, it then flops into an infinite loop.
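The P;D cycle is easier to follow on a smaller input. The sketch below is a scaled-down variant, not the script above: it assumes rows of 3 fields with a 3-character last field, so the counts become \{2\} and \{3\}, and the data is made up to be exactly the right length.

```shell
# Hypothetical data: 3 rows of 3 fields each, last field 3 characters long
data='a~|~b~|~c12d~|~e~|~f34g~|~h~|~i56'

# s appends a newline after one complete row; P prints up to that newline;
# D removes the printed row and restarts the script on what is left.
# /^$/d discards the empty pattern space left after the final row.
rows=$(printf '%s\n' "$data" | sed -e '/^$/d' -e 's/\([^~|]*~|~\)\{2\}.\{3\}/&\
/' -e 'P;D')
printf '%s\n' "$rows"
```

With this input each cycle peels off one row, so the sample prints three lines of three fields each.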
Using extended regular expressions
If you use extended regular expressions, you can deal with odd-ball fields that contain ~ or | (or, indeed, ~|) in the middle.
script2.sed:
/^$/d
s/(([^~|]{1,}|~[^|]|~\|[^~])*~\|~){9}.{9}/&\
/
P
D
data2:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345666=beast~tilde|pipe~|twiddle~|~4-two~|~4-three~|~4-four~|~4-five~|~4-six~|~4-seven~|~4-eighty-eight~|~4-999~|~987654321
Output from sed -E -f script2.sed data2:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345
666=beast~tilde|pipe~|twiddle~|~4-two~|~4-three~|~4-four~|~4-five~|~4-six~|~4-seven~|~4-eighty-eight~|~4-999~|~987654321
That still won't handle a field like tilde~~|~. Using -E is correct for BSD (Mac OS X) sed; it enables extended regular expressions. The equivalent option for GNU sed is -r.

SED character after the substitute command ("s")

I know about the s// type of command in sed, but I have never seen s# used. Could someone explain what exactly this is doing?
% sed -e "s#SRC_DIR=.*#SRC_DIR=$PROJECT_SRC_DIR#g" -i proj.cfg
I understand that -e defines a script to execute, and the script is within "", but what exactly does s# do?
Checked http://www.grymoire.com/Unix/Sed.html and gnu website, but no luck.
# is a sed delimiter like /. We could use ~, #, /, ;, etc. as sed delimiters. They use a different delimiter, #, because they don't want to escape the / slashes. If you use # as the delimiter, you don't need to escape the / forward slash. But if you use / as the delimiter, you must escape / as \/, otherwise sed would treat it as the delimiter.
From sed's manual:
The syntax of the s (as in substitute) command is ‘s/regexp/replacement/flags’. The / characters may be uniformly replaced by any other single character within any given s command. The / character (or whatever other character is used in its stead) can appear in the regexp or replacement only if it is preceded by a \ character.
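A side-by-side run shows why the # delimiter is convenient when the replacement is a path. This is a sketch: the directory value and the sample line are made up, and the escaping step is just one way to prepare a value for the / form.

```shell
PROJECT_SRC_DIR=/opt/project/src   # hypothetical path
line='SRC_DIR=/old/path'

# With # as the delimiter, the slashes in the path need no escaping
with_hash=$(printf '%s\n' "$line" | sed "s#SRC_DIR=.*#SRC_DIR=$PROJECT_SRC_DIR#g")

# With / as the delimiter, every / in the value must first become \/
escaped=$(printf '%s\n' "$PROJECT_SRC_DIR" | sed 's/\//\\\//g')
with_slash=$(printf '%s\n' "$line" | sed "s/SRC_DIR=.*/SRC_DIR=$escaped/g")

printf '%s\n' "$with_hash" "$with_slash"
```

Both commands produce SRC_DIR=/opt/project/src; the # form simply skips the escaping step.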

How to replace a pattern with newline (\n) with sed under UNIX / Linux operating systems?

I have a txt file which contains:
Some random
text here. This file
has multiple lines. Should be one line.
I use:
sed '{:q;N;s/\n/:sl:/g;t q}' file1.txt > singleline.txt
and get:
Some random:sl:text here. This file:sl:has multiple lines. Should be one line.
Now I want to replace the :sl: pattern with newline (\n) character. When I use:
sed 's/:sl:/&\n/g' singleline.txt
I get:
Some random:sl:
text here. This file:sl:
has multiple lines. Should be one line.
How to replace the pattern with newline character instead of adding newline character after the pattern?
Sed uses & as a shortcut for the matched pattern. So you are replacing :sl: with :sl:\n.
Change your sed command like this:
sed 's/:sl:/\n/g' singleline.txt
You can do it more easily with tr : tr '\n' ' ' < singleline.txt
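The \n in the replacement is understood by GNU sed. On a sed that does not support it, a backslash followed by a literal newline should do the same job. A minimal sketch, using the joined line from the question as inline data:

```shell
joined='Some random:sl:text here. This file:sl:has multiple lines. Should be one line.'

# Replace each :sl: marker with a newline; no & in the replacement,
# so the marker itself is dropped
restored=$(printf '%s\n' "$joined" | sed 's/:sl:/\
/g')
printf '%s\n' "$restored"
```

This prints the original three lines of the text file.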
