Substitution from 2nd to 4th occurrence using sed Command - unix

I want to replace "p" with "#" from the 2nd occurrence to the 4th occurrence.
sed 's/p/#/2g' file.txt
This command substitutes from the 2nd occurrence up to the last occurrence of "p".
But I want to substitute only the 2nd through the 4th.
How can I do that?

Assuming this is bash or zsh, you can make use of brace expansion.
sed -e's/p/#/2'{,,} file
{,,} repeats -e's/p/#/2' three times. Each expression replaces whatever is currently the 2nd p, so together they replace the original 2nd, 3rd and 4th p's.
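To see what sed actually receives, here is a minimal sketch with the brace expansion written out by hand. The sample string and the function name replace_2nd_to_4th are made up for illustration.

```shell
# Hand-expanded form of: sed -e's/p/#/2'{,,}
# Each expression replaces whatever is currently the 2nd "p", so after the
# first one runs, the original 3rd "p" has become the 2nd, and so on.
replace_2nd_to_4th() {
  sed -e 's/p/#/2' -e 's/p/#/2' -e 's/p/#/2'
}

# Sample input (hypothetical): six p's, numbered so the effect is visible.
printf '%s\n' 'p1 p2 p3 p4 p5 p6' | replace_2nd_to_4th
# prints: p1 #2 #3 #4 p5 p6
```

Writing the expressions out like this also works in shells without brace expansion.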

This might work for you (GNU sed):
sed 's/p/\n/5g;s/p/#/2g;y/\n/p/' file
Replace the 5th and subsequent p's on a line with newline, replace the 2nd and subsequent p's by # and finally restore newlines to p's.
Or, with kudos to oguz ismail:
sed -e's/p/#/'{4..2} file
This uses bash brace expansion to generate one substitution per occurrence, counting down from 4 to 2. The order must be descending: replacing a lower-numbered occurrence first would renumber the remaining p's, so the later substitutions would hit the wrong ones.
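A small sketch of why the order matters, with the expansion of {4..2} written out by hand and compared against the ascending order. The function names and the sample string are hypothetical.

```shell
# Descending order (what {4..2} expands to): occurrences 4, 3, 2.
descending() { sed -e 's/p/#/4' -e 's/p/#/3' -e 's/p/#/2'; }

# Ascending order, for comparison only: each substitution renumbers the
# remaining p's, so the later ones land on the wrong occurrence.
ascending() { sed -e 's/p/#/2' -e 's/p/#/3' -e 's/p/#/4'; }

printf '%s\n' 'p1 p2 p3 p4 p5 p6' | descending   # p1 #2 #3 #4 p5 p6
printf '%s\n' 'p1 p2 p3 p4 p5 p6' | ascending    # p1 #2 p3 #4 p5 #6
```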

Related

How to use sed to group date/time?

I have a text
7304628626|duluth/superior|18490|2016|volvo|gas|49230|automatic|sedan|white|mn|46.815216|-92.178109|2021-04-10T08:46:33-0500
I want to change text 2021-04-10T08:46:33-0500 to 10/04/2021 08:46:33
I tried this command
sed -n "s/|\([0-2][0-9][0-9][0-9]\)-\([0-1][0-9]\)-\([1-3][0-9]\)\(T\)\([0-9][0-9]:[0-9][0-9]:[0-9][0-9]\)\(-[0-1][0-9][0][0]\)/|\3\/\2\/\1 \5 /p" filename
but some lines are not changed
Using sed
$ sed 's/\(.*|\)\([^-]*\)-\([^-]*\)-\([^T]*\)T\([^-]*\).*/\1\4\/\3\/\2 \5/' input_file
7304628626|duluth/superior|18490|2016|volvo|gas|49230|automatic|sedan|white|mn|46.815216|-92.178109|10/04/2021 08:46:33
\(.*|\) - Match up to the last occurrence of the | pipe symbol
\([^-]*\) - Match up to the next - hyphen. Used twice: stores 2021 and 04, which can be returned with the \2 and \3 back references
\([^T]*\) - Match up to the next capital T. Stores 10, which can be returned with the \4 back reference
T - Match the literal T without capturing it
\([^-]*\) - Match up to the next - hyphen. Stores 08:46:33, which can be returned with the \5 back reference
.* - Match everything else without capturing it, so it is dropped
If your intent is to return only the date and time, you can remove the first back reference
$ sed 's/\(.*|\)\([^-]*\)-\([^-]*\)-\([^T]*\)T\([^-]*\).*/\4\/\3\/\2 \5/' input_file
10/04/2021 08:46:33
With your shown samples, please try following sed program.
sed -E 's/(.*\|)([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2}:[0-9]{2}:[0-9]{2})-.*/\1\4\/\3\/\2 \5/' Input_file
Explanation: This uses sed's back references to capture parts of the line and reuse them in the replacement. The -E option enables ERE (extended regular expressions), and s performs the substitution. Five capturing groups match 7304628626|duluth/superior|18490|2016|volvo|gas|49230|automatic|sedan|white|mn|46.815216|-92.178109| (1st group), 2021 (2nd), 04 (3rd), 10 (4th) and 08:46:33 (5th). The replacement then emits the groups in the order the OP needs, so 2021-04-10T08:46:33-0500 becomes 10/04/2021 08:46:33.
This might work for you (GNU sed):
sed -E 's#\|(....)-(..)-(..)T(..:..:..)-....$#|\3/\2/\1 \4#' file
Match the pattern and use back references to format the result as required.
N.B. The | and $ anchor the pattern to the last field on the line, and the dashes, colons and capital T make it most unlikely that any other string will match, so a dot can be used to match the digits; if you prefer, replace the .'s with [0-9]'s. Also, # is used as the delimiter instead of the usual / in the substitution command s#...#...# because / appears in the replacement string.
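One likely reason the command in the question skips some lines is that its pattern only accepts days matching [1-3][0-9] (so 01 to 09 never match) and only a negative whole-hour offset. Below is a hedged sketch of a stricter variant of the answers above. The function name, the second sample line, and the decision to accept either sign on the offset are assumptions, not something stated in the question.

```shell
# Digits only, anchored to the end of the line, offset may be + or -.
# '#' is the s-command delimiter so the '/' in the replacement needs no escaping.
reformat_timestamp() {
  sed -E 's#\|([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2}:[0-9]{2}:[0-9]{2})[+-][0-9]{4}$#|\3/\2/\1 \4#'
}

# First line mimics the question's last fields; the second is a made-up
# line with a single-digit day and a positive half-hour offset.
printf '%s\n' \
  'mn|46.815216|-92.178109|2021-04-10T08:46:33-0500' \
  'mn|46.815216|-92.178109|2021-04-05T23:01:02+0130' | reformat_timestamp
# prints:
# mn|46.815216|-92.178109|10/04/2021 08:46:33
# mn|46.815216|-92.178109|05/04/2021 23:01:02
```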

Replace double consonant letters with one using sed command

How can I replace a doubled consonant with a single letter using the sed command on Linux? Example: WILLIAM -> WILIAM. The command grep -E '(.)\1+' finds words that contain the same character twice in a row, but how do I replace the pair with a single occurrence of the letter?
I tried
cat test.txt | head | tr -s '[^AEUIO\n]' '?'
tr is all or nothing; it will replace all occurrences of the selected characters, regardless of context. For regex replacement, look at sed - you even included this in your question's tags, but you don't seem to have explored how it might be useful?
sed 's/\(.\)\1/\1/g' test.txt
The dot matches any character; to restrict to only consonants, change it to [b-df-hj-np-tv-xz] or whatever makes sense (maybe extend to include upper case; perhaps include accented characters?)
The regex dialect understood by sed is more like the one understood by grep without -E (hence all the backslashes); though some sed implementations also support this option to select the POSIX extended regular expression dialect.
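For comparison, here is the same substitution in both dialects, as a minimal sketch. The function names are made up, and the -E form assumes a sed that supports that option (GNU and BSD sed both do).

```shell
# Basic regular expressions (the default): groups need backslashes.
dedupe_bre() { sed 's/\(.\)\1/\1/g'; }

# Extended regular expressions: plain parentheses form the group.
dedupe_ere() { sed -E 's/(.)\1/\1/g'; }

printf '%s\n' 'WILLIAM' | dedupe_bre   # WILIAM
printf '%s\n' 'WILLIAM' | dedupe_ere   # WILIAM
```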
Neither sed nor tr needs cat to read standard input for them (though tr obscurely does not accept a file name argument). See tangentially also Useless use of cat?
Match one consonant, remember it in \( \), then match it again with \1 and replace the pair with the single remembered letter. The g flag repeats this for every pair on the line.
sed 's/\([bcdfghjklmnpqrstvwxzBCDFGHJKLMNPQRSTVWXZ]\)\1/\1/g'
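A quick sketch of the consonant-only version on two sample words. Treating w and y as consonants is an assumption here (the tr attempt in the question excluded only A, E, U, I, O), and the function name is hypothetical.

```shell
# Collapse a doubled consonant to one letter; doubled vowels are left alone.
# The letters are listed explicitly so the class does not depend on locale
# collation the way a range like b-d can.
squeeze_double_consonants() {
  sed 's/\([bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ]\)\1/\1/g'
}

printf '%s\n' 'WILLIAM' 'bookkeeper' | squeeze_double_consonants
# prints:
# WILIAM
# bookeeper
```

In "bookkeeper" only the kk is collapsed; oo and ee survive because vowels are not in the bracket expression.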

replace all commas except last one between two equal sign

I have a file, and on each line I need to replace all commas except the last one between two equals signs. Can anyone help with this?
(I would prefer a sed command with no loop.)
File's data-->>
STREET:1:1=Zwaneweg 23, Box 0001, PIN002,TOWN.COUNTRY:1:1=BE/Schilde
Should be-->>
STREET:1:1=Zwaneweg 23? Box 0001? PIN002,TOWN.COUNTRY:1:1=BE/Schilde
Try something like this:
mayankp#mayank:~/Documents$ cat tt.txt
STREET:1:1=Zwaneweg 23, Box 0001, PIN002,TOWN.COUNTRY:1:1=BE/Schilde
mayankp#mayank:~/Documents$ cat tt.txt| grep -o -P '(?<==).*(?==)'| rev |sed 's/,/?/2g' |rev > out.txt
mayankp#mayank:~/Documents/$ cat out.txt
Zwaneweg 23? Box 0001? PIN002,TOWN.COUNTRY:1:1
Now merge out.txt with tt.txt to retain missed data.
mayankp#mayank:~/Documents/$ perl -0777 -i -pe "s/(=).*(=)/\$1`cat out.txt`\$2/s" tt.txt
mayankp#mayank:~/Documents$ cat tt.txt
STREET:1:1=Zwaneweg 23? Box 0001? PIN002,TOWN.COUNTRY:1:1=BE/Schilde
With sed you can remember matches and restore them.
When you only want to replace the second-last comma, you can use
sed -r 's/(=.*),(.*,.*=)/\1?\2/' inputfile
The wildcard is greedy: when you have 8 commas between the equals signs, the seventh will be replaced.
You can tell sed to repeat the instruction until it no longer finds a match by using a label.
The label :a is inserted in front of the substitution, and the jump back is done with ta, which branches to :a only if the substitution succeeded. The command becomes
sed -r ':a;s/(=.*),(.*,.*=)/\1?\2/;ta' inputfile
When you have more than 2 equals signs, you must know where to look. This command works between the first and last equals sign:
echo '1,a=2,b,b,b,=3,c=Only, this part, should have, the commas, except this one, replaced=5,e,e'|
sed -r ':a;s/(=.*),(.*,.*=)/\1?\2/;ta'
1,a=2?b?b?b?=3?c=Only? this part? should have? the commas? except this one, replaced=5,e,e
When you only want the replacements done between the last 2 equals signs, you need to replace the wildcard . with [^=] (anything except an equals sign), which gives an even harder-to-read command
echo '1,a=2,b,b,b,=3,c=Only, this part, should have, the commas, except this one, replaced=5,e,e'|
sed -r ':a;s/(=[^=]*),([^=]*,[^=]*=)([^=]*)$/\1?\2\3/;ta'
1,a=2,b,b,b,=3,c=Only? this part? should have? the commas? except this one, replaced=5,e,e
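Applied to the sample line from the question, the label-and-branch loop can be sketched as follows. The function name is made up, and the script is split across -e options only so the label does not rely on GNU's semicolon handling.

```shell
# Repeat the substitution until no comma between the first and last '='
# is still followed by another comma before that last '='.
replace_all_but_last_comma() {
  sed -E -e ':a' -e 's/(=.*),(.*,.*=)/\1?\2/' -e 'ta'
}

printf '%s\n' 'STREET:1:1=Zwaneweg 23, Box 0001, PIN002,TOWN.COUNTRY:1:1=BE/Schilde' |
  replace_all_but_last_comma
# prints: STREET:1:1=Zwaneweg 23? Box 0001? PIN002,TOWN.COUNTRY:1:1=BE/Schilde
```

Each pass replaces the second-to-last remaining comma, so the loop runs once per replaced comma plus one final failed attempt.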

Replace characters in a delimited part of a file

I have the file teste.txt with the following content:
02183101399205000 GBTD9VBYMBQ 04455927964
02183101409310000 XBQMPL1C93B 27699484827
54183101003651000 1WFG3SNVDG9 71530894204
I execute the command
sed -e 's/^\(.\{18\}\)[0-9]/\1#/g' teste.txt
The result is:
02183101399205000 GBTD9VBYMBQ 04455927964
02183101409310000 XBQMPL1C93B 27699484827
54183101003651000 #WFG3SNVDG9 71530894204
Only the 19th position in line 3 is changed from 1 to #.
I would like to know how I can change all numeric characters from the 19th to the 30th position.
The expected result is:
02183101399205000 GBTD#VBYMBQ 04455927964
02183101409310000 XBQMPL#C##B 27699484827
54183101003651000 #WFG#SNVDG# 71530894204
An awk command to accomplish your goal:
awk '{ gsub(/[0-9]/,"#",$2); print }' teste.txt
This might work for you (GNU sed):
sed -r 's/./&\n/30;s//\n&/19;h;s/[0-9]/#/g;H;x;s/\n.*\n(.*)\n.*\n(.*)\n.*/\2\1/' file
Surround the string, which is from the 19th to the 30th character, by newlines and make a copy. Replace all digits by #'s. Append this string to the original and use pattern matching to rearrange the strings to make a new string with the unchanged parts either side of the changed part, at the same time discarding the introduced newlines.
An alternative method, utilising the fact that the fields are space separated:
sed -r ':a;s/( \S*)[0-9](\S* )/\1#\2/;ta' file
In fact the two methods can be combined:
sed -r 's/./&\n/30;s//\n&/19;:a;s/(\n.*)[0-9](.*\n)/\1#\2/;ta;s/\n//g' file
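To make the newline trick easier to follow, here is a sketch that runs the marking step on its own and then the full combined command, on the third sample line. It assumes GNU sed (for \n in the replacement and the empty regex in s//.../); the function names are hypothetical.

```shell
# Step 1 only: fence off characters 19-30 with newlines.
# The empty regex in s//.../ reuses the previous regex, here ".".
mark_window() { sed -r 's/./&\n/30;s//\n&/19'; }

# Full combined command: mark, replace digits inside the fence one at a
# time in a loop, then remove the newlines again.
mask_window() {
  sed -r 's/./&\n/30;s//\n&/19;:a;s/(\n.*)[0-9](.*\n)/\1#\2/;ta;s/\n//g'
}

line='54183101003651000 1WFG3SNVDG9 71530894204'
printf '%s\n' "$line" | mark_window   # three lines: head, window, tail
printf '%s\n' "$line" | mask_window   # 54183101003651000 #WFG#SNVDG# 71530894204
```

Because the pattern requires a newline on both sides of the digit, only digits inside the marked window can match.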

Insert a new line at nth character after nth occurrence of a pattern via a shell script

I have a big single-line string which has '~|~' as its delimiter. 10 fields make up a row and the 10th field is 9 characters long. I want to insert a new line after each row, meaning insert a \n at the 10th character after the (9,18,27 ..)th occurrence of '~|~'
Is there any quick single line sed/awk option available without looping through the string?
I have used
sed -e's/\(\([^~|~]*~|~\)\{9\}[^~|~]*\)~|~/\1\n/g'
but it will replace every 10th occurrence with a new line. I want to keep the delimiter but add a new line after 9 characters in field 10
cat test.txt
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
sed -e's/\(\([^~|~]*~|~\)\{9\}[^~|~]*\)~|~/\1\n/g' test.txt
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one
2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two
3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
Below is what I want
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten123456
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
Let's try awk:
awk 'BEGIN{FS="[~|~]+"; OFS="~|~"}
{for(i=10; i<NF; i+=9){
str=$i
$i=substr(str, 1, 9)"\n"substr(str, 10, length(str))
}
print $0}' t.txt
Input:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
The output:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
I assume there is some error in your comment: if your input contains ten1234562one and 2ten1234563one, then the line break has to be inserted after 2 in the first case and after 6 in the second case (as this is the tenth character). But your expected output differs from this.
Your sed script wasn't too far off. This seems to do the job you want:
sed -e '/^$/d' \
-e 's/\([^~|]*~|~\)\{9\}.\{9\}/&\' \
-e '/' \
-e 'P;D' \
data
For your input file (I called it data), I get:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345
6
The script requires a little explanation, I fear. It uses some obscure shell and some obscure sed behaviour. The obscure shell behaviour is that within a single-quoted string, backslashes have no special meaning, so the backslash before the second single quote in the second -e appears to sed as a backslash at the end of the argument. The obscure sed behaviour is that it treats the argument for each -e option as if it is a line. So, the trailing backslash plus the / after the third -e is treated as if there was a backslash, newline, slash sequence, which is how BSD sed (and POSIX sed) requires you to add a newline. GNU sed treats \n in the replacement as a newline, but POSIX (and BSD) says:
The escape sequence '\n' shall match a <newline> embedded in the pattern space.
It doesn't say anything about \n being treated as a <newline> in the replacement part of a s/// substitution. So, the first two -e options combine to add a newline after what is matched. What's matched? Well, that's a sequence of 'zero or more non-tilde, non-pipe characters followed by ~|~', repeated 9 times, followed by 9 'any characters'. This is an approximation to what you want. If you had a field such as ~|~tilde~pipe|bother~|~, the regex would fail because of the ~ between 'tilde' and 'pipe' and also because of the | between 'pipe' and 'bother'. Fixing it to handle all possible sequences like that is non-trivial, and not warranted by the sample data.
The remainder of the script is straight-forward: the -e '/^$/d' deletes an empty line, which matters if the data is exactly the right length, and in -e 'P;D' the P prints the initial segment of the pattern space up to the first newline (the one we just added); the D deletes the initial segment of the pattern space up to the first newline and starts over.
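A scaled-down sketch may make the P;D loop easier to follow: rows of 3 fields with a 2-character last field instead of 10 fields and 9 characters. The data and the function name are made up; the script has the same shape as the one above, with the backslash-newline typed directly inside one quoted argument.

```shell
# After 2 delimited fields plus 2 characters, append a newline; P prints up
# to that newline, D deletes the printed part and reruns the script on the
# remainder without reading new input.
split_rows() {
  sed -e '/^$/d' -e 's/\([^~|]*~|~\)\{2\}.\{2\}/&\
/' -e 'P;D'
}

printf '%s\n' 'a~|~b~|~c1d~|~e~|~f2g~|~h~|~i3x' | split_rows
# prints:
# a~|~b~|~c1
# d~|~e~|~f2
# g~|~h~|~i3
# x
```

The trailing x plays the role of the odd 6 in the real data: it is what is left in the pattern space after the last complete row has been printed.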
I'm not convinced this is worth the complexity. It might be simpler to understand if the script was in a file, script.sed:
/^$/d
s/\([^~|]*~|~\)\{9\}.\{9\}/&\
/
P
D
and the command line was:
$ sed -f script.sed data
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345
6
$
Needless to say, it produces the same output. Without the /^$/d, the script only works because of the odd 6 at the end of the input. With exactly 9 characters after the third record, it then flops into an infinite loop.
Using extended regular expressions
If you use extended regular expressions, you can deal with odd-ball fields that contain ~ or | (or, indeed, ~|) in the middle.
script2.sed:
/^$/d
s/(([^~|]{1,}|~[^|]|~\|[^~])*~\|~){9}.{9}/&\
/
P
D
data2:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345666=beast~tilde|pipe~|twiddle~|~4-two~|~4-three~|~4-four~|~4-five~|~4-six~|~4-seven~|~4-eighty-eight~|~4-999~|~987654321
Output from sed -E -f script2.sed data2:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345
666=beast~tilde|pipe~|twiddle~|~4-two~|~4-three~|~4-four~|~4-five~|~4-six~|~4-seven~|~4-eighty-eight~|~4-999~|~987654321
That still won't handle a field like tilde~~|~. Using -E is correct for BSD (Mac OS X) sed; it enables extended regular expressions. The traditional equivalent for GNU sed is -r; recent GNU sed versions accept -E as well.
