sed command does not work for certain characters in unix

I am trying to replace the 300th character and add a positive sign and a decimal point accordingly. The sed command works perfectly for all characters except B, F and {.
Please find the input data below:
result_PHDPTRAR2.txt
H009704COV2009084 PHD0000001H009700204COV2009084 PROD2015122016010418371304COVH009704COV2009084 PTR0000001H0097002C00000000140000000043610000003408092A0000000068061C0000000000000{0000002939340H0000000537585H0000003476926F0000001218378G0000000040292E0000000016497{0000000000827E0000001880498A9000000320436J000000004391000000001606000000000030000000000128000000000006000000004227000000000000000000000000 00000140 0000000000000{0000000000773B0000000000000{000000000000
Here the 300th character is A. If we use the following sed command, it works correctly for the above requirement:
sed -e 's/\(.\{1,255\}\)\(.\{1,34\}\)\(.\{1,9\}\)\(.*\)A/\1\2+\3.\4^/' <<< cat result_PHDPTRAR2.txt
It will replace A with ^ and get the following result.
H009704COV2009084 PHD0000001H009700204COV2009084 PROD2015122016010418371304COVH009704COV2009084 PTR0000001H0097002C00000000140000000043610000003408092A0000000068061C0000000000000{0000002939340H0000000537585H0000003476926F0000001218378G0000000040292E0000000016497{0000000000827E000+000188049.8^9000000320436J000000004391000000001606000000000030000000000128000000000006000000004227000000000000000000000000 00000140 0000000000000{0000000000773B0000000000000{000000000000
But the same command does not work if we replace the 300th character with B, F or {.
If I change the 300th character of the input (result_PHDPTRAR2.txt) to B and then run sed:
sed -e 's/\(.\{1,255\}\)\(.\{1,34\}\)\(.\{1,9\}\)\(.*\)B/\1\2+\3.\4^/' <<< cat result_PHDPTRAR2.txt
I get the following result:
H009704COV2009084 PHD0000001H009700204COV2009084 PROD2015122016010418371304COVH009704COV2009084 PTR0000001H0097002C00000000140000000043610000003408092A0000000068061C0000000000000{0000002939340H0000000537585H0000003476926F0000001218378G0000000040292E0000000016497{0000000000827E000+000188049.8B9000000320436J000000004391000000001606000000000030000000000128000000000006000000004227000000000000000000000000 00000140 0000000000000{0000000000773^0000000000000{000000000000
You can see that + and the decimal point are added correctly in "+000188049.8B", but B remains the same. Here B should be replaced with ^.
Can anyone please help me?

The problem is that the 'B' the pattern matches comes later than the 300th character, i.e. the input text doesn't match your expectations.
So, what now?
Update
Based on the comment, the problem is that there is more than one B at or after the 300th character. Because .* is greedy, it runs on to the last B on the line. This is how to fix it:
sed -e 's/\(.\{1,255\}\)\(.\{1,34\}\)\(.\{1,9\}\)\([^B]*\)B/\1\2+\3.\4^/'
Watch out for the negated character class: \([^B]*\)B matches only up to the first B. Unfortunately, sed doesn't have non-greedy quantifiers; with them it would be even easier: \(.*?\)B.
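If the goal is strictly "the character at position 300", another option is to anchor the pattern to the start of the line and use exact repeat counts, so no B elsewhere on the line can be picked up. The following is only a sketch under assumptions: the input line is synthetic (not the real file), and the set of characters to replace, [ABF{], is taken from the ones named in the question. The 289 leading characters are split into 255 + 34 because some sed implementations cap a repeat count at 255.

```shell
# Synthetic 303-character line (an assumption for illustration):
# 299 zeros, then B at position 300, then "yyB" to provide a later B.
line="$(printf '%0299d' 0)ByyB"

# ^ pins the match to the start of the line and the exact counts pin each group:
#   \1 = characters 1-289 (255 + 34), \2 = characters 290-298,
#   \3 = character 299, [ABF{] = character 300, which becomes ^.
result=$(printf '%s\n' "$line" |
  sed 's/^\(.\{255\}.\{34\}\)\(.\{9\}\)\(.\)[ABF{]/\1+\2.\3^/')

printf '%s\n' "$result"
```

The trailing B in "yyB" is left alone, because the pattern can only match at one fixed position.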

Related

How to use sed to group date/time?

I have this text:
7304628626|duluth/superior|18490|2016|volvo|gas|49230|automatic|sedan|white|mn|46.815216|-92.178109|2021-04-10T08:46:33-0500
I want to change the text 2021-04-10T08:46:33-0500 to 10/04/2021 08:46:33
I tried this command:
sed -n "s/|\([0-2][0-9][0-9][0-9]\)-\([0-1][0-9]\)-\([1-3][0-9]\)\(T\)\([0-9][0-9]:[0-9][0-9]:[0-9][0-9]\)\(-[0-1][0-9][0][0]\)/|\3\/\2\/\1 \5 /p" filename
but some of the text is not changed.
Using sed
$ sed 's/\(.*|\)\([^-]*\)-\([^-]*\)-\([^T]*\)T\([^-]*\).*/\1\4\/\3\/\2 \5/' input_file
7304628626|duluth/superior|18490|2016|volvo|gas|49230|automatic|sedan|white|mn|46.815216|-92.178109|10/04/2021 08:46:33
\(.*|\) - Match up to the last occurrence of the | pipe symbol
\([^-]*\) - Match up to the next occurrence of - (hyphen). Used twice, it stores 2021 and 04, which can be returned with the \2 and \3 back references
\([^T]*\) - Match up to the next capital T. Stores 10, which can be returned with the \4 back reference
T - Match the T without capturing it, so it is dropped
\([^-]*\) - Match up to the next occurrence of - (hyphen). Stores 08:46:33, which can be returned with the \5 back reference
.* - Match everything else without capturing it, so it is dropped
If your intent is to return only the date and time, you can remove the first back reference
$ sed 's/\(.*|\)\([^-]*\)-\([^-]*\)-\([^T]*\)T\([^-]*\).*/\4\/\3\/\2 \5/' input_file
10/04/2021 08:46:33
With your shown samples, please try following sed program.
sed -E 's/(.*\|)([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2}:[0-9]{2}:[0-9]{2})-.*/\1\4\/\3\/\2 \5/' Input_file
Explanation: This uses sed's back references to store matched values and reuse them in the substitution. The -E option enables ERE (extended regular expressions), where \| is a literal pipe, and the s command performs the substitution. Five capturing groups match 7304628626|duluth/superior|18490|2016|volvo|gas|49230|automatic|sedan|white|mn|46.815216|-92.178109| (1st group), 2021 (2nd), 04 (3rd), 10 (4th) and 08:46:33 (5th). The replacement then emits the groups in the order the OP needs, since the OP wants 2021-04-10T08:46:33-0500 changed to 10/04/2021 08:46:33.
This might work for you (GNU sed):
sed -E 's#\|(....)-(..)-(..)T(..:..:..)-....$#|\3/\2/\1 \4#' file
Pattern match and using back references format as required.
N.B. The use of the | and $ to anchor the pattern to the last field on the line and the nature of the dashes, colons and the capital T make it most unlikely any other string will match, so a dot can be used to match the digits, but if you like replace .'s by [0-9]'s. Also the # is used as alternative delimiter to the normal / in the substitution command s#...#...# as / appear in the replacement string.
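One thing worth noting about the command in the question: its day pattern [1-3][0-9] cannot match days 01 to 09, which may be one reason some lines were left unchanged. The sketch below follows the anchored approach above but uses explicit [0-9] classes and accepts either sign on the UTC offset. The second input line is hypothetical, added only to exercise a single-digit day and a positive offset.

```shell
# First line: the sample record from the question.
# Second line: a hypothetical record (day 05, offset +0200) for illustration.
input='7304628626|duluth/superior|18490|2016|volvo|gas|49230|automatic|sedan|white|mn|46.815216|-92.178109|2021-04-10T08:46:33-0500
1|example|2021-04-05T08:46:33+0200'

# \| is a literal pipe in ERE; $ anchors the timestamp to the end of the line.
# Groups: \1 = year, \2 = month, \3 = day, \4 = time. The offset is dropped.
result=$(printf '%s\n' "$input" |
  sed -E 's#\|([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2}:[0-9]{2}:[0-9]{2})[+-][0-9]{4}$#|\3/\2/\1 \4#')

printf '%s\n' "$result"
```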

How do I tell sed to repeat substitution until no match was replaced?

If doing echo x | sed 's/x/xx/g' I'm really glad sed doesn't restart on the output.
But if I have, say, echo 'x,a,b,x,x,c,x,d,e,x,x,x,f,x' | sed 's/,x,/,y,/g'
it does not substitute every x for y, for an obvious reason: the prior substitution has already consumed the surrounding delimiters.
And I'm aware that I have a tiny problem with the first and last x as well, but I ignore this for simplicity of the question.
Edit: I have to clarify the question, as already mentioned but only in comments: I want to see every x replaced by y, but only if it was a single word for itself, enclosed by delimiters, commas in this example, but if there is a way to cope with more complex delimiters, this will be welcome.
(No way to fall into the y2k trap, replacing Monday by Mondak, just joking.)
Use \b as a word delimiter.
$ echo 'x,xx,x,x' | sed 's/\bx\b/y/g'
y,xx,y,y
\b denotes word boundaries, but even used within a capture group it's not going to cause replacement of the characters outside the word, if any.
Try this:
$ echo 'x,a,b,x,x,c,x,d,e,x,x,x,f,x' | sed 's/x/y/g'
y,a,b,y,y,c,y,d,e,y,y,y,f,y
What about
$ echo 'x,a,b,x,x,c,x,d,e,x,x,x,f,x' | sed ':label; s/,x,/,y,/g; t label;'
x,a,b,y,y,c,y,d,e,y,y,y,f,x
? This will not replace the x at the edges but that was not requested explicitly.
It will also not replace x in words:
$ echo 'x,a,b,fix,x,c,x,d,e,x,x,x,f,x' | sed ':label; s/,x,/,y,/g; t label;'
x,a,b,fix,y,c,y,d,e,y,y,y,f,x
Explanation: the t command jumps to the label if a substitution took place. It will then apply the same sed expression to the line again.
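The same loop can be extended to cover the first and last field as well. This is a sketch, assuming comma is the only delimiter: (^|,) and (,|$) let the start and end of the line stand in for a missing delimiter, and the captured delimiters are written back unchanged. The label name "again" is arbitrary, and each command is passed in its own -e so the label also parses on sed implementations that do not accept a semicolon after a label.

```shell
# Replace every field that is exactly "x", including the ones at the edges.
result=$(echo 'x,a,b,x,x,c,x,d,e,x,x,x,f,x' |
  sed -E -e ':again' -e 's/(^|,)x(,|$)/\1y\2/g' -e 't again')
printf '%s\n' "$result"

# An x inside a longer word is left alone.
words=$(echo 'fix,x,xx' |
  sed -E -e ':again' -e 's/(^|,)x(,|$)/\1y\2/g' -e 't again')
printf '%s\n' "$words"
```

The loop ends because every successful substitution turns one x field into y, which the pattern no longer matches.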

How does the below sed command work?

Can anyone help me to understand the below sed command?
These are the values I am using:
InsertPoint - 2
TOT - 15
Count - the CSV file that is the input to this command.
sed -e ''"${InsertPoint}"'s/^[^,]*,//' -e ''"${InsertPoint}"'s/$/, '"${TOT}"'/' ${Count}
I need to know, what they are replacing with what?
There are two substitution commands here, both applied to line 2 of your file. The first one removes the first field; the second one adds a field with the value 15 at the end of the line.
A basic sed substitution has the syntax s/old text/replacement text/, and the s can be preceded by a line number that restricts the command to that line, so:
'"${InsertPoint}"': at line 2 (the value of ${InsertPoint})
s/^[^,]*,// removes the first field by replacing, from the start of the line (^), any number of non-comma characters ([^,]*) followed by a comma (,) with nothing (//)
'"${InsertPoint}"': at line 2
s/$/, '"${TOT}"'/: adds a new field by replacing the end of the line ($) with ", 15" (a comma, a space and the value of ${TOT})
The sed command is applied to the file named by ${Count}
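To see the effect, here is a small run against a made-up three-line CSV. The file contents and the use of mktemp are assumptions for illustration; the sed expressions are the ones from the question, minus the leading '' (an empty string that changes nothing).

```shell
InsertPoint=2
TOT=15

# Hypothetical stand-in for the CSV file named by ${Count}.
Count=$(mktemp)
printf '%s\n' 'id,name,qty' 'a1,apple,3' 'b2,pear,7' > "$Count"

# On line 2 only: drop the first field, then append ", 15".
result=$(sed -e "${InsertPoint}"'s/^[^,]*,//' \
             -e "${InsertPoint}"'s/$/, '"${TOT}"'/' "$Count")
rm -f "$Count"

printf '%s\n' "$result"
```

Only the second line changes: a1,apple,3 becomes apple,3, 15, while the first and third lines pass through untouched.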

Unix ksh replace 2nd to last character of a variable

I am on a UNIX system that uses the Korn Shell. I'm a UNIX beginner. Here is what I want to do:
var1=user/hYbMj8d#RM1
I would like to replace only the 2nd to last character, which will always be an M if that matters, change it to an O, and store the result in a new variable.
So the new variable will contain:
var2=user/hYbMj8d#RO1
Try this with sed:
sed 's/M\(.$\)/O\1/'
Full solution for storing updated var1 in var2:
var2=`echo $var1 | sed 's/M\(.$\)/O\1/'`
Explanation: sed's substitute command has the form s/<find>/<replace>/. The pattern M.$ matches an M followed by exactly one character at the end of the line ($), so the M is the 2nd character from the end. The escaped parentheses capture that final character (in basic regular expressions, capture groups are written \( \)), and the replacement writes O followed by \1, the captured character.
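As an alternative that avoids starting a sed process, the same result can be sketched with the shell's own parameter expansion, which ksh supports. The helper variable name last is an assumption introduced here for illustration. Unlike the sed version, this replaces the 2nd to last character whatever it is, not only an M.

```shell
var1='user/hYbMj8d#RM1'

# ${var1%?} is var1 without its last character; removing that prefix
# from var1 leaves just the last character.
last=${var1#"${var1%?}"}

# ${var1%??} is var1 without its last two characters.
var2="${var1%??}O${last}"

printf '%s\n' "$var2"
```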

Insert a new line at nth character after nth occurence of a pattern via a shell script

I have one big single-line string which has '~|~' as the delimiter. 10 fields make up a row and the 10th field is 9 characters long. I want to insert a new line after each row, meaning insert a \n after the 9th character following the (9,18,27 ..)th occurrence of '~|~'
Is there any quick single line sed/awk option available without looping through the string?
I have used
sed -e's/\(\([^~|~]*~|~\)\{9\}[^~|~]*\)~|~/\1\n/g'
but it will replace every 10th occurrence with a new line. I want to keep the delimiter but add a new line after 9 characters in field 10
cat test.txt
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
sed -e's/\(\([^~|~]*~|~\)\{9\}[^~|~]*\)~|~/\1\n/g' test.txt
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one
2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two
3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
Below is what I want
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten123456
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
Let's try awk:
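awk 'BEGIN{FS="[~|~]+"; OFS="~|~"}
{
  # fields 10, 19, ... hold the end of one row fused with the start of the next
  for(i=10; i<NF; i+=9){
    str=$i
    # keep the first 9 characters, then break the line before the rest
    $i=substr(str, 1, 9)"\n"substr(str, 10, length(str))
  }
  print $0
}' t.txt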
Input:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
The output:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
I assume there is some error in your comment: if your input contains ten1234562one and 2ten1234563one, then the line break has to be inserted after the 2 in the first case and after the 6 in the second case (as this is the tenth character). But your expected output is different from this.
Your sed script wasn't too far off. This seems to do the job you want:
sed -e '/^$/d' \
-e 's/\([^~|]*~|~\)\{9\}.\{9\}/&\' \
-e '/' \
-e 'P;D' \
data
For your input file (I called it data), I get:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345
6
The script requires a little explanation, I fear. It uses some obscure shell and some obscure sed behaviour. The obscure shell behaviour is that within a single-quoted string, backslashes have no special meaning, so the backslash before the second single quote in the second -e appears to sed as a backslash at the end of the argument. The obscure sed behaviour is that it treats the argument for each -e option as if it is a line. So, the trailing backslash plus the / after the third -e is treated as if there was a backslash, newline, slash sequence, which is how BSD sed (and POSIX sed) requires you to add a newline. GNU sed treats \n in the replacement as a newline, but POSIX (and BSD) says:
The escape sequence '\n' shall match a <newline> embedded in the pattern space.
It doesn't say anything about \n being treated as a <newline> in the replacement part of a s/// substitution. So, the first two -e options combine to add a newline after what is matched. What's matched? Well, that's a sequence of 'zero or more non-tilde, non-pipe characters followed by ~|~', repeated 9 times, followed by 9 'any characters'. This is an approximation to what you want. If you had a field such as ~|~tilde~pipe|bother~|~, the regex would fail because of the ~ between 'tilde' and 'pipe' and also because of the | between 'pipe' and 'bother'. Fixing it to handle all possible sequences like that is non-trivial, and not warranted by the sample data.
The remainder of the script is straight-forward: the -e '/^$/d' deletes an empty line, which matters if the data is exactly the right length, and in -e 'P;D' the P prints the initial segment of the pattern space up to the first newline (the one we just added); the D deletes the initial segment of the pattern space up to the first newline and starts over.
I'm not convinced this is worth the complexity. It might be simpler to understand if the script was in a file, script.sed:
/^$/d
s/\([^~|]*~|~\)\{9\}.\{9\}/&\
/
P
D
and the command line was:
$ sed -f script.sed data
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345
6
$
Needless to say, it produces the same output. Without the /^$/d, the script only works because of the odd 6 at the end of the input. With exactly 9 characters after the third record, it then flops into an infinite loop.
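To see the P;D loop in isolation, here is a much smaller sketch with made-up input: it folds a string into chunks of 3 characters using the same backslash-newline replacement, written as one multi-line -e argument rather than split across two.

```shell
# s appends a newline after the first 3 characters, P prints up to that
# newline, and D deletes what was printed and restarts the script on the
# remainder without reading more input. /^$/d discards an empty remainder.
folded=$(printf 'abcdefgh\n' | sed -e '/^$/d' -e 's/.\{3\}/&\
/' -e 'P;D')

printf '%s\n' "$folded"
```

The input yields three lines: abc, def and gh. The final gh is shorter than 3 characters, so the substitution fails, P prints what is left and D ends the cycle.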
Using extended regular expressions
If you use extended regular expressions, you can deal with odd-ball fields that contain ~ or | (or, indeed, ~|) in the middle.
script2.sed:
/^$/d
s/(([^~|]{1,}|~[^|]|~\|[^~])*~\|~){9}.{9}/&\
/
P
D
data2:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345666=beast~tilde|pipe~|twiddle~|~4-two~|~4-three~|~4-four~|~4-five~|~4-six~|~4-seven~|~4-eighty-eight~|~4-999~|~987654321
Output from sed -E -f script2.sed data2:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345
666=beast~tilde|pipe~|twiddle~|~4-two~|~4-three~|~4-four~|~4-five~|~4-six~|~4-seven~|~4-eighty-eight~|~4-999~|~987654321
That still won't handle a field like tilde~~|~. Using -E is correct for BSD (Mac OS X) sed; it enables extended regular expressions. The traditional equivalent for GNU sed is -r, and current GNU sed versions accept -E as well.
