Unix ksh replace 2nd to last character of a variable - unix

I am on a UNIX system that uses the Korn Shell. I'm a UNIX beginner. Here is what I want to do:
var1=user/hYbMj8d#RM1
I would like to replace only the 2nd to the last letter, which will always be an M if that matters, and change it to an O, and and store this as a new it into a new variable.
So the new variable will contain:
var2=user/hYbMj8d#RO1

Try this with sed:
sed 's/M\(.$\)/O\1/'
Full solution for storing updated var1 in var2:
var2=`echo $var1 | sed 's/M\(.$\)/O\1/'`
Explanation: using sed replace command: s/<find>/<replace>/. It searches for M.$ pattern where $ is EOL - so it will be 2nd character from the end. Then it captures remainder after M character (escaped parentheses - in sed capture groups should be escaped) and replaces it with O character and remainder after M - what was captured (it is special reference \1 - captured group #1).

Related

How to use sed to group date/time?

I have a text
7304628626|duluth/superior|18490|2016|volvo|gas|49230|automatic|sedan|white|mn|46.815216|-92.178109|2021-04-10T08:46:33-0500
I want to change text 2021-04-10T08:46:33-0500 to 10/04/2021 08:46:33
I try use this command
sed -n "s/|\([0-2][0-9][0-9][0-9]\)-\([0-1][0-9]\)-\([1-3][0-9]\)\(T\)\([0-9][0-9]:[0-9][0-9]:[0-9][0-9]\)\(-[0-1][0-9][0][0]\)/|\3\/\2\/\1 \5 /p" filename
but some text hasn't change
Using sed
$ sed 's/\(.*|\)\([^-]*\)-\([^-]*\)-\([^T]*\)T\([^-]*\).*/\1\4\/\3\/\2 \5/' input_file
7304628626|duluth/superior|18490|2016|volvo|gas|49230|automatic|sedan|white|mn|46.815216|-92.178109|10/04/2021 08:46:33
\(.*|\) - Match till the last occurance of | pipe symbol
\([^-]*\) - Match till the next occurance of - slash. Stores 2021 and 04 which can be returned with \2 and \3 back reference
\([^T]*\) - Match till the next occurance of T capital T. Stores 10 which can be returned with \4 back reference
T - Exclude the T
\([^-]*\) - Match till the next occurance of - slash. Stores 08:46:33 which can be returned with \5 back reference
.* - Exclude everything else
If your intent is to return only the date and time, you can remove the first back reference
$ sed 's/\(.*|\)\([^-]*\)-\([^-]*\)-\([^T]*\)T\([^-]*\).*/\4\/\3\/\2 \5/' input_file
10/04/2021 08:46:33
With your shown samples, please try following sed program.
sed -E 's/(.*\\|)([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2}:[0-9]{2}:[0-9]{2})-.*/\1\4\/\3\/\2 \5/' Input_file
Explanation: Using sed program's back reference capability here to store matched values into temp buffer and use them later on in substitution. In main sed program using -E option to enable ERE(extended regular expression) then using s option to perform substitution. First creating 5 capturing group to match 7304628626|duluth/superior|18490|2016|volvo|gas|49230|automatic|sedan|white|mn|46.815216|-92.178109|(in first capturing group), 2021(in 2nd capturing group), 04(in 3rd capturing group), 10(in 4th) and 08 :46:33(in 5th capturing group). And while substituting them keeping order to capturing group as per OP's needed order since OP wants 2021-04-10T08:46:33-0500 to be changed to 10/04/2021 08:46:33.
This might work for you (GNU sed):
sed -E 's#\|(....)-(..)-(..)T(..:..:..)-....$#|\3/\2/\1 \4#' file
Pattern match and using back references format as required.
N.B. The use of the | and $ to anchor the pattern to the last field on the line and the nature of the dashes, colons and the capital T make it most unlikely any other string will match, so a dot can be used to match the digits, but if you like replace .'s by [0-9]'s. Also the # is used as alternative delimiter to the normal / in the substitution command s#...#...# as / appear in the replacement string.

Replace double consonant letters with one using sed command

How to replace double consonants with only one letter using sed Linux command. Example: WILLIAM -> WILIAM. grep -E '(.)\1+' commands finds the words that follow two same consonants in a row pattern, but how do I replace them with only one occurrence of the letter?
I tried
cat test.txt | head | tr -s '[^AEUIO\n]' '?'
tr is all or nothing; it will replace all occurrences of the selected characters, regardless of context. For regex replacement, look at sed - you even included this in your question's tags, but you don't seem to have explored how it might be useful?
sed 's/\(.\)\1/\1/g' test.txt
The dot matches any character; to restrict to only consonants, change it to [b-df-hj-np-tv-xz] or whatever makes sense (maybe extend to include upper case; perhaps include accented characters?)
The regex dialect understood by sed is more like the one understood by grep without -E (hence all the backslashes); though some sed implementations also support this option to select the POSIX extended regular expression dialect.
Neither sed not tr need cat to read standard input for them (though tr obscurely does not accept a file name argument). See tangentially also Useless use of cat?
Match one consonant, remember it in \( \), then match is again with \1 and substitute it for itself.
sed 's/\([bcdfghjklmnpqrstvxzBCDFGHJKLMNPQRSTVXZ]\)\1/\1/'

Related to sed command

What does that mean?
I didn't evaluate this
sed -n "s/,*Receivsed,\([0-9]\+\), *Sent,\([0-9]\+\),*/\1\2/p"
It captures the number after Receivsed (sic!) into \1 and the number after Sent into \2, then replaces the whole substring with just these two numbers, and prints the line.
You can try it with
echo ',Receivsed,123,Sent,456' |
sed -n "s/,*Receivsed,\([0-9]\+\), *Sent,\([0-9]\+\),*/\1\2/p"
In detail:
-n reads the input line by line, but doesn't print anything if not told to
,* matches zero or more commas
\(...\) creates a capture group, groups are numbered from \1
[0-9]\+ matches one or more digits
, * matches a comma followed by zero or more spaces
s/PATTERN/REPLACEMENT/ replaces PATTERN by REPLACEMENT
the final /p means the result of the substitution is printed if the match was successful

I need a sed command to change a phone number format from 999-999-9999 to (999)999-9999

I need a sed command to change a phone number format from 999-999-9999 to (999)999-9999.
Here is what I've been trying:
sed 's/[[:digit:]]\-[[:digit:]]\-[[:digit:]]/\([[:digit:]]\)[[:digit:]]\-[[:digit:]]/gp'
I've also tried this:
sed 's/([0-9]{3})\-([0-9]{3})\-([0-9]{4})/\(([0-9]{3}\))([0-9]{3})\-([0-9]{4})/gp'
The notation [[:digit:]] matches a single digit; you need to match repeated digits, which you do by wrapping the repeat count in \{3\} (for a fixed count; there are variable counted ranges too, but they're not relevant here, and * and so on too). And you need to capture what you match in \(…\) so you can reference them in the replacement. In the replacement, you use \1 etc to refer to captured fragments. The captures are numbered left-to-right in the order of the \( symbols.
sed 's/\([[:digit:]]\{3\}\)-\([[:digit:]]\{3\}-[[:digit:]]\{4\}\)/(\1)\2/g'
Or:
sed 's/\([0-9]\{3\}\)-\([0-9]\{3\}-[0-9]\{4\}\)/(\1)\2/g'
This is classic sed notation; you can find variants using extended regular expressions too, but you need different options depending on platform, unlike this notation. The patterns look for 3 digits (first capture), a dash, then 3 more digits, another dash and 4 digits as the second capture, and replace all that with open bracket (parenthesis in American), the first 3 digits, close bracket, and the remaining 3 digits, dash, 4 digits.
BSD (Mac OS X):
sed -E 's/([0-9]{3})-([0-9]{3}-[0-9]{4})/(\1)\2/g'
GNU:
sed -r 's/([0-9]{3})-([0-9]{3}-[0-9]{4})/(\1)\2/g'
Note that all of these regular expressions would convert
9876-345-54321
to:
9(876)345-54321
Fixing that is less trivial, especially in sed. Using Perl:
$ echo "987-654-3210 and 2987-654-543210 and 222-333-4444 and 543-432-5544" |
> perl -p -e 's/\b([0-9]{3})-([0-9]{3}-[0-9]{4})\b/(\1)\2/g'
(987)654-3210 and 2987-654-543210 and (222)333-4444 and (543)432-5544
$
The \b marks a word boundary in PCRE. That does mean that a222-333-4444 is not matched by the Perl; you can refine things to insist on non-digit or start of string before, and non-digit or end of string after, the matching string.
$ echo "987-654-3210 and 2987-654-543210 and a222-333-4444 and 543-432-5544" |
> perl -p -e 's/(^|\D)([0-9]{3})-([0-9]{3}-[0-9]{4})(\D|$)/\1(\2)\3\4/g'
(987)654-3210 and 2987-654-543210 and a(222)333-4444 and (543)432-5544
$
Or with (BSD or GNU) sed extended regular expressions (BSD shown):
$ echo "987-654-3210 and 2987-654-543210 and a222-333-4444 and 543-432-5544" |
> sed -E 's/(^|[^0-9])([0-9]{3})-([0-9]{3}-[0-9]{4})([^0-9]|$)/\1(\2)\3\4/g'
(987)654-3210 and 2987-654-543210 and a(222)333-4444 and (543)432-5544
$
Note that the negated digit character class notation can be written [^[:digit:]] if you wish.
Iterative development helps.
$ echo 123-456-7890 | sed -r 's/([0-9]{3})-([0-9]{3}-[0-9]{4})/(\1)\2/'
(123)456-7890

Insert a new line at nth character after nth occurence of a pattern via a shell script

I have a single line big string which has '~|~' as delimiter. 10 fields make up a row and the 10th field is 9 characters long. I want insert a new line after each row, meaning insert a \n at 10 character after (9,18,27 ..)th occurrence of '~|~'
Is there any quick single line sed/awk option available without looping through the string?
I have used
sed -e's/\(\([^~|~]*~|~\)\{9\}[^~|~]*\)~|~/\1\n/g'
but it will replace every 10th occurrence with a new line. I want to keep the delimiter but add a new line after 9 characters in field 10
cat test.txt
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
sed -e's/\(\([^~|~]*~|~\)\{9\}[^~|~]*\)~|~/\1\n/g' test.txt
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one
2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two
3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
Below is what I want
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten123456
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
Let's try awk:
awk 'BEGIN{FS="[~|~]+"; OFS="~|~"}
{for(i=10; i<NF; i+=9){
str=$i
$i=substr(str, 1, 9)"\n"substr(str, 10, length(str))
}
print $0}' t.txt
Input:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one~|~2‌​two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~‌​3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
The output:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2‌​two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~‌​3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
I assume there some error in your comment: If your input contains ten1234562one and 2ten1234563one, then the line break has to be inserted after 2 in the first case and after 6 in the second case (as this is the tenth character). But your expected output is different to this.
Your sed script wasn't too far off. This seems to do the job you want:
sed -e '/^$/d' \
-e 's/\([^~|]*~|~\)\{9\}.\{9\}/&\' \
-e '/' \
-e 'P;D' \
data
For your input file (I called it data), I get:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345
6
The script requires a little explanation, I fear. It uses some obscure shell and some obscure sed behaviour. The obscure shell behaviour is that within a single-quoted string, backslashes have no special meaning, so the backslash before the second single quote in the second -e appears to sed as a backslash at the end of the argument. The obscure sed behaviour is that it treats the argument for each -e option as if it is a line. So, the trailing backslash plus the / after the third -e is treated as if there was a backslash, newline, slash sequence, which is how BSD sed (and POSIX sed) requires you to add a newline. GNU sed treats \n in the replacement as a newline, but POSIX (and BSD) says:
The escape sequence '\n' shall match a <newline> embedded in the pattern space.
It doesn't say anything about \n being treated as a <newline> in the replacement part of a s/// substitution. So, the first two -e options combine to add a newline after what is matched. What's matched? Well, that's a sequence of 'zero or more non-tilde, non-pipe characters followed by ~|~', repeated 9 times, followed by 9 'any characters'. This is an approximation to what you want. If you had a field such as ~|~tilde~pipe|bother~|~, the regex would fail because of the ~ between 'tilde' and 'pipe' and also because of the | between 'pipe' and 'bother'. Fixing it to handle all possible sequences like that is non-trivial, and not warranted by the sample data.
The remainder of the script is straight-forward: the -e '/^$/d' deletes an empty line, which matters if the data is exactly the right length, and in -e 'P;D' the P prints the initial segment of the pattern space up to the first newline (the one we just added); the D deletes the initial segment of the pattern space up to the first newline and starts over.
I'm not convinced this is worth the complexity. It might be simpler to understand if the script was in a file, script.sed:
/^$/d
s/\([^~|]*~|~\)\{9\}.\{9\}/&\
/
P
D
and the command line was:
$ sed -f script.sed data
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345
6
$
Needless to say, it produces the same output. Without the /^$/d, the script only works because of the odd 6 at the end of the input. With exactly 9 characters after the third record, it then flops into in infinite loop.
Using extended regular expressions
If you use extended regular expressions, you can deal with odd-ball fields that contain ~ or | (or, indeed, ~|) in the middle.
script2.sed:
/^$/d
s/(([^~|]{1,}|~[^|]|~\|[^~])*~\|~){9}.{9}/&\
/
P
D
data2:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345666=beast~tilde|pipe~|twiddle~|~4-two~|~4-three~|~4-four~|~4-five~|~4-six~|~4-seven~|~4-eighty-eight~|~4-999~|~987654321
Output from sed -E -f script.sed data2:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345
666=beast~tilde|pipe~|twiddle~|~4-two~|~4-three~|~4-four~|~4-five~|~4-six~|~4-seven~|~4-eighty-eight~|~4-999~|~987654321
That still won't handle a field like tilde~~|~. Using -E is correct for BSD (Mac OS X) sed; it enables extended regular expressions. The equivalent option for GNU sed is -r.

Resources