Related to sed command - unix

What does that mean?
I didn't evaluate this
sed -n "s/,*Receivsed,\([0-9]\+\), *Sent,\([0-9]\+\),*/\1\2/p"

It captures the number after Receivsed (sic!) into \1 and the number after Sent into \2, then replaces the whole substring with just these two numbers, and prints the line.
You can try it with
echo ',Receivsed,123,Sent,456' |
sed -n "s/,*Receivsed,\([0-9]\+\), *Sent,\([0-9]\+\),*/\1\2/p"
In detail:
-n suppresses sed's automatic printing of each input line, so nothing is output unless a command such as p asks for it
,* matches zero or more commas
\(...\) creates a capture group, groups are numbered from \1
[0-9]\+ matches one or more digits (\+ is a GNU extension to basic regular expressions; the portable spelling is [0-9][0-9]*)
, * matches a comma followed by zero or more spaces
s/PATTERN/REPLACEMENT/ replaces PATTERN by REPLACEMENT
the final /p means the result of the substitution is printed if the match was successful
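The pieces listed above can be tried together in a small script. This is only a sketch and not part of the original answer: the variable names and the non-matching sample line are made up for illustration, and [0-9][0-9]* is used in place of [0-9]\+ so that it should also run on a non-GNU sed.

```shell
#!/bin/sh
# Sample input taken from the answer above; the variable names are illustrative.
line=',Receivsed,123,Sent,456'

# Portable spelling of "one or more digits": [0-9][0-9]* instead of GNU's [0-9]\+
pattern='s/,*Receivsed,\([0-9][0-9]*\), *Sent,\([0-9][0-9]*\),*/\1\2/p'

# \1\2 puts the two captured numbers directly next to each other.
joined=$(printf '%s\n' "$line" | sed -n "$pattern")

# Because of -n and the p flag, a line that does not match prints nothing.
unmatched=$(printf '%s\n' 'no counters on this line' | sed -n "$pattern")

printf 'joined=[%s] unmatched=[%s]\n' "$joined" "$unmatched"
```

Writing \1 \2 in the replacement instead of \1\2 would keep the two numbers apart.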

Related

How to use sed to group date/time?

I have this text:
7304628626|duluth/superior|18490|2016|volvo|gas|49230|automatic|sedan|white|mn|46.815216|-92.178109|2021-04-10T08:46:33-0500
I want to change the text 2021-04-10T08:46:33-0500 to 10/04/2021 08:46:33.
I tried this command:
sed -n "s/|\([0-2][0-9][0-9][0-9]\)-\([0-1][0-9]\)-\([1-3][0-9]\)\(T\)\([0-9][0-9]:[0-9][0-9]:[0-9][0-9]\)\(-[0-1][0-9][0][0]\)/|\3\/\2\/\1 \5 /p" filename
but some of the text is not changed.

Using sed
$ sed 's/\(.*|\)\([^-]*\)-\([^-]*\)-\([^T]*\)T\([^-]*\).*/\1\4\/\3\/\2 \5/' input_file
7304628626|duluth/superior|18490|2016|volvo|gas|49230|automatic|sedan|white|mn|46.815216|-92.178109|10/04/2021 08:46:33
\(.*|\) - Match up to the last occurrence of the | pipe symbol. Stores that prefix, which can be returned with the \1 back reference
\([^-]*\) - Match up to the next occurrence of the - hyphen. Used twice; stores 2021 and 04, which can be returned with the \2 and \3 back references
\([^T]*\) - Match up to the next occurrence of the capital T. Stores 10, which can be returned with the \4 back reference
T - Exclude the T
\([^-]*\) - Match up to the next occurrence of the - hyphen. Stores 08:46:33, which can be returned with the \5 back reference
.* - Exclude everything else
If your intent is to return only the date and time, you can remove the first back reference
$ sed 's/\(.*|\)\([^-]*\)-\([^-]*\)-\([^T]*\)T\([^-]*\).*/\4\/\3\/\2 \5/' input_file
10/04/2021 08:46:33
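One possible reason the command in the question leaves some lines unchanged is its day pattern, [1-3][0-9], which cannot match days 01 to 09. The sketch below compares it with the command from this answer. The input lines are shortened to three fields, and the timestamp with day 05 is a made-up value, since the question only shows a line with day 10.

```shell
#!/bin/sh
# Substitution from the question (run with -n and the p flag).
asker='s/|\([0-2][0-9][0-9][0-9]\)-\([0-1][0-9]\)-\([1-3][0-9]\)\(T\)\([0-9][0-9]:[0-9][0-9]:[0-9][0-9]\)\(-[0-1][0-9][0][0]\)/|\3\/\2\/\1 \5 /p'
# Substitution from the answer above.
answer='s/\(.*|\)\([^-]*\)-\([^-]*\)-\([^T]*\)T\([^-]*\).*/\1\4\/\3\/\2 \5/'

day10='white|mn|2021-04-10T08:46:33-0500'   # shortened sample line
day05='white|mn|2021-04-05T08:46:33-0500'   # hypothetical line, day 05

# Day 10 matches [1-3][0-9]; the replacement leaves a trailing space.
asker_day10=$(printf '%s\n' "$day10" | sed -n "$asker")
# Day 05 does not match [1-3][0-9], so -n prints nothing at all.
asker_day05=$(printf '%s\n' "$day05" | sed -n "$asker")
# The answer's pattern does not constrain the digits, so day 05 converts.
answer_day05=$(printf '%s\n' "$day05" | sed "$answer")

printf 'asker day10:  [%s]\n' "$asker_day10"
printf 'asker day05:  [%s]\n' "$asker_day05"
printf 'answer day05: [%s]\n' "$answer_day05"
```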

With your shown samples, please try following sed program.
sed -E 's/(.*\|)([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2}:[0-9]{2}:[0-9]{2})-.*/\1\4\/\3\/\2 \5/' Input_file
Explanation: The -E option enables ERE (extended regular expressions), and the s command performs the substitution, using back references to reuse the matched values. Five capture groups are created: the first matches 7304628626|duluth/superior|18490|2016|volvo|gas|49230|automatic|sedan|white|mn|46.815216|-92.178109| (everything up to and including the last |), the second 2021, the third 04, the fourth 10, and the fifth 08:46:33. The replacement then emits the groups in the order the OP needs, so that 2021-04-10T08:46:33-0500 becomes 10/04/2021 08:46:33.

This might work for you (GNU sed):
sed -E 's#\|(....)-(..)-(..)T(..:..:..)-....$#|\3/\2/\1 \4#' file
Pattern match and using back references format as required.
N.B. The use of the | and $ to anchor the pattern to the last field on the line and the nature of the dashes, colons and the capital T make it most unlikely any other string will match, so a dot can be used to match the digits, but if you like replace .'s by [0-9]'s. Also the # is used as alternative delimiter to the normal / in the substitution command s#...#...# as / appear in the replacement string.
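As the note suggests, the dots can be swapped for [0-9]. A minimal sketch of that variant follows, assuming a sed that accepts -E; the input lines are shortened, and the second one is a made-up line without a timestamp.

```shell
#!/bin/sh
# Same substitution as above, with each . replaced by [0-9] and # as delimiter.
expr='s#\|([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2}:[0-9]{2}:[0-9]{2})-[0-9]{4}$#|\3/\2/\1 \4#'

with_date='white|mn|46.815216|-92.178109|2021-04-10T08:46:33-0500'
without_date='a|b|not-a-date'    # hypothetical line with no timestamp

converted=$(printf '%s\n' "$with_date" | sed -E "$expr")
# No p flag and no -n here, so a non-matching line passes through unchanged.
untouched=$(printf '%s\n' "$without_date" | sed -E "$expr")

printf '%s\n%s\n' "$converted" "$untouched"
```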

How to use egrep to find a repeated cluster of characters (e.g. abc-abc-abc)?

I'm learning how to use the egrep command. I want to find words that repeat the same 3 characters in one line (e.g., abc-abc-abc; ssd-ssd-ssd).
I tried some commands like
egrep '[a-z][a-z][a-z]{3}' file
grep -e'{([a-z][a-z][a-z]){3}}' file
but they do not work; they just print every word that has 9 characters.

You can use
grep -e '\(\<[[:alnum:]]\{3\}\>\).*\<\1\>.*\<\1\>'
\<[[:alnum:]]\{3\}\> matches a word formed by exactly 3 alphanumeric chars. \< and \> ensure the surrounding chars are not word characters.
\(...\) puts the match in var \1 to be recalled later
\<\1\> matches a word whose value is exactly the same as the remembered match.
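The three parts above can be checked against a few sample lines. This is a sketch that assumes a grep supporting \< and \> (GNU grep does); the sample words other than abc-abc-abc and ssd-ssd-ssd are made up.

```shell
#!/bin/sh
# Two lines that repeat a 3-character word three times, and two that do not.
words=$(printf '%s\n' 'abc-abc-abc' 'ssd ssd ssd' 'abc-abd-abc' 'abcd-abcd-abcd')

# \1 must repeat the exact text captured by the first group, as a whole word.
matches=$(printf '%s\n' "$words" |
  grep -e '\(\<[[:alnum:]]\{3\}\>\).*\<\1\>.*\<\1\>')

printf '%s\n' "$matches"
```

The line abcd-abcd-abcd is rejected because \< and \> require the 3 characters to be a complete word, and abc-abd-abc is rejected because no 3-character word occurs three times.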

Answer (complicated example using capture groups and repeat counts):
egrep '([a-z]{3})(-\1){2}'
That matches following pattern, with hyphen as only allowed delimiter.
abc-abc-abc
ssd-ssd-ssd
zab-zab-zab
.
.
.
The above example has two sets of parens (capture groups); each captures its matched text into its own capture-group buffer. The second expression is parenthesized only so that it can be given a repeat count; we are not interested in its captured text, only in that of the first group, which \1 refers to.
Easier Example
This is a similar case but easier to understand. It matches 3 identical lowercase letters in a row:
egrep '([a-z])\1\1'
The ([a-z]) is a capture group that matches one lowercase letter and stores the matched character in a capture group buffer. Note: Each \1 matches the captured text (in this case 1 matched character) again.
NOTE: The capture group matches the first character of the sequence, so two additional matches against saved text from the first match are required in order to match three identical characters in a row. The following example is identical to the one above, except it uses a repeat count (2) to repeat the 2nd term two times.
egrep '([a-z])\1{2}'
I tested it this way:
$ echo "aaa" | egrep '([a-z])\1{2}'
aaa
$ echo "zzz" | egrep '([a-z])\1{2}'
zzz
$ echo "zaz" | egrep '([a-z])\1{2}'
note: no output for third echo line
How Capture Groups Work
Unescaped parentheses are used to group expression elements together, so that they can be repeated as a group or have an operator applied to them, but they also cause the matched text to be captured into an internal buffer.
The first capture group, from left to right in the regex, is \1, second is \2, third \3 ...
Anywhere you want to substitute the captured match text into your regex, use the backslash'd number corresponding to the capture group of interest.
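Both patterns from this answer can be run over a handful of lines in one go. The sketch below uses grep -E, which is the same thing as egrep, and assumes a grep that supports back references in extended regular expressions (GNU grep does); the non-matching sample lines are made up.

```shell
#!/bin/sh
# ([a-z]{3})(-\1){2}: three lowercase letters, then "-" plus the same letters, twice.
hits=$(printf '%s\n' 'abc-abc-abc' 'zab-zab-zab' 'abc-abd-abc' 'abc-abc' |
  grep -E '([a-z]{3})(-\1){2}')

# ([a-z])\1{2}: one lowercase letter followed by two more copies of it.
triple=$(printf '%s\n' 'aaa' 'zzz' 'zaz' |
  grep -E '([a-z])\1{2}')

printf 'hits:\n%s\ntriple:\n%s\n' "$hits" "$triple"
```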

Unix ksh replace 2nd to last character of a variable

I am on a UNIX system that uses the Korn Shell. I'm a UNIX beginner. Here is what I want to do:
var1=user/hYbMj8d#RM1
I would like to replace only the 2nd to last letter, which will always be an M if that matters, change it to an O, and store the result in a new variable.
So the new variable will contain:
var2=user/hYbMj8d#RO1

Try this with sed:
sed 's/M\(.$\)/O\1/'
Full solution for storing updated var1 in var2:
var2=`echo "$var1" | sed 's/M\(.$\)/O\1/'`
Explanation: sed's substitute command has the form s/<find>/<replace>/. The pattern M\(.$\) matches an M followed by exactly one character and the end of the line ($), so the M must be the 2nd character from the end. The escaped parentheses (capture groups have to be escaped in sed's basic regular expressions) capture that last character, and the replacement writes O followed by \1, the text captured by group #1.
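The same replacement can also be sketched without sed, using only parameter expansion, which ksh and other POSIX-style shells provide. This is an alternative to the answer above rather than the answer's own method; the variable name last is made up for illustration.

```shell
#!/bin/sh
var1='user/hYbMj8d#RM1'

case $var1 in
  *M?)
    last=${var1#"${var1%?}"}     # strip everything but the final character
    var2=${var1%M?}O$last        # drop "M" + last char, append "O" + last char
    ;;
  *)
    var2=$var1                   # no M in the 2nd to last position: keep as is
    ;;
esac

printf '%s\n' "$var2"
```

The case guard matters: without it, a value that does not end in M plus one character would get O and the last character appended instead of replaced.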

Insert a new line at nth character after nth occurence of a pattern via a shell script

I have a big single-line string which has '~|~' as its delimiter. 10 fields make up a row and the 10th field is 9 characters long. I want to insert a new line after each row, meaning insert a \n at the 10th character after the (9, 18, 27, ...)th occurrence of '~|~'.
Is there any quick single line sed/awk option available without looping through the string?
I have used
sed -e's/\(\([^~|~]*~|~\)\{9\}[^~|~]*\)~|~/\1\n/g'
but it will replace every 10th occurrence with a new line. I want to keep the delimiter but add a new line after 9 characters in field 10
cat test.txt
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
sed -e's/\(\([^~|~]*~|~\)\{9\}[^~|~]*\)~|~/\1\n/g' test.txt
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one
2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two
3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
Below is what I want
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten123456
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456

Let's try awk:
awk 'BEGIN{FS="[~|~]+"; OFS="~|~"}
{for(i=10; i<NF; i+=9){
str=$i
$i=substr(str, 1, 9)"\n"substr(str, 10, length(str))
}
print $0}' t.txt
Input:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
The output:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
I assume there is some error in your question: if your input contains ten1234562one and 2ten1234563one, then the line break has to be inserted after 2 in the first case and after 6 in the second case (as this is the tenth character). But your expected output differs from this.

Your sed script wasn't too far off. This seems to do the job you want:
sed -e '/^$/d' \
-e 's/\([^~|]*~|~\)\{9\}.\{9\}/&\' \
-e '/' \
-e 'P;D' \
data
For your input file (I called it data), I get:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345
6
The script requires a little explanation, I fear. It uses some obscure shell and some obscure sed behaviour. The obscure shell behaviour is that within a single-quoted string, backslashes have no special meaning, so the backslash before the second single quote in the second -e appears to sed as a backslash at the end of the argument. The obscure sed behaviour is that it treats the argument for each -e option as if it is a line. So, the trailing backslash plus the / after the third -e is treated as if there was a backslash, newline, slash sequence, which is how BSD sed (and POSIX sed) requires you to add a newline. GNU sed treats \n in the replacement as a newline, but POSIX (and BSD) says:
The escape sequence '\n' shall match a <newline> embedded in the pattern space.
It doesn't say anything about \n being treated as a <newline> in the replacement part of a s/// substitution. So, the first two -e options combine to add a newline after what is matched. What's matched? Well, that's a sequence of 'zero or more non-tilde, non-pipe characters followed by ~|~', repeated 9 times, followed by 9 'any characters'. This is an approximation to what you want. If you had a field such as ~|~tilde~pipe|bother~|~, the regex would fail because of the ~ between 'tilde' and 'pipe' and also because of the | between 'pipe' and 'bother'. Fixing it to handle all possible sequences like that is non-trivial, and not warranted by the sample data.
The remainder of the script is straight-forward: the -e '/^$/d' deletes an empty line, which matters if the data is exactly the right length, and in -e 'P;D' the P prints the initial segment of the pattern space up to the first newline (the one we just added); the D deletes the initial segment of the pattern space up to the first newline and starts over.
I'm not convinced this is worth the complexity. It might be simpler to understand if the script was in a file, script.sed:
/^$/d
s/\([^~|]*~|~\)\{9\}.\{9\}/&\
/
P
D
and the command line was:
$ sed -f script.sed data
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345
6
$
Needless to say, it produces the same output. Without the /^$/d, the script only works because of the odd 6 at the end of the input. With exactly 9 characters after the third record, it then flops into an infinite loop.
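The same technique can be followed more easily on a scaled-down input. The sketch below is not the data from the question: it assumes rows of 3 fields whose last field is 2 characters long, so the counts \{9\} become \{2\}, and the input is exactly the right length to exercise the /^$/d line.

```shell
#!/bin/sh
# Hypothetical scaled-down data: 3 fields per row, last field 2 characters.
data='a~|~b~|~c1d~|~e~|~f2g~|~h~|~i3'

# Backslash followed by a real newline is the portable way to put a newline
# into the replacement; P prints up to that newline, D deletes the printed
# part and restarts the script on what is left.
rows=$(printf '%s\n' "$data" | sed -e '/^$/d' -e 's/\([^~|]*~|~\)\{2\}.\{2\}/&\
/' -e 'P;D')

printf '%s\n' "$rows"
```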
Using extended regular expressions
If you use extended regular expressions, you can deal with odd-ball fields that contain ~ or | (or, indeed, ~|) in the middle.
script2.sed:
/^$/d
s/(([^~|]{1,}|~[^|]|~\|[^~])*~\|~){9}.{9}/&\
/
P
D
data2:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345666=beast~tilde|pipe~|twiddle~|~4-two~|~4-three~|~4-four~|~4-five~|~4-six~|~4-seven~|~4-eighty-eight~|~4-999~|~987654321
Output from sed -E -f script2.sed data2:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345
666=beast~tilde|pipe~|twiddle~|~4-two~|~4-three~|~4-four~|~4-five~|~4-six~|~4-seven~|~4-eighty-eight~|~4-999~|~987654321
That still won't handle a field like tilde~~|~. Using -E is correct for BSD (Mac OS X) sed; it enables extended regular expressions. The equivalent option for GNU sed is -r, and recent versions of GNU sed accept -E as well.

How to remove this string in file

I want to remove a string in a file, the string I want to remove is
"/package/myname:". I try to use sed to do that but could not.
Note there are a '/' at beginning and ':' at the end of the string which I do not know how to handle.
e.g. I was able to remove "package/myname" using:
echo 'diff a/package/myname:/src/com/abc' | sed -e 's/\<package\/myname\>//g'
But when I run:
echo 'diff a/package/myname:/src/com/abc' | sed -e 's/\<\/package\/myname\:\>//g'
the result does not replace anything.
What is the right way to remove "/package/myname:" in my case?

Problem:
echo 'diff a/package/myname:/src/com/abc' | sed -e 's/\<\/package\/myname\:\>//g'
The problem is mainly because of the two word boundaries, \< and \>.
\< - Boundary which matches between a non-word character and a word character.
\> - Boundary which matches between a word character and a non-word character.
So in the first case,
echo 'diff a/package/myname:/src/com/abc' | sed -e 's/\<package\/myname\>//g'
The \< before the package string matches the boundary which exists after / (non-word character) and p (word character). Likewise \> matches the boundary which exists between e (word character) and : (non-word character). So finally a match would occur and the characters which are matched are replaced by the empty string.
But in the second case,
echo 'diff a/package/myname:/src/com/abc' | sed -e 's/\<\/package\/myname\:\>//g'
\< fails to match the boundary which exists between a and forward slash / because \< matches only between the non-word character (left) and a word character (right) . Likewise \> fails to match the boundary which exists between : and / forward slash because there isn't a word character in the left and non-word character at the right.
Solution:
So, I suggest you remove the word boundaries \< and \>. Or, you could do it like this:
$ echo 'diff a/package/myname:/src/com/abc' | sed -e 's/\>\/package\/myname\://g'
diff a/src/com/abc
This works because \> matches the boundary between a (a word character) and / (a non-word character).
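The three cases discussed in this answer can be compared side by side. This is a sketch that assumes a sed supporting \< and \> (GNU sed does); the variable names are illustrative.

```shell
#!/bin/sh
input='diff a/package/myname:/src/com/abc'

# \< before / and \> after : can never match, so nothing is replaced.
with_both=$(printf '%s\n' "$input" | sed -e 's/\<\/package\/myname:\>//g')

# \> between a (word character) and / (non-word character) does match.
with_end=$(printf '%s\n' "$input" | sed -e 's/\>\/package\/myname://g')

# No word boundaries at all.
with_none=$(printf '%s\n' "$input" | sed -e 's/\/package\/myname://g')

printf '%s\n%s\n%s\n' "$with_both" "$with_end" "$with_none"
```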

sed -i "s/\/package\/myname\://g;" [__YOUR_FILE_NAME__]
That removes the phrase.
It does not remove the line.
grep -v # removes the line

sed 's%/package/myname:%%g'
using % instead of / to mark the ends of the sections of the substitute command. You can use any character that doesn't appear in the string. It can be quite effective to use Control-A as the delimiter, even.
You could also use:
sed 's/\/package\/myname://g'
but I prefer to avoid messing around with backslashes when there's an easy way to avoid them.
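A quick check that the choice of delimiter does not change the result. This is a sketch; the | delimiter in the second command is just one more character that does not appear in the string.

```shell
#!/bin/sh
input='diff a/package/myname:/src/com/abc'

# Three spellings of the same substitution, differing only in the delimiter.
with_percent=$(printf '%s\n' "$input" | sed 's%/package/myname:%%g')
with_pipe=$(printf '%s\n' "$input" | sed 's|/package/myname:||g')
with_slash=$(printf '%s\n' "$input" | sed 's/\/package\/myname://g')

printf '%s\n%s\n%s\n' "$with_percent" "$with_pipe" "$with_slash"
```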
