Function of d and t in sed substitution context - unix

I used a substitution generator for sed and it gives me
sed -E 's/([^ ]+)│m/\1│T/gm;t;d'
I am familiar with regular expression flags g and m, but I have never seen the t and d. After looking them up it seems that t is for testing and d for deleting. But in this particular context, what does that mean? What do they contribute to the full command?

Quoting sed manual:
d: Delete pattern space. Start next cycle.
t label: If a s/// has done a successful substitution since the last input line was read and since the last t or T command, then branch to label; if label is omitted, branch to end of script.
In other words, if the s command succeeded, print the resulting pattern space, else skip (delete) the line.
Another way to express this is:
sed -n -E 's/([^ ]+) m/\1 T/gmp'
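To check that the two forms agree, you can run both on the same input. This is only a sketch: it assumes GNU sed (other implementations may read the text after t as a label, so t;d on one line is not portable) and uses a made-up two-line sample with a plain space before the m.

```shell
# Hypothetical sample: only the first line contains "<word> m".
input='10 m
20 km'

# Form 1: substitute; on success t branches to the end of the script
# (the line is auto-printed), otherwise d deletes the line.
out_td=$(printf '%s\n' "$input" | sed -E 's/([^ ]+) m/\1 T/g;t;d')

# Form 2: -n turns off auto-printing; the p flag prints only when s succeeds.
out_np=$(printf '%s\n' "$input" | sed -n -E 's/([^ ]+) m/\1 T/gp')

printf '%s\n' "$out_td" "$out_np"
```

Both variables should hold the single line 10 T, since the 20 km line never matches.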

sed is its own "language" with its own commands. It's not a "regular expression tool", but rather a "Stream EDitor".
I am familiar with regular expression flags g and m,
The s command has its own modifiers, and m is a GNU extension. I do not really see how it is used here, since sed reads one line at a time and m only modifies the behavior of ^ and $.
But in this particular context, what does that mean?
In any possible context, t command jumps to the label if (any) s/// command was successful. If label is omitted, it jumps to the end of script.
The d command deletes pattern space, effectively removing the line in typical parsing.
The t;d is an idiom for removing the line if the last s command was unsuccessful. I prefer to do /pattern/!d; s//replacement/g which is more readable to my eyes.
A reference for sed behavior would be the POSIX sed documentation and the GNU sed documentation.
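For illustration, the /pattern/!d; s//replacement/g form could look like the sketch below. It assumes GNU sed with -E, and the three sample lines are invented. The empty regex in s// reuses the last regex that was matched, so the pattern is written only once.

```shell
# Delete every line that does not match, then substitute using the same regex
# via the empty pattern in s//.../.
out=$(printf '%s\n' '10 m' '20 km' '5 m and 7 m' |
  sed -E '/([^ ]+) m/!d; s//\1 T/g')

printf '%s\n' "$out"
```

With this input, the 20 km line is dropped and the other two lines come out as 10 T and 5 T and 7 T.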

Related

unix SED command to replace part of key value pair

We have a requirement where I need to replace part of a parameter value in our configuration file.
Example
key1=123-456
I need to replace the value after the hyphen with a new value.
I found a command that is used in other projects, but I am not sure how it works.
Command
[test]$ cat test_sed_key_value.txt
key1=123-456
[test]$ sed -i -e '/key1/ s/-.*$/-789/' test_sed_key_value.txt
[test]$
[test]$ cat test_sed_key_value.txt
key1=123-789
[test]$
It would be helpful if someone could explain how the above command works, or whether there is a simpler way to do this using sed.
Here is a list of parts of that commandline, each followed by a short explanation:
sed
which tool to use
-i
flag: apply the effect directly to the processed file (without creating a copy of the input file)
-e
expression parameter: the sed code to apply follows
/key1/
"address": only process lines on which this regex applies, i.e. those containing the text "key1"
s/replacethis/withthis/
command: do a search-and-replace; "replacethis" and "withthis" are explained in the next two entries
-.*$
regex: (what is actually in the commandline instead of "replacethis") a regular expression representing a "minus" followed by anything, in any number, until the end of the line
-789
literal: (what is actually in the commandline instead of "withthis") simply that string "-789"
test_sed_key_value.txt
file parameter: process this file
I cannot think of any way to do this simpler. The shown command already uses some assumptions on the formatting of the input file.
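To see the address and the substitution working together without touching a file, the same expression can be run on standard input. This is a sketch; the key2 line is a made-up addition to show that non-matching lines pass through unchanged.

```shell
# Only the line matching /key1/ is edited; the key2 line is left alone
# even though it also contains a hyphen.
out=$(printf '%s\n' 'key1=123-456' 'key2=abc-def' |
  sed -e '/key1/ s/-.*$/-789/')

printf '%s\n' "$out"
```

Dropping -i like this is a handy way to preview an edit before applying it in place.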
I'd add to Yunnosch's answer that here the "replacethis" is a regexp:
-.*$
See here for an overview of the syntax of sed's regular expressions by Gnu.
Asterisk means a repetition of the previous thing, dot means any character, so .* means a sequence of characters.
$ is the end of the line.
You might want to be a bit more restrictive, since here you'd lose something in a line like this one for instance:
key1=123-456, key2=abc-def
replacing it by:
key1=123-789
removing completely the key2 part (since the .* takes all characters after the first dash until end of line).
So depending on the format of your values, you might prefer something like
-[0-9]*
(without the $), meaning a sequence of digits after the -
or
-[0-9a-zA-Z_]*
meaning a sequence of digits, letters, or underscores after the -
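The difference between the greedy and the more restrictive pattern can be shown on the example line from above. A small sketch, using only the expressions already discussed:

```shell
line='key1=123-456, key2=abc-def'

# Greedy: .* runs from the first dash to the end of the line, so key2 is lost.
greedy=$(printf '%s\n' "$line" | sed -e '/key1/ s/-.*$/-789/')

# Restrictive: only the digits directly after the first dash are replaced.
narrow=$(printf '%s\n' "$line" | sed -e '/key1/ s/-[0-9]*/-789/')

printf '%s\n' "$greedy" "$narrow"
```

Because there is no g flag, only the first dash on the line is affected in the second command.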

sed usage not able to understand

I have come across a usage of the Unix sed command and am not able to understand what it does. Could you please help me understand it? If possible, please share a reference for understanding such uses of sed.
sed -i '/^export JAVA_HOME/ s:.*:export JAVA_HOME=/usr/java/default\nexport HADOOP_PREFIX=/usr/local/hadoop\nexport HADOOP_HOME=/usr/local/hadoop\n:' $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
The command is simple, though it assumes GNU sed because of the way it uses the -i option; for macOS Sierra and related systems, you'd need to use -i '' in place of just -i.
Overall, it corresponds to:
sed -i '/Pattern/ s:.*:Replacement:' file
where:
-i means overwrite each input file with its edited output without creating a backup copy.
/Pattern/ is ^export JAVA_HOME; a line starting with the word export and then JAVA_HOME separated by a single space.
s:.*:Replacement: is a substitute command, using : instead of the more conventional / (often s/.*/Replacement/) as the pattern delimiter. This is done because the replacement text contains slashes. The .* matches the whole line. The rest of the material is written in place of the original export JAVA_HOME line. The \n sequence expands to a newline, so it actually produces a number of lines in the output.
file is $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
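A shortened version of the command can be tried on standard input. This sketch assumes GNU sed, since \n in the replacement is a GNU behaviour; the two input lines and the /old/jdk path are invented for the demonstration.

```shell
# The matching line is replaced as a whole; \n in the replacement
# (GNU sed) turns one input line into two output lines.
out=$(printf '%s\n' 'export JAVA_HOME=/old/jdk' 'export OTHER=1' |
  sed '/^export JAVA_HOME/ s:.*:export JAVA_HOME=/usr/java/default\nexport HADOOP_HOME=/usr/local/hadoop:')

printf '%s\n' "$out"
```

Using : as the delimiter means none of the slashes in the paths need escaping.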
As others have pointed out, this is a sed command invocation. The name is short for "Stream EDitor", and the tool is quite useful for modifying files programmatically. Your best bet is to read the man page (man sed), but I've broken down your particular command here for instructive purposes:
sed # The command
-i # Edit file in place (no backup)
'/^export JAVA_HOME/ # For every line that begins with 'export JAVA_HOME'...
s: # substitute...
.*: # the entire line with...
export JAVA_HOME=/usr/java/default
export HADOOP_PREFIX=/usr/local/hadoop
export HADOOP_HOME=/usr/local/hadoop
:' # End of command
$HADOOP_PREFIX/etc/hadoop/hadoop-env.sh # Run on the following file
Points of interest:
Commands can be limited to a particular address range or scope. Here, the scope was a search.
The substitute command can be delimited by almost any character (usually it is /, but in this case : was chosen to avoid escaping the / in the file paths).
The sed expression was enclosed in ' to prevent shell expansion of variables. Although no expansions would have taken place in this scenario, it is fairly common to see the expression wrapped in ' to eliminate the possibility.
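The quoting point is easiest to see when the expression does contain something the shell could expand. In this sketch NEW_HOME is a hypothetical variable introduced only for the demonstration:

```shell
NEW_HOME=/usr/local/hadoop   # hypothetical value, not from the original command

# Single quotes: the shell passes $NEW_HOME to sed literally.
single=$(printf '%s\n' 'export HADOOP_HOME=' | sed 's:=.*:=$NEW_HOME:')

# Double quotes: the shell expands $NEW_HOME before sed sees the script.
double=$(printf '%s\n' 'export HADOOP_HOME=' | sed "s:=.*:=$NEW_HOME:")

printf '%s\n' "$single" "$double"
```

So single quotes are the safe default, and double quotes are the deliberate choice when a shell variable is meant to end up in the script.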

Insert a new line at nth character after nth occurence of a pattern via a shell script

I have a big single-line string which has '~|~' as delimiter. 10 fields make up a row and the 10th field is 9 characters long. I want to insert a new line after each row, meaning insert a \n at the 10th character after the (9,18,27 ..)th occurrence of '~|~'
Is there any quick single line sed/awk option available without looping through the string?
I have used
sed -e's/\(\([^~|~]*~|~\)\{9\}[^~|~]*\)~|~/\1\n/g'
but it will replace every 10th occurrence with a new line. I want to keep the delimiter but add a new line after 9 characters in field 10
cat test.txt
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
sed -e's/\(\([^~|~]*~|~\)\{9\}[^~|~]*\)~|~/\1\n/g' test.txt
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one
2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two
3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
Below is what I want
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten123456
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
Let's try awk:
awk 'BEGIN{FS="[~|~]+"; OFS="~|~"}
{for(i=10; i<NF; i+=9){
str=$i
$i=substr(str, 1, 9)"\n"substr(str, 10, length(str))
}
print $0}' t.txt
Input:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
The output:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
I assume there is some error in your comment: If your input contains ten1234562one and 2ten1234563one, then the line break has to be inserted after 2 in the first case and after 6 in the second case (as this is the tenth character). But your expected output is different to this.
Your sed script wasn't too far off. This seems to do the job you want:
sed -e '/^$/d' \
-e 's/\([^~|]*~|~\)\{9\}.\{9\}/&\' \
-e '/' \
-e 'P;D' \
data
For your input file (I called it data), I get:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345
6
The script requires a little explanation, I fear. It uses some obscure shell and some obscure sed behaviour. The obscure shell behaviour is that within a single-quoted string, backslashes have no special meaning, so the backslash before the second single quote in the second -e appears to sed as a backslash at the end of the argument. The obscure sed behaviour is that it treats the argument for each -e option as if it is a line. So, the trailing backslash plus the / after the third -e is treated as if there was a backslash, newline, slash sequence, which is how BSD sed (and POSIX sed) requires you to add a newline. GNU sed treats \n in the replacement as a newline, but POSIX (and BSD) says:
The escape sequence '\n' shall match a <newline> embedded in the pattern space.
It doesn't say anything about \n being treated as a <newline> in the replacement part of a s/// substitution. So, the first two -e options combine to add a newline after what is matched. What's matched? Well, that's a sequence of 'zero or more non-tilde, non-pipe characters followed by ~|~', repeated 9 times, followed by 9 'any characters'. This is an approximation to what you want. If you had a field such as ~|~tilde~pipe|bother~|~, the regex would fail because of the ~ between 'tilde' and 'pipe' and also because of the | between 'pipe' and 'bother'. Fixing it to handle all possible sequences like that is non-trivial, and not warranted by the sample data.
The remainder of the script is straight-forward: the -e '/^$/d' deletes an empty line, which matters if the data is exactly the right length, and in -e 'P;D' the P prints the initial segment of the pattern space up to the first newline (the one we just added); the D deletes the initial segment of the pattern space up to the first newline and starts over.
I'm not convinced this is worth the complexity. It might be simpler to understand if the script was in a file, script.sed:
/^$/d
s/\([^~|]*~|~\)\{9\}.\{9\}/&\
/
P
D
and the command line was:
$ sed -f script.sed data
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345
6
$
Needless to say, it produces the same output. Without the /^$/d, the script only works because of the odd 6 at the end of the input. With exactly 9 characters after the third record, it then flops into an infinite loop.
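The mechanism is easier to follow on a scaled-down input. The sketch below is not the original data: it assumes a hypothetical format with a comma as delimiter, 3 fields per row, and a 2-character last field, but the script has the same shape (substitute with backslash-newline, then P and D).

```shell
# Hypothetical miniature input: rows "a,b,xx", "d,e,yy", "g,h,zz" run together.
rows=$(printf '%s\n' 'a,b,xxd,e,yyg,h,zz' |
  sed -e '/^$/d' \
      -e 's/\([^,]*,\)\{2\}.\{2\}/&\
/' \
      -e 'P;D')

printf '%s\n' "$rows"
```

Each pass appends a newline after one row, P prints that row, and D removes it and restarts the script on what is left. Here the input length is exact, so the last D leaves an empty pattern space and the /^$/d line is what ends the processing cleanly.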
Using extended regular expressions
If you use extended regular expressions, you can deal with odd-ball fields that contain ~ or | (or, indeed, ~|) in the middle.
script2.sed:
/^$/d
s/(([^~|]{1,}|~[^|]|~\|[^~])*~\|~){9}.{9}/&\
/
P
D
data2:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345666=beast~tilde|pipe~|twiddle~|~4-two~|~4-three~|~4-four~|~4-five~|~4-six~|~4-seven~|~4-eighty-eight~|~4-999~|~987654321
Output from sed -E -f script2.sed data2:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345
666=beast~tilde|pipe~|twiddle~|~4-two~|~4-three~|~4-four~|~4-five~|~4-six~|~4-seven~|~4-eighty-eight~|~4-999~|~987654321
That still won't handle a field like tilde~~|~. Using -E is correct for BSD (Mac OS X) sed; it enables extended regular expressions. The equivalent option for GNU sed is -r.
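As a small illustration of what -E changes syntactically, here is the same substitution written in basic and in extended syntax. The sample string is made up; the only point is where the backslashes go.

```shell
# Basic regular expressions: grouping and intervals need backslashes.
bre=$(printf '%s\n' 'aaab' | sed 's/\(a\)\{3\}/X/')

# Extended regular expressions (-E): the same operators without backslashes.
ere=$(printf '%s\n' 'aaab' | sed -E 's/(a){3}/X/')

printf '%s\n' "$bre" "$ere"
```

Both commands replace the three leading a characters, so each prints Xb.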

Delete Special Word Using sed

I would like to use sed to remove all occurrences of this line if and only if it is exactly this
<ab></ab>
If the line is this instead, I do not want to delete it
<ab>keyword</ab>
My attempt that's not working:
sed '/<ab></ab>/d'
Thanks for any insight. I'm not sure what's wrong as I should not have to escape anything?
I'm using a shell script named temp to execute this. My command is this:
cat foobar.html | ./temp
This is my temp shell script:
#!/bin/sh
sed -e '/td/!d' | sed '/<ab></ab>/d'
It looks like we have a couple of problems here. The first is with the / in the close-tag. sed uses this to delimit different parts of the command. Fortunately, all we have to do is escape it with \. Try:
sed '/<ab><\/ab>/d'
Here's an example on my machine:
$ cat test
<ab></ab>
<ab></ab>
<ab>test</ab>
$ sed '/<ab><\/ab>/d' test
<ab>test</ab>
$
The other problem is that I'm not sure what the purpose of sed -e '/td/!d' is. In its default operating mode, you don't need to tell it not to delete something; just tell it exactly what you want to delete.
So, to do this on a file called input.html:
sed '/<ab><\/ab>/d' input.html
Or, to edit the file in-place, you can just do:
sed -i -e '/<ab><\/ab>/d' input.html
Additionally, sed lets you use any character you want as a delimiter; you don't have to use /. So if you'd prefer not to escape your input, you can do:
sed '\#<ab></ab>#d' input.html
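The escaped and the custom-delimiter forms can be compared side by side. A short sketch using the two lines from the question as input:

```shell
# Default delimiter: the / inside the closing tag must be escaped.
out_escaped=$(printf '%s\n' '<ab></ab>' '<ab>keyword</ab>' | sed '/<ab><\/ab>/d')

# Custom delimiter: \# opens an address delimited by #, so / needs no escaping.
out_custom=$(printf '%s\n' '<ab></ab>' '<ab>keyword</ab>' | sed '\#<ab></ab>#d')

printf '%s\n' "$out_escaped" "$out_custom"
```

In both cases only the <ab>keyword</ab> line survives.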
Edit
In the comments, you mentioned wanting to delete lines that only contain </ab> and nothing else. To do that, you need to do what's called anchoring the match. The ^ character represents the beginning of the line for anchoring, and $ represents the end of the line.
sed '/^<\/ab>$/d' input.html
This will only match a line that contains (literally) </ab> and nothing else at all, and delete the line. If you want to match lines that contain whitespace too, but no text other than </ab>:
sed '/^[[:blank:]]*<\/ab>[[:blank:]]*$/d' input.html
[[:blank:]]* matches "0 or more whitespace characters" and is called a "POSIX bracket expression".
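To illustrate what the anchored version does and does not delete, here is a sketch with three invented lines: one with surrounding blanks, one where </ab> is embedded in other markup, and one bare </ab>.

```shell
# Line 1: spaces before and a tab after </ab>  -> deleted
# Line 2: </ab> inside other text              -> kept
# Line 3: exactly </ab>                        -> deleted
out=$(printf '  </ab>\t\n<td></ab></td>\n</ab>\n' |
  sed '/^[[:blank:]]*<\/ab>[[:blank:]]*$/d')

printf '%s\n' "$out"
```

Without the ^ and $ anchors the middle line would be deleted as well.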

excluding first and last lines from sed /START/,/END/

Consider the input:
=sec1=
some-line
some-other-line
foo
bar=baz
=sec2=
c=baz
If I wish to process only =sec1= I can for example comment out the section by:
sed -e '/=sec1=/,/=[a-z]*=/s:^:#:' < input
... well, almost.
This will comment the lines including "=sec1=" and "=sec2=" lines, and the result will be something like:
#=sec1=
#some-line
#some-other-line
#
#foo
#bar=baz
#
#=sec2=
c=baz
My question is: What is the easiest way to exclude the start and end lines from a /START/,/END/ range in sed?
I know that for many cases refinement of the "s:::" clause can give a solution in this specific case, but I am after the generic solution here.
In "Sed - An Introduction and Tutorial" Bruce Barnett writes: "I will show you later how to restrict a command up to, but not including the line containing the specified pattern.", but I was not able to find where he actually shows this.
In the "USEFUL ONE-LINE SCRIPTS FOR SED" Compiled by Eric Pement, I could find only the inclusive example:
# print section of file between two regular expressions (inclusive)
sed -n '/Iowa/,/Montana/p' # case sensitive
This should do the trick:
sed -e '/=sec1=/,/=sec2=/ { /=sec1=/b; /=sec2=/b; s/^/#/ }' < input
This matches between sec1 and sec2 inclusively and then just skips the first and last line with the b command. This leaves the desired lines between sec1 and sec2 (exclusive), and the s command adds the comment sign.
Unfortunately, you do need to repeat the regexps for matching the delimiters. As far as I know there's no better way to do this. At least you can keep the regexps clean, even though they're used twice.
This is adapted from the SED FAQ: How do I address all the lines between RE1 and RE2, excluding the lines themselves?
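Applied to a shortened version of the input from the question, the approach can be sketched like this. The commands inside the braces are put on separate lines here, which avoids relying on how a given sed parses b followed by a semicolon.

```shell
input='=sec1=
some-line
foo
=sec2=
c=baz'

# Inside the range, b (branch to end of script) skips the two delimiter
# lines; every other line in the range gets a leading #.
out=$(printf '%s\n' "$input" | sed -e '/=sec1=/,/=sec2=/ {
  /=sec1=/b
  /=sec2=/b
  s/^/#/
}')

printf '%s\n' "$out"
```

Only some-line and foo are commented out; the =sec1= and =sec2= lines and everything outside the range are unchanged.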
If you're not interested in lines outside of the range, but just want the non-inclusive variant of the Iowa/Montana example from the question (which is what brought me here), you can write the "except for the first and last matching lines" clause easily enough with a second sed:
sed -n '/PATTERN1/,/PATTERN2/p' < input | sed '1d;$d'
Personally, I find this slightly clearer (albeit slower on large files) than the equivalent
sed -n '1,/PATTERN1/d;/PATTERN2/q;p' < input
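Both variants can be compared on a small made-up input, where Iowa and Montana stand in for PATTERN1 and PATTERN2. This is a sketch; note that the single-sed form assumes PATTERN1 is not on the first line.

```shell
# Variant 1: print the inclusive range, then drop its first and last line.
two_seds=$(printf '%s\n' a Iowa b c Montana d |
  sed -n '/Iowa/,/Montana/p' | sed '1d;$d')

# Variant 2: delete everything up to and including the start line,
# quit at the end line, and print whatever is in between.
one_sed=$(printf '%s\n' a Iowa b c Montana d |
  sed -n '1,/Iowa/d;/Montana/q;p')

printf '%s\n' "$two_seds" "$one_sed"
```

Each variant yields the two lines b and c.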
Another way would be
sed -n '/begin/,/end/ {
/begin/n
/end/ !p
}'
/begin/n -> skip over the line that has the "begin" pattern
/end/ !p -> print all lines that don't have the "end" pattern
Taken from Bruce Barnett's sed tutorial http://www.grymoire.com/Unix/Sed.html#toc-uh-35a
I've used:
sed -n '/begin/,/end/{/begin\|end/!p}'
This will search all the lines between the patterns, then print everything not containing the patterns
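The last two approaches can be tried on a small invented input. Two assumptions in this sketch: -n is used so that only the explicit p prints anything (without it every line in the range would appear twice), and the second command relies on GNU sed, because \| in a basic regular expression is a GNU extension.

```shell
# Variant with n: on the "begin" line, n loads the next line (nothing is
# printed because of -n); then every line that is not "end" is printed.
with_n=$(printf '%s\n' x begin a b end y | sed -n '/begin/,/end/ {
  /begin/n
  /end/!p
}')

# Variant with alternation: print lines in the range that match neither pattern.
with_alt=$(printf '%s\n' x begin a b end y | sed -n '/begin/,/end/{/begin\|end/!p}')

printf '%s\n' "$with_n" "$with_alt"
```

Both print just a and b, the lines strictly between the two markers.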
you could also use awk
awk '/sec1/{f=1;print;next}f && !/sec2/{ $0="#"$0}/sec2/{f=0}1' file

Resources