Delete Special Word Using sed - unix

I would like to use sed to remove all occurrences of this line if and only if it is exactly this:
<ab></ab>
If the line is this, I would not want to delete it:
<ab>keyword</ab>
My attempt that's not working:
sed '/<ab></ab>/d'
Thanks for any insight. I'm not sure what's wrong, since I shouldn't have to escape anything, should I?
I'm using a shell script named temp to execute this. My command is this:
cat foobar.html | ./temp
This is my temp shell script:
#!/bin/sh
sed -e '/td/!d' | sed '/<ab></ab>/d'

It looks like we have a couple of problems here. The first is the / in the closing tag: sed uses / to delimit the parts of the command, so the unescaped / in </ab> ends the pattern early and sed misreads the rest of the expression. Fortunately, all we have to do is escape it with \. Try:
sed '/<ab><\/ab>/d'
Here's an example on my machine:
$ cat test
<ab></ab>
<ab></ab>
<ab>test</ab>
$ sed '/<ab><\/ab>/d' test
<ab>test</ab>
$
The other problem is sed -e '/td/!d': it deletes every line that does not contain td, so only lines containing td ever reach the second sed, which is probably not what you want. In its default mode sed prints every line, so you don't need to tell it what to keep; just tell it exactly what you want to delete.
So, to do this on a file called input.html:
sed '/<ab><\/ab>/d' input.html
Or, to edit the file in place with GNU sed (BSD/macOS sed needs -i '' instead), you can do:
sed -i -e '/<ab><\/ab>/d' input.html
Additionally, sed lets you use any character you want as a delimiter; you don't have to use /. In an address, a delimiter other than / must be introduced with a backslash. So if you'd prefer not to escape your input, you can do:
sed '\#<ab></ab>#d' input.html
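To compare the two spellings side by side, here is a small sketch that wraps each command in a shell function (the function names are hypothetical, chosen only for this example) and feeds it the sample lines from the question. It should behave the same with GNU and BSD sed.

```shell
#!/bin/sh
# Hypothetical helper: escape the / of the closing tag so it does not end the pattern.
delete_empty_ab_escaped() {
  sed '/<ab><\/ab>/d'
}

# Hypothetical helper: the same address delimited with #.
# A non-/ delimiter in an address must be introduced with a backslash.
delete_empty_ab_hash() {
  sed '\#<ab></ab>#d'
}

# Sample input from the question: only the empty <ab></ab> lines should go.
printf '%s\n' '<ab></ab>' '<ab>keyword</ab>' '<ab></ab>' | delete_empty_ab_escaped
printf '%s\n' '<ab></ab>' '<ab>keyword</ab>' '<ab></ab>' | delete_empty_ab_hash
```

Both calls print only <ab>keyword</ab>. Neither pattern is anchored, so a line such as x<ab></ab>y would be deleted too.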
Edit
In the comments, you mentioned wanting to delete lines that only contain </ab> and nothing else. To do that, you need to do what's called anchoring the match. The ^ character represents the beginning of the line for anchoring, and $ represents the end of the line.
sed '/^<\/ab>$/d' input.html
This will only match a line that contains (literally) </ab> and nothing else at all, and delete the line. If you want to match lines that contain whitespace too, but no text other than </ab>:
sed '/^[[:blank:]]*<\/ab>[[:blank:]]*$/d' input.html
[[:blank:]]* matches zero or more blank characters (spaces or tabs). [:blank:] is a POSIX character class, used here inside a bracket expression.
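The difference between the two anchored patterns is easiest to see on a line with surrounding whitespace. This sketch (helper names are hypothetical) runs both on a bare </ab>, an indented </ab> with a tab, a space and a trailing space, and a line with other text.

```shell
#!/bin/sh
# Hypothetical helper: delete lines that are exactly </ab>.
delete_exact_close() {
  sed '/^<\/ab>$/d'
}

# Hypothetical helper: also allow spaces/tabs before and after </ab>.
delete_close_with_blanks() {
  sed '/^[[:blank:]]*<\/ab>[[:blank:]]*$/d'
}

# Sample: a bare </ab>, an indented one (tab + space, trailing space), and a line with text.
printf '</ab>\n\t </ab> \n<ab>text</ab>\n' | delete_exact_close
printf '</ab>\n\t </ab> \n<ab>text</ab>\n' | delete_close_with_blanks
```

The first call removes only the bare </ab> line and keeps the indented one; the second removes both and leaves <ab>text</ab>.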

Related

back reference with sed command getting unexpected result

File Content
abab102
cdcd103
efef105
I want the username and ID separated. Here, abab is the username and 102 is the ID.
I use the command
sed 's/\([a-z]\)\{4\}\([0-9]\)\{3\}/Username:\1 ID:\2/' file.txt
I get this:
Username:b ID:2
Username:d ID:3
Username:f ID:5
But I am expecting
Username:abab ID:102
Username:cdcd ID:103
Username:efef ID:105
But using the command
sed -e 's/\([a-z]\)\{4\}/Username:&/' -e 's/\([0-9]\)\{3\}/ID:&/' file2.txt
Output
Username:ababID:102
Username:cdcdID:103
Username:efefID:105
This output is close to what I need, but I still expect a space between Username:abab and ID:102.
I want to know why \1 or \2 is not working here.
\([a-z]\) captures a single letter. When you repeat a \(...\) group with \{4\}, the back reference keeps only the last repetition's match (b, d and f in your output). Put the repetition inside the group:
sed 's/\([a-z]\{4\}\)\([0-9]\{3\}\)/Username:\1 ID:\2/' file.txt
It's simpler with extended regular expressions (-E):
sed -E 's/([a-z]{4})([0-9]{3})/Username:\1 ID:\2/'
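To see where the quantifier belongs, this sketch runs the question's command and the two fixed versions on the sample data. The function names are hypothetical; -E is accepted by both GNU and BSD sed.

```shell
#!/bin/sh
# Quantifier outside the group: each \(...\) captures one character per repetition,
# and the back reference keeps only the last one (b and 2 for abab102).
split_broken() {
  sed 's/\([a-z]\)\{4\}\([0-9]\)\{3\}/Username:\1 ID:\2/'
}

# Quantifier inside the group: \1 gets all four letters, \2 all three digits.
split_fixed() {
  sed 's/\([a-z]\{4\}\)\([0-9]\{3\}\)/Username:\1 ID:\2/'
}

# The same fix written as an extended regular expression.
split_fixed_ere() {
  sed -E 's/([a-z]{4})([0-9]{3})/Username:\1 ID:\2/'
}

printf '%s\n' abab102 cdcd103 efef105 | split_broken     # Username:b ID:2 ...
printf '%s\n' abab102 cdcd103 efef105 | split_fixed      # Username:abab ID:102 ...
printf '%s\n' abab102 cdcd103 efef105 | split_fixed_ere  # same as split_fixed
```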
Using sed
$ sed -E 's/[[:alpha:]]{4}/Username:& ID:/' input_file
Username:abab ID:102
Username:cdcd ID:103
Username:efef ID:105

Add " to the end of any line that ends in This or this using sed in unix

I have a file where a few lines end with tux. How do I add " to the end of any line that ends in words like this or This?
You could visit this site for more examples and help with sed in general. Also check its "Regular expressions" tab, or search the web for something like "unix anchor characters".
For this actual problem, these are the relevant parts of the site:
Sed has the ability to specify which lines are to be examined and/or modified, by specifying addresses before the command. I will just describe the simplest version for now - the /PATTERN/ address. When used, only lines that match the pattern are given the command after the address. Briefly, when used with the /p flag, matching lines are printed twice:
sed '/PATTERN/p' file
And of course PATTERN is any regular expression.
According to these, you could use a sed command like this to get the lines ending with "this" or "This" in your file, or "tux" if you meant that (as the quote says, matching lines are printed twice; add -n to print only those lines):
$ sed '/[tT]his$/p' yourfile
or
$ sed '/tux$/p' yourfile
For putting the double quotes at the end of these lines, you also need to understand:
$ has a special meaning (end of the input line) as an anchor character in regular expressions
... and the character "$" is the end anchor. The expression "A$" will match all lines that end with the capital A. If the anchor characters are not used at the proper end of the pattern, then they no longer act as anchors. The "$" is only an anchor if it is the last character.
how to use sed for substitution of characters (see the linked page)
Sed has several commands, but most people only learn the substitute command: s. The substitute command changes all occurrences of the regular expression into a new value. A simple example is changing "day" in the "old" file to "night" in the "new" file:
$ sed 's/day/night/' <old >new
Or another way (for UNIX beginners),
$ sed 's/day/night/' old >new
and for those who want to test this:
$ echo day | sed 's/day/night/'
This will output "night".
After these, you can construct your own sed command, knowing that you can combine these two parts in one command like this:
$ sed '/[pP]atternAtTheEndOfLine$/s/$/patternToAddToEndOfTheLine/' yourfile
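Filling in the general form above for this question gives something like the following sketch. The function name is hypothetical; the address selects lines ending in this or This, and s/$/"/ replaces the (empty) end-of-line position with a double quote.

```shell
#!/bin/sh
# Hypothetical helper: on lines ending in "this" or "This", substitute the
# end-of-line anchor with a double quote, i.e. append ".
add_quote_after_this() {
  sed '/[tT]his$/s/$/"/'
}

printf '%s\n' 'keep this' 'Keep This' 'this stays as is' | add_quote_after_this
# keep this"
# Keep This"
# this stays as is
```

For lines ending in tux, the address would be /tux$/ instead. An equivalent single substitution is sed 's/[tT]his$/&"/', where & re-inserts the matched word.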

Sed replace only exact match

I want to replace a string like Europe12 with Europe12_yesterday in a file, without changing the Europe12-36 strings that also exist in the file.
I tried:
basename=Europe12
sed -i "s/\b$basename\b/${basename}_yesterday/g" file.txt
but this also changed the Europe12-36 strings.
Require a space or the end of the line after the name (using -E, supported by GNU and BSD sed):
sed -E 's/Europe12( |$)/Europe12_yesterday\1/g' input
Manually construct the delimiter list you want instead of using \b, \W or \<. The - is not a word character (word characters are letters, digits and _), so \b matches between 2 and -, which is why your other string matches too. So list the characters that must not follow the name, expanding the list as needed: [-a-zA-Z0-9].
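One way to turn the delimiter-list idea into a command is to negate the list and capture the following character so it can be written back. This is a sketch under a few assumptions: sed -E (GNU or BSD), a hypothetical helper name, and a name argument that contains no regex metacharacters. It only checks the character after the name, so something like XEurope12 would still be rewritten.

```shell
#!/bin/sh
# Hypothetical helper: append _yesterday to the name given in $1, but only when the
# next character is not "-", a letter, a digit or "_", or the name ends the line.
rename_exact() {
  # Inside double quotes, \$ passes a literal $ (end-of-line anchor) to sed,
  # and \\1 becomes \1, the back reference to the captured following character.
  sed -E "s/${1}([^-[:alnum:]_]|\$)/${1}_yesterday\\1/g"
}

basename=Europe12
printf '%s\n' 'Europe12' 'Europe12-36' 'Europe12, Europe12-36 and Europe12' |
  rename_exact "$basename"
# Europe12_yesterday
# Europe12-36
# Europe12_yesterday, Europe12-36 and Europe12_yesterday
```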
You can do it in two passes:
sed -e 's/Europe12/Europe12_yesterday/g' -e 's/Europe12_yesterday-36/Europe12-36/g' file.txt
sed 's/Europe12\([[:blank:]]\)/Europe12_yesterday\1/g;s/Europe12$/&_yesterday/' YourFile
[[:blank:]] can be extended with any other boundary characters you accept, such as .,;:/]) (be careful: some of these characters have a special meaning in a regex).
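It is a little late to reply, but the GNU "word boundary" notation (\<...\>) is another option. Use double quotes so that the shell expands $basename:
sed -i "s/\<$basename\>/${basename}_yesterday/g" file.txt
Be aware that, like \b, \> treats - as a non-word character, so this still rewrites the Europe12 in Europe12-36; it only protects longer names such as Europe123.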

UNIX sed command help

sed -n '$'!p abc.txt | tail +2 > def.txt
I have the above sed command in my code, and I can't figure out what it does. I've been going through sed tutorials but still can't work it out. Can someone please help me figure out what it does? Thanks
Taking this in stages:
sed -n '' abc.txt
"Run abc.txt through sed, but don't print anything out."
sed -n '$!p' abc.txt
(Note that I've corrected what I think was a misplaced quote mark.)
"Run abc.txt through sed; if a line isn't the last line, print it (i.e. print all but the last line)."
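I guess you know the rest, but note that tail +2 (print from line 2 onward, i.e. drop the first line) is obsolete syntax; tail -n +2 is the modern equivalent. So the whole pipeline drops the first and last lines of abc.txt and writes the rest to def.txt.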
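The whole pipeline can be sketched as one function that reads standard input instead of abc.txt (the function name is hypothetical), using the modern tail syntax:

```shell
#!/bin/sh
# Hypothetical helper equivalent to: sed -n '$'!p abc.txt | tail +2
strip_first_and_last() {
  # sed -n '$!p' prints every line except the last;
  # tail -n +2 then prints from line 2 onward, dropping the first.
  sed -n '$!p' | tail -n +2
}

printf '%s\n' line1 line2 line3 line4 line5 | strip_first_and_last
# line2
# line3
# line4
```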
EDIT:
To remove the last two lines, try
sed 'N;$!P;$!D;$d'
or if that doesn't work, crude but effective:
sed '$d' | sed '$d'
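A quick way to check a "delete the last two lines" command is to run it on inputs with both an odd and an even number of lines. This sketch assumes GNU sed and uses hypothetical function names. A shorter sed 'N;$d' only works for an even count: with an odd count, GNU sed's N on the last line prints the pattern space and exits, so nothing is deleted.

```shell
#!/bin/sh
# Hypothetical helper: keep a two-line window in the pattern space.
# N appends the next line; while that line is not the last, P prints the older
# line and D drops it and restarts the cycle; at the last line, $d deletes both.
drop_last_two_window() {
  sed 'N;$!P;$!D;$d'
}

# Hypothetical helper: the "crude but effective" version, deleting the last line twice.
drop_last_two_pipe() {
  sed '$d' | sed '$d'
}

printf '%s\n' 1 2 3 | drop_last_two_window     # prints 1
printf '%s\n' 1 2 3 4 | drop_last_two_window   # prints 1 and 2
printf '%s\n' 1 2 3 4 | drop_last_two_pipe     # prints 1 and 2
```

One edge case differs: on a one-line input the window version prints the line (N has nothing to append), while the pipeline prints nothing.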
As far as the sed command '$'!p is concerned:
the $ matches only the last line of the input file.
the ! negates the sense of the match (so that it matches all but the last line).
the p prints out whatever was matched.
So basically this prints all but the last line of the file.
The -n option stops sed from performing its default action (to print the pattern space) - without that, you'd get one copy of the last line and two copies of all the other lines.
The quotes around $ are to stop the shell from trying to interpret it as a shell variable - I would have quoted the lot myself ('$!p') but that's a style issue, at least on bash. Other shells like csh (which uses ! for command history retrieval) may not be so forgiving.
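The effect of -n described above can be checked directly with a few lines of input. This sketch assumes it runs as a non-interactive script, where the question's '$'!p quoting and the fully quoted '$!p' behave the same.

```shell
#!/bin/sh
# Without -n, sed prints each line automatically at the end of the cycle,
# so $!p prints every non-last line a second time.
printf '%s\n' a b c | sed '$!p'       # a, a, b, b, c (one per line)

# With -n only the explicit p output remains: all lines but the last.
printf '%s\n' a b c | sed -n '$!p'    # a, b

# The question's quoting gives the same result in a script.
printf '%s\n' a b c | sed -n '$'!p    # a, b
```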

Why sed removes last line?

$ cat file.txt
one
two
three
$ cat file.txt | sed "s/one/1/"
1
two
Where is the word "three"?
UPDATED:
There is no line after the word "three".
As Ivan suggested, your text file is missing the end of line (EOL) marker on the final line. Since that's not present, three is printed out by sed but then immediately over-written by your prompt. You can see it if you force an extra line to be printed.
sed 's/one/1/' file.txt && echo
This is a common problem since people incorrectly think of the EOL as an indication that there's a following line (which is why it's commonly called a "newline") and not as an indication that the current line has ended.
Using comments from other posts:
older versions of sed do not process the last line of a file if no EOL or "new line" is present.
echo can be used to add a new line
Then, to solve the problem you can re-order the commands:
( cat file.txt && echo ) | sed 's/one/1/'
I guess there is no newline character after the last line: sed didn't find a line separator after the last line and ignored it.
Update
I suggest rewriting this in Perl (if you have it installed):
cat file.txt | perl -pe 's/one/1/'
Instead of cat'ing the file and piping into sed, run sed with the file name as an argument after the substitution string, like so:
sed "s/one/1/" file.txt
When I did it this way, I got "three" immediately followed by the prompt:
1
two
three$
A google search shows that the man page for some versions of sed (not the GNU or BSD versions, which work as you'd expect) indicate that it won't process an incomplete line (one that's not newline-terminated) at the end of a file. The solution is to ensure your files end with a newline, install GNU sed, or use awk or perl instead.
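Making sure a file ends with a newline can be sketched as a small helper that appends one only when it is missing, so running it twice is harmless. The function name is hypothetical, and mktemp is assumed to be available (it is on GNU and BSD systems, though not in POSIX).

```shell
#!/bin/sh
# Hypothetical helper: append a newline to the file named in $1 only if its last
# byte is not already a newline. tail -c 1 prints the last byte; command
# substitution strips a trailing newline, so the result is empty exactly when
# the file ends with a newline (or is empty).
ensure_trailing_newline() {
  if [ -n "$(tail -c 1 "$1")" ]; then
    printf '\n' >> "$1"
  fi
}

f=$(mktemp)
printf 'one\ntwo\nthree' > "$f"   # no newline after "three"
ensure_trailing_newline "$f"
sed 's/one/1/' "$f"               # 1, two and three, each on its own line
rm -f "$f"
```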
Here's an awk solution:
awk '{gsub("one","1")}1' file.txt
