back reference with sed command getting unexpected result - unix

File Content
abab102
cdcd103
efef105
I want the username and id separated. Here, abab is user and 102 is id.
I use the command
sed 's/\([a-z]\)\{4\}\([0-9]\)\{3\}/Username:\1 ID:\2/' file.txt
Get this
Username:b ID:2
Username:d ID:3
Username:f ID:5
But I am expecting
Username:abab ID:102
Username:cdcd ID:103
Username:efef ID:105
But using the command
sed -e 's/\([a-z]\)\{4\}/Username:&/' -e 's/\([0-9]\)\{3\}/ID:&/' file2.txt
Output
Username:ababID:102
Username:cdcdID:103
Username:efefID:105
This output is close to what I need, but still I am expecting a blank space between Username:abab ID:102.
I want to know why \1 or \2 is not working here.

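In \([a-z]\), the parentheses enclose a single letter. When you repeat the group itself with \{4\}, the back-reference holds only the last text the group matched, which is why \1 gives the fourth letter and \2 the third digit. Put the repetition inside the group: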
's/\([a-z]\{4\}\)\([0-9]\{3\}\)/Username:\1 ID:\2/'
Ugh, simpler with extended:
sed -E 's/([a-z]{4})([0-9]{3})/Username:\1 ID:\2/'
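To see the difference side by side, here is a small sketch that runs both forms on the sample data from the question. The variable names (input, outside, inside) are only illustrative, and the input is fed through printf so no file is needed.

```shell
# Sample data from the question.
input=$(printf '%s\n' abab102 cdcd103 efef105)

# Repetition OUTSIDE the group: \1 and \2 keep only the last
# character each group matched.
outside=$(printf '%s\n' "$input" |
    sed 's/\([a-z]\)\{4\}\([0-9]\)\{3\}/Username:\1 ID:\2/')

# Repetition INSIDE the group: \1 holds all four letters and
# \2 all three digits.
inside=$(printf '%s\n' "$input" |
    sed 's/\([a-z]\{4\}\)\([0-9]\{3\}\)/Username:\1 ID:\2/')

printf 'outside the group:\n%s\n\n' "$outside"
printf 'inside the group:\n%s\n' "$inside"
```

The first command reproduces the unexpected output from the question (Username:b ID:2 and so on); the second prints the expected Username:abab ID:102 lines.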

Using sed
$ sed -E 's/[[:alpha:]]{4}/Username:& ID:/' input_file
Username:abab ID:102
Username:cdcd ID:103
Username:efef ID:105

Related

Using sed to replace symbol after semicolon

Trying to make use of the sed command in order to change a word after a semicolon, like so (fileGrades.txt):
Student;Grade;Comment;
Eric;1;None;
Smith;2;None;
Thomas;1;None;
Chad;3;Nice work;
Now using sed command should find Eric and Chad and change both of their grades to 2, but leave the rest untouched. I was thinking of doing it with this method (see below), but it didn't work as it would not allow me to utilize the semicolon to know where to change the grade.
sed -i 's/Chad;*/Chad;2/g' fileGrades.txt
I also tried this method using wild cards such as *, ^ and . , but it didn't work.
You can use
sed -E -i 's/(Eric|Chad);[0-9]*/\1;2/g' fileGrades.txt
Details:
-E - POSIX ERE enabled
-i - the contents of the input file are modified in place
s/(Eric|Chad);[0-9]*/\1;2 - matches and captures into Group 1 (\1) Eric or Chad, then matches ; and zero or more digits, and replaces this match with the Group 1 value, ; and 2.
See the online demo:
#!/bin/bash
s='Student;Grade;Comment;
Eric;1;None;
Smith;2;None;
Thomas;1;None;
Chad;3;Nice work;'
sed -E 's/(Eric|Chad);[0-9]*/\1;2/g' <<< "$s"
Output:
Student;Grade;Comment;
Eric;2;None;
Smith;2;None;
Thomas;1;None;
Chad;2;Nice work;
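One caveat worth considering: the pattern is not anchored, so it would also match a name that merely ends in Eric or Chad. The sketch below assumes a hypothetical extra row, MacChad, which is not in the original file, and compares the unanchored command with a variant anchored by ^.

```shell
# Sample data from the question plus one hypothetical row (MacChad).
grades=$(printf '%s\n' \
    'Student;Grade;Comment;' \
    'Eric;1;None;' \
    'Smith;2;None;' \
    'Thomas;1;None;' \
    'Chad;3;Nice work;' \
    'MacChad;5;None;')

# Unanchored: also rewrites the grade of MacChad.
loose=$(printf '%s\n' "$grades" | sed -E 's/(Eric|Chad);[0-9]*/\1;2/g')

# Anchored with ^: the name must start the line, so MacChad is left alone.
strict=$(printf '%s\n' "$grades" | sed -E 's/^(Eric|Chad);[0-9]*/\1;2/')

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$loose" "$strict"
```

With the anchor, the g flag is no longer needed, since ^ can match only once per line.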
This is a tailor-made problem for awk. For the sample data shown, use:
awk 'BEGIN{FS=OFS=";"} FNR==1{print;next} $1=="Eric" || $1=="Chad"{$2=2} 1' Input_file
Once you are happy with the results of the code above, use the following to save the output back into Input_file.
awk 'BEGIN{FS=OFS=";"} FNR==1{print;next} $1=="Eric" || $1=="Chad"{$2=2} 1' Input_file > temp && mv temp Input_file

Removing comments from a datafile. What are the differences?

Let's say that you would like to remove comments from a datafile using one of two methods:
cat file.dat | sed -e "s/\#.*//"
cat file.dat | grep -v "#"
How do these individual methods work, and what is the difference between them? Would it also be possible for a person to write the clean data to a new file, while avoiding any possible warnings or error messages to end up in that datafile? If so, how would you go about doing this?
How do these individual methods work, and what is the difference
between them?
No, they do not behave the same, and sed and grep are two different commands. Your sed command removes everything from the first # to the end of the line, so any text before the # is kept and a line that starts with # becomes an empty line. grep -v, on the other hand, drops every line that contains a # anywhere in it.
The man pages give more detail:
man grep:
-v, --invert-match
Invert the sense of matching, to select non-matching lines. (-v is specified by POSIX.)
man sed:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
Would it also be possible for a person to write the clean data to a
new file, while avoiding any possible warnings or error messages to
end up in that datafile?
Yes, you can redirect the error messages with 2>/dev/null in either command.
If so, how would you go about doing this?
For example, append 2>/dev/null 1>output_file to the command.
Explanation of the sed command (for understanding only). Note that there is no need to use cat and then sed; you could run sed -e "s/\#.*//" Input_file instead.
sed -e " ##Start sed; -e adds the script that follows to the commands to be executed.
s/ ##s substitutes the text matched by the regexp that follows.
\#.* ##Match from the first # through the end of the line.
//" ##Replace the matched text with nothing (an empty string).
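As a rough sketch of the redirection idea, the snippet below writes the cleaned data to one file and any error messages to another. The directory and file names (workdir, file.dat, clean.dat, errors.log, missing.dat) are made up for the example, and missing.dat is deliberately absent to provoke an error message.

```shell
# Scratch directory with a small sample data file.
workdir=$(mktemp -d)
printf '%s\n' 'first' '# second' 'thi # rd' > "$workdir/file.dat"

# stdout (the cleaned data) goes to clean.dat; stderr (the complaint
# about the missing second file) goes to errors.log instead.
sed -e 's/#.*//' "$workdir/file.dat" "$workdir/missing.dat" \
    > "$workdir/clean.dat" 2> "$workdir/errors.log" || true

clean=$(cat "$workdir/clean.dat")
errors=$(cat "$workdir/errors.log")
rm -rf "$workdir"

printf 'clean data:\n%s\n\nerrors:\n%s\n' "$clean" "$errors"
```

Use 2>/dev/null instead of a log file if the messages should simply be discarded.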
That grep -v will lose all the lines that have # on them, for example:
$ cat file
first
# second
thi # rd
so
$ grep -v "#" file
first
grep -v drops every line that has a # anywhere on it, including lines with data before the comment, which is usually not what you want. Use this instead:
$ grep -o "^[^#]*" file
first
thi
This keeps the text before the #, as the sed command does, but without leaving empty lines. From man grep:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
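If you would rather stay with sed and still avoid the empty lines, one possible sketch is to add a second expression that deletes lines left empty. It assumes that # never appears inside real data (for example inside a quoted string), and it also trims whitespace directly before the #.

```shell
# Same sample file as above, fed through printf.
sample=$(printf '%s\n' 'first' '# second' 'thi # rd')

# 1st expression: remove the comment and any whitespace just before it.
# 2nd expression: delete lines that are now empty.
stripped=$(printf '%s\n' "$sample" |
    sed -e 's/[[:space:]]*#.*//' -e '/^$/d')

printf '%s\n' "$stripped"
```

For the sample this prints first and thi, with no blank line and no trailing space.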

Delete Special Word Using sed

I would like to use sed to remove all occurrences of this line, if and only if it is exactly this:
<ab></ab>
But I do not want to delete a line like this:
<ab>keyword</ab>
My attempt that's not working:
sed '/<ab></ab>/d'
Thanks for any insight. I'm not sure what's wrong as I should not have to escape anything?
I'm using a shell script named temp to execute this. My command is this:
cat foobar.html | ./temp
This is my temp shell script:
#!/bin/sh
sed -e '/td/!d' | sed '/<ab></ab>/d'
It looks like we have a couple of problems here. The first is with the / in the close-tag. sed uses this to delimit different parts of the command. Fortunately, all we have to do is escape it with \. Try:
sed '/<ab><\/ab>/d'
Here's an example on my machine:
$ cat test
<ab></ab>
<ab></ab>
<ab>test</ab>
$ sed '/<ab><\/ab>/d' test
<ab>test</ab>
$
The other problem is that I'm not sure what the purpose of sed -e '/td/!d' is; it deletes every line that does not contain td. In its default operating mode, sed prints every line it does not delete, so you don't need to tell it what to keep; just tell it exactly what you want to delete.
So, to do this on a file called input.html:
sed '/<ab><\/ab>/d' input.html
Or, to edit the file in-place, you can just do:
sed -i -e '/<ab><\/ab>/d' input.html
Additionally, sed lets you use any character you want as a delimiter; you don't have to use /. So if you'd prefer not to escape your input, you can do:
sed '\#<ab></ab>#d' input.html
Edit
In the comments, you mentioned wanting to delete lines that only contain </ab> and nothing else. To do that, you need to do what's called anchoring the match. The ^ character represents the beginning of the line for anchoring, and $ represents the end of the line.
sed '/^<\/ab>$/d' input.html
This will only match a line that contains (literally) </ab> and nothing else at all, and delete the line. If you want to match lines that contain whitespace too, but no text other than </ab>:
sed '/^[[:blank:]]*<\/ab>[[:blank:]]*$/d' input.html
[[:blank:]]* matches zero or more spaces or tabs; [:blank:] is a POSIX character class, used here inside a bracket expression.
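Putting the pieces together, here is a minimal sketch that combines the custom delimiter, the anchors, and the optional whitespace to delete only lines consisting of an empty <ab></ab> pair. The sample lines are invented for illustration.

```shell
# Three sample lines: a bare empty tag pair, one padded with spaces,
# and one that has content between the tags.
html=$(printf '%s\n' '<ab></ab>' '  <ab></ab>  ' '<ab>keyword</ab>')

# \#...# uses # as the address delimiter, so </ab> needs no escaping.
# ^ and $ anchor the match to the whole line; [[:blank:]]* allows
# leading and trailing spaces or tabs.
kept=$(printf '%s\n' "$html" |
    sed '\#^[[:blank:]]*<ab></ab>[[:blank:]]*$#d')

printf '%s\n' "$kept"
```

Only <ab>keyword</ab> survives, because the anchors stop the pattern from matching a line that has anything else on it.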

UNIX sed command help

sed -n '$'!p abc.txt | tail +2 > def.txt
I have the above sed command in my code and cannot figure out what it does. I am going through sed tutorials but have not found the answer. Can someone please help me work out what it does? Thanks.
Taking this in stages:
sed -n abc.txt
"Run abc.txt through sed, but don't print anything out."
sed -n '$!p' abc.txt
(Note that I've corrected what I think was a misplaced quote mark.)
"Run abc.txt through sed; if a line isn't the last line, print it (i.e. print all but the last line)."
I guess you know the rest, but note that tail +2 (print from line 2 onward, i.e. skip the first line) is obsolete syntax; tail -n +2 would be better.
EDIT:
To remove the last two lines, try
sed 'N;$!P;$!D;$d'
(the shorter sed 'N;$d' only works when the file has an even number of lines), or, crude but effective:
sed '$d' | sed '$d'
As far as the sed command '$'!p is concerned:
the $ matches only the last line of the input file.
the ! negates the sense of the match (so that it matches all but the last line).
the p prints out whatever was matched.
So basically this prints all but the last line of the file.
The -n option stops sed from performing its default action (to print the pattern space) - without that, you'd get one copy of the last line and two copies of all the other lines.
The quotes around $ are to stop the shell from trying to interpret it as a shell variable - I would have quoted the lot myself ('$!p') but that's a style issue, at least on bash. Other shells like csh (which uses ! for command history retrieval) may not be so forgiving.
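A quick way to check this reading is to run the pipeline on a few known lines. The sketch below uses five made-up lines and tail -n +2, the POSIX spelling of the legacy tail +2; the variable names are illustrative only.

```shell
lines=$(printf '%s\n' one two three four five)

# sed -n '$!p' prints every line except the last; tail -n +2 then
# starts its output at line 2, so the first line is dropped as well.
middle=$(printf '%s\n' "$lines" | sed -n '$!p' | tail -n +2)

# Two passes of '$d' (delete the last line) remove the last two lines.
head_part=$(printf '%s\n' "$lines" | sed '$d' | sed '$d')

printf 'pipeline from the question:\n%s\n\n' "$middle"
printf 'last two lines removed:\n%s\n' "$head_part"
```

So the original command copies abc.txt to def.txt without its first and last lines: here that leaves two, three, and four.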

Why sed removes last line?

$ cat file.txt
one
two
three
$ cat file.txt | sed "s/one/1/"
1
two
Where is the word "three"?
UPDATED:
There is no line after the word "three".
As Ivan suggested, your text file is missing the end of line (EOL) marker on the final line. Since that's not present, three is printed out by sed but then immediately over-written by your prompt. You can see it if you force an extra line to be printed.
sed 's/one/1/' file.txt && echo
This is a common problem since people incorrectly think of the EOL as an indication that there's a following line (which is why it's commonly called a "newline") and not as an indication that the current line has ended.
Using comments from other posts:
older versions of sed do not process the last line of a file if no EOL or "new line" is present.
echo can be used to add a new line
Then, to solve the problem you can re-order the commands:
( cat file.txt && echo ) | sed 's/one/1/'
I guess there is no newline character after the last line. sed didn't find a line terminator after the last line, so it ignored that line.
Update
I suggest rewriting this in perl (if you have it installed):
cat file.txt | perl -pe 's/one/1/'
Instead of cat'ing the file and piping into sed, run sed with the file name as an argument after the substitution string, like so:
sed "s/one/1/" file.txt
When I did it this way, I got "three" immediately followed by the prompt:
1
two
three$
A Google search shows that the man page for some versions of sed (not the GNU or BSD versions, which work as you'd expect) indicates that it won't process an incomplete line (one that's not newline-terminated) at the end of a file. The solution is to ensure your files end with a newline, install GNU sed, or use awk or perl instead.
Here's an awk solution:
awk '{gsub("one","1")}1' file.txt
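To make sure a file ends with a newline before handing it to sed, one possible sketch is to look at its last byte and append a newline only when it is missing. The scratch directory and file are created just for the example.

```shell
# Create a file whose last line has no terminating newline.
workdir=$(mktemp -d)
printf 'one\ntwo\nthree' > "$workdir/file.txt"

# Command substitution strips trailing newlines, so last_byte is empty
# when the file already ends with a newline (or is empty).
last_byte=$(tail -c 1 "$workdir/file.txt")
if [ -n "$last_byte" ]; then
    printf '\n' >> "$workdir/file.txt"
fi

result=$(sed 's/one/1/' "$workdir/file.txt")
rm -rf "$workdir"

printf '%s\n' "$result"
```

After the fix the file is newline-terminated, so every sed version sees three complete lines and the prompt starts on a line of its own.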
