I have a text file in the format below. I need to remove the text between the first and second semicolon (the delimiter), but retain the second semicolon.
$cat test.txt
abc;def;ghi;jkl
mno;pqr;stu,xxx
My expected output
abc;ghi;jkl
mno;stu,xxx
I tried using sed 's/^([^;][^;]*);.*$/\1/', but it removes everything after the first semicolon. I also tried cut -d ';' -f2, but this only gives the 2nd field as output.
Using cut
cut -d";" -f2 --complement file
-d is for the delimiter, i.e. ";" in your case
-f is for the fields, i.e. keep the fields listed
--complement reverses the selection, i.e. remove the fields listed
So:
$ cat test.txt
abc;def;ghi;jkl
mno;pqr;stu;xxx
$ cut -d";" -f2 --complement test.txt
abc;ghi;jkl
mno;stu;xxx
You may use this sed:
sed 's/;[^;]*//' file
abc;ghi;jkl
mno;stu,xxx
You can do it directly by simply removing the 2nd occurrence of a field followed by a semicolon, e.g.
sed 's/[^;]*;//2' test.txt
Example Use/Output
$ sed 's/[^;]*;//2' test.txt
abc;ghi;jkl
mno;stu,xxx
Thanks to @EdMorton for improvements here as well.
If you did want to use awk, you could simply replace the 2nd field with nothing as well, e.g.
awk -F';' '{sub(/;[^;]*/,"")}1' test.txt
(same output)
With thanks to @EdMorton for the improvement to the original.
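If you literally want to empty the 2nd field and then squeeze the doubled delimiter, a rough equivalent (just a sketch, not part of the original answer) would be:
awk -F';' -v OFS=';' '{$2=""; sub(/;;/,";")} 1' test.txt
(same output)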
Or, as Cyrus suggests, with cut, deleting field 2, e.g.
cut -d';' -f-1,3- test.txt
(same output)
Trying to fix the OP's attempt, with sed you could try the following. A brief explanation: the first back-reference captures everything before the first semicolon, the text between the first and second semicolons is matched but not captured, and the rest of the line goes into the second back-reference. The substitution then rebuilds the line from the two back-references joined by a semicolon.
sed -E 's/^([^;]*);[^;]*;(.*)/\1;\2/' Input_file
Or, as per Ed's comment, please try the following:
sed -E 's/^([^;]*);[^;]*/\1/' Input_file
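For example, on the question's test.txt (both commands give the same result):
$ sed -E 's/^([^;]*);[^;]*/\1/' test.txt
abc;ghi;jkl
mno;stu,xxx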
A super-lazy awk solution (works in gawk, mawk, or mawk2):
awk 'sub(/;[^;]+/,"")' test.txt
A more verbose solution, but one that makes it clearer what it's doing:
awk 'BEGIN {FS=";+"; OFS=";"} ($2="")||($0=$0)&&($1=$1)' test.txt
($2="") clears the 2nd field, but since the null string is the assigned value the expression is false, which is why the logical OR || is needed to keep evaluating.
($0=$0) re-splits the record (FS=";+" collapses the now-doubled delimiter) and ($1=$1) rebuilds it with OFS, cleaning up the extra ";"; the whole expression ends up true, so the line is also printed.
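For example, on the question's test.txt:
$ awk 'sub(/;[^;]+/,"")' test.txt
abc;ghi;jkl
mno;stu,xxx
Note that a line containing no ; at all would not be printed, since sub() returns 0 when nothing was replaced.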
I have a config.xml. I need to retrieve the value at the following xpath:
/domain/server/name
I can only use grep/sed/awk. Need help.
The content of the xml is below; I need to retrieve the server names only.
<domain>
<server>
<name>AdminServer</name>
<port>1234</port>
</server>
<server>
<name>M1Server</name>
<port>5678</port>
</server>
<machine>
<name>machine01</name>
</machine>
<machine>
<name>machine02</name>
</machine>
</domain>
The output should be:
AdminServer
M1Server
I tried:
sed -ne '/<\/name>/ { s/<[^>]*>(.*)<\/name>/\1/; p }' config.xml
sed is only for simple substitutions on individual lines; doing anything else with sed is strictly for mental exercise, not for real code. That's not what you are trying to do, so you shouldn't even be considering sed. Just use awk:
$ awk -F'[<>]' 'p=="server" && $2=="name"{print $3} {p=$2}' file
AdminServer
M1Server
That will work with any awk on any UNIX box. If that's not all you need then edit your question to provide more truly representative sample input and expected output.
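For readability, the same logic can be spread over several lines (a sketch; behaviour is identical to the one-liner above):
awk -F'[<>]' '
p == "server" && $2 == "name" { print $3 }   # previous line opened <server> and this line is a <name>: print its text
                              { p = $2 }     # remember this line's tag for the next line
' file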
Try this command, supplying your xml file as input.
awk '/<server>/,/<\/server>/' < name.xml | grep "name" | cut -d ">" -f2 | cut -d "<" -f1
Output:
AdminServer
M1Server
Based on your sample Input_file, could you please try the following.
awk -F"[><]" '/<\/server>/{a="";next} /<server>/{a=1;next} a && /<name>/{print $3}' Input_file
sed -n '/<server>/{n;s/\s*<[^>]*>//gp}'
For example, for the first match:
1. /<server>/ matches the line that contains "<server>".
2. The "n" command reads the next line; after it executes, the pattern space holds "<name>AdminServer</name>".
3. s/\s*<[^>]*>//gp replaces every match of "\s*<[^>]*>" with nothing, then prints the pattern space.
Type "info sed" for more on sed commands.
You can get the desired output with just sed, but note that s:.*<name>\(.*\)</name>.*:\1:p on its own would also print the machine names, so restrict the substitution to the <server> blocks:
sed -n '/<server>/,/<\/server>/ s:.*<name>\(.*\)</name>.*:\1:p' config.xml
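With the sample config.xml from the question, that prints:
AdminServer
M1Server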
I feel dirty parsing XML in awk.
The following finds the correct depth of entry with the right tag name. It does not verify the path, though it depends on the elements you specified. While this works on your example data, it makes certain ugly assumptions and it's not guaranteed to work elsewhere:
awk -F'[<>]' '$2~/^(domain|server|name)$/{n++} n==3&&$2=="name"{print $3} /<\/(domain|server|name)>/{n--}' input.xml
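On the sample config.xml this prints the two server names and skips the machine names, because the machine blocks never reach depth 3 with a name tag:
AdminServer
M1Server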
A better approach is to track the element path itself as we parse:
$ awk -F'[<>]' -v check="domain.server.name" '$2~/^[a-z]/ { path=path "." $2 } substr(path,2)==check { print path " = " $3 } /<\/[a-z]+>/ { sub(/\.[^.]*$/,"",path) }' input.xml
.domain.server.name = AdminServer
.domain.server.name = M1Server
Here it is split out for easier commenting.
$ awk -F'[<>]' -v check="domain.server.name" '
# Split fields around pointy brackets. Supply a path to check.
$2 ~ /^[a-z]/ {                # If we see an open tag,
    path = path "." $2         # append the current tag to our path.
}
substr(path,2) == check {      # If we match the given path,
    print path " = " $3        # print the result.
}
/<\/[a-z]+>/ {                 # If the line contains a close tag,
    sub(/\.[^.]*$/, "", path)  # truncate the last element from the path.
}
' input.xml
Note that this solution barfs horribly if you feed it badly formatted XML. The recognition of tags could be improved, but may be sufficient if you have consistently formatted XML. It may barf horribly for other reasons too. Do not do this. Install the correct tools to parse XML properly.
I need to remove all the blank lines from an input file and write the result to an output file. Here is my data:
11216,33,1032747,64310,1,0,0,1.878,0,0,0,1,1,1.087,5,1,1,18-JAN-13,000603221321
11216,33,1033196,31300,1,0,0,1.5391,0,0,0,1,1,1.054,5,1,1,18-JAN-13,059762153003
11216,33,1033246,31300,1,0,0,1.5391,0,0,0,1,1,1.054,5,1,1,18-JAN-13,000603211032
11216,33,1033280,31118,1,0,0,1.5513,0,0,0,1,1,1.115,5,1,1,18-JAN-13,055111034001
11216,33,1033287,31118,1,0,0,1.5513,0,0,0,1,1,1.115,5,1,1,18-JAN-13,000378689701
11216,33,1033358,31118,1,0,0,1.5513,0,0,0,1,1,1.115,5,1,1,18-JAN-13,000093737301
11216,33,1035476,37340,1,0,0,1.7046,0,0,0,1,1,1.123,5,1,1,18-JAN-13,045802041926
11216,33,1035476,37340,1,0,0,1.7046,0,0,0,1,1,1.123,5,1,1,18-JAN-13,045802041954
11216,33,1035476,37340,1,0,0,1.7046,0,0,0,1,1,1.123,5,1,1,18-JAN-13,045802049326
11216,33,1035476,37340,1,0,0,1.7046,0,0,0,1,1,1.123,5,1,1,18-JAN-13,045802049383
11216,33,1036985,15151,1,0,0,1.4436,0,0,0,1,1,1.065,5,1,1,18-JAN-13,000093415580
11216,33,1037003,15151,1,0,0,1.4436,0,0,0,1,1,1.065,5,1,1,18-JAN-13,000781202001
11216,33,1037003,15151,1,0,0,1.4436,0,0,0,1,1,1.065,5,1,1,18-JAN-13,000781261305
11216,33,1037003,15151,1,0,0,1.4436,0,0,0,1,1,1.065,5,1,1,18-JAN-13,000781603955
11216,33,1037003,15151,1,0,0,1.4436,0,0,0,1,1,1.065,5,1,1,18-JAN-13,000781615746
sed -i '/^$/d' foo
This tells sed to delete every line matching the regex ^$, i.e. every empty line. The -i flag edits the file in place; if your sed doesn't support that, you can write the output to a temporary file and replace the original:
sed '/^$/d' foo > foo.tmp
mv foo.tmp foo
If you also want to remove lines consisting only of whitespace (not just empty lines) then use:
sed -i '/^[[:space:]]*$/d' foo
Edit: also remove whitespace at the end of lines, because apparently you've decided you need that too:
sed -i '/^[[:space:]]*$/d;s/[[:space:]]*$//' foo
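As a quick check (demo.txt is just a made-up sample with an empty line, a spaces-only line, and trailing blanks; without -i the result goes to stdout):
$ printf 'one\n\n   \ntwo  \n' > demo.txt
$ sed '/^[[:space:]]*$/d; s/[[:space:]]*$//' demo.txt
one
two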
awk 'NF' filename
awk 'NF > 0' filename
sed -i '/^$/d' filename
awk '!/^$/' filename
awk '/./' filename
The NF form also removes lines containing only blanks or tabs; the regex /^$/ does not.
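A quick way to see the difference, using a stream that contains an empty line and a line holding only a space and a tab:
$ printf 'a\n\n \t\nb\n' | awk 'NF'
a
b
Piping the same input through awk '!/^$/' keeps the space-and-tab line, because that line is not empty.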
Use grep with -v to drop any line that has nothing between the start anchor (^) and the end anchor ($):
grep -v '^$' infile.txt > outfile.txt
If you want to remove lines with only whitespace, you can still use grep. I am using Perl regular expressions in this example, but there are other ways:
grep -P -v '^\s*$' infile.txt > outfile.txt
or, without Perl regular expressions:
grep -v '^[[:space:]]*$' infile.txt > outfile.txt
sed -e '/^ *$/d' input > output
Deletes all lines which consist only of blanks (or are completely empty). You can change the blank to [ \t] where the \t is a representation for tab. Whether your shell or your sed will do the expansion varies, but you can probably type the tab character directly. And if you're using GNU or BSD sed, you can do the edit in place, if that's what you want, with the -i option.
If I execute the above command, I still have blank lines in my output file. What could be the reason?
There could be several reasons. It might be that you don't have blank lines but you have lots of spaces at the end of a line so it looks like you have blank lines when you cat the file to the screen. If that's the problem, then:
sed -e 's/ *$//' -e '/^ *$/d' input > output
The new regex removes repeated blanks at the end of the line; see previous discussion for blanks or tabs.
Another possibility is that your data file came from Windows and has CRLF line endings. Unix sees the carriage return at the end of the line; it isn't a blank, so the line is not removed. There are multiple ways to deal with that. A reliable one is to use tr to delete (-d) character code octal 15, aka Control-M or \r or carriage return:
tr -d '\015' < input | sed -e 's/ *$//' -e '/^ *$/d' > output
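As a quick way to see the effect, build a small CRLF test file (crlf.txt is just a made-up name) and run the pipeline against it:
$ printf 'one\r\n\r\ntwo  \r\n' > crlf.txt
$ tr -d '\015' < crlf.txt | sed -e 's/ *$//' -e '/^ *$/d'
one
two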
If neither of those works, then you need to show a hex dump or octal dump (od -c) of the first two lines of the file, so we can see what we're up against:
head -n 2 input | od -c
Judging from the comments that sed -i does not work for you, you are not working on Linux or Mac OS X or BSD — which platform are you working on? (AIX, Solaris, HP-UX spring to mind as relatively plausible possibilities, but there are plenty of other less plausible ones too.)
You can try the POSIX named character classes such as sed -e '/^[[:space:]]*$/d'; it will probably work, but is not guaranteed. You can try it with:
echo "Hello World" | sed 's/[[:space:]][[:space:]]*/ /'
If it works, there'll be three spaces between the 'Hello' and the 'World'. If not, you'll probably get an error from sed. That might save you grief over getting tabs typed on the command line.
grep . file
grep looks at your file line by line; the dot . matches any single character except a newline. The output from grep is therefore every line that contains at least one character, so empty lines are dropped (lines of only whitespace are kept, since spaces and tabs match the dot).
With awk:
awk 'NF > 0' filename
To be thorough and remove lines even if they include spaces or tabs, something like this in Perl will do it:
cat file.txt | perl -lane "print if /\S/"
Of course there are the awk and sed equivalents. Best not to assume the lines are totally blank as ^$ would do.
You can use sed's -i option to edit in place without using a temporary file:
sed -i '/^$/d' file