The 'N' command in sed behaves differently in Cygwin's sed (GNU sed, I think) and AIX's sed.
$cat > input
Apple
$cat input
Apple
$sed 'N' input
$cat > input
Apple
Orange
$sed 'N' input
Apple
Orange
$
As seen above, the first sed 'N' input command printed nothing under AIX's sed, since there was no next input line. Cygwin's sed, however, printed Apple for the same command.
Can some Unix/sed guru shed some light on this? Thanks in advance.
FWIW, I just found this behavior has been documented here:
http://sed.sourceforge.net/sedfaq6.html#s6.7.5
It looks like AIX is behaving correctly, as per the POSIX standard (my italics):
[2addr]N
Append the next line of input, less its terminating <newline>, to the pattern space, using an embedded <newline> to separate the appended material from the original material. Note that the current line number changes.
If no next line of input is available, the N command verb shall branch to the end of the script and quit without starting a new cycle or copying the pattern space to standard output.
This is from http://pubs.opengroup.org/onlinepubs/009695399/utilities/sed.html.
So, you've probably found a bug (or at least a non-conformance to POSIX) in GNU sed.
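A common way to sidestep the difference is to guard the command as $!N, so that N is never attempted on the last line. The following is a minimal sketch of that idiom; the Pear line is made up for illustration, and the behaviour described in the comments is what POSIX specifies for a script that simply reaches its end.

```sh
# With '$!N', N runs on every line except the last, so the last line
# falls through to the normal end-of-script autoprint.
single=$(printf 'Apple\n' | sed '$!N')
printf '%s\n' "$single"

# Join lines in pairs; an odd trailing line is still printed.
pairs=$(printf 'Apple\nOrange\nPear\n' | sed '$!N;s/\n/ /')
printf '%s\n' "$pairs"
```

Because the guarded form never calls N without a next line, the question of what N does at end of input does not arise.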
Related
I am new to Unix and currently I have a large file of various data. In this file there are lines that are now redundant and will need to be removed.
The lines to remove match this pattern:
<contact contact_id="<number>" txn="D">
</contact>
Edit: There are also similar lines to the ones to be removed, an example is:
<contact contact_id="<number>" txn="N">
</contact>
I tried grep -A 1 to pick up the pattern together with the next line; however, I am on an old version of Solaris, where -A is an illegal option.
I also tried sed -e '12442,+1d', but this just gives the output:
sed: command garbled: 12442,+1d
Please can you help me with a new solution.
use awk?
something like
/<contact contact_id=.* txn="D">/ { got_contact = 1; next }
got_contact == 1 { got_contact = 0; next }
{ print }
even the ancient awk should be able to handle that. (There might be a more compact solution, but this isn't code golf)
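To see the awk script at work, here is a small sketch run against made-up sample data (the contact_id values 101 and 102 are hypothetical). It also shows a sed variant that is not part of the answer above: N appends the following line to the pattern space and d deletes both. It is written one command per line, which older sed implementations tend to accept more readily.

```sh
# Hypothetical sample: one deleted contact (txn="D") and one to keep (txn="N").
sample='<contact contact_id="101" txn="D">
</contact>
<contact contact_id="102" txn="N">
</contact>'

# The awk script from above: skip the matching line and the line after it.
awk_out=$(printf '%s\n' "$sample" | awk '
/<contact contact_id=.* txn="D">/ { got_contact = 1; next }
got_contact == 1 { got_contact = 0; next }
{ print }')

# A sed alternative (an assumption, not from the answer): N pulls in the
# following line, then d deletes both lines together.
sed_out=$(printf '%s\n' "$sample" | sed '/<contact contact_id=.* txn="D">/{
N
d
}')

printf '%s\n' "$awk_out"
```

Both variants should leave only the txn="N" element and its closing tag.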
Can you use GNU sed ?
For those who want to write portable sed scripts, be aware that some implementations have been known to limit line lengths (for the pattern and hold spaces) to be no more than 4000 bytes. The POSIX standard specifies that conforming sed implementations shall support at least 8192 byte line lengths. GNU sed has no built-in limit on line length; as long as it can malloc() more (virtual) memory, you can feed or construct lines as long as you like.
The following solution starts by converting the file to one long line:
tr '\n' '\r' < your_file |
sed 's#<contact contact_id=[^ ]* txn="D">\r</contact>\r##g;
s#\r#\n#g'
I have come across the following usage of the Unix sed command and am not able to understand what it does. Could you please help me understand it? If possible, please share a reference for understanding such uses of sed.
sed -i '/^export JAVA_HOME/ s:.*:export JAVA_HOME=/usr/java/default\nexport HADOOP_PREFIX=/usr/local/hadoop\nexport HADOOP_HOME=/usr/local/hadoop\n:' $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
The command is simple, though it assumes GNU sed because of the way it uses the -i option; for macOS Sierra and related systems, you'd need to use -i '' in place of just -i.
Overall, it corresponds to:
sed -i '/Pattern/ s:.*:Replacement:' file
where:
-i means overwrite each input file with its edited output without creating a backup copy.
/Pattern/ is ^export JAVA_HOME; a line starting with the word export and then JAVA_HOME separated by a single space.
s:.*:Replacement: is a substitute command, using : instead of the more conventional / (often s/.*/Replacement/) as the pattern delimiter. This is done because the replacement text contains slashes. The .* matches the whole line. The rest of the material is written in place of the original export JAVA_HOME line. The \n sequence expands to a newline, so it actually produces a number of lines in the output.
file is $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
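The effect is easy to check without touching a real file. The sketch below uses a made-up three-line stand-in for hadoop-env.sh and a shortened replacement, and drops -i so the result goes to standard output. It assumes GNU sed, since the expansion of \n in the replacement text is not portable to every sed.

```sh
# Hypothetical three-line stand-in for hadoop-env.sh.
env_in='# settings
export JAVA_HOME=/old/jdk
export OTHER=1'

# Same command shape, shortened replacement, no -i: output goes to stdout.
# Assumes GNU sed, which turns \n in the replacement into a newline.
env_out=$(printf '%s\n' "$env_in" |
  sed '/^export JAVA_HOME/ s:.*:export JAVA_HOME=/usr/java/default\nexport HADOOP_HOME=/usr/local/hadoop:')
printf '%s\n' "$env_out"
```

Only the line starting with export JAVA_HOME is rewritten; the comment line and the OTHER line pass through unchanged.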
As others have pointed out, this is a sed command invocation. The name is short for "Stream EDitor", and the command is quite useful for modifying files programmatically. Your best bet is to read the man pages (man sed), but I've broken down your particular command here for instructive purposes:
sed # The command
-i # Edit file in place (no backup)
'/^export JAVA_HOME/ # For every line that begins with 'export JAVA_HOME'...
s: # substitute...
.*: # the entire line with...
export JAVA_HOME=/usr/java/default
export HADOOP_PREFIX=/usr/local/hadoop
export HADOOP_HOME=/usr/local/hadoop
:' # End of command
$HADOOP_PREFIX/etc/hadoop/hadoop-env.sh # Run on the following file
Points of interest:
Commands can be limited to a particular address range or scope. Here, the scope was a search.
The substitute command can be delimited by almost any character (usually it is /, but in this case : was chosen to avoid escaping the / in the file paths).
The sed expression was enclosed in ' to prevent shell expansion of variables. Although no expansions would have taken place in this scenario, it is fairly common to see the expression wrapped in ' to eliminate the possibility.
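The point about delimiters can be illustrated with a short sketch. The path and the /opt replacement are made up for the example; both commands perform the same substitution and differ only in the delimiter.

```sh
path=/usr/local/hadoop/etc

# With / as the delimiter, every slash in the pattern and the replacement
# needs a backslash.
escaped=$(printf '%s\n' "$path" | sed 's/\/usr\/local/\/opt/')

# With : as the delimiter, the same substitution needs no escaping.
plain=$(printf '%s\n' "$path" | sed 's:/usr/local:/opt:')

printf '%s\n%s\n' "$escaped" "$plain"
```

The two results are identical; the second form is simply easier to read when the text contains file paths.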
I am trying to replace the 300th character and add a plus sign and a decimal point at the corresponding positions. The sed command works for every character except B, F and {.
Please find the input data below:
result_PHDPTRAR2.txt
H009704COV2009084 PHD0000001H009700204COV2009084 PROD2015122016010418371304COVH009704COV2009084 PTR0000001H0097002C00000000140000000043610000003408092A0000000068061C0000000000000{0000002939340H0000000537585H0000003476926F0000001218378G0000000040292E0000000016497{0000000000827E0000001880498A9000000320436J000000004391000000001606000000000030000000000128000000000006000000004227000000000000000000000000 00000140 0000000000000{0000000000773B0000000000000{000000000000
Here the 300th character is A. The following sed command handles this requirement correctly:
sed -e 's/\(.\{1,255\}\)\(.\{1,34\}\)\(.\{1,9\}\)\(.*\)A/\1\2+\3.\4^/' result_PHDPTRAR2.txt
It replaces A with ^ and gives the following result:
H009704COV2009084 PHD0000001H009700204COV2009084 PROD2015122016010418371304COVH009704COV2009084 PTR0000001H0097002C00000000140000000043610000003408092A0000000068061C0000000000000{0000002939340H0000000537585H0000003476926F0000001218378G0000000040292E0000000016497{0000000000827E000+000188049.8^9000000320436J000000004391000000001606000000000030000000000128000000000006000000004227000000000000000000000000 00000140 0000000000000{0000000000773B0000000000000{000000000000
But the same command does not work if the 300th character is B, F or {.
If I change the 300th character of the input (result_PHDPTRAR2.txt) to B and then run sed:
sed -e 's/\(.\{1,255\}\)\(.\{1,34\}\)\(.\{1,9\}\)\(.*\)B/\1\2+\3.\4^/' result_PHDPTRAR2.txt
I get the following result:
H009704COV2009084 PHD0000001H009700204COV2009084 PROD2015122016010418371304COVH009704COV2009084 PTR0000001H0097002C00000000140000000043610000003408092A0000000068061C0000000000000{0000002939340H0000000537585H0000003476926F0000001218378G0000000040292E0000000016497{0000000000827E000+000188049.8B9000000320436J000000004391000000001606000000000030000000000128000000000006000000004227000000000000000000000000 00000140 0000000000000{0000000000773^0000000000000{000000000000
The + and the decimal point are added correctly in "+000188049.8B", but the B itself is unchanged. It should have been replaced with ^.
Can anyone please help me?
The problem is that the first 'B' character in the input comes later than the 4..300 character. I.e. the input text doesn't match your expectations.
So, what now?
Update
Based on the comment, the problem is that there's more than 1 B in the text after the 300th character. The .* will go to that point. This is how to fix it:
sed -e 's/\(.\{1,255\}\)\(.\{1,34\}\)\(.\{1,9\}\)\([^B]*\)B/\1\2+\3.\4^/'
Watch out for the negated character class: \([^B]*\)B will match only up to the first B. Unfortunately, sed doesn't have non-greedy quantifiers, which would make this even easier: \(.*?\)B.
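The difference between the greedy .* and the negated character class shows up on a much shorter string. This is a minimal sketch with a made-up input that contains two B characters:

```sh
line='12A34B56B'   # made-up input with two B characters

# Greedy .* runs to the last B, so only that one is replaced.
greedy=$(printf '%s\n' "$line" | sed 's/\(.*\)B/\1^/')

# [^B]* cannot cross a B, so the match stops at the first one.
first=$(printf '%s\n' "$line" | sed 's/\([^B]*\)B/\1^/')

printf '%s\n%s\n' "$greedy" "$first"
```

The first command leaves the earlier B untouched, which is the symptom described in the question; the second replaces the first B it meets.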
Consider the input:
=sec1=
some-line
some-other-line
foo
bar=baz
=sec2=
c=baz
If I wish to process only =sec1= I can for example comment out the section by:
sed -e '/=sec1=/,/=[a-z]*=/s:^:#:' < input
... well, almost.
This will comment the lines including "=sec1=" and "=sec2=" lines, and the result will be something like:
#=sec1=
#some-line
#some-other-line
#
#foo
#bar=baz
#
#=sec2=
c=baz
My question is: What is the easiest way to exclude the start and end lines from a /START/,/END/ range in sed?
I know that in many cases refining the "s:::" clause can solve the specific problem, but I am after the generic solution here.
In "Sed - An Introduction and Tutorial" Bruce Barnett writes: "I will show you later how to restrict a command up to, but not including the line containing the specified pattern.", but I was not able to find where he actually shows this.
In the "USEFUL ONE-LINE SCRIPTS FOR SED" Compiled by Eric Pement, I could find only the inclusive example:
# print section of file between two regular expressions (inclusive)
sed -n '/Iowa/,/Montana/p' # case sensitive
This should do the trick:
sed -e '/=sec1=/,/=sec2=/ { /=sec1=/b; /=sec2=/b; s/^/#/ }' < input
This matches between sec1 and sec2 inclusively and then just skips the first and last line with the b command. This leaves the desired lines between sec1 and sec2 (exclusive), and the s command adds the comment sign.
Unfortunately, you do need to repeat the regexps for matching the delimiters. As far as I know there's no better way to do this. At least you can keep the regexps clean, even though they're used twice.
This is adapted from the SED FAQ: How do I address all the lines between RE1 and RE2, excluding the lines themselves?
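As a quick check, the sketch below runs the same logic on the sample input from the question (without its blank lines). It spells the script with one command per line, on the assumption that some non-GNU seds do not accept a semicolon directly after the b command.

```sh
input='=sec1=
some-line
some-other-line
foo
bar=baz
=sec2=
c=baz'

# Inside the range, b (branch to end of script) skips the two delimiter
# lines; every other line in the range gets a leading #.
commented=$(printf '%s\n' "$input" | sed '/=sec1=/,/=sec2=/{
/=sec1=/b
/=sec2=/b
s/^/#/
}')
printf '%s\n' "$commented"
```

The =sec1= and =sec2= lines come out unchanged, the lines between them are commented, and c=baz is left alone.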
If you're not interested in lines outside of the range, but just want the non-inclusive variant of the Iowa/Montana example from the question (which is what brought me here), you can write the "except for the first and last matching lines" clause easily enough with a second sed:
sed -n '/PATTERN1/,/PATTERN2/p' < input | sed '1d;$d'
Personally, I find this slightly clearer (albeit slower on large files) than the equivalent
sed -n '1,/PATTERN1/d;/PATTERN2/q;p' < input
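The two variants can be compared side by side on a small made-up list (the state names other than Iowa and Montana are just filler):

```sh
states='Idaho
Iowa
Kansas
Maine
Montana
Nevada'

# Two passes: print the inclusive range, then drop its first and last line.
two_pass=$(printf '%s\n' "$states" | sed -n '/Iowa/,/Montana/p' | sed '1d;$d')

# One pass: delete up to and including Iowa, quit at Montana, print the rest.
one_pass=$(printf '%s\n' "$states" | sed -n '1,/Iowa/d;/Montana/q;p')

printf '%s\n' "$two_pass"
```

The results agree here, but the forms are not interchangeable in general: the one-pass version stops at the first PATTERN2, and the 1,/PATTERN1/ address does not end on line 1 itself, so it misbehaves if PATTERN1 occurs on the very first line.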
Another way would be
sed -n '/begin/,/end/ {
/begin/n
/end/ !p
}'
/begin/n -> skip over the line that has the "begin" pattern
/end/ !p -> print all lines that don't have the "end" pattern
Taken from Bruce Barnett's sed tutorial http://www.grymoire.com/Unix/Sed.html#toc-uh-35a
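A small run on made-up input shows what this prints. The sketch assumes sed is invoked with -n, so that only the explicit p command produces output; without -n every line would be printed anyway and the inner lines would appear twice.

```sh
text='x
begin
one
two
end
y'

# n replaces the "begin" line with the next input line (nothing is printed
# because of -n); !p then prints every line in the range except "end".
inner=$(printf '%s\n' "$text" | sed -n '/begin/,/end/{
/begin/n
/end/!p
}')
printf '%s\n' "$inner"
```

One edge case to keep in mind: if the line right after "begin" is already "end", n loads it and nothing is printed for that range.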
I've used:
sed -n '/begin/,/end/{/begin\|end/!p}'
This will search all the lines between the patterns, then print everything not containing the patterns
you could also use awk
awk '/sec1/{f=1;print;next}f && !/sec2/{ $0="#"$0}/sec2/{f=0}1' file
I need to have a script read the files coming in and check information for verification.
On the first line of each file to be read is a date in numeric form, e.g. 20080923.
But other information precedes the date, so I need to read it starting at position 27. That is, I need to take the number at line 1, position 27, and see if it is greater than another number.
I use grep to check other information, but there I search on special characters; in this case the information before the date is always different, so I can't search on a character. It has to be done by line 1, position 27.
sed 1q $file | cut -c27-34
The sed command reads the first line of the file and the cut command chops out characters 27-34 of the one line, which is where you said the date is.
Added later:
For the more general case - where you need to read line 24, for example, instead of the first line, you need a slightly more complex sed command:
sed -n -e 24p -e 24q $file | cut -c27-34
sed -n '24p;24q' $file | cut -c27-34
The -n option means 'do not print lines by default'; the 24p means print line 24; the 24q means quit after processing line 24. You could leave that out, in which case sed would continue processing the input, effectively ignoring it.
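Here is a minimal sketch of the line-24 case on generated data. The record layout (26 characters of padding followed by an 8-digit date) is an assumption made up for the example, and the awk one-liner at the end is an alternative that is not part of the answer.

```sh
# Build 30 hypothetical records: 26 characters of padding, then an 8-digit date.
records=$(awk 'BEGIN { for (i = 1; i <= 30; i++) printf "%-26s%d\n", "record " i, 20080900 + i }')

# Print line 24 only, quit right after it, then cut out columns 27-34.
line24_date=$(printf '%s\n' "$records" | sed -n '24p;24q' | cut -c27-34)

# Alternative sketch: let awk pick the line and the substring in one step.
awk_date=$(printf '%s\n' "$records" | awk 'NR == 24 { print substr($0, 27, 8); exit }')

printf '%s\n' "$line24_date"
```

Both commands stop reading after line 24, which matters mainly for large inputs.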
Finally, especially if you are going to validate the date, you might want to use Perl for the whole job (or Python, or Ruby, or Tcl, or any scripting language of your choice).
You can extract the characters starting at position 27 of line 1 like so:
datestring=`head -1 $file | cut -c27-`
You'd perform your next processing step on $datestring.
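The comparison step from the question could then look like the sketch below. The sample header line and the threshold value are made up; note that it cuts columns 27-34 rather than 27 to the end of the line, so that anything following the date is not included in the number.

```sh
# Hypothetical first line: 26 characters of varying data, then the date.
first_line=$(printf '%-26s%s\n' 'HDR batch=7 src=abc' '20080923 more data')
datestring=$(printf '%s\n' "$first_line" | head -n 1 | cut -c27-34)

threshold=20080101   # made-up cutoff to compare against

# Dates in YYYYMMDD form compare correctly as plain integers.
if [ "$datestring" -gt "$threshold" ]; then
  verdict=newer
else
  verdict=not-newer
fi
printf '%s %s\n' "$datestring" "$verdict"
```

If the extracted field might not be numeric, it is worth validating it first, since the -gt test reports an error on non-integer input.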