sed with line numbers as variables - unix

I want to get specific lines using sed, where the first line and the last line to get are stored in variables.
Here is an example, I want to get all the lines between the first line (here line number 5) and the last line (here number 8), and then I use grep to search a specific word.
firstLine=5
lastLine=8
sedResult="$(sed -n "$firstLine,$lastLine p" text.txt | grep word -aIi)"
But I am having errors. The errors look like this :
sed: -e expression #1, char 5: unexpected `,'
what is the proper way to use variables as line numbers ?

Your initial expression has broken double quotes usage. It should be:
firstLine=5
lastLine=8
sedResult=$(sed -n "$firstLine,$lastLine p" text.txt)

Related

Insert a new line at nth character after nth occurence of a pattern via a shell script

I have a single line big string which has '~|~' as delimiter. 10 fields make up a row and the 10th field is 9 characters long. I want insert a new line after each row, meaning insert a \n at 10 character after (9,18,27 ..)th occurrence of '~|~'
Is there any quick single line sed/awk option available without looping through the string?
I have used
sed -e's/\(\([^~|~]*~|~\)\{9\}[^~|~]*\)~|~/\1\n/g'
but it will replace every 10th occurrence with a new line. I want to keep the delimiter but add a new line after 9 characters in field 10
cat test.txt
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
sed -e's/\(\([^~|~]*~|~\)\{9\}[^~|~]*\)~|~/\1\n/g' test.txt
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one
2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two
3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
Below is what I want
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten123456
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
Let's try awk:
awk 'BEGIN{FS="[~|~]+"; OFS="~|~"}
{for(i=10; i<NF; i+=9){
str=$i
$i=substr(str, 1, 9)"\n"substr(str, 10, length(str))
}
print $0}' t.txt
Input:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one~|~2‌​two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~‌​3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
The output:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2‌​two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~‌​3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten123456
I assume there some error in your comment: If your input contains ten1234562one and 2ten1234563one, then the line break has to be inserted after 2 in the first case and after 6 in the second case (as this is the tenth character). But your expected output is different to this.
Your sed script wasn't too far off. This seems to do the job you want:
sed -e '/^$/d' \
-e 's/\([^~|]*~|~\)\{9\}.\{9\}/&\' \
-e '/' \
-e 'P;D' \
data
For your input file (I called it data), I get:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345
6
The script requires a little explanation, I fear. It uses some obscure shell and some obscure sed behaviour. The obscure shell behaviour is that within a single-quoted string, backslashes have no special meaning, so the backslash before the second single quote in the second -e appears to sed as a backslash at the end of the argument. The obscure sed behaviour is that it treats the argument for each -e option as if it is a line. So, the trailing backslash plus the / after the third -e is treated as if there was a backslash, newline, slash sequence, which is how BSD sed (and POSIX sed) requires you to add a newline. GNU sed treats \n in the replacement as a newline, but POSIX (and BSD) says:
The escape sequence '\n' shall match a <newline> embedded in the pattern space.
It doesn't say anything about \n being treated as a <newline> in the replacement part of a s/// substitution. So, the first two -e options combine to add a newline after what is matched. What's matched? Well, that's a sequence of 'zero or more non-tilde, non-pipe characters followed by ~|~', repeated 9 times, followed by 9 'any characters'. This is an approximation to what you want. If you had a field such as ~|~tilde~pipe|bother~|~, the regex would fail because of the ~ between 'tilde' and 'pipe' and also because of the | between 'pipe' and 'bother'. Fixing it to handle all possible sequences like that is non-trivial, and not warranted by the sample data.
The remainder of the script is straight-forward: the -e '/^$/d' deletes an empty line, which matters if the data is exactly the right length, and in -e 'P;D' the P prints the initial segment of the pattern space up to the first newline (the one we just added); the D deletes the initial segment of the pattern space up to the first newline and starts over.
I'm not convinced this is worth the complexity. It might be simpler to understand if the script was in a file, script.sed:
/^$/d
s/\([^~|]*~|~\)\{9\}.\{9\}/&\
/
P
D
and the command line was:
$ sed -f script.sed data
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345
6
$
Needless to say, it produces the same output. Without the /^$/d, the script only works because of the odd 6 at the end of the input. With exactly 9 characters after the third record, it then flops into in infinite loop.
Using extended regular expressions
If you use extended regular expressions, you can deal with odd-ball fields that contain ~ or | (or, indeed, ~|) in the middle.
script2.sed:
/^$/d
s/(([^~|]{1,}|~[^|]|~\|[^~])*~\|~){9}.{9}/&\
/
P
D
data2:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten1234562one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten1234563one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345666=beast~tilde|pipe~|twiddle~|~4-two~|~4-three~|~4-four~|~4-five~|~4-six~|~4-seven~|~4-eighty-eight~|~4-999~|~987654321
Output from sed -E -f script.sed data2:
one~|~two~|~three~|~four~|~five~|~six~|~seven~|~eight~|~nine~|~ten123456
2one~|~2two~|~2three~|~2four~|~2five~|~2six~|~2seven~|~2eight~|~2nine~|~2ten12345
63one~|~3two~|~3three~|~3four~|~3five~|~3six~|~3seven~|~3eight~|~3nine~|~3ten12345
666=beast~tilde|pipe~|twiddle~|~4-two~|~4-three~|~4-four~|~4-five~|~4-six~|~4-seven~|~4-eighty-eight~|~4-999~|~987654321
That still won't handle a field like tilde~~|~. Using -E is correct for BSD (Mac OS X) sed; it enables extended regular expressions. The equivalent option for GNU sed is -r.

Sed line match plus line below

I can find my lines with this pattern, but in some case the info is on the line after the match. How can I also get the line following my match line?
sed -n '/SQL3227W Record token/p' /log/PLAN_2015-08-16*.MSG >ERRORS.txt
Firstly, this looks like a job for grep:
grep -A 1 'SQL3227W Record token' /log/PLAN_2015-08-16*.MSG >ERRORS.txt
(-A 1 means to print an additional 1 line After the match).
Secondly, if you're using GNU sed, you can use a second address of +1 thus:
sed -n '/SQL3227W Record token/,+1p' /log/PLAN_2015-08-16*.MSG >ERRORS.txt
Otherwise, (if you really must use non-Gnu sed), then each time you match, append the following line to your pattern space. Delete the first line, before continuing loop (in case the second line is also a match).
Untested code:
#!/bin/sed -nf
/SQL3227W Record token/{
N
P
D
}
sed is for simple substitutions on individual lines, that is all. For anything even slightly more interesting just use awk:
awk '/SQL3227W Record token/{c=2} c&&c--' file
See Printing with sed or awk a line following a matching pattern for other related idioms.

How can I merge two specific lines with sed?

I have a file that looks like this
line one
line two
line three
line four
and I want it to look like this (I want lines 2 and 3 merged into one)
line one
line two line three
line four
I've tried the following, that based on my research SHOULD do what I want:
$ sed '2s/\n/ /' test.txt > test2.txt
However test2.txt looks like this
line one
line two
line three
line four
I've seen some references to it being different on Solaris than Linux. Here's my server's details
$ uname -a
SunOS myserver 5.10 Generic_142909-17 sun4u sparc SUNW,Sun-Fire-V490
How can I make this give the results I want?
Based on the answers in the linked question, this sed should work:
sed '2N;s/\n/ /' file
2N means on the second line, append the next line to the pattern space. This results in the pattern space containing line two\nline three. The substitution (which replaces the newline \n with a space) applies to every line in the file but only has an effect here.
Otherwise, this awk would do the trick:
awk 'NR==2{printf("%s ",$0);next}1' file
On line two, use printf to print the contents of the line, followed by a space. next skips to the next line. For all other lines, the 1 at the end is effectively a {print $0} block (which prints the line).
Alternatively:
awk '{printf("%s%s",$0,NR==2?OFS:ORS)}' file
Where OFS by default is a space and ORS is a newline
Or a bit of perl:
perl -pe 's/\n/ / if $. == 2' file
$. is the line number. Substitute the newline for a space on the 2nd line.
Output (using any of the above approaches on my system):
line one
line two line three
line four

Combining -v flag and -A flag in grep

I need to search a file for a string, remove any line that contains the string, and also remove the two lines following any line that contains the string. I was hoping I could accomplish this using something like this...
$ grep -v -A 2 two temp.txt
one
five
$
...but unfortunately this did not work. Is there a simple I can do this with grep or another shell command?
The following works both with GNU sed and with OS X.
$ sed '/two/{N;N;d;}' temp.txt
one
five
find line matching two
read in two more lines
delete them
You can do this with awk, as per the following transcript:
pax> echo 'one
two
three
four
five' | awk '/two/ {skip=3} skip>0 {skip--;next} {print}'
one
five
It basically starts a counter of lines to throw away (3) whenever it finds the two string on a line. It then throws those lines away until the skip counter reaches zero. Any line that isn't marked for skipping is printed.
With GNU sed:
sed '/two/,+2d' temp.txt
This uses two-address syntax (addr1,addr2) to match lines with the word two (/two/) plus the two lines after (+2). The d command deletes those lines.
Here's a way to do it with Perl:
$ perl -ne'if (/two/){$x=<>;$x=<>;}else{print}' temp.txt
one
five
The -n is an implicit loop over the input. If you match /two/, then read the next two lines, otherwise print the line you're on.
The problem is, however, that if you had the third or fourth lines matched /two/, then you would still get the same output. #paxdiablo's solution is more complete. But mine's more Q&D.

How to delete duplicate lines in a file without sorting it in Unix

Is there a way to delete duplicate lines in a file in Unix?
I can do it with sort -u and uniq commands, but I want to use sed or awk.
Is that possible?
awk '!seen[$0]++' file.txt
seen is an associative array that AWK will pass every line of the file to. If a line isn't in the array then seen[$0] will evaluate to false. The ! is the logical NOT operator and will invert the false to true. AWK will print the lines where the expression evaluates to true.
The ++ increments seen so that seen[$0] == 1 after the first time a line is found and then seen[$0] == 2, and so on.
AWK evaluates everything but 0 and "" (empty string) to true. If a duplicate line is placed in seen then !seen[$0] will evaluate to false and the line will not be written to the output.
From http://sed.sourceforge.net/sed1line.txt:
(Please don't ask me how this works ;-) )
# delete duplicate, consecutive lines from a file (emulates "uniq").
# First line in a set of duplicate lines is kept, rest are deleted.
sed '$!N; /^\(.*\)\n\1$/!P; D'
# delete duplicate, nonconsecutive lines from a file. Beware not to
# overflow the buffer size of the hold space, or else use GNU sed.
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
Perl one-liner similar to jonas's AWK solution:
perl -ne 'print if ! $x{$_}++' file
This variation removes trailing white space before comparing:
perl -lne 's/\s*$//; print if ! $x{$_}++' file
This variation edits the file in-place:
perl -i -ne 'print if ! $x{$_}++' file
This variation edits the file in-place, and makes a backup file.bak:
perl -i.bak -ne 'print if ! $x{$_}++' file
An alternative way using Vim (Vi compatible):
Delete duplicate, consecutive lines from a file:
vim -esu NONE +'g/\v^(.*)\n\1$/d' +wq
Delete duplicate, nonconsecutive and nonempty lines from a file:
vim -esu NONE +'g/\v^(.+)$\_.{-}^\1$/d' +wq
The one-liner that Andre Miller posted works except for recent versions of sed when the input file ends with a blank line and no characterss. On my Mac my CPU just spins.
This is an infinite loop if the last line is blank and doesn't have any characterss:
sed '$!N; /^\(.*\)\n\1$/!P; D'
It doesn't hang, but you lose the last line:
sed '$d;N; /^\(.*\)\n\1$/!P; D'
The explanation is at the very end of the sed FAQ:
The GNU sed maintainer felt that despite the portability problems
this would cause, changing the N command to print (rather than
delete) the pattern space was more consistent with one's intuitions
about how a command to "append the Next line" ought to behave.
Another fact favoring the change was that "{N;command;}" will
delete the last line if the file has an odd number of lines, but
print the last line if the file has an even number of lines.
To convert scripts which used the former behavior of N (deleting
the pattern space upon reaching the EOF) to scripts compatible with
all versions of sed, change a lone "N;" to "$d;N;".
The first solution is also from http://sed.sourceforge.net/sed1line.txt
$ echo -e '1\n2\n2\n3\n3\n3\n4\n4\n4\n4\n5' |sed -nr '$!N;/^(.*)\n\1$/!P;D'
1
2
3
4
5
The core idea is:
Print only once of each duplicate consecutive lines at its last appearance and use the D command to implement the loop.
Explanation:
$!N;: if the current line is not the last line, use the N command to read the next line into the pattern space.
/^(.*)\n\1$/!P: if the contents of the current pattern space is two duplicate strings separated by \n, which means the next line is the same with current line, we can not print it according to our core idea; otherwise, which means the current line is the last appearance of all of its duplicate consecutive lines. We can now use the P command to print the characters in the current pattern space until \n (\n also printed).
D: we use the D command to delete the characters in the current pattern space until \n (\n also deleted), and then the content of pattern space is the next line.
and the D command will force sed to jump to its first command $!N, but not read the next line from a file or standard input stream.
The second solution is easy to understand (from myself):
$ echo -e '1\n2\n2\n3\n3\n3\n4\n4\n4\n4\n5' |sed -nr 'p;:loop;$!N;s/^(.*)\n\1$/\1/;tloop;D'
1
2
3
4
5
The core idea is:
print only once of each duplicate consecutive lines at its first appearance and use the : command and t command to implement LOOP.
Explanation:
read a new line from the input stream or file and print it once.
use the :loop command to set a label named loop.
use N to read the next line into the pattern space.
use s/^(.*)\n\1$/\1/ to delete the current line if the next line is the same with the current line. We use the s command to do the delete action.
if the s command is executed successfully, then use the tloop command to force sed to jump to the label named loop, which will do the same loop to the next lines until there are no duplicate consecutive lines of the line which is latest printed; otherwise, use the D command to delete the line which is the same with the latest-printed line, and force sed to jump to the first command, which is the p command. The content of the current pattern space is the next new line.
uniq would be fooled by trailing spaces and tabs. In order to emulate how a human makes comparison, I am trimming all trailing spaces and tabs before comparison.
I think that the $!N; needs curly braces or else it continues, and that is the cause of the infinite loop.
I have Bash 5.0 and sed 4.7 in Ubuntu 20.10 (Groovy Gorilla). The second one-liner did not work, at the character set match.
The are three variations. The first is to eliminate adjacent repeat lines, the second to eliminate repeat lines wherever they occur, and the third to eliminate all but the last instance of lines in file.
pastebin
# First line in a set of duplicate lines is kept, rest are deleted.
# Emulate human eyes on trailing spaces and tabs by trimming those.
# Use after norepeat() to dedupe blank lines.
dedupe() {
sed -E '
$!{
N;
s/[ \t]+$//;
/^(.*)\n\1$/!P;
D;
}
';
}
# Delete duplicate, nonconsecutive lines from a file. Ignore blank
# lines. Trailing spaces and tabs are trimmed to humanize comparisons
# squeeze blank lines to one
norepeat() {
sed -n -E '
s/[ \t]+$//;
G;
/^(\n){2,}/d;
/^([^\n]+).*\n\1(\n|$)/d;
h;
P;
';
}
lastrepeat() {
sed -n -E '
s/[ \t]+$//;
/^$/{
H;
d;
};
G;
# delete previous repeated line if found
s/^([^\n]+)(.*)(\n\1(\n.*|$))/\1\2\4/;
# after searching for previous repeat, move tested last line to end
s/^([^\n]+)(\n)(.*)/\3\2\1/;
$!{
h;
d;
};
# squeeze blank lines to one
s/(\n){3,}/\n\n/g;
s/^\n//;
p;
';
}
This can be achieved using AWK.
The below line will display unique values:
awk file_name | uniq
You can output these unique values to a new file:
awk file_name | uniq > uniq_file_name
The new file uniq_file_name will contain only unique values, without any duplicates.
Use:
cat filename | sort | uniq -c | awk -F" " '$1<2 {print $2}'
It deletes the duplicate lines using AWK.

Resources