I have a file that looks like this
line one
line two
line three
line four
and I want it to look like this (I want lines 2 and 3 merged into one)
line one
line two line three
line four
I've tried the following, which, based on my research, SHOULD do what I want:
$ sed '2s/\n/ /' test.txt > test2.txt
However test2.txt looks like this
line one
line two
line three
line four
I've seen some references to it being different on Solaris than Linux. Here's my server's details
$ uname -a
SunOS myserver 5.10 Generic_142909-17 sun4u sparc SUNW,Sun-Fire-V490
How can I make this give the results I want?
Based on the answers in the linked question, this sed should work:
sed '2N;s/\n/ /' file
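2N means: on the second line, append the next line to the pattern space. The pattern space then contains line two\nline three. The substitution (which replaces the newline \n with a space) is applied to every line in the file but only has an effect here. The original command does nothing because sed strips the trailing newline from each line before placing it in the pattern space, so on line 2 alone there is no \n to match.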
Otherwise, this awk would do the trick:
awk 'NR==2{printf("%s ",$0);next}1' file
On line two, use printf to print the contents of the line, followed by a space. next skips to the next line. For all other lines, the 1 at the end is effectively a {print $0} block (which prints the line).
Alternatively:
awk '{printf("%s%s",$0,NR==2?OFS:ORS)}' file
Where OFS by default is a space and ORS is a newline
Or a bit of perl:
perl -pe 's/\n/ / if $. == 2' file
$. is the line number. On the 2nd line, replace the newline with a space.
Output (using any of the above approaches on my system):
line one
line two line three
line four
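For reuse, the awk approach can be wrapped in a small shell function that takes the line number as a parameter. This is only a sketch: the name join_with_next is made up for illustration, and it assumes an awk that supports -v (on Solaris 10 that would typically be nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk).

```shell
# join_with_next N [FILE...]
# Hypothetical helper: print the input with line N and line N+1 merged,
# separated by a single space. Reads stdin when no FILE is given.
join_with_next() {
    n=$1
    shift
    awk -v n="$n" 'NR == n { printf "%s ", $0; next } { print }' "$@"
}

# Example: merge lines 2 and 3 of the sample input.
printf 'line one\nline two\nline three\nline four\n' | join_with_next 2
```

Line N is printed without its newline, so the following line lands on the same output line; every other line goes through unchanged.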
This post could count as a duplicate, but I have not found any relevant answer in previous threads. I have a large (6 GB) text file and I wish to remove the 3rd and 4th line in every set of 4 lines. For example, the following
line1
line2
line3
line4
line5
line6
line7
line8
needs to be converted to this
line1
line2
line5
line6
Is there a Vim script or command to remove those lines? It could also be done in multiple passes: one pass to delete every 3rd line (in a set of 4: line1, line2, line3, line4) and another pass to delete every 3rd line again (previously the 4th ones, now in a set of 3: line1, line2, line3).
The command :g/^/+1 d3 is close to what I want, but it also removes the second lines.
If you have GNU sed, you can filter the buffer through this pipeline:
sed -e '0~4d' | sed '0~3d'
The first sed deletes every 4th line, the second deletes every 3rd line.
This has the desired effect.
To pipe the current buffer through this command, enter this in command mode:
%!sed -e '0~4d' | sed '0~3d'
The % selects the range of lines to pass to a command (% means all lines, the entire buffer), and !cmd is the command to pipe through.
To perform this outside of vim, run these two steps:
sed -i -e '0~4d' file
sed -i -e '0~3d' file
This modifies the file in place, in two steps.
Alternatively you can also use Awk.
awk 'NR%4==3||NR%4==0{next;}1' file.txt > output.txt
To do this via Vim:
%!awk 'NR\%4==3||NR\%4==0{next;}1'
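The modulo test generalizes to "keep the first K lines of every group of N". The function below is a sketch under that reading of the question; the name keep_head_of_groups and its argument order are assumptions made for illustration. Since awk reads the input one line at a time, memory use should not depend on the size of the file.

```shell
# keep_head_of_groups KEEP SIZE [FILE...]
# Hypothetical helper: from every group of SIZE consecutive lines,
# print only the first KEEP lines.
keep_head_of_groups() {
    keep=$1
    size=$2
    shift 2
    # (NR - 1) % size is the zero-based position of a line inside its group.
    awk -v keep="$keep" -v size="$size" '(NR - 1) % size < keep' "$@"
}

# Example: keep lines 1-2 of each group of 4.
printf 'line1\nline2\nline3\nline4\nline5\nline6\nline7\nline8\n' |
    keep_head_of_groups 2 4
```

Unlike the Vim substitution, this also handles a final incomplete group, since each line is judged only by its position.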
UPDATE: This is a bad approach for large files; the substitution takes about 3 seconds on a 6 MB file.
This approach works in Vim. A regular expression matches four lines at a time and replaces them with the first two of the four. It handles files of any length, but the last 1–3 lines are left untouched when the total number of lines is not a multiple of 4.
:%s#\(^.*\n^.*\)\n^.*\n^.*\n#\1\r#g
Explanation:
:%s - substitute in the whole file; # is used as the delimiter
\(^.*\n^.*\) - \(\) captures two lines for later use as \1; \n stands for a line break, ^ for the beginning of a line, and .* for any character repeated as many times as possible before the line break
\n - the line break after the second line
^.*\n^.*\n - the next two lines, which are to be deleted
\1\r - replace the four lines with the first two and add a line break (\r)
g - replace every match on a line, not just the first; the whole-file scope comes from the % in :%s
I have a file that, occasionally, has split lines. The split is signaled by the fact that the line starts with a space, empty line or a nonnumeric character. E.g.
40403813|7|Failed|No such file or directory|1
40403816|7|Hi,
The Conversion System could not be reached.|No such file or directory||1
40403818|7|Failed|No such file or directory|1
...
I'd like to join the split line back with the previous line (as shown below):
40403813|7|Failed|No such file or directory|1
40403816|7|Hi, The Conversion System could not be reached.|No such file or directory||1
40403818|7|Failed|No such file or directory|1
...
using a Unix command like sed/awk. I'm not clear on how to join a line with the preceding one.
Any suggestion?
awk to the rescue!
awk -v ORS='' 'NR>1 && /^[0-9]/{print "\n"} NF' file
This prints a newline only before a line that starts with a digit; every other line is appended to the previous one. (You may want to add a space to ORS if the line break did not preserve the space.)
Don't do anything based on the values of the strings in your fields as that could go wrong. You COULD get a wrapping line that starts with a digit, for example. Instead just print after every complete record of 5 fields:
$ awk -F'|' '{rec=rec $0; nf+=NF} nf>=5{print rec; nf=0; rec=""}' file
40403813|7|Failed|No such file or directory|1
40403816|7|Hi, The Conversion System could not be reached.|No such file or directory||1
40403818|7|Failed|No such file or directory|1
Try:
awk 'NF{printf("%s",$0 ~ /^[0-9]/ && NR>1?RS $0:$0)} END{print ""}' Input_file
OR
awk 'NF{printf("%s",/^[0-9]/ && NR>1?RS $0:$0)} END{print ""}' Input_file
This checks whether each non-empty line starts with a digit. If it does and it is not the first line, a newline is printed before it; otherwise the line is printed as is, which appends it to the previous output. The END block prints a final newline, which would otherwise be missing at the end of the output.
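The digit-based approaches above can also be written by buffering the current record and flushing it when the next record starts. This is a sketch, not a drop-in replacement: join_continuations is a hypothetical name, and it assumes (as the question states) that a new record always starts with a digit and that continuation pieces should be joined with a single space.

```shell
# join_continuations [FILE...]
# Hypothetical helper: append every line that does not start with a digit
# to the preceding record, separated by one space. Blank lines are dropped.
join_continuations() {
    awk '
        /^[0-9]/ {                  # a new record starts here
            if (have) print rec     # flush the previous record
            rec = $0
            have = 1
            next
        }
        NF {                        # non-blank continuation line
            sub(/^[ \t]+/, "")      # drop leading whitespace
            rec = (have ? rec " " $0 : $0)
            have = 1
        }
        END { if (have) print rec } # flush the last record
    ' "$@"
}
```

For example, `join_continuations file` on the sample input should print three records, the second one rejoined. As another answer points out, a wrapped piece that happens to start with a digit would defeat this test.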
If you only ever have the line split into two, you can use this sed command:
sed 'N;s/\n\([^[:digit:]]\)/\1/;P;D' infile
This appends the next line to the pattern space, checks if the linebreak is followed by something other than a digit, and if so, removes the linebreak, prints the pattern space up to the first linebreak, then deletes the printed part.
If a single line can be broken across more than two lines, we have to loop over the substitution:
sed ':a;N;s/\n\([^[:digit:]]\)/\1/;ta;P;D' infile
This branches from ta to :a if a substitution took place.
To use with Mac OS sed, the label and branching command must be separate from the rest of the command:
sed -e ':a' -e 'N;s/\n\([^[:digit:]]\)/\1/;ta' -e 'P;D' infile
If the continuation lines always begin with a single space:
perl -0000 -lape 's/\n / /g' input
If the continuation lines can begin with an arbitrary amount of whitespace:
perl -0000 -lape 's/\n(\s+)/$1/g' input
It is probably more idiomatic to write:
perl -0777 -ape 's/\n / /g' input
You can use sed when you have a file without \r :
tr "\n" "\r" < inputfile | sed 's/\r\([^0-9]\)/\1/g' | tr '\r' '\n'
I am searching for a command line that takes a text file and a file with line numbers (one per line, alternatively from stdin) and outputs only those lines from the first file.
The text file may be several hundred MB in size, and the line list may contain several thousand entries (sorted in ascending order).
in short:
one file contains data
another file contains indexes
a command should extract only indexed lines
first file:
many lines
of course they are all very different
and contain very important data
...
more lines
...
even more lines
second file
1
5
7
expected output
many lines
more lines
even more lines
The second (line number) file does not necessarily have to exist. Its data may also come from stdin (indeed, this would be optimal). The format of that data may also differ from what is shown if that makes the task easier.
This can be an approach:
$ awk 'FNR==NR {a[$1]; next} FNR in a' file_with_line_numbers file_with_data
many lines
more lines
even more lines
It reads file_with_line_numbers and stores each line number as a key of the array a[]. Then it reads the other file and checks whether the current line number (FNR) is a key in the array, in which case the line is printed.
The trick used is the following:
awk 'FNR==NR {something; next} {other things}' file1 file2
that performs actions related to file1 in the {something} block and then actions related to file2 in the {other things} block.
What if the line numbers are given through stdin?
For this you can use awk '...' - file, where - stands for standard input. This is called Naming Standard Input. So you can do:
your_commands | awk 'FNR==NR {a[$1]; next} FNR in a' - file_with_data
Test
$ echo "1
5
7" | awk 'FNR==NR {a[$1]; next} FNR in a' - file_with_data
many lines
more lines
even more lines
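Because the question says the line numbers are sorted in ascending order, the same awk idea can stop reading the data file as soon as the last wanted line has been printed, which may matter for files of several hundred MB. The sketch below assumes that ordering; extract_lines is a hypothetical name.

```shell
# extract_lines NUMBERS DATA
# Hypothetical helper: print the lines of DATA whose numbers are listed,
# one per line and in ascending order, in NUMBERS ("-" means stdin).
extract_lines() {
    awk '
        FNR == NR {                 # still reading the list of numbers
            if (NF) {
                want[$1 + 0]        # remember the number as an array key
                last = $1 + 0       # sorted input: the final one is the largest
            }
            next
        }
        FNR in want                 # wanted line: print it
        FNR >= last { exit }        # nothing further can match
    ' "$1" "$2"
}

# Usage (file names are placeholders):
#   extract_lines file_with_line_numbers file_with_data
#   your_commands | extract_lines - file_with_data
```

If the list were not sorted, last would not necessarily be the largest number and the early exit could skip wanted lines.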
With sed, convert the line numbers to a sed program, and use that generated program to print out the wanted lines:
$ sed -n "$( sed 's/$/p/' second_file )" first_file
many lines
more lines
even more lines
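This works too, in csh or tcsh (foreach? is the shell's continuation prompt). It runs sed once per line number, so it is slow for long lists.
foreach line ( `cat file2` )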
foreach? sed -n "$line p" file1
foreach? end
many lines
more lines
even more lines
I need to search a file for a string, remove any line that contains the string, and also remove the two lines following any line that contains the string. I was hoping I could accomplish this using something like this...
$ grep -v -A 2 two temp.txt
one
five
$
...but unfortunately this did not work. Is there a simple way to do this with grep or another shell command?
The following works both with GNU sed and with OS X.
$ sed '/two/{N;N;d;}' temp.txt
one
five
find line matching two
read in two more lines
delete them
You can do this with awk, as per the following transcript:
pax> echo 'one
two
three
four
five' | awk '/two/ {skip=3} skip>0 {skip--;next} {print}'
one
five
It basically starts a counter of lines to throw away (3) whenever it finds the two string on a line. It then throws those lines away until the skip counter reaches zero. Any line that isn't marked for skipping is printed.
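The counter idea can be parameterized by pattern and by the number of following lines. This is a sketch with a made-up name, drop_match_and_after; note that the pattern is passed through awk's -v, so backslash escapes in it are interpreted by awk before it is used as a regular expression.

```shell
# drop_match_and_after PATTERN COUNT [FILE...]
# Hypothetical helper: delete every line matching PATTERN together with
# the COUNT lines that follow it.
drop_match_and_after() {
    pat=$1
    count=$2
    shift 2
    awk -v pat="$pat" -v count="$count" '
        $0 ~ pat { skip = count + 1 }   # the matching line plus COUNT more
        skip > 0 { skip--; next }       # still inside a region to discard
        { print }
    ' "$@"
}

# Example: the case from the question.
printf 'one\ntwo\nthree\nfour\nfive\n' | drop_match_and_after two 2
```

A match inside a region that is already being skipped resets the counter, so the deletion is extended from that line.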
With GNU sed:
sed '/two/,+2d' temp.txt
This uses two-address syntax (addr1,addr2) to match lines with the word two (/two/) plus the two lines after (+2). The d command deletes those lines.
Here's a way to do it with Perl:
$ perl -ne'if (/two/){$x=<>;$x=<>;}else{print}' temp.txt
one
five
The -n is an implicit loop over the input. If you match /two/, then read the next two lines, otherwise print the line you're on.
The caveat is that the two lines read after a match are thrown away without being tested, so if the third or fourth line also matched /two/, you would still get the same output. @paxdiablo's solution is more complete. But mine's more Q&D.
Is there a way to delete duplicate lines in a file in Unix?
I can do it with sort -u and uniq commands, but I want to use sed or awk.
Is that possible?
awk '!seen[$0]++' file.txt
seen is an associative array indexed by the text of each line ($0). If a line has not been seen before, seen[$0] evaluates to false. The ! is the logical NOT operator and inverts that false to true. AWK prints the lines for which the expression evaluates to true.
The ++ increments seen so that seen[$0] == 1 after the first time a line is found and then seen[$0] == 2, and so on.
AWK evaluates everything but 0 and "" (empty string) to true. If a duplicate line is placed in seen then !seen[$0] will evaluate to false and the line will not be written to the output.
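Spelled out step by step, the one-liner behaves like the longer program below. This is only an expanded sketch to make the explanation concrete; the function name dedupe_verbose is made up.

```shell
# dedupe_verbose [FILE...]
# Hypothetical helper: long-hand equivalent of awk '!seen[$0]++'.
dedupe_verbose() {
    awk '{
        if (seen[$0] == 0)   # unset entries compare equal to 0: first occurrence
            print
        seen[$0]++           # count this occurrence
    }' "$@"
}

# Example: later copies of a and b are dropped, order is preserved.
printf 'a\nb\na\nc\nb\n' | dedupe_verbose
```

The array holds one entry per distinct line, so memory grows with the number of unique lines rather than with the size of the file.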
From http://sed.sourceforge.net/sed1line.txt:
(Please don't ask me how this works ;-) )
# delete duplicate, consecutive lines from a file (emulates "uniq").
# First line in a set of duplicate lines is kept, rest are deleted.
sed '$!N; /^\(.*\)\n\1$/!P; D'
# delete duplicate, nonconsecutive lines from a file. Beware not to
# overflow the buffer size of the hold space, or else use GNU sed.
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
Perl one-liner similar to jonas's AWK solution:
perl -ne 'print if ! $x{$_}++' file
This variation removes trailing white space before comparing:
perl -lne 's/\s*$//; print if ! $x{$_}++' file
This variation edits the file in-place:
perl -i -ne 'print if ! $x{$_}++' file
This variation edits the file in-place, and makes a backup file.bak:
perl -i.bak -ne 'print if ! $x{$_}++' file
An alternative way using Vim (Vi compatible):
Delete duplicate, consecutive lines from a file:
vim -esu NONE +'g/\v^(.*)\n\1$/d' +wq
Delete duplicate, nonconsecutive and nonempty lines from a file:
vim -esu NONE +'g/\v^(.+)$\_.{-}^\1$/d' +wq
The one-liner that Andre Miller posted works, except with recent versions of sed when the input file ends with a blank line that contains no characters. On my Mac my CPU just spins.
This is an infinite loop if the last line is blank and doesn't have any characters:
sed '$!N; /^\(.*\)\n\1$/!P; D'
It doesn't hang, but you lose the last line:
sed '$d;N; /^\(.*\)\n\1$/!P; D'
The explanation is at the very end of the sed FAQ:
The GNU sed maintainer felt that despite the portability problems
this would cause, changing the N command to print (rather than
delete) the pattern space was more consistent with one's intuitions
about how a command to "append the Next line" ought to behave.
Another fact favoring the change was that "{N;command;}" will
delete the last line if the file has an odd number of lines, but
print the last line if the file has an even number of lines.
To convert scripts which used the former behavior of N (deleting
the pattern space upon reaching the EOF) to scripts compatible with
all versions of sed, change a lone "N;" to "$d;N;".
The first solution is also from http://sed.sourceforge.net/sed1line.txt
$ echo -e '1\n2\n2\n3\n3\n3\n4\n4\n4\n4\n5' |sed -nr '$!N;/^(.*)\n\1$/!P;D'
1
2
3
4
5
The core idea is:
Print each run of consecutive duplicate lines only once, at its last appearance, and use the D command to implement the loop.
Explanation:
$!N;: if the current line is not the last line, use the N command to append the next line to the pattern space.
/^(.*)\n\1$/!P: if the pattern space consists of two identical strings separated by \n, the next line is the same as the current one, so by the core idea we do not print it. Otherwise the current line is the last of its run of consecutive duplicates, and the P command prints the pattern space up to and including the first \n.
D: delete the pattern space up to and including the first \n, leaving the next line as the pattern space. D also makes sed jump back to its first command, $!N, without reading a new line from the file or standard input.
The second solution is easy to understand (from myself):
$ echo -e '1\n2\n2\n3\n3\n3\n4\n4\n4\n4\n5' |sed -nr 'p;:loop;$!N;s/^(.*)\n\1$/\1/;tloop;D'
1
2
3
4
5
The core idea is:
Print each run of consecutive duplicate lines only once, at its first appearance, and use the : and t commands to implement the loop.
Explanation:
p: print the line that has just been read from the file or input stream, once.
:loop: set a label named loop.
$!N: append the next line to the pattern space (unless the current line is the last one).
s/^(.*)\n\1$/\1/: if the next line is the same as the current line, delete the duplicate by collapsing the two copies into one.
tloop: if the substitution succeeded, jump back to the label loop and repeat until the next line differs from the last printed line. Otherwise the D command deletes the copy of the last printed line and jumps to the first command, p, with the next new line as the pattern space.
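For comparison, the "print at first appearance" idea can be sketched in awk by remembering only the previous line. This is not one of the sed solutions above, just an illustration of the same logic; uniq_adjacent is a hypothetical name.

```shell
# uniq_adjacent [FILE...]
# Hypothetical helper: print a line only when it differs from the line
# directly before it, which is what uniq does without options.
uniq_adjacent() {
    # Appending "" forces a string comparison, so 1 and 1.0 count as different.
    awk 'NR == 1 || ($0 "") != (prev "") { print } { prev = $0 }' "$@"
}

# Example: the test input used above.
printf '1\n2\n2\n3\n3\n3\n4\n4\n4\n4\n5\n' | uniq_adjacent
```

Only one previous line is kept in memory, so this handles adjacent duplicates only, like the sed scripts.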
uniq would be fooled by trailing spaces and tabs. In order to emulate how a human makes comparison, I am trimming all trailing spaces and tabs before comparison.
I think that the $!N; needs curly braces or else it continues, and that is the cause of the infinite loop.
I have Bash 5.0 and sed 4.7 in Ubuntu 20.10 (Groovy Gorilla). The second one-liner did not work, at the character set match.
There are three variations. The first eliminates adjacent repeated lines, the second eliminates repeated lines wherever they occur, and the third eliminates all but the last instance of each line in the file.
# First line in a set of duplicate lines is kept, rest are deleted.
# Emulate human eyes on trailing spaces and tabs by trimming those.
# Use after norepeat() to dedupe blank lines.
dedupe() {
    sed -E '
        $!{
            N;
            s/[ \t]+$//;
            /^(.*)\n\1$/!P;
            D;
        }
    ';
}
# Delete duplicate, nonconsecutive lines from a file. Ignore blank
# lines. Trailing spaces and tabs are trimmed to humanize comparisons
# squeeze blank lines to one
norepeat() {
    sed -n -E '
        s/[ \t]+$//;
        G;
        /^(\n){2,}/d;
        /^([^\n]+).*\n\1(\n|$)/d;
        h;
        P;
    ';
}
# Keep only the last instance of each repeated line. Trailing spaces
# and tabs are trimmed before comparison.
lastrepeat() {
    sed -n -E '
        s/[ \t]+$//;
        /^$/{
            H;
            d;
        };
        G;
        # delete previous repeated line if found
        s/^([^\n]+)(.*)(\n\1(\n.*|$))/\1\2\4/;
        # after searching for previous repeat, move tested last line to end
        s/^([^\n]+)(\n)(.*)/\3\2\1/;
        $!{
            h;
            d;
        };
        # squeeze blank lines to one
        s/(\n){3,}/\n\n/g;
        s/^\n//;
        p;
    ';
}
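The third variation (keep only the last instance of each line) can also be sketched in awk, which may be easier to follow than the hold-space version. This is an illustration and not equivalent to lastrepeat(): it does no whitespace trimming or blank-line squeezing, it holds the whole input in memory, and the name keep_last is made up.

```shell
# keep_last [FILE...]
# Hypothetical helper: for each distinct line, keep only its last
# occurrence, preserving the order of those last occurrences.
keep_last() {
    awk '
        { last[$0] = NR; line[NR] = $0 }     # remember where each text was last seen
        END {
            for (i = 1; i <= NR; i++)
                if (last[line[i]] == i)      # this is the final copy of that text
                    print line[i]
        }
    ' "$@"
}

# Example: the earlier copies of a and b are dropped.
printf 'a\nb\na\nc\nb\n' | keep_last
```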
This can be achieved using AWK.
The line below prints the file with adjacent duplicate lines collapsed:
awk '1' file_name | uniq
You can write the result to a new file:
awk '1' file_name | uniq > uniq_file_name
uniq only compares neighbouring lines, so uniq_file_name is free of duplicates only if identical lines are adjacent in the input (for example, after sorting).
Use:
cat filename | sort | uniq -c | awk -F" " '$1<2 {print $2}'
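This sorts the file, counts each distinct line with uniq -c, and uses AWK to print only the lines whose count is 1. Note that it drops every copy of a duplicated line rather than keeping one, and that print $2 outputs only the first whitespace-separated word of each line.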