Delete newline in Vim - unix

Is there a way to delete the newline at the end of a line in Vim, so that the next line is appended to the current line?
For example:
Evaluator<T>():
_bestPos(){
}
I'd like to put this all on one line without copying lines and pasting them into the previous one. It seems like I should be able to put my cursor to the end of each line, press a key, and have the next line jump onto the same one the cursor is on.
End result:
Evaluator<T>(): _bestPos(){ }
Is this possible in Vim?

If you are on the first line, pressing (upper case) J will join that line and the next line together, removing the newline. You can also combine this with a count, so pressing 3J will combine all 3 lines together.

Certainly. Vim recognizes the \n character as a newline, so you can just search and replace.
In command mode type:
:%s/\n/

While on the upper line in normal mode, hit Shift+j.
You can prepend a count too, so 3J on the top line would join all those lines together.

As other answers mentioned, (upper case) J and search + replace for \n can be used generally to strip newline characters and to concatenate lines.
But in order to get rid of the trailing newline character in the last line, you need to do this in Vim:
:set noendofline binary
:w

J deletes extra leading spacing (if any), joining lines with a single space. (With some exceptions: after /[.!?]$/, two spaces may be inserted; before /^\s*)/, no spaces are inserted.)
If you don't want that behavior, gJ simply removes the newline and doesn't do anything clever with spaces at all.

set backspace=indent,eol,start
in your .vimrc will allow you to use backspace and delete on \n (newline) in insert mode.
set whichwrap+=<,>,h,l,[,]
will allow you to delete the previous LF in normal mode with X (when in col 1).

All of the following assume that your cursor is on the first line:
Using normal mappings:
3Shift+J
Using Ex commands:
:,+2j
Which is an abbreviation of
:.,.+2 join
Which can also be entered by the following shortcut:
3:j
An even shorter Ex command:
:j3

It probably depends on your settings, but I usually do this with A<delete>
Where A is append at the end of the line. It probably requires nocompatible mode :)

I would just press A (append to end of line, puts you into insert mode) on the line where you want to remove the newline and then press delete.

<CURSOR>Evaluator<T>():
_bestPos(){
}
cursor in first line
NOW, in NORMAL MODE do
shift+v
2j
shift+j
or
V2jJ
:normal V2jJ

if you don't mind using other shell tools,
tr -d "\n" < file >t && mv -f t file
sed -i.bak -e :a -e 'N;s/\n//;ba' file
awk '{printf "%s",$0 }' file >t && mv -f t file

The problem is that multiple invisible 0A (\n) characters may accumulate, leaving runs of blank lines.
Suppose you want to clean up from line 110 to the end:
Press Esc and then : to enter command-line mode:
:110,$s/^\n//
In a vim script:
execute '110,$s/^\n//'
Explanation: from 110 till the end
search for lines that start with new line (are blank)
and remove them
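Outside Vim, the same cleanup can be sketched with sed. This is only an illustration: the function name strip_blank_from is made up here, and the demo uses line 3 instead of 110 so the effect is visible on a five-line input.

```shell
# strip_blank_from N: delete empty lines from line N to the end of input.
# (Hypothetical helper name; N plays the role of 110 in the Vim command.)
strip_blank_from() {
  sed "$1"',$ { /^$/d; }'
}

# Lines: a, (blank), b, (blank), c. Only the blank line at or after line 3 is removed.
printf 'a\n\nb\n\nc\n' | strip_blank_from 3
```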

A very slight improvement to TinkerTank's solution if you're just looking to quickly concatenate all the lines in a text file is to have something like this in your .vimrc:
nnoremap <leader>j :%s/\n/\ /g<CR>
This globally substitutes each newline with a space, so the last word of a line is not glued onto the first word of the next line. This works perfectly for my typical use-case.
If you want to keep deliberate paragraph breaks, V):join is probably the easiest solution.
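To see the difference between deleting newlines and replacing them with spaces without opening Vim, here is a rough shell sketch. The function names are invented for the example, and paste is used as a stand-in for the substitution (unlike :%s/\n/\ /g it does not leave a trailing space).

```shell
# Join all lines with nothing in between (roughly what :%s/\n// does).
join_tight() { tr -d '\n'; }

# Join all lines with a single space in between (roughly :%s/\n/\ /g).
join_spaced() { paste -s -d ' ' -; }

printf 'foo\nbar\n' | join_tight; echo   # words run together
printf 'foo\nbar\n' | join_spaced        # words stay separated
```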

Related

special character removal 'sed'

I'm facing an issue where I'm getting some special characters at the beginning of each line in my file; a sample is below:
^#<9b>200931350515,test1,910,420032400825443
^#<9a>200931350515,test1,910,420032400825443
^#<9d>200931746996,test2,910,420031390086807
I'm using the following command to remove anything other than numbers in first column:
sed 's/^[^0-9]*//g' file.dat
No success on that. The file is created btw during a fastexport from Teradata, the process adds some special characters by itself during extract.
Any idea on the command?
If you want to remove any non-ASCII characters anywhere in a line, you can use tr.
tr -d '\000\200-\377' <file >file.new
Using perl
perl -lne 'print /\d+,.*/g'
200931350515,test1,910,420032400825443
200931350515,test1,910,420032400825443
200931746996,test2,910,420031390086807
matches only digits up to the first comma and then everything else.
sed is too big a gun for such a small problem,
use cut to remove the beginning of each line:
cut -b 2- file.dat
Where 2- is the range of bytes you want to retain, I'm not sure how many such strange characters you have there, so I would experiment with 1-, 2-, 3-, 4-, 5-, etc.
It looks like the number of characters that should be removed is constant across all lines. To remove a fixed number of characters from the beginning of each line, you could simply do
$ sed 's/^.....//' input >output
Adjust the number of dots to fit your need.
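As a small sketch of the fixed-width approach, the helper below wraps cut so that you pass the number of leading bytes to drop rather than the first byte to keep. The name strip_prefix_bytes and the sample line (with two ASCII junk characters standing in for the real binary prefix) are assumptions for illustration only.

```shell
# strip_prefix_bytes N: drop the first N bytes of every input line.
# LC_ALL=C keeps cut strictly byte-oriented, whatever the locale.
strip_prefix_bytes() {
  LC_ALL=C cut -b "$(( $1 + 1 ))-"
}

# Two junk bytes ("##") in front of the real record.
printf '##200931350515,test1,910\n' | strip_prefix_bytes 2
```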

Double spacing a file using awk

I was going through the awk and found the below two commands for double spacing a file.
Can someone please explain how these commands actually work ?
awk '1;{print ""}' filename
awk 'BEGIN{ORS="\n\n"};1' filename
Thanks
Your first example uses two common awk shortcuts: 1 is just a pattern that is always true, so the default action, which is "print the line", is executed for every line. Then, there is a rule with an empty pattern (which is also always true, but you can't omit both pattern and action), which in its action just prints an empty line.
Your second example changes the Output Record Separator, which is usually just a single end-of-line, to be two, so that just copying every line will be enough. (BEGIN rules are executed before the input file is read.)
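To make the shortcuts concrete, here is a sketch that puts the two one-liners next to a spelled-out version; all three should produce the same output. The function names are invented for the comparison.

```shell
# The two one-liners from the question, reading standard input.
double_space_short() { awk '1;{print ""}'; }
double_space_ors()   { awk 'BEGIN{ORS="\n\n"};1'; }

# The same thing without shortcuts: print the line, then an empty line.
double_space_long()  { awk '{ print $0 } { print "" }'; }

printf 'x\ny\n' | double_space_long
```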

How to delete one specific line in a file and modify the next line in unix?

I have a text file and there was a mistake when it was created. To fix this I need to delete a line with a specific unique string and delete the characters in the following line that precede the # symbol. I was able to do this with sed and cut but it only output that one line, not the many other 1000s of lines in my file. Here is an example of the part of the file that needs fixing. I know the line #s (delete 45603341 and modify 45603342) where this mistake occurs.
#HWI-1KL135:70:C305EACXX:5:2105:6727:102841 1:N:0:CAGATC
CCAAGTGTCACCTCTTTTATTTATTGATTT#HWI-1KL135:70:C305EACXX:5:1101:1178:2203 1:N:0:CAGATC
I need the output to look like this and for it to leave the rest of the file intact.
#HWI-1KL135:70:C305EACXX:5:1101:1178:2203 1:N:0:CAGATC
Thanks!
How about:
sed -i -e '45603341d;45603342s/^.*\(#.*\)$/\1/' <filename>
where you replace <filename> with the name of your file.
If you want to change a particular line and delete the above line then run,
sed -ri '45603342s/^([^#]*)(#.*)$/\2/g; 45603341d' aa
Example:
$ cat aa
#HWI-1KL135:70:C305EACXX:5:2105:6727:102841 1:N:0:CAGATC
CCAAGTGTCACCTCTTTTATTTATTGATTT#HWI-1KL135:70:C305EACXX:5:1101:1178:2203 1:N:0:CAGATC
$ sed -r '2s/^([^#]*)(#.*)$/\2/g; 1d' aa
#HWI-1KL135:70:C305EACXX:5:1101:1178:2203 1:N:0:CAGATC
This might work for you (GNU sed):
sed '45603341!b;N;s/^.*\n[^#]*//' file
Leave every line except 45603341 as is. On that line, append the following line and then remove everything from the start up to, but not including, the first # in the appended line.
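If GNU sed is not available, the same two-step edit can be sketched in awk instead. This is a swapped-in technique, not the answer's own; the function name fix_pair is hypothetical, and the demo uses line 2 instead of 45603341.

```shell
# fix_pair N: delete line N, and on line N+1 strip everything before the first '#'.
fix_pair() {
  awk -v n="$1" '
    NR == n     { next }                 # drop the bad line entirely
    NR == n + 1 { sub(/^[^#]*/, "") }    # remove the text in front of the first "#"
                { print }                # every surviving line is printed
  '
}

printf 'keep1\n#bad header\nCCAAGT#good header\nkeep2\n' | fix_pair 2
```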
An alternative approach to 'sed' is to use Vim macros (this also works on Windows). The main disadvantage is that you cannot integrate it into scripts the way you can with 'sed'. The main advantage is that it allows for complex replacements like "search for this pattern, then clear the line, go down 3 lines, move to column 40, switch lines, ...". If you are already familiar with Vim, it's also much more intuitive.
In this particular case you will have to do something like
qq (start macro recording)
/^#HWI.*CAGATC$ (search pattern)
dd (delete line)
dt# (delete up to, but not including, the next #)
q (end macro)
To run the macro 100 times:
100@q

sed '$!N;$!D' explanation

I know that
cat foo | sed '$!N;$!D'
will print out the last two lines of the file foo, but I don't understand why.
I have read the man page and know that N joins the next line to the currently processed line etc - but could someone explain in 'good english' that matches the order of operation what is happening here, step by step?
thanks!
Here is what that script looks like when run through the sedsed debugger (by Aurelio Jargas):
$ echo -e 'a\nb\nc\nd' | sedsed -d '$!N;$!D'
PATT:^a$
COMM:$ !N
PATT:^a\Nb$
COMM:$ !D
PATT:^b$
COMM:$ !N
PATT:^b\Nc$
COMM:$ !D
PATT:^c$
COMM:$ !N
PATT:^c\Nd$
COMM:$ !D
PATT:^c\Nd$
c
d
I've filtered out the lines that would show hold space ("HOLD") since it's not being used. "PATT" shows what's in pattern space and "COMM" shows the command about to be executed. "\N" indicates an embedded newline. Of course, "^" and "$" indicate the beginning and end of the string.
!N appends the next line and !D deletes the previous line and loops to the beginning of the script without doing an implicit print. When the last line is read, the $! tests fail so there's nothing left to do and the script exits and performs an implicit print of what remains in the pattern space (since -n was not specified in the arguments).
Disclaimer: I am a contributor to the sedsed project and have made a few minor improvements including expanded color support, adding the ^ line-beginning indicator and preliminary support for Python 3. The bleeding edge version (which hasn't been touched lately) is here.
$!N;$!D is a sed program consisting of two statements, $!N and $!D.
$!N matches everything but the last line of the last file of input ($ negated by !) and runs the N command on it, which as you said yourself appends the next line of input to the line currently under scrutiny (the "pattern space"). In other words, sed now has two lines in the pattern space and has advanced to the next line.
$!D also matches everything but the last line, and wipes the pattern space up to the first newline. D also prevents sed from wiping the entire pattern space when reading the next line.
So, the algorithm being executed is roughly:
For every line up to but not including the last {
Read the next line and append it to the pattern space
If still not at the last line
Delete the first line in the pattern space
}
Print the pattern space
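A quick way to convince yourself of the algorithm above is to run the script on a few lines and compare it with tail. The wrapper name last_two is made up for this sketch.

```shell
# Wrap the sed program from the question so it can be reused.
last_two() { sed '$!N;$!D'; }

printf 'a\nb\nc\nd\n' | last_two    # pattern space at exit holds the last two lines
printf 'a\nb\nc\nd\n' | tail -n 2   # same result, for comparison
```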

How to delete duplicate lines in a file without sorting it in Unix

Is there a way to delete duplicate lines in a file in Unix?
I can do it with sort -u and uniq commands, but I want to use sed or awk.
Is that possible?
awk '!seen[$0]++' file.txt
seen is an associative array that AWK indexes with each line of the file. If a line isn't in the array yet, seen[$0] evaluates to false. The ! is the logical NOT operator and inverts the false to true. AWK prints the lines where the expression evaluates to true.
The ++ increments seen so that seen[$0] == 1 after the first time a line is found and then seen[$0] == 2, and so on.
AWK evaluates everything but 0 and "" (empty string) to true. If a duplicate line is placed in seen then !seen[$0] will evaluate to false and the line will not be written to the output.
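Spelled out without the shortcuts, the one-liner does roughly the following. This is a sketch for illustration; the function names are invented, and the long form is only meant to mirror the explanation above.

```shell
# The compact form from the answer, reading standard input.
keep_first() { awk '!seen[$0]++'; }

# The same logic written out step by step.
keep_first_long() {
  awk '{
    if (seen[$0] == 0) print   # count is still 0 (unset): first occurrence
    seen[$0]++                 # record that this line has now been seen
  }'
}

printf 'a\nb\na\nc\nb\n' | keep_first_long
```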
From http://sed.sourceforge.net/sed1line.txt:
(Please don't ask me how this works ;-) )
# delete duplicate, consecutive lines from a file (emulates "uniq").
# First line in a set of duplicate lines is kept, rest are deleted.
sed '$!N; /^\(.*\)\n\1$/!P; D'
# delete duplicate, nonconsecutive lines from a file. Beware not to
# overflow the buffer size of the hold space, or else use GNU sed.
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
Perl one-liner similar to jonas's AWK solution:
perl -ne 'print if ! $x{$_}++' file
This variation removes trailing white space before comparing:
perl -lne 's/\s*$//; print if ! $x{$_}++' file
This variation edits the file in-place:
perl -i -ne 'print if ! $x{$_}++' file
This variation edits the file in-place, and makes a backup file.bak:
perl -i.bak -ne 'print if ! $x{$_}++' file
An alternative way using Vim (Vi compatible):
Delete duplicate, consecutive lines from a file:
vim -esu NONE +'g/\v^(.*)\n\1$/d' +wq
Delete duplicate, nonconsecutive and nonempty lines from a file:
vim -esu NONE +'g/\v^(.+)$\_.{-}^\1$/d' +wq
The one-liner that Andre Miller posted works, except with recent versions of sed when the input file ends with a blank line containing no characters. On my Mac my CPU just spins.
This is an infinite loop if the last line is blank and doesn't contain any characters:
sed '$!N; /^\(.*\)\n\1$/!P; D'
It doesn't hang, but you lose the last line:
sed '$d;N; /^\(.*\)\n\1$/!P; D'
The explanation is at the very end of the sed FAQ:
The GNU sed maintainer felt that despite the portability problems
this would cause, changing the N command to print (rather than
delete) the pattern space was more consistent with one's intuitions
about how a command to "append the Next line" ought to behave.
Another fact favoring the change was that "{N;command;}" will
delete the last line if the file has an odd number of lines, but
print the last line if the file has an even number of lines.
To convert scripts which used the former behavior of N (deleting
the pattern space upon reaching the EOF) to scripts compatible with
all versions of sed, change a lone "N;" to "$d;N;".
The first solution is also from http://sed.sourceforge.net/sed1line.txt
$ echo -e '1\n2\n2\n3\n3\n3\n4\n4\n4\n4\n5' |sed -nr '$!N;/^(.*)\n\1$/!P;D'
1
2
3
4
5
The core idea is:
Print each run of duplicate consecutive lines only once, at its last appearance, and use the D command to implement the loop.
Explanation:
$!N;: if the current line is not the last line, use the N command to read the next line into the pattern space.
/^(.*)\n\1$/!P: if the pattern space consists of two identical strings separated by \n, the next line is the same as the current line, so according to the core idea we must not print it. Otherwise, the current line is the last appearance in its run of duplicate consecutive lines, and the P command prints the pattern space up to and including the first \n.
D: the D command deletes the pattern space up to and including the first \n, so that the pattern space then contains only the next line.
The D command also makes sed jump back to its first command, $!N, without reading a new line from the file or standard input.
The second solution is easy to understand (from myself):
$ echo -e '1\n2\n2\n3\n3\n3\n4\n4\n4\n4\n5' |sed -nr 'p;:loop;$!N;s/^(.*)\n\1$/\1/;tloop;D'
1
2
3
4
5
The core idea is:
Print each run of duplicate consecutive lines only once, at its first appearance, and use the : and t commands to implement the loop.
Explanation:
read a new line from the input stream or file and print it once.
use the :loop command to set a label named loop.
use N to read the next line into the pattern space.
use s/^(.*)\n\1$/\1/ to delete the current line if the next line is the same as the current line. Here the s command performs the deletion.
if the s command made a substitution, the tloop command makes sed jump to the label loop, repeating the same steps for the following lines until no consecutive duplicates of the most recently printed line remain; otherwise, the D command deletes the line that matches the most recently printed line and makes sed jump to the first command, which is p. The pattern space then contains the next new line.
uniq would be fooled by trailing spaces and tabs. In order to emulate how a human makes comparison, I am trimming all trailing spaces and tabs before comparison.
I think that the $!N; needs curly braces or else it continues, and that is the cause of the infinite loop.
I have Bash 5.0 and sed 4.7 in Ubuntu 20.10 (Groovy Gorilla). The second one-liner did not work, at the character set match.
There are three variations. The first eliminates adjacent repeated lines, the second eliminates repeated lines wherever they occur, and the third eliminates all but the last instance of each line in the file.
# First line in a set of duplicate lines is kept, rest are deleted.
# Emulate human eyes on trailing spaces and tabs by trimming those.
# Use after norepeat() to dedupe blank lines.
dedupe() {
    sed -E '
        $!{
            N;
            s/[ \t]+$//;
            /^(.*)\n\1$/!P;
            D;
        }
    ';
}
# Delete duplicate, nonconsecutive lines from a file. Ignore blank
# lines. Trailing spaces and tabs are trimmed to humanize comparisons
# squeeze blank lines to one
norepeat() {
    sed -n -E '
        s/[ \t]+$//;
        G;
        /^(\n){2,}/d;
        /^([^\n]+).*\n\1(\n|$)/d;
        h;
        P;
    ';
}
lastrepeat() {
    sed -n -E '
        s/[ \t]+$//;
        /^$/{
            H;
            d;
        };
        G;
        # delete previous repeated line if found
        s/^([^\n]+)(.*)(\n\1(\n.*|$))/\1\2\4/;
        # after searching for previous repeat, move tested last line to end
        s/^([^\n]+)(\n)(.*)/\3\2\1/;
        $!{
            h;
            d;
        };
        # squeeze blank lines to one
        s/(\n){3,}/\n\n/g;
        s/^\n//;
        p;
    ';
}
This can be achieved using AWK together with uniq.
The line below displays the file with adjacent duplicate lines collapsed:
awk '{print}' file_name | uniq
You can write the result to a new file:
awk '{print}' file_name | uniq > uniq_file_name
The new file uniq_file_name will contain no adjacent duplicates; because uniq only compares neighbouring lines, duplicates that are not adjacent are kept.
Use:
cat filename | sort | uniq -c | awk -F" " '$1<2 {print $2}'
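This uses AWK to print only the lines that occur exactly once; every copy of a duplicated line is dropped. Note that it also sorts the input and, because of print $2, keeps only the first whitespace-separated field of each line.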
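The difference between "keep one copy of each line" and "keep only lines that never repeat" is easy to miss, so here is a small side-by-side sketch. The function names are made up; only_singletons wraps the pipeline from this answer (reading standard input instead of a file), and keep_first uses the AWK idiom from the first answer.

```shell
# Keep the first copy of every line, preserving the original order.
keep_first() { awk '!seen[$0]++'; }

# Keep only lines that occur exactly once (sorted; first field only).
only_singletons() { sort | uniq -c | awk -F" " '$1<2 {print $2}'; }

printf 'a\nb\na\n' | keep_first        # "a" survives once, "b" survives
printf 'a\nb\na\n' | only_singletons   # "a" is dropped entirely, "b" survives
```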

Resources