Adding an extra line to a text file after every N lines - unix

Hi, I have a Unix command that produces a list of IP addresses along with other columns of information. I want to add something to the command so that it displays the output as a set of 3 lines, then a blank line or ----, then the next 3 lines, and so on.
How can I achieve this?
For example:
1.2.3.4 xy
1.3.5.7 ab
1.25.7.9 cd
-------------
1.25.7.8 kl
1.3.4.5 mn
1.25.7.8 op
-------------
1.24.5.6 la
1.3.4.5 ka
1.25.7.8 xz

You can use awk to print an extra line after every 40 lines:
awk '{ print $0; if(++i % 40 == 0) printf("-------------\n") }' file
Change % 40 to % 3 to print that extra line after every 3 lines.
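The interval can also be passed in as a variable instead of being edited into the program. A minimal sketch, assuming a hypothetical helper name add_separators; unlike the command above, it prints the separator before each new group, so no trailing separator appears after the last group:

```shell
# add_separators N: copy stdin to stdout, inserting a separator line
# between groups of N lines (hypothetical helper name, for illustration).
add_separators() {
    awk -v n="$1" '
        NR > 1 && (NR - 1) % n == 0 { print "-------------" }  # a new group starts here
        { print }
    '
}

# Usage: seq 7 | add_separators 3
```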

$ seq 9 | awk 'NR>1 && (NR%3)==1{print "---"} 1'
1
2
3
---
4
5
6
---
7
8
9

Related

Extract two consecutive lines that have non-consecutive strings

I have a very large text file with 2 columns and more than 10 million lines.
In most lines, the number in column 2 is the number in column 2 of the previous line plus 1. However, a few thousand lines behave differently (see the example below).
Input file:
A 1
A 2
A 3
A 10
A 11
A 12
A 40
A 41
I would like to extract each pair of consecutive lines that does not respect the +1 increment in column 2.
Desired output file:
A 3
A 10
A 12
A 40
Is there (preferably) an awk command that can do this?
I tried several commands comparing column 2 of two consecutive lines, but so far none has worked (see the code below).
awk 'FNR==1 {print; next} $2==p2+1 {print p $0; p=""; next} {p=$0 ORS; p2=$2}' input.txt > output.txt
Thanks for your help. Best,
Would you please try the following:
awk 'NR>1 {if ($2!=p2+1) print p ORS $0} {p=$0; p2=$2}' input.txt > output.txt
Output:
A 3
A 10
A 12
A 40
The variable names are similar to yours: p holds the previous line and p2 holds the second column of the previous line.
The condition NR>1 suppresses printing on the 1st line, which has no previous line to compare with.
if ($2!=p2+1) print p ORS $0 prints each pair of lines that meets the condition.
The block {p=$0; p2=$2} saves the values of the current line for the next iteration.
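A quick way to check this logic is to run it on the sample input from the question. The sketch below wraps the same awk program in a hypothetical function, print_gaps, that reads from stdin:

```shell
# print_gaps: print each pair of consecutive lines whose 2nd column
# does not increase by exactly 1 (hypothetical wrapper name).
print_gaps() {
    awk 'NR > 1 && $2 != p2 + 1 { print p ORS $0 }  # p, p2: previous line and its 2nd field
         { p = $0; p2 = $2 }'
}

# Sample input from the question: A 1, A 2, A 3, A 10, ...
printf 'A %s\n' 1 2 3 10 11 12 40 41 | print_gaps
```

Note that a line which breaks the sequence on both sides (for example 23 in the run 12, 23, 40) is printed twice with this approach, once as the second line of a pair and once as the first.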
I like perl for the text processing that needs arithmetic.
$ perl -ane 'print and next if $.<3; print $p and print if $F[3]!=$fp+1; $fp=$F[3]; $p=$_' input.txt
| COLUMN 1 | COLUMN 2 |
| -------- | -------- |
| A | 3 |
| A | 10 |
| A | 12 |
| A | 40 |
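This uses -a to autosplit each line into @F. With the pipe-delimited table layout shown in the output, whitespace splitting puts the number in the 4th field, $F[3].
Prints the first 2 lines (the table header): print and next if $.<3
On subsequent lines, prints the previous line and the current line if the 4th field isn't exactly one more than the prior 4th field: print $p and print if $F[3]!=$fp+1
Saves the 4th field as $fp and the entire line as $p: $fp=$F[3]; $p=$_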
Assumptions:
columns are tab-delimited
the 1st column may contain white space (this isn't demonstrated in the sample provided by OP but it also hasn't been ruled out)
lines of interest must have the same value in the 1st column (ie, if the values in the 1st column differ then we don't bother with comparing the values in the 2nd column and instead proceed to the next input line)
if 3 consecutive lines meet the criteria, the 2nd/middle line is only printed once
Setup:
$ cat input.txt
A 1
A 2
A 3 # match
A 10 # match
A 11
A 12 # match
A 23 # match
A 40 # match
A 41
X to Z 101
X to Z 102 # match
X to Z 104 # match
X to Z 105
NOTE: comments only added here to highlight the lines that match the search criteria
One awk idea:
awk -F'\t' '
FNR==1 { prevline=$0 }
FNR>1  { if ($1 == prev1 && $2+0 != prev2+1) {
             if (prevline) print prevline
             print
             prevline=""    # make sure this line is not printed again if next line also meets criteria
         }
         else
             prevline=$0
       }
       { prev1=$1; prev2=$2 }
' input.txt
This generates:
A 3
A 10
A 12
A 23
A 40
X to Z 102
X to Z 104
This might work for you (GNU sed):
sed -nE 'N;h
s/.*\s+(.*)\n.*(\s.*)/echo "$((\1+1))\2"/e;/^(.*)\s\1$/!{x;p;x};x;D' file
Open a two-line window and slide it through the length of the file.
Make a copy of the window and increment the 2nd column of the first line by one. If this amended value is not equal to the 2nd column of the second line, print both unaltered lines.
Delete the first line and repeat.
N.B. This may print the second of these lines twice if the following line meets the same criteria.

How to Remove Code Specific Code Lines using Unix

Could someone please advise how I could remove the first 4 lines and the last 2 lines of code in my 3 JavaScript files using a shell script?
I tried using this guide: UNIX - delete specific lines, but it only works for the first 4 lines. Each of the 3 JavaScript files has a different set of lines of code.
set -vx
lines2del="(1,2,3,4)"
sedCmds=${lines2del//,/d;}
sedCmds=${sedCmds/(/}
sedCmds=${sedCmds/)/}
sedCmds=${sedCmds}d
sed -i "$sedCmds" file
Any input is highly appreciated. Thanks.
This might work for you (GNU sed):
sed -i '1,4d;N;$d;P;D' file
This deletes the lines 1 to 4 and then prints all other lines except the last two which it also deletes.
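To try the command without modifying a file, the same sed program can be run on a stream by dropping -i. A minimal sketch, assuming GNU sed as in the answer and a hypothetical function name trim_head4_tail2:

```shell
# trim_head4_tail2: drop the first 4 and the last 2 lines of stdin
# (hypothetical name; same sed program as above, without -i).
#   1,4d  delete lines 1 to 4
#   N     append the next line, keeping a two-line window
#   $d    when the window reaches the last line, delete both lines
#   P;D   otherwise print the first line of the window and slide on
trim_head4_tail2() {
    sed '1,4d;N;$d;P;D'
}

seq 10 | trim_head4_tail2   # prints 5, 6, 7 and 8, one per line
```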
Add the following to your lines2del:
$(($(cat file | wc -l)-2)) // third last line
$(($(cat file | wc -l)-1)) // second last line
$(cat file | wc -l) // last line
$ seq 10 | tail -n +5 | head -n -2
5
6
7
8
$ seq 10 | awk '{p3=p2; p2=p1; p1=$0} NR>6{print p3}'
5
6
7
8
$ seq 10 | awk '{p[NR%6]=$0} NR>6{print p[(NR-2)%6]}'
5
6
7
8
$ seq 10 | awk -v b=4 -v a=2 'BEGIN{t=b+a} {p[NR%t]=$0} NR>t{print p[(NR-a)%t]}'
5
6
7
8
$ seq 10 | awk -v b=3 -v a=5 'BEGIN{t=b+a} {p[NR%t]=$0} NR>t{print p[(NR-a)%t]}'
4
5

Subtracting lines in one file from another file

I couldn't find an answer that truly subtracts one file from another.
My goal is to remove lines in one file that occur in another file.
Multiple occurrences should be respected, which means, for example, that if a line occurs 4 times in file A and only once in file B, file C should have 3 of those lines.
File A:
1
3
3
3
4
4
File B:
1
3
4
File C (desired output)
3
3
4
Thanks in advance
In awk:
$ awk 'NR==FNR{a[$0]--;next} ($0 in a) && ++a[$0] > 0' f2 f1
3
3
4
Explained:
NR==FNR {                  # for each record in the first file argument (f2)
    a[$0]--                # decrement a[record], which starts at 0
    next
}
($0 in a) && ++a[$0] > 0   # if the record is in a, increment a[record];
                           # print once the copies counted in f2 are used up
If you also want to print items in f1 that are not in f2, you can drop ($0 in a) &&:
$ echo 5 >> f1
$ awk 'NR==FNR{a[$0]--;next} (++a[$0] > 0)' f2 f1
3
3
4
5
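The second variant can be wrapped in a small function for reuse. This is a sketch under assumptions: the name subtract_lines and its argument order (file to keep first, file to subtract second) are hypothetical, chosen only for illustration.

```shell
# subtract_lines A B: print the lines of file A, minus one copy for each
# matching line in file B (hypothetical helper name and argument order).
subtract_lines() {
    awk 'NR == FNR { a[$0]--; next }   # first file read (B): count copies to remove
         ++a[$0] > 0                   # second file read (A): print once B copies are used up
        ' "$2" "$1"
}

# Demo with the files from the question, in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 1 3 3 3 4 4 > "$dir/A"
printf '%s\n' 1 3 4       > "$dir/B"
subtract_lines "$dir/A" "$dir/B"
rm -r "$dir"
```

One caveat of the NR==FNR idiom: if the file to subtract is empty, NR==FNR stays true while the other file is read, so nothing is printed.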
If the input files are already sorted, as in the sample, comm would be better suited:
$ comm -23 f1 f2
3
3
4
option description from man page:
-2 suppress column 2 (lines unique to FILE2)
-3 suppress column 3 (lines that appear in both files)
You can do:
awk 'NR==FNR{++cnt[$1]
next}
cnt[$1]-->0{next}
1' f2 f1

How do I list specific lines using awk?

I have a file that is sorted like the following:
2 Good
2 Hello
3 Goodbye
3 Begin
3 Yes
3 No
I want to search for the highest value in the file and display what is on those lines:
3 Goodbye
3 Begin
3 Yes
3 No
How would I do this?
awk to the rescue!
$ awk 'FNR==NR{if(max<$1) max=$1; next} $1==max' file{,}
3 Goodbye
3 Begin
3 Yes
3 No
Double pass: find the maximum on the first pass, then print only the lines that match it on the second.
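In the command above, file{,} is bash brace expansion for file file, which is how awk gets to read the file twice. A portable sketch that names the file twice explicitly, assuming the first field is numeric and using a hypothetical helper name max_rows:

```shell
# max_rows FILE: print every line whose first field equals the largest
# first field in FILE (hypothetical helper; FILE is read twice).
max_rows() {
    awk 'FNR == NR { if (NR == 1 || $1 + 0 > max) max = $1 + 0; next }  # pass 1: find the maximum
         $1 + 0 == max                                                  # pass 2: keep matching lines
        ' "$1" "$1"
}

# Usage: max_rows file
```

Because the file is opened twice, this works on regular files but not on a pipe.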
cat file.txt | sort -r | awk '{if ($1>=prev) {print $0; prev=$1}}'
3 Yes
3 No
3 Goodbye
3 Begin
Assuming file.txt contains
2 Good
2 Hello
3 Goodbye
3 Begin
3 Yes
3 No
First get the highest value in the file into a variable. Since the file is already sorted, pick up the last line of the file, then parse out the number using awk.
highest=`tail -1 file.list|awk '{print $1}'`
Then grep the file using that value.
grep "^${highest} " file.list
This should do the job. I am only using awk as required in the question:
awk 'BEGIN {v=0} {l = l "\n" $0} {if ($1>v) {l = $0; v = $1}} END {print l}' file.txt
The variable v is initialized (before parsing the file) to 0. Then each line is read and appended to l; if the first field ($1) is greater than v, v is updated and the contents of l are replaced with the current line. At the end, just print the content of l.
It's easier than you think.
awk '/^3/' file
3 Goodbye
3 Begin
3 Yes
3 No

How can I get a range of line every nth interval using awk, sed, or other unix command?

I know how to get a range of lines using awk and sed.
I also know how to print every nth line using awk and sed.
However, I don't know how to combine the two.
For example, I have a file with 1780000 lines.
For every 17800th line, I would like to print the 17800th line plus the two after it.
So if I have a file with 1780000 lines whose contents run from 1 to 1780000, this will print:
1
2
3
17800
17801
17802
35600
35601
35602
# ... and so on.
Does anyone know how to get a range of lines at every nth interval using awk, sed, or another Unix command?
Using GNU sed:
sed -n '0~17800{N;N;p}' input
Meaning,
For every 17800th line: 0~17800
Read two lines: {N;N;
And print these out: p}
We can also add the first three lines:
sed -n -e '1,3p' -e '0~17800{N;N;p}' input
Using Awk, this would be simpler:
awk 'NR%17800<3 || NR==3 {print}' input
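The same condition is easier to check with a small interval passed in as a variable. A minimal sketch, assuming an interval of at least 3 and a hypothetical helper name every_nth_plus_two:

```shell
# every_nth_plus_two N: print lines 1-3, then every line whose number is a
# multiple of N together with the two lines after it (hypothetical helper).
every_nth_plus_two() {
    awk -v n="$1" 'NR <= 3 || NR % n < 3'
}

seq 12 | every_nth_plus_two 5   # prints 1 2 3 5 6 7 10 11 12, one per line
```

NR % n is 0, 1 and 2 on a multiple of n and the two lines after it, which is why a single comparison covers the whole three-line range.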
$ cat file
1
2
3
4
5
6
7
8
9
10
$ awk '!(NR%3)' file
3
6
9
$ awk -v intvl=3 -v delta=2 '!(NR%intvl){print "-----"; c=delta} c&&c--' file
-----
3
4
-----
6
7
-----
9
10
$ awk -v intvl=4 -v delta=2 '!(NR%intvl){print "-----"; c=delta} c&&c--' file
-----
4
5
-----
8
9
$ awk -v intvl=4 -v delta=3 '!(NR%intvl){print "-----"; c=delta} c&&c--' file
-----
4
5
6
-----
8
9
10
seq -f %.0f 1780000 | awk 'NR < 4 || NR % 17800 < 3' | head
output:
1
2
3
17800
17801
17802
35600
35601
35602
53400
Explanation
The NR < 4 condition covers the first 3 lines, because the requirement "For every 17800th line, print 17800th line plus the two after that" doesn't match the output you gave.
Here I use head to limit the output; remove it in your use case.
For GNU seq, you don't need -f %.0f.
