AWK Split File every n-th Row but group IDs together - unix

Let's assume I have the following file text.txt:
#something
#somethingelse
#anotherthing
1
2
2
3
3
3
4
4
4
5
5
6
7
7
8
9
9
9
10
11
11
11
14
15
I want to split this into multiple files by every 5th data row, but if the number of the next row is identical it should still end up in the same file. Header should be in every file, but that could also be ignored and reintroduced later.
This means something like this:
text.txt.1
#something
#somethingelse
#anotherthing
1
2
2
3
3
3
text.txt.2
#something
#somethingelse
#anotherthing
4
4
4
5
5
text.txt.3
#something
#somethingelse
#anotherthing
6
7
7
8
9
9
9
text.txt.4
#something
#somethingelse
#anotherthing
10
11
11
11
14
text.txt.5
#something
#somethingelse
#anotherthing
15
So I was thinking about something like this:
awk 'NR%5==1 && $1!=prev{i++;prev=$1}{print > FILENAME"."i}' text.txt
Each condition works on its own, but not the two together. Is that possible using awk?

Nice question.
With your example, this would work:
awk 'BEGIN{i=1;}/^#/{header= header == ""? $0 : header "\n" $0; next}c>=5 && $1!=prev{i++;c=0;}{if(!c) print header>FILENAME"."i; print > FILENAME"."i;c++;prev=$1;}' text.txt
You need to strip the header out and keep your own counter (c above). NR is just the current line number of the input, so NR%5 cannot work once a group pushes a file past 5 rows and the file boundaries stop falling on multiples of 5.
Break it up and improve a tiny bit:
awk 'BEGIN{i=1;}
/^#/{header= header == ""? $0 : header ORS $0; next}
c>=5 && $1!=prev{i++;c=0;}
!c {print header>FILENAME"."i;}
{print > FILENAME"."i;c++;prev=$1;}
' text.txt
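To solve the potential problems mentioned in the comments, this version closes each output file before opening the next one and builds the file name in a variable instead of leaving an unparenthesized expression after >:
awk 'BEGIN{i=1}
/^#/{header= header == ""? $0 : header ORS $0; next}
c>=5 && $1!=prev{i++;c=0}
!c {close(f);f=(FILENAME"."i);print header>f}
{print>f;c++;prev=$1}
' text.txt
Or check Ed's answer, which is more precise and portable across platforms and awk versions.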

Using any awk in any shell on every Unix box:
$ cat tst.awk
/^#/ {
    hdr = hdr $0 ORS
    next
}
( (++numLines) % 5 ) == 1 {
    if ( $0 == prev ) {
        --numLines
    }
    else {
        close(out)
        out = FILENAME "." (++numBlocks)
        printf "%s", hdr > out
        numLines = 1
    }
}
{
    print > out
    prev = $0
}
$ awk -f tst.awk text.txt
$ head text.txt.*
==> text.txt.1 <==
#something
#somethingelse
#anotherthing
1
2
2
3
3
3
==> text.txt.2 <==
#something
#somethingelse
#anotherthing
4
4
4
5
5
==> text.txt.3 <==
#something
#somethingelse
#anotherthing
6
7
7
8
9
9
9
==> text.txt.4 <==
#something
#somethingelse
#anotherthing
10
11
11
11
14
==> text.txt.5 <==
#something
#somethingelse
#anotherthing
15
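One way to sanity-check any of the answers here is to confirm that no ID ends up in more than one output file. The following is a minimal sketch, not part of the original answers: it recreates the question's sample in a scratch directory (the `tmpdir`, `split_ids` and `file_count` names are illustrative assumptions), splits it with the same counter-based logic as above, and then scans the pieces.

```shell
#!/bin/sh
# Scratch directory so the current directory is left untouched (assumed layout).
tmpdir=$(mktemp -d) || exit 1

# Recreate the sample input from the question.
printf '%s\n' '#something' '#somethingelse' '#anotherthing' \
    1 2 2 3 3 3 4 4 4 5 5 6 7 7 8 9 9 9 10 11 11 11 14 15 > "$tmpdir/text.txt"

# Split: start a new file on every 5th data row unless the row repeats the previous ID.
awk '
/^#/ { hdr = hdr $0 ORS; next }
(++n % 5) == 1 {
    if ($0 == prev) { --n }
    else { close(out); out = FILENAME "." (++blocks); printf "%s", hdr > out; n = 1 }
}
{ print > out; prev = $0 }
' "$tmpdir/text.txt"

# Check: print every ID that shows up in a second, different output file.
split_ids=$(awk '
!/^#/ {
    if (($0 in seen) && seen[$0] != FILENAME) print $0
    seen[$0] = FILENAME
}' "$tmpdir"/text.txt.*)

# Count the pieces that were produced.
file_count=$(ls "$tmpdir"/text.txt.* | wc -l | tr -d ' ')

echo "files: $file_count, split IDs: ${split_ids:-none}"
rm -rf "$tmpdir"
```

An empty `split_ids` means every group of identical IDs stayed together.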

With your shown samples, please try the following awk program. It was written and tested in GNU awk and relies on PROCINFO["sorted_in"], which is specific to gawk.
awk '
BEGIN{
  outFile="text.txt."
  count=1
}
/^#/{
  header=(header?header ORS:"")$0
  next
}
{
  arr[$0]=(arr[$0]?arr[$0] ORS:"")$0
}
END{
  PROCINFO["sorted_in"] = "@ind_num_asc"
  print header > (outFile count)
  for(i in arr){
    num=split(arr[i],arr2,"\n")
    print arr[i] > (outFile count)
    len+=num
    if(len>=5){ len=0 }
    if(len==0){
      close(outFile count)
      count++
      print header > (outFile count)
    }
  }
}
' Input_file

Related

How to make "sort" execute and print earlier before the "awk script" completely execute so that I can add something after the sorted data?

So, I basically want to store sorted array/data into another array and use that data to print something else?
Even when I want to have the footer, sorted data is printed after the footer.
printf (" %-25s %-20s %d\n", employee_name[working_employee_id[y]], title[employee_name[working_employee_id[y]]], salary[employee_name[working_employee_id[y]]]) | "sort -nr -k2"
I want to print other things after this line executes, instead of having sort print its output at the very end.
You need to close() the pipe at the end of your input before printing anything else if you want to make sure the command you're piping to finishes displaying all its output before your footer text.
Example:
$ paste <(seq 10 | shuf) <(seq 10 | shuf) |
awk '{ printf "%d\t%d\t%d\n", $1, $2, $1 + $2 | "sort -k1,1n" }
END { close("sort -k1,1n"); print "a\tb\tc" }'
1 8 9
2 3 5
3 6 9
4 4 8
5 2 7
6 10 16
7 9 16
8 1 9
9 7 16
10 5 15
a b c
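The string passed to close() has to match the command string used for the pipe exactly, so it can help to keep it in a variable. A minimal sketch of that idea, with fixed input instead of shuf so the result is reproducible (the `cmd` and `out` names are illustrative, not from the answer above):

```shell
#!/bin/sh
# Pipe unsorted rows through sort from inside awk, then print a footer.
out=$(printf '%s\n' '3 30' '1 10' '2 20' |
awk '
BEGIN { cmd = "sort -k1,1n" }      # one string, reused for the pipe and for close()
{ printf "%d\t%d\n", $1, $2 | cmd }
END {
    close(cmd)                     # wait for sort to finish writing its output
    print "a\tb"                   # the footer now comes after the sorted rows
}')
printf '%s\n' "$out"
```

Without the close() call, sort would only emit its output when awk exits, which is after the footer has been printed.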

How to add a column in Unix

Add a column before column n:
awk 'BEGIN{FS=OFS="fs"}{$n = value OFS $n}1' filename.
I have tried this command but it doesn't work. What does the "n" represent here? Do I have to change the n to a value?
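Altogether I have a file with 17 columns. I would like to add a new column in between column 6 and 7.
In that command, n is a placeholder for the column number (and value for the text to insert), so both have to be replaced with real values. An alternative is to loop over the fields: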
Input file:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Then adding "18" in between 6 and 7:
awk '{ for (i=1;i<=6;i++) { printf "%s ",$i } printf "%s","18";for (i=7;i<=NF;i++) { printf " %s",$i } printf "\n" }' file
Explanation:
awk '{
  for (i=1;i<=6;i++) {
    printf "%s ",$i      # Print the first 6 space-delimited fields, each followed by a space to replicate the delimiter
  }
  printf "%s","18";      # Print "18" with no spaces
  for (i=7;i<=NF;i++) {
    printf " %s",$i      # Print the remaining fields, each preceded by a space (NF is the number of fields, so $NF is the last one)
  }
  printf "\n"            # Print a newline
}' file
Output:
1 2 3 4 5 6 18 7 8 9 10 11 12 13 14 15 16 17
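For comparison, the field-assignment idiom from the question also works once n and value are given concrete values. A minimal sketch for a space-delimited line, passing both in with -v (the variable names `line` and `result` are only for illustration):

```shell
#!/bin/sh
# Insert "18" before column 7 by rewriting that field; awk rebuilds $0 using OFS.
line='1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17'
result=$(printf '%s\n' "$line" | awk -v n=7 -v value=18 '{ $n = value OFS $n } 1')
printf '%s\n' "$result"
# prints: 1 2 3 4 5 6 18 7 8 9 10 11 12 13 14 15 16 17
```

For a file with another delimiter, FS and OFS would need to be set to that delimiter, as the BEGIN block in the question's command does.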

Format the last column with separator :

I have an awk command
awk '{printf "%-11s %-24s %-16s %-1s %-6s %-1s %-1s %-7s %-1s %-1s\n", $1, $2, $3,$4,$5,$6,$7,$8,$9,$10}' fhost2
That outputs this:
GBL-SVC-ES1 GBL-esx403.ad.gbl.com 10008C7CFF652604 1 Module 7 1 FC_Port 7 1
GBL-SVC-ES1 GBL-esx403.ad.gbl.com 10008C7CFF652604 1 Module 8 1 FC_Port 8 1
GBL-SVC-ES1 GBL-esx403.ad.gbl.com 10008C7CFF652604 1 Module 9 1 FC_Port 9 1
GBL-SVC-ES1 GBL-esx403.ad.gbl.com 10008C7CFF20D0A8 1 Module 7 1 FC_Port 7 3
How can I make the output look like this?
GBL-SVC-ES1 GBL-esx403.ad.gbl.com 10008C7CFF652604 1:Module:7 1:FC_Port:7:1
GBL-SVC-ES1 GBL-esx403.ad.gbl.com 10008C7CFF652604 1:Module:8 1:FC_Port:8:1
GBL-SVC-ES1 GBL-esx403.ad.gbl.com 10008C7CFF652604 1:Module:9 1:FC_Port:9:1
GBL-SVC-ES1 GBL-esx403.ad.gbl.com 10008C7CFF20D0A8 1:Module:7 1:FC_Port:7:3
Any help is much appreciated.
Something more flexible than your code (which will become huge if number of fields increases):
awk '
BEGIN{
  nCol = split("4 5 7 8 9", chgSepCol)
  for(i=1; i<=nCol; i++){
    chgSep[chgSepCol[i]]
  }
}
{
  for(i=1; i<NF; i++){
    sep = (i in chgSep)? ":" : OFS
    printf "%s%s", $i, sep
  }
  print $NF
}' file
The string "4 5 7 8 9" lists the columns after which the separator is changed to ":".

AWK, Unix command: How to match two files using corresponding first column in unix command

I have two files. The first has a single column of IDs (with repeats). The second has three columns; its first column holds the same IDs, but each appears only once. For every ID in the first file, I want to print the remaining two columns from the second file.
Example:
First file:
IDs
1
3
6
7
11
13
13
14
18
20
Second file:
IDs Freq Status
1 1 JD611
2 1 QD51
3 2
5
6
7 2
11 2
13 2
14 2
Desired OUTPUT
1 1 JD611
3 2
6
7 2
11 2
13 2
13 2
14 2
18
20
You can use this awk:
awk 'NR==FNR{a[$1]=$2 FS $3; next} {print $1, a[$1]}' f2 f1
To skip the header line,
awk 'FNR==1{next} NR==FNR{a[$1]=$2 FS $3; next} {print $1, a[$1]}' f2 f1
If the second file has more than three columns,
awk 'NR==FNR{c=$1; $1=""; a[c]=$0; next} {print $1, a[$1]}' f2 f1
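The commands above leave a stray separator on rows where the second file has nothing to add. The sketch below is one possible variant, not the original answer's code: it strips the key with sub() and appends the stored columns only when they are non-empty. The sample files are a shortened version of the ones in the question, and `tmpdir`, `rest` and `joined` are illustrative names.

```shell
#!/bin/sh
# Build small sample files in a scratch directory.
tmpdir=$(mktemp -d) || exit 1
printf '%s\n' 'IDs' 1 3 3 18 > "$tmpdir/f1"
printf '%s\n' 'IDs Freq Status' '1 1 JD611' '2 1 QD51' '3 2' '5' > "$tmpdir/f2"

joined=$(awk '
FNR == 1 { next }                         # skip the header line of each file
NR == FNR {                               # first file on the command line (f2): build the lookup
    key = $1
    sub(/^[ \t]*[^ \t]+[ \t]*/, "")       # drop the key and the separator after it
    rest[key] = $0
    next
}
# second file (f1): print the ID, plus the stored columns when there are any
{ print $1 ((($1 in rest) && rest[$1] != "") ? OFS rest[$1] : "") }
' "$tmpdir/f2" "$tmpdir/f1")

printf '%s\n' "$joined"
rm -rf "$tmpdir"
```

IDs that are missing from the second file, such as 18 here, are printed on their own.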

row to column using awk

I would like to know how I could transform the following ('Old') to 'New1' and 'New2' using awk:
Old:
5
21
31
4
5
11
12
15
5
19
5
12
5
.
.
New1:
5 21 31 4
5 11 12 15
5 19
5 12
.
.
New2:
521314
5111215
519
512
.
.
Thanks so much!
Requires gawk for multi-character RS:
$ awk 'BEGIN {RS="\n5\n"} {$1=$1; print (NR>1 ? 5 OFS $0 : $0)}' file
5 21 31 4
5 11 12 15
5 19
5 12
For the second version, just set OFS to the empty string:
$ awk -v OFS="" 'BEGIN {RS="\n5\n"} {$1=$1; print (NR>1 ? 5 OFS $0 : $0)}' file
521314
5111215
519
512
To get New1:
awk '/^5$/{printf "%s", (NR>1?RS:"")$0;next}{printf " %s",$0}END{print ""}' file
To get New2:
awk '/^5$/{printf "%s", (NR>1?RS:"")$0;next}{printf "%s",$0}END{print ""}' file
Some variation of @jas's script:
$ awk -v RS="(^|\n)5\n" -v OFS='' 'NR>1{$1=$1; print 5,$0}' file
521314
5111215
519
512
$ awk -v RS="(^|\n)5\n" -v OFS=' ' 'NR>1{$1=$1; print 5,$0}' file
5 21 31 4
5 11 12 15
5 19
5 12
In the second one you don't have to set OFS explicitly, since a single space is its default value; otherwise both scripts are the same (and essentially the same as the other referenced answer).
With any awk:
$ awk -v ORS= '{print ($0==5 ? ors : OFS) $0; ors=RS} END{print ors}' file
5 21 31 4
5 11 12 15
5 19
5 12
$ awk -v ORS= -v OFS= '{print ($0==5 ? ors : OFS) $0; ors=RS} END{print ors}' file
521314
5111215
519
512
