row to column using awk - unix

I would like to know how I could transform the following ('Old') to 'New1' and 'New2' using awk:
Old:
5
21
31
4
5
11
12
15
5
19
5
12
5
.
.
New1:
5 21 31 4
5 11 12 15
5 19
5 12
.
.
New2:
521314
5111215
519
512
.
.
Thanks so much!

Requires gawk for multi-character RS:
$ awk 'BEGIN {RS="\n5\n"} {$1=$1; print (NR>1 ? 5 OFS $0 : $0)}' file
5 21 31 4
5 11 12 15
5 19
5 12
For the second version, just set OFS to the empty string:
$ awk -v OFS="" 'BEGIN {RS="\n5\n"} {$1=$1; print (NR>1 ? 5 OFS $0 : $0)}' file
521314
5111215
519
512

To get New1:
awk '/^5$/{printf "%s", (NR>1?RS:"")$0;next}{printf " %s",$0}END{print ""}' file
To get New2:
awk '/^5$/{printf "%s", (NR>1?RS:"")$0;next}{printf "%s",$0}END{print ""}' file

A variation of @jas's script (it relies on a regexp RS, which GNU awk supports):
$ awk -v RS="(^|\n)5\n" -v OFS='' 'NR>1{$1=$1; print 5,$0}' file
521314
5111215
519
512
$ awk -v RS="(^|\n)5\n" -v OFS=' ' 'NR>1{$1=$1; print 5,$0}' file
5 21 31 4
5 11 12 15
5 19
5 12
In the second one you don't have to set OFS explicitly, since a single space is its default value; otherwise both scripts are the same (and essentially the same as the other referenced answer).

With any awk:
$ awk -v ORS= '{print ($0==5 ? ors : OFS) $0; ors=RS} END{print ors}' file
5 21 31 4
5 11 12 15
5 19
5 12
$ awk -v ORS= -v OFS= '{print ($0==5 ? ors : OFS) $0; ors=RS} END{print ors}' file
521314
5111215
519
512
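The portable approach can also be wrapped in a small shell function so the marker value and the joining separator become parameters. This is only a sketch: the name group_on and its two arguments are made up for illustration, and it assumes the input starts with the marker line.

```shell
#!/bin/sh
# group_on MARKER SEP  (hypothetical helper, not from the answers above)
# Starts a new output line whenever an input line equals MARKER and joins
# the lines that follow it with SEP. Uses only POSIX awk features.
group_on() {
    awk -v m="${1:-}" -v sep="${2:-}" '
        $0 == m { if (NR > 1) printf "\n"; printf "%s", $0; next }
                { printf "%s%s", sep, $0 }
        END     { if (NR) printf "\n" }
    '
}

# New1: fields joined with a space
printf '%s\n' 5 21 31 4 5 11 12 15 5 19 5 12 | group_on 5 ' '
# New2: fields joined with nothing
printf '%s\n' 5 21 31 4 5 11 12 15 5 19 5 12 | group_on 5 ''
```

Because the comparison is `$0 == m`, a line such as 51 or 15 is treated as data, not as a marker.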

Related

AWK Split File every n-th Row but group IDs together

Let's assume I have the following file text.txt:
#something
#somethingelse
#anotherthing
1
2
2
3
3
3
4
4
4
5
5
6
7
7
8
9
9
9
10
11
11
11
14
15
I want to split this into multiple files by every 5th data row, but if the number of the next row is identical it should still end up in the same file. Header should be in every file, but that could also be ignored and reintroduced later.
This means something like this:
text.txt.1
#something
#somethingelse
#anotherthing
1
2
2
3
3
3
text.txt.2
#something
#somethingelse
#anotherthing
4
4
4
5
5
text.txt.3
#something
#somethingelse
#anotherthing
6
7
7
8
9
9
9
text.txt.4
#something
#somethingelse
#anotherthing
10
11
11
11
14
text.txt.5
#something
#somethingelse
#anotherthing
15
So I was thinking about something like this:
awk 'NR%5==1 && $1!=prev{i++;prev=$1}{print > FILENAME"."i}' test.txt
Both conditions work on their own but not together. Is this possible using awk?
Nice question.
With your example, this would work:
awk 'BEGIN{i=1;}/\#/{header= header == ""? $0 : header "\n" $0; next}c>=5 && $1!=prev{i++;c=0;}{if(!c) print header>FILENAME"."i; print > FILENAME"."i;c++;prev=$1;}' test.txt
You need to strip the header out and keep a counter (c above). NR is just the current input line number, so it will not meet your needs when the chunks are not exact multiples of 5 lines.
Break it up and improve a tiny bit:
awk 'BEGIN{i=1;}
/\#/{header= header == ""? $0 : header ORS $0; next}
c>=5 && $1!=prev{i++;c=0;}
!c {print header>FILENAME"."i;}
{print > FILENAME"."i;c++;prev=$1;}
' test.txt
To solve the potential problems mentioned in the comment:
awk 'BEGIN{i=1}
/\#/{header= header == ""? $0 : header ORS $0; next}
c>=5 && $1!=prev{i++;c=0}
!c {close(f);f=(FILENAME"."i);print header>f}
{print>f;c++;prev=$1}
' test.txt
Or check Ed's answer, which is more precise and portable across platforms and awk versions.
Using any awk in any shell on every Unix box:
$ cat tst.awk
/^#/ {
    hdr = hdr $0 ORS
    next
}
( (++numLines) % 5 ) == 1 {
    if ( $0 == prev ) {
        --numLines
    }
    else {
        close(out)
        out = FILENAME "." (++numBlocks)
        printf "%s", hdr > out
        numLines = 1
    }
}
{
    print > out
    prev = $0
}
$ awk -f tst.awk text.txt
$ head text.txt.*
==> text.txt.1 <==
#something
#somethingelse
#anotherthing
1
2
2
3
3
3
==> text.txt.2 <==
#something
#somethingelse
#anotherthing
4
4
4
5
5
==> text.txt.3 <==
#something
#somethingelse
#anotherthing
6
7
7
8
9
9
9
==> text.txt.4 <==
#something
#somethingelse
#anotherthing
10
11
11
11
14
==> text.txt.5 <==
#something
#somethingelse
#anotherthing
15
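To check where the chunk boundaries fall without writing any files, the counter logic can print a chunk number beside each data row instead. This is a sketch under assumptions: the function name chunk_ids and its optional size argument are illustrative only, and header lines are taken to be those starting with #.

```shell
#!/bin/sh
# chunk_ids [SIZE]  (hypothetical helper for inspecting the grouping)
# Prints "chunk value" for every data row. A new chunk starts once at least
# SIZE rows have been seen AND the value differs from the previous row,
# so runs of identical values are never split.
chunk_ids() {
    awk -v n="${1:-5}" '
        BEGIN { chunk = 1 }
        /^#/  { next }                                # skip header lines
        c >= n && $1 != prev { chunk++; c = 0 }       # boundary reached
        { print chunk, $1; c++; prev = $1 }
    '
}

printf '%s\n' '#something' 1 2 2 3 3 3 4 4 4 5 5 6 | chunk_ids 5
```

With the rows above, the sixth row (the third 3) stays in chunk 1 because it repeats the previous value, and 4 opens chunk 2.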
With your shown samples, please try the following awk program, written and tested in GNU awk.
awk '
BEGIN{
  outFile="test.txt"
  count=1
}
/#/{
  header=(header?header ORS:"")$0
  next
}
{
  arr[$0]=(arr[$0]?arr[$0] ORS:"")$0
}
END{
  PROCINFO["sorted_in"] = "@ind_num_asc"
  print header > (outFile count)
  for(i in arr){
    num=split(arr[i],arr2,"\n")
    print arr[i] > (outFile count)
    len+=num
    if(len>=5){ len=0 }
    if(len==0){
      close(outFile count)
      count++
      print header > (outFile count)
    }
  }
}
' Input_file

How to add a column in Unix

Add a column before column n:
awk 'BEGIN{FS=OFS="fs"}{$n = value OFS $n}1' filename.
I have tried this command but it doesn't work. What does the "n" represent here? Do I have to change the n to a value?
All together I have a file with 17 columns. I would like to add a new column in between column 6 and 7.
This is better achieved by looping over the fields:
Input file:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Then adding "18" in between 6 and 7:
awk '{ for (i=1;i<=6;i++) { printf "%s ",$i } printf "%s","18";for (i=7;i<=NF;i++) { printf " %s",$i } printf "\n" }' file
Explanation:
awk '{
    for (i=1;i<=6;i++) {
        printf "%s ",$i   # Print the first 6 space-delimited fields, each followed by a space to reproduce the delimiter
    }
    printf "%s","18";     # Print "18" with no spaces
    for (i=7;i<=NF;i++) {
        printf " %s",$i   # Print a space and then each remaining field (NF is the number of fields, so $NF is the last one)
    }
    printf "\n"           # Print a newline
}' file
Output:
1 2 3 4 5 6 18 7 8 9 10 11 12 13 14 15 16 17
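On the original question: in the template, n is a placeholder for the column number, and fs and value are placeholders too. Typed literally, n is an unset awk variable, so $n refers to $0 (the whole line). A shorter sketch for whitespace-separated input appends the new value to field n and lets awk rebuild the record. The function name insert_after_field is hypothetical, and note that assigning to a field normalizes runs of whitespace to single OFS characters.

```shell
#!/bin/sh
# insert_after_field N VALUE  (hypothetical helper)
# Inserts VALUE as a new column right after column N of each input line.
# Assumes whitespace-separated input; output is joined with single spaces.
insert_after_field() {
    awk -v n="${1:-1}" -v value="${2:-}" '{ $n = $n OFS value } 1'
}

echo '1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17' | insert_after_field 6 18
```

To insert before column n instead, the assignment would be `$n = value OFS $n`, which is the form shown in the question.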

Awk command to print from 3rd column to till nth column

How can I print from the 3rd column to the last column using awk in Unix, if there are 'n' columns in a file? I can do it with the cut command, but I need an awk command. I tried awk -F " " '{ for(i=3;i<=NF;i++) print $i}'; I get output, but not in the correct format. Can anyone suggest the proper command?
Combining Ed Morton's answers in:
Print all but the first three columns
delete a column with awk or sed
We get something like this:
awk '{sub(/^(\S+\s*){2}/,""); sub(/(\s*\S+){2}$/,"")}1'
#     ^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^^^
#     remove 2 first cols     remove 2 last cols
Which you can adapt to your exact needs in terms of columns.
See an example given this input:
$ paste -d ' ' <(seq 5) <(seq 2 6) <(seq 3 7) <(seq 4 8) <(seq 5 9)
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
Let's just print the 3rd column:
$ awk '{sub(/^(\S+\s*){2}/,""); sub(/(\s*\S+){2}$/,"")}1' <(paste -d ' ' <(seq 5) <(seq 2 6) <(seq 3 7) <(seq 4 8) <(seq 5 9))
3
4
5
6
7
Your attempt was close, but it prints every column on its own line.
To correct this, we create a variable called 'line' and initialize it to an empty string. On the first pass through the loop we just assign the column to 'line'. From then on we append the field separator and the next column to 'line'. Finally, we print 'line'. This happens for each line in the file.
awk '{line="";for(i=3;i<=NF;i++) if(i==3) line=$i; else line=line FS $i; print line}'
This example assumes awk's default field separator. Lines with fewer than three fields print as blank lines.
Assuming your fields are space-separated then with GNU awk for gensub():
$ cat file
a b c d e f
g h i j k l
$ awk '{print gensub(/(\S\s){2}/,"",1)}' file
c d e f
i j k l
In general to print from, say, field 3 to field 5 if they are blank separated using GNU awk again with gensub():
$ awk '{print gensub(/(\S\s){2}((\S\s){2}\S).*/,"\\2",1)}' file
c d e
i j k
or the 3rd arg to match():
$ awk 'match($0,/(\S\s){2}((\S\s){2}\S)/,a){print a[2]}' file
c d e
i j k
or in general if they are separated by any single character:
$ awk '{print gensub("([^"FS"]"FS"){2}(([^"FS"]"FS"){2}[^"FS"]).*","\\2",1)}' file
c d e
i j k
$ awk 'match($0,"([^"FS"]"FS"){2}(([^"FS"]"FS"){2}[^"FS"])",a){print a[2]}' file
c d e
i j k
If the fields are separated by a string instead of a single-character but the RS is a single character then you should temporarily change FS to RS (which by definition you KNOW can't be present in the record) so you can negate it in the bracket expressions:
$ cat file
aSOMESTRINGbSOMESTRINGcSOMESTRINGdSOMESTRINGeSOMESTRINGf
gSOMESTRINGhSOMESTRINGiSOMESTRINGjSOMESTRINGkSOMESTRINGl
$ awk -F'SOMESTRING' '{gsub(FS,RS)} match($0,"([^"RS"]"RS"){2}(([^"RS"]"RS"){2}[^"RS"])",a){gsub(RS,FS,a[2]); print a[2]}' file
cSOMESTRINGdSOMESTRINGe
iSOMESTRINGjSOMESTRINGk
If both the FS and the RS are multi-char then there are various options, but the simplest is to use the NUL character, or some other character you know can't appear in your input file, instead of RS as the temporary replacement FS:
$ awk -F'SOMESTRING' '{gsub(FS,"\0")} match($0,/([^\0]\0){2}(([^\0]\0){2}[^\0])/,a){gsub("\0",FS,a[2]); print a[2]}' file
cSOMESTRINGdSOMESTRINGe
iSOMESTRINGjSOMESTRINGk
Obviously change FS to OFS in the final gsub()s above if desired.
If the FS was a regexp instead of a string and you want to retain it in the output then you need to look at GNU awk for the 4th arg for split().
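The gensub() and 3-argument match() forms above depend on GNU awk. Where only a POSIX awk is available and normalized spacing is acceptable, a plain loop over the fields selects the same range. This is a sketch, not part of the answer above: the name fields_range and its arguments are illustrative, and the output is joined with OFS, so the original separators are not preserved.

```shell
#!/bin/sh
# fields_range FROM [TO]  (hypothetical helper)
# Prints fields FROM..TO of each line, or FROM..last when TO is omitted
# or larger than the number of fields. Works with any POSIX awk.
fields_range() {
    awk -v from="${1:-1}" -v to="${2:-}" '{
        first = from + 0                               # force numeric
        last  = (to == "" || to + 0 > NF) ? NF : to + 0
        line  = ""
        for (i = first; i <= last; i++)
            line = line (i > first ? OFS : "") $i
        print line
    }'
}

printf '%s\n' 'a b c d e f' 'g h i j k l' | fields_range 3 5
printf '%s\n' 'a b c d e f' 'g h i j k l' | fields_range 3
```

Lines with fewer than FROM fields come out as empty lines, matching the behavior of the loop-based answer earlier in this thread.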
If you don't mind normalizing the space, the most straightforward way is
$ awk '{$1=$2=""}1' | sed -r 's/^ +//'
in action
$ seq 11 40 | pr -6ts' ' | awk '{$1=$2=""}1' | sed -r 's/^ +//'
21 26 31 36
22 27 32 37
23 28 33 38
24 29 34 39
25 30 35 40
for the input
$ seq 11 40 | pr -6ts' '
11 16 21 26 31 36
12 17 22 27 32 37
13 18 23 28 33 38
14 19 24 29 34 39
15 20 25 30 35 40
To print from the third column to the end (the two blanked fields leave leading spaces in the output):
awk '{for(i=1;i<3;i++) $i="";print $0}' filename

AWK, Unix command: How to match two files using corresponding first column in unix command

I have two files. The first has a single column of IDs (with repeats). The second has three columns; its first column holds the same IDs, but each appears only once. For each ID in the first file, I want to print the remaining two columns from the second file.
Example:
First file:
IDs
1
3
6
7
11
13
13
14
18
20
Second file:
IDs Freq Status
1 1 JD611
2 1 QD51
3 2
5
6
7 2
11 2
13 2
14 2
Desired OUTPUT
1 1 JD611
3 2
6
7 2
11 2
13 2
13 2
14 2
18
20
You can use this awk:
awk 'NR==FNR{a[$1]=$2 FS $3; next} {print $1, a[$1]}' f2 f1
To skip the header line,
awk 'FNR==1{next} NR==FNR{a[$1]=$2 FS $3; next} {print $1, a[$1]}' f2 f1
If the second file has more columns,
awk 'NR==FNR{c=$1; $1=""; a[c]=$0; next} {print $1, a[$1]}' f2 f1
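The NR==FNR idiom reads the first file named on the command line into an array and then looks up each line of the second file. A self-contained sketch that combines the header skip with the multi-column variant might look like this; the function name lookup_ids and the temporary file names are assumptions for the demo, and it expects the lookup table to be non-empty.

```shell
#!/bin/sh
# lookup_ids TABLE IDS  (hypothetical helper)
# TABLE: header line, then "ID col2 col3 ..."; IDS: header line, then one ID
# per line (repeats allowed). Prints each ID followed by its stored columns.
lookup_ids() {
    awk '
        FNR == 1  { next }                       # skip the header of each file
        NR == FNR {                              # still reading TABLE
            id = $1; $1 = ""; sub(/^ +/, "")     # drop the ID, keep the rest
            a[id] = $0
            next
        }
        { print $1 ((($1 in a) && a[$1] != "") ? OFS a[$1] : "") }
    ' "$1" "$2"
}

tmp=$(mktemp -d)
printf '%s\n' 'IDs Freq Status' '1 1 JD611' '2 1 QD51' '3 2' '5' > "$tmp/f2"
printf '%s\n' 'IDs' 1 3 3 18 > "$tmp/f1"
lookup_ids "$tmp/f2" "$tmp/f1"
rm -r "$tmp"
```

IDs missing from the table, or present with no further columns, are printed on their own without a trailing separator.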

How to write a unix filter that outputs only a line every N lines

Suppose the filter is fed these lines on standard input:
line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
It would be nice if someone could tell me how to write a script that prints only every 4th line; for the example input above:
line 1
line 5
line 9
$ yes | cat -n | head -10 | awk 'NR % 4 == 1'
1 y
5 y
9 y
That is, your answer is "awk 'NR % 4 == 1'".
awk '{ if ((NR-1) %4 ==0) print}'
awk 'NR%4 == 1 {print}'</etc/hosts
Replace 4 by whatever value you want of course.
sed -ne '1~4p'
(GNU sed. Not tested on OSX, etc.)
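For a reusable filter, the step can be passed in as an awk variable instead of editing the number by hand. A minimal sketch, assuming a made-up function name every_nth with an optional starting line:

```shell
#!/bin/sh
# every_nth N [FIRST]  (hypothetical helper)
# Prints line FIRST (default 1) and every Nth line after it.
# Uses only POSIX awk, so it does not depend on GNU sed's first~step address.
every_nth() {
    awk -v n="${1:-1}" -v first="${2:-1}" 'NR >= first && (NR - first) % n == 0'
}

# Generate "line 1" .. "line 10" and keep every 4th line starting at line 1
awk 'BEGIN { for (i = 1; i <= 10; i++) print "line " i }' | every_nth 4
```

Calling `every_nth 4 2` on the same input would select lines 2, 6 and 10 instead.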

Resources