I have one (fairly large) file, formatted like this:
SET1
A B C D E F G
SET2
H I J K L M
SETX
(...)
etc.
I would prefer to have the sets as columns:
SET1 SET2 SETX
A H (...)
B I
C J
D K
E L
F M
G
Note that the columns are of unequal length and are not ordered by size. My file is too big for the built-in Unix column command, and my attempts at splitting the file and pasting the pieces back together have been problematic: the empty cells ended up with the same content as the separator (both were "\t"), which doesn't work for my purposes. Each set may contain several hundred entries, and I have thousands of sets, which makes awk impractical (at least with my admittedly limited skills there).
Ideally, the output should be readable in R, but at this point I'd be very happy with something that can practically be translated into R input. I can live with a non-whitespace separator if that is more practical.
Many thanks in advance for any help! I am working in an external Linux environment.
Edit:
I also have the file available as
SET1
A
B
C
D
E
F
G
SET2
H
I
J
K
L
M
in case that makes it easier.
I guess this is more what you wanted:
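awk -v OFS="\t" '
  # A header line starts a new column; track the longest column seen so far
  /^SET/ {sets[++cols]=$0; max_recs=(c>max_recs?c:max_recs); c=0; next}
  # Any other non-empty line is the next entry of the current column
  NF {a[cols,++c]=$0}
  END {
    max_recs=(c>max_recs?c:max_recs)  # the last set has no header after it
    for (i=1; i<=cols; i++) printf "%s%s", sets[i], (i<cols?OFS:ORS)
    for (i=1; i<=max_recs; i++)
      for (j=1; j<=cols; j++) printf "%s%s", a[j,i], (j<cols?OFS:ORS)
  }' file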
For this given input:
SET1
B
C
D
E
F
G
SET2
H
I
J
K
L
M
AAA
SET3
A
B
C
D
It returns:
$ awk -v OFS="\t" '/^SET/ {sets[++cols]=$0; max_recs=(c>max_recs?c:max_recs); c=0; next} NF{a[cols,++c]=$0} END {max_recs=(c>max_recs?c:max_recs); for (i=1;i<=cols; i++) printf "%s%s", sets[i], (i<cols?OFS:ORS); for (i=1; i<=max_recs; i++) for (j=1; j<=cols; j++) printf "%s%s", a[j,i], (j<cols?OFS:ORS)}' file
SET1 SET2 SET3
B H A
C I B
D J C
E K D
F L
G M
AAA
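For the original layout from the question (a name line followed by one line of space-separated members), a similar approach may work. This is only a sketch: the function name transpose_sets and the NA filler are assumptions, not part of the answer above. NA was picked because R reads it as a missing value by default, which also keeps empty cells distinct from the tab separator.

```shell
# Hypothetical helper: transpose "name line / members line" pairs into columns.
transpose_sets() {
    awk -v OFS='\t' '
        NR % 2 { name[++cols] = $0; next }          # odd lines hold the set name
        {                                           # even lines hold the members
            for (k = 1; k <= NF; k++) cell[cols, k] = $k
            if (NF > max) max = NF                  # longest column so far
        }
        END {
            for (j = 1; j <= cols; j++)
                printf "%s%s", name[j], (j < cols ? OFS : ORS)
            for (i = 1; i <= max; i++)
                for (j = 1; j <= cols; j++)
                    printf "%s%s", (((j, i) in cell) ? cell[j, i] : "NA"), (j < cols ? OFS : ORS)
        }
    ' "$1"
}
```

Called as transpose_sets file, it writes a tab-separated table with one column per set and NA in the cells of shorter sets.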
Previous solution with just one block.
You can use paste to show files side by side.
In this case, let's use head and tail to get each half of the file, then xargs -n1 to print one item per line. The two streams are then ready to be pasted. Note that this assumes the file holds exactly two sets (four lines):
paste -d"\t" <(head -2 file | xargs -n1) <(tail -2 file | xargs -n1)
For your given input it returns:
SET1 SET2
A H
B I
C J
D K
E L
F M
G
Scenario: Give production rules for a RIGHT-recursive grammar that
describes the set of all non-empty strings made from the characters
R and N, which may contain arbitrarily many contiguous
repetitions of R, but precisely two or precisely three contiguous
repetitions of N.
Answer:
A -> N B | R+ A
B -> N D | N C | N ε
C -> N D | N ε
D -> R+ D | R ε
Incorrect:
A -> NNB | NNNB | RA | R
B -> R | RA | ε
Edit: the above is not correct; I misunderstood the scenario.
Correct:
S -> RS | A
A -> NA | NB
B -> RB | RC
C -> NC | ND
D -> RD | RE | ε
E -> NE | NF
F -> RF | ε
How it works:
It starts with S, which can generate zero or more Rs or move to A, which generates the first group of Ns. Then it moves to B, which generates the Rs between the first and second groups of Ns. Then it moves to C, which generates the second group of Ns. Then it moves to D, which can generate zero or more Rs and either finish or move to E, which generates the third group of Ns. Lastly, it moves to F, which generates zero or more Rs.
This works just as well and is simpler:
S -> RS | A
A -> NA | NB
B -> RB | RC
C -> NC | ND
D -> RD | E
E -> NE | F
F -> RF | ε
It is the same up to D. Instead of offering ε directly, D either adds more Rs or moves to E, which can generate another group of Ns. If D produced no Rs before moving to E, those Ns simply extend the group started at C, so no new group is created. E then moves to F, which recursively adds Rs or ends with the empty string.
Example parse tree generated from the input NRNR
S
 \
  A
 / \
N   B
   / \
  R   C
     / \
    N   D
       / \
      R   D
           \
            E
             \
              F
               \
                ε
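Because every rule in the last grammar is right-linear, the language it generates is regular, so candidate strings can be sanity-checked with a regular expression. The pattern below is my own reading of the rules S to F (R* N+ R+ N+ R* N* R*), not something stated in the answers, and the function name matches_grammar is hypothetical.

```shell
# Hypothetical check: succeeds if the string is generated by the grammar S..F above.
# One factor per nonterminal: S=R*, A=N+, B=R+, C=N+, D=R*, E=N*, F=R*
matches_grammar() {
    printf '%s\n' "$1" | grep -Eq '^R*N+R+N+R*N*R*$'
}
```

For example, matches_grammar NRNR succeeds (the string from the parse tree), while matches_grammar NN fails because no R separates two groups of Ns.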
Given a file that contains something like this:
1 2 3 4
5 6 7 8
a b c d
e f g h
Is there any Unix command that I could use to circularly shift the rows and columns?
I am looking for something like, say,
circular_shift -r 2 <file> (shift rows by 2), to give:
a b c d
e f g h
1 2 3 4
5 6 7 8
and
circular_shift -c 2 <file> (shift columns by 2), to give:
3 4 1 2
7 8 5 6
c d a b
g h e f
Thanks!
Using awk for the row shift, processing the file twice:
$ awk -v r=2 'NR==FNR && FNR>r || NR>FNR && FNR<=r' file file
a b c d
e f g h
1 2 3 4
5 6 7 8
Basically, it prints the records where FNR > r on the first pass over the file and those where FNR <= r on the second pass.
Edit: a version that shifts both records and fields:
$ awk -v r=1 -v c=1 '
NR==FNR && FNR>r || NR>FNR && FNR<=r {
j=0;
for(i=c+1;++j<=NF;i=(i<NF?i+1:1)){
printf "%s%s",$i,(i==c?ORS:OFS)
}
}
' foo foo
6 7 8 5
b c d a
f g h e
2 3 4 1
(Pretty much untested as I'm in a meeting... it fails at least for c=0)
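The c=0 failure mentioned above could probably be avoided by computing each source field with modular arithmetic instead of walking the index around. A minimal sketch, assuming a non-negative shift count; the function name shift_fields_left is hypothetical:

```shell
# Hypothetical helper: rotate the fields of every line left by $1 positions.
shift_fields_left() {
    awk -v c="$1" '
        NF == 0 { print; next }             # nothing to rotate on an empty line
        {
            k = c % NF                      # effective shift, 0 <= k < NF
            for (j = 1; j <= NF; j++) {
                i = (j + k - 1) % NF + 1    # source field for output position j
                printf "%s%s", $i, (j < NF ? OFS : ORS)
            }
        }
    ' "$2"
}
```

With c=0 the lines come out unchanged, and a count larger than the number of fields wraps around.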
Another solution, using arrays of arrays in gawk (version 4.0 or later):
circular_shift.awk
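# Store every field, indexed by record number and field number
{for(i=1; i<=NF; ++i){d[NR][i]=$i}}
END{
  c=c%NF; r=r%NR
  for(i=1; i<=NR; ++i){
    nr = i + (i>r?0:NR) - r   # source row for output row i
    for(j=1; j<=NF; ++j){
      nc = j + (j>c?0:NF) - c   # source column for output column j
      # Pass the data as an argument so a % in the input is not read as a format
      printf "%s%s", d[nr][nc], (j!=NF?OFS:ORS)
    }
  }
}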
awk -vr=2 -f circular_shift.awk file
a b c d
e f g h
1 2 3 4
5 6 7 8
awk -vc=2 -f circular_shift.awk file
3 4 1 2
7 8 5 6
c d a b
g h e f
awk -vr=2 -vc=2 -f circular_shift.awk file
c d a b
g h e f
3 4 1 2
7 8 5 6
Shifting Rows
You can use head, tail and the shell:
function circular_shift() {
n=$1
file=$2
tail -n +"$((n+1))" "$file"
head -n "$n" "$file"
}
Call the function like this:
circular_shift 2 <file>
One restriction: the function above only works for n <= nlines(file). To remove that restriction, you need to know the length of the file in advance and use the modulo operator:
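function circular_shift() {
  n=$1
  file=$2
  # Reading from stdin makes wc print the count alone; $(( )) strips any padding
  len=$(($(wc -l < "$file")))
  [ "$len" -gt 0 ] || return 0   # nothing to shift in an empty file
  n=$((n%len))
  tail -n +"$((n+1))" "$file"
  head -n "$n" "$file"
}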
Now try to call:
circular_shift 6 <file>
Shifting Columns
For the column shift I would use awk:
column-shift.awk
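{
  k = (NF ? n % NF : 0)   # effective shift; n itself stays untouched for later lines
  c = 1
  for(i=NF-k+1; i<=NF; i++) {   # the last k fields come first
    a[c++] = $i
  }
  for(i=1; i<NF-k+1; i++) {     # followed by the remaining fields
    a[c++] = $i
  }
  for(i=1; i<c; i++) {
    $i = a[i]
  }
  print
}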
Wrap it in a shell function:
function column_shift() {
n="$1"
file="$2"
awk -v n="$n" -f column-shift.awk "$file"
}
@Vivek V K, try the following.
To move the first count rows to the end of the file:
awk -vcount=2 'NR>count{print;next} NR<=count{Q=Q?Q ORS $0:$0} END{print Q}' Input_file
To shift the fields, try the following:
awk -vcount=2 '{for(i=count+1;i<=NF;i++){Q=Q?Q FS $i:$i};for(j=1;j<=count;j++){P=P?P FS $j:$j};print Q FS P;Q=P=""}' Input_file
awk -v C=$1 -v R=$2 '
function PrintReverse () {
if( ! R ) return
for( i=1; i>=0; i--) {
for( j=1; j<=R; j++) {
#print "DEBUG:: i: "i " j:" j " i * R + j :" i * R + j " lr:" lr
print L[ i * R + j ]
L[ i * R + j ] = ""
}
}
}
{
if( C ) {
# Reverse Column
for ( i=1; i<=NF; i+=2*C) {
for( j=0; j<C; j++) {
#print "DEBUG:: i: "i " j:" j " NF:" NF
tmp = $(i+j)
$(i+j) = $(i+j+C)
$(i+j+C) = tmp
}
}
$1=$1
}
if ( R ) {
# Line buffer
lr = ( FNR - 1 ) % ( R * 2 ) + 1
L[ lr] = $0
}
else print
}
lr >= ( R * 2) { PrintReverse() }
END { if( lr < ( R * 2 )) PrintReverse() }
' YourFile
This performs both of the requested operations.
R is the number of rows and C the number of columns to swap.
It uses two nested loops, which is not the fastest approach but the most explicit one for understanding the concept in this case.
Rows are permuted through a buffer: lines are loaded into a buffer of 2 * R lines, and its two halves are printed in swapped order.
Columns are permuted by swapping fields: the loop advances in steps of 2 * C, swapping each field with the field C positions further on.
Rows are handled once the buffer is full (that is, every 2 * R lines).
Columns are handled on every line.
The tests ( C ), ( ! R ), ... allow a single operation (rows only or columns only).
Note that this swaps adjacent blocks, so it matches a circular shift only when the file has exactly 2 * R rows and 2 * C columns, as in the example.
I am trying to add a counter to a specific string using Unix tools. I have tried some sed and awk commands, but I can't seem to get it right.
My input file is:
Event_ A D L K
Event_ B P R
Event_ C F I
Event_ J K
M
N
O
Event_ Q S
X
Y
Z
G
T
What I'm hoping to get is:
Event_00000001 A D L K
Event_00000002 B P R
Event_00000003 C F I
Event_00000004 J K
M
N
O
Event_00000005 Q S
X
Y
Z
G
T
Can anyone help?
Use this awk:
awk '/^Event/{$1=sprintf("%s%08d", $1,++counter)}1' yourfile
If fields are delimited by \t (Tab),
awk -F"\t" '/^Event/{$1=sprintf("%s%08d", $1,++counter)}1' OFS='\t' yourfile
Test:
$ awk '/^Event/{$1=sprintf("%s%08d", $1,++counter)}1' file
Event_00000001 A D L K
Event_00000002 B P R
Event_00000003 C F I
Event_00000004 J K
M
N
O
Event_00000005 Q S
X
Y
Z
G
T
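Assigning to $1 makes awk rebuild the whole record with OFS, which is why the tab-delimited case needs its own variant. If the original spacing has to survive untouched, substituting on the line itself may be simpler. A minimal sketch, assuming the eight-digit width shown in the question; the function name number_events is hypothetical:

```shell
# Hypothetical helper: number the Event_ lines without touching the rest of the line.
number_events() {
    awk '/^Event_/ { sub(/^Event_/, sprintf("Event_%08d", ++n)) } 1' "$1"
}
```

Because sub() only replaces the matched prefix, tabs and repeated spaces after Event_ are left exactly as they were.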
Is it possible to have nested if statements without else? I wrote the following (useless) program to demonstrate nested ifs. How do I fix it so that it is syntactically correct? Lines 5 and 6 give errors.
let rec move_helper b sz r = match b with
[] -> r
|(h :: t) ->
if h = 0 then
if h - 1 = sz then h - 1 ::r
if h + 1 = sz then h + 1 ::r
else move_helper t sz r
;;
let move_pos b =
move_helper b 3 r
;;
let g = move_pos [0;8;7;6;5;4;3;2;1]
You can't have if without else unless the result of the expression is of type unit. This isn't the case for your code, so it's not possible.
Here's an example where the result is unit:
let f x =
if x land 1 <> 0 then print_string "1";
if x land 2 <> 0 then print_string "2";
if x land 4 <> 0 then print_string "4"
You must understand that if ... then is an expression like any other. If no else is present, it must be understood as if ... then ... else () and thus has type unit. To emphasize the fact that it is an expression, suppose you have two functions f and g of type, say, int → int. You can write
(if test then f else g) 1
You must also understand that x :: r does not change r at all; it constructs a new list with x in front of r (the tail of this new list is shared with the list r). In your case, the logic is not clear: what is the result when h = 0 but both inner ifs fail?
let rec move_helper b sz r = match b with
| [] -> r
| h :: t ->
if h = 0 then
if h - 1 = sz then (h - 1) :: r
else if h + 1 = sz then (h + 1) :: r
else (* What do you want to return here? *)
else move_helper t sz r
When you write an if, always supply an else unless the then branch has type unit. Without an else, OCaml reads the expression as if ... then ... else (), so the then branch must have type unit as well.
I have several files that look like these, e.g. test.in:
apple foo bar
hello world
I need to achieve this desired output, a space after every character:
a p p l e f o o b a r
h e l l o w o r l d
I thought I could first remove all spaces and then add a space after each character, like this:
sed 's/\s//g' test.in | sed -e 's/\(.\)/\1 /g'
But are there other ways?
This awk may do:
awk -v FS="" '{gsub(/ /,"");$1=$1}1' file
a p p l e f o o b a r
h e l l o w o r l d
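This first removes all spaces. Then, since FS (the field separator) is set to the empty string, every character becomes its own field, and $1=$1 rebuilds the record with a single space between fields. Splitting into characters with an empty FS works in gawk and mawk, but POSIX leaves the behavior unspecified.
Unlike most of the sed and perl commands here, this does not add a space at the end of the line.
Or, based on the sed command posted here: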
awk '{gsub(/ /,"");gsub(/./,"& ")}1' file
a p p l e f o o b a r
h e l l o w o r l d
You can combine your two sed commands into a single command instead:
$ sed 's/\s//g;s/./& /g' test.in
a p p l e f o o b a r
h e l l o w o r l d
Note the use of . and & instead of \(.\) and \1.
On systems that do not support \s to designate matching whitespace, you can use [[:blank:]] instead:
$ sed 's/[[:blank:]]//g;s/./& /g' test.in
a p p l e f o o b a r
h e l l o w o r l d
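The second substitution above also appends a space after the last character of each line. If that trailing space is unwanted, a third substitution can trim it. A minimal sketch; the function name space_out is hypothetical:

```shell
# Hypothetical helper: one space between characters, no trailing space.
space_out() {
    # 1) drop blanks, 2) add a space after every character, 3) trim the final space
    sed 's/[[:blank:]]//g; s/./& /g; s/ $//' "$1"
}
```

Called as space_out test.in, it prints the same lines as above, minus the space at the end of each line.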
With perl:
$ perl -ple 's/([^ ]|^)(?! )/$1 /g' file
a p p l e f o o b a r
h e l l o w o r l d
Add the in-place edit option -i to save the changes to the file:
perl -i -ple 's/([^ ]|^)(?! )/$1 /g' file
sed 's/ //g;s/./& /g' filename
Here, & refers to the portion of the pattern space that matched.
Or maybe something like this with sed, which adds a space after every character and then squeezes each run of spaces down to one:
$ sed 's/./& /g;s/ \{2,\}/ /g' file
a p p l e f o o b a r
h e l l o w o r l d
This might work for you (GNU sed):
sed 's/\B/ /g' file