Unix: How to group every two columns?

I have a file like:
id1 A B C T G A B C
id2 G V L P A J M T
and I would like to have:
id1 AB CT GA BC
id2 GV LP AJ MT
Any suggestions?

Simple sed approach:
sed 's/\([A-Z]\) \([A-Z]\)/\1\2/g' file
Or awk alternative:
awk '{ r=$1; for(i=2;i<=NF;i+=2) r=r FS $i$(i+1); print r }' file
The output (for both approaches):
id1 AB CT GA BC
id2 GV LP AJ MT

Another sed:
$ sed -r 's/\s(\w+)\s(\w+)/ \1\2/g' file
id1 AB CT GA BC
id2 GV LP AJ MT
This works even if your id field uses the same character set as the data columns, since each match starts at a field separator and consumes whole whitespace-delimited fields.
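To illustrate with a hypothetical input whose id is itself two uppercase letters, the first sed merges the id into the data, while this version leaves it alone:
$ printf 'AB C D E F\n' | sed 's/\([A-Z]\) \([A-Z]\)/\1\2/g'
ABC DE F
$ printf 'AB C D E F\n' | sed -r 's/\s(\w+)\s(\w+)/ \1\2/g'
AB CD EF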

perl -np -e 's/ ([A-Z]) ([A-Z])/ $1$2/g' <FILENAME
Replace FILENAME with the name of the input file.

awk '{for(i=1;i<=NF;i++){printf (i%2)?$i" ":$i}printf RS}' file
If the field index i is odd, print $i followed by a space; otherwise print $i with no separator.
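A note on that one-liner: it uses the field value itself as the printf format string, which can misbehave if a field ever contains a % character. A minimal sketch of the same idea with a fixed format string:
awk '{ for (i=1; i<=NF; i++) printf "%s%s", $i, (i%2 ? " " : ""); print "" }' file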

Related

Get Numeric Filename Suffix for Unix Split

The Unix split command produces filenames with suffixes aa to zz. There is no way to change the split command itself, so the only way is to change the filenames afterward.
I would like a way to change the suffixes aa to zz into 001, 002, ...
Can anyone help?
typeset -i loopI
typeset -Z3 newSuf
typeset -i numFile
set -A suf aa ab ac ad ae af ag ah ai aj ak al am an ao ap aq ar as at au av aw ax ay az
numFile=`echo ${#suf[*]}`
numFile=$numFile-1
fileName=foo
split -100 $fileName.txt $fileName.
loopI=0
cont=y
while [ $cont = "y" ]; do
    newSuf=`expr $loopI + 1`
    mv $fileName.${suf[$loopI]} $fileName.$newSuf
    loopI=$loopI+1
    if [ $loopI -le $numFile ]; then
        if [ ! -f $fileName.${suf[$loopI]} ]; then
            cont=n
        fi
    else
        cont=n
    fi
done
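As a side note: the question rules out changing the split command, but if a GNU coreutils split happens to be available, it can produce numeric suffixes directly and the renaming loop is unnecessary. A sketch, assuming GNU split (coreutils 8.16 or later for --numeric-suffixes=FROM):
split -l 100 -a 3 --numeric-suffixes=1 foo.txt foo.
This creates foo.001, foo.002, ... instead of foo.aa, foo.ab, ...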

Cut specific columns and collapse with delimiter in Unix

Say I have 6 different columns in a text file (as shown below)
A1 B1 C1 D1 E1 F1
1 G PP GG HH GG
z T CC GG FF JJ
I would like to extract the first, second and fourth columns, collapsed together as A1_B1_D1, and the third column, separated by a tab.
So the result would be:
A1_B1_D1 C1
1_G_GG PP
z_T_GG CC
I tried
cut -f 1,2,4 -d$'\t' 3, but it is just not what I want.
If you need to maintain your column alignment, you can check the length of the combination of fields 1, 2 and 4 and add one or two tab characters as necessary:
awk '{
    printf (length($1"_"$2"_"$4) >= 8) ? "%s_%s_%s\t%s\n" : "%s_%s_%s\t\t%s\n",
        $1, $2, $4, $3
}' file
Example Output
A1_B1_D1 C1
1_G_GG PP
z_T_GG CC
Could you please try the following:
awk '
BEGIN{
  OFS="\t"
}
{
  print $1"_"$2"_"$4,$3
}
' Input_file
I've tried RavinderSingh13's code and it has the same output as mine, but I don't quite know the difference. Anyway, here it is:
awk -F ' ' '{print $1"_"$2"_"$4"\t"$3}' /path/to/file
This might work for you (GNU sed):
sed -E 's/^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+.*/\1_\2_\4\t\3/' file
Use pattern matching and back references.
\S+ means one or more non-white space characters.
\s+ means one or more white space characters.
\t represents a tab.
Another awk, using column -t for formatting.
$ cat cols_345.txt
A1 B1 C1 D1 E1 F1
1 G PP GG HH GG
z T CC GG FF JJ
$ awk -v OFS="_" '{ $3="\t"$3; print $1,$2,$4 $3 } ' cols_345.txt | column -t
A1_B1_D1 C1
1_G_GG PP
z_T_GG CC
$

Compare size of each cell in Unix Scripting

I want to compare each cell size/length and change its content depending on its length.
The current table is of the format:
AB
CD
AB
AB
CD
155668/01
AB
1233/10
I want to replace the cells whose length is more than 2 with DE.
Output
AB
CD
AB
AB
CD
DE
AB
DE
I tried
awk -F "," '{ if($(#1) > "2") {print "DE"} else {print $1 }}'
It says syntax error.
If I use wc -m in place of $(#, the output is the same as the input.
The easiest way is to use sed:
sed '/^..$/!s/.*/DE/' file
In awk, you could say:
awk '!/^..$/ { $0 = "DE" } 1' file
In both cases, the idea is the same: if the line does not consist of exactly two characters, replace the whole line with DE. In the case of sed, the whole line is .*, in the case of awk, it is $0.
Try this -
$ awk '{print (length($1)>2?"DE":$1)}' f
AB
CD
AB
AB
CD
DE
AB
DE
The idiomatic way would be:
awk 'length($1) > 2 { $1 = "DE" } 1'
(The trailing 1 is an always-true condition, so awk performs its default action of printing the record.)

Drop or remove column using awk

I want to drop the first 3 columns.
This is my data:
DETAIL 02032017
Name Gender State School Class
A M Melaka SS D
B M Johor BB E
C F Pahang AA F
EOF 3
I want my data like this:
DETAIL 02032017
School Class
SS D
BB E
AA F
EOF 3
This is my current command, from which I get mycommandoutput:
awk -v date="$(date +"%d%m%Y")" -F\| 'NR==1 {h=$0; next}
{file="TEST_"$1"_"$2"_"date".csv";
print (a[file]++?"": "DETAIL"date"" ORS h ORS) $0 > file} END{for(file in a) print "EOF " a[file] > file}' testing.csv
Can anyone help me?
Thank you :)
I want to remove the first three columns.
If you just want to remove the first three columns, you can set them to empty strings, leaving alone the lines that don't have three columns, something like:
awk 'NF>=3 {$1=""; $2=""; $3=""; print; next}{print}'
That has the potentially annoying habit of still having the field separators between those empty fields but, since modifying columns will reformat the line anyway, I assume that's okay:
DETAIL 02032017
School Class
SS D
BB E
AA F
EOF 3
If awk is the only tool being used to process them, the spacing won't matter. If you do want to preserve formatting (meaning that the columns are at very specific locations on the line), you can just get a substring of the entire line:
awk '{if (NF>=3) {$0 = substr($0,25)}; print}'
Since that doesn't modify individual fields, it won't trigger a recalculation of the line that would change its format:
DETAIL 02032017
School Class
SS D
BB E
AA F
EOF 3
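If the leftover separators from the field-blanking approach do bother you, another option is to strip the first three fields with sub() instead of emptying them. A sketch, assuming space-separated fields as in the sample data:
awk 'NF>=3 { sub(/^[^ ]+ +[^ ]+ +[^ ]+ +/, "") } { print }'
Lines with fewer than three fields (the DETAIL and EOF lines here) pass through untouched.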

How to produce the same string for x number of lines and paste 2 files?

How to produce the same string for x number of lines and then use paste to combine the files:
I have a file with an unknown number of lines, e.g.:
$ echo -e "a\tb\tc\nd\te\tf" > in.txt
$ cat in.txt
a b c
d e f
I want to concatenate the files with a new column that has the same string for every row. I have tried using echo to create a file and then using paste to concatenate the columns, but I have to know the number of rows in in.txt first and then create an in2.txt using echo.
$ echo -e "a\tb\tc\nd\te\tf" > in.txt
$ cat in.txt
a b c
d e f
$ echo -e "x\nx\n" > in2.txt
$ paste in.txt in2.txt
a b c x
d e f x
How else can I achieve the same output for an unknown number of lines in in.txt? E.g.:
[in:]
a b c
d e f
[out:]
a b c x
d e f x
My data consists of a million lines with 3 columns in in.txt, each line 50-200 chars, so the solution needs to keep the "big" data size in mind.
One way with join:
echo | join input - -a 1 -o "1.1 1.2 1.3 2.1" -e x
Though just doing a sed replace should be much better.
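For example, a sketch of that sed replace, appending a tab and a literal x to every line (GNU sed understands \t in the replacement; with other seds insert a literal tab instead), plus an awk equivalent:
sed 's/$/\tx/' in.txt
awk -v OFS='\t' '{ print $0, "x" }' in.txt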
