Get Numeric Filename Suffix for Unix Split

The Unix split command produces filenames with the suffixes aa to zz, and there seems to be no way to change this in split itself, so the only option is to rename the files afterward.
I would like a way to change the suffixes aa, ab, ... into 001, 002, ...
Can anyone help?

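#!/bin/ksh
# Rename split's alphabetic suffixes (aa, ab, ...) to zero-padded numbers (001, 002, ...).
typeset -i loopI numFile
typeset -Z3 newSuf                      # -Z3: right-justify and zero-fill to 3 digits
# Suffixes split generates, in order (only aa..az here; extend the list for more pieces)
set -A suf aa ab ac ad ae af ag ah ai aj ak al am an ao ap aq ar as at au av aw ax ay az
numFile=$(( ${#suf[*]} - 1 ))           # index of the last suffix in the array
fileName=foo
split -l 100 "$fileName.txt" "$fileName."   # 100 lines per piece: foo.aa, foo.ab, ...
loopI=0
# Rename pieces in order until one is missing or the suffix list runs out
while [ "$loopI" -le "$numFile" ] && [ -f "$fileName.${suf[$loopI]}" ]; do
    newSuf=$((loopI + 1))               # zero-padded by typeset -Z3, e.g. 001
    mv "$fileName.${suf[$loopI]}" "$fileName.$newSuf"
    loopI=$((loopI + 1))
done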
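If ksh is not available, a similar rename can be sketched in plain POSIX sh. This is a minimal sketch, assuming split's default two-letter lowercase suffixes and that no other files match PREFIX.[a-z][a-z]; it relies on the shell returning glob matches in sorted order (aa, ab, ...). renumber_split is a hypothetical name. Where GNU split is available, split -d -a 3 -l 100 foo.txt foo. writes numeric suffixes directly, although they start at 000 rather than 001.

```shell
#!/bin/sh
# Hypothetical helper: renumber_split PREFIX
# Renames PREFIX.aa, PREFIX.ab, ... (split's default two-letter suffixes)
# to PREFIX.001, PREFIX.002, ..., in the sorted order the glob returns.
renumber_split() {
    prefix=$1
    n=0
    for f in "$prefix".[a-z][a-z]; do
        [ -e "$f" ] || continue            # glob matched nothing: pattern left as-is
        n=$((n + 1))
        mv -- "$f" "$prefix.$(printf '%03d' "$n")"
    done
}

# Usage, assuming foo.txt exists in the current directory:
#   split -l 100 foo.txt foo.
#   renumber_split foo
```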

Related

Cut specific columns and collapse with delimiter in Unix

Say I have 6 different columns in a text file (as shown below)
A1 B1 C1 D1 E1 F1
1 G PP GG HH GG
z T CC GG FF JJ
I would like to join the first, second and fourth columns with underscores (A1_B1_D1) and put the third column after them, separated by a tab.
So the result would be:
A1_B1_D1 C1
1_G_GG PP
z_T_GG CC
I tried
cut -f 1,2,4 -d$'\t' 3, but it is just not what I want.
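If you need to maintain your column alignment, you can check the length of the combination of fields 1, 2 and 4 and add one or two tab characters as necessary. With the usual tab stops every 8 columns, a joined value shorter than 8 characters needs two tabs to reach column 16, while one of 8 to 15 characters needs only one: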
awk '{
printf (length($1"_"$2"_"$4) >= 8) ? "%s_%s_%s\t%s\n" : "%s_%s_%s\t\t%s\n",
$1,$2,$4,$3
}' file
Example Output
A1_B1_D1 C1
1_G_GG PP
z_T_GG CC
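If tab counting feels fragile, a different technique is to pad with spaces using a printf field width, so the third column always starts at the same position. A minimal sketch; the width of 16 and the function name join_padded are assumptions, and a joined value longer than the width simply pushes that line's third column to the right:

```shell
#!/bin/sh
# Hypothetical helper: join fields 1, 2 and 4 with "_", left-justified in a
# 16-character column (an assumed width), then print field 3.
join_padded() {
    awk '{ printf "%-16s%s\n", $1 "_" $2 "_" $4, $3 }' "$@"
}
```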
Could you please try the following?
awk '
BEGIN{
OFS="\t"
}
{
print $1"_"$2"_"$4,$3
}
' Input_file
I've tried RavinderSingh13's code and it produces the same output as mine, but I don't quite know the difference. Anyway, here it is:
awk -F ' ' '{print $1"_"$2"_"$4"\t"$3}' /path/to/file
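The difference between the two awk answers is only where the tab comes from: one sets OFS to a tab and lets the comma in print insert it, the other concatenates a literal "\t" into one string. (-F ' ' is also awk's default field splitting.) A small check that both produce identical output, using hypothetical function names:

```shell
#!/bin/sh
# RavinderSingh13's version: OFS is a tab, and the comma in print inserts it.
with_ofs()     { awk 'BEGIN { OFS = "\t" } { print $1 "_" $2 "_" $4, $3 }' "$@"; }
# The other version: a literal "\t" concatenated into a single string.
with_literal() { awk '{ print $1 "_" $2 "_" $4 "\t" $3 }' "$@"; }
```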
This might work for you (GNU sed):
sed -E 's/^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+.*/\1_\2_\4\t\3/' file
Use pattern matching and back references.
\S+ means one or more non-white space characters.
\s+ means one or more white space characters.
\t represents a tab.
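\S, \s and \t are GNU sed extensions. A sketch of the same back-reference idea for other seds, assuming the sed accepts -E (GNU and BSD sed both do): POSIX bracket classes replace \S and \s, and a real tab stored in a shell variable replaces \t. collapse_cols is a hypothetical name, and unlike the GNU version this pattern needs only four fields:

```shell
#!/bin/sh
# A real tab character; command substitution strips only trailing newlines.
tab=$(printf '\t')
# Hypothetical helper: fields 1, 2, 4 joined with "_", then a tab, then field 3.
collapse_cols() {
    sed -E "s/^([^[:space:]]+)[[:space:]]+([^[:space:]]+)[[:space:]]+([^[:space:]]+)[[:space:]]+([^[:space:]]+).*/\1_\2_\4${tab}\3/" "$@"
}
```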
Another awk solution, using column -t for formatting:
$ cat cols_345.txt
A1 B1 C1 D1 E1 F1
1 G PP GG HH GG
z T CC GG FF JJ
$ awk -v OFS="_" '{ $3="\t"$3; print $1,$2,$4 $3 } ' cols_345.txt | column -t
A1_B1_D1 C1
1_G_GG PP
z_T_GG CC
$

Unix: How to group every two columns?

I have a file like:
id1 A B C T G A B C
id2 G V L P A J M T
and I would like to have:
id1 AB CT GA BC
id2 GV LP AJ MT
Any suggestions?
Simple sed approach:
sed 's/\([A-Z]\) \([A-Z]\)/\1\2/g' file
Or an awk alternative:
awk '{ r=$1; for(i=2;i<=NF;i+=2) r=r FS $i$(i+1); print r }' file
The output (for both approaches):
id1 AB CT GA BC
id2 GV LP AJ MT
Another sed:
$ sed -r 's/\s(\w+)\s(\w+)/ \1\2/g' file
id1 AB CT GA BC
id2 GV LP AJ MT
This works even if your id field uses the same character set, because the id field has no leading whitespace and so is never matched.
perl -pe 's/ ([A-Z]) ([A-Z])/ $1$2/g' <FILENAME
Replace FILENAME with the name of the input file.
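awk '{for(i=1;i<=NF;i++) printf "%s%s", $i, ((i%2 && i<NF) ? " " : ""); print ""}' file
If the field number is odd, print the field followed by a space (unless it is the last field); otherwise print it with nothing after it, which glues each even-numbered field to the field that follows it. Using "%s" as the format keeps a % in the data from being read as a format directive.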
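The same idea extends to gluing fields in groups of any size. A minimal sketch, assuming field 1 is an id to keep as is; group_cols and its group-size argument are hypothetical, and a short final group keeps whatever fields remain:

```shell
#!/bin/sh
# Hypothetical helper: group_cols N [FILE...]
# Keeps field 1 unchanged, then concatenates the remaining fields in groups of N.
group_cols() {
    n=$1
    shift
    awk -v n="$n" '{
        out = $1
        for (i = 2; i <= NF; i += n) {
            grp = ""
            for (j = i; j < i + n && j <= NF; j++) grp = grp $j   # up to n fields per group
            out = out " " grp
        }
        print out
    }' "$@"
}
```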

deleting repetitive columns in unix

I would like to delete multiple repeated columns from a huge file (about 1 million).
The columns I want to delete all have the same column name, A, while the other columns have different, unique names. Say:
A B2 A B3
1.1 AA 1.2 AA
2.1 AB 4.3 CT
2.2 AC 6.4 GT
So the column headers are A, B2, A, B3, ...
How can I delete the columns named A from the data?
Another solution in awk:
$ awk '
NR==1 {
split($0,a)
for(i in a)
if(a[i]=="A")
delete a[i]
}
{
for(i=1;i<=NF;i++)
printf "%s",(i in a?$i OFS:"")
printf ORS
}' file
B2 B3
AA AA
AB CT
AC GT
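A variant of the same approach, with the header name passed in via -v and the kept fields joined with OFS so no trailing separator is printed. A sketch only; drop_cols is a hypothetical name, and the file is assumed to be whitespace-separated with the header on line 1:

```shell
#!/bin/sh
# Hypothetical helper: drop_cols NAME [FILE...]
# Removes every column whose header (line 1) equals NAME.
drop_cols() {
    name=$1
    shift
    awk -v name="$name" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i != name) keep[++k] = i }  # field numbers to keep
        {
            line = ""
            for (j = 1; j <= k; j++) line = line (j > 1 ? OFS : "") $(keep[j])
            print line
        }' "$@"
}
```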
I'm not sure I'm understanding your question correctly, but here is a (GNU) awk solution that deletes all duplicate columns (keeping only the first occurrence):
#!/usr/bin/awk -f
NR==1 {
seen[$1] = 1
cols[0] = 1
for (i=2; i<=NF; i++) {
if (!($i in seen)) {
seen[$i] = 1
cols[length(cols)] = i
}
}
}
{
for (i=0; i<length(cols); i++)
printf "%s ", $(cols[i])
printf "\n"
}
For the first line (NR==1), we find all non-duplicate columns (preserving the order); then, for every line including the header, we print only the columns (fields) we selected (the cols array holds the column/field indexes we wish to keep).
$ ./filter.awk file
A B2 B3
1.1 AA AA
2.1 AB CT
2.2 AC GT
cut -d' ' -f "$(head -n 1 filename | tr ' ' '\n' | awk '{if(!seen[$0]++) print NR}' | paste -s -d ',' -)" filename
This will work like a charm.
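To see what the command substitution in that answer feeds to cut, here is a small sketch that builds the same comma-separated field list from the header line, keeping the first occurrence of each name. first_fields is a hypothetical wrapper, and a - is passed to paste so it reads standard input even on systems whose paste requires a file operand:

```shell
#!/bin/sh
# Hypothetical helper: first_fields FILE
# Prints the field numbers of the first occurrence of each header name, e.g. 1,2,4
first_fields() {
    head -n 1 "$1" | tr ' ' '\n' | awk '!seen[$0]++ { print NR }' | paste -s -d ',' -
}
```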
James Brown's code solved the problem.
I added
#!/usr/bin/awk -f
as the first line of his code and corrected a tiny typo at the end of the code (I simply deleted an extra ').
I am sorry, I did not have time to try all the other suggestions.
With my best wishes

Compare size of each cell in Unix Scripting

I want to compare each cell size/length and change its content depending on its length.
The current table has the following format:
AB
CD
AB
AB
CD
155668/01
AB
1233/10
I want to replace the cells whose length is more than 2 with DE.
Expected output:
AB
CD
AB
AB
CD
DE
AB
DE
I tried
awk -F "," '{ if($(#1) > "2") {print "DE"} else {print $1 }}'
but it reports a syntax error.
If I use wc -m in place of $(#, the output is the same as the input.
The easiest way is to use sed:
sed '/^..$/!s/.*/DE/' file
In awk, you could say:
awk '!/^..$/ { $0 = "DE" } 1' file
In both cases, the idea is the same: if the line does not consist of exactly two characters, replace the whole line with DE. In sed, the whole line is .*; in awk, it is $0.
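Note that the two interpretations differ: the sed and awk answers that test for exactly two characters also turn a one-character or empty line into DE, while the length($1) > 2 answers leave such lines alone. A quick comparison on a hypothetical sample that includes a one-character line X:

```shell
#!/bin/sh
# Compare "not exactly two characters" with "longer than two characters".
cell_sample() { printf 'AB\nX\n155668/01\n'; }      # hypothetical sample; X is one character
not_two()  { cell_sample | sed '/^..$/!s/.*/DE/'; }
over_two() { cell_sample | awk 'length($1) > 2 { $1 = "DE" } 1'; }
```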
Try this:
$ awk '{print (length($1)>2?"DE":$1)}' f
AB
CD
AB
AB
CD
DE
AB
DE
The idiomatic way would be:
awk 'length($1) > 2 { $1 = "DE" } 1' file

grep a line with single number

I have a file with a few lines like this:
1 ab
11 ad
41 ac
1 af
1 ag
and I want the lines where the number is 1:
1 ab
1 af
1 ag
How can I achieve this?
If I write this:
grep "1" file.txt
then I get all the lines that contain 1, even if that's not the entire number:
1 ab
11 ad
41 ac
1 af
1 ag
The -w option tells grep to search for a pattern as a single word:
grep -w 1 file.txt
You can write:
grep '^1 ' file.txt
to get all lines that start with a 1 followed by a space. (The ^ means "start-of-line".)
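The two answers are not equivalent: grep -w matches 1 as a whole word anywhere on the line, while ^1 only matches a leading 1. A small comparison on the question's data plus a hypothetical extra line 7 1:

```shell
#!/bin/sh
# grep -w: "1" as a whole word anywhere; '^1 ': a 1 at the start followed by a space.
grep_sample() { printf '1 ab\n11 ad\n41 ac\n1 af\n7 1\n'; }   # "7 1" is a hypothetical extra line
word_match()     { grep_sample | grep -w 1; }
anchored_match() { grep_sample | grep '^1 '; }
```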
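grep -w '^1' file.txt
This gets the lines that start with 1 as a whole word.
Another regular expression for grep:
grep '^1[[:blank:]]' file.txt
^ -> beginning of line
[[:blank:]] -> a space or tab after the "1" (inside a bracket expression, \t means a backslash and a t, not a tab)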
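A different technique, not used in the answers here, avoids regular expressions altogether: awk splits each line on runs of spaces and tabs, so comparing the first field as a string selects exactly the lines whose number is 1. A sketch with a hypothetical function name:

```shell
#!/bin/sh
# Print lines whose first whitespace-separated field is exactly "1".
# The string comparison ("1", not 1) avoids matching values like 01 or 1.0.
first_is_one() { awk '$1 == "1"' "$@"; }
```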
